Impact of Software Development Processes On The Outcomes of Student Computing Projects: A Tale of Two Universities
Keywords: Software engineering; Comparative study; Capstone project; Student projects; Education; Computer science education

Context: Project-based courses are increasingly used as an opportunity to teach students structured methods of developing software. Two well-known approaches in this area – traditional and Agile – have been successfully applied to drive academic projects. However, too often the default is still to have no organizational process at all. While a large variety of software development life-cycle models exists, little guidance is available on which one to choose to fit the context of working with students.
Objective: This paper assesses the impact of iterative, sequential and ‘‘hands-off’’ development approaches
on the success of student computing projects. A structured, metric-based assessment scheme was applied to
investigate team productivity, teamwork and the quality of the final product.
Method: Empirical evidence was collected during a controlled experiment carried out at two engineering
schools in Europe. More than 100 students at Bachelor’s and Master’s levels participated in the research, with
varied software development and teamwork skill sets.
Results: Similar patterns were observed among both sets of subjects, with iterative teams demonstrating the
highest productivity and superior team cohesion but a decline in the quality of the final product. Sequential
development led to a considerable improvement in the external quality characteristics of the software produced,
owing to the method’s stress on design activities.
Conclusion: The findings of this study will be of use to educators interested in applying software development
processes to student groupwork. A set of guidelines is provided for applying a structured way of working in a
project-based course.
productivity, and teamwork. A total of eighteen teams of student developers applied either sequential (Waterfall-like), iterative (Scrum-like), or ‘‘hands-off’’ (control group) methods to complete similar projects within the same domain.

The two groups of students comprised:

• early Bachelor’s level students, most with no experience of working in teams (of more than two) to design and develop software,
• Master’s level students, most of whom had participated in capstone-like courses in the past and possessed some relevant work experience.

The study was designed to answer the following research questions:

• RQ1: Does introducing a software development process add value in at least one of the success dimensions of the student project?
• RQ2: Do the results of following a given process depend on the level of studies (undergraduate/graduate)?

This paper is an extended version of a preliminary report [10], which in addition provides a comprehensive overview of the study, a discussion of related work, visualizations of the results, and extended analysis and conclusions.

The major contributions of this study are three-fold:

1. demonstrating the impact of two software development processes on three different series of data (concerning the project, product and people) relative to a control group,
2. providing empirical evidence collected during a controlled, multi-regional experiment with two sets of subjects: Bachelor’s and Master’s students,
3. presenting a set of recommendations for organizing project-based courses and students’ work in order to facilitate successful outcomes.

Certain patterns were observed in both study groups. The iterative teams showed the highest productivity, with superior team cohesion but a decline in the quality of the final product. The upfront design principle inherent in sequential development led to a considerable improvement in the external quality characteristics of the software produced. The data on different dimensions of success can be used to inform the choice of a process model for projects in the university setting. Our conclusions may also provide a starting point for the development of tailored, well-conceived software development processes for academic projects. Course instructors looking to structure students’ work over the semester could apply our insights to increase the chances of a positive outcome and create a learning environment that is favorable to the development of skills sought after in the software industry. The rest of the paper is organized as follows. Related work is discussed first, then the process of planning the experiment and operational aspects are described. The results of the study are presented and discussed, before final conclusions are drawn and guidelines provided.

2. Related work

Software development methods have been taught at both graduate and undergraduate levels for at least two decades. However, only a handful of studies have investigated the use of different approaches to the management of software creation in educational settings. One of the first investigations in this area was by Umphress et al. [6], who traced the evolution of applying different processes – from a delegative or ‘‘hands-off’’ approach to heavy-weight methods (MIL-STD-498, IEEE 1074, Team Software Process) and Agile (Extreme Programming) – to student projects. The insights gathered over a total of 49 capstone projects revealed that introducing software processes into the classroom is challenging, requiring proper tools as well as tailoring to facilitate the learning objectives. Although the authors present lessons learned from the application of various processes, no quantitative data is provided which would allow for direct comparison.

At the other end of the spectrum lies a study by Benediktsson et al. [7], who applied extensive measures of team effort, productivity, and software quality to compare the V-model (VM), the incremental model (IM), the evolutionary model (EM) and Extreme Programming (XP). Their experiment involved 55 participants in 15 teams of 3 or 4 students, developing comparable products according to one of the methods (assigned randomly). The study presents several findings:

1. projects following VM took longer than those using the other three models, between which there was a time difference of no more than 10%,
2. XP was associated with the largest number of hours spent on testing and code correction, whereas in the VM project team most time was spent on establishing the requirements, designing and programming,
3. XP teams produced on average 3.5 times more lines of code (LOC) than the VM team, 2.7 times the LOC produced by the IM team and 2.2 times more code than the EM teams,
4. the XP teams were 4.8 times more productive than the VM teams, 2.8 times more productive than the IM teams and 2.3 times more productive than the EM teams.

However, the underlying assignments were distinct, implying differing workloads and preventing straightforward comparison. Moreover, the purpose of the study was to provide data to guide IT professionals using student teams as a proxy, rather than for academics wishing to apply a given approach to capstone-like projects.

Germain et al. [5] also focused on efficiency in their comparative study of the Unified Process for Education (UPEDU, derived from the Rational Unified Process) and an Extreme Programming-like approach. The principal objective of their experiment was to investigate the distribution of effort across different cognitive activities: active (writing code, testing, integration efforts), reflexive (browsing technical documentation, browsing the web to find solutions) and interactive (discussing progress/issues, reviewing peer work). Three teams of four students followed each of the methods in parallel, developing the same project specifications over a period of 60 working days. The authors report that heavy-weight processes such as UPEDU put more emphasis on pre-coding activities, whereas XP required more ad-hoc communication. Although these conclusions are in line with the characteristics of the evaluated processes, the authors note that their impact on overall effort was not significant, contrary to the findings of Benediktsson et al. [7].

A study by Rundle and Dewar [9] partially anticipates the set-up of our experiment, as they applied plan-driven and Agile approaches to the development of undergraduate group projects. The Return on Investment (ROI) principle was used to compare the outcomes of the frameworks, and did not reveal any startling differences between the two methods. It is important to note that the ‘‘plan-driven’’ method was similar to the control group in our study, as it involved minimal intervention from the instructor. The students were by default inclined to work in a plan-driven manner, but were not guided to do so.

A recent study comparing Agile and plan-driven development methods was carried out by Missiroli et al. [11]. A total of 160 students from seven schools took part, divided into 34 teams working either according to Scrum principles or a Waterfall process. All of the deliverables were graded based on the number of functions completed, adherence to the assigned process and the overall learning experience. The Scrum teams scored highest for the first two evaluation criteria, whereas the projects developed in a sequential manner exhibited better non-functional characteristics, such as performance and usability. It is important to note that the grading scheme suffered from a high degree of subjectivity, with the exception of the number of functions delivered.

Finally, the most pertinent study to the research presented in this paper was conducted by Wellington et al. [8]. These authors examined team cohesion, source code metrics and usability aspects of the developed solutions to compare student performance using the Traditional
Life Cycle and Extreme Programming. The experimental setup involved two groups of 15 and 16 students, each following one of the methods over one semester. Both groups delivered a similar amount of functionality. However, the code developed by the XP team was of significantly higher quality, thanks to frequent refactoring. Students following this approach demonstrated consistently higher levels of team cohesion. The solution developed under the plan-driven method was perceived as being of higher usability by the students; nevertheless, this was a subjective evaluation, rather than an expert opinion or quantitative assessment.

Given the inconsistent results and methodologies reported in the existing body of knowledge (Table 1), further experimental evidence is necessary before the effectiveness of different development methods on the outcomes of student projects can be reliably evaluated. Our study complements the previous research in this area by taking a more systematic approach to the assessment of sequential and iterative software development. The experimental groups, including the control groups, were composed of undergraduate (Bachelor’s) or graduate (Master’s) students. The teams were composed of between four and six students. The setup of the investigation was designed to be typical of student projects in higher education, and therefore of interest and use to a large audience of academics. Each group developed the same software specification. The work processes were adapted to be relatively generic and represented a family of approaches, rather than a specific framework for development.

Table 1
Comparative studies of software development methods used by student teams (information listed in rows 1–5 as reported by the study authors).

Benediktsson et al. [7]
  Development methods compared: V-model, incremental model, evolutionary model, Extreme Programming
  Subject demographics: final-year students of a Master’s in Computer Science
  Sample size per development method: three to four teams
  Response variables: team effort, productivity and software quality
  Metrics for the response variables: hours spent on the project, lines of code, lines of code per month; metrics for quality not provided
  Findings reported: (1) V-model teams took longer to complete the projects and were least productive; (2) XP teams were most productive
  Limitations: lack of a control group; results for quality aspects not significant

Germain et al. [5]
  Development methods compared: Unified Process for Education, Extreme Programming
  Subject demographics: senior-year students in Computer Engineering
  Sample size per development method: three teams
  Response variables: team effort
  Metrics for the response variables: cognitive activity classification (active/reflexive/interactive)
  Findings reported: no significant difference in the overall effort between the methods
  Limitations: lack of a control group

Rundle and Dewar [9]
  Development methods compared: plan-driven, Agile
  Subject demographics: third-year students of Computer Science
  Sample size per development method: two teams (Agile), five teams (plan-driven)
  Response variables: product size in terms of value added, team effort
  Metrics for the response variables: Return on Investment (ROI)
  Findings reported: no significant differences between the methods
  Limitations: no guidance provided to the plan-driven teams; lack of a control group

Missiroli et al. [11]
  Development methods compared: Waterfall, Scrum
  Subject demographics: high-school students with two years of programming experience
  Sample size per development method: seventeen teams (of four to six students)
  Response variables: productivity, adherence to the process
  Metrics for the response variables: number of functions delivered
  Findings reported: (1) Scrum teams scored higher in terms of functionalities delivered and adherence to the process; (2) Waterfall teams produced software of higher usability and performance
  Limitations: subjectivity of the metrics; lack of a control group

Wellington et al. [8]
  Development methods compared: Traditional Life Cycle, Extreme Programming
  Subject demographics: upper-division Computer Science students
  Sample size per development method: one team
  Response variables: team productivity, source code and product quality, team cohesion
  Metrics for the response variables: lines of code, method length, cyclomatic complexity, nested block depth, number of parameters per method, number of attributes per class, weighted methods per class, lines of code per engineer, attachment to the team and project
  Findings reported: (1) similar amounts of functionality were delivered by both teams; (2) the source code produced by the XP team was of higher quality; (3) the Traditional Life Cycle team produced more usable software; (4) the XP team exhibited higher levels of team cohesion
  Limitations: very low sample size; lack of a control group

3. Experimental design

Both parts of the experiment were completed during the spring semester of the 2018/2019 academic year. One of the authors performed the experiments, with the other two supervising at their respective universities. A teaching assistant provided support in each class.

3.1. The study environment: setting

Bachelor’s students. The experiment was conducted within the context of a compulsory Web-programming course, introducing tools and technologies (PHP, HTML, CSS) used to build Web applications. It consisted of lectures (5 h), tutorials (20 h) and supervised assignment work (10 h), the equivalent of 2.5 ECTS points. Following the European Credit Transfer and Accumulation System guidelines [12], students were expected to
Table 3
Operationalization of constructs and instrumentation of dependent variables examined in the experiment (*these metrics were tracked for Bachelor’s students only).

Product quality — internal
  Client-side code quality*: HTML errors and warnings detected while performing W3C validation
  Server-side code quality: Maintainability rating

Product quality — external
  Software usability: effort-in-use metric measured as the number of clicks needed to perform user scenarios, as per the requirements specification; effort-in-use metric measured as the distance traveled by the mouse to perform user scenarios, as per the requirements specification; efficiency-in-use metric measured as the time needed to perform user scenarios, as per the requirements specification

Team productivity
  Functional completeness: functional completeness metric measured as the degree of realization of the functional requirements of the project
  Functional correctness: functional correctness metric measured as the precision of functionalities specified in the project requirements

Teamwork quality
  Team cohesion: team cohesion level measured via the Team Environment Questionnaire
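The client-side quality instrumentation above lends itself to automation. As a minimal sketch (the paper does not name its validation tooling; the endpoint and response fields below follow the public JSON interface of the W3C Nu HTML Checker, which is our assumption about one workable setup), the error and warning counts for a page could be collected as follows:

"""Sketch: counting HTML errors and warnings for the client-side quality
metric of Table 3. Assumes the public W3C Nu HTML Checker JSON interface;
the paper itself does not specify which validation tool was used."""
import requests


def count_validation_messages(html_source: str) -> tuple[int, int]:
    """Return (errors, warnings) reported by the Nu HTML Checker."""
    response = requests.post(
        "https://validator.w3.org/nu/?out=json",
        data=html_source.encode("utf-8"),
        headers={"Content-Type": "text/html; charset=utf-8"},
        timeout=30,
    )
    response.raise_for_status()
    messages = response.json().get("messages", [])
    # Errors carry type "error"; warnings are "info" messages with
    # subType "warning" in the checker's JSON output.
    errors = sum(1 for m in messages if m.get("type") == "error")
    warnings = sum(1 for m in messages
                   if m.get("type") == "info" and m.get("subType") == "warning")
    return errors, warnings


if __name__ == "__main__":
    page = "<!DOCTYPE html><html><head><title>t</title></head><body></body></html>"
    print(count_validation_messages(page))

Running such a check on every team's repository at each deadline would yield the per-team error and warning means of the kind reported later in Table 5.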
• the outcomes of Belbin’s team role inventory test,
• the personal preferences of each student (two priorities of the roles were disclosed, applicable for the sequential laboratory group).

Nine teams were formed with similar combined technical skills.

To measure team cohesion, Carron et al. [36] developed the Group Environment Questionnaire, which has proved to be a successful assessment tool for sports teams. Although it requires some adjustments for computer science-related projects, it is still applicable, as confirmed by a study at Shippensburg University [8]. The following subset of adapted questions was used to evaluate team cohesion:
1. Satisfaction:

• The development approach applied was a useful tool to manage the project.
• Overall, I am satisfied with the development approach used in the course.
• I believe future offerings of this course should continue to use this development approach.

2. Relative Advantage:

• Using this development approach had a positive impact on my effectiveness.
• Using this development approach improved the quality of my work.
• Overall, I find using this development approach to be advantageous.

3. Compatibility:

• I think that using this development approach fits well with the way I like to work.
• Using this development approach is compatible with the way I like to complete projects.

4. Ease-of-use:

• Learning this development approach was easy for me.
• Overall, I believe that this development approach is easy to use.

3.7. Execution

At the start of the course, the students were informed of the experiment and it was explained that the final grades would not be linked to the applied development approach. The class schedule and activities were presented. Each group was given a brief description of the method of work assigned to them. All teams in the study were evaluated according to the same scheme, which was fully transparent and communicated upfront. Grades were given for the developed product (assessed in terms of its internal and external quality, as well as team productivity, see Section 3.6), the associated documentation (calibrated to be the same for all groups), and the timely delivery of artifacts.

The students were given guidance by the course supervisors on the project management methods throughout the semester. A framework for the planning meetings in the sequential approach was provided, based on the three-point estimation model, in which a task estimate is derived from its optimistic, most likely and pessimistic durations. Stand-up meetings, sprint planning meetings and retrospectives were supervised, and the students received coaching from the supervisor. The concept of Maintainability and the supporting tool were presented during the second follow-up class (Table 4).

4. Results

Evaluation along multiple axes exposed a relationship between the software development approach and the project outcomes. In this section, we present the descriptive statistics and plots for the response variables. The results are regrouped into three dimensions of success: project quality, team productivity and teamwork quality.

The control group performed best in terms of the quality of the source code, in both experimental groups (Master’s and Bachelor’s) and for both metrics used (Table 5). The results for client-side code (Fig. 1) show that the teams that did not follow any development process produced HTML code of generally higher quality. Based on observations during the classes, this result can be explained by the team organization. In Gcontrol, most of the work was divided between the team members based on the technology used. Sub-teams worked on the HTML or PHP code, in each case; whereas in Gagile and Gwaterfall, the teams were multidisciplinary and in most cases team members contributed to both code bases. Moreover, the control group scored the highest in the Maintainability ranking, in both parts of the study. This may be explained by the fact that the absence of ceremonies and scheduled communication left more time for actual development and quality assurance. Cockburn observes [41] that a relatively small increase in methodology size or density adds a relatively large amount to the project cost, and a balance must be found between the size of a problem, the number of people solving it and the demands of the methodology. That said, the freedom left to the control group contributed to only a marginal increase in this code-related metric, especially when compared to the sequential approach. Therefore, we cannot conclude with certainty that not following any development approach contributes positively to internal quality.

Finding 1: Among the Bachelor’s students, the control group scored considerably higher on the HTML quality index. Nevertheless, according to the instructors’ observations, this result was due to the team structure (dedicated sub-teams worked on HTML or PHP code bases) rather than to the ‘‘hands-off’’ approach. In both parts of the experiment, the control group also did best in terms of the Maintainability ranking. However, the differences between the approaches were minor.

External quality analysis showed that the sequential approach (Gwaterfall) resulted in applications that were noticeably more user-friendly (Table 6). Although the three groups had to submit the same deliverables, including wireframes of the application screens, the Waterfall-like teams had to conceive them upfront. The other groups (Gagile and Gcontrol) had to submit the same artifacts, but could work on them at any time during the semester. Although 100% of the students claimed to have prepared wireframes before or during implementation, this did not result in a coherent application design that would positively impact the usability of the application (Fig. 2). Interestingly, the feedback provided on a regular basis to the Gagile group did not seem to improve the overall usability of the applications. Although most of the suggestions made during sprint reviews were incorporated, it was only during end-to-end testing that shortcomings were revealed. This can be explained by the fact that the groups worked on their user interfaces incrementally, without having defined an overall vision of the application beforehand, as Gwaterfall did.

Finding 2: Both data sets revealed that the sequential approach, with its upfront design practice, had a positive impact on the external quality of the software produced.

Assessment of the project output confirmed our hypothesis that the control group would deliver the least amount of functionality (lowest Functional completeness score in both the Bachelor’s and Master’s teams), as well as that the iterative teams would be the most efficient, owing to regular status checkpoints [42] and imposed intermediary application demos (Table 7). However, it appears that the increased delivery frequency impacted the quality of the artifacts. In both experimental groups, the Gagile teams received the lowest scores for Functional correctness, due to the number of anomalies detected. Moreover, the groups following the iterative approach received the lowest scores for the internal quality aspect of the projects (Fig. 3).

The larger gaps in the Functional Completeness metric among the Bachelor’s teams indicate that the development process had a more significant impact on the shorter projects. This may be explained by the fact that projects undertaken over the course of a semester are long enough to implement the required functionality (100% of the requirements were covered by Gagile and Gwaterfall), unless the work is started too late due to a lack of intermediary deadlines (as was the case with the control group, which completed 85.71% of the requirements).

Finding 3: In both experimental groups, the teams following an iterative approach scored higher in terms of the number of functionalities delivered. However, their solutions contained more bugs than those of the sequential and control groups.
Table 4
Subset of product-related metrics used in the study.

Functional correctness: the number of functions suitable for performing the specified tasks is compared to the number of evaluated functions.
  X = 1 − A/B, where A is the number of functions in which problems are detected in the evaluation and B is the number of evaluated functions. 0 ≤ X ≤ 100%; the closer to 100%, the more adequate the solution.

Functional completeness: functional (black-box) tests of the system according to the requirements specification. The number of missing functions is compared with the number of functions described in the requirements specification.
  X = 1 − A/B, where A is the number of missing functions detected and B is the number of functions described in the specifications. 0 ≤ X ≤ 100%; the closer to 100%, the better.

Effort in use — mouse clicks: evaluates the static usability of an application in terms of the number of mouse clicks and mouse wheel scrolls. The mouse clicks denote the sum of left, right and middle mouse clicks. Mouse wheel scrolls refer to the number of scrolls made by the user while reaching the assignment solution.
  X = A + B, where A is the number of left, right and middle mouse clicks and B is the number of mouse wheel scrolls. 0 ≤ X < ∞; the closer to 0, the better.

Effort in use — distance: evaluates the static usability of an application in terms of the movement span while executing a given use case. The distance refers to the number of millimeters traveled while moving the mouse between the starting and end points.
  X = A, where A is the distance traveled in millimeters. 0 ≤ X < ∞; the closer to 0, the better.

Efficiency in use: evaluates how long it takes to complete a given task.
  X = A, where A is the time elapsed in milliseconds. 0 ≤ X < ∞; the closer to 0, the better.
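To make the formulas concrete, the following sketch computes each Table 4 metric from raw observations. It is illustrative only: the function names and sample values are ours, not artifacts of the study (whose reference list points to the Mousotron tool [35] for mouse measurements).

"""Sketch: computing the product metrics of Table 4 from raw observations.
Function names and the sample values are illustrative, not study data."""
import math


def functional_correctness(problem_functions: int, evaluated_functions: int) -> float:
    """X = 1 - A/B as a percentage; the closer to 100%, the more adequate."""
    return 100.0 * (1 - problem_functions / evaluated_functions)


def functional_completeness(missing_functions: int, specified_functions: int) -> float:
    """X = 1 - A/B as a percentage; the closer to 100%, the better."""
    return 100.0 * (1 - missing_functions / specified_functions)


def effort_in_use_clicks(mouse_clicks: int, wheel_scrolls: int) -> int:
    """X = A + B; the closer to 0, the better."""
    return mouse_clicks + wheel_scrolls


def effort_in_use_distance(cursor_path: list[tuple[float, float]]) -> float:
    """Path length in millimeters over recorded cursor positions;
    the closer to 0, the better."""
    return sum(math.dist(a, b) for a, b in zip(cursor_path, cursor_path[1:]))


# Example: problems detected in 1 of 14 evaluated functions -> ~92.86%.
print(round(functional_correctness(1, 14), 2))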
Table 5
Results for internal project quality associated with different development methods; the best score for a given metric is marked in bold (*bounds are given for the 95% confidence interval of the mean).

Maintainability Rating — Bachelor’s students:
  Iterative: mean 44.00, std. error 5.10, 95% CI* [29.84, 58.16]
  Sequential: mean 45.00, std. error 8.66, 95% CI* [17.44, 72.56]
  Control: mean 50.00, std. error 4.08, 95% CI* [37.01, 62.99]
Maintainability Rating — Master’s students:
  Iterative: mean 7.67, std. error 0.67, 95% CI* [4.80, 10.54]
  Sequential: mean 8.75, std. error 0.25, 95% CI* [7.95, 9.55]
  Control: mean 9.00, std. error 0.00, 95% CI* [9.00, 9.00]
HTML errors — Bachelor’s students:
  Iterative: mean 40.40, std. error 14.79, 95% CI* [8.68, 72.12]
  Sequential: mean 45.00, std. error 8.66, 95% CI* [17.44, 72.56]
  Control: mean 6.00, std. error 2.82, 95% CI* [−0.50, 12.50]
HTML warnings — Bachelor’s students:
  Iterative: mean 5.40, std. error 2.00, 95% CI* [1.11, 9.69]
  Sequential: mean 9.92, std. error 4.29, 95% CI* [0.48, 19.36]
  Control: mean 1.78, std. error 0.80, 95% CI* [−0.06, 3.61]
Fig. 1. Mean values of HTML errors (left) and HTML warnings (right) – measures of internal product quality among different laboratory groups of Bachelor students.
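The lower and upper bounds reported in Tables 5–7 are consistent with two-sided 95% t-intervals around the mean, i.e. mean ± t(0.975, df) · SE. A quick check against the first row of Table 5; note that the degrees of freedom are our inference from the interval width, not a figure stated in the text:

"""Sketch: reproducing the 95% confidence bounds of Table 5, assuming a
two-sided t-interval on the mean. df = 4 is inferred from the interval
width of the first row (suggesting five iterative Bachelor's teams); it is
not stated explicitly in the paper."""
from scipy import stats

mean, std_error, df = 44.00, 5.10, 4   # Maintainability, Bachelor's, iterative
t_crit = stats.t.ppf(0.975, df)        # ~2.776 for df = 4
lower = mean - t_crit * std_error      # ~29.84, as reported
upper = mean + t_crit * std_error      # ~58.16, as reported
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")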
Finally, the iterative teams showed the highest levels of team cohesion throughout the semester (positive perceptions of collaboration were around 10% higher in both study groups). This finding is in line with previous reports regarding the benefits of Agile methods (Table 8) [3,43].

Finding 4: The iterative teams in both experimental groups demonstrated higher quality of teamwork (∼10%) compared to the other groups.

4.1. Student’s perspective

The perceptions of the students concerning the prescribed methods are aggregated in Table 9. The response rates were 96.6% and 74.0% for the Bachelor’s and Master’s students, respectively. The average scores for questions (listed in Section 3.6) probing the same facet are provided and should be interpreted as follows:
Fig. 2. Mean values of usability metrics used to assess the external quality of the product.
Table 6
External project quality results associated with different development methods; the best score for a given metric is marked in bold (*bounds are given for the 95% confidence interval of the mean).

Efficiency in use — Bachelor’s students:
  Iterative: mean 17.84, std. error 2.38, 95% CI* [11.22, 24.46]
  Sequential: mean 12.35, std. error 4.24, 95% CI* [−1.15, 25.85]
  Control: mean 17.73, std. error 1.87, 95% CI* [9.70, 25.76]
Efficiency in use — Master’s students:
  Iterative: mean 36.76, std. error 2.65, 95% CI* [25.34, 48.18]
  Sequential: mean 25.78, std. error 1.04, 95% CI* [22.48, 29.07]
  Control: mean 33.50, std. error 4.63, 95% CI* [−25.27, 92.27]
Effort in use (mouse clicks) — Bachelor’s students:
  Iterative: mean 6.60, std. error 0.67, 95% CI* [4.73, 8.47]
  Sequential: mean 3.70, std. error 1.04, 95% CI* [0.39, 7.01]
  Control: mean 5.67, std. error 0.93, 95% CI* [1.65, 9.68]
Effort in use (mouse clicks) — Master’s students:
  Iterative: mean 26.17, std. error 6.41, 95% CI* [−1.39, 53.73]
  Sequential: mean 12.47, std. error 0.20, 95% CI* [11.84, 13.11]
  Control: mean 29.06, std. error 7.44, 95% CI* [−65.44, 123.56]
Effort in use (distance) — Bachelor’s students:
  Iterative: mean 43.64, std. error 9.01, 95% CI* [18.63, 68.65]
  Sequential: mean 29.25, std. error 7.27, 95% CI* [6.12, 52.38]
  Control: mean 40.53, std. error 6.63, 95% CI* [11.99, 69.08]
Effort in use (distance) — Master’s students:
  Iterative: mean 264.31, std. error 51.01, 95% CI* [44.85, 483.77]
  Sequential: mean 196.06, std. error 8.82, 95% CI* [167.99, 224.12]
  Control: mean 182.31, std. error 45.69, 95% CI* [−398.20, 762.83]
• A mean of 5 or above suggests that, on average, students at least ‘‘Somewhat Agree’’ with the statement.
• A mean of 4 is a neutral response (‘‘Neither Agree nor Disagree’’).
• A mean of 3 or below suggests that, on average, students’ perceptions range from ‘‘Strongly Disagree’’ to ‘‘Somewhat Disagree’’.

Among the Bachelor’s students, the iterative approach was given the highest score for perceived satisfaction and relative advantage, whereas the sequential approach received the lowest score for all four evaluated facets. The iterative approach, with its regular follow-ups and feedback loop, was also praised in the open-ended questions. The only area of improvement suggested concerned the allocation of the Scrum Master role. Instead of this role being assigned on a voluntary basis, it was proposed that a student with appropriate soft skills should be appointed.
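The facet averages in Table 9 follow directly from this reading of the 7-point scale. As a minimal sketch (the item-to-facet grouping mirrors the questionnaire in Section 3.6, but the response data below is invented for illustration):

"""Sketch: aggregating 7-point Likert items into the facet averages of
Table 9. The item-to-facet grouping mirrors Section 3.6; the responses
below are invented for illustration."""
from statistics import mean


def facet_means(responses: dict[str, list[int]]) -> dict[str, float]:
    """Average all item scores (1-7) belonging to each facet."""
    return {facet: round(mean(scores), 2) for facet, scores in responses.items()}


def interpret(score: float) -> str:
    """Reading used in the paper: >= 5 agreement, 4 neutral, <= 3 disagreement."""
    if score >= 5:
        return "at least 'Somewhat Agree'"
    if score <= 3:
        return "'Strongly Disagree' to 'Somewhat Disagree'"
    return "close to 'Neither Agree nor Disagree'"


# One (invented) student's answers for the iterative approach:
answers = {
    "satisfaction": [6, 5, 5],        # three items (Section 3.6)
    "relative_advantage": [5, 4, 5],  # three items
    "compatibility": [6, 5],          # two items
    "ease_of_use": [4, 5],            # two items
}
for facet, score in facet_means(answers).items():
    print(f"{facet}: {score} -> {interpret(score)}")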
Table 7
Results for team productivity associated with different development methods; the best score for a given metric is marked in bold (*bounds are given for the 95% confidence interval of the mean).

Functional completeness — Bachelor’s students:
  Iterative: mean 60.00, std. error 5.75, 95% CI* [44.02, 75.98]
  Sequential: mean 46.43, std. error 9.79, 95% CI* [15.26, 77.59]
  Control: mean 42.86, std. error 14.81, 95% CI* [−4.26, 89.97]
Functional completeness — Master’s students:
  Iterative: mean 100, std. error 0, 95% CI* [100, 100]
  Sequential: mean 100, std. error 0, 95% CI* [100, 100]
  Control: mean 85.71, std. error 0, 95% CI* [85.71, 85.71]
Functional correctness — Bachelor’s students:
  Iterative: mean 80.53, std. error 5.52, 95% CI* [65.21, 95.86]
  Sequential: mean 89.76, std. error 3.26, 95% CI* [79.40, 100.12]
  Control: mean 92.86, std. error 2.94, 95% CI* [83.49, 102.22]
Functional correctness — Master’s students:
  Iterative: mean 83.61, std. error 3.41, 95% CI* [68.92, 98.30]
  Sequential: mean 96.25, std. error 2.17, 95% CI* [89.36, 103.14]
  Control: mean 76.67, std. error 1.67, 95% CI* [55.49, 97.84]
Table 8
Results for team cohesion associated with different development methods; the best score in a given data set is marked in bold.

Team cohesion (%) — Iterative / Sequential / Control
  Positive perception, Bachelor’s students: 93.35 / 80.27 / 77.15
  Negative perception, Bachelor’s students: 6.65 / 19.73 / 22.85
  Positive perception, Master’s students: 78.84 / 67.64 / 69.68
  Negative perception, Master’s students: 21.16 / 32.36 / 30.32

Table 9
Average scores of students’ perceptions of the approach applied; the best score in a given data set is marked in bold and the lowest score is underlined.

Group, Approach — Satisfaction / Relative advantage / Compatibility / Ease-of-use
  Bachelor’s students, Iterative: 5.23 / 4.88 / 5.11 / 4.55
  Bachelor’s students, Sequential: 3.38 / 3.46 / 3.58 / 3.30
  Bachelor’s students, Control: 4.56 / 4.48 / 5.05 / 5.26
  Master’s students, Iterative: 4.45 / 4.46 / 4.58 / 5.63
  Master’s students, Sequential: 5.21 / 5.32 / 5.22 / 5.62
  Master’s students, Control: 3.85 / 4.14 / 3.99 / 5.71

Some students following the sequential approach underlined that they had been unable to conceive a technical design for the solution upfront, due to insufficient familiarity with the technologies involved. On the other hand, the team that succeeded in the exercise explicitly expressed their satisfaction with the upfront design stage, which allowed them to foster a common vision of the project.

The control group gave the method of work relatively high scores for perceived satisfaction and relative advantage (average of 4.52). Even better scores were given for the compatibility and ease-of-use facets (above 5.0), which can be understood as students leaning naturally towards a ‘‘hands-off’’ approach that does not require a learning curve. The students in the control group also expressed their appreciation of being allowed to define their own way of working in the open-ended questions. Nonetheless, when asked about possible improvements, a few indicated the need for more guidance and control. One of the students shared that he wished a structured development approach had been imposed.

Among the Master’s students, all the approaches received consistent scores in terms of perceived satisfaction, relative advantage and compatibility. The sequential group scored highest (average of 5.25), followed by the iterative group (4.48), with the control group placed last (3.97). Despite the overall satisfaction with the approach, many of the students working in a sequential manner pointed out that the workload was not equal among the team members. Despite the presence of a Project Manager responsible for overseeing progress and the distribution of tasks, students pointed out that ‘‘more control over workload distribution is needed’’ and ‘‘peers end up doing work of others who did not complete tasks on time’’. The agile way of working was perceived positively and its application was seen as advantageous, given that it is widely adopted in the commercial setting. Many students working in the iterative manner appreciated the demos and feedback provided as part of the Sprint review. Comments in this area include: ‘‘Regular checkpoints with supervisors contributed to an even distribution of workload during the semester instead of doing things last minute’’ and ‘‘Verifying every couple of weeks that requirements were implemented as planned motivated the team to deliver on time’’. Nonetheless, the ceremonies associated with the approach (Sprint planning, Stand-up meetings) were perceived as a burden by some of the students and distracted them from their coding activities. Finally, multiple students wished the approach had included an explicit testing phase to ensure the functionality delivered worked correctly: ‘‘The project plan lacked time allocated to testing so we were not able to resolve many bugs that
were discovered in our software during the Jigsaw exercise’’; ‘‘It was clear during the Jigsaw exercise that the projects had not been tested properly—almost every application contained issues’’.

The control group students appreciated the freedom given to them, which also allowed them to balance the workload with other courses. One of the students suggested that we should provide students with a further degree of liberty by making attendance at follow-up classes non-compulsory.

Regardless of the approach they had followed, students expressed a wish that they had been allowed to form teams themselves, rather than having this aspect decided for them. The opinion was also expressed that the Team Impact Questionnaire took too much time to identify under-performers.

Finally, based on the collective open feedback, it appears that introducing a software development process increases the perceived inventiveness of the course. Some of the control group students suggested that the course did not offer anything new, whereas several students who had followed the other two approaches mentioned that the coursework had been organized in an original and well-thought-out way.

When looking at the questionnaire results across both data sets, a much wider spread can be observed in the Bachelor’s group than among the Master’s students when it comes to the perceived ease-of-use of the methods. The range between minimum and maximum values for the junior students was 2.29, while for the senior students it was only 0.31. It seems likely that students who have no prior experience working as part of a team have more trouble learning how to apply a structured way of working than those who have attended multiple project-based courses and have some working experience in a professional setting.

It is difficult, if not impossible, to establish a link between the results of the experiment, measured in terms of the dependent variables, and students’ perceptions. While intuitively team productivity or teamwork quality metrics could be representative of the students’ satisfaction, incoherent results in the two data sets prevent us from drawing any firm conclusions.

5. Threats to validity

As with all empirical research, this study is subject to different types of threats, which may be described according to the classification suggested by Wohlin et al. [44]. The comprehensive, metrics-based approach used in the study in principle limits bias and uncertainty, and therefore ensures relatively high internal validity. Furthermore, the random assignment of students to the groups is an adequate way of distributing the study sample in a controlled experiment. Nevertheless, there is a risk that success in the project may have been influenced by uncontrolled factors other than the development approach used, such as the workload from other elective courses or varying student affinity. These aspects, inherent to a university setting, were not accounted for in our study. However, we have no evidence that there were significant differences between the groups in these regards, and therefore we do not consider them to constitute an internal threat.

Turning to external validity threats, it is possible that comparable results would be obtained by running the same courses in consecutive years at both universities. The generalizability of the experiment to other courses in Web programming is limited. The impact of a given development approach on students working in larger groups or over the course of two semesters would vary significantly. It is also difficult to generalize the results to courses in other domains of Information Technology.

Construct validity includes one major threat, concerning the usability metrics. Although the metrics were chosen based on their objectivity and ease of application, they might not fully cover the notion of an astute UX design. For instance, the ‘‘effort in use’’ metric assesses the application in terms, among others, of the distance traveled by the mouse. This reflects the general preference to achieve results with minimal effort (mouse moves, scrolls). However, it does not account for a situation in which a well-designed interface separates elements widely to minimize confusion or manual error.

Regarding conclusion validity, some of the tools used in the experiment suffer from a degree of subjectivity. The functional correctness metric mirrors the evaluator’s assessment of the severity of the detected anomalies, whereas the team cohesion questionnaires rely entirely on students’ perceptions of collaboration on a given day, which may be impacted by team conflicts and other factors that are not related to the process.

It is important to note that the findings might have differed if the assignment had not been characterized by stable requirements—a typical assumption for projects in academia. Boehm [45] stresses that Waterfall-like approaches work best when requirement specifications are frozen early in the project and may have problems keeping up with rapidly changing stakeholders’ needs. Unhelkar [46] suggests that the projects that benefit the most from the Agile way of working are ‘‘greenfield’’ development projects that are relatively small in scope, comprising five team members and lasting for about two months. Therefore, the experimental set-up was partly characteristic of both agile and plan-driven approaches, making it difficult to predict the impact of changing requirements. The iterative approach can be expected to tackle them relatively well, as frequent deadlines reduce the variance of a software process and so possibly increase its predictability and efficiency [47]. Nevertheless, sequential methods could also succeed in delivering high-quality software on time if the architecture anticipates and accommodates changes to the requirements [45]. While producing a scalable architecture is a non-trivial task, more senior students could probably accomplish it given appropriate guidance.

Reliability of treatment was addressed when designing the experiment, to minimize the risk of unintentional bias. For instance, with regard to software testing, all groups were aware of the set of use cases that would be evaluated. Furthermore, the verification phase of the Waterfall-like approach was not formalized by the instructors. The students were free to test their projects as they chose. Those working in an iterative manner were invited to incorporate test criteria into their Definition of Done, and the control group was reminded on multiple occasions that the number of bugs detected in the delivered solution would impact their final grade. Although conforming to the development process was not an explicit part of the grading scheme, the reliability of treatment implementation was addressed using a penalty system. If a deadline for an artifact was not respected, it reduced the maximum grade of the team (by 0.5 of a grade). This had a limited effect, as only the artifacts that were common to all groups fell into this category, and observance of process-specific requirements (such as voice recordings from stand-up meetings) did not influence the final grade. Respecting the conventions of a given approach (meetings, artifacts etc.) was to some extent left to the students’ discretion, which was often a cause of concern for the instructors. Especially in the Bachelor’s groups, the participants were not used to rigorous teamwork, and none of the sequential groups had their solution functioning by the validation phase, making testing difficult or impossible.

Finally, the effectiveness of some of the process-related activities was limited because the students lacked experience of identifying, breaking down and estimating the work necessary to complete the assignment.
approach serving as the control. Three axes of evaluation provided quantitative data to answer the research questions.

RQ1: The first research question asked whether following a structured development process would provide added value compared to the self-organizing teams. Both the sequential and iterative approaches were indeed found to have had a positive impact on at least one of the evaluated success dimensions.

Following the sequential method led to a considerable improvement in the external quality of the solutions, across all three of the assessed metrics. Asking the students to conceive the entire user interface upfront facilitated the development of a coherent design, which was more user-friendly than those developed by the other groups. This conclusion supports the observation by Wellington et al. [8] that plan-driven student development produces software with higher usability. Missiroli et al. [11] reported a similar pattern for other non-functional characteristics. In contrast, the teams working in an iterative manner received the lowest scores for almost all of the collected data points used to define the external quality facet in our study. This may have been because the incremental approach to design and development does not encourage the formulation of an initial coherent vision of the final product. Nonetheless, the iterative approach increased productivity and resulted in higher team cohesion compared to the other groups. This is not surprising, as the pressure to demonstrate a working piece of software on a regular basis obliges students to complete certain functionalities earlier in the semester, which in turn increases the chances of developing a fully operational project. Umphress et al. [6] also found that XP teams were significantly more productive in terms of LOC per project month than teams following any of the other processes considered in their study (Section 2). Agile methods are known from commercial software development to encourage frequent communication, self-organization and joint accountability, all of which help to foster interpersonal connections and strengthen team cohesion [3]. It is plausible that this is also the case for student teams. In the study by Wellington et al. [8], the teams following XP demonstrated higher overall team cohesion than those working according to a plan-driven method.

Despite the benefits of the iterative approach, the final products of these teams showed lower functional correctness (i.e. there were more bugs in the produced software). Their scores for the internal quality metrics were also inferior to those awarded to the other teams, although not significantly. It seems that the iterative teams focused on the goal of producing the required software on time, but neglected the issue of quality. These results stand in contrast to studies reporting the performance of agile teams in commercial development [42] and could be explained by the differences between the commercial and university settings. Firstly, receiving verbal approval of functionality from an instructor during a basic use-case scenario may give the message that whatever was implemented meets the required standards. In the professional setting, demos are executed in a similar fashion; however, there is usually a formalized quality assurance process, the purpose of which is to minimize the number of functional deficiencies in the software before delivery. Although the students were asked to define and respect a certain Definition of Done, the extent to which it was applied was not monitored.

Finally, the teams that did not follow any prescribed process received the highest scores for the internal quality of their software, albeit only slightly higher than those received by the other teams. In the case of the Bachelor’s students, this success was attributed to the team structure that naturally emerged: each student focused on a single programming language rather than contributing to different layers of the application. It has been reported in the literature that letting students choose their own way of working, team structure and tools can lead to higher quality software, particularly in the context of studio-based assignments [48,49]. Based on informal talks with the groups and instructors, the students (Bachelor’s and Master’s) from the control groups did not proactively follow a structured development process nor employ the practices that characterized the other two groups (e.g. sprint planning, stand-up meetings, a dedicated testing phase). Therefore, we believe that freeing students from project-management tasks allowed them to allocate more time to programming, which contributed to the high quality of their source code.

RQ2: The secondary aim of the study was to investigate whether following a given implementation approach would yield different results for Bachelor’s and Master’s students. Similar results (described above) were observed for the undergraduates and post-graduates. The best and worst performers in all dimensions of success in the two data sets corresponded in the overwhelming majority of cases (visually represented by bold numbers in Tables 5–8). The impact of the development methods was more visible in the case of the post-graduate students, with a larger spread for several metrics.

Implications. The results of this study provide a basis for researchers and the teaching community to design or adapt an existing software development process for their students’ project work. To address the shortcomings detected in this study, the following guidelines should be incorporated by educators introducing a structured process in the classroom:

1. Inclusion of a dedicated design phase is advised, evaluated in terms of the quality of the target system. In our study, the usability aspect of Web solutions was explored by requiring a set of wireframes prior to implementation and by the evaluation of multiple GUI-related metrics. However, other courses could focus on many of the quality attributes specified by the ISO 25010 standard [24]. For example, performance efficiency is relevant for CPU programming or Cloud-based applications, where proper architecture and use of design patterns can positively influence both time behavior and resource utilization, reducing costs. Network programming assignments could focus on the reliability characteristic, with its availability and recoverability sub-characteristics.

2. Whenever iterative development is used in a course, a formal process of quality assurance should be implemented to ensure functional correctness. This can be achieved either by introducing a dedicated testing phase before submission or by administering tests as part of the Definition of Done, which could be enforced by requiring the production of an artifact that proves a given functionality has been tested. An adapted form of a Requirements Traceability Matrix could be used, linking a User Story to a test case and evaluating the outcome of its execution in terms of the number of anomalies detected. Upon the completion of an iteration, students could update the matrix and provide it to the course instructor (a minimal sketch of such a matrix is given after this list).

3. For all structured team project work, frequent communication among team members should be established. This can be enforced by introducing team meetings in between classes (in person or remotely), structuring their execution and requiring a written summary as the outcome. A template, to be completed with information relevant to each team member, could mesh the standard form used for stand-up meetings (what was done, work planned, and issues encountered) with elements from traditional project management, such as risk identification, and give a global overview of the assignment. While answering the questions raised by the Agile practice is intuitive and straightforward, identifying risks is less so for inexperienced students, and should be supported with examples. Ultimately, the team members need to identify any potential obstacles, such as a high workload from other courses and deadline conflicts, which could be resolved at the project level (by shifting part of the workload to others or removing task dependencies), or unfamiliarity with certain technologies or tools (requiring upskilling activities). The added value of this guided communication channel is twofold,
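Referring back to guideline 2, the adapted Requirements Traceability Matrix could be as simple as the following sketch. The structure and field names are our own illustration; the paper prescribes only the idea of linking a User Story to a test case and recording the anomalies detected.

"""Sketch: an adapted Requirements Traceability Matrix for guideline 2.
Structure and field names are illustrative; the paper only prescribes
linking a User Story to a test case and recording detected anomalies."""
from dataclasses import dataclass, field


@dataclass
class TraceabilityRow:
    user_story: str            # e.g. "US-12: As a visitor, I can register"
    test_case: str             # e.g. "TC-12a: submit a valid registration form"
    iteration_executed: int    # sprint in which the test case was run
    anomalies_detected: int = 0

    @property
    def passed(self) -> bool:
        return self.anomalies_detected == 0


@dataclass
class TraceabilityMatrix:
    rows: list[TraceabilityRow] = field(default_factory=list)

    def summary(self) -> str:
        """Per-iteration report a team could hand to the course instructor."""
        clean = sum(row.passed for row in self.rows)
        return f"{clean}/{len(self.rows)} test cases passed without anomalies"


matrix = TraceabilityMatrix([
    TraceabilityRow("US-12: visitor registration", "TC-12a: valid form", 2),
    TraceabilityRow("US-13: user login", "TC-13a: wrong password rejected", 2, 1),
])
print(matrix.summary())  # -> "1/2 test cases passed without anomalies"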
References

[29] N. Ramasubbu, A. Bharadwaj, G. Tayi, Software process diversity: Conceptualization, measurement, and analysis of impact on project performance, MIS Q. Manage. Inf. Syst. 39 (4) (2015) 787–808.
[30] M. Hoegl, H.G. Gemuenden, Teamwork quality and the success of innovative projects: A theoretical concept and empirical evidence, Organization Science, Vol. 12, 2001.
[31] A.J. Shenhar, D. Dvir, Project management research – the challenge and opportunity, Proj. Manag. J. 38 (2) (2007) 93–99.
[32] A. Carron, L. Brawley, Cohesion: Conceptual and measurement issues, Small Group Research, Vol. 31, 2000.
[33] E. Salas, R. Grossman, Measuring team cohesion: Observations from the science, Human Factors, Vol. 57, 2015.
[34] A. Tosun, O. Dieste, D. Fucci, S. Vegas, B. Turhan, H. Erdogmus, A. Santos, An industry experiment on the effects of test-driven development on external quality and productivity, Empir. Softw. Eng. 22 (6) (2016) 2763–2805.
[35] Blacksun Software, Mousotron 7.0, 2012, [Online]. Available from: http://www.blacksunsoftware.com/mousotron.html (Accessed 5 January 2021).
[36] A. Carron, L. Brawley, The Group Environment Questionnaire test manual, Fitness Information Technology, Inc.
[37] N.P. Melone, A theoretical assessment of the user-satisfaction construct in information systems research, Manage. Sci. 36 (1) (1990) 76–91.
[38] B.E. Hayes, Measuring Customer Satisfaction: Survey Design, Use, and Statistical Analysis Methods, ASQ Quality Press, Milwaukee, 1998.
[39] G.C. Moore, I. Benbasat, Development of an instrument to measure the perceptions of adopting an information technology innovation, Inf. Syst. Res. 2 (3) (1991) 173–191.
[40] E.M. Rogers, Diffusion of Innovations, fifth ed., Free Press, New York, N.Y., 2003.
[41] A. Cockburn, Selecting a project’s methodology, IEEE Softw. 17 (2000) 64–71, http://dx.doi.org/10.1109/52.854070.
[42] A. Campanelli, F. Parreiras, Agile methods tailoring – a systematic literature review, J. Syst. Softw. 110 (2015) 85–100.
[43] S. Licorish, A. Philpott, S.G. MacDonell, Supporting agile team composition: A prototype tool for identifying personality (in)compatibilities, in: Proceedings of the 2009 ICSE Workshop on Cooperative and Human Aspects on Software Engineering, 2009.
[44] C. Wohlin, P. Runeson, M. Höst, M.C. Ohlsson, B. Regnell, A. Wesslén, Experimentation in Software Engineering, Springer-Verlag Berlin Heidelberg, 2012, http://dx.doi.org/10.1007/978-3-642-29044-2.
[45] B. Boehm, Get ready for agile methods, with care, Computer 35 (2002) 64–69, http://dx.doi.org/10.1109/2.976920.
[46] B. Unhelkar, The Art of Agile Practice: A Composite Approach for Projects and Organizations, CRC Press, 2016.
[47] L. Williams, A survey of agile development methodologies, 2007.
[48] A.S. Carter, C.D. Hundhausen, A review of studio-based learning in computer science, J. Comput. Sci. Coll. 27 (1) (2011) 105–111.
[49] C.N. Bull, J. Whittle, Observations of a software engineering studio: Reflecting with the studio framework, in: Proceedings of the 2014 IEEE 27th Conference on Software Engineering Education and Training (CSEE&T), 2014, pp. 74–83, http://dx.doi.org/10.1109/CSEET.2014.6816784.