
What Do We (Really) Know about Test-Driven Development?
Article in IEEE Software, July 2018. DOI: 10.1109/MS.2018.2801554



REDIRECTIONS

Editor: Tim Menzies
North Carolina State University
tim@menzies.us

What Do We (Really) Know about Test-Driven Development?

Itir Karac and Burak Turhan

Call for Submissions

Do you have a surprising result or industrial experience? Something that challenges decades of conventional thinking in software engineering? If so, email a one-paragraph synopsis to timm@ieee.org (use the subject line "REDIRECTIONS: Idea: your idea"). If that looks interesting, I'll ask you to submit a 1,000- to 2,400-word article (in which each figure or table counts as 250 words) for review for the Redirections department. Please note: heresies are more than welcome (if supported by well-reasoned industrial experiences, case studies, or other empirical results). —Tim Menzies

TEST-DRIVEN DEVELOPMENT (TDD) is one of the most controversial agile practices in terms of its impact on software quality and programmer productivity. After more than a decade's research, the jury is still out on its effectiveness. TDD promised all: increased quality and productivity, along with an emerging, clean design supported by the safety net of a growing library of tests. What's more, the recipe sounded surprisingly simple: Don't write code without a failing test. Here, we revisit the evidence of the promises of TDD.1 But, before we go on, just pause and think of an answer to the following core question: What is TDD?

Let us guess: your response is most likely along the lines of, "TDD is a practice in which you write tests before code." This emphasis on its test-first dynamic, strongly implied by the name, is perhaps the root of most, if not all, of the controversy about TDD. Unfortunately, it's a common misconception to use "TDD" and "test-first" interchangeably. Test-first is only one part of TDD. There are many other cogs in the system that potentially make TDD tick.

How about working on small tasks, keeping the red–green–refactor cycles short and steady, writing only the code necessary to pass a failing test, and refactoring? What if we told you that some of these cogs contribute more toward fulfilling the promises of TDD than the order of test implementation? (Hint: you should ask for evidence.)

15 Years of (Contradictory) Evidence

Back in 2003, when the software development paradigm started to change irrevocably (for the better?), Kent Beck posed a claim based on anecdotal evidence and paved the way for software engineering researchers:

    No studies have categorically demonstrated the difference between TDD and any of the many

0740-7459/18/$33.00 © 2018 IEEE | July/August 2018 | IEEE Software


    alternatives in quality, productivity, or fun. However, the anecdotal evidence is overwhelming, and the secondary effects are unmistakable.2

Since then, numerous studies—for example, experiments and case studies—have investigated TDD's effectiveness. These studies are periodically synthesized in secondary studies (see Table 1), only to reveal contradictory results across the primary studies. This research has also demonstrated no consistent overall benefit from TDD, particularly for overall productivity and within subgroups for quality.

Table 1. Systematic literature reviews on test-driven development (TDD).

Bissi et al.3
  Overall conclusion for quality with TDD: improvement. For productivity: inconclusive.
  Inconsistent results in the study categories: productivity (academic vs. industrial setting).

Munir et al.4
  Overall conclusion for quality with TDD: improvement or no difference. For productivity: degradation or no difference.
  Inconsistent results: quality (low vs. high rigor; low vs. high relevance); productivity (low vs. high rigor; low vs. high relevance).

Rafique and Mišić5
  Overall conclusion for quality with TDD: improvement. For productivity: inconclusive.
  Inconsistent results: quality (waterfall vs. iterative test-last); productivity (waterfall vs. iterative test-last; academic vs. industrial setting).

Turhan et al.6 and Shull et al.1
  Overall conclusion for quality with TDD: improvement. For productivity: inconclusive.
  Inconsistent results: quality (among controlled experiments; among studies with high rigor); productivity (among pilot studies; controlled experiments vs. industrial case studies; among studies with high rigor).

Kollanus7
  Overall conclusion for quality with TDD: improvement. For productivity: degradation.
  Inconsistent results: quality (among academic studies; among semi-industrial studies).

Siniaalto8
  Overall conclusion for quality with TDD: improvement. For productivity: inconclusive.
  Inconsistent results: productivity (among academic studies; among semi-industrial studies).

Why the inconsistent results? Besides the reasons listed in Table 1, other likely reasons are that

• TDD has too many cogs,
• its effectiveness is highly influenced by the context (for example, the tasks at hand or the skills of individuals),
• the cogs highly interact with each other, and
• most studies have focused on only the test-first aspect.

Identifying the inconsistencies' sources is important for designing further studies that control for those sources.

Matjaž Pančur and Mojca Ciglarič speculated that the results of studies showing TDD's superiority over a test-last approach were due to the fact that most of the experiments employed a coarse-grained test-last process closer to the waterfall

approach as a control group.9 This created a large differential in granularity between the treatments, and sometimes even a complete lack of tests in the control, resulting in unfair, misleading comparisons. In the end, TDD might perform better only when compared to a coarse-grained development process.

Industry Adoption (or Lack Thereof)

Discussions on TDD are common and usually heated. But how common is the use of TDD in practice? Not very—at least, that's what the evidence suggests.

For example, after monitoring the development activity of 416 developers over more than 24,000 hours, researchers reported that the developers followed TDD in only 12 percent of the projects that claimed to use it.10 We've observed similar patterns in our work with professional developers. Indeed, if it were possible to reanalyze all existing evidence considering this facet only, the shape of things might change significantly (for better or worse). We'll be the devil's advocate and ask, what if the anecdotal evidence from TDD enthusiasts is based on misconceived personal experience from non-TDD activities?

Similarly, a recent study analyzed a September 2015 snapshot of all the (Java) projects in GitHub.11 Using heuristics for identifying TDD-like repositories, the researchers found that only 0.8 percent of the projects adhered to TDD protocol. Furthermore, comparing those projects to a control set, the study reported no difference between the two groups in terms of

• the commit velocity as a measure of productivity,
• the number of bug-fixing commits as an indicator of the number of defects, and
• the number of issues reported for the project as a predictor of quality.

Additionally, a comparison of the number of pull requests and the distribution of commits per author didn't indicate any effect on developer collaboration.

Adnan Causevic and his colleagues identified seven factors limiting TDD's use in the industry:12

• increased development time (productivity hits),
• insufficient TDD experience or knowledge,
• insufficient design,
• insufficient developer testing skills,
• insufficient adherence to TDD protocol,
• domain- and tool-specific limitations, and
• legacy code.

It's not surprising that three of these factors are related to the developers' capacity to follow TDD and their rigor in following it.

What Really Makes TDD Tick?

A more refined look into TDD is concerned with not only the order in which production code and test code are written but also the average duration of development cycles, that duration's uniformity, and the refactoring effort. A recent study of 39 professionals reported that a steady rhythm of short development cycles was the primary reason for improved quality and productivity.13 Indeed, the effect of test-first completely diminished when the effects of short and steady cycles were considered. These findings are consistent with earlier research demonstrating that TDD experts had much shorter and less variable cycle lengths than novices did.14 The significance of short development cycles extends beyond TDD; Alistair Cockburn, in explaining the Elephant Carpaccio concept, states that "agile developers apply micro-, even nano-incremental development in their work."15

Another claim of Elephant Carpaccio, related to the TDD concept of working on small tasks, is that agile developers can deliver fast "not because we're so fast we can [develop] 100 times as fast as other people, but rather, we have trained ourselves to ask for end-user-visible functionality 100 times smaller than most other people."15 To test this, we conducted experiments in which we controlled for the framing of task descriptions (finer-grained user stories versus coarser-grained generic descriptions). We observed that the type of task description and the task itself are significant factors affecting software quality in the context of TDD.

In short, working on small, well-defined tasks in short, steady development cycles has a more positive impact on quality and productivity than the order of test implementation.

Deviations from the Test-First Mantra

Even if we consider the studies that focus on only the test-first nature of TDD, there's still the problem of conformance to the TDD process. TDD isn't a dichotomy in which you either religiously write tests first every time or always test after the fact. TDD is a continuous spectrum between these extremes, and developers tend to dynamically span

this spectrum, adjusting the TDD process as needed. In industrial settings, time pressure, lack of discipline, and insufficient realization of TDD's benefits have been reported to cause developers to deviate from the process.12

To gain more insight, in an ethnographically informed study, researchers monitored and documented the TDD development process more closely by means of artifacts including audio recordings and notes.16 They concluded that developers perceived implementation as the most important phase and didn't strictly follow the TDD process. In particular, developers wrote more production code than necessary, often omitted refactoring, and didn't keep test cases up to date in accordance with the progression of the production code. Even when the developers followed the test-first principle, they thought about how the production code (not necessarily the design) should be before they wrote the test for the next feature. In other words, perhaps we should simply name this phenomenon "code-driven testing"?

TDD's internal and external dynamics are more complex than the order in which tests are written. There's no convincing evidence that TDD consistently fares better than any other development method, at least those methods that are iterative. And enough evidence exists to question whether TDD fulfills its promises.

How do you decide whether and when to use TDD, then? And what about TDD's secondary effects? As always, context is the key, and any potential benefit of TDD is likely not due to whatever order of writing tests and code developers follow. It makes sense to have realistic expectations rather than worship or discard TDD. Focus on the rhythm of development; for example, tackle small tasks in short, steady development cycles, rather than bother with the test order. Also, keep in mind that some tasks are better (suited) than others with respect to "TDD-bility."

This doesn't mean you should avoid trying TDD or stop using it. For example, if you think that TDD offers you the self-discipline to write tests for each small functionality, following the test-first principle will certainly prevent you from taking shortcuts that skip tests. In this case, there's value in Beck's suggestion, "Never write a line of functional code without a broken test case."2 However, you should primarily consider those tests' quality (without obsessing over coverage),17 instead of fixating on whether you wrote them before the code. Although TDD does result in more tests,1,6 the lack of attention to testing quality,12 including maintainability and coevolution with production code,16 could be alarming.

As long as you're aware of and comfortable with the potential tradeoff between productivity and testability and quality (perhaps paying off in the long term?), using TDD is fine. If you're simply having fun and feeling good while performing TDD without any significant drawbacks, that's also fine. After all, the evidence shows that happy developers are more productive and produce better code!18

Acknowledgments
Academy of Finland Project 278354 partly supports this research.

References
1. F. Shull et al., "What Do We Know about Test-Driven Development?," IEEE Software, vol. 27, no. 6, 2010, pp. 16–19.
2. K. Beck, Test-Driven Development: By Example, Addison-Wesley, 2003.
3. W. Bissi et al., "The Effects of Test Driven Development on Internal Quality, External Quality and Productivity: A Systematic Review," Information and Software Technology, June 2016, pp. 45–54.
4. H. Munir, M. Moayyed, and K. Petersen, "Considering Rigor and Relevance When Evaluating Test Driven Development: A Systematic Review," Information and Software Technology, vol. 56, no. 4, 2014, pp. 375–394.
5. Y. Rafique and V.B. Mišić, "The Effects of Test-Driven Development on External Quality and Productivity: A Meta-analysis," IEEE Trans. Software Eng., vol. 39, no. 6, 2013, pp. 835–856; http://dx.doi.org/10.1109/TSE.2012.28.
6. B. Turhan et al., "How Effective Is Test-Driven Development?," Making Software: What Really Works, and Why We Believe It, A. Oram and G. Wilson, eds., O'Reilly Media, 2010, pp. 207–219.
7. S. Kollanus, "Test-Driven Development—Still a Promising Approach?," Proc. 7th Int'l Conf. Quality of Information and Communications Technology (QUATIC 10), 2010, pp. 403–408; http://dx.doi.org/10.1109/QUATIC.2010.73.
8. M. Siniaalto, "Test Driven Development: Empirical Body of Evidence," tech. report, Information Technology for European Advancement, 3 Mar. 2006.
9. M. Pančur and M. Ciglarič, "Impact of Test-Driven Development on Productivity, Code and Tests: A Controlled Experiment," Information and Software Technology, vol. 53, no. 6, 2011, pp. 557–573.
10. M. Beller et al., "When, How, and Why Developers (Do Not) Test in Their IDEs," Proc. 10th Joint Meeting Foundations of Software Eng. (ESEC/FSE 15), 2015, pp. 179–190; http://doi.acm.org/10.1145/2786805.2786843.
11. N.C. Borle et al., "Analyzing the Effects of Test Driven Development in GitHub," Empirical Software Eng., Nov. 2017.
12. A. Causevic, D. Sundmark, and S. Punnekkat, "Factors Limiting Industrial Adoption of Test Driven Development: A Systematic Review," Proc. 4th IEEE Int'l Conf. Software Testing, Verification and Validation, 2011, pp. 337–346.
13. D. Fucci et al., "A Dissection of the Test-Driven Development Process: Does It Really Matter to Test-First or to Test-Last?," IEEE Trans. Software Eng., vol. 43, no. 7, 2017, pp. 597–614.
14. M.M. Müller and A. Höfer, "The Effect of Experience on the Test-Driven Development Process," Empirical Software Eng., vol. 12, no. 6, 2007, pp. 593–615; https://doi.org/10.1007/s10664-007-9048-2.
15. A. Cockburn, "Elephant Carpaccio," blog; http://alistair.cockburn.us/Elephant1carpaccio.
16. S. Romano et al., "Findings from a Multi-method Study on Test-Driven Development," Information and Software Technology, Sept. 2017, pp. 64–77.
17. D. Bowes et al., "How Good Are My Tests?," Proc. IEEE/ACM 8th Workshop Emerging Trends in Software Metrics (WETSoM 17), 2017, pp. 9–14.
18. D. Graziotin et al., "What Happens When Software Developers Are (Un)happy," J. Systems and Software, June 2018, pp. 32–47.

ABOUT THE AUTHORS

ITIR KARAC is a project researcher in the M-Group research group and a doctoral student in the Department of Information Processing Science at the University of Oulu. Contact her at itir.karac@oulu.fi.

BURAK TURHAN is a senior lecturer in Brunel University's Department of Computer Science and a professor of software engineering at the University of Oulu. Contact him at turhanb@computer.org.

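Sidebar: The red–green–refactor cycle in miniature. For readers who have never watched the rhythm the article keeps returning to, the following sketch is ours, not the authors': the leap-year task, the function name, and the test are all hypothetical, and in real TDD each stage would be a separate edit-and-run step driven by a test runner such as pytest rather than collapsed into one script.

```python
# A minimal, hypothetical red-green-refactor micro-cycle in Python.

# RED: write a failing test first. At this point in a real cycle,
# is_leap_year would not exist yet, so running the test would fail.
def test_leap_year():
    assert is_leap_year(2000) is True   # divisible by 400
    assert is_leap_year(1900) is False  # divisible by 100 but not 400
    assert is_leap_year(2024) is True   # divisible by 4
    assert is_leap_year(2023) is False  # not divisible by 4

# GREEN: write only the code necessary to make the test pass.
def is_leap_year(year: int) -> bool:
    if year % 400 == 0:
        return True
    if year % 100 == 0:
        return False
    return year % 4 == 0

# REFACTOR: with the test as a safety net, condense the logic
# without changing behavior, then rerun the test. (The intentional
# redefinition stands in for editing the function in place.)
def is_leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

test_leap_year()  # rerun after refactoring; all assertions pass
```

Keeping each such cycle small and steady, rather than obsessing over whether the test came strictly first, is exactly the rhythm the article argues matters most.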
