REDIRECTIONS
What Do We (Really) Know about Test-Driven Development?
Itir Karac and Burak Turhan
IEEE Software, July/August 2018 | www.computer.org/software | @IEEESoftware
Test-driven development (TDD) is one of the most controversial agile practices in terms of its impact on software quality and programmer productivity. After more than a decade of research, the jury is still out on its effectiveness. TDD promised it all: increased quality and productivity, along with an emerging, clean design supported by the safety net of a growing library of tests. What's more, the recipe sounded surprisingly simple: Don't write code without a failing test.

Here, we revisit the evidence for TDD's promises [1]. But before we go on, just pause and think of an answer to the following core question: What is TDD?

Let us guess: your response is most likely along the lines of, "TDD is a practice in which you write tests before code." This emphasis on its test-first dynamic, strongly implied by the name, is perhaps the root of most, if not all, of the controversy about TDD. Unfortunately, it's a common misconception to use "TDD" and "test-first" interchangeably. Test-first is only one part of TDD. There are many other cogs in the system that potentially make TDD tick.

How about working on small tasks, keeping the red–green–refactor cycles short and steady, writing only the code necessary to pass a failing test, and refactoring? What if we told you that some of these cogs contribute more toward fulfilling the promises of TDD than the order of test implementation? (Hint: you should ask for evidence.)

15 Years of (Contradictory) Evidence
Back in 2003, when the software development paradigm started to change irrevocably (for the better?), Kent Beck made a claim based on anecdotal evidence and paved the way for software engineering researchers:

No studies have categorically demonstrated the difference between TDD and any of the many alternatives in quality, productivity, or fun. However, the anecdotal evidence is overwhelming, and the secondary effects are unmistakable [2].

Since then, numerous studies (for example, experiments and case studies) have investigated TDD's effectiveness. These studies are periodically synthesized in secondary studies (see Table 1), only to reveal contradictory results across the primary studies. This research has also demonstrated no consistent overall benefit from TDD, particularly for overall productivity and within subgroups for quality.

Table 1 (excerpt). Inconsistent productivity results reported in the secondary studies.
Productivity:
• Low vs. high rigor
• Low vs. high relevance
Productivity:
• Waterfall vs. iterative test-last
• Academic vs. industrial
Productivity:
• Among pilot studies
• Controlled experiments vs. industrial case studies
• Among studies with high rigor

Why the inconsistent results? Besides the reasons listed in Table 1, other likely reasons are that
• TDD has too many cogs,
• its effectiveness is highly influenced by the context (for example, the tasks at hand or the skills of individuals),
• the cogs interact heavily with each other, and
• most studies have focused on only the test-first aspect.

Identifying the sources of these inconsistencies is important for designing further studies that control for those sources.

Matjaž Pančur and Mojca Ciglarič speculated that the results of studies showing TDD's superiority over a test-last approach were due to the fact that most of the experiments employed a coarse-grained test-last process closer to the waterfall
approach as a control group [9]. This created a large differential in granularity between the treatments, and sometimes even a complete lack of tests in the control, resulting in unfair, misleading comparisons. In the end, TDD might perform better only when compared to a coarse-grained development process.

Industry Adoption (or Lack Thereof)
Discussions on TDD are common and usually heated. But how common is the use of TDD in practice? Not very, at least according to the evidence.

For example, after monitoring the development activity of 416 developers over more than 24,000 hours, researchers reported that the developers followed TDD in only 12 percent of the projects that claimed to use it [10]. We've observed similar patterns in our work with professional developers. Indeed, if it were possible to reanalyze all existing evidence considering this facet only, the picture might change significantly (for better or worse). We'll play devil's advocate and ask: What if the anecdotal evidence from TDD enthusiasts is based on misconceived personal experience from non-TDD activities?

Similarly, a recent study analyzed a September 2015 snapshot of all the (Java) projects in GitHub [11]. Using heuristics for identifying TDD-like repositories, the researchers found that only 0.8 percent of the projects adhered to the TDD protocol. Furthermore, comparing those projects to a control set, the study reported no difference between the two groups in terms of
• the commit velocity as a measure of productivity,
• the number of bug-fixing commits as an indicator of the number of defects, and
• the number of issues reported for the project as a predictor of quality.

Additionally, a comparison of the number of pull requests and the distribution of commits per author didn't indicate any effect on developer collaboration.

Adnan Causevic and his colleagues identified seven factors limiting TDD's use in the industry [12]:
• increased development time (productivity hits),
• insufficient TDD experience or knowledge,
• insufficient design,
• insufficient developer testing skills,
• insufficient adherence to the TDD protocol,
• domain- and tool-specific limitations, and
• legacy code.

It's not surprising that three of these factors relate to the developers' capacity to follow TDD and their rigor in following it.

What Really Makes TDD Tick?
A more refined look into TDD is concerned not only with the order in which production code and test code are written but also with the average duration of development cycles, that duration's uniformity, and the refactoring effort. A recent study of 39 professionals reported that a steady rhythm of short development cycles was the primary reason for improved quality and productivity [13]. Indeed, the effect of test-first disappeared entirely once the effects of short, steady cycles were taken into account.

These findings are consistent with earlier research demonstrating that TDD experts had much shorter and less variable cycle lengths than novices did [14]. The significance of short development cycles extends beyond TDD; Alistair Cockburn, in explaining the Elephant Carpaccio concept, states that "agile developers apply micro-, even nano-incremental development in their work" [15].

Another claim of Elephant Carpaccio, related to the TDD concept of working on small tasks, is that agile developers can deliver fast "not because we're so fast we can [develop] 100 times as fast as other people, but rather, we have trained ourselves to ask for end-user-visible functionality 100 times smaller than most other people" [15]. To test this, we conducted experiments in which we controlled for the framing of task descriptions (finer-grained user stories versus coarser-grained generic descriptions). We observed that the type of task description and the task itself are significant factors affecting software quality in the context of TDD.

In short, working on small, well-defined tasks in short, steady development cycles has a more positive impact on quality and productivity than the order of test implementation.

Deviations from the Test-First Mantra
Even if we consider the studies that focus only on the test-first nature of TDD, there's still the problem of conformance to the TDD process. TDD isn't a dichotomy in which you either religiously write tests first every time or always test after the fact. TDD is a continuous spectrum between these extremes, and developers tend to dynamically span
this spectrum, adjusting the TDD process as needed. In industrial settings, time pressure, lack of discipline, and insufficient realization of TDD's benefits have been reported to cause developers to deviate from the process [12].

To gain more insight, in an ethnographically informed study, researchers monitored and documented the TDD development process more closely by means of artifacts including audio recordings and notes [16]. They concluded that developers perceived implementation as the most important phase and didn't strictly follow the TDD process. In particular, developers wrote more production code than necessary, often omitted refactoring, and didn't keep test cases up to date with the progression of the production code. Even when the developers followed the test-first principle, they thought about how the production code (not necessarily the design) should look before they wrote the test for the next feature. In other words, perhaps we should simply name this phenomenon "code-driven testing"?

TDD's internal and external dynamics are more complex than the order in which tests are written. There's no convincing evidence that TDD consistently fares better than any other development method, at least those methods that are iterative. And enough evidence exists to question whether TDD fulfills its promises.

How do you decide whether and when to use TDD, then? And what about TDD's secondary effects?

As always, context is key, and any potential benefit of TDD is likely not due to whatever order of writing tests and code developers follow. It makes sense to have realistic expectations rather than worship or discard TDD. Focus on the rhythm of development; for example, tackle small tasks in short, steady development cycles, rather than bother with the test order. Also, keep in mind that some tasks are better suited than others with respect to "TDD-bility."

This doesn't mean you should avoid trying TDD or stop using it. For example, if you think that TDD offers you the self-discipline to write tests for each small functionality, following the test-first principle will certainly prevent you from taking shortcuts that skip tests. In this case, there's value in Beck's suggestion, "Never write a line of functional code without a broken test case" [2]. However, you should primarily consider those tests' quality (without obsessing over coverage) [17], instead of fixating on whether you wrote them before the code. Although TDD does result in more tests [1], [6], the lack of attention to testing quality [12], including maintainability and coevolution with production code [16], could be alarming.

As long as you're aware of and comfortable with the potential tradeoff between productivity on the one hand and testability and quality on the other (perhaps paying off in the long term?), using TDD is fine. If you're simply having fun and feeling good while performing TDD without any significant drawbacks, that's also fine. After all, the evidence shows that happy developers are more productive and produce better code [18]!

Acknowledgments
Academy of Finland Project 278354 partly supports this research.

References
1. F. Shull et al., "What Do We Know about Test-Driven Development?," IEEE Software, vol. 27, no. 6, 2010, pp. 16–19.
2. K. Beck, Test-Driven Development: By Example, Addison-Wesley, 2003.
3. W. Bissi et al., "The Effects of Test Driven Development on Internal Quality, External Quality and Productivity: A Systematic Review," Information and Software Technology, June 2016, pp. 45–54.
4. H. Munir, M. Moayyed, and K. Petersen, "Considering Rigor and Relevance When Evaluating Test Driven Development: A Systematic Review," Information and Software Technology, vol. 56, no. 4, 2014, pp. 375–394.
5. Y. Rafique and V.B. Mišić, "The Effects of Test-Driven Development on External Quality and Productivity: A Meta-analysis," IEEE Trans. Software Eng., vol. 39, no. 6, 2013, pp. 835–856; http://dx.doi.org/10.1109/TSE.2012.28.
6. B. Turhan et al., "How Effective Is Test-Driven Development?," Making Software: What Really Works, and Why We Believe It, A. Oram and G. Wilson, eds., O'Reilly Media, 2010, pp. 207–219.
7. S. Kollanus, "Test-Driven Development—Still a Promising Approach?," Proc. 7th Int'l Conf. Quality of Information and Communications Technology (QUATIC 10), 2010, pp. 403–408; http://dx.doi.org/10.1109/QUATIC.2010.73.
8. M. Siniaalto, "Test Driven Development: Empirical Body of Evidence," tech. report, Information Technology for European Advancement, 3 Mar. 2006.
9. M. Pančur and M. Ciglarič, "Impact of Test-Driven Development on Productivity, Code and Tests: A Controlled Experiment," Information and Software Technology, vol. 53, no. 6, 2011, pp. 557–573.
10. M. Beller et al., "When, How, and Why Developers (Do Not) Test