A Case for Continuous Automated Testing Blended with
Continuous Refactoring, Instead of Test Driven Development
A Research Paper by Robert Holzhauser Regis University College of Professional Studies
MSSE 600 - Object Oriented Software Engineering
8-24-2014

ABSTRACT
Over approximately the last 20 years, spanning 1994 to 2014, two related practice disciplines have emerged in the software engineering literature as being valuable for delivering quality software on schedule and with very low defect rates. These two practices are Test Driven Development (TDD) and refactoring. As these practices mature, even the creators of TDD and refactoring now say that a test-first methodology is optional; what they consider essential are the practices of continuous automated testing and continuous refactoring.
Introduction
Originally created by NASA in the late 1960s (McClure, 1968), with roots going back earlier, Test Driven Development is generally credited as having been revived and popularized by Kent Beck starting in the mid-1990s. In a striking internal study in 2003, IBM found that it was able to reduce its defect rate by 50% with minimal impact on developer productivity (Maximilien, 2003). Early results like this caught the attention of both the academic and business communities. However, most studies have not adequately measured developer compliance with the TDD steps and have not taken into account the complexity of refactoring. In this paper I will show that improvements to code quality previously attributed to TDD may actually have come from the combination of continuous automated testing and refactoring, rather than from the practice of writing tests before code.
Analysis
Test Driven Development is a way of approaching code construction that consists of three primary steps: 1) Write a test for a piece of new code that you are about to write. 2) Write the code, refining as needed until the test from step 1 passes. 3) Refactor, improving the code without changing its functionality.

My personal experience is that practicing Test Driven Development (TDD) seems counterintuitive, but it is sometimes useful in clarifying what I am doing as I code, at least at first. My natural inclination as a programmer is to write some code and then to check whether it is right. TDD flips that idea on its side and says: write the test before you write the code. According to Kent Beck in his book Test Driven Development by Example (Beck, 2003), the two most important rules for a developer to follow when writing code are: first, always write an automated test that fails before writing any code; and second, once that has been done, refactor to remove any duplication. Beck sometimes describes this as "red, green, refactor," referring to the way testing tools such as JUnit display failed tests as red and passed tests as green.

Beck gives a number of reasons for the test-first approach. He makes several references to the developer's stress level and state of mind. His belief is that TDD provides a positive feedback loop that increases confidence and task enjoyment, decreases stress, and keeps the developer in a peak performance zone. The other compelling reason Beck gives for testing first is that devising a test for a piece of code requires a good deal of clarity about how that code will be implemented (Beck, 2003).

There have been several studies comparing TDD, sometimes called Test First Development, with another style of unit testing called Test Last Development, or TLD. In TLD, the developer: 1) Writes a piece of new code.
2) Writes and runs a test for the new code, modifying the code as needed until it passes. 3) Refactors (Munir et al., 2014).

To gain familiarity with TDD for this paper, I undertook a very small project: building an app in Java for creating a sample of random numbers. My experience was that I tended to fall into a rhythm of coding, testing, starting to refactor, and then realizing that I had done things out of sequence. I adjusted the next iteration back to test-first, but by the subsequent iteration I had slipped back into a TLD rhythm. This was probably just a matter of my training and my need to build the habit of TDD; it was much more difficult for me to follow the TDD steps than I had ever expected. However, I noticed that as soon as I had clarity about how I would test a piece of code, I had already envisioned what that code would be, and I therefore wanted to move forward with implementing the code rather than the test. This raised the question for me of whether studies using developers who were not accustomed to the test, code, refactor sequence of TDD might not have evaluated a pure TDD sequence at all.

After using pure TDD, or a hybrid of TDD and TLD, I did feel the increase in confidence and enjoyment that Beck described. As far as it goes, however, this boost in confidence and joy was a nice-to-have, not a must-have aspect that noticeably improved the code. I also believed that my code was solid, and it seemed that I completed it at least a bit faster than I would have without some approach of continuous testing and refactoring. As interesting as my little experiment was, it was only a start: a sample size of one, without much scientific rigor, does not make good science. The Hammond and Umphress 2012 review of the literature on TDD studies confirms my experience.
Alternately, since TDD is not well defined, it is possible that some respondents may be incorrectly claiming to use TDD (Hammond, 2012). This literature review also found that many developers perceive TDD as too difficult, too different, or requiring too much discipline compared to what they normally do. These developers had difficulty adopting a TDD mindset, or they frequently made mistakes in following the TDD protocol. Furthermore, the Hammond review reports that some developers feel that TDD can lead to overall architectural mistakes even when the process is followed rigorously. Hammond and Umphress went on to say that the studies on TDD, while finding some benefit, had found only marginal overall benefit to productivity, to internal code structure, or to external code structure (Hammond, 2012). They ventured that a plausible reason for the variance between developers is that the studies may have had invisible conformance issues (Hammond, 2012).

Seemingly in response to this, Finnish researchers Fucci, Turhan, and Oivo conducted a study similar to previous ones, but this time they introduced a measure of conformance to the TDD methodology (Fucci, 2014). They concluded that there is no significant difference between TDD and TLD (Fucci, 2014). However, there was a high degree of variability in the results obtained by those with a high degree of conformance. The researchers speculated that other factors, possibly skill, accounted for the variance (Fucci, 2014). Reading this study makes me wonder which skills they might be referring to. The sub-group with high conformance must be adequately skilled at Test Driven Development in order to stick to the process, unless there is a problem with how conformance is measured. Let us assume that their method of measurement is acceptable and that their conjecture about skill as the missing covariate is correct; again, we have the question of which skill or skills caused the variance.
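Before turning to refactoring in detail, it helps to make the red, green, refactor cycle concrete. The following is a minimal Java sketch of one such cycle; the Money class and its behavior are my own hypothetical illustration, not an example from Beck's book, and plain runtime checks stand in for a framework like JUnit so the sketch is self-contained.

```java
// Step 1 (red): the checks in main() were written first and failed
// until Money existed. Step 2 (green): the simplest Money that passes.
// Step 3 (refactor): add() and subtract() originally duplicated the
// "new Money(cents + ...)" construction; it was extracted into apply().
class Money {
    private final int cents;

    Money(int cents) { this.cents = cents; }

    Money add(Money other) { return apply(other.cents); }

    Money subtract(Money other) { return apply(-other.cents); }

    private Money apply(int delta) { return new Money(cents + delta); }

    int cents() { return cents; }
}

public class TddCycleDemo {
    public static void main(String[] args) {
        // The tests: in a strict TDD sequence, these exist before the code.
        if (new Money(150).add(new Money(250)).cents() != 400) {
            throw new AssertionError("red: add is wrong");
        }
        if (new Money(400).subtract(new Money(150)).cents() != 250) {
            throw new AssertionError("red: subtract is wrong");
        }
        System.out.println("green: all tests pass");
    }
}
```

The refactoring in step 3 is deliberately tiny; the point is only that the cycle ends by improving structure while the tests keep the behavior pinned down.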
Beck's original work on TDD explicitly devotes around one third of its pages to refactoring and maintains it as a background theme for most of the book (Beck, 2003). Of the steps red, green, and refactor, the third is the most elusive. While sometimes difficult to implement, the first two steps, write a failing test and write code to make it pass, are in my opinion very straightforward conceptually. Refactoring involves one or more maneuvers, known as refactorings, which are specific ways of moving the code toward a more optimal design, and these steps are not necessarily intuitive or obvious. The abundance of literature on refactoring alone, even though it is only a sub-step of both TDD and TLD, supports my suggestion that it is the most complicated piece of TDD. One such book is the classic Refactoring by Martin Fowler. Fowler has the distinction, along with Beck and others, of being instrumental in bringing the Agile Manifesto into being and of belonging to the project team from which many of the practices of eXtreme Programming emerged. Fowler defines refactoring as "a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior" (Fowler, 2002). Refactoring.com, a site maintained by Fowler, has a catalog of over 90 refactoring patterns.

So we start to see a rather different picture. While the three steps of TDD listed at the beginning of this paper seem simple and straightforward, the inclusion of refactoring makes the TDD process far more complicated and complex. Perhaps we could restate TDD as: red, green, and appropriately select the applicable refactoring patterns from a catalog of more than 90. In this light, TDD is suddenly not the simple creature it appears to be at first glance, nor so simple a task as iterating through steps 1, 2, 3 over and over again.
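To make one of these maneuvers concrete, here is a sketch of Extract Method, one of the simpler refactorings in Fowler's catalog, applied to a hypothetical invoice example of my own invention. The observable behavior is unchanged; only the structure improves.

```java
import java.util.List;

// Before the refactoring, summary() computed the total inline.
// After Extract Method, the calculation has a name of its own,
// total(), making it reusable and independently testable, while
// summary() keeps exactly the same observable behavior.
class Invoice {
    private final List<Double> lineAmounts;

    Invoice(List<Double> lineAmounts) { this.lineAmounts = lineAmounts; }

    double total() {
        double sum = 0.0;
        for (double amount : lineAmounts) {
            sum += amount;
        }
        return sum;
    }

    String summary() { return "Invoice total: " + total(); }
}

public class ExtractMethodDemo {
    public static void main(String[] args) {
        Invoice invoice = new Invoice(List.of(10.0, 20.0, 12.5));
        System.out.println(invoice.summary()); // prints "Invoice total: 42.5"
    }
}
```

Even in a case this small, deciding that the inline calculation deserves its own method is a design judgment, which is exactly the kind of decision the catalog cannot make for the developer.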
Even Beck concedes this point in Fowler's book, suggesting that refactoring is not easy to learn and implying that there are implicit meta-patterns for knowing which of the refactoring patterns to apply. Developers come to apply these patterns effectively only with substantial experience in using them (Fowler, 2002). Beck, in his chapter 15 contribution to Fowler's Refactoring, has this to say on the topic: "The list of techniques is only the beginning. It is the gate you must pass through. Without the techniques, you can't manipulate the design of running programs. With them, you still can't, but at least you can start." Beck continues, "Why are all these wonderful techniques really only the beginning? Because you don't yet know when to use them and when not to, when to start and when to stop, when to go and when to wait. It is the rhythm that makes for refactoring, not the individual notes" (Fowler, 2002). With a catalog of over 90 refactoring maneuvers to understand and apply appropriately, there is without question a great deal of refactoring know-how with which to be conversant. Fortunately, we now also have IDEs and other automated tools that support a number of these operations.

Based on the lack of compliance measures in most studies, I do not believe that studies to date have validated the exact sequence of TDD: write a test, write code, and refactor. Rather, what has probably been tested is the effect of combining two components, 1) writing tests for each piece of code and 2) writing the code, without those components necessarily being implemented in that sequence. Measuring the full three steps, including refactoring, will require a very different methodology: assessing or teaching programmers the refactoring patterns and somehow tracking the refactoring that was done or attempted.
To do less than this is a disservice to TDD; we think we have studied an apple when actually we have studied an orange. Admittedly, specifically including refactoring in an experiment is a much more difficult undertaking. On the other hand, doing so would help to validate the exact definition of TDD. Modern IDEs such as NetBeans and Eclipse are able to perform some refactorings. Judging from the refactoring menu in NetBeans 7.4, it is capable of approximately 15 refactorings. Fifteen out of ninety is certainly a good start. A 2008 study reviewed refactoring tools and showed that some can perform as many as 24 refactorings (Huiqing, 2008). So I anticipate continual improvement in the refactoring support available from automated tools, largely because of the many papers and conference proceedings on the topic of automated refactoring. For now, though, we cannot assume that we can leave refactoring entirely up to the IDE or any other specialized refactoring software.

The general consensus among the software engineering community appears to be that Fowler, Beck, and others are right about the importance of refactoring. With IDE support, specialized refactoring tools, and dozens of papers investigating ways to automate it, refactoring is easier than ever to accomplish. Yet, even though tooling can be helpful, developers should refactor with awareness rather than blind faith in the tools. As Abadi et al. found in an attempt to recode a Java servlet into the Model-View-Controller pattern, "...the whole conversion could be described as a series of refactorings"; most of these were inadequately supported by the IDE, and some were not supported at all (Abadi, 2008). As described above, refactoring is inherently a complex activity (Abadi, 2008). Therefore, to refactor well requires knowledge, skill, and experience, which can be augmented, but not replaced, by automation.
There are two times when developers refactor: 1) during the development process as design problems are discovered, and 2) when the software becomes unhealthy. According to Eclipse usage data, the second scenario is extremely rare (Murphy-Hill, E.). Of course, the first scenario is the one advocated by TDD. This kind of refactoring goes beyond fixing bugs and cleaning up code, which can often be done without changing the design; refactoring is inherently about design improvements.

My hypothesis is that while some developers may find some gains from going test-first, the most important thing to be gleaned from TDD is to constantly write automated unit tests for every piece of code and to continually look for opportunities to refactor. Whether the test is built before the code is more a matter of personal preference. The crucial discipline is to write automated tests for nearly every piece of code that you intend to put into production, which means writing tests for every class and every method. As you go along in building your program, stop at each step and look for ways to refactor. To phrase this more succinctly: always test and refactor everything as you go.

Can this discipline be effective if it is only "always test everything"? Most of the experiments that have been done with TDD, for lack of tighter controls around process, may essentially have been measuring the benefits of some variation of the more general practice of unit testing. Recent reviews have questioned how closely earlier studies actually measured TDD. The discipline of testing to verify that code works is something which can be both taught and measured fairly easily. Some of the initial excitement about TDD was due to positive results from studies which, in retrospect, were at least measuring some kind of continuous automated testing, if not strictly TDD. It is likely that this initial excitement was misattributed.
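Applying "always test everything" to the small random-number project described earlier might look like the sketch below. The class name, the seeded constructor, and the draw API are my own assumptions for illustration, not the code actually written for this paper; the checks in main() stand in for a proper JUnit suite.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// A seeded generator of random-number samples. Every public method
// gets a test that runs continuously as the code grows; whether each
// test is written before or after its method is left to preference.
class RandomSample {
    private final Random random;

    RandomSample(long seed) { this.random = new Random(seed); }

    // Returns `size` pseudo-random integers, each in [min, max].
    List<Integer> draw(int size, int min, int max) {
        if (size < 0 || min > max) {
            throw new IllegalArgumentException("bad sample parameters");
        }
        List<Integer> sample = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            sample.add(min + random.nextInt(max - min + 1));
        }
        return sample;
    }
}

public class RandomSampleDemo {
    public static void main(String[] args) {
        List<Integer> sample = new RandomSample(42L).draw(5, 1, 10);
        if (sample.size() != 5) throw new AssertionError("wrong size");
        for (int n : sample) {
            if (n < 1 || n > 10) throw new AssertionError("out of range");
        }
        System.out.println("sample: " + sample);
    }
}
```

Seeding the generator is the design choice that makes this testable at all: with a fixed seed the output is repeatable, so the same tests can be rerun after every refactoring.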
Again, the three crucial practices are: 1) Creating a library of automated regression tests that covers every piece of code of significance, written immediately before, during, or after its development. 2) Running the library of tests after implementing any new code, and not moving forward until all tests pass. 3) Continuously looking for refactoring opportunities, both tool-assisted and manual. Based on recent internet conversations in which Beck, Fowler, and others have engaged, it seems that they would now agree that these three steps are what is crucial to take from TDD (Fowler, 2014). In this discussion, Beck states that he has no problem mixing the TDD and TLD styles and that he has done so on at least one recent project. Some have taken TDD to mean using a lot of mock values in the code, something which can potentially damage code quality (Fowler, 2014). This was a technique Beck had demonstrated at length in his book (Beck, 2003). In the recent conversations, Beck said that not only are mocks unnecessary, but that he rarely uses them (Fowler, 2014). The participants further say that when less experienced developers do TDD, they often don't refactor enough, which leads to sub-optimal designs, and they point out that it is not accurate to compare an inexperienced developer's work and productivity to that of an experienced developer (Fowler, 2014). One postulate they explored was TDD as the gateway to self-testing code, which implies that what the creators of the TDD movement now value about TDD is that it gives developers access to an automated set of regression tests. They also acknowledge that there are types of coding for which TDD isn't the best choice.
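The first two practices above can be sketched as follows: a tiny hand-rolled regression suite, run in full after every change, that refuses to let work proceed past a failing test. The code under test (add, clamp) is hypothetical, and a real project would use JUnit and a build tool; this version merely stands alone.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RegressionSuite {
    // Code under test: two small hypothetical production methods.
    static int add(int a, int b) { return a + b; }

    static int clamp(int value, int lo, int hi) {
        return Math.max(lo, Math.min(hi, value));
    }

    public static void main(String[] args) {
        // Practice 1: a named test for every significant piece of code.
        Map<String, Boolean> tests = new LinkedHashMap<>();
        tests.put("add handles negatives", add(-2, 5) == 3);
        tests.put("clamp leaves in-range values alone", clamp(4, 1, 10) == 4);
        tests.put("clamp pins values to the upper bound", clamp(99, 1, 10) == 10);

        // Practice 2: run the whole library after any change, and do
        // not move forward until every test is green.
        for (Map.Entry<String, Boolean> t : tests.entrySet()) {
            System.out.println((t.getValue() ? "GREEN " : "RED   ") + t.getKey());
            if (!t.getValue()) {
                throw new AssertionError("stop: fix \"" + t.getKey() + "\" first");
            }
        }
    }
}
```

The third practice, continuous refactoring, is the one no runner can enforce; it shows up here only as the ongoing obligation to keep both the production methods and the test list well structured.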
Conclusion
Many studies have been done on TDD, and they have generally shown it to have a positive impact on code quality. However, some of those studies now appear to have been measuring the benefit of pairing each piece of new code with an automated test and of running the library of automated tests. Studies have also shown that some developers have a difficult time adjusting to the test-first sequence of TDD. Furthermore, the same studies have mostly glossed over the refactoring step, leaving the possibility of wide variation in how, and even whether, developers implemented it. Therefore, I theorize that the benefit of TDD can be achieved through the more flexible approach of combining continuous automated regression testing with continuous refactoring.
References
Abadi, A., Ettinger, R., & Feldman, Y. (2008). Reapproaching the Refactoring Rubicon. Nashville, TN: ACM.
Beck, K. (2003). Test Driven Development by Example. Boston, MA: Addison-Wesley.
Beck, K., Fowler, M., & Heinemeier Hansson, D. (2014). Is TDD Dead? A series of conversations between Kent Beck, David Heinemeier Hansson, and Martin Fowler on the topic of Test-Driven Development (TDD) and its impact upon software design. Retrieved July 2014 from http://martinfowler.com/articles/is-tdd-dead/
Dig, D. (2008). Refactoring.info. Retrieved from http://refactoring.info
Fontana, F., & Spinelli, S. (2011). Impact of Refactoring on Quality Code Evaluation. Honolulu, HI: ACM.
Fowler, M. (2000). Refactoring: Improving the Design of Existing Code. Addison Wesley.
Fucci, D., Turhan, B., & Oivo, M. (2014). Conformance Factor in Test-driven Development: Initial Results from an Enhanced Replication. London, United Kingdom: ACM.
Huiqing, L., & Simon, T. (2008). Tool Support for Refactoring Functional Programs. Nashville, TN: ACM.
Jeffries, R., & Melnik, G. (2007). TDD: The Art of Fearless Programming. IEEE Software.
Kerievsky, J. (2004). Refactoring To Patterns Catalog. Retrieved July 2014 from http://www.industriallogic.com/xp/refactoring/catalog.html
Maximilien, E.M., & Williams, L. (2003). Assessing Test-Driven Development at IBM. Proceedings of the 25th International Conference on Software Engineering. IEEE.
McClure, R., Bauer, F.L., Bolliet, L., Helms, H.J., Naur, P., & Randell, B. (1968). Software Engineering. Report on a conference sponsored by the NATO Science Committee, Garmisch, Germany.
Murphy-Hill, E., & Black, A. (year unknown). Why Don't People Use Refactoring Tools? Portland State University. Retrieved July 2014 from http://people.engr.ncsu.edu/ermurph3/papers/wrt07.pdf
Percival, J., & Harrison, N. (2013). Developer Perceptions of Process Desirability: Test Driven Development and Cleanroom Compared. 2013 46th Hawaii International Conference on System Sciences.
Sierra, K., & Bates, B. (2012). Head First Java: A Learner's Guide. 2nd Edition. O'Reilly Media, Inc.
Wnuk, K., Munir, H., Petersen, K., & Moayyed, M. (2014). An Experimental Evaluation of Test Driven Development vs. Test-Last Development with Industry Professionals. London, United Kingdom: ACM.
Umphress, D., & Hammond, S. (2012). Test Driven Development: The State of the Practice. Tuscaloosa, AL: ACM