Testing Is Overrated

?
Who is this guy?
I didn’t re-implement Ruby in Erlang, or write a web server in assembly that’s 10x faster than
Apache, or start a successful company. At the end of the day, I’m just a guy who makes web
sites.
Jan Tik: http://flickr.com/photos/jantik/6708183/

Testing is overrated
Luke Francl
So I had to pick something controversial.
Don’t get me wrong, testing is great. I’ll never forget the first time I saved myself from
committing buggy code with my own unit test. And once written, programmatic tests provide
a nice regression framework that helps catch future errors and makes refactoring possible.
But I think it’s overemphasized to the detriment of other defect-detection techniques.

[Slide: word cloud of testing terms: fuzz, story runner, fixtures, green bar, RSpec, object
mother, miniunit, Shoulda, unit tests, stub, Mocha, Watir, mock, random, behaviors, rcov,
Test::Unit, BDD, TDD, Selenium, test-along, test-first, coverage, autotest, test cases]
We as developers hear, read, and write a lot about testing.
Why so much?
I think it’s because it’s something we, as programmers, can control.

We usually can’t hire QA testers. It may be a struggle to institute code review in our company.
We may not have the authority to set up usability tests.
But we can write code! And so we play to our strength -- coding -- and try to code our way
out of buggy software.
All you need is tests
In the worst case, this leads to a mindset that developer testing is all you need, and if we can
only get to 100% code coverage, we’ll be bug free. You’ve got people having Rcov length
contests.
I read a blog entry just last week by a guy who was suggesting the “End of Bugs” due to
behavior driven development and 100% rcov code coverage.
(I didn’t mention his name in my talk, but this was Adam Wiggins from Heroku:
http://adam.blog.heroku.com/past/2008/7/6/the_end_of_bugs/ I didn’t know he’d be at
RubyFringe, but he came up to me later and was like “Hi, I’m Adam. You called me an idiot.”
Sorry Adam! Seriously, he was really nice about it. We had a good talk about testing.)
Extensive research
So I’ve been doing extensive research about the benefits of developer testing...
- Code Complete, 2nd Edition, Steve McConnell
- Facts and Fallacies of Software Engineering, Robert L. Glass
And I’ve come to the conclusion that there are some significant weaknesses of developer
testing.
audreyjm529: http://flickr.com/photos/audreyjm529/678762774/
testing is hard
Testing is hard, and most developers aren’t very good at it.
The reason is that most developers tend to write “clean” tests that verify the normal path of
program execution, instead of “dirty” tests that verify error states or boundary conditions
(which is where most errors lie).
McConnell reports: immature testing organizations write 5 clean tests for every 1 dirty test;
mature ones write 5 dirty tests for every 1 clean test.
Not fewer clean tests -- 25x more dirty tests!
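To make the clean/dirty distinction concrete, here is a minimal sketch in plain Ruby. The parse_quantity method and its 1..100 limit are hypothetical, invented for illustration; they are not from Code Complete.

```ruby
# Hypothetical example: parse an order quantity that must be between 1 and 100.
MAX_QUANTITY = 100

def parse_quantity(input)
  quantity = Integer(input) # raises ArgumentError (TypeError for nil) on bad input
  unless (1..MAX_QUANTITY).cover?(quantity)
    raise ArgumentError, "quantity out of range: #{quantity}"
  end
  quantity
end

# Returns true if the block raises ArgumentError or TypeError.
def rejects?
  yield
  false
rescue ArgumentError, TypeError
  true
end

# The one "clean" test most of us would write: the normal path.
clean = [parse_quantity("5") == 5]

# "Dirty" tests: boundaries and error states.
dirty = [
  parse_quantity("1") == 1,            # lower boundary
  parse_quantity("100") == 100,        # upper boundary
  rejects? { parse_quantity("0") },    # just below the range
  rejects? { parse_quantity("101") },  # just above the range
  rejects? { parse_quantity("-1") },   # negative
  rejects? { parse_quantity("abc") },  # not a number
  rejects? { parse_quantity("") },     # empty string
  rejects? { parse_quantity(nil) }     # missing value
]

puts "clean: #{clean.count(true)}/#{clean.size} passed"
puts "dirty: #{dirty.count(true)}/#{dirty.size} passed"
```

Even for a method this small, the dirty checks outnumber the clean one several times over, which is roughly the shape McConnell describes for a mature testing organization.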
aussiegall: http://flickr.com/photos/aussiegall/2238073479/
total_withholdings = 0
employees.each do |employee|
  if employee.government_retirement_withheld < MAX_GOVT_RETIREMENT
    government_retirement = compute_government_retirement(employee)
  end
  company_retirement = 0
  if employee.wants_retirement && eligible_for_retirement(employee)
    company_retirement = get_retirement(employee)
  end
  gross_pay = compute_gross_pay(employee)
  personal_retirement = 0
  if eligible_for_personal_retirement(employee)
    personal_retirement = personal_retirement_contribution(employee, company_retirement, gross_pay)
  end
  withholding = compute_withholding(employee)
  net_pay = gross_pay - withholding - company_retirement -
            government_retirement - personal_retirement
  pay_employee(employee, net_pay)
  total_withholdings = total_withholdings + withholding
  total_government_retirement = total_government_retirement + government_retirement
  total_retirement = total_retirement + company_retirement
end
save_pay_records(total_withholdings, total_government_retirement, total_retirement)
Let’s take a look at an example (see the handout for a version you can read). This is taken
from CC2e and I have translated it to Ruby.
How many test cases do you think it should take to fully test this code? A simple “clean” test
with all booleans true will give you 100% rcov code coverage.
Structured basis testing: count 1 for the method itself, 1 for the each loop, 1 for each of the
three ifs, and 1 for the && = 6 test cases. McConnell ultimately lists 17 test cases: logic
combinations, boundary conditions, error states... The full list of test cases is in the handout.
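The structured basis count above can be sketched as a few lines of Ruby. This is a crude regex heuristic of my own, not a tool from Code Complete: it ignores case branches and the and/or keywords, and it would miscount keywords inside strings or comments. PAYROLL_SKELETON is just the control structure of the payroll example with the bodies removed.

```ruby
# Structured basis testing, as summarized above: start at 1 for the routine,
# then add 1 for each loop, each if, and each boolean operator.
def minimum_test_cases(source)
  loops    = source.scan(/\.each\b|\bwhile\b|\buntil\b|\bfor\b/).size
  branches = source.scan(/\bif\b|\bunless\b|\belsif\b/).size
  booleans = source.scan(/&&|\|\|/).size
  1 + loops + branches + booleans
end

PAYROLL_SKELETON = <<~RUBY
  employees.each do |employee|
    if employee.government_retirement_withheld < MAX_GOVT_RETIREMENT
    end
    if employee.wants_retirement && eligible_for_retirement(employee)
    end
    if eligible_for_personal_retirement(employee)
    end
  end
RUBY

puts minimum_test_cases(PAYROLL_SKELETON) # prints 6
```

Six is only the floor for exercising every path through the logic; the remaining cases on McConnell's list come from boundaries and error states that no keyword count can find.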
Code Coverage
Dangers of relying on code coverage. It led my boss to write “the red lines are the valuable
ones”. The Rcov documentation is very clear about this -- if you read it. But people boil down
something very complicated (their tests) to this one number (code coverage) and then
compare. That makes no sense.
Test-to-code ratio. Could there possibly be a more useless number? Unless it’s 1:0, it tells
you just about nothing.
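Here is a small sketch of how a single clean test can hit every line and still miss a crash. The net_pay method is a hypothetical, stripped-down cousin of the payroll example, with a made-up 5% rate; it is not McConnell's code.

```ruby
# Hypothetical example: `retirement` is only assigned inside the `if`.
def net_pay(gross_pay, withheld_so_far, max_withheld)
  if withheld_so_far < max_withheld
    retirement = gross_pay * 5 / 100
  end
  gross_pay - retirement
end

# The "clean" test: the condition is true, so every line of net_pay runs.
# A line-coverage tool would report 100% here.
clean_result = net_pay(1000, 0, 5000)

# The "dirty" test: the employee has already hit the maximum.
# It executes no new lines, yet the method blows up, because
# `retirement` is nil whenever the `if` body is skipped.
dirty_result =
  begin
    net_pay(1000, 5000, 5000)
  rescue TypeError => e
    e.class
  end

puts clean_result # prints 950
puts dirty_result # prints TypeError
```

The coverage number is identical before and after the second test is added, which is why the number alone says so little about the tests.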
def test_last_day_items_are_privacy_scoped_for_non_friends
  non_friend = create_user
  story = stories(:learning_no)
  story.published_at = 10.minutes.ago
  story.save!
  story = stories(:aaron_private_story)
  story.published_at = 5.minutes.ago
  story.save!
  items_for_non_friend = accounts(:quentin_and_aaron).last_day_items
  assert_privacy_status(items_for_non_friend, "Public")
end
You can’t test code that’s not there
You can’t test what isn’t in the spec. Requirements errors are the most expensive to fix if they
sneak into production.
Story: Slantwise client wanted monthly billing. We thought “Basecamp”. What they really
wanted: customer punches in how many users they want, for how many months, and is then
billed all at once.
Fortunately they are cheap to fix if caught early. Iterative development.

Tests have bugs
Tests are code, code has bugs. Tests are just as likely to have bugs as the code they’re
testing.
jpctalbot: http://flickr.com/photos/laserstars/640499324/
def test_critical_functionality
  begin
    # ... Bunch of stuff to exercise code ...
    # Commented out by Luke to fix test failure
    # assert "Some important assert", condition
  rescue
    # Don't let anything fail this test!
  end
end
Sweet! 100% test coverage!
So who tests the tests? I don’t think there’s a way to do this automatically. You need to review
them by hand.
Adapted from: http://thedailywtf.com/Comments/AddComment.aspx?ArticleId=5128&ReplyTo=138758&Quote=Y
Developer testing isn’t very good at finding defects
Flowizm: http://flickr.com/photos/flowizm/178152601/
Defect Detection Rates of Selected Techniques
[Bar chart: unit testing, code reviews, code inspections, prototyping, system test; axis from
0% to 100%]
Defect detection rates from Code Complete. Full table is in your handout.
Unit test: 15-50%
Informal code reviews: 20-35%
Formal code inspections: 45-70%
Modeling/prototyping: 35-80%
System test (black box): 25-55%
Note: First of all, unit testing isn’t all that great at finding defects. Formal code inspections
can catch up to 70% of the defects. Note also the strength of prototyping, with up to 80%. I
think this is what makes iterative development such a big win.
[Venn diagram: manual testing, code reviews, unit tests, user testing]
* Set overlap completely fabricated
The interesting thing is that different defect detection techniques tend to find different types
of defects.
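As a back-of-the-envelope illustration of why stacking techniques pays off, here is a rough sketch. It assumes each technique catches defects independently of the others. That is a simplifying assumption of mine, not a claim from Code Complete, and the real overlap between techniques is unknown. The rates are midpoints of the ranges quoted earlier.

```ruby
# Midpoints of the ranges quoted from Code Complete.
RATES = {
  unit_test:              0.325, # 15-50%
  informal_code_review:   0.275, # 20-35%
  formal_code_inspection: 0.575, # 45-70%
  system_test:            0.40   # 25-55%
}

# ASSUMPTION: techniques find defects independently, so a defect escapes
# only if every technique misses it.
def combined_detection_rate(rates)
  1.0 - rates.map { |rate| 1.0 - rate }.reduce(1.0, :*)
end

unit_only = combined_detection_rate([RATES[:unit_test]])
unit_plus_inspection = combined_detection_rate(
  [RATES[:unit_test], RATES[:formal_code_inspection]]
)

puts format("unit tests alone:        %.0f%%", unit_only * 100)
puts format("unit tests + inspection: %.0f%%", unit_plus_inspection * 100)
```

Under that assumption, adding a second technique raises the combined rate well above what either manages alone. If the techniques really do find different kinds of defects, the case for combining them only gets stronger.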
Complements to developer testing
GAV01: http://flickr.com/photos/gavinatkinson/196048031/
Manual testing
And of course, there is manual QA. A good QA person is worth their weight in gold. I once
worked with a guy who was an absolute machine at finding bugs, and he was really good at
explaining how they happened and creating bug reports.
You always end up doing some amount of manual testing. It makes sense to have testers to
do this instead of making programmers do it.
Story: how we do manual testing: QA person responsible for verifying fixes; also does
exploratory, black-box tests.
Stuck in Customs: http://flickr.com/photos/stuckincustoms/858339201/

So if developers aren’t very good at testing, what are they good at? Criticizing other people’s
code.
http://www.osnews.com/story/19266/WTFs_m
Informal “code reviews” can find 20-35% of all defects; formal “code inspections” find
45-70%. The difference between formal and informal code reviews.
code review kitty is not pleased with your code
Sociological aspect to code reviews. Tell story of my first code review.
Reviewee’s ego as well as code is on the line.
http://flickr.com/photos/louse101/454412441/in/set-72157600062650522/
Growing better developers
Aside: can code reviews help us become better developers?
Skeptical of methodology. 10x developers will be successful no matter what methodology
they use.
So, can code reviews help us become better programmers?
- reading code is the best way to learn.
- constructive criticism from better programmers
As a programmer who’s not young enough to know everything any more, I am hopeful.
Usability testing
I have been blown away by the problems we have found using usability testing.
The ultimate unit test failure
You can have 150% code coverage and thousands of unit tests. Not one of them will tell you if
your application sucks.
Jeff Atwood calls usability problems The Ultimate Unit Test Failure.
hans.gerwitz: http://flickr.com/photos/phobia/2308371224/
From Don’t Make Me Think
by Steve Krug
You may think usability testing involves expensive labs with two-way mirrors and cameras
everywhere. But usability testing doesn’t have to be expensive! It’s fun and cheap with
Steve Krug’s techniques.
We use $20 screen recording software and a USB microphone and pay participants about $50.
Don’t put all your eggs in one basket...
I’m not saying don’t write tests. I’m saying, don’t put all your eggs in one basket.
Andrew Dowsett: http://flickr.com/photos/andrew_dowsett/510812719/

...or you’ll end up as roadkill.
Thanks!
(You can yell at me over drinks.)
Jan Tik audreyjm529 aussiegall GAV01
jpctalbot Flowizm Stuck in Customs hans.gerwitz
Andrew Dowsett
Jan Tik: http://flickr.com/photos/jantik/6708183/
jpctalbot: http://flickr.com/photos/laserstars/640499324/
Flowizm: http://flickr.com/photos/flowizm/178152601/
GAV01: http://flickr.com/photos/gavinatkinson/196048031/
Stuck in Customs: http://flickr.com/photos/stuckincustoms/858339201/
hans.gerwitz: http://flickr.com/photos/phobia/2308371224/
Andrew Dowsett: http://flickr.com/photos/andrew_dowsett/510812719/
