
Building Software: An Artful Science

Michael Hogarth, MD
Software development is risky
“To err is human, to really foul things up requires a computer”

IBM’s Consulting Group survey:

55% of the software developed cost more than projected

68% took longer to complete than predicted.

88% had to be substantially redesigned.

Standish Group Study of 8,380 software projects (1996):

31% of software projects were canceled before they were completed

53% of those that were completed cost an average of 189% of their original estimates.

Only 42% of completed projects had their original set of proposed features and functions.

Only 9% were completed on time and on budget.


Standish Group report 2006
19% of projects were outright failures
35% could be categorized as successes (better than 1996, but not great)
46% of projects were “challenged” (either had cost overruns or delays, or both)
McDonald’s gets McFried
McDonald’s “Innovate” Project
$500 million spent for nothing....
Objective --
“McDonald's planned to spend $1 billion over five years to tie all its operations into a real-time digital network. Eventually, executives in company headquarters would have been able to see how soda dispensers and frying machines in every store were performing.”
Why was it scrubbed?
“information systems don't scrub toilets and they don't fry potatoes”

Barrett, 2003. http://www.baselinemag.com/c/a/Projects-Supply-Chain/McDonalds-McBusted/


FBI’s “Virtual Case File”
2003 - Virtual Case File - networked system for tracking criminal cases

SAIC spent months writing over 730,000 lines of computer code

Found to have hundreds of software problems during testing

The $170 million project was cancelled -- SAIC reaped more than $100 million

Problems

delayed by over a year. In 2004, the system had only 1/10th of the intended functionality and was thus largely unusable after $170 million had been spent

SAIC delivered what the FBI requested, but the requirements were flawed, poorly planned, and not tied to scheduled deliverables

Now what?

Lockheed Martin given contract for $305 million tied to benchmarks

http://www.washingtonpost.com/wp-dyn/content/article/2006/08/17/AR2006081701485_pf.html
Causes of the VCF Failure
Changing requirements (conceived before 9/11, after 9/11 requirements were
altered significantly)
14 different managers over the project lifetime (2 years)
Poor oversight by the primary ‘owner’ of the project (FBI) - did not oversee
construction closely
Did not pay attention to new, better commercial products -- kept head in the sand
because it “had to be built fast”
Hardware was purchased first, waiting on software (common problem) -- if
software is delayed, hardware is “legacy” quickly

http://www.inf.ed.ac.uk/teaching/courses/seoc2/2004_2005/slides/failures.pdf
Washington State Licensing Dept
1990 - Washington State License Application Mitigation Project

$41.8 million over 5 years to automate the State’s vehicle registration and license renewal process

1993 - after $51 million, the original design and requirements were expected to be obsolete when
finally built

1997 - Washington legislature pulled the plug -- $40 million wasted

Causes

ambitious

lack of early deliverables

development split between in-house and contractor


J Sainsbury IT failure
“to err is human, to really foul up requires a root password.”
anonymous

UK food retailer, J. Sainsbury, invested in an automated supply-chain management system
System did not perform the functions as needed
As a result, merchandise was stuck in company warehouses and not
getting to the stores
Company added 3,000 additional clerks to stock the shelves manually
They killed the project after spending $526 million.....
Other IT nightmares
1999 - $125 million NASA Mars Climate Orbiter lost in space due to a unit conversion error (English vs. metric units)...

Feb 2003 - U.S. Treasury Dept. mailed 50,000 Social Security checks without beneficiary names.
Checks had to be ‘cancelled’ and reissued...

2004-2005 - UK Inland Revenue (the UK's IRS) software errors contributed to a $3.45 billion tax-credit overpayment

May 2005 - Toyota had to install a software fix on 20,000 hybrid Prius vehicles due to problems with
invalid engine warning lights. It is estimated that the automobile industry spends $2-3 billion/year
fixing software problems

Sept 2006 - A U.S. Government student loan service software error made public the personal data of
21,000 borrowers on its web site

2008 - The new Terminal 5 at Heathrow Airport -- a new automated baggage routing system leads to over
20,000 bags being put in temporary storage...
Does it really matter?
Software bugs can kill...

http://www.wired.com/software/coolapps/news/2005/11/69355
When users inadvertently cause disaster

http://www.wired.com/software/coolapps/news/2005/11/69355?currentPage=2
How does this happen?
Many of the runaway projects are ‘overly ambitious’ -- a major issue (senior
management has unrealistic expectations of what can be done)
Most projects failed because of multiple problems/issues, not one.
Most problems/issues were management related.
In spite of obvious signs of the runaway software project (72% of project members
are aware), only 19% of senior management is aware
Risk management, an important part of identifying trouble and managing it, was
NOT done in any fashion in 55% of major runaway projects.
Causes of failure
Project objectives not fully specified -- 51%
Bad planning and estimating -- 48%
Technology is new to the organization -- 45%
Inadequate/no project management methods -- 42%
Insufficient senior staff on the team -- 42%
Poor performance by suppliers of software/hardware (contractors) --
42%

http://members.cox.net/johnsuzuki/softfail.htm
The cost of IT failures
2006 - $1 trillion spent on IT hardware, software,
and services worldwide...
18% of all IT projects will be abandoned before delivery
(18% of $1 trillion = $180 billion?)
53% will be delivered late or have cost overruns
1995 - Standish estimated the U.S. spent $81 billion for
cancelled software projects.....
Conclusions
IT projects are more likely to be unsuccessful than
successful
Only 1 in 5 software projects bring full satisfaction
(succeed)
The larger the project, the more likely the failure

http://www.it-cortex.com/Stat_Failure_Rate.htm#The%20Robbins-Gioia%20Survey%20(2001)
Software as engineering
Software has been viewed more as “art” than engineering
which has led to a lack of structured methods and organization for building software systems

Why is a software development methodology important?


programmers are expensive
many software system failures can be traced to poor software development
requirements gathering is incomplete or not well organized
requirements are not communicated effectively to the software programmers

inadequate testing (because testers don’t understand the requirements)


Software Development Lifecycle
Domain Analysis

Software Analysis

Requirements Analysis

Specification Development

Programming (software coding)

Testing

Deployment

Documentation

Training and Support

Maintenance
Software Facts and Figures
Maintenance consumes 40-80% of software costs during the lifetime of a software system
-- the most important part of the lifecycle
Error correction accounts for 17% of software maintenance costs
Enhancement is responsible for 60% of software maintenance costs -- most of the cost is
adding new capability to old software, NOT ‘fixing’ it.
Relative time spent on phases of the lifecycle
Development -- defining requirements (15%), design (20%), programming (20%),
testing and error removal (40%), documentation (5%)

Maintenance -- defining the change (15%), documentation review (5%), tracing logic
(25%), implementing the change (20%), testing (30%), updating documentation (5%)

RL Glass. Facts and Fallacies of Software Engineering. 2003


Software development models
Waterfall model
specification --> development --> testing --> deployment
Although many still use it, this model is flawed and at the root of
much of the waste in software development today
Evolutionary development -- interleaves activities of
specification, development, and validation (testing)
Evolutionary development
Exploratory Development
work with customers/users to explore their requirements and deliver a
final system. The development starts with the parts of the system that
are understood. New features are added in an evolutionary fashion.
Throw-away prototyping
create a prototype (not formal system), which allows for understanding
of the customer/user’s requirements. Then one builds “the real thing”

Sommerville, Software Engineering, 2004


Spiral Model
Spiral Model - process that goes
through all steps of the software
development lifecycle
repeatedly, with each cycle
ending up with a prototype for
the user to see -- it is just for
getting the requirements “right”,
the prototypes are discarded
after each iteration
Challenges with Evolutionary
Development
The process is not visible to management -- managers often need regular
deliverables to measure progress.
causes a disconnect as managers want “evidence of progress”, yet the
evolutionary process is fast and dynamic making ‘deliverables’ not
cost-effective to produce (they change often)
System can have poor structure
Continual change can create poor system structure
Incorporating changes becomes more and more difficult

Sommerville, Software Engineering, 2004


Agile software development
Refers to a group of software development methods that promote iterative
development, open collaboration, and adaptable processes
Key characteristics
minimize risk by developing software in multiple repetitions (timeboxes); iterations typically last 2-4 weeks
Each iteration passes through a full software development lifecycle --
planning, requirements gathering, design, writing unit tests, then coding until
the unit tests pass, and acceptance testing by end-users (a minimal sketch follows this list)
Emphasizes face-to-face communication over written communication
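A minimal sketch of the “write unit tests, then code until they pass” step, using Python’s built-in unittest module; the dose-range function and its limits are invented for illustration, not part of any real system.

import unittest

# Hypothetical function under development: the tests below are written first,
# then the implementation is refined until they pass.
def dose_is_valid(dose_mg, min_mg=250, max_mg=1000):
    """Return True if a prescribed dose falls within the allowed range."""
    return min_mg <= dose_mg <= max_mg

class TestDoseValidation(unittest.TestCase):
    def test_dose_inside_range_is_accepted(self):
        self.assertTrue(dose_is_valid(500))

    def test_dose_outside_range_is_rejected(self):
        self.assertFalse(dose_is_valid(5000))

if __name__ == "__main__":
    unittest.main()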
Agile software methods
Scrum
Crystal Clear
Extreme Programming
Adaptive Software Development
Feature Driven Development
Test Driven Development
Dynamic Systems Development
Scrum
A type of Agile methodology
Composed of “sprints” that run anywhere from 15-30 days during which the team
creates an increment of potentially shippable software.
The features that go into that ‘sprint’ version come from a “product backlog”, a set
of prioritized high level requirements of work to be done
During a ‘backlog meeting’, the product owner tells the team which items in the
backlog they want completed.
The team decides how much can be completed in the next sprint (a toy sketch follows below)
“Requirements are frozen for a sprint” -- no wandering or scope shifting...

http://en.wikipedia.org/wiki/Scrum_(development)
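A toy sketch of the backlog meeting described above: the product backlog is a priority-ordered list, and the team pulls items from the top until its estimated capacity for the sprint is used up. The items, estimates, and capacity below are invented placeholders.

# Hypothetical product backlog, already ordered by the product owner's priority.
product_backlog = [
    {"item": "clinician login", "estimate_days": 4},
    {"item": "order entry screen", "estimate_days": 7},
    {"item": "pharmacy interface", "estimate_days": 9},
    {"item": "audit reports", "estimate_days": 6},
]

def plan_sprint(backlog, capacity_days):
    """Take items from the top of the backlog until the sprint is full."""
    sprint, used = [], 0
    for item in backlog:
        if used + item["estimate_days"] <= capacity_days:
            sprint.append(item)
            used += item["estimate_days"]
    return sprint

# A 15-day sprint: the team commits to the first two items (11 days of work).
print([i["item"] for i in plan_sprint(product_backlog, capacity_days=15)])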
Scrum and useable software...
A key feature of Scrum is the idea that one creates useable software with each iteration

It forces the team to architect “the real thing” from the start -- not a “prototype” that is only
developed for demonstration purposes

For example, a system would start by using the planned architecture (web-based
application using a Java 2 Enterprise Edition (J2EE) architecture, an Oracle database, etc...)

It helps to uncover many potential problems with the architecture, particularly one that
requires a number of integrated components (drivers that don’t work, connections between
machines, software compatibility with the operating system, digital certificate
compatibility or usability, etc...)

It allows users and management to actually use the software as it is being built....
invaluable!
Scrum team roles
Pigs and Chickens -- think scrambled eggs and bacon -- the chicken is supportive,
but the pig is committed.
Scrum “pigs” are committed to building the software regularly and frequently
Scrum Master -- the one who acts as a project manager and removes impediments to the
team delivering the sprint goal. Not the leader of the team, but a buffer between the team and
any chickens or distracting influences.
Product owner -- the person who has commissioned the project/software. Also known as
the “sponsor” of the project.

Scrum “chickens” are everyone else


Users, stakeholders (customers, vendors), and other managers
Adaptive project management
Scrum general practices
customers become part of the development team (you have to have interested users...)
Scrum is meant to deliver working software after each sprint, and the user should
interact with this software and provide feedback

Transparency in planning and development -- everyone should know who is accountable
for what and by when
Stakeholder meetings to monitor progress

No problems are swept under the carpet -- nobody is penalized for uncovering a
problem

http://en.wikipedia.org/wiki/Scrum_(development)
Typical Scrum Artifacts
Sprint Burn Down Chart
a chart showing the features for that sprint and the daily progress in
completing them (a small sketch follows below)
Product Backlog
a list of the high level requirements (in plain ‘user speak’)
Sprint Backlog
A list of tasks to be completed during the sprint
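A small sketch of how the sprint backlog and the burn down chart relate: each task carries its remaining work, and the total is the number plotted each day. The task names and hour estimates are invented.

# Hypothetical sprint backlog: each task carries its remaining work in hours.
sprint_backlog = [
    {"task": "login screen", "remaining_hours": 8},
    {"task": "password reset", "remaining_hours": 5},
    {"task": "audit logging", "remaining_hours": 13},
]

def remaining_work(backlog):
    """Total hours left -- the value plotted each day on the burn down chart."""
    return sum(task["remaining_hours"] for task in backlog)

print(remaining_work(sprint_backlog))  # 26 hours left at the start of this day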
Agile methods and systems
Agile works well for small to medium sized projects (around 50,000 - 100,000
lines of source code)
Difficult to implement in large, complex system development with hundreds of
developers in multiple teams
Requires each team be given “chunks of work” that they can develop
Integration is key -- need to use standard components and standards for coding,
interconnecting, data modeling so each team does not create their own naming
conventions and interfaces to their components.
Quality assurance
The MOST IMPORTANT ASPECT of software development
Quality Assurance does not start with “testing”
Quality Assurance starts at the requirements gathering stage
“software faults” -- when the software does not perform as the user intended
bugs
requirements are good/accurate, but the programming causes a crash or other
abnormal state that is unexpected
requirements were wrong, programming was correct -- still a bug from the
user’s perspective
Some facts about bugs
Bugs in the form of poor requirements gathering or poor communication
with programmers are by far the most expensive in a software development
effort
Bugs caught at the requirements or design stage are cheap
Bugs caught in the testing phase are expensive to fix
Bugs not caught are VERY EXPENSIVE in many ways
loss of customers/user trust
need to “fix” it quick -- lends itself to yet more problems because
everyone is panicking to get it fixed asap.
Software testing
System Testing
“black box” testing
“white box” testing
Regression Testing
Black box testing
Treats software as a black-box without knowledge of its interior
workings
It focuses simply on testing the functionality according to the
requirements
Tester inputs data, and sees the output from the process
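A minimal black-box style check in Python: the tester only supplies inputs and compares outputs against the requirement, with no knowledge of how the function is implemented. The BMI function here is a hypothetical stand-in for the system under test.

def body_mass_index(weight_kg, height_m):
    """System under test -- treated as a black box by the tester."""
    return weight_kg / (height_m ** 2)

# Requirement: the BMI of a 70 kg, 1.75 m patient should be about 22.9.
result = body_mass_index(70, 1.75)
assert abs(result - 22.9) < 0.1, f"unexpected output: {result}"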
White box testing
Tester has knowledge of the internal data structures and algorithms

Types of white box testing

Code Coverage - The tester creates tests to cause all statements in the program to be executed at
least once

Mutation Testing - the software is modified slightly to emulate typical programmer
mistakes (using the wrong operator or variable name). Meant to test whether the existing
tests detect such changes -- a measure of how thorough the test suite is.

Fault injection - Introduce faults in the system on purpose to test error handling. Makes sure the
error occurs as expected and the system handles the error rather than crashing or causing an
incorrect state or response (a minimal sketch follows below).

Static testing - primarily syntax checking and manual reading of the code to check errors (code
inspections, walkthroughs, code reviews)
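A minimal fault-injection sketch, as referenced in the list above: a hypothetical fetch_lab_results function depends on a transport layer, the test deliberately substitutes a transport that raises an error, and then checks that the caller degrades gracefully instead of crashing.

class NetworkError(Exception):
    pass

def fetch_lab_results(patient_id, transport):
    """Return lab results, or an empty list if the transport layer fails."""
    try:
        return transport(patient_id)
    except NetworkError:
        return []  # graceful degradation rather than a crash

def failing_transport(patient_id):
    # Injected fault: simulate the network layer failing.
    raise NetworkError("connection reset")

# The error path is exercised on purpose and must not propagate upward.
assert fetch_lab_results("patient-42", failing_transport) == []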
Test Plan
Outlines the ways in which tests will be developed, the naming and classification for the various
failed tests (critical, show stopper, minor, etc..)

Outlines the features to be tested, the approach to be used, and suspension criteria (the conditions
under which testing is suspended)

Describes the environment -- the test environment, including hardware, networking, databases,
software, operating system, etc..

Schedule -- lays out a schedule for the testing

Acceptance criteria - an objective quality standard that the software must meet in order to be
considered ready for release (minimum defect count and severity levels, minimum test
coverage, etc...)

Roles and responsibilities -- who does what in the testing process
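One way to picture the sections of a test plan is as a structured record; a hedged sketch follows, with invented placeholder values that simply mirror the items listed above.

# Hypothetical skeleton of a test plan, mirroring the sections above.
test_plan = {
    "defect_classification": ["show stopper", "critical", "minor"],
    "features_to_test": ["login", "medication ordering"],
    "suspension_criteria": "more than 3 show-stopper defects open",
    "environment": {"os": "Linux", "database": "Oracle", "browser": "Firefox"},
    "schedule": {"start": "2009-01-05", "end": "2009-02-13"},
    "acceptance_criteria": {"max_open_critical_defects": 0,
                            "min_statement_coverage": 0.80},
    "roles": {"test lead": "A. Tester", "developers": ["B. Coder"]},
}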


Test cases
A description of a specific ‘test’ or interaction to test a single
behavior or function in the software
Similar to ‘use cases’ as they outline a scenario of interaction --
however, one can have many tests for a single use case
Example -- login is a use case; need a test for successful login,
one for unsuccessful login, one to test the expiration, lockout,
how many tries before lockout, etc..
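A sketch of the example above: one “login” use case fans out into several test cases. The login function, its password, and the lockout threshold are all hypothetical.

# Hypothetical system under test: returns "ok" on success and locks the
# account after 3 failed attempts.
failed_attempts = {}

def login(user, password, valid_password="secret", lockout_after=3):
    if failed_attempts.get(user, 0) >= lockout_after:
        return "locked"
    if password == valid_password:
        return "ok"
    failed_attempts[user] = failed_attempts.get(user, 0) + 1
    return "denied"

# One use case, several test cases:
assert login("alice", "secret") == "ok"       # successful login
assert login("bob", "wrong") == "denied"      # unsuccessful login
assert login("bob", "wrong") == "denied"
assert login("bob", "wrong") == "denied"
assert login("bob", "secret") == "locked"     # lockout after 3 failed tries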
Components of a test case
Name and number for the test case
The requirement(s) or feature(s) the test case is exercising
Preconditions -- what must be set in place for the test to take place
example, to test whether one can register a death certificate, one must have a death
certificate filled out and which has passed validations and has been submitted to the
local registrar...

Steps -- list of steps describing how to perform the test (log in, select patient A,
select medication list, pick Amoxicillin, click ‘submit to pharmacy’, etc..)
Expected results - describe the expected results up front so the tester knows
whether it failed or passed.
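The same components can also be written down as a structured record, which is roughly how test-management tools store them; this sketch reuses the death-certificate example above with invented identifiers.

from dataclasses import dataclass, field

@dataclass
class TestCase:
    name: str
    number: str
    requirement: str
    preconditions: list = field(default_factory=list)
    steps: list = field(default_factory=list)
    expected_result: str = ""

register_death_certificate = TestCase(
    name="Register a completed death certificate",
    number="TC-017",             # hypothetical test case number
    requirement="REQ-REG-004",   # hypothetical requirement ID
    preconditions=["certificate filled out", "validations passed",
                   "submitted to the local registrar"],
    steps=["log in as registrar", "open pending certificates",
           "select the certificate", "click 'register'"],
    expected_result="certificate status changes to 'registered'",
)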
Regression testing
designed to find ‘software regressions’ -- when previously working functionality is
now not working because of changes made in other parts of the system
As software is versioned, this is the most common type of bug or “fault”
The list of ‘regression tests’ grows
a test for the functions in all previous versions

a test for any previously found bugs -- create a test to test that scenario

Manual vs. Automated


mostly done manually, but can be automated -- we have automated 500 tests
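A sketch of the “one test per previously found bug” idea: once a defect is fixed, a small automated test pins the behavior down so a later change cannot silently reintroduce it. The age-calculation bug below is invented for illustration.

from datetime import date

def age_in_years(birth_date, on_date):
    """Age in whole years; an earlier version was off by one before birthdays."""
    years = on_date.year - birth_date.year
    had_birthday = (on_date.month, on_date.day) >= (birth_date.month, birth_date.day)
    return years if had_birthday else years - 1

def test_bug_1234_age_before_birthday():
    # Regression test for a previously reported defect: patients were aged up
    # one day too early. This test is re-run against every new version.
    assert age_in_years(date(1950, 6, 15), date(2008, 6, 14)) == 57

test_bug_1234_age_before_birthday()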
Risk is good.... huh?
There is no worthwhile project that has no risk -- risk is part of the game
Those that run away from risk and focus on what they know never advance the
standard and leave the field open to their competitors
Example: Merrill Lynch ignored online trading at first, allowing other brokerage
firms to create a new market - eTrade, Fidelity, Schwab. Merrill Lynch eventually
entered 10 years later.
Staying still (avoiding risk) means you are moving backwards
Bob Charette’s Risk Escalator -- everyone is on an escalator and it is moving against
you; you have to walk to stay put, run to get ahead. If you stop, you start moving
backwards

DeMarco and Lister. Waltzing with Bears: Managing Risk on Software Projects. 2003.
But don’t be blind to risk
Sometimes those who are big risk takers have a tendency to
emphasize positive thinking by ignoring the consequences of the
risk they are taking
If there are things that could go wrong, don’t be blind to them --
they exist and you need to recognize them.
If you don’t think of it, you could be blind-sided by it

DeMarco and Lister. Waltzing with Bears: Managing Risk on Software Projects. 2003.
Examples of risks
“Risk management often gives you more reality than you want.”
-- Mike Evans, Senior VP, ASC Corporation

BCT.org -- a dependency on externally built and maintained software (caMATCH)

BCT.org -- a need to have a hard “launch” date


eCareNet -- a dependency on complex software only understood by a small group of
“gurus” (Tolven system)
TRANSCEND -- integration of system components that have never been integrated before
(this is common -- first time integration).
TRANSCEND -- clinical input to CRF process has never been done before.
TRANSCEND -- involves multiple sites not under our control, user input will be difficult
to obtain because everyone is busy, training will be difficult because everyone is busy,
there are likely detractors already, and we have no voice in their venue
Managing risks
What is a risk? -- “a possible future event that will lead to an undesirable outcome”

Not all risks are the same

they have different probabilities that they will happen

They have different consequences -- high impact, low impact

Some may or may not have alternative actions to avoid or mitigate the risk if it comes to pass --
“is there a feasible plan B”

“Problem” -- a risk is a problem that is yet to occur; a problem is a risk that has occurred

“Risk transition” -- when a risk becomes a problem, thus it is said the risk ‘materialized’

“Transition indicator” -- things that suggest the risk may transition to a problem. Example -- Russia
masses troops on the Georgian border...
DeMarco and Lister. Waltzing with Bears: Managing Risk on Software Projects. 2003.
Managing risks
Mitigation - steps you take before the transition or after to make corrections (if
possible) or to minimize the impact of the now “problem”.
Steps in risk management
risk discovery
exposure analysis (impact analysis) -- see the sketch below
contingency planning -- creating plan B, plan C, etc.. as options to engage if the risk
materializes
mitigation -- steps taken before transition to make contingency actions possible
transition monitoring -- tracking of managed risks, looking for transitions and
materializations (risk management meetings).
DeMarco and Lister. Waltzing with Bears: Managing Risk on Software Projects. 2003.
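A common way to carry out the exposure analysis step (see the list above) is to multiply each risk’s probability of transition by its cost impact, then rank the risks; the risks and numbers below are invented placeholders.

# Hypothetical risk register: probability of transition x cost if it happens.
risks = [
    {"risk": "key vendor misses delivery", "probability": 0.30, "impact_usd": 200_000},
    {"risk": "requirements change mid-project", "probability": 0.60, "impact_usd": 150_000},
    {"risk": "server hardware becomes legacy", "probability": 0.10, "impact_usd": 80_000},
]

for r in risks:
    r["exposure_usd"] = r["probability"] * r["impact_usd"]

# Rank by exposure so mitigation effort goes to the biggest risks first.
for r in sorted(risks, key=lambda r: r["exposure_usd"], reverse=True):
    print(f"{r['risk']}: expected exposure ${r['exposure_usd']:,.0f}")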
Common software project risks
Schedule flaw - almost always due to neglecting work or minimizing work that is necessary

Scope creep (requirements inflation) or scope shifting (because of market conditions or changes
in business requirements) -- inevitable -- don’t believe you can keep scope ‘frozen’ for very
long

recognize it, create a mitigation strategy, recognize transition, and create a contingency

for example, if requirements need to be added or changed, need to make sure ‘management’
is aware of the consequences and adjustments are made in capacity, expectation, timeline,
budget.

It is not bad to change scope -- it is bad to change scope and believe nothing else needs to
change

DeMarco and Lister. Waltzing with Bears: Managing Risk on Software Projects. 2003.
“Post mortem” evaluations
No project is “100% successful” -- they all have problems, some have less than
others, some have fatal problems.
It is critical to evaluate projects after they are completed to characterize common
risks/problems and establish methods of mitigation before the next project
Capability Maturity Model (CMM)
A measure of the ‘maturity’ of an organization in how they
approach projects
Originally developed as a tool for assessing the ability of
government contractors’ processes to perform a contracted software
project (can they do it?)
Maturity Levels -- 1-5. Level 5 is where a process is optimized by
continuous process improvement
CMM in detail
Level 1 - Ad hoc: -- processes are undocumented and in a state of dynamic change,
everything is ‘ad hoc’
Level 2 - Repeatable: -- some processes are repeatable with possibly consistent results
Level 3 - Defined: -- set of defined and documented standard processes subject to
improvement over time
Level 4 - Managed: -- using process metrics to control the process. Management can
identify ways to adjust and adapt the process
Level 5 - Optimized: -- process improvement objectives are established (post mortem
evaluation...), and process improvements are developed to address common causes of
process variation.
Why medical software is hard...

Courtesy Dr. Andy Coren, Health Information Technology: A Clinician’s View. 2008
Healthcare IT failures
Hard to discover -- nobody airs dirty laundry
West Virginia -- system had to be removed a week after
implementation
Mt Sinai -- 6 weeks after implementation, system is “rolled
back” due to staff complaints
