Computer Reliability
Introduction
• Computer systems are sometimes unreliable
– Erroneous information in databases
– Misinterpretation of database information
– Malfunction of embedded systems
• Effects of computer errors
– Inconvenience
– Bad business decisions
– Injuries or Fatalities
What is Software Reliability?
• According to ANSI, “Software Reliability is
defined as the probability of failure-free software
operation for a specified period of time in a
specified environment”.
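One common way to turn this definition into a number is the exponential model, which assumes a constant failure rate. The sketch below is only an illustration under that assumption; the function name and the example failure rate are hypothetical and are not part of the ANSI definition.

```python
import math

def reliability(failure_rate_per_hour: float, hours: float) -> float:
    """Probability of failure-free operation for `hours`,
    assuming a constant failure rate (exponential model)."""
    if failure_rate_per_hour < 0 or hours < 0:
        raise ValueError("failure rate and duration must be non-negative")
    return math.exp(-failure_rate_per_hour * hours)

# Hypothetical figures: 0.001 failures per hour over a 100-hour period.
print(round(reliability(0.001, 100), 3))  # about 0.905
```

The "specified period of time" is the `hours` argument; the "specified environment" is implicit in whatever conditions the failure rate was measured under.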
Distinct Characteristics of
Software and Hardware
• Software is not manufactured: Software is developed, not
manufactured like hardware. Its quality depends on the
individual skill and creativity of the developer, which are
difficult to specify, harder to quantify, and virtually
impossible to standardize.
• Time dependency and life cycle: Software reliability is
not a function of operational time, whereas hardware
reliability is.
• Environmental factors: Environmental factors do not affect
software reliability, but they do affect hardware reliability.
Software faults
• Software is said to contain a fault if, for some set
of input data, the output is not correct.
• A failure is not the same thing as a “bug” or
“fault”. There is an important difference between
these terms.
• Software failure: the departure of the
external results of program operation from
requirements. A failure is therefore dynamic.
• It depends on the program's operation and behavior.
• In other words: a fault is a defect in a program
that arises when a programmer makes an error,
and that causes a failure when executed under
particular conditions.
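The distinction between a fault and a failure can be illustrated with a small sketch. The function and the requirement it is checked against are hypothetical: assume the requirement says the average of an empty list is 0.0.

```python
def average(values):
    """Return the arithmetic mean of `values`.

    Fault: the programmer forgot the empty-list case, which the
    (assumed) requirement says should yield 0.0.
    """
    return sum(values) / len(values)

# The fault is present in the code, but this input does not trigger it,
# so no failure is observed.
print(average([2, 4, 6]))  # 4.0

# Under this particular condition the fault causes a failure: the
# program's external behavior departs from the requirement.
try:
    average([])
except ZeroDivisionError:
    print("failure: average([]) raised instead of returning 0.0")
```

The fault is static (it sits in the code whether or not it runs), while the failure is dynamic (it appears only when the faulty code is executed with a triggering input).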
Need for software reliability
measurement
• In the software industry, system quality plays an
important role.
• Hardware quality is consistently high, so if
system quality changes, it is because
of variation in software quality.
• Software quality can be measured in many ways.
Reliability is a user-oriented measure of
software quality.
• As an example, assume that three programs
execute together to solve a problem.
• By measuring the reliability of each program, we can
find the least reliable one and put more effort into
modifying that program to improve
the overall reliability of the system.
• So there is always a need to measure reliability.
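The three-program example can be sketched as follows. The test counts are hypothetical, and reliability is estimated here simply as the fraction of test runs that completed without failure, which is one possible estimator rather than a prescribed one.

```python
# Hypothetical test results per program: (runs attempted, runs that failed).
test_runs = {
    "program_a": (1000, 5),
    "program_b": (1000, 40),
    "program_c": (1000, 12),
}

def estimated_reliability(runs: int, failures: int) -> float:
    """Fraction of runs that completed without failure."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    return (runs - failures) / runs

reliability_by_program = {
    name: estimated_reliability(runs, failures)
    for name, (runs, failures) in test_runs.items()
}

# The least reliable program is the best candidate for extra effort.
weakest = min(reliability_by_program, key=reliability_by_program.get)
print(weakest)  # program_b
```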
Increasing reliability
• Reliability can be increased by preventing the errors
described above and by developing quality software through
all stages of the software life cycle. To do this:
– Ensure that the requirements clearly specify the
functionality of the final product.
(Requirements phase)
– Among the phases of software reliability, the second one,
useful life, is the most important, so the software product
must be maintained carefully. Ensure that the code is written
to be maintainable, so that maintenance does not introduce
additional errors. (Coding phase)
Requirements Reliability Metrics
• Requirements indicate what features the software must
contain.
• The client and developer must share a clear understanding
of the requirements document; otherwise it is
difficult to write the requirements.
• The requirements must have a valid structure to avoid the
loss of valuable information.
• The requirements should be thorough and detailed,
so that the design phase is easier.
• The requirements should not contain inadequate information.
• The requirements should be easy to communicate. They should
not contain any ambiguous statements; an ambiguous
requirement is difficult for the developer
to implement.
• Requirements reliability metrics evaluate the quality
factors described above for the requirements document.
Design and Code Reliability Metrics
• The quality factors in the design and coding phases
are complexity, size, and modularity.
• More complex modules are harder to understand and have
a higher probability of containing errors, so module
complexity should be kept low.
• Size depends on factors such as total lines, comments,
and executable statements.
• According to NASA's Software Assurance Technology Center
(SATC), the most effective evaluation is the
combination of size and complexity.
• Reliability decreases if modules combine high complexity
with large size, or high complexity with small size. The
latter combination also reduces reliability because
small size results in terse code that is difficult to alter.
• These metrics also apply to object-oriented code,
but additional metrics are required to evaluate its
quality.
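A minimal sketch of combining size and complexity to flag risky modules is shown below. The thresholds, the module names, and the choice of complexity score are all hypothetical; the slides give no numeric limits.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
HIGH_COMPLEXITY = 10   # complexity score above which a module is "complex"
LARGE_SIZE = 50        # executable statements at or above which it is "large"

@dataclass
class Module:
    name: str
    complexity: int
    executable_statements: int

def reliability_risk(module: Module) -> str:
    """Classify a module using the size/complexity combination."""
    if module.complexity <= HIGH_COMPLEXITY:
        return "ok"
    if module.executable_statements >= LARGE_SIZE:
        return "high complexity, large size"
    # Small but complex code is terse and difficult to alter.
    return "high complexity, small size"

modules = [
    Module("parser", complexity=25, executable_statements=400),
    Module("checksum", complexity=18, executable_statements=12),
    Module("logger", complexity=3, executable_statements=80),
]

for m in modules:
    print(m.name, "->", reliability_risk(m))
```

Both high-complexity combinations are flagged; only the reported reason differs.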
Testing Reliability Metrics
• Testing reliability metrics use two approaches to
evaluate reliability.
• First, they check that the system provides all the
functions specified in the requirements. This
reduces errors caused by missing functionality.
• Second, they evaluate the code by
finding errors and fixing them.
Basic Reliability Metrics
• Some reliability metrics that can be used to quantify the
reliability of a software product are discussed below.
• MEAN TIME TO FAILURE (MTTF): The first metric
to understand is the time during which a system has not
failed, that is, the time it is available. Often referred to as
“uptime” in the IT industry, the length of time that a system
is online between outages or failures can be thought of as the
“time to failure” for that system.
• For example, if I bring my RAID array online on Monday at
noon and the system functions normally until a disk failure
Friday at noon, it was “available” for exactly 96 hours.
• If this happens every week, with repairs lasting from Friday
noon until Monday noon, I could average these numbers to
reach a “mean time to failure” or “MTTF” of 96 hours.
• I would probably also call my system vendor and demand
that they replace this horribly unreliable device.
• MEAN TIME BETWEEN FAILURES (MTBF): We can
combine the MTTF and mean time to repair (MTTR) metrics to
get the MTBF metric.
• MTBF = MTTF + MTTR
• MEAN TIME TO REPAIR (MTTR): The average time taken to
restore the system to operation after a failure.
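The RAID example can be worked through numerically. In that example the system runs from Monday noon to Friday noon (4 days) and is under repair from Friday noon to Monday noon (3 days). The sketch below assumes three such weeks were observed; the number of weeks and the variable names are illustrative only.

```python
HOURS_PER_DAY = 24
WEEKS_OBSERVED = 3  # hypothetical number of observed weekly cycles

# Up from Monday noon to Friday noon, then repaired until Monday noon.
uptime_hours = [4 * HOURS_PER_DAY] * WEEKS_OBSERVED
repair_hours = [3 * HOURS_PER_DAY] * WEEKS_OBSERVED

def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("at least one observation is required")
    return sum(values) / len(values)

mttf = mean(uptime_hours)   # mean time to failure
mttr = mean(repair_hours)   # mean time to repair
mtbf = mttf + mttr          # MTBF = MTTF + MTTR

print(mttf, mttr, mtbf)  # 96.0 72.0 168.0
```

The MTBF of 168 hours is exactly one week, which matches the weekly failure cycle in the example.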
• RATE OF OCCURRENCE OF FAILURE (ROCOF): The number of
failures occurring per unit of operating time.
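ROCOF is commonly taken to mean the number of failures observed per unit of operating time. A minimal sketch under that assumption, reusing the RAID example of one failure per 96 hours of operation (the function name is hypothetical):

```python
def rocof(failure_count: int, operating_hours: float) -> float:
    """Failures per hour of operation over the observed period."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    if failure_count < 0:
        raise ValueError("failure_count must be non-negative")
    return failure_count / operating_hours

# One disk failure for every 96 hours of operation.
failures_per_hour = rocof(1, 96)
print(failures_per_hour)  # roughly 0.0104 failures per hour
```

With a steady failure pattern like this one, ROCOF is simply the reciprocal of the MTTF.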
Disfranchised Voters
• November 2000 general election
• Florida disqualified thousands of voters
• Reason: People identified as felons
• Cause: Incorrect records in voter database
• Consequence: May have affected outcome
of national presidential election
False Arrests
• Sheila Jackson was arrested and spent five days in detention
after being mistaken for Shirley Jackson
Accuracy of NCIC Records
• March 2003: Justice Dept. announces FBI not
responsible for accuracy of National Crime
Information Center (NCIC) information
Dept. of Justice Position
• Impractical for FBI to be responsible for data’s
accuracy
• Much information provided by other law
enforcement and intelligence agencies
• Agents should be able to use discretion
• If Privacy Act strictly followed, much less
information would be in NCIC
• Result: fewer arrests
Position of Privacy Advocates
• Number of records is increasing
• More erroneous records mean more false
arrests
• Accuracy of NCIC records more important
than ever
Errors When Data Are Correct
• Assume data correctly fed into
computerized system
• System may still fail if there is an error in
its programming
Errors Leading to System
Malfunctions
• Qwest sent incorrect bills to cell phone customers
• Faulty United States Department of Agriculture
(USDA) beef price reports
• U.S. Postal Service returned mail addressed to Patent
and Trademark Office
• New York City Housing Authority overcharged renters
• About 450 California prison inmates mistakenly
released
Errors Leading to System
Failures
• Ambulance dispatch system in London
• Japan’s air traffic control system
• Comair’s Christmas Day shutdown (The 2004 crash
of a critical legacy system at Comair is a classic risk
management mistake that cost the airline $20 million
and badly damaged its reputation)
• NASDAQ stock exchange shut down
• Insulin pump demo at Black Hat conference
Comair Cancelled All Flights on
Christmas Day, 2004
Analysis: E-Retailer Posts Wrong
Price, Refuses to Deliver
• Amazon.com in Britain offered iPad for £7
instead of £275
• Orders flooded in
• Amazon.com shut down site, refused to deliver
unless customers paid true price
• Was Amazon.com wrong to refuse to fill the
orders?