
CHAPTER FIVE

Computer Reliability

Prof. Hatem Abd-Elkader


Dr. Sherif M. Tawfik
1
Learning Objectives
1. Introduction
2. What is Software Reliability?
3. Software reliability and Hardware reliability
4. Need for software reliability measurement
5. Increasing reliability
6. Software Metrics for Reliability
7. Two Kinds of Data-related Failure

2
Introduction
• Computer systems are sometimes unreliable
– Erroneous information in databases
– Misinterpretation of database information
– Malfunction of embedded systems
• Effects of computer errors
– Inconvenience
– Bad business decisions
– Injuries or Fatalities

3
What is Software Reliability?
• According to ANSI, “Software Reliability is
defined as the probability of failure-free software
operation for a specified period of time in a
specified environment”.

• The IEEE defines reliability as “The ability of a system or component to perform its required functions under stated conditions for a specified period of time.”
4
Software reliability and
Hardware reliability
• Software reliability: Software reliability is not measured on the basis of time, because software never wears out. There is no problem of rust, as there is with hardware.
• Hardware parts: Electronic and mechanical parts may become old and wear out with time and usage. In hardware reliability, time is used to define reliability: how long the hardware remains working without any defect.
5
Hardware reliability
• In hardware reliability, in the first phase of manufacturing, there may be a high number of faults.
• After faults are discovered and removed, this number decreases, and in the second phase (useful life) only a few faults remain.
• After this phase comes the wear-out phase, in which the physical components wear out due to time and usage, and the number of faults increases again.

Burn-in → Useful life → Wear-out

Phases of hardware when considering reliability


6
Software reliability
• In software reliability, in the first phase, i.e., during integration and testing, there is a high number of faults; after the faults are removed, only a few remain, and this process of fault removal continues at a slower rate.
• Software products do not wear out with time and usage, but they may become outmoded at a later stage.

Integration and testing → Useful life → Obsolete

Phases of software when considering reliability


7
Distinct Characteristics of
Software and Hardware
• Fault: Software faults are mainly design faults, whereas hardware faults are mostly physical.
• Wear-out: This is an important distinction: software remains reliable over time instead of wearing out like hardware. It becomes obsolete (out of fashion) if the environment for which it was developed changes. Hence software may be retired due to environmental changes, new requirements, new expectations, etc.

8
Distinct Characteristics of
Software and Hardware
• Software is not manufactured: Software is developed; it is not manufactured like hardware. It depends on the individual skills and creative abilities of the developers, which are very difficult to specify, even more difficult to quantify, and virtually impossible to standardize.
• Time dependency and life cycle: Software reliability is not a function of operational time, whereas hardware reliability is.
• Environmental factors: Environmental factors do not affect software reliability, but they do affect hardware.
9
Software faults
• Software is said to contain a fault if, for some set of input data, the output is not correct.
• A failure is not the same thing as a “bug” or “fault”; there is an important difference between the two terms.
• Software failure: the departure of the external results of program operation from the requirements. Failure is therefore dynamic: it depends on operation and behavior.
10
Software faults
• In other words: a fault is a defect in a program that arises when a programmer makes an error, and it causes a failure when executed under particular conditions.

11
Need for software reliability
measurement
• In any software industry, system quality plays an important role.
• We know that hardware quality is consistently high, so if system quality changes, it is because of variation in software quality.
• Software quality can be measured in many ways. Reliability is a user-oriented measure of software quality.

12
Need for software reliability
measurement
• As an example, assume that three programs are executing to solve a problem.
• By finding the reliability of each program, we can identify the least reliable one and put more effort into modifying that program to improve the overall reliability of the system.
• So there is always a need to measure reliability.

13
Increasing reliability
• Reliability can be increased by preventing the errors described above and by developing quality software through all stages of the software life cycle. To do this:
– We have to ensure that the requirements clearly specify the functionality of the final product. (Requirements phase)
– Among the phases of software reliability, the second one, i.e., useful life, is the most important, so the software product must be maintained carefully. We have to ensure that the generated code supports maintainability, to avoid introducing additional errors. (Coding phase)
14
Increasing reliability

– Next we have to verify that all the requirements specified in the requirements phase are satisfied. (Testing phase)
• As reliability is an attribute of quality, we can say that reliability depends on software quality.
• So, to build highly reliable software, we need to measure the attributes of quality that apply at each stage of the development cycle.
• Software metrics are used to measure these attributes. The following slides show the different types of metrics that are applied to improve the reliability of a system.
15
Software Metrics for Reliability

• Metrics are used to improve the reliability of the system by examining:
– the requirements (for specification),
– the coding (for errors), and
– the testing (for verification) phases.

16
Requirements Reliability Metrics
• Requirements indicate what features the software must
contain.
• For the requirements document, a clear understanding between client and developer should exist; otherwise it is difficult to write the requirements well.
• The requirements must have a valid structure, to avoid the loss of valuable information.
• The requirements should be thorough and detailed, so that the design phase is easier.
• The requirements should not contain inadequate information.
17
Requirements Reliability Metrics
• The requirements must also communicate easily: there should not be any ambiguous data in the requirements. If there is any ambiguity, it is difficult for the developer to implement the specification.
• Requirements reliability metrics evaluate these quality factors of the requirements document.

18
Design and Code Reliability Metrics
• The quality factors that exist in the design and coding plan are complexity, size, and modularity.
• More complex modules are harder to understand and have a higher probability of errors, so the complexity of the modules should be kept low.
• Size depends on factors such as total lines, comments, executable statements, etc.
• According to the SATC (NASA’s Software Assurance Technology Center), the most effective evaluation is the combination of size and complexity.

19
Design and Code Reliability Metrics
• Reliability decreases if modules combine high complexity with large size, or high complexity with small size. In the latter combination reliability also decreases, because the small size results in terse code that is difficult to alter.
• These metrics are also applicable to object-oriented code, but there, additional metrics are required to evaluate quality.

20
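The size-and-complexity screening described above can be sketched in Python. The module data, the complexity threshold, and the size bounds below are illustrative assumptions, not the SATC’s actual published criteria:

```python
# Hypothetical screening rule based on the slides: reliability suffers
# when high complexity combines with either a large module (hard to
# understand) or a very small one (terse code that is hard to alter).
modules = {
    "parser": {"loc": 1200, "complexity": 35},
    "logger": {"loc": 80, "complexity": 25},
    "utils": {"loc": 150, "complexity": 4},
}

def risky(loc: int, complexity: int) -> bool:
    # Thresholds are illustrative, not SATC's published values.
    return complexity > 15 and (loc > 1000 or loc < 100)

flagged = sorted(name for name, m in modules.items()
                 if risky(m["loc"], m["complexity"]))
print(flagged)  # ['logger', 'parser']
```

Note that `utils` is not flagged: moderate size with low complexity is the combination the slides treat as most reliable.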
Testing Reliability Metrics
• Testing Reliability metrics uses two approaches to
evaluate the reliability.
• First, it ensures that the system is fully equipped with the
functions that are specified in the requirements. Because
of this, the errors due to the lack of functionality
decreases .
• Second approach is nothing but evaluating the code ,
finding the errors and fixing them.

21
Basic Reliability Metrics
• Some reliability metrics that can be used to quantify the reliability of a software product are discussed below.
• MEAN TIME TO FAILURE (MTTF): The first metric to understand is the time that a system is not failed, or is available. Often referred to as “uptime” in the IT industry, the length of time that a system is online between outages or failures can be thought of as the “time to failure” for that system.

22
Basic Reliability Metrics
• For example, if I bring my RAID array online on Monday at noon and the system functions normally until a disk failure on Friday at noon, it was “available” for exactly 96 hours.
• If this happens every week, with repairs lasting from Friday noon until Monday noon, I could average these numbers to arrive at a “mean time to failure,” or MTTF, of 96 hours.
• I would probably also call my system vendor and demand
that they replace this horribly unreliable device.

23
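The averaging described above can be sketched in a few lines of Python; the uptime samples are hypothetical, echoing the 96-hour RAID example:

```python
# A minimal sketch of MTTF as the average of observed uptimes.
# Sample values are hypothetical (four weeks of the RAID example).
uptimes_hours = [96, 96, 96, 96]  # observed time-to-failure each week

mttf_hours = sum(uptimes_hours) / len(uptimes_hours)
print(mttf_hours)  # 96.0
```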
Basic Reliability Metrics
• MEAN TIME BETWEEN FAILURES (MTBF): We can combine the MTTF and MTTR metrics to get the MTBF metric:
• MTBF = MTTF + MTTR
• Thus, an MTBF of 300 hours indicates that once a failure occurs, the next failure is expected to occur only after 300 hours.
• In this case the time measurements are real time, not execution time as in MTTF.

24
Basic Reliability Metrics
• MEAN TIME TO REPAIR (MTTR)
• Once a failure occurs, some time is required to fix the error.
• MTTR measures the average time it takes to track down the errors causing the failure and to fix them.

25
26
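With MTTR defined, the MTBF relation from the earlier slide can be sketched numerically. The figures are hypothetical, taken from the RAID example (96 hours up, Friday noon to Monday noon under repair):

```python
# Hypothetical figures: MTTF = 96 h of uptime, MTTR = 72 h of repair
# (Friday noon to Monday noon).
mttf_hours = 96.0
mttr_hours = 72.0

# MTBF combines time-to-failure and time-to-repair.
mtbf_hours = mttf_hours + mttr_hours
print(mtbf_hours)  # 168.0 -- one full weekly failure cycle
```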
Basic Reliability Metrics
• RATE OF OCCURRENCE OF FAILURE (ROCOF)
• ROCOF is the number of failures occurring per unit time interval: the number of unexpected events over a particular period of operation.
• ROCOF is the frequency with which unexpected behavior is likely to occur.
• A ROCOF of 0.02 means that two failures are likely to occur in every 100 operational time units. ROCOF is also called the failure intensity metric.
27
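The slide’s 0.02 example can be sketched directly as failures per unit of operational time:

```python
# Sketch: ROCOF as failures per unit of operational time,
# using the slide's example of 2 failures in 100 time units.
failures = 2
operational_time_units = 100

rocof = failures / operational_time_units
print(rocof)  # 0.02 -- the failure intensity
```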
Basic Reliability Metrics
• PROBABILITY OF FAILURE ON DEMAND (POFOD)
• POFOD is defined as the probability that the system will fail when a service is requested: the number of system failures for a given number of system inputs.
• A POFOD of 0.1 means that one out of ten service requests may result in failure.
• POFOD is an important measure for safety-critical systems and is appropriate for protection systems, where services are demanded only occasionally.
28
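A minimal sketch of the POFOD example above, as failures per service request:

```python
# Sketch: POFOD as system failures per service request, matching
# the slide's example of one failure in ten requests.
failed_requests = 1
total_requests = 10

pofod = failed_requests / total_requests
print(pofod)  # 0.1
```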
Basic Reliability Metrics
• AVAILABILITY (AVAIL)
• Availability is the probability that the system is available for use at a given time. It takes into account the repair time and the restart time for the system.
• An availability of 0.995 means that in every 1000 time units, the system is likely to be available for 995 of them.
• Availability is the percentage of time that a system is available for use, taking into account planned and unplanned downtime. If a system is down an average of four hours out of every 100 hours of operation, its AVAIL is 96%.
29
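The four-hours-in-a-hundred example above can be sketched as uptime over total time:

```python
# Sketch: availability as uptime divided by total time, using the
# slide's example of 4 hours of downtime per 100 hours of operation.
downtime_hours = 4
total_hours = 100

avail = (total_hours - downtime_hours) / total_hours
print(avail)  # 0.96, i.e., 96%
```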
Two Kinds of Data-related Failure

• A computerized system may fail because wrong data was entered into it.
• A computerized system may fail because people incorrectly interpret the data they retrieve.

30
Disfranchised Voters
• November 2000 general election
• Florida disqualified thousands of voters
• Reason: People identified as felons
• Cause: Incorrect records in voter database
• Consequence: May have affected outcome
of national presidential election

31
False Arrests
• Sheila Jackson was arrested and spent five days in detention after being mistaken for Shirley Jackson.
• Terry Dean Rogan was arrested after someone stole his identity.
– He was arrested five times.

32
Accuracy of NCIC Records
• March 2003: Justice Dept. announces FBI not
responsible for accuracy of National Crime
Information Center (NCIC) information

• Should government take responsibility for data correctness?

33
Dept. of Justice Position
• Impractical for FBI to be responsible for data’s
accuracy
• Much information provided by other law
enforcement and intelligence agencies
• Agents should be able to use discretion
• If Privacy Act strictly followed, much less
information would be in NCIC
• Result: fewer arrests

34
Position of Privacy Advocates
• Number of records is increasing
• More erroneous records → more false arrests
• Accuracy of NCIC records more important
than ever

35
Errors When Data Are Correct
• Assume data correctly fed into
computerized system
• System may still fail if there is an error in
its programming

36
Errors Leading to System
Malfunctions
• Qwest sent incorrect bills to cell phone customers
• Faulty U.S. Department of Agriculture (USDA) beef price reports
• U.S. Postal Service returned mail addressed to the Patent and Trademark Office
• New York City Housing Authority overcharged renters
• About 450 California prison inmates mistakenly
released

37
Errors Leading to System
Failures
• Ambulance dispatch system in London
• Japan’s air traffic control system
• Comair’s Christmas Day shutdown (the 2004 crash of a critical legacy system at Comair was a classic risk-management mistake that cost the airline $20 million and badly damaged its reputation)
• NASDAQ stock exchange shut down
• Insulin pump demo at Black Hat conference
38
Comair Cancelled All Flights on
Christmas Day, 2004

AP Photo/Al Behrman, File

39
Analysis: E-Retailer Posts Wrong
Price, Refuses to Deliver
• Amazon.com in Britain offered the iPAQ pocket PC for £7 instead of £275
• Orders flooded in
• Amazon.com shut down the site and refused to deliver unless customers paid the true price
• Was Amazon.com wrong to refuse to fill the
orders?

40
