How To Manage A Risk Analysis Program
SUMMARY
A risk analysis program is one of the most important activities in which a reliability engineer, quality engineer or project designer can take part. The team activities that go into creating a risk analysis are often among the most critical for any company. The primary results are the reduction of direct risk to the company and the improvement of products and processes. The secondary results are often more important. These include improved time to market, enhanced system performance, reduced reliability and warranty costs and the maintenance of a favorable reception of the corporate product line. Most important among the secondary benefits is the extension and enhancement of corporate reputation. Risk analysis can be viewed as the balancing of a number of dynamic, non-linear and critical corporate business factors, all related to corporate success (McLinn 1994, 1996, 1997). Thus, the method of and outcomes from the risk analysis are vital. Three common tools are used for risk analysis: failure mode effects analysis (FMEA), fault tree analysis (FTA) and hazard analysis.
KEY WORDS
failure mode effects analysis, hazard analysis, risk analysis
INTRODUCTION
The three common tools of risk analysis are briefly and incompletely described in a number of reliability books and articles. There are several major problems with these brief descriptions. They might be grouped as technical problems, execution problems and team problems. This paper will show how many of these problems may be avoided or prevented. In addition, it will go beyond the simple technical description to show how the level of effort to complete the FMEA, FTA or hazard analysis may be reduced while the quality of the output is improved.
Harm is defined in the European Standard (EN1441:1998) as physical injury and/or damage to health or property, while hazard is defined as a potential source of harm. Risk is the probable rate of occurrence of a hazard causing harm and the degree of severity of the harm, while risk analysis is the investigation of available information to identify hazards and estimate risks.
Which method is best? This question is often asked, and the answer depends upon what you know, the stage of the project and the main goal. Each method has strengths and weaknesses. Hazard analysis is a short method best applied during the early stages. Since this method has somewhat vague scales for severity and occurrence, it is compatible with the lack of detailed knowledge that is typical of the early design/development stage. It may provide estimates of problems and safety concerns.
The strength of the FTA is that multiple failure causes can be easily described. For example, when a hardware and a software failure both occur, FTA easily covers the possible complex interrelationship, but it gives little help in determining which failure modes are the most significant. No simple priority of importance results from an FTA. One value of FTA is that it can be done at any time, even very early in the design process. One need only have a knowledge of the system functions and relationships. Detailed design information is not required.
Safety issues can be identified but absolute estimates of likelihood do not typically result. FMEA, on the other hand, gives priority of importance but requires detailed design knowledge and cannot easily handle multiple failure causes. Think of hazard analysis as a quick estimate, FTA to cover complex relationships and FMEA to identify priorities. Some companies choose to perform 2 or all 3 of these activities.
The FMEA is a collection of ideas that, most of the time, are not organized in a logical fashion. The FMEA typically starts with the failure mode, moves to the effect on the customer and lastly lists some of the causes. Normal everyday logic runs in the order cause, mode and then effect. Most often, the failure mode can be described as the overt, directly observable mechanism that leads to a system effect. It is often confused with the cause. Treat the mode as the way, the cause as the why and the effect as the what happened.
Purpose
1. ________________________________________
2. ________________________________________
3. ________________________________________
Assumptions
____________________________________________
____________________________________________
____________________________________________
Natural Limits
____________________________________________
____________________________________________
____________________________________________
Figure 1. FocusSheet.
Figure 2.
It is possible to describe 1, 2 or even 3 levels of low system impact. These are called simply aesthetics and typically describe paint blemishes, a dented system chassis or even the loss of back lighting. Sometimes it is said that these require special tools or knowledge to measure. Beyond this level, there are always minor system functions that are not critical to system operation. One, 2 or even 3 levels of these may be described as appropriate. Next in ascending importance is the failure of major system functions. These are the functions that are required for system operation, so there is strong customer impact. One, 2 or even 3 levels of impact can be described. The first might be a slight error in a reading, such as 2%. Next is a large error in a reading, such as 40%, or an intermittent reading (operation). The last level could be no reading or operation. Above these are failures that lead to safety situations or violations of regulatory standards. One to 5 levels might be described, such as minor injury not requiring medical attention, minor injury requiring professional medical attention, reversible injury such as a broken arm, irreversible serious injury and serious hazard (see Figure 3). Now meld these into ten categories.
The occurrence table can be easily created. It represents either some probability of failure or a quantity proportional to a failure rate. It measures the probability of the cause being present. It is not the probability of the effect, as is often mistakenly assumed. There are a variety of ways to create an occurrence table. Consider the following approach for creating a ten-level table.
Figure 3. Severity table.
Determine how many systems, N, will be built over some period of time. Then the lowest probability of failure is 1/N. This often sets a lower limit. At the other extreme, consider the threshold of economic pain for a single component or assembly. Imagine that 20% of the systems would fail during a period of time such as the first five years. With the two extremes defined, the eight remaining categories can be determined in one of two ways. The first (Method A) divides the distance between the upper and lower limits into eight even categories. This approach tends to overemphasize the high probabilities of failure and ignores the large relative differences between low probabilities. A viable alternative (Method B, shown in Figure 4) is to take the ratio of the highest number to the lowest and then take the eighth root. This gives a multiplicative factor for forming each of the categories. Method A has all the probabilities as 2, while Method B has four different numbers for the four different probabilities. Other methods exist for creating simpler scales. These are shown later.
Number   Range                            Std. Dev.
1        Less than 1/1,000,000            4.83
2        1 to 4.6/1,000,000               4.61
3        4.61 to 21/1,000,000             4.26
4        21.1 to 97/1,000,000             3.93
5        97.1 to 447/1,000,000            3.51
6        447.1/1,000,000 to 2/1,000       3.09
7        2.1 to 9.45/1,000                2.60
8        9.46 to 43.5/1,000               2.02
9        43.6/1,000 to 0.20               1.28
10       Greater than 0.20
Figure 4. Occurrence table.
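The two ways of filling in the eight intermediate categories can be sketched in a few lines of Python. This is only an illustration under stated assumptions: N = 1,000,000 systems and an upper extreme of 0.20 are inferred from the ranges in Figure 4, the function names are hypothetical, and the four sample probabilities are made up for the comparison.

```python
def geometric_bounds(n_systems, p_high=0.20, levels=10):
    """Method B: boundaries spaced by a constant multiplicative factor."""
    p_low = 1.0 / n_systems                      # lowest probability, 1/N
    steps = levels - 2                           # eight steps between extremes
    factor = (p_high / p_low) ** (1.0 / steps)   # eighth root of the ratio
    bounds = [p_low * factor ** k for k in range(steps + 1)]
    return bounds, factor


def linear_bounds(n_systems, p_high=0.20, levels=10):
    """Method A: boundaries spaced evenly between the two extremes."""
    p_low = 1.0 / n_systems
    steps = levels - 2
    width = (p_high - p_low) / steps
    return [p_low + width * k for k in range(steps + 1)]


def occurrence_number(p, bounds):
    """Rating 1 lies below the first boundary; each boundary reached adds one."""
    return 1 + sum(1 for b in bounds if p >= b)


N = 1_000_000                        # assumed production volume
bounds_b, factor = geometric_bounds(N)
bounds_a = linear_bounds(N)

samples = [1e-5, 1e-4, 1e-3, 1e-2]   # hypothetical failure probabilities
ratings_a = [occurrence_number(p, bounds_a) for p in samples]
ratings_b = [occurrence_number(p, bounds_b) for p in samples]

print(f"multiplicative factor = {factor:.2f}")
print("Method A ratings:", ratings_a)
print("Method B ratings:", ratings_b)
```

Under these assumptions the factor comes out near 4.6, consistent with the ranges in Figure 4. The even spacing of Method A places all four sample probabilities in category 2, while Method B separates them into categories 3, 5, 6 and 8.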
Number   Description of Verification Effectivity
1.       Probably catch more than 98% of mode during development.
2.       Catch more than 95% of mode during development.
3.       Catch more than 90% of mode during development.
4.       Catch more than 80% of mode during development.
5.       Catch more than 60% of mode during development.
6.       Catch more than 40% of mode during development.
7.       Catch more than 20% of mode during development.
8.       Catch more than 10% of mode during development.
9.       Catch more than 5% of mode during development.
10.      Catch more than 2% of mode during development.
Figure 5. Verification table.
4. Once a discussion starts on an entry, the facilitator should mentally ask, after about a minute, "Can this discussion go anywhere?" If so, let it continue by saying nothing. At about five minutes, if the discussion has not reached a conclusion, take a more active role to bring it to one. Summarize, restate and clarify as necessary to get the team to an agreement or to a recognition that outside data or a different approach is required.
5. If a strong disagreement arises, let it go on briefly. About 30 seconds after it starts, the facilitator should ask whether it can be resolved. If it does not look resolvable, or is only a matter of opinion, the facilitator should stop the team and ask, "Who will take the action to find out or resolve the disagreement?" Resolve the problem outside of the meeting, perhaps among those who disagree.
Now it is time for the FMEA team to get started, so begin at the FocusSheet to get the team moving.
Figure 6.
Number   Description
1        Aesthetic failure or Negligible Impact
2        Minor function failure or Marginal Impact
3        Major function failure or Significant Impact
4        Minor safety issue or Reversible Injury
5        Serious safety/regulatory issue or Serious Hazard
Figure 7. Severity table for hazard.
The fault tree is built of simple AND and OR logic elements. These may be combined through complex logic paths. Low level events such as the two that contribute to the electrical failure AND in Figure 6 may be built from complex combinations. The circle represents a lowest component level failure, while the diamond is an event of low significance. Common mode failure causes may also be easily depicted in an FTA. Thus, the same low level problem that contributes to the electrical failure may also be part of another fault event. All FTA teams can certainly follow the same facilitator recommendations as outlined in the FMEA section. A FocusSheet is appropriate in this case as well. Few terms appear on the fault tree, but occasionally a detailed written table may be created by the team to explain or provide more detail to the FTA.
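The AND/OR structure and the common mode idea can be illustrated with a small Python sketch. It is a minimal example, not a reconstruction of Figure 6: the event names (power_supply_fault, backup_fault, software_hang) and the two fault events are hypothetical, chosen only to show one low-level event feeding two different gates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Basic:
    """Lowest-level component failure (the circle on a fault tree)."""
    name: str


@dataclass(frozen=True)
class Gate:
    """Logic element combining lower-level events."""
    kind: str       # "AND" or "OR"
    inputs: tuple   # child Basic or Gate nodes


def occurs(node, state):
    """Return True if the event at this node occurs for the given basic-event state."""
    if isinstance(node, Basic):
        return state.get(node.name, False)
    results = [occurs(child, state) for child in node.inputs]
    if node.kind == "AND":
        return all(results)
    if node.kind == "OR":
        return any(results)
    raise ValueError(f"unknown gate kind: {node.kind}")


# Hypothetical basic events; power_supply_fault is the common mode cause.
power_supply_fault = Basic("power_supply_fault")
backup_fault = Basic("backup_fault")
software_hang = Basic("software_hang")

# Electrical failure needs both contributors (AND gate).
electrical_failure = Gate("AND", (power_supply_fault, backup_fault))
# A second fault event shares the same low-level cause (OR gate).
display_failure = Gate("OR", (power_supply_fault, software_hang))

state = {"power_supply_fault": True}
print("electrical failure:", occurs(electrical_failure, state))
print("display failure:", occurs(display_failure, state))
```

With only the shared power supply fault present, the OR-gated event occurs while the AND-gated electrical failure does not. The sketch answers whether an event occurs, not how important it is, in line with the point that an FTA yields no simple priority.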
Number   Description       Numerical Probability
1        Very rare event   Less than 1/10,000 probability
2        Rare event        From 1/10,000 to 1/1,000 probability
3        Occasional        From 1/1,000 to 1/100 probability
4        Frequent          From 1/100 to 1/10 probability
5        Very Frequent     Greater than 1/10 probability
Figure 8.
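The five-level scale of Figure 8 can be read as a simple lookup. The Python sketch below is one possible encoding, with an assumption that the table itself does not settle: a probability falling exactly on the 1/1,000 or 1/100 boundary is assigned to the higher level. The function name is hypothetical.

```python
def hazard_occurrence(p):
    """Map a probability to the (number, description) pair of Figure 8."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if p < 1e-4:
        return 1, "Very rare event"   # less than 1/10,000
    if p < 1e-3:
        return 2, "Rare event"        # 1/10,000 to 1/1,000
    if p < 1e-2:
        return 3, "Occasional"        # 1/1,000 to 1/100 (boundary rule assumed)
    if p <= 1e-1:
        return 4, "Frequent"          # 1/100 to 1/10
    return 5, "Very Frequent"         # greater than 1/10


for p in (5e-5, 5e-3, 0.5):
    number, description = hazard_occurrence(p)
    print(f"p = {p:g}: level {number} ({description})")
```

A coarse lookup of this kind suits the early design stage, when only a rough estimate of likelihood is available.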
Figure 9. Hazard table.
CONCLUSIONS
The management of risk analysis programs is one of the most important product development activities that an engineer or manager can perform. Three common tools were described in this paper with recommendations on their use. In addition, methods of easy implementation were noted. Planning for the risk analysis program is the best way to ensure that it will come to successful completion.
REFERENCES
European Standard EN1441:1998.
Ireson, Grant, Coombs, Clyde and Moss, B. 1996. The Handbook of Reliability Engineering and Management, New York, McGraw-Hill.
Mattsson, Fredric, SEMKO. 1995. An Introduction to Risk Analysis for Medical Devices, Compliance Engineering, November/December, pp. 47-57.
McLinn, James. 1994. Improving the Product Development Process, Proceedings of the 48th Annual Quality Congress, pp. 507-516.
McLinn, James. 1996. Reliability Development and Improvement of a Medical Instrument, Proceedings of the Annual Reliability and Maintainability Symposium, pp. 236-242.
McLinn, James. 1997. TQM Metrics for Product Development and Projects, 41st Annual Congress of the European Organization for Quality, Trondheim, Norway, vol. 1, pp. 149-158.
O'Connor, Patrick. 1996. Practical Reliability Engineering, 3rd edition. John Wiley and Sons, New York.
Raheja, Dev. 1990. Assurance Technologies: Principles and Practices, McGraw-Hill, pp. 142-157.
SAE Potential Failure Mode and Effects Analysis. 1993. Published by the Automotive Industry Action Group (AIAG).
Stamatis, D. H. 1995. Failure Mode and Effect Analysis: FMEA from Theory to Execution. ASQ, Milwaukee.
How to Manage a Risk Analysis Program May 24, 1999 James A. McLinn
mean! Don't talk around the topic. The contents are very important.
- Be precise and succinct with words and phrases.
- Don't say too much. Each entry is not a book. Keep the entries focused.
sentence to describe a complete thought.
- Avoid certain words: Wrong, Bad, Too, No, Good and Out of Specification are usually part of vague terms or phrases.
heavy use and light use customers.
- Remember to include the possibility of customer misuse, abuse and rough handling of the system.
- What level of automatic protections and warnings exist to prevent misuse or injury?
The three tables are the yardsticks of the FMEA. If they are well designed, the FMEA contents will be meaningful.
Design FMEA Process FMEA
1 to 3 levels
1 to 3 levels
Std. Dev.
+4.83 +4.61 +4.26 +3.93 +3.51 +3.09 +2.60 +2.02 +1.28
prepared with data at the first team meeting.
- Encourage all people to participate.
- Employ good time management.
Focus Sheet
Purpose
1 - Design ____________________________________
2 - Manufacturing ______________________________
3 - Customer __________________________________
Assumptions
____________________________________________
____________________________________________
Natural Limits
____________________________________________
____________________________________________
identifying the failures of an incompletely defined system.
- May show complex logic and interconnections.
- Can show how software and hardware events interact.
Hazard Analysis
- A simple approach to estimate risk.
- Can be best utilized when few details about a design or process are known.
- Much less work than FMEA or FTA.
- Lower quality output, hazards less precisely known.
- Employs Severity and Occurrence only.
Hazard Table
                         Severity
Occurrence     Serious   Major   Significant   Marginal
Frequent          1        3          7           13
Probable          2        5          9           16
Occasional        4        6         11           18
Remote            8       10         14           19
Improbable       12       15         17           20
Zones: Requires Action, May Require Action, No Activity Required.