You are on page 1of 5

SAFETY ENGINEER

Safety engineering is an engineering discipline which assures that engineered systems


provide acceptable levels of safety. It is strongly related to industrial engineering/systems
engineering, and the subset system safety engineering. Safety engineering assures that a life-
critical system behaves as needed, even when components fail.
The scope of a safety engineer is to perform their professional functions. Safety
engineering professionals must have education, training and experience in a common body of
knowledge. They need to have a fundamental knowledge
of physics, chemistry, biology, physiology, statistics, mathematics, computer
science, engineering mechanics, industrial processes, business, communication and psychology.
Professional safety studies include industrial hygiene and toxicology, design of engineering
hazard controls, fire protection, ergonomics, system and process safety, system safety, safety
and health program management, accident investigation and analysis, product
safety, construction safety, education and training methods, measurement of safety
performance, human behavior, environmental safety and health, and safety, health
and environmental laws, regulations and standards. Many safety engineers have backgrounds
or advanced study in other disciplines, such as management and business administration,
engineering, system engineering / industrial engineering, requirements engineering, reliability
engineering, maintenance, human factors, operations, education, physical and social
sciences and other fields. Others have advanced study in safety. This extends their expertise
beyond the basics of the safety engineering profession.
Functions of a Safety Engineer
The major areas relating to the protection of people, property and the environment are:

Anticipate, identify and evaluate hazardous conditions and practices.


Develop hazard control designs, methods, procedures and programs.
Implement, administer and advise others on hazard control programs.
Measure, audit and evaluate the effectiveness of hazard control programs.
Draft a future safety plan and statement based on real time experiences and facts

Personality and Role


Oddly enough, personality issues can be paramount in a safety engineer. They must be
personally pleasant, intelligent, and ruthless with themselves and their organization. In
particular, they have to be able to "sell" the failures that they discover, as well as the attendant
expense and time needed to correct them. They can be the messengers of bad news.
Safety engineers have to be ruthless about getting facts from other engineers. It is common
for a safety engineer to consider software, chemical, electrical, mechanical, procedural, and
training problems in the same day. Often the facts can be very uncomfortable as many safety
related issues point towards mediocre management systems or worse, questionable business
ethics
Teamwork
It's difficult and expensive to retrofit safety into an unsafe system. To prevent safety
problems, an organization should therefore treat safety as an early design and "architecture"
activity, using the principles of Inherent safety, rather than as a "paperwork requirement" to be
cleaned up after the real design is done.
Safety engineers also must work in a team that includes other engineering specialties,
quality assurance, quality improvement, regulatory compliance specialists, educators and
lawyers.
Subpar safety and quality problems often indicate underlying deficiencies in an
organization's goals, recruitment, succession, training, management systems and business
culture.
Safety often works well in a true matrix-management organization, in which safety is a
managed discipline integrated into a project plan
Analysis Techniques
Analysis techniques can be split into two categories: qualitative and quantitative
methods. Both approaches share the goal of finding causal dependencies between a hazard on
system level and failures of individual components. Qualitative approaches focus on the
question "What must go wrong, such that a system hazard may occur?", while quantitative
methods aim at providing estimations about probabilities, rates and/or severity of
consequences.
The complexity of the technical systems such as Improvements of Design and Materials,
Planned Inspections, Fool-proof design, and Backup Redundancy decreases risk and increases
the cost. The risk can be decreased to ALARA (as low as reasonably achievable) or ALAPA (as
low as practically achievable) levels.
Traditionally, safety analysis techniques rely solely on skill and expertise of the safety
engineer. In the last decade model-based approaches have become prominent. In contrast to
traditional methods, model-based techniques try to derive relationships between causes and
consequences from some sort of model of the system.
Traditional methods for safety analysis
The two most common fault modeling techniques are called failure mode and
effects analysis and fault tree analysis. These techniques are just ways of finding
problems and of making plans to cope with failures, as in probabilistic risk assessment.
One of the earliest complete studies using this technique on a commercial nuclear plant
was the WASH-1400 study, also known as the Reactor Safety Study or the Rasmussen
Report.
Failure Modes and Effects Analysis
Failure Mode and Effects Analysis (FMEA) is a bottom-up, inductive analytical
method which may be performed at either the functional or piece-part level. For
functional FMEA, failure modes are identified for each function in a system or
equipment item, usually with the help of a functional block diagram. For piece-part
FMEA, failure modes are identified for each piece-part component (such as a valve,
connector, resistor, or diode). The effects of the failure mode are described, and
assigned a probability based on the failure rate and failure mode ratio of the function or
component. This quantization is difficult for software ---a bug exists or not, and the
failure models used for hardware components do not apply. Temperature and age and
manufacturing variability affect a resistor; they do not affect software.
Failure modes with identical effects can be combined and summarized in a
Failure Mode Effects Summary. When combined with criticality analysis, FMEA is known
as Failure Mode, Effects, and Criticality Analysis or FMECA, pronounced "fuh-MEE-kuh".
Fault Tree Analysis
Fault tree analysis (FTA) is a top-down, deductive analytical method. In FTA,
initiating primary events such as component failures, human errors, and external events
are traced through Boolean logic gates to an undesired top event such as an aircraft
crash or nuclear reactor core melt. The intent is to identify ways to make top events less
probable, and verify that safety goals have been achieved.
Fault trees are a logical inverse of success trees, and may be obtained by
applying de Morgan's theorem to success trees (which are directly related to reliability
block diagrams).
FTA may be qualitative or quantitative. When failure and event probabilities are
unknown, qualitative fault trees may be analyzed for minimal cut sets. For example, if
any minimal cut set contains a single base event, then the top event may be caused by a
single failure. Quantitative FTA is used to compute top event probability, and usually
requires computer software such as CAFTA from the Electric Power Research
Institute or SAPHIRE from the Idaho National Laboratory.
Some industries use both fault trees and event trees. An event tree starts from
an undesired initiator (loss of critical supply, component failure etc.) and follows
possible further system events through to a series of final consequences. As each new
event is considered, a new node on the tree is added with a split of probabilities of
taking either branch. The probabilities of a range of "top events" arising from the initial
event can then be seen.
Safety Certification
Typically, safety guidelines prescribe a set of steps, deliverable documents, and exit
criterion focused around planning, analysis and design, implementation, verification and
validation, configuration management, and quality assurance activities for the development of
a safety-critical system. In addition, they typically formulate expectations regarding the creation
and use of traceability in the project. For example, depending upon the criticality level of a
requirement, the US Federal Aviation Authority guideline DO-
178B/C requires traceability from requirements to design, and from requirements to source
code and executable object code for software components of a system. Thereby, higher quality
traceability information can simplify the certification process and help to establish trust in the
maturity of the applied development process.
Usually a failure in safety-certified systems is acceptable if, on average, less than one life
per 109 hours of continuous operation is lost to failure.{as per FAA document AC 25.1309-1A}
Most Western nuclear reactors, medical equipment, and commercial aircraft are certified to
this level. The cost versus loss of lives has been considered appropriate at this level (by FAA for
aircraft systems under Federal Aviation Regulations).
Preventing Failure
Once a failure mode is identified, it can usually be mitigated by adding extra or
redundant equipment to the system. For example, nuclear reactors contain
dangerous radiation, and nuclear reactions can cause so much heat that no substance might
contain them. Therefore, reactors have emergency core cooling systems to keep the
temperature down, shielding to contain the radiation, and engineered barriers (usually several,
nested, surmounted by a containment building) to prevent accidental leakage. Safety-critical
systems are commonly required to permit no single event or component failure to result in a
catastrophic failure mode.
Most biological organisms have a certain amount of redundancy: multiple organs,
multiple limbs, etc.
For any given failure, a fail-over or redundancy can almost always be designed and incorporated
into a system.
There are two categories of techniques to reduce the probability of failure: Fault
avoidance techniques increase the reliability of individual items (increased design margin, de-
rating, etc.). Fault tolerance techniques increase the reliability of the system as a whole
(redundancies, barriers, etc.).
Safety and Reliability
Safety engineering and reliability engineering have much in common, but safety is not
reliability. If a medical device fails, it should fail safely; other alternatives will be available to the
surgeon. If the engine on a single-engine aircraft fails, there is no backup. Electrical power grids
are designed for both safety and reliability; telephone systems are designed for reliability,
which becomes a safety issue when emergency (e.g. US "911") calls are placed.
Probabilistic risk assessment has created a close relationship between safety and
reliability. Component reliability, generally defined in terms of component failure rate, and
external event probability are both used in quantitative safety assessment methods such as
FTA. Related probabilistic methods are used to determine system Mean Time Between Failure
(MTBF), system availability, or probability of mission success or failure. Reliability analysis has a
broader scope than safety analysis, in that non-critical failures are considered. On the other
hand, higher failure rates are considered acceptable for non-critical systems.
Safety generally cannot be achieved through component reliability alone. Catastrophic
failure probabilities of 109 per hour correspond to the failure rates of very simple components
such as resistors or capacitors. A complex system containing hundreds or thousands of
components might be able to achieve a MTBF of 10,000 to 100,000 hours, meaning it would fail
at 104 or 105 per hour. If a system failure is catastrophic, usually the only practical way to
achieve 109 per hour failure rate is through redundancy.
When adding equipment is impractical (usually because of expense), then the least
expensive form of design is often "inherently fail-safe". That is, change the system design so its
failure modes are not catastrophic. Inherent fail-safes are common in medical equipment,
traffic and railway signals, communications equipment, and safety equipment.
The typical approach is to arrange the system so that ordinary single failures cause the
mechanism to shut down in a safe way (for nuclear power plants, this is termed a passively
safe design, although more than ordinary failures are covered). Alternately, if the system
contains a hazard source such as a battery or rotor, then it may be possible to remove the
hazard from the system so that its failure modes cannot be catastrophic. The U.S. Department
of Defense Standard Practice for System Safety (MILSTD882) places the highest priority on
elimination of hazards through design selection.
One of the most common fail-safe systems is the overflow tube in baths and kitchen
sinks. If the valve sticks open, rather than causing an overflow and damage, the tank spills into
an overflow. Another common example is that in an elevator the cable supporting the car
keeps spring-loaded brakes open. If the cable breaks, the brakes grab rails, and the elevator
cabin does not fall.
Some systems can never be made fail safe, as continuous availability is needed. For
example, loss of engine thrust in flight is dangerous. Redundancy, fault tolerance, or recovery
procedures are used for these situations (e.g. multiple independent controlled and fuel fed
engines). This also makes the system less sensitive for the reliability prediction errors or quality
induced uncertainty for the separate items. On the other hand, failure detection & correction
and avoidance of common cause failures becomes here increasingly important to ensure
system level reliability.

You might also like