As mission critical designers and engineers, we are often asked to explain the concept of
redundancy. After all, a system’s redundancy can have a major effect on availability, reliability,
maintainability, and total cost of ownership. Often, one of the first design decisions made relates to
redundancy. In recent years the waters have been muddied by terms like “distributed
redundancy”, “3 to make 2”, and “catcher systems”. In this article we will explain the different
levels of redundancy and how they affect other system characteristics.
Before we get started, let’s define two terms that will be used throughout this article:
UPS Module or UPM – a UPM can be standalone (figure a) or paralleled with other UPMs (figure b)
for capacity or redundancy. When paralleled, it is connected to a common output bus and shares
controls with the other parallel modules.
UPS System or UPS – a UPS is comprised of either a single UPM (figure a) or multiple, parallel
UPMs (figure b). Figure (c) shows two UPS systems, each comprised of a single UPM. A UPS system
does not share controls or output buses with other UPS systems.
Notice in figure (b), the UPMs are designated as A1 and A2, as if to say they are modules 1 and 2
of system ‘A’. Similarly, in figure (c), the two different UPS systems are designated as ‘A’ and ‘B’.
While these designations are not universal, they are very common and will be used for the
purposes of this article.
Non-Redundant, N+0
UPS designs have evolved as the importance of data centers has grown. In the beginning, as it
relates to UPS power, the concern was to protect the critical load from power outages, sags and
swells, and other power anomalies. A single UPS system bridged the gap between the moment
utility power failed and the moment a generator started and powered the UPS and its critical load.
This was a non-redundant UPS design. It met the basic requirement of protecting the load from
utility issues and little more. If the UPS failed, the load was dropped.
Relative to other levels of redundancy, this is designated as N+0, where N represents the system
capacity needed to support the load and +0 indicates that nothing is held in reserve. Figure (d)
below shows a basic, non-redundant UPS system.
The opposite can be said about systems that are underutilized. If each system were carrying only
25% of its redundant capacity, then a system could afford to have two modules fail and still be
redundant. This is a common occurrence in new systems, where IT loads are added over time,
and it is why we often include modular expansion capabilities in our designs.
This concept of utilization is very important because it has a direct impact on CAPEX and OPEX for
two reasons:
1. The higher the module utilization, the more efficiently the modules will operate.
2. The most efficient use of capital is a design that has the highest module and system
utilizations.
Let’s see how this applies to the redundancy examples we introduced so far. Table 1 shows that
higher levels of redundancy have lower utilizations for the four examples we considered so far.
That’s because more module capacity is waiting in reserve in case of failure. A non-redundant
system (N+0) provides only enough capacity to match the load with nothing in reserve and
therefore has the highest utilization, but at the risk of dropping the load. The 2N+1 system
provides the most redundancy but the lowest module utilization. This added protection comes at a
higher cost of installation and operation. Remember, the 2N+1 system requires 6MW of total
capacity to provide 2MW of redundant capacity.
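The relationship between redundancy level and utilization can be sketched with a few lines of Python. This is only an illustration: it assumes a 2MW critical load and 1MW modules (sizes chosen so that the 2N+1 case matches the 6MW figure above), and it assumes the four examples are N+0, N+1, N+N and 2N+1, so the numbers may not match Table 1 exactly.

```python
# Utilization = critical load / total installed UPS capacity.
# Assumptions (not from the article): 2 MW load, 1 MW modules, and N = 2
# modules needed to carry the load.

LOAD_MW = 2.0
MODULE_MW = 1.0

# Installed module count for each redundancy example.
installed_modules = {
    "N+0": 2,   # just enough capacity, nothing in reserve
    "N+1": 3,   # one spare module on a single system
    "N+N": 4,   # two systems of N modules each
    "2N+1": 6,  # two systems of N+1 modules each (6 MW total)
}


def utilization(load_mw, modules, module_mw=MODULE_MW):
    """Fraction of installed capacity that the load actually uses."""
    return load_mw / (modules * module_mw)


for name, modules in installed_modules.items():
    installed_mw = modules * MODULE_MW
    print(f"{name:5s} {installed_mw:.0f} MW installed, "
          f"{utilization(LOAD_MW, modules):.0%} utilization")
```

Under these assumptions, utilization falls steadily as reserve capacity is added, which is the trend described above.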
Distributed Redundancy
As the data center industry matured, designers and operators looked for more efficient ways to
deliver highly redundant systems. The N+N system design has been the standard configuration
since dual-cord power supplies were adopted. The design is simple, reliable and easy to manage.
The downside is low utilization and stranded capacity. Since many data center builds never reach
their full load potential, real utilizations can be closer to 30%.
To address this issue and lower the total cost of ownership, distributed redundancy has gained
favor in the industry. The concept is not new, but it differs from the classical N+N designs. In fact,
our first distributed redundant design was installed and commissioned in 2007. The advantages of
distributed redundancy include higher capacity utilization with system level redundancy.
Maintaining system level redundancy means there are no single points of failure, unlike in
non-redundant or N+1 systems.
Figure (h) shows the normal configuration of a 3 to make 2, distributed redundant system. Three
systems are installed to deliver the capacity of two. Any one of the three systems can fail, and load
will seamlessly transfer to the other two systems as shown in figure (i). With this arrangement, each
system can be loaded to 2/3 capacity so that any two systems will not exceed 100% of their
capacity in a failure scenario. For distributed redundancy to work, downstream loads must be
properly managed. If they are not, a failure of one UPS system may result in an overload
of one of the two remaining systems. Successful load management requires proper design,
monitoring and discipline to maintain distributed redundancy. This is slightly more complicated
than a classical N+N system, where it’s obvious how loads will transfer.
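The load management point can be illustrated with a small check. The sketch below is not a design tool: it assumes dual-corded loads that draw half their power from each of two systems and shift entirely to the survivor when one fails, and the system names, capacities and load values are hypothetical.

```python
def loading_after_each_failure(capacity_kw, paired_loads_kw):
    """For each single-system failure, return the loading of every survivor.

    capacity_kw:     {system name: rated capacity in kW}
    paired_loads_kw: {(system_1, system_2): kW of load fed by that pair}
    Assumption: a pair shares its load 50/50 in normal operation, and the
    survivor picks up the whole load when its partner fails.
    """
    results = {}
    for failed in capacity_kw:
        carried = {name: 0.0 for name in capacity_kw if name != failed}
        for (first, second), kw in paired_loads_kw.items():
            if failed == first:
                carried[second] += kw
            elif failed == second:
                carried[first] += kw
            else:
                carried[first] += kw / 2
                carried[second] += kw / 2
        results[failed] = {name: carried[name] / capacity_kw[name]
                           for name in carried}
    return results


# Hypothetical 3 to make 2 plant: three 900 kW systems, 1800 kW of total load.
capacity = {"A": 900, "B": 900, "C": 900}
balanced = {("A", "B"): 600, ("B", "C"): 600, ("A", "C"): 600}
unbalanced = {("A", "B"): 900, ("B", "C"): 600, ("A", "C"): 300}

for label, loads in (("balanced", balanced), ("unbalanced", unbalanced)):
    for failed, survivors in loading_after_each_failure(capacity, loads).items():
        overloaded = [name for name, loading in survivors.items() if loading > 1.0]
        status = "OVERLOAD on " + ", ".join(overloaded) if overloaded else "OK"
        print(f"{label}: system {failed} fails -> {status}")
```

Both cases carry the same total load, which is 2/3 of installed capacity. The difference is that the balanced distribution survives any single failure, while the unbalanced one overloads a surviving system.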
The distributed redundant concept can be expanded to a 4 to make 3 arrangement shown in
figure (j). This is a popular colocation design. In this arrangement, four systems are installed to
deliver the capacity of three. Each system can be loaded up to 75% and when a system fails, its
load divides between the three remaining systems as shown in figure (k).
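The 2/3 and 75% loading limits follow the same rule, which can be written as a short sketch. It assumes identical systems and an even split of the failed system's load among the survivors; the function names are illustrative only.

```python
from fractions import Fraction


def max_normal_loading(installed_systems):
    """Highest per-system loading that still survives one system failure,
    assuming the failed system's load divides evenly among the survivors."""
    if installed_systems < 2:
        raise ValueError("distributed redundancy needs at least two systems")
    return Fraction(installed_systems - 1, installed_systems)


def loading_after_failure(installed_systems, normal_loading):
    """Per-system loading of the survivors after one system fails."""
    survivors = installed_systems - 1
    return normal_loading + normal_loading / survivors


for m in (3, 4):
    limit = max_normal_loading(m)
    after = loading_after_failure(m, limit)
    print(f"{m} to make {m - 1}: normal limit {float(limit):.0%}, "
          f"survivors at {float(after):.0%} after one failure")
```

With M installed systems, the limit is (M-1)/M, so each surviving system lands exactly at 100% after a failure when every system starts at that limit.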
Table 2 shows how the two distributed redundant designs compare to the four redundancy
examples previously discussed. With distributed redundancy, system level redundancy
can be achieved with fewer modules and higher utilizations, which helps lower the total cost of
ownership.
Theoretically, more than four systems can be combined to create distributed redundancy. This will
further increase utilization, but we don’t recommend it. Going beyond 4 to make 3 systems
introduces additional complexities and will ultimately compromise the system reliability for
diminishing returns.
One final redundancy design worth mentioning is the catcher system shown in figure (l). In a
catcher system there is one redundant system capable of backing up all the rest. The redundant
system will be online and unloaded. When a system fails, all its load is transferred to the
redundant system, usually through static transfer switches. The redundant system can be oversized
to support more than one failed system at a time, but typically it is sized to match the other
systems as shown in figure (l).
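As a rough illustration of catcher sizing, the sketch below counts how many simultaneous system failures a catcher could carry in the worst case, where the most heavily loaded systems fail first. The function name and the load figures are hypothetical.

```python
def failures_supported(catcher_capacity_kw, system_loads_kw):
    """Worst-case number of simultaneous system failures the catcher can carry."""
    remaining = catcher_capacity_kw
    supported = 0
    # Assume the heaviest-loaded systems fail first (worst case).
    for load in sorted(system_loads_kw, reverse=True):
        if load > remaining:
            break
        remaining -= load
        supported += 1
    return supported


loads_kw = [800, 750, 600]  # hypothetical loads on three primary systems
print(failures_supported(1000, loads_kw))  # catcher sized to match one system
print(failures_supported(2000, loads_kw))  # oversized catcher
```

In this example, a catcher sized like a single primary system covers one failure, and the oversized catcher covers two.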
Redundancy comes in a variety of flavors. No one design fits all applications. The best choice
depends on the project requirements. Determining factors include a data center’s operational and
maintenance requirements, uptime targets, risk tolerance, and total cost of ownership. The
general concepts presented here can be extended in greater detail to include cost and
failure models, but hopefully this overview provides a foundation for understanding the many
options available for your next data center build.