An Introduction to UPS Redundancy

As mission critical designers and engineers, we are often asked to explain the concept of
redundancy. After all, a system’s redundancy can have a major effect on availability, reliability,
maintainability, and total cost of ownership. Often, one of the first design decisions made relates to
redundancy. In recent years the waters have been muddied with terms like “distributed
redundancy”, “3 to make 2”, and “catcher systems”. In this article we will explain the different
levels of redundancy and how they affect other system characteristics.

Before we get started, let’s define two terms that will be used throughout this article:

UPS Module or UPM – a UPM can be standalone (figure a) or paralleled with other UPMs (figure b)
for capacity or redundancy. When paralleled, it is connected to a common output bus and shares
controls with the other parallel modules.

UPS System or UPS – a UPS comprises either a single UPM (figure a) or multiple parallel
UPMs (figure b). Figure (c) shows two UPS systems, each comprising a single UPM. A UPS system
does not share controls or output buses with other UPS systems.

Notice in figure (b), the UPMs are designated as A1 and A2, as if to say they are modules 1 and 2
of system ‘A’. Similarly, in figure (c), the two different UPS systems are designated as ‘A’ and ‘B’.
While these designations are not universal, they are very common and will be used for the
purposes of this article.

Non-Redundant, N+0
UPS designs have evolved as the importance of data centers has grown. In the beginning, as it
relates to UPS power, the concern was to protect the critical load from power outages, sags,
swells, and other power anomalies. A single UPS system bridged the gap between the moment
utility power failed and the moment a generator started and powered the UPS and its critical load.
This was a non-redundant UPS design. It met the basic requirement of protecting the load from
utility issues and little more. If the UPS failed, the load was dropped.

Relative to other levels of redundancy, this is designated as N+0, where N represents the system
capacity needed to power the load and ‘+0’ indicates that nothing is held in reserve. Figure (d)
below shows a basic, non-redundant UPS system.

Module Redundancy, N+1


Figure (e) shows an N+1 redundant UPS system. In this arrangement a single UPS system,
composed of two parallel modules connected to a common system output, powers all the load.
Each module is rated for N capacity, and the modules share the load. If the load remains at or
below N, the system is redundant. The ‘+1’ indicates there is one more module than needed to
power the load. However, if the load increases above N, the system becomes non-redundant. If a
third module, A3, were installed, the system would be N+2 redundant when the load was at or
below N. The capacity of these types of systems is governed by the common system output bus
rating, not the sum of the module capacities. Site operators must manage the load and ensure the
total load does not exceed the redundant capacity of ‘N’. N+1 or N+2 redundancy is sometimes
referred to as ‘module redundancy’, as opposed to the ‘system redundancy’ described next.
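
The relationship between load, module count, and redundancy level described above can be sketched in a few lines of Python. This is a minimal illustration, not a sizing tool: the function name `redundancy_level`, the 1000 kW module rating, and the 2000 kW bus rating are assumptions chosen for the example, and the modules are assumed to be identical and to share the load equally.

```python
import math

def redundancy_level(load_kw, module_rating_kw, module_count, bus_rating_kw):
    """Label a single system of parallel modules as 'N+0', 'N+1', and so on.

    Assumption: all modules have the same rating and share the load equally.
    """
    installed_kw = module_rating_kw * module_count
    # The common output bus, not the sum of the modules, caps system capacity.
    if load_kw > min(bus_rating_kw, installed_kw):
        return "overloaded"
    # Modules that must stay online to carry the load.
    modules_needed = math.ceil(load_kw / module_rating_kw)
    spare_modules = module_count - modules_needed
    return f"N+{spare_modules}"

# Hypothetical figures: 1000 kW modules on a 2000 kW output bus.
print(redundancy_level(800, 1000, 2, 2000))   # load at or below N, two modules
print(redundancy_level(1200, 1000, 2, 2000))  # load above N, two modules
print(redundancy_level(800, 1000, 3, 2000))   # third module A3 installed
```

With these assumed ratings, the three calls correspond to the N+1, non-redundant, and N+2 cases discussed in the text.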

System Redundancy, N+N or 2N


Figure (f) shows a typical redundant design for UPS systems that power IT equipment with
dual-corded power supplies. In this arrangement there are two UPS systems, each composed of
one module with N capacity. Under normal conditions, each system powers half the load. Should
one UPS system fail, the load will automatically transfer to the other UPS system via the IT
equipment’s dual-corded power supplies. Site operators must manage the load and ensure that the
total load does not exceed the capacity ‘N’ of one system. Since UPS A and B are separate
systems, they share neither common output buses nor common controls and operate
independently of one another.

Redundant Systems with Redundant Modules, 2N+1


Higher levels of redundancy can be achieved by using two N+1 UPS systems, as shown in figure
(g). This is sometimes referred to as 2N+1. Each system is composed of three 1MW modules. If the
common bus is rated for 2MW, then each system would be considered N+1 redundant. If UPS
system A should fail, UPS B will assume all 2MW and still have module redundancy. This system
design can withstand a complete system failure and one module failure and still deliver full UPS
capacity. This design is very costly and is mostly used by financial companies or the most
mission-critical installations.

Redundancy and UPS Utilization
In figure (g) the redundant capacity, N, is 2MW. The design uses six 1MW UPS modules and
connects them in a way to provide 2MW of redundant capacity. Here is where it gets a little
confusing. How do we distinguish between module utilization, system utilization and overall
utilization? System A’s redundant capacity is 2MW. Normally, when both systems are operational,
it will carry a load of 1MW. Therefore, the system utilization is 50%. The 50% of unused capacity is
waiting in reserve to support System B should it fail. System A consists of three 1MW modules and
the load is equally divided between them. Therefore, each module supports 333kW for a module
utilization of 33%. What about the overall utilization? Well, if System A is 50% utilized, System B is
50% utilized and they back each other up, the overall utilization is 100%. In other words, no more
load can be added to the redundant system even though there is more capacity on an individual
system (A or B) and their individual modules. If more load were added, the system would not
meet its designed redundancy target.
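
The three utilization figures for figure (g) can be reproduced with a short calculation. The sketch below is illustrative only: the function name `utilization` and its parameters are assumptions, the load is assumed to split equally across systems and modules, and the redundant capacity N is taken to equal one system's bus rating, as in the example above.

```python
def utilization(total_load_kw, systems, modules_per_system,
                module_rating_kw, bus_rating_kw):
    """Utilization figures for identical systems that back each other up.

    Assumptions: the load splits equally across systems and across the
    modules in each system, and the redundant capacity N equals one
    system's bus rating.
    """
    system_load_kw = total_load_kw / systems
    module_load_kw = system_load_kw / modules_per_system
    return {
        "module": module_load_kw / module_rating_kw,
        "system": system_load_kw / bus_rating_kw,
        "overall": total_load_kw / bus_rating_kw,
    }

# Figure (g): two systems of three 1MW modules, 2MW bus, 2MW total load.
fig_g = utilization(2000, systems=2, modules_per_system=3,
                    module_rating_kw=1000, bus_rating_kw=2000)
for name, value in fig_g.items():
    print(f"{name} utilization: {value:.0%}")
```

Running it with the figure (g) values gives the 33% module, 50% system, and 100% overall utilizations worked out in the text.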

The opposite can be said about systems that are underutilized. If each system were carrying only
25% of its redundant capacity, then a system could afford to have two modules fail and still be
redundant. This is a common occurrence for new systems where IT loads are added over time,
and it is why we often include modular expansion capabilities in our designs.

This concept of utilization is very important because it has a direct impact on CapEx and OpEx for
two reasons:

1. The higher the module utilization, the more efficiently the modules operate.
2. The most efficient use of capital comes from designs that have the highest module and system
utilizations.

Let’s see how this applies to the redundancy examples introduced so far. Table 1 shows that
higher levels of redundancy have lower utilizations for the four examples considered. That’s
because more module capacity is waiting in reserve in case of failure. A non-redundant system
(N+0) provides only enough capacity to match the load, with nothing in reserve, and therefore has
the highest utilization, but at the risk of dropping the load. The 2N+1 system provides the most
redundancy but the lowest module utilization. This added protection comes at a higher cost
of installation and operation. Remember, the 2N+1 system requires 6MW of total capacity to
provide 2MW of redundant capacity.
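
As a rough cross-check of this trend, module utilization at full redundant load can be computed as the redundant capacity divided by the total installed module capacity. The sketch below is not a reproduction of Table 1. The 2N+1 row uses the six 1MW modules and 2MW capacity from the text, while the 1000 kW module rating used for the other three rows is an assumed value chosen for illustration.

```python
# (label, modules installed, module rating in kW, redundant capacity N in kW)
# The 1000 kW rating for the first three rows is an assumption; the 2N+1 row
# follows figure (g): six 1MW modules delivering 2MW of redundant capacity.
designs = [
    ("N+0", 1, 1000, 1000),
    ("N+1", 2, 1000, 1000),
    ("2N", 2, 1000, 1000),
    ("2N+1", 6, 1000, 2000),
]

def module_utilization(modules, module_rating_kw, redundant_capacity_kw):
    """Share of installed module capacity in use when the load equals N."""
    return redundant_capacity_kw / (modules * module_rating_kw)

for label, modules, rating_kw, n_kw in designs:
    print(f"{label:5} {module_utilization(modules, rating_kw, n_kw):.0%}")
```

Under these assumptions the utilization falls from 100% for N+0 to 50% for N+1 and 2N, and to 33% for 2N+1.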

Distributed Redundancy
As the data center industry matured, designers and operators looked for more efficient ways to
deliver highly redundant systems. The N+N system design has been the standard configuration
since dual-corded power supplies were adopted. The design is simple, reliable, and easy to manage.
The downside is low utilization and stranded capacity. Since many data center builds never reach
their full load potential, real utilizations can be closer to 30%.

To address this issue and lower the total cost of ownership, distributed redundancy has gained
favor in the industry. The concept is not new, but it differs from the classical N+N designs. In fact,
our first distributed redundant design was installed and commissioned in 2007. The advantages of
distributed redundancy include higher capacity utilization with system-level redundancy.
Maintaining system-level redundancy means there are no single points of failure, unlike in
non-redundant or N+1 systems.

Figure (h) shows the normal configuration of a 3 to make 2 distributed redundant system. Three
systems are installed to deliver the capacity of two. Any one of the three systems can fail, and its load
will seamlessly transfer to the other two systems, as shown in figure (i). With this arrangement, each
system can be loaded to 2/3 of its capacity so that the two remaining systems will not exceed 100% of
their capacity in a failure scenario. For distributed redundancy to work, downstream loads must be
properly managed. If they are not, a failure of one UPS system may overload one of the two
remaining systems. Successful load management requires proper design, monitoring, and discipline
to maintain distributed redundancy. This is slightly more complicated than a classical N+N system,
where it’s obvious how loads will transfer.
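
To see why load management matters, it can help to model each dual-corded load and check every single-system failure. The sketch below is a simplified illustration under stated assumptions: the function names, the 900 kW system capacity, and the sample loads are all hypothetical, each load is assumed to draw half its power from each feed in normal operation, and all of it is assumed to move to the surviving feed when one UPS system fails.

```python
from collections import defaultdict

def system_loads(loads, failed=None):
    """Load in kW on each surviving UPS system.

    `loads` is a list of (kw, feed_1, feed_2) tuples, one per dual-corded
    load. Assumption: a load draws half from each feed normally and all of
    it from the surviving feed when the other feed's UPS system has failed.
    """
    totals = defaultdict(float)
    for kw, feed_1, feed_2 in loads:
        live = [feed for feed in (feed_1, feed_2) if feed != failed]
        for feed in live:
            totals[feed] += kw / len(live)
    return dict(totals)

def overloads(loads, capacity_kw):
    """Map each single-system failure to the systems it would overload."""
    systems = sorted({feed for _, a, b in loads for feed in (a, b)})
    result = {}
    for failed in systems:
        after = system_loads(loads, failed)
        over = sorted(s for s, kw in after.items() if kw > capacity_kw)
        if over:
            result[failed] = over
    return result

# Hypothetical 3 to make 2 layout with 900 kW systems.
balanced = [(600, "A", "B"), (600, "B", "C"), (600, "C", "A")]
skewed = [(900, "A", "B"), (300, "B", "C"), (300, "C", "A")]

print(overloads(balanced, 900))  # no failure overloads a surviving system
print(overloads(skewed, 900))    # losing A overloads B, and losing B overloads A
```

In the skewed case no system exceeds 2/3 of its capacity in normal operation, yet a failure of A or B still overloads a survivor. This is the kind of downstream imbalance that design and monitoring need to catch.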

The distributed redundant concept can be expanded to a 4 to make 3 arrangement, shown in
figure (j). This is a popular colocation design. In this arrangement, four systems are installed to
deliver the capacity of three. Each system can be loaded up to 75%, and when a system fails, its
load divides among the three remaining systems, as shown in figure (k).
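
The 2/3 and 75% loading limits follow the same pattern: with k systems installed to make k-1, each system can be loaded to (k-1)/k of its capacity. The short sketch below works this out with exact fractions. The function names are illustrative, and the failed system's load is assumed to divide equally among the survivors.

```python
from fractions import Fraction

def max_normal_loading(systems):
    """Highest per-system loading that still survives one system failure."""
    return Fraction(systems - 1, systems)

def loading_after_failure(systems, normal_loading):
    """Per-system loading once one system fails and its load divides equally."""
    return normal_loading * Fraction(systems, systems - 1)

for k in (3, 4):
    limit = max_normal_loading(k)
    after = loading_after_failure(k, limit)
    print(f"{k} to make {k - 1}: load each system to {limit} "
          f"({float(limit):.0%}), {float(after):.0%} after a failure")
```

For both arrangements the surviving systems land at exactly 100% after a failure, which is why any loading above the limit leaves no margin.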

Table 2 shows how the two distributed redundant designs compare to the four other
redundancy examples previously discussed. With distributed redundancy, system-level redundancy
can be achieved with fewer modules and higher utilizations, which helps lower the total cost of
ownership.

Theoretically, more than four systems can be combined to create distributed redundancy. This will
further increase utilization, but we don’t recommend it. Going beyond a 4 to make 3 arrangement
introduces additional complexity and will ultimately compromise system reliability for
diminishing returns.

One final redundancy design worth mentioning is the catcher system shown in figure (l). In a
catcher system there is one redundant system capable of backing up all the rest. The redundant
system will be online and unloaded. When a system fails, all its load is transferred to the
redundant system, usually through static transfer switches. The redundant system can be oversized
to support more than one failed system at a time, but typically it is sized to match the other
systems, as shown in figure (l).
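
The sizing trade-off for the catcher can be expressed as a simple worst-case check. This is a sketch under assumptions, not a design rule: the function name and the sample loads are hypothetical, the catcher is assumed to be unloaded beforehand, and each failed system is assumed to hand over its full load.

```python
def failures_catcher_can_absorb(primary_loads_kw, catcher_capacity_kw):
    """Worst-case number of simultaneous primary failures the catcher carries.

    Assumptions: the catcher is unloaded beforehand and takes the full load
    of every failed system. The worst case is the most heavily loaded
    systems failing together.
    """
    absorbed = 0
    running_kw = 0
    for load_kw in sorted(primary_loads_kw, reverse=True):
        running_kw += load_kw
        if running_kw > catcher_capacity_kw:
            break
        absorbed += 1
    return absorbed

# Hypothetical primary system loads in kW.
primary_loads = [800, 750, 600]
print(failures_catcher_can_absorb(primary_loads, 1000))  # catcher matched to one system
print(failures_catcher_can_absorb(primary_loads, 2000))  # oversized catcher
```

With these sample loads, a 1000 kW catcher covers one failure, while a 2000 kW catcher covers any two failures at once.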

Redundancy comes in a variety of flavors. No one design fits all applications. The best choice
depends on the project requirements. Determining factors include a data center’s operational and
maintenance requirements, uptime targets, risk tolerance, and total cost of ownership. The
general concepts presented here can be extended in greater detail to include cost and
failure models, but we hope this overview provides a foundation for understanding the many
options available for your next data center build.