
HIGH AVAILABILITY, DATA PROTECTION AND DATA INTEGRITY IN THE XTREMIO ARCHITECTURE

ABSTRACT
A key function of an enterprise-class storage system is to host data in
a safe and reliable manner. The storage must provide continuous,
uninterrupted access to data, meet stringent performance
requirements, and deliver advanced functionality to streamline
operations and simplify data management.

XtremIO’s system architecture was designed from the ground up to provide continuous availability. The array does not have a single point of failure and is built with high levels of protection that allow data to survive all but the most catastrophic events.

This white paper defines high availability, data protection and data
integrity, and examines how XtremIO’s unique hardware and software
design achieves the utmost in uptime and resiliency against failures.
The combination of a scale-out design with a service oriented and
modular software architecture allows XtremIO to operate as a unified
system with the ability to adapt independent modules in case of
unexpected hardware failures. This paper details the monitoring, the
redundancy levels, the integrity checks and the extreme flexibility in
the architecture to maintain system performance and data availability
by adjusting to failures.
Copyright © 2014 EMC Corporation. All Rights Reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change
without notice.

The information in this publication is provided “as is.” EMC Corporation makes no representations or warranties of any kind with
respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a
particular purpose.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

EMC2, EMC, the EMC logo, and the RSA logo are registered trademarks or trademarks of EMC Corporation in the United States
and other countries. VMware is a registered trademark of VMware, Inc. in the United States and/or other jurisdictions. All other
trademarks used herein are the property of their respective owners. © Copyright 2014 EMC Corporation. All rights reserved.
Published in the USA. 02/14 White Paper H12914



TABLE OF CONTENTS

ABSTRACT
INTRODUCTION
ABOUT HIGH AVAILABILITY
DATA INTEGRITY
XTREMIO’S ARCHITECTURE
HARDWARE ARCHITECTURE
SOFTWARE ARCHITECTURE
    Infrastructure Modules
        System Wide Management (SYM) Module
        Platform Manager Module
    I/O Modules
        Routing Module
        Control Module
        Data Module
    Restarting Modules
    I/O Flow
SECURE DISTRIBUTED JOURNALING
INDEPENDENT SOFTWARE AND HARDWARE MODULES
CONNECTIVITY REDUNDANCY
END-TO-END VERIFICATION
    Hardware Verification
    Cryptographic Data Fingerprint
    Separate Message Paths
FAULT AVOIDANCE, DETECTION AND CONTAINMENT
    Pattern Independent Writes
    Service Oriented Architecture
    Fault Detection
    Fault Prevention
    Advanced Healing
NON DISRUPTIVE UPGRADES
    XtremIO OS (XIOS)
    Component Firmware and Linux Kernel
SYSTEM RECOVERABILITY
    Orderly Shutdown
    Emergency Shutdown
    XMS Communications Loss
    Communication Lost Between Storage Controllers
CONCLUSION
HOW TO LEARN MORE
CONTACT US



INTRODUCTION
A major goal of every enterprise storage system is to host data in a safe and reliable manner. The storage must provide
continuous, uninterrupted access to data, meet stringent performance requirements, and deliver advanced functionality to
streamline operations and simplify data management. An enterprise storage system must provide the utmost resiliency and
have no single point of failure, while protecting data during nearly every imaginable failure. The system should also provide high
service levels in the face of component failures.

Even specialized storage systems are built on software and general-purpose computing components that can all fail. Some failures are immediately visible, such as a disk or SSD failure. Others can be subtle, such as insufficient memory resources that result in performance issues. To ensure high availability and data integrity in the face of such failures, the best storage systems have an architecture that maintains I/O flow as long as data protection is not at risk, and include various data integrity checks that are optimized for system performance.

This white paper defines high availability, data protection and data integrity, and examines how XtremIO’s unique hardware and software design achieves the utmost in uptime and resiliency against failures. The combination of a scale-out design with a service oriented and modular software architecture allows multiple XtremIO X-Bricks to operate as a unified system with the ability to adapt independent modules in case of unexpected hardware failures. This paper details the monitoring, the redundancy levels, the integrity checks and the extreme flexibility in the architecture to maintain system performance and data availability by adjusting to failures.

Audience

This white paper is intended for EMC customers, technical consultants, partners, and members of the EMC and partner professional services community who are interested in learning more about how XtremIO’s architecture achieves High Availability, Data Integrity, and Data Protection.



ABOUT HIGH AVAILABILITY
High Availability is a system design approach that ensures service will be provided continuously and with the expected
performance. Users want their applications to be continuously available, and highly available storage is required to achieve this goal. A good design
for high availability has enough redundancy to prevent any single point of failure from causing data unavailability, and may
provide enough redundancy to protect against multiple concurrent failures. Enterprise storage systems must also ensure that
failures, even unlikely ones, do not result in physical data loss or corruption. Thus, enterprise storage should have high levels of
failure detection, and react to either recover the failed component quickly and automatically, or return to a redundant and
balanced state by changing resource allocation. For instance, if two array controllers service data, and one of them fails, then
the system needs to reroute I/O through the remaining controller as quickly as possible.

There are two types of redundancy: passive redundancy and active redundancy. Passive redundancy provisions excess
components that are idle and are not operational unless the primary component fails. Two examples of passive redundancy in
enterprise storage are active/passive controller designs (where one controller serves I/O and the second controller does not
serve I/O unless the primary controller fails) and “hot spare” drives, which are designated spare drives in the system waiting to
be used upon failure of another drive. In general, a passive design wastes resources and cost by having additional hardware that
is rarely used but is part of the system. An active redundancy design maintains activity on all system components and ideally
balances all of them, getting the highest utilization of resources and the least impact upon any component failure. It is highly
desirable to have an active redundancy system.

While a system should always maintain availability upon any single failure, the best system designs do not lose data even during
dual simultaneous failures.

DATA INTEGRITY
The primary function of an enterprise storage array is to reliably store user data. When a host reads data, the storage system
must provide the correct data stored at the requested location. The accuracy of the data must be validated from the reception of
data by the storage system, through travel within the system, to the data being written to the back-end storage medium (end-
to-end verification). In order to verify that the data is correct upon a read, the system needs to create a fingerprint based upon
the stored data, and check the fingerprint when reading the data. The fingerprint ensures that the data has not changed at rest
or in flight. Ideally, the system should use independent locations for the data and its fingerprint. This reduces the probability of
any single component affecting the data and fingerprint in the same way, which could lead to a false indication that the data is
good while it is not. The worst thing a storage system can do is to provide corrupted data to the host while indicating that the
data is good.



XTREMIO’S ARCHITECTURE
HARDWARE ARCHITECTURE
XtremIO’s main building block is called an X-Brick. An X-Brick is a highly available, active/active building block. X-Bricks can be
clustered together to create a large scale-out system, which linearly grows in performance and capacity as more X-Bricks are
added. An X-Brick does not have any single point of failure, allowing XtremIO clusters to begin with just a single X-Brick. Each
X-Brick contains dual independent Storage Controllers and a 25-SSD array enclosure. The array enclosure has two SAS
controllers with dual connections to each Storage Controller. Each XtremIO system has dual battery backup units (BBU) to help
vault unwritten data to permanent storage in the event of a power failure. The entire XtremIO array is built using standard
components (e.g. x86 servers, standard form factor SSDs, and off-the-shelf interface cards) with no proprietary hardware of any
kind. This allows XtremIO to leverage best-of-breed high quality suppliers and benefit from general advances made by the
component suppliers.

Figure 1: Single X-Brick XtremIO system hardware

Each XtremIO Storage Controller and array enclosure has dual power supplies and each has power provided from two separate
power circuits (per XtremIO installation best practices).

Figure 2: Single XtremIO X-Brick hardware logical block diagram



In a system with more than one X-Brick there are dual Infiniband switches. The dual switches are required for redundancy and
are connected to two separate power lines. Each Storage Controller connects to both switches, and the switches are also
connected to each other for increased bandwidth and redundancy.

Figure 3: Dual X-Brick XtremIO hardware

Figure 4: Dual X-Brick XtremIO hardware logical block diagram



SOFTWARE ARCHITECTURE
In XtremIO’s architecture, any software component can run on any Storage Controller in the cluster. This capability allows
continuous operations despite hardware failures. The software components within XtremIO are called Modules and all XtremIO
modules run in User Space. Linux provides the underlying kernel, with a proprietary operating environment, XtremIO OS (XIOS),
providing scheduling, messaging and special utilities for the XtremIO Modules.

There are six main module types in the system, and multiple instances of each can run independently in the system. Three module types are infrastructure modules, responsible for system-wide management, availability, and services for other modules. The other three module types are I/O modules, responsible for data services within the array and for host communication.

Infrastructure Modules
System Wide Management (SYM) Module
The System Wide Management module has a complete view of all the hardware and software components. It is responsible for
system availability and initiates any changes in system configuration to achieve maximum availability and redundancy. It makes
the decisions about which modules will execute on which Storage Controller, initiates failovers of data ownership from one
Storage Controller to another, and initiates rebuilds upon SSD failures. For redundancy purposes, multiple SYM modules are
running at the same time in the system, but at any single point in time only one is the active management entity and is the sole
entity to make system wide decisions. Should the component running the active SYM module fail, another SYM module quickly
becomes active and takes over.

Additional software logic, which runs on every Storage Controller, has the responsibility to verify that there is one, and only one
SYM active in the system. This simple process eliminates the possibility of not having any SYM module running.

Figure 5: XtremIO Storage Controller software block diagram
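
The paper does not specify the election protocol itself; the following minimal Python sketch (with hypothetical controller fields) illustrates only the invariant that logic enforces: every controller independently computes the same single winner, so exactly one active SYM exists at any time and a survivor takes over when the current host fails.

    # Illustrative sketch only: the SYM election protocol is not described
    # in this paper. It models the enforced invariant: exactly one active
    # SYM cluster-wide, with automatic takeover on failure.
    def elect_active_sym(controllers):
        """Pick exactly one live controller to host the active SYM.

        `controllers` maps a controller name to a dict with hypothetical
        'alive' and 'runs_active_sym' flags.
        """
        live = [name for name, c in controllers.items() if c["alive"]]
        if not live:
            raise RuntimeError("no live Storage Controller to host SYM")
        # A deterministic choice (lowest name) lets every controller that
        # runs this check independently agree on the same single winner.
        winner = min(live)
        for name, c in controllers.items():
            c["runs_active_sym"] = c["alive"] and name == winner
        return winner

    controllers = {
        "sc1": {"alive": True, "runs_active_sym": True},
        "sc2": {"alive": True, "runs_active_sym": False},
    }
    controllers["sc1"]["alive"] = False      # active SYM host fails...
    print(elect_active_sym(controllers))     # ...and 'sc2' takes over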



Platform Manager Module
Each Storage Controller has a single Platform Manager module running. The Platform Manager is responsible for all activity on
the Storage Controller. It monitors the Storage Controller’s health and communicates it to the SYM. It is responsible for verifying
all processes are running appropriately in the Storage Controller. Module shutdowns and restarts are executed by the Platform
Manager on behalf of the SYM module. The Platform Manager communicates hardware failures to the SYM module. The Platform
Manager also provides the facilities for replicating important data structures between Storage Controllers (journaling). It
replicates journal memories between Storage Controllers using Remote Direct Memory Access (RDMA) over the system’s
Infiniband fabric. The activity of journaling is critical for redundancy of user data and system metadata. The Platform Manager
initiates a shutdown of the Storage Controller upon discovery of loss of power and/or complete loss of communication to other
Storage Controllers.

I/O Modules
The I/O Modules are responsible for storing data from hosts and retrieving it upon request. Each of the three runs on every Storage Controller (although the XtremIO architecture is flexible enough to run different modules on different controllers, this is not done today). As mentioned before, the SYM assigns which modules run on each controller. Every I/O passes through all three types of I/O modules (Routing, Control, and Data).

Routing Module
The Routing Module is the only entity in the system that communicates with the Host. It accepts SCSI commands from the Host
and parses them. It is stateless and simply translates the requests into volume and Logical Block Addresses (LBAs). It then
forwards the request to the appropriate Control Module (and Storage Controller) that manages those LBAs. The Routing Module
inherently balances load across the entire XtremIO clustered system. It runs a content-based fingerprinting function that results
in data being evenly distributed across all the X-Bricks in the system. Please refer to the Introduction to the XtremIO All-Flash
Array White Paper for a detailed explanation of this process.
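
As a hedged illustration of why content-based fingerprinting balances load, the sketch below hashes random 4KB blocks and assigns each to an owner module by a slice of the fingerprint. SHA-1 and the module count are assumptions for this example only; the actual function and placement rules are described in the referenced white paper.

    # Assumed fingerprint function (SHA-1) and module count, for
    # illustration. A cryptographic hash distributes uniformly, so keying
    # placement on a slice of the fingerprint spreads blocks evenly
    # across the cluster regardless of host access patterns.
    import hashlib
    import os
    from collections import Counter

    NUM_MODULES = 4                          # hypothetical cluster size

    def fingerprint(block: bytes) -> bytes:
        return hashlib.sha1(block).digest()

    def owner(fp: bytes) -> int:
        # Placement is driven by content, not by the host address.
        return int.from_bytes(fp[:4], "big") % NUM_MODULES

    counts = Counter(owner(fingerprint(os.urandom(4096)))
                     for _ in range(10_000))
    print(counts)                            # ~2,500 blocks per module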

Control Module
The Control Module is responsible for translating the Host user address (Logical Block Address) to an XtremIO internal mapping.
It acts as a virtualization layer between the Host SCSI Volume/LBA and XtremIO back-end deduplicated location. Having this
virtualization layer provides the ability to efficiently implement a range of rich data services. Data stored on XtremIO is content
addressable: its location in the array is determined according to its content, and not based on its address as in other storage
products. The LBAs of every volume on an XtremIO array are distributed among many Control Modules.

Data Module
The Data Module is responsible for storing data on the SSDs. It works as a service to the Control Module where the Control
Module provides a content fingerprint and the Data Module will write or read the data according to that fingerprint. There are
only three basic operations the Data Module executes: Read, Write or Erase a block. The goal is to keep the module as simple as
possible to maintain a robust and reliable system design. The Control Module does not need to worry about XtremIO Data
Protection (XDP) allocation. Centralizing the XDP scheme in the Data Module provides flexibility and efficiency in the system.

In the same manner that the Control Module evenly maps Host address to content fingerprint, the Data Module evenly maps
content fingerprint to physical location on SSD. This process guarantees that the data is balanced not only across all Storage
Controllers, but also across all SSDs in the array. This additional translation layer also allows the Data Module to place the data
optimally on the SSDs. Even in challenging scenarios like failed components, minimal free space, and frequent data overwrite,
XDP can find optimal locations to store data in the system. (To learn more about how XDP provides redundancy and flash-
optimized data placement, please see the EMC whitepaper titled “XtremIO Data Protection”.)

Restarting Modules
Since all modules run in user space, XIOS can quickly restart modules as needed. Any software failures or questionable behavior
in a module results in an automated module restart. The restarts are non-disruptive and generally undetectable at the user
level. This capability also serves as the foundation for Non-disruptive Upgrades (NDU).
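
A minimal supervision sketch follows, assuming a hypothetical module launched as an external process; XIOS’s actual mechanics are not documented in this paper. The point is only that a crashed user-space module can be detected and relaunched without touching the rest of the system.

    # Hypothetical module command and polling loop, for illustration.
    import subprocess
    import time

    MODULES = {"routing": ["python3", "routing_module.py"]}  # hypothetical

    def supervise(modules, poll_seconds=1.0):
        procs = {name: subprocess.Popen(cmd) for name, cmd in modules.items()}
        while True:
            for name, proc in procs.items():
                if proc.poll() is not None:        # module process exited
                    print(f"restarting module {name!r}")
                    procs[name] = subprocess.Popen(modules[name])
            time.sleep(poll_seconds)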



I/O Flow
Host I/O is initially received at the Routing Module (R), which parses the SCSI command, calculates a fingerprint for the data and forwards the I/O to an optimally selected Control Module (C) in the system. The Control Module does not have to be physically located in the same Storage Controller as the Routing Module. The Control Module translates the host request to XtremIO’s internal data management scheme and forwards data to the appropriate Data Module (D), which verifies the fingerprint and stores data on SSD using XtremIO’s flash-optimized and highly redundant XDP protection scheme. Regardless of the size of the XtremIO cluster, the I/O path always follows these exact same steps. Thus latency in the system remains consistent regardless of scale.


Figure 6: Example of Host Write I/O Flow
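
The write path can be summarized in a short sketch. The in-memory dictionaries below stand in for the Control Module’s address map and the Data Module’s content store; SHA-1 is an assumed placeholder for the fingerprint function, and the RDMA hops between Storage Controllers are elided.

    import hashlib

    address_map = {}   # Control Module: (volume, LBA) -> fingerprint
    block_store = {}   # Data Module: fingerprint -> block (stands in for SSD/XDP)

    def write(volume, lba, block: bytes):
        fp = hashlib.sha1(block).digest()    # R: fingerprint on entry
        address_map[(volume, lba)] = fp      # C: virtualize the host address
        if fp not in block_store:            # D: dedup - store content once
            block_store[fp] = block
        return fp

    def read(volume, lba) -> bytes:
        fp = address_map[(volume, lba)]
        block = block_store[fp]
        assert hashlib.sha1(block).digest() == fp   # verified on every read
        return block

    write("vol1", 0, b"\x00" * 4096)
    write("vol1", 8, b"\x00" * 4096)         # duplicate content
    print(len(block_store))                  # -> 1 (stored only once)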

SECURE DISTRIBUTED JOURNALING


As with any enterprise storage design, the array is not only responsible for protecting data but also has its own metadata for
operation. It is paramount to protect the metadata and maintain coherency. XtremIO has developed a unique distributed
journaling mechanism to protect all the system’s important metadata and internal datasets.

A copy of all array metadata is stored in the Storage Controller memory. Updated metadata is synchronously replicated over
Infiniband RDMA in a distributed fashion to one or more physical Storage Controllers so that every real-time change is protected
in multiple locations. In a system with more than one X-Brick, the journal data on each node is protected by all other nodes
using a distributed replication process. The System Wide Management module manages the journal replication relationships
between Storage Controllers in the cluster. For resiliency reasons, a Storage Controller that is on backup battery power cannot
be a target for replicated journal data. If, for any reason, the Storage Controller cannot write the journal data to a separate
Storage Controller then it will write it locally to its SSDs as a final fail-safe. If a controller fails, the replicated journal is used to
rebuild the lost contents from the failed controller. All journal contents are periodically de-staged to SSD non-volatile storage.
In the event of a power loss, the system’s battery backup units allow this de-staging to take place and for the system to
complete an orderly shutdown. Certain highly critical metadata is de-staged and stored using triple replication, while other less-
critical metadata (that can be recovered in other ways) is stored using the same XDP scheme as user data.



This provides the capability to recover the system even in the unlikely catastrophic event that communication between all
Storage Controllers is lost. Each Storage Controller becomes a self-sustained metadata protector and can be brought up and
reconnected to the system once communication is restored. Due to the importance of journaling, the journal mechanism code is
completely separate from any other software module. It is a standalone, simple software module designed to be highly resilient.

Figure 7: Example of Secure Distributed Journaling
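
A minimal sketch of the placement policy described above, with hypothetical controller objects and field names: journal updates go to eligible peers (alive and not on battery power), and fall back to the local vault SSDs only when no such peer exists.

    def replicate_journal(entry, local, peers):
        # Peers on battery power are not eligible replication targets.
        targets = [p for p in peers if p["alive"] and not p["on_battery"]]
        if targets:
            for p in targets:
                p["journal"].append(entry)   # in reality: RDMA over Infiniband
            return "remote"
        local["vault_ssd"].append(entry)     # final fail-safe: local SSDs
        return "local"

    local = {"vault_ssd": []}
    peers = [{"alive": True, "on_battery": True, "journal": []}]
    print(replicate_journal("meta-update-1", local, peers))   # -> 'local'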



INDEPENDENT SOFTWARE AND HARDWARE MODULES
XtremIO’s flexible architecture allows any software component to run on any Storage Controller in the system. Having this
flexibility provides the utmost availability and resiliency, while maintaining optimal performance. Any change in the hardware
configuration dynamically changes the number of active software modules. It guarantees that all available resources are being
optimally used by the system. For instance, a system comprised of four Storage Controllers will have double the throughput and
IOPS of a system with two Storage Controllers. Another example is that the main management module (SYM) runs on one
Storage Controller. Upon hardware failure it will activate and run on a different Storage Controller without any user intervention.
When the failed hardware is replaced, the system will quickly come back to optimal availability and performance automatically.
Another factor that allows XtremIO to shift resources around is that all of the software modules are loosely coupled. There is neither affinity between the software and a specific hardware server, nor between specific software instances. A Data Module can receive requests from any Control Module and responds to each request as an independent transaction; no state needs to be carried from one transaction to the next. This architecture is similar to Service Oriented Architecture (SOA).

CONNECTIVITY REDUNDANCY
XtremIO’s connectivity maintains communications redundancy to every system component (see Tables 1 and 2).

Not only does every component have at least two paths for communication, but the management communication also runs on a separate network from the data flow. Host I/O is done via Fibre Channel or iSCSI ports, while management of the system is done via dedicated Ethernet management ports on each Storage Controller. This design separates control from the I/O path, and monitoring on a different network makes it possible to correlate events and system health independent of load or I/O behavior.

Table 1: XtremIO connectivity redundancy

Redundancy | Comment/Best Practice
Each Storage Controller has two Fibre Channel ports | Connect each port to a separate SAN switch
Each Storage Controller has two iSCSI ports | Connect each port to a separate SAN switch
Each Storage Controller has two Infiniband ports | Each port connects to an independent Infiniband fabric to provide fault tolerance against Infiniband component failures
Two Infiniband switches (when more than one X-Brick in the system) | Each switch connects to every Storage Controller in the system and protects against Infiniband switch failure
Two Infiniband interconnect cables (with one X-Brick in a system) | Redundant Infiniband paths between the two Storage Controllers
Each array enclosure (DAE) has two SAS controller modules | Failure of a SAS controller module does not result in loss of connectivity between the DAE and the X-Brick Storage Controllers
Each array enclosure (DAE) SAS controller module utilizes two SAS cables | Redundant SAS paths ensure that SAS port or SAS cable failures do not cause service loss



Table 2: XtremIO failure service impact

Failure | Action | Service Impact
Fibre Channel or iSCSI port | Host multi-pathing software will use remaining ports | No effect
Infiniband port | System uses remaining Infiniband port for data transfers to/from Storage Controllers | No effect
Infiniband switch | System uses remaining Infiniband switch for internal data transfers | No effect
Storage Controller failure | Storage Controller partner in the same X-Brick takes over responsibility for all data in the disk enclosure | No service loss. Some performance loss since less overall I/O processing capability remains active
Ethernet failure | XMS cannot communicate with the Storage Controller | No effect on I/O or performance. System data path communications remain online via Infiniband. The array cannot be configured or monitored until connectivity is restored
Storage Controller, DAE, or Infiniband switch power supply | System notifies administrator of failure. Replacement power supply can be installed without service impact | No effect. Dual power supplies allow the component to stay online
Loss of power on one circuit | System notifies administrator of failure. System remains operational on the second, redundant circuit | No effect
Loss of power on both circuits | System de-stages RAM to non-volatile storage and performs an orderly shutdown | No service until power is restored
Failure of an SSD | System notifies administrator. Automated SSD rebuild occurs | Some performance loss until the rebuild completes, depending on the fullness and utilization of the array. Less full and less busy arrays exhibit less performance impact and faster rebuilds



END-TO-END VERIFICATION
Hardware Verification
An important aspect of a storage system is to have verification at every step of the data path. The different hardware data
protection verification mechanisms are shown in the table below. On data transfers between components a CRC is generated by
the sending hardware and is verified by the receiver. For any data at rest (in memory and on SSD), error-correcting code (ECC)
with a CRC is generated upon write and is verified upon read.

Table 3: XtremIO Hardware Data Protections Verification Mechanisms

Hardware Component | Verification Type
Data transfers (Fibre Channel, Ethernet, Infiniband, PCIe, SAS) | Hardware-based CRC
Data at rest in memory | DRAM ECC
Data at rest on SSD | SSD ECC, SSD CRC, XtremIO XDP

XtremIO uses standard x86 servers, interface cards, Infiniband components, and eMLC SSDs. These components all include very
mature and robust hardware verification steps. EMC XtremIO avoids custom hardware modules in the array: Custom hardware
requires substantial engineering work to achieve the same level of resiliency that is readily available in standard enterprise-
proven components.
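
The transfer checks in Table 3 all follow the same pattern, sketched below with Python’s zlib.crc32 standing in for the link-level hardware CRC engines: the sender attaches a checksum, and the receiver recomputes it and rejects the frame on any mismatch.

    import zlib

    def send(payload: bytes):
        return payload, zlib.crc32(payload)      # sender attaches a CRC

    def receive(payload: bytes, crc: int) -> bytes:
        if zlib.crc32(payload) != crc:           # receiver re-verifies it
            raise IOError("CRC mismatch: frame corrupted in flight")
        return payload

    payload, crc = send(b"4KB block ...")
    receive(payload, crc)                        # passes
    # receive(b"4kB block ...", crc)             # one changed byte -> IOError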

Cryptographic Data Fingerprint


More importantly, in addition to each component in the data path having its own data verification mechanism, XtremIO employs
an independent data check beyond the design of other storage systems. Upon receipt of an I/O from a host, the XtremIO
Routing Module (“R Module”) computes a unique cryptographic data fingerprint based on the contents received from the host.
The cryptographic fingerprint is unique and can only be correlated to a specific 4KB data pattern. This cryptographic fingerprint
is leveraged by the array’s content-based data placement algorithms, as well as by the inline deduplication process. The entire
library of fingerprints is maintained in the Storage Controllers’ memory. Each and every time data is read by a host, the
cryptographic data fingerprint is recalculated from the outbound data and compared with the original fingerprint. This
guarantees that the original information received from the host is stored safely on SSDs, was not changed inadvertently, and is
properly delivered back to the host upon request.
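
A sketch of this check follows, using SHA-256 as a stand-in since the paper does not name the fingerprint function: the fingerprint is computed once at entry to the array and recomputed from the outbound data on every read.

    import hashlib

    def fingerprint(block: bytes) -> bytes:
        return hashlib.sha256(block).digest()    # assumed hash function

    store = {}                     # content keyed by fingerprint

    def on_host_write(block: bytes) -> bytes:
        fp = fingerprint(block)    # computed once, at entry to the array
        store[fp] = block          # placement and dedup are keyed by content
        return fp

    def on_host_read(fp: bytes) -> bytes:
        block = store[fp]
        # Recomputed from the outbound data and compared to the original:
        if fingerprint(block) != fp:
            raise IOError("data changed at rest or in flight")
        return block

    fp = on_host_write(b"\x00" * 4096)
    assert on_host_read(fp) == b"\x00" * 4096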

Separate Message Paths


Moreover, the calculated fingerprint information travels to the Data module in a separate message and along a different path
than the data itself. This ensures that no component on the way can corrupt the data and fingerprint in the same manner and
cause undetected data corruption. In short, a fingerprint is calculated at data entry to the system, and is recalculated and compared on every read from SSD and upon data transfer to the host. That fingerprint travels within the XtremIO system separately from the data itself, providing an independent check.



FAULT AVOIDANCE, DETECTION, AND CONTAINMENT

Pattern Independent Writes


The XtremIO array performs inline deduplication and stores blocks according to their content fingerprint. When data in a
particular logical block address (LBA) changes, the new data will have a different content fingerprint and will be written to a new
location on SSD. This isolates any incorrect overwrite and allows data to be recovered by simply reading the previous location.
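
The sketch below illustrates the mechanism, with a simple dictionary standing in for XDP-managed SSD placement: because the new content hashes to a new key, an overwrite can never clobber the block it logically replaces, so the prior content remains recoverable.

    import hashlib

    store = {}       # fingerprint -> block (placement managed by XDP in reality)
    lba_map = {}     # LBA -> fingerprint

    def overwrite(lba, new_block: bytes):
        old_fp = lba_map.get(lba)
        new_fp = hashlib.sha1(new_block).digest()
        store[new_fp] = new_block       # new content, new location
        lba_map[lba] = new_fp
        return old_fp                   # prior content is left untouched

    overwrite(0, b"A" * 4096)
    old = overwrite(0, b"B" * 4096)     # incorrect overwrite? old data survives
    print(store[old][:4])               # -> b'AAAA'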

Service Oriented Architecture


XtremIO avoids cascading failure scenarios such as may occur on shared memory systems. XtremIO is built from different
services that communicate with each other. Each service has its own data structures and if there is a fault in a service or data it
is contained to that service. Storage systems with large shared memory and data structures are inherently more vulnerable to
software errors, and need to expend more resources trying to prevent cascading failures. XtremIO leveraged Service Oriented Architecture from the start to build a system that is more robust and scalable than is practical with the large monolithic architectures previously used for high performance storage systems.

Fault Detection
The System Wide Management Module (SYM) continuously monitors and detects hardware and software faults in the system. It
continuously monitors Storage Controllers, Disk Array Enclosures (DAE), Fibre Channel HBAs, Ethernet NICs, Infiniband HCAs,
Infiniband Switches, and Battery Backup Units. The SYM also continuously monitors the SCSI driver, HBA controller drivers,
Linux kernel, and battery communication software components.

Every component and every data path used in the system has its own error detection method (see “End-to-End Verification” earlier in this document). For instance, the eMLC SSDs XtremIO uses have an LBA-seeded 32-bit CRC for ECC mis-correct detection and on-the-fly correction. The SSDs also have 22-bit correction for each and every 512-byte sector, and hardware-based RAID-5 within the SSD itself to protect against internal flash module failures. This is separate from, and in addition to, XtremIO’s XDP technology, and adds orders of magnitude greater resiliency than is typical in consumer MLC (cMLC) SSDs.

Fault Prevention
In any system, hardware components may have occasional faults, thus it is important to isolate defective areas and refrain from
using them. XtremIO can isolate Storage Controllers, communications ports, SSDs and can even isolate portions of SSDs. For
example, if SSD sectors suffer uncorrectable corruption for any reason, XtremIO will refrain from using the affected addresses.
This is an added level of logical prevention on top of the SSD flash hardware controller that isolates defective flash components.

Advanced Healing
As previously mentioned, the SYM will automatically restart software components upon failure and can also reallocate software
to different Storage Controllers upon hardware failures. For instance, if the SYM recognizes that the service to accept I/O from
hosts (the “R Module”) is not running, it will restart it automatically. This capability ensures utmost availability and optimized
service levels at all times.

The XtremIO array identifies unexpected data differences through the fingerprint check performed upon reading from SSD. Upon detection of such an inconsistency, XtremIO automatically rebuilds the missing data from all possible sources. This can be as simple as re-reading the data from the SSD in case the issue is transient. If the array is not able to read the data (or if the re-read also produces incorrect results), it will rebuild the data from the other SSDs in the XDP redundancy group. As explained in the XtremIO Flash-Specific Data Protection White Paper, XtremIO is able to rebuild the information even when two SSDs or data stripes are inaccessible in the same redundancy group.
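
This recovery ladder reads as follows in sketch form (the function parameters are hypothetical, and SHA-1 is an assumed placeholder fingerprint): verify, re-read once for transient faults, then rebuild from the XDP redundancy group.

    import hashlib

    def read_with_healing(fp, read_ssd, rebuild_from_xdp):
        for _ in range(2):                       # initial read plus one re-read
            block = read_ssd()
            if hashlib.sha1(block).digest() == fp:
                return block                     # no issue, or transient issue
        block = rebuild_from_xdp()               # reconstruct from the XDP group
        if hashlib.sha1(block).digest() != fp:
            raise IOError("unrecoverable data inconsistency")
        return block

    fp = hashlib.sha1(b"X" * 4096).digest()
    block = read_with_healing(fp,
                              read_ssd=lambda: b"?" * 4096,           # bad reads
                              rebuild_from_xdp=lambda: b"X" * 4096)   # rebuild ok
    print(block[:4])                             # -> b'XXXX'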

The journaling and metadata in the system are critical for recovery from catastrophic events. Due to the importance of these datasets, the journals are protected by a CRC for every written block, and three mirrored copies of the metadata are kept. Having three copies not only provides extra redundancy, but also enables a majority vote. For instance, if two metadata sectors differ but each sector’s CRC is internally valid, the data is compared and corrected according to the third stored copy.
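
A sketch of the majority vote on three mirrored metadata copies; the byte strings and the repair step are illustrative only.

    from collections import Counter

    def majority_vote(copies):
        """copies: three byte strings read from three mirrored locations."""
        winner, votes = Counter(copies).most_common(1)[0]
        if votes < 2:
            raise IOError("no two metadata copies agree")
        # Repair the dissenting copy from the two matching copies, even if
        # its own CRC happened to be internally valid.
        return winner, [winner for _ in copies]

    copies = [b"meta-v7", b"meta-v7", b"meta-v6"]   # one stale or corrupt copy
    value, repaired = majority_vote(copies)
    print(value)                                    # -> b'meta-v7'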

NON DISRUPTIVE UPGRADES


XtremIO is built to be continuously available. Occasionally, new code is provided in order to add functionality, improve existing functionality or performance, or fix known issues. There are two types of upgrades in the system: XtremIO Operating System (XIOS) upgrades, and individual component firmware and Linux kernel upgrades. Either upgrade type is performed completely online, with no downtime to the host.

XtremIO OS (XIOS)
XtremIO system updates are usually limited to XIOS and only modify the executable code that runs in user space. XIOS code is
upgraded by loading the new code into resident memory on the individual Storage Controllers and instantaneously flipping all
the Storage Controllers to run the new code. There is no impact to the host application, and the system remains completely available throughout.

Component Firmware and Linux Kernel


Individual hardware components can be upgraded one at a time. For instance, a Fibre Channel HBA can be upgraded by taking it offline to the host, upgrading the firmware, and bringing it back online to the host. Once that Fibre Channel HBA is online, the system can upgrade the next Fibre Channel HBA. By leveraging host multi-pathing according to XtremIO best practices, there is neither downtime nor unavailability to the host. The same is true for any other firmware upgrade, be it SAS controller, Infiniband, or SSD firmware. In rare cases the Linux kernel of the Storage Controllers may need to be upgraded. This upgrade is done in the same fashion as firmware: the Storage Controllers are individually upgraded one at a time with no impact to availability.

SYSTEM RECOVERABILITY
XtremIO shutdown and power up processes are risk-free due to the simple design of the system. There are two modes of
shutdown: Orderly and Emergency.

Orderly Shutdown
Orderly shutdowns are a graceful process, initiated as part of external power loss or upon user request. The battery backup units hold enough power to perform two complete orderly shutdowns. When the system comes up, it checks that the batteries hold at least enough power to provide a complete graceful shutdown upon loss of power.

In an Orderly Shutdown all the Storage Controllers in the system coordinate the shutdown process (via the System Wide
Management module). The system will stop accepting new I/O requests and coordinate stoppage of all journaling activity, write
data held in memory to the SSDs, and halt the system.
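
The sequence can be expressed as a short sketch (step names are hypothetical; the real coordination is performed by the SYM across all Storage Controllers):

    class Cluster:
        # Stub standing in for SYM-coordinated, cluster-wide operations.
        def stop_accepting_new_io(self): print("1. fencing new host I/O")
        def quiesce_journaling(self):    print("2. stopping journal activity")
        def destage_memory_to_ssd(self): print("3. de-staging RAM to SSD")
        def halt_all_controllers(self):  print("4. halting Storage Controllers")

    def orderly_shutdown(cluster: Cluster):
        # Order matters: no new writes may arrive once de-staging begins.
        cluster.stop_accepting_new_io()
        cluster.quiesce_journaling()
        cluster.destage_memory_to_ssd()
        cluster.halt_all_controllers()

    orderly_shutdown(Cluster())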

Emergency Shutdown
In the case of a catastrophic event where system communication is inadequate for any reason, each Storage Controller has the
capability to shut itself down and persistently maintain consistent data until power is restored. Each Storage Controller has two
local vault drives mirrored in a RAID-1 configuration. Upon loss of communications or power, the Storage Controller will save
two copies of all user data and system metadata. All local memory and journal contents are dumped to the vault drives. When power returns and communications are restored, the Storage Controller will reconcile its journal information with the rest of the
system.



XMS Communications Loss
The XtremIO Management System (XMS) is the application users interact with to manage the XtremIO system. It provides the graphical, command line, and programmatic interfaces used to configure, provision, and monitor the system. The system is managed via the dedicated Ethernet management ports. However, the system remains functional even when communication to the XMS is lost. The internal System Wide Management Module (SYM) is part of the array and runs on the Storage Controllers. I/O continues to be served, hardware is still monitored, any failed SSD will initiate a rebuild and return the array to complete redundancy, and so on. The only activities that stop are user-initiated activities and monitoring (e.g. volume creation); there is no impact to host I/O or customer applications.

Communication Lost Between Storage Controllers


XtremIO Storage Controllers work in a loosely coupled architecture. Each works as a service to the other Storage Controllers,
orchestrated by the System Wide Management module. However, each Storage Controller is an independent entity with the
ability to protect the data it stores and maintain consistency when communication to other Storage Controllers is lost. Upon loss
of communication the Storage Controller will write all metadata and journal information to the two locally redundant vault drives
and halt its services. This is similar to an emergency shutdown procedure. When communication is resumed and the Storage Controller becomes part of the system again, it reconciles the journal data and once again becomes a resource in the system. The System Wide Management module will then reintegrate it into the system and use it as an I/O, caching, and data facility.



CONCLUSION
The XtremIO system’s hardware and software design represents a leap forward in storage array technology. Multiple tiers
orchestrated together achieve system high availability, data integrity, and data protection across all components in the system.
The hardware provides redundancy for every host connection and path to data at rest. The XIOS operating environment provides
robust protection throughout the array’s software stack through fingerprint generation on data entry, separate paths for data
and metadata, and a modular software design built in a service-oriented architecture. The different software modules run
independently but act as a unified system. The overall system management is coherent, redundant, and can instantiate software
modules on different hardware components. XtremIO achieves high availability, data integrity and data protection using:

• Hardware redundancy for every component

• Unique content fingerprinting as data is written

• Separate path from system entry to SSD for user data and its accompanying fingerprint

• Secured journaling protecting against unexpected system shutdown, component failures, or communication failures

• Loosely coupled software modules working together in a service oriented architecture

• Centralized redundant management

• N+2 redundancy against SSD failures

• Non disruptive system software upgrades

HOW TO LEARN MORE


For a detailed presentation explaining XtremIO’s storage array capabilities and how it substantially improves performance,
operational efficiency, ease-of-use, and total cost of ownership, please contact XtremIO at XtremIOinfo@emc.com. We will
schedule a private briefing in person or via web meeting. XtremIO has benefits in many environments, but is particularly
effective for virtual server, virtual desktop, and database applications.

CONTACT US
To learn more about how EMC products, services, and solutions can help solve your business and IT challenges, contact your local representative or authorized reseller, or visit us at www.EMC.com.

EMC2, EMC, the EMC logo, XtremIO and the XtremIO logo are registered trademarks or trademarks of EMC Corporation in the United States and other countries. VMware is a registered trademark of VMware, Inc., in the United States and other jurisdictions. © Copyright 2014 EMC Corporation. All rights reserved. Published in the USA. 02/14 EMC White Paper H12914.

EMC believes the information in this document is accurate as of its publication date. The information is subject to change without notice.

