Information Storage and Management (ISM) v4
Revision [1.0]
PARTICIPANT GUIDE
Dell Confidential and Proprietary
Copyright © 2019 Dell Inc. or its subsidiaries. All Rights Reserved. Dell Technologies,
Dell, EMC, Dell EMC and other trademarks are trademarks of Dell Inc. or its
subsidiaries. Other trademarks may be trademarks of their respective owners.
Introduction
Prerequisite Skills
Course Agenda
Introductions
Introduction
This module presents digital data, types of digital data, and information. This
module also focuses on data center characteristics and technologies driving digital
transformation.
Notes
The data in the digital universe comes from diverse sources, including both
individuals and organizations. Individuals constantly generate and consume
information through numerous activities, such as web searches, emails, uploading
and downloading content and sharing media files. In organizations, the volume and
importance of information for business operations continue to grow at astounding
rates. Technologies driving digital transformation, including the Internet of Things (IoT), have significantly contributed to the growth of the digital universe.
In the past, individuals created most of the data in the world. Now, IDC predicts that organizations will create 60 percent of the world's data through applications relying on machine learning, automation, machine-to-machine technologies, and embedded devices.
Notes
Digital Data
(Figure: examples of digital data - video, text, and photos - generated and accessed from devices such as laptops and desktops.)
Notes
Digital data is stored as strings of binary values on a storage medium. This storage
medium is either internal or external to the devices generating or accessing the
data. The storage devices may be of different types, such as magnetic drives, optical drives, or solid-state drives (SSDs). Examples of digital data are electronic documents, text files, emails, ebooks,
digital images, digital audio, and digital video.
(Figure: categories of digital data - structured, semi-structured, quasi-structured, and unstructured.)
Notes
Based on how it is stored and managed, digital data can be broadly categorized
into structured, semi-structured, quasi-structured, and unstructured.
Structured data is organized in fixed fields within a record or file. To structure
the data, you require a data model. A data model specifies the format for
organizing data, and also specifies how different data elements are related to
each other. For example, in a relational database, data is organized in rows and
columns within named tables.
Semi-structured data does not have a formal data model but has an apparent,
self-describing pattern and structure that enable its analysis. Examples of semi-
structured data include spreadsheets that have a row and column structure, and
XML files that are defined by an XML schema.
Quasi-structured data consists of textual data with erratic data formats, and
can be formatted with effort, software tools, and time. An example of quasi-
structured data is a “clickstream” that includes data about which webpages a
user visited and in what order – which is the result of the successive mouse
clicks the user made. A clickstream shows when a user entered a website, the
pages viewed, the time that is spent on each page, and when the user exited.
Unstructured data does not have a data model and is not organized in any
particular format. Some examples of unstructured data include text documents,
PDF files, emails, presentations, images, and videos.
More than 90 percent of the data that is generated in the digital universe today is non-structured (semi-structured, quasi-structured, and unstructured). Although the illustration shows four different and separate types of data, in reality a mixture of these data types is typically generated.
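As a small, hedged illustration of these categories, the sketch below (Python, standard library only; the field names and record contents are hypothetical) reads the same customer record from a structured CSV row, a semi-structured XML fragment, and an unstructured text note:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed fields defined by a header row acting as a simple data model (hypothetical data).
structured = io.StringIO("customer_id,name,city\n101,Asha,Pune\n")
for row in csv.DictReader(structured):
    print("structured      :", row["customer_id"], row["name"], row["city"])

# Semi-structured: no fixed record layout, but self-describing tags expose a pattern.
semi = "<customer id='101'><name>Asha</name><city>Pune</city></customer>"
elem = ET.fromstring(semi)
print("semi-structured :", elem.get("id"), elem.findtext("name"), elem.findtext("city"))

# Unstructured: free text; any "fields" must be inferred by parsing or analytics.
note = "Asha from Pune called about order 101; wants a callback on Friday."
print("unstructured    :", note)
```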
What is Information?
Definition: Information
Processed data that is presented in a specific context to enable useful interpretation and decision-making.
Notes
The terms “data” and “information” are closely related and are often used interchangeably. However, it is important to understand the difference between the two. Data, by itself, is simply a collection of facts that requires processing for it to be useful. For example, the annual sales figures of an organization are data. When data is processed and placed in a specific context, it can be interpreted in a useful manner. This processed and organized data is called information.
For example, when you process the annual sales data into a sales report, it
provides useful information, such as the average sales for a product (indicating
product demand and popularity), and a comparison of the actual sales to the
projected sales.
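As a minimal sketch of the sales example above (Python; all figures are hypothetical), the few lines below turn raw monthly sales data into information: an average per product and a comparison of actual against projected sales.

```python
# Hypothetical raw data: monthly unit sales per product, plus projected totals.
sales = {"ProductA": [120, 135, 150], "ProductB": [80, 70, 95]}
projected = {"ProductA": 420, "ProductB": 260}

for product, monthly in sales.items():
    actual = sum(monthly)
    average = actual / len(monthly)
    variance = actual - projected[product]
    # The processed, contextualized output is information a manager can act on.
    print(f"{product}: average {average:.1f}/month, "
          f"actual {actual} vs projected {projected[product]} ({variance:+d})")
```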
Organizations today face the challenge of managing, analyzing, and deriving value from unstructured data coming from numerous sources.
Information Storage
Notes
The compute systems that run business applications are provided storage capacity
from storage systems. Storage systems are covered in the module ‘Intelligent Storage Systems (ISS)’. Organizations typically house their IT infrastructure, including
compute systems, storage systems, and network equipment within a data center.
Data Center
Notes
Digital transformation is disrupting every industry, and with the evolution of modern technologies, organizations face numerous business challenges. Organizations must operate in real time, develop smarter products, and deliver a great user experience. They must be agile, operate efficiently, and make decisions quickly to be successful. However, traditional IT infrastructure and services struggle to support these disruptive technologies and agile methodologies. The organization's IT department also faces several challenges in supporting the business. So, organizations are moving toward the modern data center to overcome these challenges and succeed in their digital transformation journey.
Data centers are designed and built to fulfill the key characteristics as shown in the
figure. Although the characteristics are applicable to almost all data center
components, the details here primarily focus on storage systems.
(Figure: key data center characteristics - availability, manageability, performance, capacity, and scalability.)
Digital Transformation
Notes
Digital transformation is imperative for all businesses. Businesses of all shapes and
sizes are changing to a more digital mindset. This digital mindset is being driven by
the need to innovate more quickly. Digital transformation puts technology at the
heart of an organization’s products, services, and operations.
In this digital world, organizations need to develop new applications using agile
processes and new tools to assure rapid time-to-market. Simultaneously, the
organizations still expect IT to operate and manage the traditional applications, which provide much of their revenue.
Assessment
C. Database table
D. Webserver log
Summary
Introduction
This module presents an overview of the modern technologies that are driving
digital transformation in today’s world. The modern technologies covered in this
lesson include cloud computing, big data analytics, Internet of Things (IoT), and
machine learning.
Introduction
This lesson presents an overview of cloud computing along with its essential
characteristics, various cloud deployment and service models, and use cases.
Cloud Computing
(Figure: cloud infrastructure - pooled compute, network, storage, applications, and platform software hosting hypervisor-based VMs, accessed by desktops, laptops, tablets, and mobile devices over a LAN/WAN.)
Notes
The term “cloud” originates from the cloud-like bubble that is commonly used in
technical architecture diagrams to represent a system. This system may be the
Internet, a network, or a compute cluster. In cloud computing, a cloud is a collection
of IT resources, including hardware and software resources. You can deploy these
resources either in a single data center, or across multiple geographically
dispersed data centers that are connected over a network.
A cloud service provider is responsible for building, operating, and managing cloud
infrastructure. The cloud computing model enables consumers to hire IT resources
as a service from a provider. A cloud service is a combination of hardware and
software resources that are offered for consumption by a provider. The cloud
infrastructure contains IT resource pools, from which you can provision resources
to consumers as services over a network, such as the Internet or an intranet.
Resources are returned to the pool when the consumer releases them.
Example: The cloud model is similar to utility services such as electricity, water,
and telephone. When consumers use these utilities, they are typically unaware of
how the utilities are generated or distributed. The consumers periodically pay for
the utilities based on usage. Similarly, in cloud computing, the cloud is an
abstraction of an IT infrastructure. Consumers hire IT resources as services from
the cloud without the risks and costs that are associated with owning the resources.
Cloud services are accessed from different types of client devices over wired and
wireless network connections. Consumers pay only for the services that they use,
either based on a subscription or based on resource consumption.
In SP 800-145, NIST specifies that a cloud infrastructure should have the five
essential characteristics.
The five essential characteristics are: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service.
Notes
Resource Pooling: “... no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (for example, country, state, or datacenter). Examples of resources include storage, processing, memory, and network bandwidth.” – NIST
Rapid Elasticity: “Capabilities can be rapidly and elastically provisioned, in
some cases automatically, to scale rapidly outward and inward commensurate
with demand. To the consumer, the capabilities available for provisioning often
appear to be unlimited and can be appropriated in any quantity at any time.” –
NIST
On-demand Self-service: “A consumer can unilaterally provision computing
capabilities, such as server time or networked storage, as needed automatically
without requiring human interaction with each service provider.” – NIST
Broad Network Access: “Capabilities are available over the network and
accessed through standard mechanisms that promote use by heterogeneous
thin or thick client platforms (for example, mobile phones, tablets, laptops, and
workstations).” – NIST
A cloud service model specifies the services and the capabilities that are provided to consumers. In SP 800-145, NIST classifies cloud service offerings into three primary models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).
Notes
The terms of a cloud service are typically documented in a service level agreement (SLA). An SLA is a legal document that describes items such as what service level will be provided, how it will be supported, service location, and the responsibilities of the consumer and the provider.
Many alternate cloud service models based on IaaS, PaaS, and SaaS are defined
in various publications and by different industry groups. These service models are
specific to the cloud services and capabilities that are provided. Examples of such
service models include Network as a Service (NaaS), Database as a Service
(DBaaS), Big Data as a Service (BDaaS), Security as a Service (SECaaS), and
Disaster Recovery as a Service (DRaaS). However, these models eventually
belong to one of the three primary cloud service models.
(Figure: IaaS stack - the provider supplies the cloud infrastructure (compute, storage, and network); the consumer manages the operating system, programming framework, database, and application. Examples: Amazon EC2 and S3, Virtustream, Google Compute Engine.)
In the PaaS model, a cloud service includes compute, storage, and network
resources along with platform software
Platform software includes software such as:
Operating system, database, programming frameworks, middleware
Tools to develop, test, deploy, and manage applications
Most PaaS offerings support multiple operating systems and programming
frameworks for application development and deployment
Typically you can calculate PaaS usage fees based on the following factors:
Number of consumers
Types of consumers (developer, tester, and so on)
The time for which the platform is in use
The compute, storage, or network resources that the platform consumes
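As a small, hedged sketch of how those factors might combine (Python; the rates, consumer types, and usage numbers are hypothetical, not any provider's actual pricing):

```python
# Hypothetical PaaS pricing inputs; real providers define their own rates and metrics.
RATE_PER_CONSUMER = {"developer": 12.0, "tester": 8.0}   # per user, per month
RATE_PER_PLATFORM_HOUR = 0.25
RATE_PER_GB_STORAGE = 0.10

def paas_monthly_fee(consumers, platform_hours, storage_gb):
    # Fee combines number/type of consumers, platform time, and resources consumed.
    user_fee = sum(RATE_PER_CONSUMER[kind] * count for kind, count in consumers.items())
    return user_fee + RATE_PER_PLATFORM_HOUR * platform_hours + RATE_PER_GB_STORAGE * storage_gb

print(paas_monthly_fee({"developer": 5, "tester": 2}, platform_hours=300, storage_gb=50))
```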
(Figure: PaaS stack - the provider supplies the cloud infrastructure (compute, storage, and network) and the platform software (operating system, programming framework, and database); the consumer manages the application. Example: Microsoft Azure.)
(Figure: SaaS stack - the provider supplies the entire stack, including the cloud infrastructure (compute, storage, and network), operating system, programming framework, database, and application. Examples: Salesforce, Google Apps, Oracle.)
A cloud deployment model provides a basis for how cloud infrastructure is built,
managed, and accessed
In SP 800-145, NIST specifies the four primary cloud deployment models
listed in the figure
Each cloud deployment model may be used for any of the cloud service models:
IaaS, PaaS, and SaaS
The different deployment models present several tradeoffs in terms of control,
scale, cost, and availability of resources
Public Cloud
Private Cloud
Community Cloud
Hybrid Cloud
Public Cloud
Key considerations for adopting a public cloud include:
Network availability
Risks associated with multitenancy
Visibility
Control over the cloud resources and data
Restrictive default service levels.
(Figure: public cloud - the cloud provider's resources, including hypervisor-hosted VMs and applications, are shared by multiple consumers, such as Enterprise P, Enterprise Q, and Individual R.)
Private Cloud
Many organizations may not want to adopt public clouds due to concerns
related to privacy, external threats, and lack of control over the IT resources and
data
When compared to a public cloud, a private cloud offers organizations a
greater degree of privacy and control over the cloud infrastructure,
applications, and data
There are two variants of private cloud: on-premise and externally hosted
An organization deploys an on-premise private cloud in its own data center within its premises
(Figure: private cloud variants - an on-premise private cloud built from Enterprise P's own resources, and an externally hosted private cloud built from a cloud provider's resources dedicated to Enterprise P.)
Notes
In the externally hosted private cloud (or off-premise private cloud) model:
An organization outsources the implementation of the private cloud to an
external cloud service provider
The cloud infrastructure is hosted on the premises of the provider, and multiple tenants may share the underlying infrastructure
Community Cloud
The organizations participating in the community cloud typically share the cost
of deploying the cloud and offering cloud services
This enables them to lower their individual investments
Since the costs are shared by fewer consumers than in a public cloud, this option may be more expensive
However, a community cloud may offer a higher level of control and protection
than a public cloud
There are two variants of a community cloud: on-premise and externally hosted
(Figure: externally hosted community cloud - a cloud provider's resources dedicated to the community and shared by the community users.)
Hybrid Cloud
(Figure: hybrid cloud - Enterprise P's private cloud combined with a public cloud that also serves other consumers, such as Enterprise Q and Individual R.)
To create the best possible solution for their businesses, today organizations
want to choose different public cloud service providers
To achieve this goal, some organizations have started adopting a multicloud
approach
(Figure: multicloud - an organization using a private cloud together with multiple public clouds.)
Notes
The drivers for adopting this approach include avoiding vendor lock-in, data control, cost savings, and performance optimization. This approach helps to meet business demands since sometimes no single cloud model can suit the varied requirements and workloads across an organization. Some application workloads run better on one cloud platform, while other workloads achieve higher performance and lower cost on another platform. Based on these requirements, various cloud service providers can be selected. Each cloud vendor offers different service options at different prices.
Big Data Analytics: Using the cloud to analyze voluminous data to gain insights and derive business value.
Disaster Recovery: Adopting the cloud for a DR solution can provide cost benefits, scalability, and faster recovery of data.
Introduction
This lesson presents an overview of Big Data along with its characteristics, data repositories, components of a Big Data analytics solution, and use cases.
Big Data
Big Data:
Represents the information assets whose high volume, high velocity, and high
variety require the use of new technical architectures and analytical methods to
gain insights and for deriving business value.
Many organizations such as government departments, retail,
telecommunications, healthcare, social networks, banks, and insurance
companies employ data science techniques to benefit from Big Data analytics.
The definition of Big Data has three principal aspects, which are:
Characteristics of Data
Big Data includes data sets of considerable sizes containing both structured and
non-structured digital data. Apart from its size, the data gets generated and
changes rapidly, and also comes from diverse sources. These and other
characteristics are covered next.
Big Data also exceeds the storage and processing capability of conventional IT
infrastructure and software systems. It not only needs a highly-scalable architecture
for efficient storage, but also requires new and innovative technologies and
methods for processing.
Business Value
Notes
Volume: The word “Big” in Big Data refers to the massive volumes of data.
Organizations are witnessing an ever-increasing growth in data of all types.
These types include transaction-based data that is stored over the years,
sensor data, and unstructured data streaming in from social media. The volume
of data has already reached Petabyte and Exabyte scales, and it is still growing
every day. The excessive volume not only requires substantial cost-effective storage, but also raises challenges in data analysis.
Velocity: Velocity refers to the rate at which data is produced and changes, and
also how fast the data must be processed to meet business requirements.
Today, data is generated at an exceptional speed, and real-time or near-real
time analysis of the data is a challenge for many organizations. It is essential to
process and analyze the data, and to deliver the results in a timely manner. An
example of such a requirement is real-time face recognition for screening
passengers at airports.
Variety: Variety refers to the diversity in the formats and types of data. There
are numerous sources that generate data in various structured and unstructured
forms. Organizations face the challenge of managing, merging, and analyzing these different varieties of data.
Data Repositories
Notes
Data for analytics typically comes from repositories such as enterprise data
warehouses and data lakes.
A data lake is a collection of structured and unstructured data assets that are
stored as exact or near-exact copies of the source formats. The data lake
architecture is a “store-everything” approach to Big Data. Unlike conventional data
warehouses, you do not classify the data when it is stored in the repository, as the
value of the data may not be clear at the outset. The data is also not arranged as
per a specific schema and is stored using an object-based storage architecture. As
a result, data preparation is eliminated and a data lake is less structured compared
to a data warehouse. Data is classified, organized, or analyzed only when it is
accessed. When a business need arises, the data lake is queried, and the resultant
subset of data is then analyzed to provide a solution. The purpose of a data lake is to present an unrefined view of data to highly skilled analysts, and to enable them to implement their own data refinement and analysis techniques.
The technology layers in a Big Data analytics solution include storage plus
MapReduce and query technologies
These components are collectively called the ‘SMAQ stack’
SMAQ solutions may be implemented as a combination of multi-component
systems
May also be offered as a product with a self-contained system comprising
storage, MapReduce, and query – all in one
Notes
The technology layers in a Big Data analytics solution include storage, MapReduce
technologies, and query technologies. These components are collectively called
the ‘SMAQ stack’.
Storage
Storage systems consist of multiple nodes that are collectively called a “cluster”
Based on distributed file systems
Each node has processing capability and storage capacity
Highly scalable architecture
You may implement a NoSQL database on top of the distributed file system
Notes
A distributed file system like HDFS typically provides only an interface similar to that of regular file systems. Unlike a database, it can only store and retrieve data, and not index it, which is essential for fast data retrieval. To mitigate this
challenge and gain the advantages of a database system, SMAQ solutions may
implement a NoSQL database on top of the distributed file system. NoSQL
databases may have built-in MapReduce features that enable processing to be
parallelized over their data stores. In many applications, the primary source of data
is in a relational database. Therefore, SMAQ solutions may also support the
interfacing of MapReduce with relational database systems.
MapReduce fetches datasets and stores the results of the computation in storage.
The data must be available in a distributed fashion, to serve each processing node.
The design and features of the storage layer are important not just because of the
interface with MapReduce, but also because they affect the ease with which data
can be loaded and the results of computation extracted and searched.
MapReduce
MapReduce is the driving force behind most Big Data processing solutions
A parallel programming framework for processing large datasets on a
compute cluster
The key innovation of MapReduce is the ability to take a query over a dataset,
divide it, and run it in parallel over multiple compute systems or nodes
This distribution solves the issue of processing data that is too large for a
single machine to process
Notes
MapReduce is the driving force behind most Big Data processing solutions. It is a
parallel programming framework for processing large datasets on a compute
cluster. The key innovation of MapReduce is the ability to take a query over a
dataset, divide it, and run it in parallel over multiple compute systems or nodes.
This distribution solves the issue of processing data that is too large for a single
machine to process.
MapReduce works in two phases namely ‘Map’ and ‘Reduce’ as the name
suggests. An input dataset is split into independent chunks which are distributed to
multiple compute systems. The Map function processes the chunks in a parallel
manner, and transforms them into multiple smaller intermediate datasets. The
Reduce function condenses the intermediate results and reduces them to a
summarized dataset, which is the desired end result. Typically both the input and
the output datasets are stored on a file-system. The MapReduce framework is
highly scalable and supports the addition of processing nodes to process chunks.
Apache’s Hadoop MapReduce is the predominant open source Java-based
implementation of MapReduce.
Another example is the task of grouping customer records within a dataset into
multiple age groups, such as 20-30, 30-40, 40-50, and so on. In the Map phase,
you split the records and process in parallel to generate intermediate groups of
records. In the Reduce phase, you summarize the intermediate datasets to obtain
the distinct groups of customer records (depicted in the colored groups).
MapReduce Example
In the Map phase, you split the records and process in parallel to generate
intermediate groups of records
In the Reduce phase, you summarize the intermediate datasets to obtain the
distinct groups of customer records
The illustration depicts a generic representation of how MapReduce works; it
can be used to represent various examples
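A minimal sketch of the customer age-group example in Python appears below (the records are hypothetical, and a real MapReduce framework such as Hadoop would distribute the map and reduce work across cluster nodes rather than run it in a single process):

```python
from collections import defaultdict

# Hypothetical input chunk of customer records: (name, age).
records = [("Asha", 23), ("Bala", 37), ("Chen", 29), ("Dana", 45), ("Eli", 31)]

def map_phase(chunk):
    # Emit intermediate (key, value) pairs: age-group key, customer value.
    for name, age in chunk:
        group = f"{(age // 10) * 10}-{(age // 10) * 10 + 10}"
        yield group, name

def reduce_phase(pairs):
    # Condense the intermediate pairs into one summarized dataset per key.
    grouped = defaultdict(list)
    for group, name in pairs:
        grouped[group].append(name)
    return dict(grouped)

print(reduce_phase(map_phase(records)))
# e.g. {'20-30': ['Asha', 'Chen'], '30-40': ['Bala', 'Eli'], '40-50': ['Dana']}
```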
Query
Languages are designed to handle not only the processing, but also the
loading and saving of data from and to the MapReduce cluster
Languages typically support integration with NoSQL databases that you
implement on the MapReduce cluster
Notes
social networks. This also helps in controlling customer acquisition costs and targeting sales promotions more effectively. Big Data analytics is also being used extensively in detecting credit card fraud.
eCommerce: eCommerce organizations use Big Data analytics to gain
valuable insights from the data. They use this solution to understand customer
buying patterns, anticipate future demand, run effective marketing campaigns, optimize inventory assortment, and improve distribution. This
solution enables them to provide optimal prices and services to customers, and
also improve operations and revenue.
Government: In government organizations, Big data analytics enables
improved efficiency and effectiveness across a variety of domains such as
social services, education, defense, national security, crime prevention,
transportation, tax compliance, and revenue management.
Social Network Analysis: The increasing use of online social networking
services has led to a massive growth of data in the digital universe. Through Big
Data analytics, organizations can gain valuable insights from the data that is
generated through social networking. This analysis enables the discovery and
analysis of communities, personalization for solitary activities (for example,
search) and social activities (for example, discovery of potential friends). It also
involves the analysis of user behavior in open forums (for example,
conventional sites, blogs, and communities) and in commercial platforms (for
example, eCommerce).
Gaming: Big Data plays a very important role in the gaming industry due to the billions of video game players in the world. Gamers generate a massive amount of data through offline and online games. Many factors contribute to the rapid growth of data in the gaming industry. These factors include which games the gamers play and with whom they play, advertisements, and real-time information about the gamer. The industry uses Big Data technologies to improve its revenue and the gaming experience.
Geolocation Services: Businesses like finance, social media, retailers, and
transport are using geolocation services in their applications to locate their
customers. This service generates a huge amount of data which requires them
to use Big Data algorithms to derive meaningful information. Businesses are
using this information to improve their service, customer experience, and to gain
competitive advantage.
Introduction
This lesson presents an overview of the Internet of Things (IoT) along with its
components and protocols used. It also focuses on the impact of IoT on the data center, and its use cases.
Internet of Things
Notes
The Internet of Things (IoT) is the concept of networking things such as objects and
people to collect and exchange data. The idea is that real-life objects can
independently share and process information - without humans having anything to
do with the data input stage.
IoT imposes new demands on the data center to meet the network, security, and data storage and management requirements.
Sensors: Detect changes in the surrounding environment, and produce and transmit digital data.
Actuators: Collect data from sensors to perform the required action; in IoT, they help to automate operations.
Gateways: Manage data traffic and translate network protocols, and ensure that the devices are interoperable.
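As a small, hedged sketch of these roles (Python, standard library only; the device names, readings, thresholds, and message format are hypothetical), a gateway might aggregate raw sensor readings into one interoperable format, and an actuator might act on the result:

```python
import json
import statistics

# Hypothetical raw readings produced by sensors at the edge (degrees Celsius).
sensor_readings = {
    "temp-sensor-1": [21.5, 21.7, 22.0],
    "temp-sensor-2": [23.1, 23.4, 23.2],
}

def gateway_aggregate(readings):
    # The gateway manages data traffic and translates it into a common format (JSON here).
    payload = {
        sensor: {"avg": round(statistics.mean(values), 2), "samples": len(values)}
        for sensor, values in readings.items()
    }
    return json.dumps(payload)

def actuator(avg_temp, threshold=23.0):
    # An actuator acts on the sensor data, for example switching cooling on or off.
    return "cooling ON" if avg_temp > threshold else "cooling OFF"

message = gateway_aggregate(sensor_readings)
print(message)
print(actuator(json.loads(message)["temp-sensor-2"]["avg"]))
```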
Notes
Home Automation: Allows home owners to monitor and control home appliances anytime, irrespective of the location.
Smart Cities: Highlights the need to enhance the quality of life of the citizens using IoT.
Wearables: Helps to collect data about the user's health. Also helps to detect and report crimes.
Notes
Home Automation: The use of IoT has entered the residential environment with
the introduction of smart home technology. Various electronic objects at home
such as air conditioner, lights, refrigerators, security cameras, kitchen stoves
can be connected to the Internet with the help of sensors. This will allow the
home owners to efficiently monitor and control the objects anytime irrespective
of the location.
Smart Cities: The smart cities concept highlights the need to enhance the
quality of life of the citizens using smart public infrastructure. This process
enables optimization of power usage, efficient water supply, waste collection management, and reliable public transportation using IoT sensors. All this data is collected and sent to a control center, which directs the necessary actions. This application of IoT can also be extended to build a smarter environment through early detection of earthquakes, air pollution, and forest fires.
Wearables: With the use of wearables and embedded devices on people, IoT
sensors can collect data about the users regarding their health, heartbeat, and
exercise patterns. For example, embedded chips enable doctors to monitor
patients who are in critical care, by tracking and charting all their vital signs.
Introduction
This lesson presents an overview of machine learning and the different types of
machine learning algorithms. It focuses on the impact of machine learning on the data center, and its use cases.
Machine Learning
(Figure: deep learning is a subset of machine learning, which in turn is a subset of artificial intelligence.)
Notes
Artificial intelligence, machine learning, and deep learning are three intertwined
concepts that help to build this human-like ability into computer systems. Artificial
Intelligence (AI) is an umbrella term, while machine and deep learning are the
techniques that make AI possible. AI is a technology of creating intelligent systems
that work and think like humans.
Machine learning refers to the process of ‘training’ the machine by feeding large amounts of data into algorithms that give it the ability to learn how to perform a task.
Deep learning is a machine learning technique that uses neural networks as the
underlying architecture for training models. Fast compute and storage with a lot of
memory and high-bandwidth networking enable machines to learn faster and provide accurate results. A neural network is a set of algorithms that are used to
establish relationships in a dataset by imitating a human brain. A training model is
an object which is provided with an algorithm along with a set of data from which it
can learn.
Algorithm Types
Train the model with test data sets, and improve the model accordingly for future decision making
Most machine learning algorithms can be classified into the following three
types:
Supervised learning: Models are trained to predict future events. Algorithms try to find patterns using a labeled dataset. Inputs and outputs can be clearly defined.
Unsupervised learning: Models are left to discover information on their own. Algorithms use unlabeled data and try to find similarities and differences. Only input data is given; output data is not available.
Reinforcement learning: Algorithms/models interact with their environment and produce results based on a trial-and-error method. Rewards and errors are used as feedback to learn.
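As a brief, hedged illustration of the first two types (a Python sketch that assumes the scikit-learn library is available; the data points are hypothetical), supervised learning fits labeled input/output pairs, while unsupervised learning finds groupings in unlabeled data:

```python
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

# Supervised: labeled dataset - inputs (hours of use) and known outputs (power drawn).
X = [[1], [2], [3], [4]]
y = [10, 20, 30, 40]
model = LinearRegression().fit(X, y)
print("predicted output for input 5:", model.predict([[5]])[0])

# Unsupervised: only input data; the algorithm discovers similarities on its own.
points = [[1, 1], [1, 2], [8, 8], [9, 8]]
labels = KMeans(n_clusters=2, n_init=10).fit_predict(points)
print("discovered groupings:", labels)
```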
Notes
Media and Entertainment: Content from the media and entertainment industry can be automatically tagged using metadata by applying machine learning solutions. This method enhances content-based search activity by finding the right content quickly, and helps the content developers to optimize the content for specific audiences based on their search data. It also plays an important role in creating video subtitles using natural language processing.
Financial Services: Banks and other businesses use machine learning to detect and prevent fraudulent activities for credit cards and bank accounts. It also helps to identify investment opportunities for traders by monitoring market changes. It is used to provide risk management solutions, such as predicting financial crises, assessing the loan repayment capabilities of customers, and securing financial data.
Introduction
This section highlights technologies that are relevant to the topics covered in this
module.
Concepts in Practice
An intelligent device that is designed to aggregate, secure, analyze, and relay data
from diverse sensors and equipment at the edge of the network. These gateways
bridge both legacy systems and modern sensors to the internet, helping to get
business insights from the real-time, pervasive data in your machines and
equipment. It is compact, consumes less power, and is suitable for challenging field
and mobile use cases. It is designed for flexible manageability using Dell Edge
Device Manager or a third-party on-premise console.
These solutions shorten the deployment time from months to days. They include
software that streamlines the setup of data science environments to just a few
clicks, boosting data scientist productivity. These solutions are optimized with
software, servers, networking, storage, and services to help organizations to get
faster and deeper insights. These solutions include:
Dell EMC Machine Learning with Hadoop: Builds on the power of tested and
proven Dell EMC Ready Bundles for Hadoop, created in partnership with
Cloudera®. This solution includes an optimized solution stack along with data
science and framework optimization. It consists of Cloudera Data Science
Workbench with the added ease of a Dell EMC Data Science Engine.
Dell EMC Deep Learning with Intel: Simplifies and accelerates the adoption of
deep learning technology with an optimized solution stack that simplifies the
entire workflow from model building to training to inferencing. It consists of
PowerEdge C servers and Dell EMC H-series networking based on Intel Omni-
Path networking.
Dell EMC Deep Learning with NVIDIA: Provides a GPU-optimized solution
stack that can shave valuable time from deep learning projects. It consists of
PowerEdge servers with NVIDIA GPUs and Isilon Scale-out NAS storage.
Extends the VMware Software Defined Data Center (SDDC) software onto the
AWS cloud. This SDDC software consists of several other products including
vCenter Server for data center management, vSAN for software-defined storage,
and NSX for software-defined networking. It enables customers to run their
VMware vSphere based applications across private, public, and hybrid cloud
environments with optimized access to AWS services. It helps virtual machines in
SDDC to access AWS EC2 and S3 services. This solution provides workload
migration, allows customers to use the global presence of AWS data centers, and
flexibility of management.
Assessment
A. Deep learning
B. Big Data analytics
C. Edge computing
D. Internet of Things
2. Identify the cloud computing characteristic that controls and optimizes resource
use by leveraging a metering capability.
A. Measured service
C. Resource pooling
D. Rapid elasticity
Summary
Introduction
This module focuses on the compute system, its components, and its types. This
module also focuses on compute virtualization and application virtualization.
Further, this module focuses on an overview of storage and connectivity in a data
center. Finally, this module focuses on an overview of software-defined data
center.
Introduction
This lesson covers compute system, and its key physical and logical components.
This lesson also covers the types of compute systems.
Compute System
Compute System
Notes
The compute systems used in building data centers are typically classified into
three categories: tower compute system, rack-mounted compute system, and blade
compute system.
Tower
A tower compute system, also known as a tower server, is a compute system built
in an upright stand-alone enclosure called a “tower”, which looks similar to a
desktop cabinet. Tower servers have a robust build, and have integrated power
supply and cooling. They typically have individual monitors, keyboards, and mice.
Tower servers occupy significant floor space and require complex cabling when
deployed in a data center. They are also bulky, and a group of tower servers
generate considerable noise from their cooling units. Tower servers are typically
used in smaller environments. Deploying many tower servers in large environments
may involve substantial expenditure.
Rack-mounted
A rack-mounted compute system, also known as a rack server, is a compute system that is mounted within a frame called a rack. A rack is a standardized enclosure containing multiple mounting slots called “bays”, each of which holds a
server in place with the help of screws. A single rack contains multiple servers
stacked vertically in bays, thereby simplifying network cabling, consolidating
network equipment, and reducing the floor space use. Each rack server has its own
power supply and cooling unit. Typically, a console is mounted on a rack to enable
administrators to manage all the servers in the rack.
Some concerns with rack servers are that they are cumbersome to work with, and
they generate a lot of heat, because of which more cooling is required, which in turn
increases power costs. A “rack unit” (denoted by U or RU) is a unit of measure of
the height of a server designed to be mounted on a rack. One rack unit is 1.75
inches (44.45 mm). A 1 U rack server is typically 19 inches (482.6 mm) wide.
The standard rack cabinets are 19 inches wide and the common rack cabinet sizes
are 42U, 37U, and 27U. The rack cabinets are also used to house network,
storage, telecommunication, and other equipment modules. A rack cabinet may
also contain a combination of different types of equipment modules.
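As a quick worked example using the figures above, a fully populated 42U rack cabinet provides 42 × 1.75 inches = 73.5 inches (about 1.87 meters) of vertical mounting space for equipment modules.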
Blade
A blade server is housed in a slot inside a blade enclosure (or chassis), which
holds multiple blades and provides integrated power supply, cooling, networking,
and management functions. The blade enclosure enables interconnection of the
blades through a high-speed bus and also provides connectivity to external storage
systems.
The modular design of the blade servers makes them smaller, which minimizes the
floor space requirements, increases the compute system density and scalability,
and provides better energy efficiency as compared to the tower and the rack
servers. It also reduces the complexity of the compute infrastructure and simplifies
compute infrastructure management. It provides these benefits without
compromising on any capability that a non-blade compute system provides.
Some concerns with blade servers include the high cost of a blade system (blade
servers and chassis), and the proprietary architecture of most blade systems due to
which a blade server can typically be plugged only into a chassis from the same
vendor.
Random-Access Memory (RAM): Volatile data storage that contains the programs for execution and the data that are used by the processor.
Motherboard: A printed circuit board (PCB) that holds the processor, RAM, ROM, network and I/O ports, and other integrated components, such as the GPU and NIC.
(Figure: generic architecture of an operating system - user applications interact with the OS through the user interface and OS services.)
Notes
The image depicts a generic architecture of an OS. Some functions (or services) of
an OS include program execution, memory management, resources management
and allocation, and input/output management. An OS also provides networking and
basic security for the access and usage of all managed resources. It also performs
basic storage management tasks while managing other underlying components,
such as the device drivers, logical volume manager, and file system. An OS also
contains high-level Application Programming Interfaces (APIs) to enable programs
to request services.
The amount of physical memory (RAM) in a compute system determines both the
size and the number of applications that can run on the compute system.
(Figure: virtual memory address translation - virtual memory pages are mapped to physical memory; pages marked unavailable in physical memory reside on secondary storage.)
Notes
Virtual memory gives applications the impression of more memory than is physically present in the compute system. This virtual memory enables multiple applications
and processes, whose aggregate memory requirement is greater than the available
physical memory to run on a compute system without impacting each other.
The VMM manages the virtual-to-physical memory mapping. This VMM fetches
data from the secondary storage when a process references a virtual address that
points to data at the secondary storage. The space used by the VMM on the
secondary storage is known as a swap space. A swap space (also known as page
file or swap file) is a portion of the storage drive that is used as physical memory.
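As a toy, hedged illustration of the idea (Python; the page numbers and "disk" contents are hypothetical, and a real virtual memory manager works at page granularity with hardware assistance), a lookup either resolves a virtual page to a physical frame or falls back to the swap space:

```python
# Hypothetical page table: virtual page number -> physical frame number (None if swapped out).
page_table = {0: 5, 1: None, 2: 9}
swap_space = {1: "page 1 contents kept on the storage drive"}
physical_memory = {5: "page 0 contents", 9: "page 2 contents"}

def read_virtual_page(vpn):
    frame = page_table.get(vpn)
    if frame is not None:
        return physical_memory[frame]   # resolved directly in RAM
    # Page fault: the virtual memory manager fetches the page from the swap space.
    return swap_space[vpn]

print(read_virtual_page(0))   # served from physical memory
print(read_virtual_page(1))   # served from swap space on the storage drive
```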
Notes
Logical Volume Manager (LVM) is software that runs on a compute system and
manages logical and physical storage. LVM is an intermediate layer between the
file system and the physical drives. It can partition a larger-capacity disk into virtual,
smaller-capacity volumes (partitioning) or aggregate several smaller disks to form a
larger virtual volume (concatenation). LVMs are mostly offered as part of the OS.
The evolution of LVMs enabled dynamic extension of file system capacity and
efficient storage management. The LVM provides optimized storage access and
simplifies storage resource management. It hides details about the physical disk
and the location of data on the disk. It enables administrators to change the storage
allocation even when the application is running.
The basic LVM components are physical volumes, logical volume groups, and
logical volumes. In LVM terminology, each physical disk that is connected to the
compute system is a physical volume (PV). A volume group is created by grouping
one or more PVs. A unique physical volume identifier (PVID) is assigned to each
PV when it is initialized for use by the LVM. Physical volumes can be added or removed from a volume group dynamically.
Logical volumes (LVs) are created within a given volume group. An LV can be thought of as a disk partition, whereas the volume group itself can be thought of as a disk. The size of an LV is a multiple of the physical extent size. The LV appears as a physical device to the OS. An LV is made up of noncontiguous physical extents and may span multiple physical volumes. A file system is
created on a logical volume.
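For example, assuming a hypothetical physical extent size of 4 MB, a 10 GB logical volume would consist of 2,560 physical extents, possibly drawn from several physical volumes in the volume group.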
Disk partitioning was introduced to improve the flexibility and utilization of disk
drives
In partitioning, a disk drive is divided into logical containers called logical
volumes.
(Figure: partitioning divides a physical volume into multiple logical volumes, while concatenation groups multiple physical volumes into one logical volume presented to the compute system.)
Notes
For example, a large physical drive can be partitioned into multiple LVs to maintain
data according to the file system and application requirements. The partitions are
created from groups of contiguous cylinders when the hard disk is initially set up on
the host. The host’s file system accesses the logical volumes without any
knowledge of partitioning and physical structure of the disk. Concatenation is the
process of grouping several physical drives and presenting them to the host as one
large logical volume.
Notes
Files are of different types, such as text, executable, image, audio/video, binary,
library, and archive. Files have various attributes, such as name, unique identifier,
type, size, location, owner, and protection.
A file system is an OS component that controls and manages the storage and
retrieval of files in a compute system. A file system enables easy access to the files
residing on a storage drive, a partition, or a logical volume. It consists of logical
structures and software routines that control access to files. It enables users to
perform various operations on files, such as create, access (sequential/random),
write, search, edit, and delete.
A file system block is the smallest unit allocated for storing data. Each file system
block is a contiguous area on the physical disk. The block size of a file system is
fixed at the time of its creation. The file system size depends on the block size and
the total number of file system blocks.
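As a quick worked example (with hypothetical values), a file system created with a 4 KB block size and 1,048,576 blocks can hold at most 4 GB of file data, and even a 100-byte file still consumes one full 4 KB block.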
Disk-based
A disk-based file system manages the files stored on storage devices such as
solid-state drives, disk drives, and optical drives. Examples of disk-based file
systems are Microsoft NT File System (NTFS), Apple Hierarchical File System
(HFS) Plus, Extended File System family for Linux, Oracle ZFS, and Universal Disk
Format (UDF).
Network-based
A network-based file system uses networking to enable file system access between
compute systems. Network-based file systems may use either the client/server
model, or may be distributed/clustered. In the client/server model, the file system
resides on a server, and is accessed by clients over the network. The client/server
model enables clients to mount the remote file systems from the server.
NFS for UNIX environment and CIFS for Windows environment (both covered in
Module, ‘File-based Storage System (NAS)’) are two standard client/server file
sharing protocols. Examples of network-based file systems are: Microsoft
Distributed File System (DFS), Hadoop Distributed File System (HDFS), VMware
Virtual Machine File System (VMFS), Red Hat GlusterFS, and Red Hat CephFS.
Virtual
A virtual file system is a memory-based file system. This process enables compute
systems to transparently access different types of file systems on local and network
storage devices. It provides an abstraction layer that enables applications to
access different types of file systems in a uniform way. It bridges the differences
between the file systems for different operating systems, without the application’s
knowledge of the type of file system they are accessing. The examples of virtual file
systems are Linux Virtual File System (VFS) and Oracle CacheFS.
Introduction
This lesson covers compute virtualization, hypervisor, and virtual machine. This
lesson also covers desktop virtualization.
(Figure: compute virtualization - multiple VMs, each running its own OS, hosted by a hypervisor on a single physical compute system.)
Notes
The hypervisor allocates processor, memory, storage, and network resources to all the VMs. Depending on the
hardware capabilities, many VMs can be created on a single physical compute
system.
Before Virtualization
Drawbacks:
IT silos and underutilized resources
Inflexible and expensive
Management inefficiencies
Risk of downtime
After Virtualization
Benefits:
Server consolidation and improved resource utilization
Flexible infrastructure at lower costs
Increased management efficiency
Increased availability and improved business continuity
What is a Hypervisor?
Definition: Hypervisor
Software that provides a virtualization layer for abstracting compute
system hardware, and enables the creation of multiple virtual
machines.
Hypervisor kernel
o Provides functionality similar to an OS kernel
o Presents resource requests to the physical hardware
Virtual machine manager (VMM)
o Each VM is assigned a VMM
There are two types of hypervisors:
Bare-metal
Hosted
Notes
A hypervisor enables multiple operating systems to run concurrently on the same physical compute system. The
hypervisor provides standardized hardware resources to all the VMs.
A hypervisor has two key components: kernel and virtual machine manager (VMM).
A hypervisor kernel provides the same functionality as the kernel of any OS,
including process management, file system management, and memory
management. It is designed and optimized to run multiple VMs concurrently. It
receives requests for resources through the VMM, and presents the requests to the
physical hardware. Each virtual machine is assigned a VMM that gets a share of
the processor, memory, I/O devices, and storage from the physical compute
system to successfully run the VM.
Hypervisors are categorized into two types: bare-metal (Type I) and hosted (Type
II). A bare-metal hypervisor is directly installed on the physical compute hardware
in the same way as an OS. It has direct access to the hardware resources of the
compute system and is therefore more efficient than a hosted hypervisor. A bare-
metal hypervisor is designed for enterprise data centers and third platform
infrastructure. It also supports the advanced capabilities such as resource
management, high availability, and security. The image represents a bare-metal
hypervisor. A hosted hypervisor is installed as an application on an operating
system. The hosted hypervisor does not have direct access to the hardware, and
all requests pass through the OS running on the physical compute system.
Notes
A virtual machine (VM) is a logical compute system with virtual hardware on which
a supported guest OS and its applications run. A VM is created by a hosted or a
bare-metal hypervisor installed on a physical compute system. An OS, called a
“guest OS”, is installed on the VM in the same way it is installed on a physical
compute system. From the perspective of the guest OS, the VM appears as a
physical compute system.
I/O requests that are issued to a virtual disk drive are translated by the hypervisor and mapped to a file on the physical compute system's disk drive.
Compute virtualization software enables creating and managing several VMs. Each
VM has a different OS of its own—on a physical compute system or on a compute
cluster. VMs are created on a compute system, and provisioned to different users
to deploy their applications. The VM hardware and software are configured to meet
the application’s requirements. The different VMs are isolated from each other, so
that the applications and the services running on one VM do not interfere with
those running on other VMs. The isolation also provides fault tolerance so that if
one VM crashes, the other VMs remain unaffected.
VM Hardware
(Figure: VM hardware - virtual processor, RAM, graphics card, storage device, HBA, NIC, SCSI/IDE controllers, USB controller, keyboard, and mouse.)
Notes
Based on the requirements, the virtual components can be added or removed from
a VM. However, not all components are available for addition and configuration.
Some hardware devices are part of the virtual motherboard and cannot be modified
or removed. For example, the video card and the PCI controllers are available by
default and cannot be removed.
Virtual optical drives and floppy drives can be configured to connect to either
physical devices or to image files, such as ISO on the storage. SCSI/IDE virtual
controllers provide a way for the VMs to connect to the storage devices. The virtual
USB controller is used to connect to a physical USB controller and to access the
connected USB devices. Serial and parallel ports provide an interface for
connecting peripherals to the VM.
VM Files
Virtual disk file: Stores the contents of the VM's disk drive
Log file: Keeps a log of the VM’s activity and is used in troubleshooting
Notes
The memory state file stores the memory contents of a VM and is used to resume a
VM that is in a suspended state. The snapshot file stores the running state of the
VM including its settings and the virtual disk, and may optionally include the
memory state of the VM. It is typically used to revert the VM to a previous state.
Log files are used to keep a record about the VM’s activity and are often used for
troubleshooting purposes.
For managing VM files, a hypervisor may use a native clustered file system, or the
Network File System (NFS). A hypervisor’s native clustered file system is optimized
to store VM files. It may be deployed on Fibre Channel and iSCSI storage, apart
from the local storage. The virtual disks are stored as files on the native clustered
file system. Network File System enables storing of VM files on remote file servers
(NAS device) accessed over an IP network. The NFS client built into the hypervisor
uses the NFS protocol to communicate with the NAS device.
Notes
With the traditional desktop machine, the OS, applications, and user profiles are all
tied to a specific piece of hardware. With legacy desktops, business productivity is
impacted greatly when a client device is broken or lost. Managing a vast desktop
environment is also a challenging task.
Desktop virtualization decouples the OS, applications, and user state (profiles,
data, and settings) from a physical compute system. These components,
collectively called a virtual desktop, are hosted on a remote compute system. It can
be accessed by a user from any client device, such as laptops, desktops, thin
clients, or mobile devices. A user accesses the virtual desktop environment over a
network on a client through a web browser or a client application.
The OS and applications of the virtual desktop execute on the remote compute
system, while a view of the virtual desktop’s user interface (UI) is presented to the
end-point device. Desktop virtualization uses a remote display protocol to transmit
the virtual desktop’s UI to the end-point devices. The remote display protocol also
sends back keystrokes and graphical input information from the end-point device,
enabling the user to interact with the virtual desktop.
Introduction
This lesson covers the evolution of storage architecture and the types of storage
devices. This lesson also covers compute-to-compute and compute-to-storage
connectivity. Further, this lesson covers different storage connectivity protocols.
(Figure: server-centric storage architecture - the sales, finance, and R&D servers each have their own dedicated storage devices, with clients connecting over a LAN/WAN.)
(Figure: storage networking architecture - the sales, finance, and R&D servers share storage devices over a storage area network, with clients connecting over a LAN/WAN.)
Notes
Storage devices assembled within storage systems form a storage pool, and
several compute systems access the same storage pool over a specialized, high-
speed storage area network (SAN). A SAN is used for information exchange
between compute systems and storage systems, and for connecting storage
systems. It enables compute systems to share storage resources, improve the
utilization of storage systems, and facilitate centralized storage management.
SANs are classified based on protocols they support. Common SAN deployment
types are Fibre Channel SAN (FC SAN), Internet Protocol SAN (IP SAN), and Fibre
Channel over Ethernet SAN (FCoE SAN). These topics are covered later in the
course.
(Table: storage types and their descriptions.)
Introduction to Connectivity
Compute-to-compute connectivity
Compute-to-storage connectivity
Compute-to-Compute Connectivity
(Figure: compute-to-compute connectivity - client compute systems and hypervisor-hosted VMs communicate over Ethernet switches and an IP router.)
Compute-to-Storage Connectivity
(Figure: compute-to-storage connectivity - clients connect to servers over a LAN through an Ethernet switch; the servers access a storage system, which acts as an iSCSI target, and connect through an FC switch.)
Notes
Host bus adapter: A host bus adapter (HBA) is a host interface device that
connects a compute system to storage or to a SAN. It is an application-specific
integrated circuit (ASIC) board. It performs I/O interface functions between a
compute system and storage, relieving the processor from more I/O processing
workload. A compute system typically contains multiple HBAs.
Port: A port is a specialized outlet that enables connectivity between the compute
system and storage. An HBA may contain one or more ports to connect the
compute system to the storage. Cables connect compute systems to internal or
external devices using copper or fiber optic media.
What is a Protocol?
Definition: Protocols
Define formats for communication between devices. Protocols are
implemented using interface devices (or controllers) at both the
source and the destination devices.
(Table: protocols and their descriptions.)
Applications Lesson
Introduction
This lesson covers traditional and modern applications. Further, this lesson covers microservices and application virtualization.
Applications
Application Overview
Definition: Application
– A software program or set of
programs that is designed to
perform a group of
coordinated tasks.
Examples
– Customer relationship
management (CRM)
– Enterprise Resource
Planning (ERP)
– Email such as Microsoft
Outlook
Notes
For anyone who uses computers or smartphones, applications are used every day.
From reading your email to Facebook and Twitter, when you post pictures or write
your tweet, you are using an application.
For businesses, applications unlock value from the digital world. A great
application reshapes user experiences and creates new touch points for getting the
information you want. Applications are crucial in how businesses provide value to
their customers, which drives fundamental business objectives. Applications
manage information and present it in a form that is useful to the business to
meet specific requirements.
Modern Applications
Monolithic:
– Resiliency and scale are infrastructure managed
– Examples: CRM, ERP, and Email – Microsoft Outlook
Distributed (Microservices):
– Resiliency and scale are application managed
– Examples: Facebook, Uber, and Netflix
Notes
Application Encapsulation
The application’s virtual container isolates it from the underlying OS and other
applications, thereby minimizing application conflicts. During application execution,
all function calls made by the application to the OS for assets get redirected to the
assets within the virtual container. The application is thus restricted from writing to
the OS file system or registry, or modifying the OS in any other way.
Application Presentation
This process makes it appear as if the application is running on the client when, in
fact, it is running on the remote compute system. Application presentation enables
the delivery of an application on devices that have less computing power than what
is normally required to execute the application. In application presentation,
application sessions are created in the remote compute system and a user
connects to an individual session from a client by means of the software agent.
Individual sessions are isolated from each other, which secures each user's data
and also protects users against application crashes in other sessions.
Application Streaming
Since a limited portion of the application is delivered to the client before the
application starts, the user experiences rapid application launch. The streaming
approach also reduces network traffic. As the user accesses different application
functions, more of the application is downloaded to the client. The additional
portions of the application may also be downloaded in the background without user
intervention. Application streaming requires an agent or client software on clients.
Introduction
This lesson covers software-defined data center and its architecture. This lesson
also covers software-defined controller and the benefits of software-defined
architecture.
Compute, storage, network, security, and availability services are pooled and
delivered as a service
SDDC services are managed by intelligent, policy-driven software
Regarded as the foundational infrastructure for the modern data center
Notes
SDDC Architecture
The image depicts the SDDC architecture, in which applications interact with the underlying infrastructure through APIs.
Notes
This decoupling of the control path and data path enables the centralization of data
provisioning and management tasks through software that is external to the
infrastructure components. The software runs on a centralized compute system or
a stand-alone device, called the software-defined controller. The figure illustrates
this architecture.
Software-Defined Controller
Notes
The controller provides a single control point to the entire infrastructure enabling
policy-based infrastructure management. The controller enables an administrator to
use a software interface to manage the resources, node connectivity, and traffic
flow; control behavior of underlying components; apply policies uniformly across
the infrastructure components; and enforce security.
The controller also provides interfaces that enable applications, external to the
controller, to request resources and access these resources as services.
The table describes the benefits of the software-defined architecture.
Introduction
This lesson covers the building blocks of a data center infrastructure. It covers the
components and functions of the five layers of a data center. It also covers the
three cross-layer functions in a data center.
The image is a block diagram depicting the core IT infrastructure building blocks
that make up a data center.
The layers shown are applications, orchestration (orchestration software), software-defined infrastructure (software-defined compute, software-defined storage, and software-defined network, with fault tolerance mechanisms), and virtual infrastructure.
Notes
The business continuity and security functions include mechanisms and processes that are
required to provide reliable and secure access to applications, information, and
services. The management function includes various processes that enable the
efficient administration of the data center and the services for meeting business
requirements. Applications that are deployed in the data center may be a
combination of internal applications, business applications, and modern
applications that are either custom-built or off-the-shelf. The fulfillment of the five
essential cloud characteristics ensures the infrastructure can be transformed into a
cloud infrastructure that could be either private or public. Further, by integrating
cloud extensibility, the infrastructure can be connected to an external cloud to
leverage the hybrid cloud model.
Physical Infrastructure
The image highlights the physical infrastructure layer at the base of the data center infrastructure block diagram.
Notes
The physical infrastructure forms the foundation layer of a data center. It includes
equipment such as compute systems, storage systems, and networking devices,
along with the operating systems, system software, protocols, and tools that
enable the physical equipment to perform its functions. A key function of the
physical infrastructure is to execute the requests generated by the virtual and
software-defined infrastructure. Additional functions are storing data on the storage
devices, performing compute-to-compute communication, executing programs on
compute systems, and creating backup copies of data.
Virtual Infrastructure
The image highlights the virtual infrastructure layer, which includes virtual compute, virtual storage, and virtual network resources created from the physical infrastructure.
Benefits of virtualization:
Notes
For example, storage virtualization software pools the capacity of multiple storage
devices to create a single large storage capacity. Similarly, compute virtualization
software pools the processing power and memory capacity of physical compute
systems, creating an aggregation of the power of all processors (in megahertz)
and all memory (in megabytes). Examples of virtual resources include virtual
compute (virtual machines), virtual storage (LUNs), and virtual networks.
Note: While deploying a data center, an organization may choose not to deploy
virtualization. In such an environment, the software-defined layer is deployed
directly over the physical infrastructure. Further, it is also possible that part of the
infrastructure is virtualized and the rest is not.
Software-Defined Infrastructure
The software-defined infrastructure layer is deployed either on the virtual layer or directly on the physical layer. The underlying resources are pooled and delivered as services, which enables IT as a Service (ITaaS).
Components:
Software-defined compute
Software-defined storage
Software-defined network
Notes
Orchestration
The image highlights the orchestration layer of the data center infrastructure, which provides workflows for executing automated tasks.
Notes
The orchestration layer includes the orchestration software. The key function of this
layer is to provide workflows for executing automated tasks to accomplish a desired
outcome. Workflow refers to a series of interrelated tasks that perform a business
operation. The orchestration software enables this automated arrangement,
coordination, and management of the tasks. This function helps to group and
sequence tasks with dependencies among them into a single, automated workflow.
Associated with each service listed in the service catalog, there is a defined
orchestration workflow. When a service is selected from the service catalog, the
associated workflow in the orchestration layer is triggered. Based on this workflow,
the orchestration software interacts with the components across the software-
defined layer and the BC, security, and management functions to execute the
provisioning tasks.
Services
The image highlights the services layer, through which IT resources are presented and provisioned as services to users.
Components:
Service catalog
Self-service portal
Stores service information in service catalog and presents them to the users
Enables users to access services using a self-service portal
Notes
This layer includes a service catalog that presents information about all the IT
resources being offered as services. The service catalog is a database of
information about the services, including the description of each service, the types
of services, cost, supported SLAs, and security mechanisms.
Business Continuity
The image highlights the business continuity cross-layer function, which supports proactive and reactive measures to mitigate the impact of downtime. The table describes these measures.
Notes
The proactive measures include activities and processes such as business impact
analysis, risk assessment, and technology solutions such as backup, archiving, and
replication.
The reactive measures include activities and processes such as disaster recovery
and disaster restart to be invoked in the event of a service failure.
Security
The image highlights the security cross-layer function, which deploys mechanisms across all layers to provide secure services.
Examples of security mechanisms:
Firewall
Intrusion detection and prevention systems
Anti-virus
Notes
Management
The image highlights the management cross-layer function. Management activities include provisioning, capacity and availability management, compliance conformance, monitoring services, and governance, risk, and compliance.
Notes
This function supports all the layers to perform monitoring, management, and
reporting for the entities of the infrastructure.
Do-It-Yourself Infrastructure
In the Do-It-Yourself (DIY) approach, organizations integrate best-in-class
infrastructure components, including hardware and software that is purchased from
different vendors.
Greenfield
Brownfield
Notes
Greenfield Method
Brownfield Method
There are two types of converged systems: converged infrastructure and hyper-converged infrastructure (HCI).
HCI offers efficiency using modular building blocks that are known as
nodes. A node consists of a server with Direct Attached Storage. HCI systems
are software-defined: they decouple the compute, storage, and
networking functions and run these functions on a common set of
physical resources. They do not have a physical Storage Area Network (SAN) or a
distinct physical storage controller like converged infrastructure.
Introduction
This section highlights technologies that are relevant to the topics covered in this
module.
Concepts in Practice
Concepts in Practice
Dell EMC VxRail Appliances are the fastest growing hyper-converged systems
worldwide. They are the standard for transforming VMware infrastructures,
dramatically simplifying IT operations while lowering overall capital and operational
costs.
Drives operational efficiency for a 30% TCO advantage versus HCI systems built
using VSAN Ready Nodes. Unifies support for all VxRail hardware and software,
delivering 42% lower total cost of serviceability. Engineered, manufactured,
managed, supported, and sustained as one for single end-to-end lifecycle
support. Fully loaded with enterprise data services for built-in data protection, cloud
storage, and disaster recovery.
The architecture enables you to scale from as few as four nodes to over a
thousand nodes. In addition, it provides enterprise-grade data protection,
multitenant capabilities, and add-on enterprise features such as QoS, thin
provisioning, and snapshots. VxRack FLEX delivers the scalability, flexibility,
As the foundation for a complete, adaptive and scalable solution, the 13th
generation of Dell EMC PowerEdge servers delivers outstanding operational
efficiency and top performance at any scale. It increases productivity with
processing power, exceptional memory capacity, and highly scalable internal
storage. PowerEdge servers provide insight from data, enable environment
virtualization, and support a mobile workforce. Major benefits of PowerEdge servers are:
Dell offers a wide selection of secure, reliable, cost-effective Wyse thin clients
designed to integrate into any virtualized or web-based infrastructure, while
meeting the budget and performance requirements for any application. Wyse thin
and zero clients are built for easy integration into VDI or web-based environment
with instant, hands-free operation and performance that meets demands. Simplify
security and scalability with simple deployment and remote management in an
elegant, space-saving design. Malware-resistant and tailored for Citrix, Microsoft
and VMware.
VMware Horizon
VMware Horizon is a VDI solution for delivering virtualized or hosted desktops and
applications through a single platform to the end users. These desktop and
application services—including RDS, hosted apps, packaged apps with VMware
ThinApp, and SaaS apps—can all be accessed from one unified workspace across
devices and locations. Horizon provides IT with a streamlined approach to deliver,
protect, and manage desktops and applications while containing costs and
ensuring that end users can work anytime, anywhere, on any device. Horizon
supports both Windows and Linux-based desktops.
VMware Cloud Foundation
VMware Cloud Foundation makes it easy to deploy and run a hybrid cloud. It
provides integrated cloud infrastructure (compute, storage, networking, and
security) and cloud management services to run enterprise applications in both
private and public environments.
Cloud Foundation delivers end-to-end security for all applications by delivering
microsegmentation, distributed firewalls, and VPN (NSX), VM, hypervisor, and
vMotion encryption (vSphere), and data at rest, cluster, and storage encryption
(vSAN).
Assessment
A. Security
B. Service
C. Business continuity
D. Management
A. Orchestration
B. Security
C. Services
D. Management
Summary
Introduction
This module focuses on the key components of an intelligent storage system. This
module also focuses on storage subsystems and provides details on components,
addressing, and performance parameters of a hard disk drive (HDD), solid state
drive (SSD), and hybrid storage drives. Then, this module focuses on RAID
techniques and their use to improve performance and protection. Finally, this
module focuses on the types of intelligent storage systems and their architectures.
Introduction
This lesson covers the components of intelligent storage systems. This lesson also
covers the components, addressing, and performance of hard disk drives, solid
state drives, and hybrid drives.
Notes
Intelligent storage systems are feature-rich storage arrays that provide highly
optimized I/O processing capabilities. These intelligent storage systems have the
capability to meet the requirements of today’s I/O intensive modern applications.
These applications require high levels of performance, availability, security, and
scalability.
The storage systems have an operating environment that intelligently and optimally
handles the management, provisioning, and utilization of storage resources. The
storage systems are configured with a large amount of memory (called cache) and
multiple I/O paths and use sophisticated algorithms to meet the requirements of
performance-sensitive applications. The storage systems also support various
technologies such as automated storage tiering and virtual storage provisioning.
These capabilities have added a new dimension to storage system performance.
Further, the intelligent storage systems support APIs to enable integration with
SDDC and cloud environments.
ISS Components
The image shows the two key components of an intelligent storage system: the controller(s) and the storage.
Notes
An intelligent storage system has two key components, controller and storage. A
controller is a compute system that runs a purpose-built operating system that is
responsible for performing several key functions for the storage system. Examples
of such functions are serving I/Os from the application servers, storage
management, RAID protection, local and remote replication, provisioning storage,
automated tiering, data compression, data encryption, and intelligent cache
management. An intelligent storage system typically has more than one controller
for redundancy. Each controller consists of one or more processors and a certain
amount of cache memory to process a large number of I/O requests. These
controllers are connected to the compute system either directly or via a storage
network. The controllers receive I/O requests from the compute systems that are
read or written from/to the storage by the controller. Depending on the type of the
data access method used for a storage system, the controller can either be
classified as block-based, file-based, object-based, or unified. A storage system
can have all hard disk drives, all solid state drives, or a combination of both.
A hard disk drive is a persistent storage device that stores and retrieves data using
rapidly rotating disks (platters) coated with magnetic material.
The image shows the components of a hard disk drive: the controller board, the head disk assembly (HDA) containing the platters and read/write heads, and the power and interface connectors.
Notes
I/O operations in hard drives are performed by rapidly moving the arm across the
rotating flat platters that are coated with magnetic material.
Data is transferred between the disk controller and magnetic platters through the
read/write (R/W) head which is attached to the arm. Data can be recorded and
erased on magnetic platters any number of times.
Platter
A typical hard disk drive consists of one or more flat circular disks called platters.
The data is recorded on these platters in binary codes (0s and 1s). The set of
rotating platters is sealed in a case, called Head Disk Assembly (HDA). A platter is
a rigid, round disk coated with magnetic material on both surfaces (top and
bottom).
The data is encoded by polarizing the magnetic area or domains of the disk
surface. Data can be written to or read from both surfaces of the platter. The
number of platters and the storage capacity of each platter determine the total
capacity of the drive.
Spindle
A spindle connects all the platters and is connected to a motor. The motor of the
spindle rotates with a constant speed. The disk platter spins at a speed of several
thousands of revolutions per minute (rpm).
Read/Write head
Read/write (R/W) heads read and write data from or to the platters. Drives have
two R/W heads per platter, one for each surface of the platter. The R/W head
changes the magnetic polarization on the surface of the platter when writing data.
While reading data, the head detects the magnetic polarization on the surface of
the platter.
During reads and writes, the R/W head senses the magnetic polarization and never
touches the surface of the platter. When the spindle rotates, a microscopic air gap
is maintained between the R/W heads and the platters, known as the head flying
height. This air gap is removed when the spindle stops rotating and the R/W head
rests on a special area on the platter near the spindle. This area is called the
landing zone.
R/W heads are mounted on the actuator arm assembly, which positions the R/W
head at the location on the platter where the data needs to be written or read. The
R/W heads for all platters on a drive are attached to one actuator arm assembly
and move across the platters simultaneously.
The controller is a printed circuit board, mounted at the bottom of a disk drive. It
consists of a microprocessor, internal memory, circuitry, and firmware.
The firmware controls the power supplied to the spindle motor as well as controls
the speed of the motor. It also manages the communication between the drive and
the compute system.
In addition, it controls the R/W operations by moving the actuator arm and
switching between different R/W heads, and performs the optimization of data
access.
The image shows the physical structure of a platter: the spindle, sectors, tracks, and cylinders.
In the illustration, the drive shows eight sectors per track, six heads, and four cylinders. This means
a total of 8 × 6 × 4 = 192 blocks. The block number ranges from 0 to 191. Each block has its own
unique address. Assuming that the sector holds 512 bytes, a 500 GB drive with a formatted capacity
of 465.7 GB has in excess of 976,000,000 blocks.
Notes
Data on the disk is recorded on tracks, which are concentric rings on the platter
around the spindle. The tracks are numbered, starting from zero, from the outer
edge of the platter. The number of tracks per inch (TPI) on the platter (or the track
density) measures how tightly the tracks are packed on a platter.
Each track is divided into smaller units called sectors. A sector is the smallest,
individually addressable unit of storage. The track and sector structure is written on
the platter by the drive manufacturer using a low-level formatting operation. The
number of sectors per track varies according to the drive type. Typically, a sector
holds 512 bytes of user data. Besides user data, a sector also stores other
information, such as the sector number, head number or platter number, and track
number. This information helps the controller to locate the data on the drive.
A cylinder is a set of identical tracks on both surfaces of each drive platter. The
location of R/W heads is referred to by the cylinder number, not by the track
number. Earlier drives used physical addresses consisting of cylinder, head, and
sector (CHS) number. These addresses referred to specific locations on the disk,
and the OS had to be aware of the geometry of each disk used.
Logical block addressing (LBA) has simplified the addressing by using a linear
address to access physical blocks of data. The disk controller translates LBA to a
CHS address; the compute system needs to know only the size of the disk drive in
terms of the number of blocks. The logical blocks are mapped to physical sectors
on a 1:1 basis.
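As an illustrative sketch (not part of the course material), the following Python snippet translates a logical block address to a CHS address for the example geometry in the illustration above; the geometry values are assumptions taken from that illustration, and sectors are numbered from zero here for simplicity.

# Example disk geometry from the illustration (assumed values)
SECTORS_PER_TRACK = 8
HEADS = 6            # number of R/W heads (two per platter)
CYLINDERS = 4

def lba_to_chs(lba):
    # Translate a logical block address to (cylinder, head, sector)
    cylinder = lba // (HEADS * SECTORS_PER_TRACK)
    head = (lba % (HEADS * SECTORS_PER_TRACK)) // SECTORS_PER_TRACK
    sector = lba % SECTORS_PER_TRACK      # sectors numbered from 0 in this sketch
    return cylinder, head, sector

total_blocks = CYLINDERS * HEADS * SECTORS_PER_TRACK   # 192 blocks (0..191)
print(total_blocks, lba_to_chs(100))                    # 192 (2, 0, 4)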
HDD Performance
Seek Time
The image shows the radial movement of the R/W head across the platter.
Notes
The seek time (also called access time) describes the time taken to position the
R/W heads across the platter with a radial movement (moving along the radius of
the platter). In other words, it is the time taken to position and settle the arm and
the head over the correct track. Therefore, the lower the seek time, the faster the
I/O operation.
Full Stroke: It is the time taken by the R/W head to move across the entire width
of the disk, from the innermost track to the outermost track.
Average: It is the average time taken by the R/W head to move from one
random track to another, normally listed as the time for one-third of a full stroke.
Track-to-Track: It is the time taken by the R/W head to move between adjacent
tracks.
To minimize the seek time, data can be written to only a subset of the available
cylinders. This results in lower usable capacity than the actual capacity of the drive.
For example, a 500 GB disk drive is set up to use only the first 40 percent of the
cylinders and is effectively treated as a 200 GB drive. This is known as short-
stroking the drive.
Rotational Latency
Notes
This latency depends on the rotation speed of the spindle and is measured in
milliseconds. The average rotational latency is one-half of the time taken for a full
rotation. Similar to the seek time, rotational latency has more impact on the
reading/writing of random sectors on the disk than on the same operations on
adjacent sectors.
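For instance, the average rotational latency can be estimated directly from the spindle speed. This small Python calculation is an illustrative sketch (the rpm values are example assumptions, not course figures):

def avg_rotational_latency_ms(rpm):
    # Time for one full rotation in milliseconds, halved for the average
    full_rotation_ms = 60_000 / rpm
    return full_rotation_ms / 2

print(avg_rotational_latency_ms(7200))    # about 4.2 ms for a 7,200 rpm drive
print(avg_rotational_latency_ms(15000))   # about 2.0 ms for a 15,000 rpm drive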
The data transfer rate is the average amount of data per unit time that the drive can deliver to the HBA:
Internal transfer rate: Speed at which data moves from the surface of a platter
to the internal buffer of the disk
External transfer rate: Rate at which data moves through the interface to the
HBA
The image shows the data path from the head disk assembly through the drive's internal buffer and interface to the HBA.
Notes
The data transfer rate (also called transfer rate) refers to the average amount of
data per unit time that the drive can deliver to the HBA. In a read operation, the
data first moves from disk platters to R/W heads; then it moves to the drive’s
internal buffer. Finally, data moves from the buffer through the interface to the
compute system’s HBA.
In a write operation, the data moves from the HBA to the internal buffer of the disk
drive through the drive’s interface. The data then moves from the buffer to the R/W
heads. Finally, it moves from the R/W heads to the platters. The data transfer rates
during the R/W operations are measured in terms of internal and external transfer
rates.
Internal transfer rate is the speed at which data moves from a platter’s surface to
the internal buffer (cache) of the disk. The internal transfer rate takes into account
factors such as the seek time and rotational latency. External transfer rate is the
rate at which data can move through the interface to the HBA.
The external transfer rate is generally the advertised speed of the interface, such
as 133 MB/s for ATA.
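Putting the pieces together, the average time to service a single random I/O can be approximated as seek time plus average rotational latency plus the time to transfer the block. The sketch below is illustrative only; the seek time, rpm, transfer rate, and I/O size are assumed example values.

def disk_service_time_ms(avg_seek_ms, rpm, transfer_rate_mb_s, io_size_kb):
    # Average rotational latency is half of one full rotation
    rotational_latency_ms = (60_000 / rpm) / 2
    # Time to transfer the requested block at the given transfer rate
    transfer_ms = (io_size_kb / 1024) / transfer_rate_mb_s * 1000
    return avg_seek_ms + rotational_latency_ms + transfer_ms

# Example: 5 ms average seek, 15,000 rpm, 100 MB/s transfer rate, 4 KB I/O
print(disk_service_time_ms(5, 15000, 100, 4))   # roughly 7 ms per I/O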
Response Time
The image is a graph of I/O response time (in ms) versus utilization (0% to 100%): response time stays low while the queue size is small, and rises sharply as utilization climbs past about 70 percent toward 100 percent.
Notes
The utilization of a disk I/O controller has a significant impact on the I/O response
time. Consider that a disk is viewed as a black box consisting of two elements: the
queue and the disk I/O controller. Queue is the location where an I/O request waits
before it is processed by the I/O controller and disk I/O controller processes I/Os
waiting in the queue one by one.
The I/O requests arrive at the controller at the rate generated by the application.
The I/O arrival rate, the queue length, and the time taken by the I/O controller to
process each request determines the I/O response time. If the controller is busy or
heavily utilized, the queue size will be large and the response time will be high.
As the utilization reaches 100 percent, that is, as the I/O controller saturates, the
response time moves closer to infinity. In essence, the saturated component or the
bottleneck forces the serialization of I/O requests; meaning, each I/O request must
wait for the completion of the I/O requests that preceded it.
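This behavior is often approximated with the single-queue relation R = S / (1 − U), where S is the controller service time and U its utilization. The snippet below is an illustrative sketch of that approximation; the formula and the example values are assumptions, not figures stated in this section.

def response_time_ms(service_time_ms, utilization):
    # As utilization approaches 1.0 (100%), response time grows toward infinity
    if utilization >= 1.0:
        return float("inf")
    return service_time_ms / (1.0 - utilization)

for u in (0.1, 0.7, 0.9, 0.99):
    print(u, round(response_time_ms(1.0, u), 1))
# 0.1 -> 1.1 ms, 0.7 -> 3.3 ms, 0.9 -> 10.0 ms, 0.99 -> 100.0 ms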
Flash Memory
The image shows the architecture of a solid state drive: the I/O interfaces, the drive controller, non-volatile memory, and the mass storage (flash memory chips).
The I/O interface enables connecting the power and data connectors to the solid
state drives.
Notes
Solid state drives are especially well suited for low-latency applications that require
consistent, low (less than 1 millisecond) read/write response times.
The controller includes a drive controller, RAM, and non-volatile memory (NVRAM).
The drive controller manages all drive functions.
The non-volatile RAM (NVRAM) is used to store the SSD’s operational software
and data. Not all SSDs have separate NVRAM. Some models store their programs
and data to the drive’s mass storage.
The RAM is used in the management of data being read and written from the SSD
as a cache, and for the SSD’s operational programs and data. SSDs include many
features such as encryption and write coalescing.
The mass storage is an array of non-volatile memory chips. They retain their
contents when powered off. These chips are commonly called Flash memory. The
number and capacity of the individual chips vary directly in relationship to the
SSD’s capacity. The larger the capacity of the SSD, the larger is the capacity and
the greater is the number of the Flash memory chips.
SSDs consume less power compared to hard disk drives. Because SSDs do not
have moving parts, they generate less heat compared to HDDs, which reduces the
need for cooling in the storage enclosure and further lowers the overall system
power consumption.
SSDs have multiple parallel I/O channels from its drive controller to the flash
memory storage chips. Generally, the larger the number of flash memory chips in
the drive, the larger is the number of channels.
SSD Addressing
Solid state memory chips have different capacities, for example a solid state
memory chip can be 32 GB or 4 GB per chip. However, all memory chips share the
same logical organization, that is pages and blocks.
The image shows a logical block address (LBA 0x2000) mapping to a page within a block on a solid state memory chip.
Notes
At the lowest level, a solid state drive stores bits. Eight bits make up a byte, and
while on a typical mechanical hard drive 512 bytes make up a sector, solid state
drives do not have sectors. Instead, solid state drives have a similar physical data
object called a page.
Like a mechanical hard drive sector, the page is the smallest object that can be
read or written on a solid state drive. Unlike mechanical hard drives, pages do not
have a standard capacity. A page’s capacity depends on the architecture of the
solid state memory chip. Typical page capacities are 4 KB, 8 KB, and 16 KB.
A solid state drive block is made up of pages. A block may have 32, 64, or 128
pages. 32 is a common block size. The total capacity of a block depends on the
solid state chip’s page size. Only entire blocks may be written or erased on a solid
state memory chip.
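As a quick arithmetic sketch (the page count and page size below are assumed example values, not vendor specifications), the capacity of a block follows directly from its page count and page size:

pages_per_block = 32        # a common block size, as noted above
page_size_kb = 8            # typical page capacities are 4, 8, or 16 KB
block_capacity_kb = pages_per_block * page_size_kb
print(block_capacity_kb)    # 256 KB per block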
A page has three possible states, erased (empty), valid, and invalid.
The image is a state diagram: a page starts in the erased state, a write makes it valid, a rewrite or delete makes it invalid, and an electrical erase returns it to the erased state.
Notes
In order to write any data to a page, its owning block location on the flash memory
chip must be electrically erased. This function is performed by the SSD’s hardware.
Once a page has been erased, new data can be written to it.
Once a page is marked invalid, its data can no longer be read. An invalid page
needs to be erased before it can once again be written with new data. Garbage
collection handles this process; garbage collection is the process of providing new
erased blocks.
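The page life cycle can be pictured as a small state machine. The Python sketch below is purely illustrative (the state names follow the description above; the class and its methods are assumptions for illustration) and shows why a rewrite produces an invalid page that garbage collection must later erase at the block level.

# Page states as described above: erased (empty), valid, invalid
class Block:
    def __init__(self, pages=4):
        self.pages = ["erased"] * pages

    def write(self, i):
        # Data can only be written to an erased page
        if self.pages[i] != "erased":
            raise RuntimeError("page must be erased before writing")
        self.pages[i] = "valid"

    def invalidate(self, i):
        # Rewriting or deleting marks the old page invalid
        self.pages[i] = "invalid"

    def erase(self):
        # Only the entire block can be electrically erased (garbage collection)
        self.pages = ["erased"] * len(self.pages)

b = Block()
b.write(0); b.invalidate(0)   # page 0 now needs erasing before reuse
b.erase()                     # garbage collection provides erased pages again
print(b.pages)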
SSD Performance
Access type
SSD performs random reads the best
SSDs use all internal I/O channels in parallel for multithreaded large block
I/Os
Drive state
New SSD or SSD with substantial unused capacity offers best performance
Workload duration
Notes
Solid state drives are semiconductor, random-access devices; these result in very
low response times compared to hard disk drives. This, combined with the multiple
parallel I/O channels on the back end, gives SSDs performance characteristics that
are better than hard drives. SSD performance is dependent on access type, drive
state, and workload duration. SSD performs random reads the best.
A new SSD or an SSD with substantial unused capacity has the best performance.
Drives with substantial amounts of their capacity consumed will take longer to
complete the read-modify-write cycle. SSDs are best for workloads with short
bursts of activity.
The image shows a solid state hybrid drive (SSHD), which combines NAND flash memory with a conventional HDD in a single drive.
Notes
In SSHDs the data elements that are associated with performance, such as most
frequently accessed data items, are stored in the NAND flash memory. This
method provides a significant performance improvement over traditional hard
drives.
Definition: NVMe
NVMe (Non-Volatile Memory Express) is a new device interface for
Non-Volatile Memory (NVM) storage technologies using PCIe
connectivity.
Notes
NVM stands for non-volatile memory such as NAND flash memory. NVMe has
been designed to capitalize on the low latency and internal parallelism of solid-state
storage devices.
The previous interface protocols like SCSI were developed for use with far slower
hard disk drives where a very lengthy delay exists between a request and data
transfer, where data speeds are much slower than RAM speeds, and where disk
rotation and seek time give rise to further optimization requirements.
NVMe is a command set and associated storage interface standards that specify
efficient access to storage devices and systems based on Non-Volatile Memory
(NVM) media. NVMe is broadly applicable to NVM storage technology, including
current NAND-based flash and higher-performance, Storage Class Memory (SCM).
Features:
Non-volatile
Short access time like DRAM
Low cost per bit like disk
Solid-state, no moving parts
Notes
Despite the emergence of flash storage and more recently, the NVMe stack,
external storage systems are still orders of magnitude slower than server memory
technologies (RAM). They can also be a barrier to achieving the highest end-to-end
system performance.
The memory industry has been aiming towards something that has the speed of
DRAM but the capacity, cost, and persistence of NAND flash memory. The shift
from SATA to faster interfaces such as SAS and PCI-Express using the NVMe
protocol has made SSDs much faster, but nowhere near the speed of DRAM.
Now, a new frontier in storage media bridges the latency gap between server
storage and external storage: storage-class memory (SCM). This new class of
memory technology has performance characteristics that fall between DRAM and
flash characteristics. The figure highlights where SCM fits into the storage media
hierarchy.
SCM is slower than DRAM but read and write speeds are over 10 times faster than
flash and can support higher IOPS while offering comparable throughput.
Furthermore, data access in flash is at the block and page levels, but SCM can be
addressed at the bit or word level. This granularity eliminates the need to erase an
entire block to program it, and it also simplifies random access.
Other persistent memory technologies are also in development, some with the
potential for broad adoption in enterprise and embedded applications, such as
nanotube RAM (NRAM) and resistive RAM (ReRAM).
Introduction
This lesson covers RAID and its use to improve performance and protection. It
covers various RAID implementations, techniques, and levels commonly used.
This lesson also covers the erasure coding technique and its advantages.
RAID Techniques
RAID Overview
Software RAID
Hardware RAID
Notes
RAID is a technique in which multiple disk drives are combined into a logical unit
called a RAID set and data is written in blocks across the disks in the RAID set.
RAID protects against data loss when a drive fails, by using redundant drives and
parity. RAID also helps in improving the storage system performance as read and
write operations are served simultaneously from multiple disk drives.
A RAID array is an enclosure that contains various disk drives and supporting
hardware to implement RAID.
A subset of disks within a RAID array can be grouped to form logical associations
called logical arrays, also known as a RAID set or a RAID group.
The image shows compute systems accessing a RAID array, within which subsets of disks form logical arrays (RAID sets).
RAID Techniques
Three different RAID techniques form the basis for defining various RAID levels;
they are:
The image illustrates the three techniques: striping (strips A1, A2, A3, A4 spread across disks), mirroring (identical copies of data A on two disks), and parity (data strips D1, D2, D3 with a parity strip P).
Notes
Striping
Striping is a technique of spreading data across multiple drives (more than one) in
order to use the drives in parallel. All the read/write heads work simultaneously,
allowing more data to be processed in a shorter time and increasing performance,
compared to reading and writing from a single disk. Within each disk in a RAID set,
a predefined number of contiguously addressable disk blocks are defined as strip.
The set of aligned strips that spans across all the disks within the RAID set is called
a stripe. The illustration shows a striped RAID set. Strip size (also called stripe
depth) describes the number of blocks in a strip (represented as A1, A2, A3, and
A4). It is the maximum amount of data that can be written to or read from a single
disk in the set, assuming that the accessed data starts at the beginning of the strip.
All strips in a stripe have the same number of blocks. Having a smaller strip size
means that the data is broken into smaller pieces while it is spread across the
disks. Stripe size (represented as A) is the strip size multiplied by the number of
data disks in the RAID set.
For example: in a four-disk striped RAID set with a strip size of 64KB, the stripe
size is 256 KB (64KB x 4). In other words, A = A1 +A2 + A3 + A4. Stripe width
refers to the number of data strips in a stripe. Striped RAID does not provide any
data protection unless parity or mirroring is used.
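To make the arithmetic concrete, the sketch below (illustrative only; the strip size, disk count, and byte offsets are assumptions matching the four-disk example above) computes the stripe size and maps an offset to the disk and strip that hold it.

STRIP_SIZE_KB = 64
DATA_DISKS = 4

stripe_size_kb = STRIP_SIZE_KB * DATA_DISKS      # 256 KB, as in the example
print(stripe_size_kb)

def locate(offset_kb):
    # Which disk, and which strip on that disk, holds the given offset
    strip_index = offset_kb // STRIP_SIZE_KB
    disk = strip_index % DATA_DISKS
    strip_on_disk = strip_index // DATA_DISKS
    return disk, strip_on_disk

print(locate(0))     # (0, 0) - first strip on disk 0
print(locate(200))   # (3, 0) - falls in the fourth strip (disk 3)
print(locate(300))   # (0, 1) - wraps to the next stripe on disk 0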
Mirroring
Mirroring is a technique whereby the same data is stored on two different disk
drives, yielding two copies of the data. If one disk drive failure occurs, the data
remains intact on the surviving disk drive and the controller continues to service the
compute system’s data requests from the surviving disk of a mirrored pair.
When the failed disk is replaced with a new disk, the controller copies the data from
the surviving disk of the mirrored pair. This activity is transparent to the compute
system. In addition to providing complete data redundancy, mirroring enables fast
recovery from disk failure. However, disk mirroring provides only data protection
and is not a substitute for data backup.
However, write performance is slightly lower than that in a single disk because
each write request manifests as two writes on the disk drives. Mirroring does not
deliver the same levels of write performance as a striped RAID.
Parity
Parity is a method to protect striped data from disk drive failure without the cost of
mirroring. An additional disk drive is added to hold parity, a mathematical construct
that allows re-creation of the missing data. Parity is a redundancy technique that
ensures protection of data without maintaining a full set of duplicate data.
For parity RAID, the stripe size calculation does not include the parity strip.
For example: in a four (3 + 1) disk parity RAID set with a strip size of 64 KB, the
stripe size will be 192 KB (64KB x 3).
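XOR parity is what makes re-creation of the missing data possible: if any one strip is lost, XOR-ing the parity with the surviving strips rebuilds it. The following sketch is illustrative only; the strip values are made-up small integers standing in for whole strips of data.

from functools import reduce
from operator import xor

# Four data strips (made-up values) and their XOR parity
data_strips = [0b1010, 0b0111, 0b1100, 0b0001]
parity = reduce(xor, data_strips)

# Simulate losing strip 2 and rebuilding it from the parity plus the survivors
survivors = [s for i, s in enumerate(data_strips) if i != 2]
rebuilt = reduce(xor, survivors, parity)
print(rebuilt == data_strips[2])   # True: the lost strip is re-created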
RAID Levels
These RAID levels are defined based on striping, mirroring, and parity techniques.
Some RAID levels use a single technique, whereas others use a combination of
techniques.
The commonly used RAID levels are RAID 0, RAID 1, 3, 5, 6 and 1+0.
Video: RAID
RAID 0
RAID 0 configuration uses data striping techniques, where data is striped across all
the disks within a RAID set.
The image shows a RAID controller striping blocks (A1–A5, B1–B5, C1–C5) across five data disks.
Notes
To read data, all the strips are gathered by the controller. When the number of
drives in the RAID set increases, the performance improves because more data
can be read or written simultaneously.
RAID 0 is a good option for applications that need high I/O throughput. However, if
these applications require high availability during drive failures, RAID 0 does not
provide data protection and availability.
RAID 1
A RAID 1 set consists of two disk drives and every write is written to both disks.
The image shows data (A, B, C) from the compute system being written by the RAID controller to both disks of a mirror set.
Notes
In RAID 1, the mirroring is transparent to the compute system. During disk failure, the impact on data recovery in RAID 1 is the least among all RAID implementations. This is because the RAID controller uses the mirror drive for data recovery.
Most data centers require data redundancy and performance from their RAID
arrays.
RAID 1+0 combines the performance benefits of RAID 0 with the redundancy
benefits of RAID 1.
The image shows RAID 1+0: data (A, B, C) from the compute system is mirrored and then striped by the RAID controller across mirrored disk pairs (A1/A1, A2/A2, A3/A3, and so on).
Notes
RAID 1+0 uses mirroring and striping techniques and combines their benefits. This
RAID type requires an even number of disks, the minimum being four.
RAID 1+0 is also known as RAID 10 (Ten) or RAID 1/0. RAID 1+0 is also called
striped mirror. The basic element of RAID 1+0 is a mirrored pair. This means that
data is first mirrored and then both copies of the data are striped across multiple
disk drive pairs in a RAID set.
When replacing a failed drive, only the mirror is rebuilt. In other words, the storage
system controller uses the surviving drive in the mirrored pair for data recovery and
continuous operation. Data from the surviving disk is copied to the replacement
disk.
RAID 3
The image shows RAID 3: data (A, B, C) from the compute system is striped across four data disks (strips A1–A4, B1–B4, C1–C4) with parity strips (Ap, Bp, Cp) on a dedicated parity disk.
Notes
In RAID 3, parity information is stored on a dedicated drive so that the data can be
reconstructed if a drive fails in a RAID set. For example, in a set of five disks, four
are used for data and one for parity.
Therefore, the total disk space that is required is 1.25 times the size of the data
disks. RAID 3 always reads and writes complete stripes of data across all disks
because the drives operate in parallel. There are no partial writes that update one
out of many strips in a stripe.
RAID 5
The image shows RAID 5: data is striped across five disks with the parity strips (Ap, Bp, Cp) distributed across different disks rather than on a dedicated parity disk.
Notes
The difference between RAID 4 and RAID 5 is the parity location. In RAID 4, parity
is written to a dedicated drive, creating a write bottleneck for the parity disk.
In RAID 5, parity is distributed across all disks to overcome the write bottleneck of a
dedicated parity disk.
RAID 6
RAID 6 works the same way as RAID 5, except that RAID 6 includes a second
parity element to enable survival if two disk failures occur in a RAID set. Therefore,
a RAID 6 implementation requires at least four disks.
The image shows RAID 6: data is striped across five disks with two parity strips per stripe (for example, Ap and Aq), distributed across the disks.
Notes
RAID 6 distributes the parity across all the disks. The write penalty (explained later
in this module) in RAID 6 is more than that in RAID 5; therefore, RAID 5 writes
perform better than RAID 6.
The rebuild operation in RAID 6 may take longer than that in RAID 5 due to the
presence of two parity sets.
The figure illustrates a single write operation on RAID 5 that contains a group of
five disks.
The image shows a single write to a five-disk RAID 5 set, with the four numbered disk I/O operations (two reads and two writes) performed by the RAID controller.
Notes
The figure illustrates a single write operation on RAID 5 that contains a group of
five disks. The parity (P) at the controller is calculated as follows:
Cp = C1 + C2 + C3 + C4 (XOR operations)
Whenever the controller performs a write I/O, parity must be computed by reading
the old parity (Cp old) and the old data (C4 old) from the disk, which means two
read I/Os. Then, the new parity (Cp new) is computed as follows:
Cp new = Cp old - C4 old + C4 new (XOR operations)
After computing the new parity, the controller completes the write I/O by writing the
new data and the new parity onto the disks, amounting to two write I/Os. Therefore,
the controller performs two disk reads and two disk writes for every write operation,
and the write penalty is 4.
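The same arithmetic can be written out as a short sketch. The values below are made-up stand-ins for whole strips; the point is that the new parity is the XOR of the old parity with the old and new data, and that the operation costs two reads plus two writes.

old_data, new_data = 0b0101, 0b0011      # C4 old and C4 new (made-up values)
old_parity = 0b1110                      # Cp old (made-up value)

# Two disk reads (old data and old parity), then compute the new parity
new_parity = old_parity ^ old_data ^ new_data

# Two disk writes: new data and new parity
disk_ios = 2 + 2
print(bin(new_parity), disk_ios)         # write penalty of 4 I/Os per write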
In RAID 6, which maintains dual parity, a disk write requires three read operations:
two parity and one data. After calculating both new parities, the controller
performs three write operations: two parity and one data write. Therefore, in a RAID 6
implementation, the controller performs six I/O operations for each write I/O, and
the write penalty is 6.
RAID Comparison
RAID Level | Minimum Disks | Storage Efficiency (%) | Write Penalty | Protection
1          | 2             | 50                     | 2             | Mirror
1+0        | 4             | 50                     | 2             | Mirror
Hot sparing refers to a process that temporarily replaces a failed disk drive with a
spare drive in a RAID array; the spare takes over the identity of the failed disk drive.
With the hot spare, one of the following methods of data recovery is performed
depending on the RAID implementation:
If parity RAID is used, the data is rebuilt onto the hot spare from the parity and
the data on the surviving disk drives in the RAID set.
If mirroring is used, the data from the surviving mirror is used to copy the data
onto the hot spare.
The image shows the RAID controller rebuilding data from a failed disk onto a hot spare, after which the failed disk is replaced.
Notes
When a new disk drive is added to the system, data from the hot spare is copied to
it. The hot spare returns to its idle state, ready to replace the next failed drive.
Alternatively, the hot spare replaces the failed disk drive permanently. This means
that it is no longer a hot spare, and a new hot spare must be configured on the
storage system.
A hot spare should be large enough to accommodate data from a failed drive.
Some systems implement multiple hot spares to improve data availability. A hot
spare can be configured as automatic or user initiated, which specifies how it will
be used in the event of disk failure.
In an automatic configuration, when the recoverable error rates for a disk exceed a
predetermined threshold, the disk subsystem tries to copy data from the failing disk
to the hot spare automatically. If this task is completed before the damaged disk
fails, the subsystem switches to the hot spare and marks the failing disk as
unusable.
Otherwise, it uses parity or the mirrored disk to recover the data. In the case of a
user-initiated configuration, the administrator has control of the rebuild process. For
example, the rebuild could occur overnight to prevent any degradation of system
performance. However, the system is at risk of data loss if another disk failure
occurs.
Introduction
This lesson covers different types of data access methods. It also covers types of
intelligent storage systems. Finally, this lesson covers the scale-up and scale-out
architectures.
Scale-out
Cluster
Notes
In scale-out, nodes can be added quickly to the cluster, when more performance
and capacity is needed, without causing any downtime. This provides the flexibility
to use many nodes of moderate performance and availability characteristics to
produce a total system that has better aggregate performance and availability.
Scale-out architecture pools the resources in the cluster and distributes the
workload across all the nodes. This results in linear performance improvements as
more nodes are added to the cluster.
Assessment
B. Distributed parity
C. No parity
D. Double parity
2. What is the stripe size of a five disk parity RAID 5 set that has a strip size of 64
KB?
A. 256 KB
B. 64 KB
C. 128 KB
D. 320 KB
Summary
Introduction
Components of a Controller
The image shows the components of a controller: the front end, cache, and back end, which connect compute systems (hypervisors running VMs) through a storage network to the storage.
The front end provides the interface between the storage system and the
compute system. It consists of two components:
Front-end ports
Front-end controllers
The image highlights the front end of the controller, which connects to the compute systems through the storage network.
Notes
Typically, a front end has redundant controllers for high availability. Plus, each
controller contains multiple ports that enable large numbers of compute systems to
connect to the intelligent storage system.
Each front-end controller has processing logic that executes the appropriate
transport protocol, such as Fibre Channel, iSCSI, FICON, or FCoE for storage
connections. Front-end controllers route data to and from cache through the
internal data bus. When the cache receives the write data, the controller sends an
acknowledgment message back to the compute system.
Component: Cache
The image highlights the cache component of the controller, sitting between the front end and the back end.
Notes
Rotating disks are the slowest component of an intelligent storage system. Data
access on rotating disks usually takes several milliseconds because of seek time
and rotational latency. Accessing data from cache is fast and typically takes less
than a millisecond. On intelligent storage systems, write data is first placed in
cache and then written to the storage.
When a compute system issues a read request, the storage controller reads the tag
RAM to determine whether the required data is available in cache.
If the requested data is found in the cache, it is called a read cache hit or read
hit
If the requested data is not found in cache, it is called a cache miss
The image shows two read scenarios: a read cache hit, where the requested data is found in cache and returned directly to the compute system, and a cache miss, where the read request is forwarded to the back-end storage.
Notes
When a compute system issues a read request, the storage controller reads the tag
RAM to determine whether the required data is available in cache. If the requested
data is found in the cache, it is called a read cache hit or read hit and data is sent
directly to the compute system, without any back-end storage operation. This
provides a fast response time to the compute system (about a millisecond).
If the requested data is not found in cache, it is called a cache miss and the data
must be read from the storage. The back end accesses the appropriate storage
device and retrieves the requested data. Data is then placed in cache and finally
sent to the compute system through the front end. Cache misses increase the I/O
response time.
Read performance is measured in terms of the read hit ratio, or the hit rate,
expressed as a percentage. This ratio is the number of read hits with respect to the
total number of read requests. A higher read hit ratio improves the read
performance.
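For example, the effective read response time can be estimated from the hit ratio and the cache and storage access times. The sketch below is illustrative only; the 0.5 ms cache time and 6 ms disk time are assumed example values.

def effective_read_time_ms(hit_ratio, cache_ms=0.5, disk_ms=6.0):
    # Hits are served from cache; misses pay the back-end storage access time
    return hit_ratio * cache_ms + (1 - hit_ratio) * disk_ms

print(effective_read_time_ms(0.90))   # 1.05 ms at a 90% read hit ratio
print(effective_read_time_ms(0.50))   # 3.25 ms at a 50% read hit ratio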
Write-through Cache
The image shows the write-through sequence: (1) the compute system writes data to cache, (2) the data is immediately written to storage, (3) storage acknowledges the write, and (4) an acknowledgment is sent to the compute system.
Write-back Cache
The image shows the write-back sequence: (1) the compute system writes data to cache, (2) an acknowledgment is sent to the compute system, (3) the data is later de-staged to storage, and (4) storage acknowledges the write.
Notes
Write-through cache
Data is placed in the cache and immediately written to the storage, and an
acknowledgment is sent to the compute system. Because data is committed to
storage as it arrives, the risks of data loss are low, but the write-response time is
longer because of the storage operations.
Write-back cache
Data is placed in cache and an acknowledgment is sent to the compute system
immediately. Later, the data from cache is committed (de-staged) to storage. Write-
response times are much faster, but uncommitted data is at risk of loss if a cache
failure occurs before de-staging completes.
Cache bypass
If the size of an I/O request exceeds a predefined size, called the write
aside size, writes are sent directly to storage. This reduces the impact of large
writes consuming a large cache space. This is particularly useful in an environment
where cache resources are constrained and cache is required for small random
I/Os.
With dedicated cache, separate sets of memory locations are reserved for reads
and writes. In global cache, both reads and writes can use any of the available
memory addresses. Cache management is more efficient in a global cache
implementation because only one global set of addresses has to be managed.
Global cache enables users to specify the percentages of cache available for reads
and writes for cache management. Typically, the read cache is small, but it should
be increased if the application being used is read-intensive. In other global cache
implementations, the ratio of cache available for reads versus writes is dynamically
adjusted based on the workloads.
Pre-fetch
Used when read requests are sequential
Contiguous set of associated blocks is retrieved
Significantly improves the response time experienced by the compute system
Notes
Even though modern intelligent storage systems come with a large amount of
cache, when all cache pages are filled, some pages have to be freed up to
accommodate new data and avoid performance degradation.
Least Recently Used (LRU): An algorithm that continuously monitors data access
in cache and identifies the cache pages that have not been accessed for a long
time. LRU either frees up these pages or marks them for reuse. This algorithm is
based on the assumption that data that has not been accessed for a while will not
be requested by the compute system. However, if a page contains write data that
has not yet been committed to storage, the data is first written to the storage before
the page is reused.
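A minimal sketch of the LRU idea, assuming a fixed number of cache pages (this is illustrative Python, not the algorithm used by any particular storage system): pages move to the most-recently-used end on every access, and the least recently used page is the one freed when a new page is needed.

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()          # key -> data, oldest first

    def access(self, key, data=None):
        if key in self.pages:
            self.pages.move_to_end(key)     # mark as most recently used
            return self.pages[key]
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        self.pages[key] = data
        return data

cache = LRUCache(capacity=2)
cache.access("A", "data-A")
cache.access("B", "data-B")
cache.access("A")                           # A becomes most recently used
cache.access("C", "data-C")                 # B is evicted, not A
print(list(cache.pages))                    # ['A', 'C']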
Cache is volatile memory; so a power failure or any kind of cache failure will cause
loss of the data that is not yet committed to the storage drive. This risk of losing
uncommitted data that is held in cache can be mitigated using cache mirroring and
cache vaulting:
Cache mirroring
Each write to cache is held in two different memory locations on two independent
memory cards. If a cache failure occurs, the write data will still be safe in the
mirrored location and can be committed to the storage drive. Reads are staged
from the storage drive to the cache; therefore, if a cache failure occurs, the data
can still be accessed from the storage drives. Because only writes are mirrored,
this method results in better utilization of the available cache.
Cache vaulting
The risk of data loss due to power failure can be addressed in various ways:
powering the memory with a battery until the AC power is restored or using battery
power to write the cache content to the storage drives. If an extended power failure
occurs, using batteries is not a viable option. This is because in intelligent storage
systems, large amounts of data might need to be committed to numerous storage
drives, and batteries might not provide power for sufficient time to write each piece
of data to its intended storage drive.
Therefore, storage vendors use a set of physical storage drives to dump the
contents of cache during power failure. This is called cache vaulting and the
storage drives are called vault drives. When power is restored, data from these
storage drives is written back to write cache and then written to the intended drives.
Back end provides an interface between cache and the physical storage drives;
it consists of two components:
Back-end ports
Back-end controllers
Back-end controls data transfers between cache and the physical drives
From cache, data is sent to the back end and then routed to the destination
storage drives
The image highlights the back end of the controller, which connects the cache to the storage.
Notes
Physical drives are connected to ports on the back end. The back-end controller
communicates with the storage drives when performing reads and writes and also
provides additional, but limited, temporary data storage. The algorithms that are
implemented on back-end controllers provide error detection and correction, along
with RAID functionality.
For high data protection and high availability, storage systems are configured with
dual controllers with multiple ports. Such configurations provide an alternative path
to physical storage drives if a controller or port failure occurs. This reliability is
further enhanced if the storage drives are also dual-ported. In that case, each drive
port can connect to a separate controller. Multiple controllers also facilitate load
balancing.
Storage
Physical storage drives are connected to the back-end storage controller and
provide persistent data storage.
The image highlights the storage component, where the physical drives connect to the controller's back end.
Notes
Workloads that have predictable access patterns typically work well with a
combination of HDDs and SSDs. If the workload changes, or constant high
performance is required for all the storage being presented, using SSDs can meet
the desired performance requirements.
Introduction
This lesson covers traditional and virtual provisioning processes. This lesson also
covers LUN expansion and LUN masking mechanisms.
Storage Provisioning
Definition: LUN
Each logical unit created from the RAID set is assigned a unique ID,
called a LUN. A LUN is also referred to as a volume, partition, or
device.
LUNs hide the organization and composition of the RAID set from the compute
systems
LUNs created by traditional storage provisioning methods are also referred to
as thick LUNs
Once allocated, a LUN appears to a host as an internal physical disk
Notes
RAID sets usually have a large capacity because they combine the total capacity of
individual drives in the set. Logical units are created from the RAID sets by
partitioning (seen as slices of the RAID set) the available capacity into smaller
units. These units are then assigned to the compute system based on their storage
requirements. Logical units are spread across all the physical drives that belong to
that set.
Each logical unit created from the RAID set is assigned a unique ID, called a logical
unit number (LUN). LUNs hide the organization and composition of the RAID set
from the compute systems. LUNs created by traditional storage provisioning
methods are also referred to as thick LUNs to distinguish them from the LUNs
created by virtual provisioning methods.
Virtual storage drives are files on the hypervisor file system. The virtual storage
drives are then assigned to virtual machines and appear as raw storage drive to
them. To make the virtual storage drive usable to the virtual machine, similar steps
are followed as in a non-virtualized environment. Here, the LUN space may be
shared and accessed simultaneously by multiple virtual machines.
Virtual machines can also access a LUN directly on the storage system. In this
method the entire LUN is allocated to a single virtual machine. Storing data in this
way is recommended when the applications running on the virtual machine are
response-time sensitive, and sharing storage with other virtual machines may
impact their response time. The direct access method is also used when a virtual
machine is clustered with a physical machine. In this case, the virtual machine is
required to access the LUN that is being accessed by the physical machine.
Traditional Provisioning
The illustration shows a RAID set consisting of five storage drives that have been
sliced, or partitioned, into two LUNs: LUN 0 and LUN 1. These LUNs are then
assigned to Compute 1 and Compute 2 for their storage requirements.
(Figure: Traditional provisioning – a five-drive RAID set is partitioned into LUN 0 and LUN 1; LUN 0 is presented to Compute 1 and LUN 1 to Compute 2 across the storage network.)
Notes
For traditional provisioning, the number of drives in the RAID set and the RAID
level determine the availability, capacity, and performance of the RAID set. It is
highly recommended to create the RAID set from drives of the same type, speed,
and capacity to ensure maximum usable capacity, reliability, and consistency in
performance.
For example, if drives of different capacities are mixed in a RAID set, the capacity
of the smallest drive is used from each drive in the set to make up the RAID set’s
overall capacity. The remaining capacity of the larger drives remains unused.
Likewise, mixing higher speed drives with lower speed drives lowers the overall
performance of the RAID set.
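To make this concrete, here is a small illustrative calculation (the drive sizes, the
single parity drive, and the helper name are assumptions, not from the course)
showing how much capacity is usable, and how much is stranded, when a larger
drive is mixed into a RAID set.

# Illustrative sketch: usable capacity of a RAID set built from mixed-size drives.
# The RAID set can only use the smallest drive's capacity from every member drive.

def raid_set_capacity(drive_sizes_gb, parity_drives=1):
    """Return (raw, usable, stranded) capacity in GB for a RAID set.

    drive_sizes_gb : capacities of the member drives
    parity_drives  : drives' worth of capacity consumed by parity (e.g., 1 for RAID 5)
    """
    smallest = min(drive_sizes_gb)
    raw = smallest * len(drive_sizes_gb)             # capacity actually used per drive
    usable = smallest * (len(drive_sizes_gb) - parity_drives)
    stranded = sum(drive_sizes_gb) - raw             # capacity wasted on larger drives
    return raw, usable, stranded

# Five drives, one of them larger than the rest (hypothetical sizes).
raw, usable, stranded = raid_set_capacity([600, 600, 600, 600, 900], parity_drives=1)
print(f"raw used: {raw} GB, usable: {usable} GB, stranded: {stranded} GB")
# -> raw used: 3000 GB, usable: 2400 GB, stranded: 300 GB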
Virtual Provisioning
Virtual provisioning enables creating and presenting a LUN with more capacity
than is physically allocated to it on the storage system
The LUN created using virtual provisioning is called a thin LUN to distinguish it
from the traditional LUN
Thin LUNs do not require physical storage to be completely allocated to them at
the time they are created and presented to a compute system
(Figure: Virtual provisioning – a storage pool in the storage system backs two thin LUNs. Thin LUN 0 reports 10 TB to Compute 1 with only 3 TB allocated; Thin LUN 1 reports 10 TB to Compute 2 with only 4 TB allocated.)
Notes
With virtual provisioning, physical storage is allocated to a thin LUN from a shared
pool only when it is required, as the storage requirements of the compute systems
grow. Multiple shared pools can be created within a storage system, and a shared
pool may be shared by multiple thin LUNs.
A storage pool comprises physical drives that provide the physical storage that
is used by Thin LUNs
A storage pool is created by specifying a set of drives and a RAID type for
that pool
Thin LUNs are then created out of that pool (similar to traditional LUN created
on a RAID set)
All the Thin LUNs created from a pool share the storage resources of that
pool
Adding drives to a storage pool increases the available shared capacity for
all the Thin LUNs in the pool
Drives can be added to a storage pool while the pool is used in production
The allocated capacity is reclaimed by the pool when Thin LUNs are
destroyed (the sketch below illustrates this pool behavior)
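The pool behavior listed above can be sketched in a few lines (a toy model with
invented extent counts and class names; real storage systems track allocation in
fixed-size extents or slices):

# Minimal sketch of on-demand allocation from a shared storage pool to thin LUNs.
class StoragePool:
    def __init__(self, capacity_extents):
        self.free_extents = capacity_extents

    def allocate(self, extents):
        if extents > self.free_extents:
            raise RuntimeError("pool exhausted")
        self.free_extents -= extents
        return extents

    def reclaim(self, extents):
        self.free_extents += extents


class ThinLUN:
    def __init__(self, pool, reported_capacity_extents):
        self.pool = pool
        self.reported = reported_capacity_extents    # what the compute system sees
        self.allocated = 0                           # what is physically consumed

    def write(self, extents):
        # Physical capacity is drawn from the pool only when data is written.
        self.allocated += self.pool.allocate(extents)

    def destroy(self):
        self.pool.reclaim(self.allocated)
        self.allocated = 0


pool = StoragePool(capacity_extents=1000)
lun = ThinLUN(pool, reported_capacity_extents=5000)  # reported capacity > physical
lun.write(120)
print(pool.free_extents)   # 880
lun.destroy()
print(pool.free_extents)   # 1000 – allocated capacity reclaimed by the pool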
(Figure: Storage pool expansion and thin LUN expansion – adding storage drives to the pool triggers thin pool rebalancing; after the pool is expanded, a thin LUN's user capacity can be increased beyond its in-use capacity.)
When a storage pool is expanded, the sudden introduction of new, empty drives
combined with relatively full existing drives causes a data imbalance. This imbalance
is resolved by automating a one-time data relocation, referred to as rebalancing.
Storage pool rebalancing is a technique that automatically relocates extents (the
minimum amount of physical storage capacity that is allocated to a thin LUN from
the pool) across the physical storage drives of the entire pool when new drives are
added to the pool.
Storage pool rebalancing restripes data across all the drives (both existing and new)
in the storage pool. This spreads the data equally over all the physical drives within
the storage pool, ensuring that the used capacity of each drive is uniform across the
pool. After the storage pool capacity is increased, the capacity of the existing LUNs
can be expanded.
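A simplified sketch of the rebalancing idea follows (drive and extent counts are
invented for illustration): after new, empty drives join the pool, extents are
redistributed so that the used capacity per drive is roughly uniform.

# Illustrative sketch: restriping extents evenly across all drives after expansion.
def rebalance(used_extents_per_drive, new_drive_count):
    """Return per-drive extent counts after adding empty drives and rebalancing."""
    drives = used_extents_per_drive + [0] * new_drive_count
    total = sum(drives)
    base, remainder = divmod(total, len(drives))
    # Spread the total as evenly as possible over existing and new drives.
    return [base + (1 if i < remainder else 0) for i in range(len(drives))]

before = [90, 88, 92, 90]          # four relatively full drives (extents in use)
after = rebalance(before, new_drive_count=2)
print(after)                        # [60, 60, 60, 60, 60, 60] – uniform used capacity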
(Figure: Capacity comparison – with traditional provisioning only 150 GB remains available; with virtual provisioning 1,650 GB (1.65 TB) remains available.)
Notes
With traditional provisioning, three LUNs are created and presented to one or more
compute systems. The total storage capacity of the storage system is 2 TB. The
allocated capacity of LUN 1 is 500 GB, of which only 100 GB is consumed, and the
remaining 400 GB is unused. The size of LUN 2 is 550 GB, of which 50 GB is
consumed, and 500 GB is unused. The size of LUN 3 is 800 GB, of which 200 GB
is consumed, and 600 GB is unused.
In total, the storage system has 350 GB of data, 1.5 TB of allocated but unused
capacity, and only 150 GB of remaining capacity available for other applications.
Now consider the same 2 TB storage system with virtual provisioning. Here, three
thin LUNs of the same sizes are created. However, there is no allocated unused
capacity. In total, the storage system with virtual provisioning has the same 350 GB
of data, but 1.65 TB of capacity is available for other applications, whereas only
150 GB is available in traditional storage provisioning.
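The arithmetic in this comparison can be reproduced with the short calculation
below, using only the figures given in the example.

# Reproducing the capacity comparison from the example above.
total_capacity_gb = 2000                      # 2 TB storage system
luns = {"LUN 1": (500, 100), "LUN 2": (550, 50), "LUN 3": (800, 200)}  # (size, consumed)

data = sum(consumed for _, consumed in luns.values())                  # 350 GB
allocated = sum(size for size, _ in luns.values())                     # 1850 GB
allocated_unused = allocated - data                                    # 1500 GB = 1.5 TB

traditional_available = total_capacity_gb - allocated                  # 150 GB
virtual_available = total_capacity_gb - data                           # 1650 GB = 1.65 TB

print(data, allocated_unused, traditional_available, virtual_available)
# -> 350 1500 150 1650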
Virtual provisioning and thin LUN offer many benefits, although in some cases
traditional LUN is better suited for an application. Thin LUNs are appropriate for
applications that can tolerate performance variations. In some cases, performance
improvement is perceived when using a thin LUN, due to striping across a large
number of drives in the pool. However, when multiple thin LUNs contend for shared
storage resources in a given pool, and when utilization reaches higher levels, the
performance can degrade. Thin LUNs provide the best storage space efficiency
and are suitable for applications where space consumption is difficult to forecast.
Using thin LUNs benefits organizations in reducing power and acquisition costs and
in simplifying their storage management.
Traditional LUNs are suited for applications that require predictable performance.
Traditional LUNs provide full control for precise data placement and allow an
administrator to create LUNs on different RAID groups if there is any workload
contention. Organizations that are not highly concerned about storage space
efficiency may still use traditional LUNs. Both traditional and thin LUNs can coexist
in the same storage system. Based on the requirement, an administrator may
migrate data between thin and traditional LUNs.
LUN Masking
Notes
The LUN masking function is implemented on the storage system. It ensures that
volume access by a compute system is controlled appropriately, preventing
unauthorized or accidental use in a shared environment.
For example, consider a storage system with two LUNs that store data of the sales
and finance departments. Without LUN masking, both departments can easily see
and modify each other’s data, posing a high risk to data integrity and security. With
LUN masking, LUNs are accessible only to the designated compute systems.
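As a rough illustration of the access-control idea (the host and LUN names are
hypothetical), LUN masking can be thought of as a lookup table on the storage
system that lists which compute systems may access which LUNs.

# Minimal sketch of LUN masking: the storage system only exposes a LUN to
# compute systems that appear in its masking table.
masking_table = {
    "LUN_SALES":   {"compute_sales_01"},
    "LUN_FINANCE": {"compute_finance_01"},
}

def can_access(compute_system, lun):
    return compute_system in masking_table.get(lun, set())

print(can_access("compute_sales_01", "LUN_SALES"))     # True
print(can_access("compute_sales_01", "LUN_FINANCE"))   # False – masked from this host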
Introduction
Storage Tiering
Notes
For example, if a policy states "move the data that has not been accessed for the
last 30 minutes to the lower tier," then all the data matching this condition is moved
to the lower tier.
The process of moving the data from one tier to another is typically automated. In
automated storage tiering, the application workload is proactively monitored; the
active data is automatically moved to a higher performance tier, and the inactive
data is moved to a higher capacity, lower performance tier. The data movement
between the tiers is performed non-disruptively.
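A minimal, policy-driven sketch of this idea follows; the 30-minute threshold mirrors
the example policy, while the data items and helper names are invented for
illustration.

# Illustrative sketch of an automated tiering policy:
# "move data not accessed for the last 30 minutes to the lower tier".
import time

POLICY_IDLE_SECONDS = 30 * 60

def apply_tiering_policy(items, now=None):
    """items: dict of name -> {'tier': 0 or 1, 'last_access': epoch seconds}."""
    now = now or time.time()
    for name, meta in items.items():
        idle = now - meta["last_access"]
        if meta["tier"] == 0 and idle > POLICY_IDLE_SECONDS:
            meta["tier"] = 1          # demote inactive data to the lower tier
        elif meta["tier"] == 1 and idle <= POLICY_IDLE_SECONDS:
            meta["tier"] = 0          # promote recently active data to the higher tier

now = time.time()
data = {
    "orders_db":   {"tier": 0, "last_access": now - 10},        # active
    "old_reports": {"tier": 0, "last_access": now - 3 * 3600},  # idle for 3 hours
}
apply_tiering_policy(data, now)
print({k: v["tier"] for k, v in data.items()})   # {'orders_db': 0, 'old_reports': 1}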
The process of storage tiering within a storage system is called intra-array storage
tiering. It enables the efficient use of SSD, FC, and SATA drives within a system.
The goal is to keep the SSDs busy by storing the most frequently accessed data on
them, while moving the less frequently accessed data out to the SATA drives.
(Figure: LUN tiering – an entire LUN with inactive data is moved from tier 0 to tier 1, and an entire LUN with active data is moved from tier 1 to tier 0 for improved performance. Sub-LUN tiering – only the active data of a LUN is placed on tier 0, while inactive data is placed on tier 1.)
Data movements that are executed between tiers can be performed at the LUN
level or at the sub-LUN level. The performance can be further improved by
implementing tiered cache.
Notes
Traditionally, storage tiering operates at the LUN level, moving an entire LUN from
one tier of storage to another. This movement includes both the active and the
inactive data in that LUN. This method does not give effective cost and
performance benefits.
Today, storage tiering can be implemented at the sub-LUN level. In sub-LUN level
tiering, a LUN is broken down into smaller segments and tiered at that level.
Movement of data with much finer granularity, for example 8 MB, greatly enhances
the value proposition of automated storage tiering. Tiering at the sub-LUN level
effectively moves active data to faster drives and less active data to slower drives.
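The sketch below (purely illustrative; the extent size follows the 8 MB example, the
threshold and access counts are assumptions) shows the sub-LUN idea: activity is
tracked per extent, and only the hot extents are placed on the faster tier.

# Illustrative sketch of sub-LUN tiering: track I/O per 8 MB extent and move
# only the hot extents to the SSD tier, leaving cold extents on slower drives.
EXTENT_MB = 8
HOT_THRESHOLD = 100        # I/Os observed during the monitoring window (assumed)

def place_extents(io_counts):
    """io_counts: list of per-extent I/O counts for one LUN."""
    placement = []
    for idx, ios in enumerate(io_counts):
        tier = "SSD" if ios >= HOT_THRESHOLD else "HDD"
        placement.append((idx * EXTENT_MB, tier))
    return placement

# One LUN whose extents have very different activity levels.
io_per_extent = [500, 3, 0, 250, 12]
for offset_mb, tier in place_extents(io_per_extent):
    print(f"extent at {offset_mb} MB -> {tier}")
# Only the extents at 0 MB and 24 MB land on SSD; the rest stay on HDD.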
Cache Tiering
(Figure: Tiered cache in a storage system – SSDs form a secondary cache tier below the primary DRAM cache.)
Notes
Tiering is also implemented at the cache level. A large cache in a storage system
improves performance by retaining a large amount of frequently accessed data in
cache, so most reads are served directly from the cache. However, configuring a
large cache in the storage system increases cost.
An alternative way to increase the size of the cache is by utilizing the SSDs on the
storage system. In cache tiering, SSDs are used to create a large capacity
secondary cache and to enable tiering between DRAM (primary cache) and SSDs
(secondary cache).
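A toy sketch of the two-level cache idea follows (cache sizes and class names are
assumptions): reads are served from the DRAM primary cache when possible, then
from the SSD secondary cache, and only then from the back-end drives.

# Toy sketch of cache tiering: DRAM primary cache backed by an SSD secondary cache.
from collections import OrderedDict

class TieredCache:
    def __init__(self, dram_slots, ssd_slots):
        self.dram = OrderedDict()     # small, fastest tier
        self.ssd = OrderedDict()      # larger secondary cache on SSD
        self.dram_slots, self.ssd_slots = dram_slots, ssd_slots

    def read(self, block):
        if block in self.dram:
            return "dram hit"
        if block in self.ssd:
            self._promote(block)       # frequently read data moves up to DRAM
            return "ssd hit"
        self._insert(block)            # miss – fetched from the back-end drives
        return "miss (read from drives)"

    def _promote(self, block):
        self.ssd.pop(block)
        self._insert(block)

    def _insert(self, block):
        self.dram[block] = True
        if len(self.dram) > self.dram_slots:
            evicted, _ = self.dram.popitem(last=False)   # oldest DRAM entry drops to SSD
            self.ssd[evicted] = True
            if len(self.ssd) > self.ssd_slots:
                self.ssd.popitem(last=False)

cache = TieredCache(dram_slots=2, ssd_slots=4)
print(cache.read("A"), cache.read("B"), cache.read("C"))  # misses; A is pushed to SSD
print(cache.read("A"))                                     # ssd hit – promoted to DRAM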
Storage as a Service
(Figure: VM instances running business applications on compute systems. To gain a cost advantage, organizations may move their ...)
Introduction
This section highlights technologies that are relevant to the topics covered in this
module.
Concepts in Practice
Concepts in Practice
XtremIO storage systems are created from building blocks called "X-Bricks" that
are each a high-availability, high-performance, fully active/active storage system
with no single point of failure. XtremIO's powerful operating system, XIOS,
manages the XtremIO storage cluster. XIOS ensures that the system remains
balanced and always delivers the highest levels of performance with no
administrator intervention.
PowerMax will deliver up to 25% better response times with NVMe Flash drives.
The combination of NVMe and SCM will unlock even greater performance reaching
up to 50% better response times.
The easiest way to describe CloudIQ is that it is like a fitness tracker for your
storage environment, providing a single, simple, display to monitor and predict the
health of your storage environment. CloudIQ makes it simple to track storage
health, report on historical trends, plan for future growth, and proactively discover
and remediate issues from any browser or mobile device.
Storage Center does not require users to pre-allocate space. Storage is pooled,
ensuring space is available when and where it is needed. You can even reclaim
capacity that is no longer in use by applications, automatically reduce the space
needed for virtual OS volumes, and thin-import volumes from legacy storage to
improve capacity utilization.
Assessment
C. LUN tiering
D. Sub-LUN tiering
2. Which is a process that provides data access control by defining which LUNs a
compute system can access?
A. LUN masking
B. Tiering
C. Virtual provisioning
D. Thin LUN
Summary
Introduction
This module presents an overview of Fibre Channel Storage Area Network (FC
SAN), its components and architecture. It also focuses on FC SAN topologies, and
zoning along with describing virtualization process in FC SAN environment.
Introduction
This lesson presents definition of SAN and its benefits and requirements.
Introduction to SAN
Definition: SAN
A network whose primary purpose is the transfer of data between
computer systems and storage devices and among storage devices.
Source: Storage Networking Industry Association
Storage Area Network (SAN) is a network that primarily connects the storage
systems with the compute systems and also connects the storage systems with
each other. It enables multiple compute systems to access and share storage
resources. It also enables data transfer between the storage systems. With a
long-distance SAN, data transfer can be extended across geographic locations.
A SAN usually provides access to block-based storage systems.
Benefits of SAN
Notes
Introduction
FC SAN Overview
FC SAN Overview
(Figure: Compute systems running hypervisors connected through an FC SAN to storage systems.)
Notes
Fibre Channel SAN (FC SAN) uses the Fibre Channel (FC) protocol for
communication. FC protocol (FCP) is used to transport data, commands, and
status information between the compute systems and the storage systems. It is
also used to transfer data between the storage systems. FC is a high-speed
network technology that runs on high-speed optical fiber cables and serial copper
cables. The FC technology was developed to meet the demand for increased
speed of data transfer between compute systems and mass storage systems. In
comparison with Ultra-Small Computer System Interface (Ultra-SCSI), which is
commonly used in DAS environments, FC is a significant leap in storage
networking technology. Note: The spelling "fibre" refers to the protocol, whereas
"fiber" refers to the physical medium.
Components of FC SAN
Network adapters: FC HBAs in compute systems and front-end adapters in the
storage system
Cables:
Copper cables for short distances
Optical fiber cables for long distances; two types:
o Multimode
o Single-mode
Interconnecting devices
(Figure: Cross-section of a single-mode fibre showing the core, the cladding, and light entering the core.)
Notes
Network Adapters
In an FC SAN, the end devices, such as compute systems and storage systems
are all referred to as nodes. Each node is a source or destination of information.
Each node requires one or more network adapters to provide a physical interface
for communicating with other nodes. Examples of network adapters are FC host
bus adapters (HBAs) and storage system front-end adapters. An FC HBA has
SCSI-to-FC processing capability. It encapsulates operating system or hypervisor
storage I/Os (usually SCSI I/O) into FC frames before sending the frames to the FC
storage systems over an FC SAN.
Cables
FC SAN implementations primarily use optical fiber cabling. Copper cables may be
used for shorter distances because they provide an acceptable signal-to-noise ratio
for distances up to 30 meters. Optical fiber cables carry data in the form of light.
There are two types of optical cables: multimode and single-mode. Multimode fiber
(MMF) cable carries multiple beams of light that are projected at different angles
simultaneously onto the core of the cable. In an MMF transmission, multiple light
beams traveling inside the cable tend to disperse and collide. This collision
weakens the signal strength after it travels a certain distance – a process that is
known as modal dispersion. Due to modal dispersion, an MMF cable is typically
used for short distances, commonly within a data center.
Single-mode fiber (SMF) carries a single ray of light that is projected at the center
of the core. The small core and the single light wave help to limit modal dispersion.
Single-mode provides minimum signal attenuation over maximum distance (up to
10 km). A single-mode cable is used for long-distance cable runs, and the distance
usually depends on the power of the laser at the transmitter and the sensitivity of
the receiver. A connector is attached at the end of a cable to enable swift
connection and disconnection of the cable to and from a port. A standard connector
(SC) and a lucent connector (LC) are two commonly used connectors for fiber optic
cables.
FC Interconnecting Devices
Notes
FC switches are more intelligent than FC hubs and directly route data from one
physical port to another. Therefore, the nodes do not share the data path. Instead,
each node has a dedicated communication path. The FC switches are commonly
available with a fixed port count. Some of the ports can be active for operational
purpose and the rest remain unused. The number of active ports can be scaled-up
non-disruptively. Some of the components of a switch such as power supplies and
fans are redundant and hot-swappable. Hot-swappable means components can be
replaced while a device is powered-on and remains in operation.
FC directors are high-end switches with a higher port count. A director has a
modular architecture, and its port count is scaled up by inserting extra line cards or
blades into the director's chassis. Directors contain redundant components with
automated failover capability. Their key components, such as switch controllers,
blades, power supplies, and fan modules, are all hot-swappable. These features
ensure high availability for business-critical applications.
FC Interconnecting Options
Point-to-Point
In this configuration, two nodes are connected directly to each other. This
configuration provides a dedicated connection for data transmission between
nodes. However, the point-to-point configuration offers limited connectivity and
scalability and is used in a DAS environment.
Fibre Channel Arbitrated Loop (FC-AL)
In this configuration, the devices are attached to a shared loop. Each device
contends with other devices to perform I/O operations. The devices on the loop
must “arbitrate” to gain control of the loop. At any given time, only one device can
perform I/O operations on the loop. Because each device in a loop must wait for its
turn to process an I/O request, the overall performance in FC-AL environments is
low.
(Figure: FC-AL configuration – compute systems and a storage system attached to a shared loop through an FC hub.)
(Figure: FC switched fabric – compute systems and a storage system connected through two FC switches linked by an interswitch link (ISL).)
Port Description
N_Port: An end point in the fabric. This port is also known as the node port.
Typically, it is a compute system port (FC HBA port) or a storage system port that is
connected to a switch in a switched fabric.
E_Port: A port that forms the connection between two FC switches. This port is also
known as the expansion port. The E_Port on an FC switch connects to the E_Port
of another FC switch in the fabric through ISLs.
(Figure: FC SAN ports – N_Ports on the compute and storage systems connect to F_Ports on the FC switches; the switches are joined by an ISL.)
Organizations are adopting the NVMe protocol to access SSDs over the PCIe bus
NVMe over FC is designed to transfer NVMe-based data over an FC network
Reduces latency and improves the performance of SSDs
The FC protocol maps NVMe (an upper layer protocol) to the lower FC layers for the
data transfer
FC Architecture Lesson
Introduction
FC SAN Architecture
FC Architecture Overview
Notes
The FC architecture represents true channel and network integration and captures
some of the benefits of both channel and network technology. FC protocol provides
both the channel speed for data transfer with low protocol overhead and the
scalability of network technology. FC provides a serial data transfer interface that
operates over copper wire and optical fiber.
FC Protocol Stack
(Figure: FC protocol stack – FC-4 is the mapping interface that maps an upper layer protocol, for example SCSI, to the lower FC layers; FC-1 performs encode/decode.)
Notes
FC-4 Layer: It is the uppermost layer in the FCP stack. This layer defines the
application interfaces and the way Upper Layer Protocols (ULPs) are mapped to
the lower FC layers. The FC standard defines several protocols that can operate on
the FC-4 layer. Some of the protocols include SCSI, High Performance Parallel
Interface (HIPPI) Framing Protocol, ESCON, Asynchronous Transfer Mode (ATM),
and IP.
FC-1 Layer: It defines how data is encoded prior to transmission and decoded
upon receipt. At the transmitter node, an 8-bit character is encoded into a 10-bit
transmission character. This character is then transmitted to the receiver node. At
the receiver node, the 10-bit character is passed to the FC-1 layer, which decodes
the 10-bit character into the original 8-bit character. FC links with a speed of 10
Gbps and above use the 64b/66b encoding algorithm. This layer also defines the
transmission words, such as the FC frame delimiters, which identify the start and
the end of a frame, and the primitive signals that indicate events at a transmitting
port. In addition, the FC-1 layer performs link initialization and error
recovery.
FC-0 Layer: It is the lowest layer in the FCP stack. This layer defines the physical
interface, media, and transmission of bits. The FC-0 specification includes cables,
connectors, and optical and electrical parameters for various data rates. The FC
transmission can use both electrical and optical media.
Notes
An FC address is dynamically assigned when a node port logs on to the fabric. The
FC address has a distinct format, as shown on the image. The first field of the FC
address contains the domain ID of the switch. A domain ID is a unique number that
is provided to each switch in the fabric. The area ID is used to identify a group of
switch ports that are used for connecting nodes.
An example of a group of ports with common area ID is a port card on the switch.
The last field, the port ID, identifies the port within the group. The FC address size
is 24 bits. The primary purpose of an FC address is routing data through the fabric.
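Because the 24-bit FC address is simply three 8-bit fields, a tiny illustrative helper
(the address value is hypothetical) can pull the domain ID, area ID, and port ID out
of an address.

# Illustrative sketch: decompose a 24-bit FC address into its three 8-bit fields.
def parse_fc_address(addr):
    """addr: 24-bit FC address as an int, e.g. 0x0A1B2C."""
    domain_id = (addr >> 16) & 0xFF   # identifies the switch in the fabric
    area_id   = (addr >> 8) & 0xFF    # identifies a group of switch ports (e.g. a port card)
    port_id   = addr & 0xFF           # identifies the port within the group
    return domain_id, area_id, port_id

print(parse_fc_address(0x0A1B2C))     # (10, 27, 44) – hypothetical address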
(Figure: WWN structure examples – a storage system WWN 50:06:01:60:00:60:01:B2 and an FC HBA WWN 10:00:00:00:C9:20:DC:40, showing fields such as format type, company ID (24 bits), port, and model seed (32 bits).)
Notes
The name server in an FC SAN environment keeps the association of WWNs to the
dynamically created FC addresses of node ports. The illustration shows WWN
structure examples for a storage system and an HBA.
Notes
Exchange: An exchange operation enables two node ports to identify and manage
a set of information units. Each upper layer protocol (ULP) has its protocol-specific
information that must be sent to another port to perform certain operations. This
protocol-specific information is called an information unit. The structure of these
information units is defined in the FC-4 layer. This unit maps to a sequence. An
exchange is composed of one or more sequences.
Sequence: A sequence refers to a contiguous set of frames that are sent from one
port to another. A sequence corresponds to an information unit, as defined by the
ULP.
Frame: A frame is the fundamental unit of data transfer at FC-2 layer. An FC frame
consists of five parts: start of frame (SOF), frame header, data field, cyclic
redundancy check (CRC), and end of frame (EOF). The SOF and EOF act as
delimiters. The frame header is 24 bytes long and contains addressing information
for the frame. The data field in an FC frame contains the data payload, up to 2,112
bytes of actual data – usually the SCSI data. The CRC checksum facilitates error
detection for the content of the frame. This checksum verifies data integrity by
checking whether the content of the frames is received correctly. The CRC
checksum is calculated by the sender before encoding at the FC-1 layer. Similarly,
it is calculated by the receiver after decoding at the FC-1 layer.
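To make the frame layout tangible, here is a rough sketch that follows the sizes
given above (24-byte header, up to 2,112 bytes of data); the SOF/EOF values and
the CRC routine are placeholders, not the real FC-2 encodings.

# Rough sketch of an FC frame layout: SOF + 24-byte header + data (<= 2112 bytes)
# + CRC + EOF. Delimiter values are placeholders, not real FC-2 orderings.
import zlib

MAX_PAYLOAD = 2112
SOF = b"\x01\x02\x03\x04"   # placeholder start-of-frame delimiter
EOF = b"\x05\x06\x07\x08"   # placeholder end-of-frame delimiter

def build_frame(header_24_bytes, payload):
    assert len(header_24_bytes) == 24, "FC frame header is 24 bytes"
    assert len(payload) <= MAX_PAYLOAD, "payload exceeds 2,112 bytes"
    crc = zlib.crc32(header_24_bytes + payload).to_bytes(4, "big")  # stand-in CRC
    return SOF + header_24_bytes + payload + crc + EOF

frame = build_frame(bytes(24), b"SCSI data goes here")
print(len(frame))   # 4 + 24 + 19 + 4 + 4 = 55 bytes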
Notes
Process Login (PRLI): It is also performed between two N_Ports. This login
relates to the FC-4 ULPs, such as SCSI. If the ULP is SCSI, N_Ports exchange
SCSI-related service parameters.
Introduction
This lesson presents FC SAN topologies such as single-switch, mesh, and core-
edge. This lesson also focuses on the types of zoning.
Single-switch Topology
(Figure: Single-switch topology – compute systems and a storage system all connected to the same FC director.)
Notes
In a single-switch topology, the fabric consists of only a single switch. Both the
compute systems and the storage systems are connected to the same switch. A
key advantage of a single-switch fabric is that it does not need to use any switch
port for ISLs. Therefore, every switch port is usable for compute system or storage
system connectivity. Further, this topology helps eliminate FC frames traveling over
the ISLs and therefore eliminates the ISL delays.
Mesh Topology
(Figure: Mesh topology – compute systems and storage systems connected through multiple interconnected FC switches.)
Notes
In a full mesh, every switch is connected to every other switch in the topology.
A full mesh topology may be appropriate when the number of switches that are
involved is small. A typical deployment would involve up to four switches or
directors, with each of them servicing highly localized compute-to-storage traffic. In
a full mesh topology, a maximum of one ISL or hop is required for compute-to-
storage traffic.
However, with the increase in the number of switches, the number of switch ports
that are used for ISL also increases. This process reduces the available switch
ports for node connectivity.
In a partial mesh topology, not all the switches are connected to every other switch.
In this topology, several hops or ISLs may be required for the traffic to reach its
destination.
Partial mesh offers more scalability than full mesh topology. However, without
proper placement of compute and storage systems, traffic management in a partial
mesh fabric might be complicated. Also, ISLs could become overloaded due to
excessive traffic aggregation.
Core-Edge Topology
Notes
The core tier is composed of directors that ensure high fabric availability. Also,
typically all traffic must either traverse this tier or terminate at this tier. In this
configuration, all storage systems are connected to the core tier, enabling compute-
to-storage traffic to traverse only one ISL. Compute systems that require high
performance may be connected directly to the core tier and therefore avoid ISL
delays. The core-edge topology increases connectivity within the FC SAN while
conserving overall port utilization. It eliminates the need to connect edge
switches to other edge switches over ISLs.
Reduction of ISLs can greatly increase the number of node ports that can be
connected to the fabric. If fabric expansion is required, then administrators would
need to connect extra edge switches to the core. The core of the fabric is also
extended by adding more switches or directors at the core tier. Based on the
number of core-tier switches, this topology has different variations, such as single-
core topology and dual-core topology. To transform a single-core topology to dual-
core, new ISLs are created to connect each edge switch to the new core switch in
the fabric.
Link Aggregation
(Figure: Two FC switches connected by ISLs; traffic for the port-pair {H1, S1} is shown without and with link aggregation.)
Notes
Link aggregation combines two or more parallel ISLs into a single logical ISL,
called a port-channel, yielding higher throughput than a single ISL could provide.
Example
Notes
Four HBA ports H1, H2, H3, and H4 have been configured to generate I/O
activity to four storage system ports S1, S2, S3, and S4 respectively.
The HBAs and the storage systems are connected to two separate FC switches
with three ISLs between the switches.
Let us assume that the bandwidth of each ISL is 8 Gb/s and that the data
transmission rates for the port-pairs {H1,S1}, {H2,S2}, {H3,S3}, and {H4,S4} are
5 Gb/s, 1.5 Gb/s, 2 Gb/s, and 4.5 Gb/s respectively.
Without link aggregation, the fabric typically assigns a particular ISL for each of the
port-pairs in a round-robin fashion. It is possible that port-pairs {H1,S1} and {H4,S4}
are assigned to the same ISL in their respective routes. The other two ISLs are
assigned to the port-pairs {H2,S2} and {H3,S3}. Two of the three ISLs are under-
utilized, whereas the third ISL is saturated and becomes a performance bottleneck
for the port-pairs assigned to it.
The example on the right has aggregated the three ISLs into a port-channel that
provides throughput up to 24 Gb/s. Network traffic for all the port-pairs are
distributed over the ISLs in the port-channel, which ensures even ISL utilization.
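The numbers in this example can be worked through with the short calculation
below (illustrative only), contrasting a round-robin ISL assignment with a single
24 Gb/s port-channel.

# Reworking the example: three 8 Gb/s ISLs, four port-pairs with fixed demands.
ISL_GBPS = 8
demands = {"H1-S1": 5.0, "H2-S2": 1.5, "H3-S3": 2.0, "H4-S4": 4.5}

# Without aggregation: suppose round-robin lands H1-S1 and H4-S4 on the same ISL.
isl_load = {"ISL1": demands["H1-S1"] + demands["H4-S4"],   # 9.5 Gb/s > 8 Gb/s: saturated
            "ISL2": demands["H2-S2"],                      # 1.5 Gb/s: under-utilized
            "ISL3": demands["H3-S3"]}                      # 2.0 Gb/s: under-utilized
print(isl_load)

# With link aggregation: one logical port-channel of 3 x 8 = 24 Gb/s.
port_channel_capacity = 3 * ISL_GBPS
total_demand = sum(demands.values())                       # 13 Gb/s
print(total_demand <= port_channel_capacity)               # True – traffic spread evenly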
Zoning
Definition: Zoning
A logical private path between node ports in a fabric.
(Figure: Zone 1 contains an FC HBA port of a compute system and a storage system port connected through the FC SAN.)
Each zone contains members (FC HBA and storage system ports)
Benefits:
Security
Restricts RSCN traffic
Notes
Zoning is a logical private path between node ports in a fabric. Whenever a change
takes place in the name server database, the fabric controller sends a Registered
State Change Notification (RSCN) to all the nodes impacted by the change. If
zoning is not configured, the fabric controller sends the RSCN to all the nodes in
the fabric. Involving the nodes that are not impacted by the change increases the
amount of fabric-management traffic.
For a large fabric, the amount of FC traffic generated due to this process can be
significant and might impact the compute-to-storage data traffic. Zoning helps to
limit the number of RSCNs in a fabric. In the presence of zoning, a fabric sends the
RSCN to only those nodes in a zone where the change has occurred.
Zoning also provides access control, along with other access control mechanisms,
such as LUN masking. Zoning provides control by enabling only the members in
the same zone to establish communication with each other.
Zone members, zones, and zone sets form the hierarchy that is defined in the
zoning process. A zone set is composed of a group of zones that can be activated
or deactivated as a single entity in a fabric. Multiple zone sets may be defined in a
fabric, but only one zone set can be active at a time.
Members are the nodes within the FC SAN that can be included in a zone. FC
switch ports, FC HBA ports, and storage system ports can be members of a zone.
A port or node can be a member of multiple zones. Nodes that are distributed
across multiple switches in a switched fabric may also be grouped into the same
zone. Zone sets are also referred to as zone configurations.
Types of Zoning
(Figure: Types of zoning on a fabric with switch domain ID 15. Compute systems with WWNs 10:00:00:00:C9:20:DC:40, 10:00:00:00:C9:20:DC:56, and 10:00:00:00:C9:20:DC:82 and a storage system with WWN 50:06:04:82:E8:91:2B:9E are connected to switch ports 1, 5, 9, and 12.)
Zone 1 (WWN zone) = 10:00:00:00:C9:20:DC:82; 50:06:04:82:E8:91:2B:9E
Zone 2 (port zone) = 15,5; 15,12
Zone 3 (mixed zone) = 10:00:00:00:C9:20:DC:56; 15,12
WWN Zoning
Uses World Wide Names to define zones. The zone members are the unique
WWN addresses of the FC HBA and its targets (storage systems). A major
advantage of WWN zoning is its flexibility. If an administrator moves a node to
another switch port in the fabric, the node maintains connectivity to its zone
partners without having to modify the zone configuration. This functionality is
possible because the WWN is static to the node port.
Port Zoning
Uses the switch port ID to define zones. In port zoning, access to node is
determined by the physical switch port to which a node is connected. The zone
members are the port identifiers (switch domain ID and port number) to which FC
HBA and its targets (storage systems) are connected. If a node is moved to
another switch port in the fabric, port zoning must be modified to enable the node,
in its new port, to participate in its original zone. However, if an FC HBA or storage
system port fails, an administrator has to replace the failed device without changing
the zoning configuration.
Mixed Zoning
Combines the qualities of both WWN zoning and port zoning. Using mixed zoning
enables a specific node port to be tied to the WWN of another node.
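The three zone types shown in the figure can be modeled as simple membership
sets (the WWNs and port IDs reuse the figure's values; the helper function is
hypothetical): two node ports may communicate only if some zone in the active
zone set contains both of them.

# Minimal sketch of zoning: two node ports can communicate only if they share a zone.
# Members reuse the examples from the figure: WWNs or (domain ID, port number) pairs.
active_zone_set = {
    "Zone1_wwn":   {"10:00:00:00:C9:20:DC:82", "50:06:04:82:E8:91:2B:9E"},
    "Zone2_port":  {(15, 5), (15, 12)},
    "Zone3_mixed": {"10:00:00:00:C9:20:DC:56", (15, 12)},
}

def can_communicate(member_a, member_b):
    return any(member_a in zone and member_b in zone
               for zone in active_zone_set.values())

print(can_communicate("10:00:00:00:C9:20:DC:82", "50:06:04:82:E8:91:2B:9E"))  # True
print(can_communicate("10:00:00:00:C9:20:DC:82", (15, 12)))                   # False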
Introduction
This lesson presents an overview of Virtual SAN (VSAN), its configuration, VSAN
trunking, and VSAN tagging. It also focuses on concepts in practice for FC SAN
connectivity.
SAN Virtualization
(Figure: SAN virtualization – compute systems are presented virtual volumes by a virtualization appliance in the FC SAN; the appliance maps the virtual volumes, created from a storage pool, to LUNs on the storage systems.)
The figure on the slide shows two compute systems, each of which has one virtual
volume assigned. These virtual volumes are mapped to the LUNs in the storage
systems. When an I/O is sent to a virtual volume, it is redirected to the mapped
LUNs through the virtualization layer at the FC SAN. Depending on the capabilities
of the virtualization appliance, the architecture may allow for more complex
mapping between the LUNs and the virtual volumes.
Provides a virtualization layer in SAN
Abstracts block-based storage systems
Aggregates LUNs to create storage pool
Notes
Virtual volumes are created from the storage pool and assigned to the compute
systems. Instead of being directed to the LUNs on the individual storage systems,
the compute systems are directed to the virtual volumes provided by the
virtualization layer. The virtualization layer maps the virtual volumes to the LUNs on
the individual storage systems.
The compute systems remain unaware of the mapping operation and access the
virtual volumes as if they were accessing the physical storage attached to them.
Typically, the virtualization layer is managed via a dedicated virtualization
appliance to which the compute systems and the storage systems are connected.
Traditionally, after a data migration, the compute systems had to be updated to
reflect the new storage system configuration, and processor cycles at the compute
system were often required to migrate data from one storage system to the other,
especially in a multivendor environment. With SAN virtualization, data can be moved
between storage systems while the compute system still points to the same virtual
volume on the virtualization layer. Only the mapping information on the virtualization
layer needs to be changed. These changes can be executed dynamically and are
transparent to the end user.
Definition: VSAN
A virtual fabric created on a physical FC SAN, in which a group of node
ports communicate with each other using a virtual topology defined on
the physical SAN.
(Figure: A single physical FC SAN partitioned into VSAN 10 and VSAN 20, each containing its own compute systems and VMs.)
Each VSAN has its own fabric services, configuration, and set of FC addresses
VSANs improve SAN security, scalability, availability, and manageability
Notes
In a VSAN, a group of node ports communicate with each other using a virtual
topology that is defined on the physical SAN. Multiple VSANs may be created on a
single physical SAN. Each VSAN behaves and is managed as an independent
fabric. Each VSAN has its own fabric services, configuration, and set of FC
addresses. Fabric-related configurations in one VSAN do not affect the traffic in
another VSAN. A VSAN may be extended across sites, enabling communication
among a group of nodes, in either site with a common set of requirements.
Further, the same FC address can be assigned to nodes in different VSANs, thus
increasing the fabric scalability. The events causing traffic disruptions in one VSAN
are contained within that VSAN and are not propagated to other VSANs. VSANs
facilitate an easy, flexible, and less expensive way to manage networks.
VSAN Configuration
Notes
Both VSANs and zones enable node ports within a fabric to be logically
segmented into groups. But they are not the same, and their purposes are
different. There is a hierarchical relationship between them. An administrator first
assigns physical ports to VSANs and then configures independent zones for each
VSAN. A VSAN has its own independent fabric services, but the fabric services
are not available on a per-zone basis.
VSAN Trunking
Allows network traffic from multiple VSANs to traverse a single ISL (trunk link)
Enables an E_Port (trunk port) to send or receive multiple VSAN traffic over a
trunk link
Reduces the number of ISLs between switches that are configured with multiple
VSANs
Notes
VSAN trunking allows network traffic from multiple VSANs to traverse a single ISL.
It supports a single ISL to permit traffic from multiple VSANs along the same path.
The ISL through which multiple VSANs traffic travels is called a trunk link. VSAN
trunking enables a single E_Port to be used for sending or receiving traffic from
multiple VSANs over a trunk link. The E_Port capable of transferring multiple
VSANs traffic is called a trunk port. The sending and receiving switches must have
at least one trunk E_Port configured for all or a subset of the VSANs defined on the
switches.
VSAN trunking eliminates the need to create dedicated ISL(s) for each VSAN. It
reduces the number of ISLs when the switches are configured with multiple
VSANs. As the number of ISLs between the switches decreases, the number of
E_Ports used for the ISLs also reduces. By eliminating needless ISLs, the
utilization of the remaining ISLs increases. The complexity of managing the FC
SAN is also minimized with a reduced number of ISLs.
VSAN Tagging
Associated with VSAN trunking, it helps isolate FC frames from multiple VSANs
that travel through and share a trunk link.
(Figure: VSAN tagging – frames from VSAN 10 and VSAN 20, generated by compute systems, are tagged by the FC switch and share a single trunk link to the next switch, where the tags are used to separate the traffic.)
Introduction
This section highlights technologies that are relevant to the topics covered in this
module.
Concepts In Practice
Concepts In Practice
Connectrix
VPLEX
Provides solution for block-level storage virtualization and data migration both
within and across data centers
Provides the capability to mirror data of a virtual volume both within and across
locations
VS6 engine with VPLEX for all-flash model provides the fastest and most
scalable VPLEX solution for all-flash systems
Enables organizations to move cold data to inexpensive cloud storage
Provides solution for block-level storage virtualization and data mobility both within
and across data centers. It forms a pool of distributed block storage resources and
enables creating virtual storage volumes from the pool. These virtual volumes are
then allocated to the compute systems.
VPLEX provides nondisruptive data mobility among storage systems to balance the
application workload and to enable both local and remote data access. It uses a
unique clustering architecture and advanced data caching techniques. They enable
multiple compute systems that are located across two locations to access a single
copy of data. Data migration with VPLEX can be done without any downtime,
saving countless weekends of maintenance downtime and IT resources. VPLEX
enables IT organizations to build modern data center infrastructure that is:
Always available, even in the face of disasters
Agile in responding to business requirements
Non-disruptive when adopting the latest storage technology
The new VS6 engine with VPLEX for all-flash model provides the fastest and most
scalable VPLEX solution for all-flash systems. VPLEX also enables organizations
to move cold data to inexpensive cloud storage.
Assessment
A. FC - 0 - Layer
B. FC - 1 - Layer
C. FC - 2 - Layer
D. FC - 4 - Layer
2. Identify the topology that requires a maximum of one ISL for compute-to-storage
communication. Select all that apply.
B. Single-switch topology
D. Core-edge topology
Summary
Introduction
This module focuses on IP SAN protocols such as Internet SCSI (iSCSI) and Fibre
Channel over IP (FCIP), their components, and connectivity. It also covers details of
virtual LAN (VLAN) and reference models for communication.
Introduction
This lesson presents the Open Systems Interconnect (OSI) and the Transmission
Control Protocol/Internet Protocol (TCP/IP) reference model. It also covers details
of network protocols and connection establishment process.
Overview of TCP/IP
The OSI reference model is a logical structure for network operations standardized
by the International Standards Organization (ISO). Each layer in the OSI reference
model only interacts directly with the layer immediately beneath it, and provides
facilities for use by the layer above it. The following layers make up the OSI model:
A logical structure for network operations
The OSI model organizes the communications process into seven different layers
Protocols are within the layers
Layers 4-7 provide end-to-end communication
(Figure: The seven OSI layers – L7 Application, L6 Presentation, L5 Session, L4 Transport, L3 Network, L2 Data Link, L1 Physical. Layers 4-7 are end-to-end layers; layers 1-3 are network layers.)
Notes
2. Data Link Layer - Provides the functional and procedural means to transfer data
between network entities. It also detects and possibly corrects errors that may
occur in the Physical Layer.
3. Network Layer - Transfers variable length data sequences from a source to
destination through one or more networks while also maintaining a quality of
service requested by the Transport Layer.
4. Transport Layer - Provides transparent transfer of data between end users,
providing reliable data transfer services to the upper layers.
5. Session Layer - Controls the connections between computers. It establishes,
manages, and terminates the connections between the local and remote
application.
6. Presentation Layer - Establishes a context between the Application layer
entities in which the high-layer entities can use different syntax and semantics.
7. Application Layer - Provides a user interface that enables users to access the
network and applications.
Application Layer
Transport Layer
Network Layer
Link Layer
TCP/IP is a hierarchical protocol suite that is named after its two primary protocols,
the Transmission Control Protocol (TCP) and the Internet Protocol (IP). It is made
up of four layers, as shown in the image.
Notes
The link layer is used to describe the local network topology and the interfaces
needed to affect transmission of Internet layer datagrams to next-neighbor
hosts.
The network layer is responsible for end-to-end communications and delivery of
packets across multiple network links.
The transport layer provides process to process delivery of the entire message.
The application layer enables users to access the network.
(Figure: Mapping of the OSI layers to the TCP/IP layers – the OSI application, presentation, and session layers map to the TCP/IP application layer; the transport layers correspond directly; the OSI network layer maps to the internet layer; and the OSI data link and physical layers map to the link layer.)
Notes
The OSI and the TCP/IP reference models have much in common. The
architectural layers form a hierarchy, and items are listed in order by rank. Higher
layers depend upon services from lower layers, and lower layers provide services
for upper layers. Also, the functionality of the layers is roughly similar, with a few
exceptions. The presentation and the session layers of the OSI reference model
were combined with the application layer and are represented as the application
layer in the TCP/IP model. The TCP/IP model also does not distinguish between
the physical and the data link layers.
To understand a complex system and for simplification, the reference models are
implemented as layered structures. The Open Systems Interconnection (OSI) and
TCP/IP reference models are widely adopted and are important network
architectures (reference models). Both of them define the essential features of
network services and enhanced functionality. The OSI model is a logical structure
for network operations standardized by the International Standards Organization
(ISO).
The OSI model is a layered framework for the design of a network system that
enables communication between all types of systems.
TCP/IP is a hierarchical protocol suite that is made up of interactive modules,
providing specific functionality.
The transport layer is the heart of the TCP/IP protocol suite. Due to the use of the
connection-oriented TCP protocol, the layer provides reliable, process-to-process,
full-duplex service.
The client initiates the connection by sending the TCP SYN packet to the
destination host.
In the illustration, SYN refers to synchronize and ACK refers to acknowledgment
The packet contains the random sequence number, which marks the
beginning of the sequence numbers of data that the client will transmit
This sequence number is called the initial sequence number
The server, which is a destination host, receives the packet, and responds with
its own sequence number. The response also includes the acknowledgment
number, which is client’s sequence number that is incremented by 1. That is
SYN+ACK segment is sent
Client acknowledges the response of the server by sending the
acknowledgment ACK segment. It acknowledges the receipt of the second
segment with the ACK flag
(Figure: TCP three-way handshake – the client in the SYN_SENT state sends a SYN; the listening server moves to SYN_RCVD and replies with SYN+ACK; the client responds with an ACK, and both sides reach the Established state.)
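The handshake itself is carried out by the operating system's TCP stack; the short
sketch below (the loopback address and port number are arbitrary) simply opens a
TCP connection, which triggers exactly the SYN, SYN+ACK, ACK exchange
described above.

# Minimal sketch: establishing a TCP connection on the loopback interface.
# The kernel performs the SYN / SYN+ACK / ACK exchange when connect() is called.
import socket
import threading
import time

def server():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 5050))      # arbitrary loopback port for the example
        srv.listen(1)                      # server side: LISTEN state
        conn, _ = srv.accept()             # returns after the three-way handshake completes
        conn.close()

threading.Thread(target=server, daemon=True).start()
time.sleep(0.2)                            # give the listener a moment to start

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
    client.connect(("127.0.0.1", 5050))    # client sends SYN; ESTABLISHED after the ACK
    print("connection established")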
Introduction
This lesson covers IP SAN and its protocols. It also focuses on the role of TCP/IP
in IP SAN.
Overview of IP SAN
(Figure: IP SAN – compute systems connected to storage systems over an IP network.)
Uses Internet Protocol (IP) for the transport of storage traffic. It transports block I/O
over an IP-based network.
Typically runs over a standard IP-based network and uses TCP/IP for
communication, commonly through the iSCSI and FCIP protocols.
Drivers of IP SAN
Notes
The advantages of FC SAN, such as scalability and high performance, come with
the additional cost of buying FC components, such as FC HBAs and FC switches.
On the other hand, IP is a mature technology, and using IP as a storage networking
option provides several advantages.
As noted, the IP SAN protocols typically run over a standard Ethernet network
and use the Transmission Control Protocol/Internet Protocol (TCP/IP) for
communication and for the transport of storage traffic.
iSCSI Stack
IP SAN Protocols
Two primary protocols that leverage IP as the transport mechanism for block-level
data transmission are Internet SCSI (iSCSI) and Fibre Channel over IP (FCIP).
iSCSI
FCIP
Notes
iSCSI: It is widely adopted for transferring SCSI data over IP between compute
systems and storage systems and among storage systems. It is relatively
inexpensive and easy to implement, especially in environments in which an FC SAN
does not exist.
FCIP: Organizations are looking for ways to transport data over a long distance
between their disparate FC SANs at multiple geographic locations. One of the best
ways to achieve this goal is to interconnect geographically dispersed FC SANs
through reliable, high-speed links. This approach involves transporting the FC block
data over the IP infrastructure.
iSCSI Lesson
Introduction
This lesson covers iSCSI network components and connectivity. It also covers
iSCSI protocol stack, iSCSI address and name, and iSCSI discovery. The lesson
also focuses on the virtual LAN (VLAN) and stretched VLAN.
iSCSI
Video: iSCSI
iSCSI Overview
It is widely adopted for transferring SCSI data over IP between compute systems
and storage systems and among storage systems. iSCSI is relatively
inexpensive and easy to implement, especially in environments in which an FC SAN
does not exist.
iSCSI initiators
Example: iSCSI HBA
iSCSI targets
Example: Storage system with iSCSI port
IP-based network
Hardware and software initiators are types of iSCSI initiators that are used by the
host to access iSCSI targets.
Initiator Types
Notes
The computing operations of the software iSCSI initiator are performed by the
server's operating system, whereas a hardware iSCSI initiator is a dedicated, host-
based network interface card (NIC) with the integrated resources to handle the
iSCSI processing functions. The following are common examples of iSCSI
initiators:
Standard NIC with software iSCSI adapter: The software iSCSI adapter is an
operating system or hypervisor kernel-resident software. It uses an existing NIC
of the compute system to emulate an iSCSI initiator. It is the least expensive and
easiest to implement because most compute systems come with at least one, and
often two, embedded NICs. It requires only a software initiator for iSCSI
functionality. Because NICs provide standard networking function, both the
TCP/IP processing and the encapsulation of SCSI data into IP packets are
carried out by the CPU of the compute system. This functionality places more
overhead on the CPU. If a standard NIC is used in heavy I/O load situations, the
CPU of the compute system might become a bottleneck.
TOE NIC with software iSCSI adapter: A TOE NIC offloads the TCP/IP
processing from the CPU of a compute system and leaves only the iSCSI
functionality to the CPU. The compute system passes the iSCSI information to
the TOE NIC and then the TOE NIC sends the information to the destination
using TCP/IP. Although this solution improves performance, the iSCSI
functionality is still handled by a software adapter that requires CPU cycles of
the compute system.
iSCSI HBA: An iSCSI HBA is a hardware adapter with built-in iSCSI
functionality. It is capable of providing performance benefits over software iSCSI
adapters by offloading the entire iSCSI and TCP/IP processing from the CPU of
a compute system.
iSCSI Connectivity
iSCSI implementations support two types of connectivity: native and bridged. The
connectivities are described here:
Native
(Figure: Compute system connected to an iSCSI storage system over an IP network.)
Bridged
(Figure: Compute system connected over an IP network to an iSCSI gateway, which connects through an FC SAN to the storage system.)
Native iSCSI: In this type of connectivity, the compute systems with iSCSI initiators
may be either directly attached to the iSCSI targets or connected through an IP-
based network. FC components are not required for native iSCSI connectivity. The
figure on the left shows a native iSCSI implementation that includes a storage
system with an iSCSI port. The storage system is connected to an IP network. After
an iSCSI initiator is logged on to the network, it can access the available LUNs on
the storage system.
Bridged iSCSI: In this type of connectivity, an iSCSI gateway acts as a bridge
between the FC and the IP environments. The iSCSI initiator is configured with the
gateway's IP address as its target destination. On the other side, the gateway is
configured as an FC initiator to the storage system.
A storage system typically comes with both FC and iSCSI ports. This combination
enables both native iSCSI connectivity and FC connectivity in the same
environment, and no bridge device is needed.
(Figure: A storage system with both iSCSI and FC ports – one compute system connects over an IP network using an iSCSI HBA, and another connects through an FC SAN using an FC HBA.)
The image displays a model of iSCSI protocol layers and depicts the encapsulation
order of the SCSI commands for their delivery through a physical carrier.
SCSI is the command protocol that works at the application layer of the Open
System Interconnection (OSI) model
The initiators and the targets use SCSI commands and responses to talk to
each other
The SCSI commands, data, and status messages are encapsulated into TCP/IP
and transmitted across the network between the initiators and the targets
Notes
The figure on the slide displays a model of the iSCSI protocol layers and depicts the
encapsulation order of the SCSI commands for their delivery through a physical
carrier. SCSI is the command protocol that works at the application layer of the
Open Systems Interconnection (OSI) model. The initiators and the targets use SCSI
commands and responses to talk to each other. The SCSI commands, data, and
status messages are encapsulated into TCP/IP and transmitted across the network
between the initiators and the targets.
iSCSI is the session-layer protocol that initiates a reliable session between devices
that recognize SCSI commands and TCP/IP. The iSCSI session-layer interface is
responsible for handling login, authentication, target discovery, and session
management.
TCP is used with iSCSI at the transport layer to provide reliable transmission. TCP
controls message flow, windowing, error recovery, and retransmission. It relies
upon the network layer of the OSI model to provide global addressing and
connectivity. The OSI Layer 2 protocols at the data link layer of this model enable
node-to-node communication through a physical network.
iSCSI address: the location of an iSCSI initiator/target; a combination of IP address
and TCP port number
iSCSI name: a unique identifier for an initiator/target in an iSCSI network
Notes
iSCSI name is a unique worldwide iSCSI identifier that is used to identify the
initiators and targets within an iSCSI network to facilitate communication. The
unique identifier can be a combination of the names of the department, application,
manufacturer, serial number, asset number, or any tag that can be used to
recognize and manage the iSCSI nodes. Three types of iSCSI names are commonly
used; one common format is sketched below.
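One commonly used format is the iSCSI Qualified Name (IQN); the small sketch
below (the date, domain, and device strings are placeholders) builds and loosely
checks an IQN of the form iqn.yyyy-mm.<reversed domain>:<unique name>.

# Illustrative sketch: building an iSCSI Qualified Name (IQN).
# Format: iqn.<yyyy-mm>.<reversed domain of the naming authority>:<unique device name>
import re

def make_iqn(year_month, reversed_domain, unique_name):
    return f"iqn.{year_month}.{reversed_domain}:{unique_name}"

def looks_like_iqn(name):
    return re.fullmatch(r"iqn\.\d{4}-\d{2}\.[a-z0-9.\-]+:[\w.\-:]+", name) is not None

iqn = make_iqn("2019-05", "com.example", "storage.target01")   # placeholder values
print(iqn)                     # iqn.2019-05.com.example:storage.target01
print(looks_like_iqn(iqn))     # True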
iSCSI Discovery
For iSCSI communication, the initiator must discover the location and name of the
targets on the network.
For devices to communicate with one another, they must be configured in the same
discovery domain. The iSNS server may send state change notifications (SCNs) to
the registered devices. State change notifications inform the registered devices
about network events. These events affect the operational state of devices such as
the addition or removal of devices from a discovery domain.
(Figure: iSNS-based discovery in an iSCSI environment.)
Definition: VLAN
A logical network created on a LAN enabling communication between
a group of nodes with a common set of functional requirements,
independent of their physical location in the network.
Well-suited for iSCSI deployments as they enable isolating the iSCSI traffic from
other network traffic (for example, compute-to-compute traffic).
Help in isolating specific network traffic from other network traffic in a physical
Ethernet network
Configuring a VLAN:
Port-based
MAC-based
Protocol-based
IP subnet address-based
Application-based
Notes
A VLAN conceptually functions in the same way as a VSAN. Each VLAN behaves
and is managed as an independent LAN. Two nodes connected to a VLAN can
communicate between themselves without routing of frames – even if they are in
different physical locations. VLAN traffic must be forwarded through a router or OSI
Layer-3 switching device when two nodes in different VLANs are communicating –
even if they are connected to the same physical LAN. Network broadcasts within a
VLAN generally do not propagate to nodes that belong to a different VLAN, unless
configured to cross a VLAN boundary.
VLAN trunking allows a single network link (trunk link) to carry multiple VLAN
traffic
To enable trunking, trunk ports must be configured on both sending and
receiving network components
Sending network component inserts a tag field containing VLAN ID into an
Ethernet frame before sending through a trunk link
Receiving network component reads the tag and forwards the frame to
destination port(s)
Tag is removed once a frame leaves trunk link to reach a node port
Notes
Similar to the VSAN trunking, network traffic from multiple VLANs may traverse a
trunk link. A single network port, called trunk port, is used for sending or receiving
traffic from multiple VLANs over a trunk link. Both the sending and the receiving
network components must have at least one trunk port configured for all or a
subset of the VLANs defined on the network component.
As with VSAN tagging, VLAN has its own tagging mechanism. The tagging is
performed by inserting a 4-byte tag field containing 12-bit VLAN ID into the
Ethernet frame (as per IEEE 802.1Q standard) before it is transmitted through a
trunk link. The receiving network component reads the tag and forwards the frame
to the destination port(s) that corresponds to that VLAN ID. The tag is removed
once the frame leaves a trunk link to reach a node port.
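The 4-byte tag can be sketched as follows (a simplified illustration: TPID 0x8100,
priority and DEI bits zeroed, 12-bit VLAN ID), showing where the tag is inserted
into an Ethernet frame, right after the source MAC address, and removed again at
the receiving end.

# Simplified sketch of IEEE 802.1Q VLAN tagging: insert a 4-byte tag
# (TPID 0x8100 + 12-bit VLAN ID) after the source MAC address of an Ethernet frame.
def tag_frame(frame_bytes, vlan_id, priority=0):
    assert 0 <= vlan_id < 4096, "VLAN ID is a 12-bit value"
    tci = (priority << 13) | vlan_id              # PCP (3 bits) + DEI (0) + VID (12 bits)
    tag = (0x8100).to_bytes(2, "big") + tci.to_bytes(2, "big")
    # Bytes 0-5: destination MAC, 6-11: source MAC; the tag goes right after them.
    return frame_bytes[:12] + tag + frame_bytes[12:]

def untag_frame(tagged):
    vlan_id = int.from_bytes(tagged[14:16], "big") & 0x0FFF
    return vlan_id, tagged[:12] + tagged[16:]     # tag removed at the receiving end

frame = bytes(12) + b"\x08\x00" + b"payload"      # toy frame: MACs + EtherType + payload
tagged = tag_frame(frame, vlan_id=10)
print(untag_frame(tagged)[0])                     # 10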
Stretched VLAN
(Figure: Stretched VLAN – VLAN 20 extends across Site 1 and Site 2, with compute systems and VMs at both sites on the same VLAN.)
Notes
Stretched VLANs also enable the movement of virtual machines (VMs) between
sites without the need to change their network configurations. This simplifies the
creation of high-availability clusters, VM migration, and application and workload
mobility across sites. The clustering across sites, for example, enables moving
VMs to an alternate site in the event of a disaster or during the maintenance of one
site. Without a stretched VLAN, the IP addresses of the VMs must be changed to
match the addressing scheme at the other site.
In a data center, an IP SAN offers multiple advantages, which are common to midsize
businesses, including the following:
Increased reliability: A shared set of dedicated IP-based storage systems can help
significantly increase the reliability and availability of application data.
FCIP Lesson
Introduction
This lesson covers FCIP connectivity, FCIP tunnel configuration, and FCIP protocol
stack.
FCIP
Video: FCIP
FCIP Overview
FCIP Connectivity
FCIP entity (e.g. FCIP gateway) is connected to each fabric to enable tunneling
through an IP network
An FCIP tunnel consists of one or more independent connections between two
FCIP ports
Notes
[Figure: An FCIP tunnel between two FC SANs — FCIP gateways with VE ports connect the fabrics through a LAN/WAN]
Only a small subset of nodes in either fabric requires connectivity across an FCIP
tunnel. Thus, an FCIP tunnel may also use vendor-specific features to route
network traffic between specific nodes without merging the fabrics.
The image illustrates an FC-FC routing solution in which the FCIP tunnel is configured so that the fabrics are not merged.
[Figure: FC-FC routing — FCIP gateways connect two FC SANs through an FCIP tunnel over a LAN/WAN without merging the fabrics]
Protocol Stack
[Figure: FCIP protocol stack — applications issue SCSI commands, data, and status; these are carried in FC frames, which are encapsulated and transmitted over IP and the physical media]
IP fragmentation occurs when the data link cannot support the maximum transmission unit (MTU) size of an IP packet. When an IP packet is fragmented, the required parts of the header must be copied by all fragments.
FCIP Encapsulation
When a TCP packet is segmented, normal TCP operations are responsible for
receiving and resequencing the data
The receiving and resequencing is performed prior to passing it on to the FC
processing portion of the device
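As an illustration of the encapsulation described in this lesson, the short Python sketch below wraps a fabricated FC frame in a simplified FCIP-style header, producing the byte string that would become the TCP payload sent through the tunnel, and then recovers the frame on the receiving side. The header layout and frame contents are assumptions for demonstration; a real FCIP gateway implements the full FCIP standard rather than this simplification.

```python
import struct

def fcip_encapsulate(fc_frame: bytes) -> bytes:
    """Prefix an FC frame with a simplified FCIP-style encapsulation header.

    The real FCIP header carries protocol/version fields, flags, a frame length,
    and a timestamp; only a length field is modelled here.
    """
    header = struct.pack("!I", len(fc_frame))   # placeholder header: frame length only
    return header + fc_frame                    # this byte string becomes the TCP payload

def fcip_decapsulate(tcp_payload: bytes) -> bytes:
    """Strip the encapsulation header and recover the original FC frame."""
    (length,) = struct.unpack("!I", tcp_payload[:4])
    return tcp_payload[4 : 4 + length]

# A fabricated FC frame carrying a SCSI command, for demonstration purposes only.
fc_frame = b"\x22\x00" + b"FC header..." + b"SCSI CDB + data"
tcp_payload = fcip_encapsulate(fc_frame)        # sent through the FCIP tunnel over TCP/IP
assert fcip_decapsulate(tcp_payload) == fc_frame
print(f"FC frame of {len(fc_frame)} bytes carried in a {len(tcp_payload)}-byte TCP payload")
```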
FCoE Lesson
Introduction
This lesson focuses on FCoE components and FCoE connectivity. It also covers the FCoE switch and the CNA.
FCoE
Video: FCoE
FCoE Overview
A protocol that transports FC data along with regular Ethernet traffic over a
Converged Enhanced Ethernet (CEE) network
Uses FCoE protocol, defined by the T11 standards committee, that
encapsulates FC frames into Ethernet frames
Ensures lossless transmission of FC traffic over Ethernet
Components of FCoE
Network adapters
Example: Converged Network Adapter (CNA) and software FCoE adapter
Cables
Example: Copper cables and fiber optical cables
FCoE switch
[Figure: FCoE deployment — compute systems with CNAs connect over CEE links to an FCoE switch, whose FC ports connect to the FC SAN and storage systems]
What is CNA?
[Figure: CNA architecture — a 10GE/FCoE port, an FCoE ASIC, and separate 10GE and FC ASICs connected to the compute system over the PCIe bus]
FCoE Switch
An FCoE switch has both Ethernet switch and FC switch functionalities. It has a
Fibre Channel Forwarder (FCF), an Ethernet Bridge, and a set of ports that can be
used for FC and Ethernet connectivity:
Ethernet Bridge
Non-FCoE frames are handled as typical Ethernet traffic and forwarded over
the Ethernet ports
[Figure: An FCoE switch connecting compute systems over CEE links, forwarding Ethernet traffic to the LAN through Ethernet ports and FC traffic to the FC SAN and storage system through FC ports]
Notes
Introduction
This section highlights technologies that are relevant to the topics covered in this
module.
Concepts in Practice
Offers various port speed choices for Fibre Channel and Ethernet connectivity
Provides flexibility and high performance for modern workloads
Can be used in the following use cases:
Provide end to end FC switch connectivity
Open networking and SDN-ready fixed form factor switches. They are purpose-built
for applications in modern computing environments. They not only simplify
manageability but also provide optimal flexibility, performance, density, and power
efficiency for the data center. They support both VLAN tagging and double VLAN
tagging and offer 10/25/40/50/100 GbE options.
A feature-rich, multi-functional switch offering various port speed choices for Fibre
Channel and Ethernet connectivity. It is designed for flexibility and high
performance for today's demanding modern workloads. It can be used as an end-to-end
FC switch and as an NPIV Gateway Edge switch in a large
Assessment
B. Flow Control
C. Monitors Computers
D. Buffers packet
A. iSCSI
B. ARP
C. ICMP
D. Ethernet
Summary
Introduction
This module focuses on the NAS components and architecture. This module also
focuses on object-based storage components and operations. Finally, this module
focuses on unified storage architecture.
Introduction
Notes
In a file-sharing environment, a user who creates the file (the creator or owner of a
file) determines the type of access (such as read, write, execute, append, delete) to
be given to other users. When multiple users try to access a shared file
simultaneously, a locking scheme is required to maintain data integrity and
simultaneously make this sharing possible.
Some examples of file-sharing methods are the peer-to-peer (P2P) model, File
Transfer Protocol (FTP), client/server models that use file-sharing protocols such
as NFS and CIFS, and Distributed File System (DFS). FTP is a client/server
protocol that enables data transfer over a network. An FTP server and an FTP
client communicate with each other using TCP as the transport protocol.
A peer-to-peer (P2P) file sharing model uses a peer-to-peer network. P2P enables
client machines to directly share files with each other over a network. Clients use
file-sharing software that searches for other peer clients. This model differs from the
client/server model, which uses file servers to store files for sharing.
The standard client/server file-sharing protocols are NFS and CIFS. These
protocols enable the owner of a file to set the required type of access, such as
read-only or read/write, for a particular user or group of users. Using this protocol,
the clients mount remote file systems that are available on dedicated file servers.
A distributed file system (DFS) is a file system that is distributed across several
compute systems. A DFS can provide compute systems with direct access to the
entire file system, while ensuring efficient management and data security. Hadoop
Distributed File System (HDFS) is an example of a distributed file system, which is
discussed later in this module. Vendors now support HDFS on their NAS systems
to support the scale-out architecture. The scale-out architecture helps to meet
big data analytics requirements.
What is NAS?
Definition: NAS
An IP-based, dedicated, high-performance file sharing and storage
device.
[Figure: Clients and application servers access a NAS system over the LAN]
Notes
NAS provides the advantages of server consolidation by eliminating the need for
multiple file servers. It also consolidates the storage used by the clients onto a
single system, making it easier to manage the storage. NAS uses network and file-
sharing protocols to provide access to the file data. These protocols include TCP/IP
for data transfer and Common Internet File System (CIFS) and Network File
System (NFS) for network file service. Apart from these protocols, the NAS
systems may also use HDFS and its associated protocols (discussed later in the
module) over TCP/IP to access files. NAS enables both UNIX and Microsoft
Windows users to share the same data seamlessly.
A NAS device uses its own operating system and integrated hardware and
software components to meet specific file-service needs. Its operating system is
optimized for file I/O and, therefore, performs file I/O better than a general-purpose
server. As a result, a NAS device can serve more clients than general-purpose
servers and provide the benefit of server consolidation.
Scale-up NAS
Scale-out NAS
Notes
Storage is used to persistently store data. The NAS system may have different
types of storage devices to support different requirements. The NAS system may
support SSD, SAS, and SATA in a single system.
The extent to which the components, such as CPU, memory, network adapters,
and storage, can be scaled depends upon the type of NAS architecture used.
There are two types of NAS architectures: scale-up and scale-out. Both architectures are detailed in the next few slides.
Scale-Up NAS
A scale-up NAS architecture provides the capability to scale the capacity and
performance of a single NAS system based on requirements. Scaling up a NAS
system involves upgrading or adding NAS heads and storage.
These NAS systems have a fixed capacity ceiling, which limits their scalability. The
performance of these systems starts degrading when reaching the capacity limit.
Unified NAS
A unified NAS system contains one or more NAS heads and storage in a single
system. NAS heads are connected to the storage. The storage may consist of
different drive types, such as SAS, ATA, FC, and solid-state drives, to meet
different workload requirements.
Each NAS head in a unified NAS has front-end Ethernet ports, which connect to
the IP network. The front-end ports provide connectivity to the clients. Each NAS
head has back-end ports to provide connectivity to the attached storage. Unified
NAS systems have NAS management software that can be used to perform all the
administrative tasks for the NAS head and storage.
[Figure: Unified NAS — a single system provides block data access to FC and iSCSI hosts and file data access to NAS clients over Ethernet]
Gateway NAS
A gateway NAS system consists of one or more NAS heads and uses external and
independently managed storage. In a gateway NAS implementation, the NAS
gateway shares the storage from a block-based storage system. The management
functions in this type of solution are more complex than those in a unified NAS
environment because there are separate administrative tasks for the NAS head and
the storage.
The administrative tasks of the NAS gateway are performed by the NAS
management software. The storage system is managed with the management
software of the block-based storage system. A gateway solution can use the FC
infrastructure, such as switches and directors for accessing SAN-attached storage
arrays or direct-attached storage arrays.
[Figure: Gateway NAS — NAS clients and application servers access a NAS gateway over the IP network, and the gateway accesses an external storage system through the FC SAN]
Scale-Out NAS
Notes
The scale-out NAS implementation pools multiple NAS nodes together in a cluster.
A node may consist of either the NAS head or the storage or both. The cluster
performs the NAS operation as a single entity. A scale-out NAS provides the
capability to scale its resources by simply adding nodes to a clustered NAS
architecture. The cluster works as a single NAS device and is managed centrally.
Nodes can be added to the cluster, when more performance or more capacity is
needed, without causing any downtime. Scale-out NAS provides the flexibility to
use many nodes of moderate performance and availability characteristics to
produce a total system with better aggregate performance and availability. It also
provides ease of use, low cost, and theoretically unlimited scalability.
Scale-out NAS uses a distributed clustered file system that runs on all nodes in the
cluster. All information is shared among nodes, so the entire file system is
accessible by clients connecting to any node in the cluster. Scale-out NAS stripes
data across all nodes in a cluster along with mirror or parity protection. As data is
sent from clients to the cluster, the data is divided and allocated to different nodes
in parallel. When a client sends a request to read a file, the scale-out NAS retrieves
the appropriate blocks from multiple nodes. It recombines the blocks into a file and
presents the file to the client. As nodes are added, the file system grows
dynamically, and data is evenly distributed to every node. Each node added to the
cluster increases the aggregate storage, memory, CPU, and network capacity.
Hence, cluster performance is also increased.
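The sketch below illustrates, under simplified assumptions, how a scale-out NAS might stripe an incoming file across its nodes and later rebuild it for a read. The node count, stripe size, and in-memory "nodes" are hypothetical, and real implementations add mirror or parity protection on top of the striping shown here.

```python
STRIPE_SIZE = 4  # bytes per stripe unit; real systems use much larger units

def write_file(data: bytes, nodes: list) -> None:
    """Divide the data into stripe units and distribute them across the nodes."""
    for i in range(0, len(data), STRIPE_SIZE):
        stripe_unit = data[i:i + STRIPE_SIZE]
        node = nodes[(i // STRIPE_SIZE) % len(nodes)]   # round-robin placement
        node.append(stripe_unit)

def read_file(nodes: list) -> bytes:
    """Retrieve the stripe units from every node and recombine them into the file."""
    units, counters = [], [0] * len(nodes)
    total_units = sum(len(node) for node in nodes)
    for i in range(total_units):
        node_index = i % len(nodes)                     # same round-robin order as the write
        units.append(nodes[node_index][counters[node_index]])
        counters[node_index] += 1
    return b"".join(units)

cluster = [[], [], []]                    # three NAS nodes modelled as lists of stripe units
write_file(b"scale-out NAS stripes data across all nodes", cluster)
assert read_file(cluster) == b"scale-out NAS stripes data across all nodes"
```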
Scale-out NAS clusters use separate internal and external networks for back-end
and front-end connectivity respectively. An internal network provides connections
for intra-cluster communication, and an external network connection enables clients
to access and share file data. Each node in the cluster connects to the internal
network. The internal network offers high throughput and low latency and uses
high-speed networking technology, such as InfiniBand or Gigabit Ethernet. To
enable clients to access a node, the node must be connected to the external
Ethernet network. Redundant internal or external networks may be used for high
availability.
Different methods can be used to access files on a NAS system. The most
common methods are:
Common Internet File System / Server Message Block (CIFS/SMB)
Network File System (NFS)
Hadoop Distributed File System (HDFS)
CIFS/SMB
NFS
HDFS
A file system that spans multiple nodes in a cluster and enables user data to be
stored in files.
[Figure: Clients access a Hadoop cluster — a NameNode and DataNodes — over the Ethernet LAN]
Notes
The CIFS protocol enables remote clients to gain access to files on a server. CIFS
enables file sharing with other clients by using special locks. Filenames in CIFS are
encoded using Unicode characters. CIFS provides the following features to ensure
data integrity:
It uses file and record locking to prevent users from overwriting the work of
another user on a file or a record.
It supports fault tolerance and can automatically restore connections and
reopen files that were open prior to an interruption.
Network File System (NFS): is a client/server protocol for file sharing that is
commonly used on UNIX systems. NFS was originally based on the connectionless
User Datagram Protocol (UDP). It uses a machine-independent model to represent
user data. It also uses Remote Procedure Call (RPC) for interprocess
communication between two computers.
The NFS protocol provides a set of RPCs to access a remote file system for the
following operations:
Searching files and directories
Opening, reading, writing to, and closing a file
Changing file attributes
Modifying file links and directories
NFS creates a connection between the client and the remote system to transfer
data.
The figure illustrates an I/O operation in a scale-up NAS system. The process of
handling I/Os in a scale-up NAS environment is as follows:
1. The requestor (client) packages an I/O request into TCP/IP and forwards it
through the network stack. The NAS system receives this request from the
network.
2. The NAS system converts the I/O request into an appropriate physical storage
request, which is a block-level I/O. This system then performs the operation on
the physical storage.
3. When the NAS system receives data from the storage, it processes and
repackages the data into an appropriate file protocol response.
4. The NAS system packages this response into TCP/IP again and forwards it to
the client through the network.
[Figure: Clients access a scale-out NAS cluster over the Ethernet LAN]
The figure illustrates I/O operation in a scale-out NAS system. A scale-out NAS
consists of multiple NAS nodes and each of these nodes has the functionality
similar to a NameNode or a DataNode. In some proprietary scale-out NAS
implementations, each node may function as both a NameNode and DataNode,
typically to provide Hadoop integration. All the NAS nodes in scale-out NAS are
clustered.
Notes
New nodes can be added as required. As new nodes are added, the file system
grows dynamically and is evenly distributed to each node. As the client sends a file
to store to the NAS system, the file is evenly striped across the nodes. When a
client writes data, even though that client is connected to only one node, the write
operation occurs in multiple nodes in the cluster. This operation is also true for read
operations. A client is connected to only one node at a time. However, when that
client requests a file from the cluster, the node to which the client is connected
may not have the entire file locally on its drives. The node to which the client is
connected retrieves and rebuilds the file using the back-end InfiniBand network.
Introduction
This lesson covers file-level virtualization, storage tiering, and NAS use case.
Eliminates dependency between data accessed at the file-level and the location
where the files are physically stored
Enables users to use a logical path, rather than a physical path, to access files
Uses global namespace that maps logical path of file resources to their physical
path
Provides non-disruptive file mobility across file servers or NAS devices
Before virtualization, each client knows exactly where its file resources are located.
This environment leads to underutilized storage resources and capacity problems
because files are bound to a specific NAS device or file server. It may be required
to move the files from one server to another because of performance reasons or
when the file server fills up. Moving files across the environment is not easy and
may make files inaccessible during file movement. Moreover, hosts and
applications need to be reconfigured to access the file at the new location. This
operation makes it difficult for storage administrators to improve storage efficiency
while maintaining the required service level.
[Figure: File-level virtualization — a virtualization appliance between clients and file servers/NAS devices provides a global namespace for file access]
Notes
As an example, the policy engine might be configured to relocate all the files in the
primary storage tier that have not been accessed in one month and archive those
files to the secondary storage. For each archived file, the policy engine creates a
small space-saving stub file in the primary storage that points to the data on the
secondary storage. When a user tries to access the file from its original location on
the primary storage, the user is transparently provided with the actual file from the
secondary storage.
The figure illustrates the file-level storage tiering. In a file-level storage tiering
environment, a file can be moved to a secondary storage tier or to the cloud.
Before moving a file from primary NAS to secondary NAS or from primary NAS to
cloud, the policy engine scans the primary NAS to identify files that meet the
predefined policies. After identifying the data files, the stub files are created, and
the data files are moved to the destination storage tier.
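A minimal sketch of the policy-engine behaviour described above, using hypothetical in-memory dictionaries for the primary and secondary tiers and an assumed 30-day "not accessed" policy. Real policy engines operate on file systems rather than dictionaries, but the stub-and-recall idea is the same.

```python
import time

PRIMARY, SECONDARY = {}, {}               # hypothetical tiers: name -> (data, last_access_time)
POLICY_AGE_SECONDS = 30 * 24 * 3600       # archive files not accessed for one month

def apply_tiering_policy(now: float) -> None:
    """Scan the primary tier, move cold files to the secondary tier, and leave stub files behind."""
    for name, (data, last_access) in list(PRIMARY.items()):
        if isinstance(data, dict):                        # already a stub, skip it
            continue
        if now - last_access > POLICY_AGE_SECONDS:
            SECONDARY[name] = data                        # move the data to the secondary tier
            PRIMARY[name] = ({"stub": True, "location": name}, last_access)  # space-saving stub

def read_file(name: str, now: float) -> bytes:
    """Reads of a stub are transparently redirected to the secondary tier."""
    data, _ = PRIMARY[name]
    if isinstance(data, dict) and data.get("stub"):
        data = SECONDARY[data["location"]]
    PRIMARY[name] = (PRIMARY[name][0], now)               # record the access time
    return data

now = time.time()
PRIMARY["scan.dat"] = (b"old MRI scan", now - 40 * 24 * 3600)   # not touched for 40 days
apply_tiering_policy(now)
assert read_file("scan.dat", now) == b"old MRI scan"            # user still sees the file
```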
[Figure: File-level storage tiering — application servers access the primary NAS, while cold files are moved over the LAN/WAN to a secondary NAS (Tier 2) or to the cloud]
The data lake represents a paradigm shift from the linear data flow model. As data
and the insights gathered from it increase in value, the enterprise-wide
consolidated storage is transformed into a hub around which the ingestion and
consumption systems work (see figure). This enables enterprises to bring analytics
to the data and avoid the expense of multiple systems, storage, and time for
ingestion and analysis.
[Figure: The data lake as a hub — data of high volume and velocity is ingested, stored, analysed, and acted upon; scale-out NAS provides the underlying storage]
Notes
Scale-out NAS has the ability to provide the storage platform to this data lake. The
scale-out NAS enhances this paradigm by providing scaling capabilities in terms of
capacity, performance, security, and protection.
Introduction
This lesson focuses on the key object-based storage components. This lesson also
focuses on the key features of object-based storage. Finally, this lesson focuses on
unified storage architecture.
Amount of data created annually is growing exponentially and more than 90% of
data generated is unstructured
Rapid adoption of third platform technologies leads to significant growth of
data
Longer data retention due to regulatory compliance also leads to data
growth
Data must be instantly accessible through a variety of devices from anywhere in
the world
Traditional storage solutions are inefficient in managing this data and in
handling the growth
Notes
The amount of data created each year is growing exponentially and the recent
studies have shown that more than 90 percent of data generated is unstructured
(e-mail, instant messages, graphics, images, and videos). Today, organizations not
only have to store and protect petabytes of data, but they also have to retain the
data over longer periods of time, for regulation and compliance reasons. They have
also recognized that data can help gain competitive advantages and even support
new revenue streams. In addition to increasing amounts of data, there has also
been a significant shift in how people want and expect to access their data. The
rising adoption rate of smartphones, tablets, and other mobile devices by
consumers, combined with increasing acceptance of these devices in enterprise
workplaces, has resulted in an expectation for on-demand access to data from
anywhere on any device.
Traditional storage solutions like NAS, which is a dominant solution for storing
unstructured data, cannot scale to the capacities required or provide universal
access across geographically dispersed locations. Data growth adds high overhead
to the NAS in terms of managing large number of permission and nested
directories. File systems require more management as they scale and are limited in
size. Their performance degrades as the file system size increases, and they do not
accommodate metadata beyond file properties, which is a requirement of many new
applications. These challenges demand a smarter approach (object storage) that
allows managing data growth at low cost, provides extensive metadata
capabilities, and provides massive scalability to keep up with the rapidly
growing data storage and access demands.
Stores data in the form of objects in a flat address space, based on content and attributes rather than name and location.
[Figure: An object consists of user data, metadata, attributes, and an object ID]
Notes
An object is the fundamental unit of object-based storage that contains user data, related metadata (size, date, ownership, and so on), and user-defined attributes of the data.
For example, when an MRI scan of a patient is stored as a file in a NAS system,
the metadata is basic and may include information such as file name, date of
creation, owner, and file type. When stored as an object, the metadata component
of the object may include additional information such as patient name, ID, and
attending physician’s name, apart from the basic metadata.
Enables the OSD to meet the scale-out storage requirement of the third platform
[Figure: A hierarchical file system (file names/nodes) compared with the flat address space of object storage (objects with data and attributes)]
File-based storage systems (NAS) are based on file hierarchies that are complex in structure. Most file systems have restrictions on the number of files, directories, and levels of hierarchy that can be supported, which limits the amount of data that can be stored.
OSD stores data using a flat address space, where objects exist at the same level
and one object cannot be placed inside another object. Therefore, there is no
hierarchy of directories and files, and as a result, billions of objects can be stored
in a single namespace. This enables the OSD to meet scale-out storage requirements.
[Figure: An OSD system — application servers access OSD nodes over an IP network; each node runs metadata and storage services and connects to the storage over an internal network]
Notes
The OSD system is composed of one or more nodes. A node is a server that runs
the OSD operating environment and provides services to store, retrieve, and
manage data in the system. Typically, OSD systems are architected to work with
inexpensive x86-based nodes; each node provides both compute and storage
resources, and the system scales linearly in capacity and performance by simply
adding nodes.
The OSD node has two key services: metadata service and storage service. The
metadata service is responsible for generating the object ID from the contents (may
also include other attributes of data) of a file. It also maintains the mapping of the
object IDs and the file system namespace. In some implementations, the metadata
service runs inside an application server. The storage service manages a set of
disks on which the user data is stored.
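As a simplified sketch of the metadata-service behaviour described above, the code below derives an object ID from the object's contents with a SHA-256 hash and records it in a flat, in-memory namespace together with user-defined metadata. The attribute names and the in-memory store are illustrative assumptions, not a vendor's actual implementation.

```python
import hashlib

object_store = {}       # flat address space: object ID -> object (no directory hierarchy)

def put_object(data: bytes, metadata: dict) -> str:
    """Generate an object ID from the contents and store data plus metadata under it."""
    object_id = hashlib.sha256(data).hexdigest()
    object_store[object_id] = {"data": data, "metadata": metadata}
    return object_id

def get_object(object_id: str) -> dict:
    """Retrieve an object by its ID from the flat namespace."""
    return object_store[object_id]

# Hypothetical MRI-scan object with rich, user-defined metadata
oid = put_object(b"...MRI image bytes...",
                 {"patient": "Jane Doe", "id": "P-1042", "physician": "Dr. Smith"})
print(f"Object ID: {oid}")
print(get_object(oid)["metadata"]["patient"])
```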
The OSD nodes connect to the storage via an internal network. The internal
network provides node-to-node connectivity and node-to-storage connectivity. The
application server accesses the node to store and retrieve data over an external
network. OSD typically uses low-cost and high-density disk drives to store the
objects. As more capacity is required, more disk drives can be added to the
system.
Feature: Flexible data access method — Supports REST/SOAP APIs for web/mobile access, and file sharing protocols (CIFS and NFS) for file service access
Notes
Massive scalability: Object-based storage assumes that the system can easily grow with aggregate demand. OSD is
based on a distributed scale-out architecture where each node in the cluster
contributes its resources to the total amount of space and performance. Nodes
are independently added to the cluster, which provides massive scaling to support
petabytes and even exabytes of capacity with billions of objects, making it
suitable for cloud environments.
Flexible data access method: OSD supports REST/SOAP APIs for web/mobile
access, and file sharing protocols (CIFS and NFS) for file service access. Some
OSD storage systems support HDFS interface for big data analytics.
Data protection: The objects stored in an OSD are protected using two methods:
replication and erasure coding. The replication provides data redundancy by
creating an exact copy of an object. The replica requires the same storage space
as the source object. Based on the policy configured for the object, one or more
replicas are created and distributed across different locations. Erasure coding
technique is discussed in the next slide.
– A set of n disks is divided into m disks to hold data and k disks to hold
coding information
– Coding information is calculated from data
[Figure: Data is written, divided into 9 data fragments, and encoded into additional coding fragments (m = 9, k = 3)]
The figure illustrates an example of dividing data into nine data fragments (m = 9)
and three coding fragments (k = 3). The maximum number of drive failures
supported in this example is three.
Notes
Object storage systems support erasure coding technique that provides space-
optimal data redundancy to protect data loss against multiple drive failures. In
storage systems, erasure coding can also ensure data integrity without using RAID.
This avoids the capacity overhead of keeping multiple copies and the processing
overhead of running RAID calculations on very large data sets. The result is data
protection for very large storage systems without the risk of very long RAID rebuild
cycles.
In general, erasure coding technique breaks the data into fragments, encoded with
redundant data and stored across a set of different locations, such as disks,
storage nodes, or geographic locations. In a typical erasure coded storage system,
a set of n disks is divided into m disks to hold data and k disks to hold coding
information, where n, m, and k are integers. The coding information is calculated
from the data. If up to k of the n disks fail, their contents can be recomputed from
the surviving disks.
Erasure coding offers higher fault tolerance (tolerates k faults) than replication with
less storage cost. The additional storage requirement for storing coding segments
increases as the value of k/m increases.
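The sketch below illustrates the idea on the simplest possible code: m data fragments protected by a single XOR coding fragment (k = 1), which can rebuild any one lost fragment, plus a calculation of the storage overhead for the m = 9, k = 3 example above. Production systems use Reed-Solomon-style codes to tolerate k > 1 failures; this is only a didactic sketch.

```python
def encode(fragments: list) -> bytes:
    """Compute one coding fragment as the XOR of all data fragments (k = 1)."""
    coding = bytearray(len(fragments[0]))
    for fragment in fragments:
        for i, byte in enumerate(fragment):
            coding[i] ^= byte
    return bytes(coding)

def rebuild(fragments: list, coding: bytes, lost_index: int) -> bytes:
    """Recompute a single lost data fragment from the survivors and the coding fragment."""
    survivors = [f for i, f in enumerate(fragments) if i != lost_index and f is not None]
    return encode(survivors + [coding])

data = [b"AAAA", b"BBBB", b"CCCC"]           # m = 3 equally sized data fragments
parity = encode(data)
damaged = [b"AAAA", None, b"CCCC"]           # one fragment lost to a drive failure
assert rebuild(damaged, parity, lost_index=1) == b"BBBB"

# Storage overhead grows with k/m: for the m = 9, k = 3 example it is 3/9, about 33%.
m, k = 9, 3
print(f"Overhead for m={m}, k={k}: {k / m:.0%}")
```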
Cloud storage provides unified and universal access, policy-based data placement,
and massive scalability. It also enables data access through object or file access
protocols and provides automated data protection and efficiency to manage large
amounts of data.
amount of data. With the growing adoption of cloud computing, cloud service
providers can leverage OSD to offer storage-as-a-service, backup-as-a-service,
and archive-as-a-service to their consumers.
[Figure: Application servers access a cloud-based object storage gateway over iSCSI/FC/FCoE or file protocols, and the gateway communicates with the cloud object storage over REST]
Gateways provide a translation layer between the standard interfaces (iSCSI, FC,
NFS, CIFS) and cloud provider’s REST API
Sits in a data center and presents file and block-based storage interfaces to
applications
Performs protocol conversion to send data directly to cloud storage
Encrypts the data before it transmits to the cloud storage
Supports deduplication and compression
Provides a local cache to reduce latency
Notes
The lack of standardized cloud storage APIs has made gateway appliance a crucial
component for cloud adoption. Typically service providers offer cloud-based object
storage with interfaces such as REST or SOAP, but most of the business
applications expect storage resources with block-based iSCSI or FC interfaces or
file-based interfaces, such as NFS or CIFS. The cloud-based object storage
gateways provide a translation layer between these standard interfaces and service
provider's REST API.
The gateway device is a physical or virtual appliance that sits in a data center and
presents file and block-based storage interfaces to the applications. It performs
protocol conversion so that data can be sent directly to cloud storage. To provide
security for the data sent to the cloud, most gateways automatically encrypt the
data before it is sent. To speed up data transmission times (as well as to minimize
cloud storage costs), most gateways support data deduplication and compression.
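A highly simplified sketch of a gateway write path under the behaviour just described: deduplicate, compress, "encrypt", cache locally, and push the result toward the provider's object interface. The XOR "encryption", the in-memory cache, and the cloud-store dictionary are placeholders; a real gateway uses proper cryptography and the provider's REST API.

```python
import hashlib
import zlib

local_cache = {}     # reduces read latency for recently written objects
cloud_store = {}     # stands in for the provider's REST-accessible object store
dedup_index = set()  # content hashes already sent to the cloud

def gateway_write(name: str, data: bytes, key: int = 0x5A) -> None:
    """Deduplicate, compress, 'encrypt', cache, and upload a block/file write."""
    digest = hashlib.sha256(data).hexdigest()
    local_cache[name] = data                          # serve subsequent reads locally
    if digest in dedup_index:                         # duplicate content: nothing to upload
        return
    compressed = zlib.compress(data)                  # reduce transmission time and cloud cost
    encrypted = bytes(b ^ key for b in compressed)    # placeholder for real encryption
    cloud_store[digest] = encrypted                   # in practice: HTTP PUT to the REST API
    dedup_index.add(digest)

gateway_write("/exports/report.doc", b"quarterly results " * 100)
gateway_write("/exports/copy.doc", b"quarterly results " * 100)   # deduplicated, no new upload
print(f"Objects uploaded to cloud: {len(cloud_store)}")            # 1
```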
Notes
There are numerous benefits associated with deploying unified storage systems:
Creates a single pool of storage resources that can be managed with a single
management interface.
Sharing of pooled storage capacity for multiple business workloads should lead to a
lower overall system cost and administrative time, thus reducing the total cost of
ownership (TCO).
Provides the capability to plan the overall storage capacity consumption. Deploying
a unified storage system takes away the guesswork associated with planning for
file and block storage capacity separately.
A unified storage architecture enables the creation of a common storage pool that
can be shared across a diverse set of applications with a common set of
management processes. The key component of a unified storage architecture is
unified controller. The unified controller provides the functionalities of block storage,
file storage, and object storage. It contains iSCSI, FC, FCoE, and IP front-end ports
for direct block access to application servers and file access to NAS clients.
[Figure: Unified storage — a unified controller provides SAN (iSCSI/FC/FCoE), NAS (CIFS/NFS), and object (REST/SOAP) access from a single system]
Notes
For block-level access, the controller configures LUNs and presents them to
application servers and the LUNs presented to the application server appear as
local physical disks. A file system is configured on these LUNs at the server and is
made available to applications for storing data.
For NAS clients, the controller configures LUNs, creates a file system on these
LUNs, creates an NFS, CIFS, or mixed share, and exports the share to the
clients. Some storage vendors offer a REST API to enable object-level access for
storing data from web/cloud applications.
Introduction
This section highlights technologies that are relevant to the topics covered in this
module.
Concepts in Practice
Delivers a full block and file unified environment in a single enclosure. The purpose-built
Dell EMC Unity system can be configured as an All-Flash system with only
solid state drives, or as a Hybrid system with a mix of solid state and spinning
media, to deliver the best of both performance and economics. The Unisphere
management interface offers a consistent look and feel whether you are managing
block resources, file resources, or both. Dell EMC Unity offers multiple solutions to
address security and availability. Unified Snapshots provide point-in-time copies of
block and file data that can be used for backup and restoration purposes.
Asynchronous Replication offers an IP-based replication strategy within a system
Assessment
1. Which file access method provides file sharing that is commonly used on UNIX
systems?
A. NTFS
B. NFS
C. CIFS
D. HDFS
2. Which type of storage device stores data on a flat address space based on its
content and attributes?
A. Block-based
B. Scale-up NAS
C. Scale-out NAS
D. Object-based
Summary
Introduction
Introduction
This lesson presents the drivers, the attributes, and the architecture of
software-defined storage. Further, this lesson covers asset discovery, resource
abstraction, pooling, and resource provisioning for services. Finally, this lesson
covers the application programming interface (API) and RESTful API.
Notes
Attribute Description
Notes
applications only when they are needed. If the policy changes, the storage environment dynamically and automatically responds with the new requested service level.

Unified management: SDS provides a unified storage management interface that presents an abstract view of the storage infrastructure. Unified management provides a single control point for the entire infrastructure across all physical and virtual resources.

Self-service: Resource pooling enables multi-tenancy, and automated storage provisioning enables self-service access to storage resources. Users select storage services from a self-service catalog and self-provision them.

Open and extensible: An SDS environment is open and easy to extend, enabling new capabilities to be added. An extensible architecture enables integrating multi-vendor storage, external management interfaces, and applications into the SDS environment through the use of application programming interfaces (APIs).
Software-Defined Storage
Notes
in a compute cluster. The storage pool is then shared among the servers in the
cluster. The REST API is the core interface to the SDS controller. All underlying
resources managed by the controller are accessible through the API. The REST
API makes the SDS environment open and extensible, which enables integration of
multi-vendor storage, external management tools, and written applications. The API
also integrates with monitoring and reporting tools. Further, the API provides
access to external cloud/object storage.
Notes
unused, is not wasted. A compute system that requires access to the block storage
volumes, runs a client program. The client program is a block device driver that
exposes shared block volumes to an application on the compute system. The
blocks that the client exposes can be blocks from anywhere within the compute-
based SAN. This process enables the application to issue an I/O request, and the
client fulfills it regardless of where the particular blocks reside. The client
communicates with other compute systems either over Ethernet (ETH) or Infiniband
(IB) – a high-speed, low latency communication standard for compute networking.
The compute systems that contribute their local storage to the shared storage pool
within the virtual SAN, run an instance of a server program. The server program
owns the local storage and performs I/O operations as requested by a client from a
compute system within the cluster. A compute-based SAN’s control component,
which is known as the metadata manager, serves as the monitoring and
configuration agent. It holds cluster-wide mapping information and monitors
capacity, performance, and load balancing. It is also responsible for decisions
regarding migration, rebuilds, and all system-related functions. The metadata
manager is not on the virtual SAN data path, and reads and writes do not traverse
the metadata manager. The metadata manager may communicate with other
compute-based SAN components within the cluster to perform system
maintenance and management operations but not data operations. The metadata
manager may run on a compute system within the compute-based SAN, or on an
external compute system.
Benefit: Simplified storage environment
Breaks down storage silos and their associated complexity. Provides centralized management across all physical and virtual storage environments. Simplifies management by enabling administrators to centralize storage management and provisioning tasks.

Benefit: Operational efficiency
Automated policy-driven storage provisioning improves quality of service, reduces errors, and lowers operational cost. Provides faster, streamlined storage provisioning, which enables new requirements to be satisfied more rapidly.
Asset discovery
Resource abstraction and pooling
Provisioning resources for services
Notes
Asset Discovery
Controller automatically detects assets when they are added to the SDS
environment
Controller obtains or confirms asset configuration information
Examples of asset categories that can be discovered are:
Storage systems
Storage networks
Compute systems and clusters
Data protection solutions
Notes
Data centers commonly contain many physical storage systems of different types
and often from multiple manufacturers. Each physical storage system must also be
individually managed, which is time consuming and error prone.
Resource Provisioning
Notes
The block data service provides a block volume of required size, performance level, and protection level to a user. Examples of the services that an administrator defines in this service category are as follows:

Create a block volume: A user can create a block storage volume by selecting a virtual storage system and virtual pool. On receiving the request, the SDS controller chooses the physical pool from the selected virtual pool and storage system. It creates a block volume, which corresponds to a LUN on the storage system.

Delete a block volume: A user can delete an existing volume. On receiving the request, the SDS controller destroys the volume from the physical storage pool.

Bind a block volume to compute: A user can assign a block volume to a selected compute system/cluster. On receiving this request, the SDS controller binds the block volume to the specified compute system/cluster. However, the volume cannot be written to or read from unless it is mounted.

Unbind block volume from compute: A user can unbind a volume from a compute system/cluster. This block service simply makes the block volume invisible to the compute.

Mount a block volume: A user can mount a block volume on a compute system/cluster. The SDS controller sends commands to the OS to mount the volume. This operation is specific to the type of OS on the compute system, such as Windows, Linux, and ESXi.

Unmount block volume: A user can unmount a block volume from a compute system/cluster. On receiving the request, the SDS controller sends commands to the compute to unmount the volume.

Expand block volume: A user can expand/extend a block volume by combining it either with a newly created volume or with an existing volume. On receiving the request to expand a volume, the SDS controller commands the storage system to expand the LUN.
Notes
APIs enable integrating third-party data services and capabilities into existing
architecture
In SDDC, APIs enable orchestration and provisioning resources from pools
Ensures meeting the SLAs that organizations require
In SDS, the REST API provides the interface to all underlying resources
Notes
[Figure: External tools, applications, and cloud stacks access the SDS controller, and the storage systems it manages, through the REST API]
The REST API enables the extensibility of the SDS functionality through integration
with written applications, and external management tools and cloud stacks such as
VMware, Microsoft, and OpenStack. This provides an alternative to provisioning
storage from the native management interface. The open platform enables users
and developers to write new data services. This enables building an open
development community around the platform.
The API also integrates with tools for monitoring and reporting system utilization,
performance, and health. This also enables generating chargeback/showback
reports. The API may also support cloud/object storage platforms such as Amazon
S3 and OpenStack Swift. Further, the API may also support integration with HDFS
for running Hadoop applications.
Describes the programmatic interfaces that allow users to create, read, update,
and delete resources through the HTTP methods PUT, GET, POST, and
DELETE
Accessible using any web browser or programming platform that can issue
HTTP requests
The browser may require a special plugin such as httpAnalyzer for Internet
Explorer, Poster for Firefox, and PostMan for Chrome. The REST API may also be
accessed using scripting platforms such as Perl. Vendors may also provide class
libraries that enable developers to write applications that access the SDS data
services.
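As an example of this access model, the hedged sketch below uses Python's requests library to call a hypothetical SDS controller endpoint. The base URL, resource paths, token header, and JSON fields are all assumptions, since each vendor defines its own REST resource model, and the calls will not return data unless pointed at a real controller.

```python
import requests

BASE_URL = "https://sds-controller.example.com:4443"   # hypothetical controller address
HEADERS = {"X-SDS-AUTH-TOKEN": "<token>", "Content-Type": "application/json"}

# GET: read the existing block volumes (resource path is an assumption).
volumes = requests.get(f"{BASE_URL}/block/volumes", headers=HEADERS, verify=False)
print(volumes.status_code, volumes.json())

# POST: create a new block volume from a virtual pool and virtual storage system.
payload = {"name": "app01_vol", "size": "10GB", "vpool": "gold", "varray": "site-a"}
created = requests.post(f"{BASE_URL}/block/volumes", json=payload, headers=HEADERS, verify=False)

# DELETE: remove the volume once it is no longer required.
requests.delete(f"{BASE_URL}/block/volumes/{created.json()['id']}", headers=HEADERS, verify=False)
```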
Introduction
[Figure: Separation of the control plane (network OS) from the data plane in a networking device]
Notes
[Figure: SDN architecture — the application layer (application plane) communicates with the control layer (controllers) through northbound APIs, and the control layer communicates with the infrastructure layer (networking devices in the data plane) through southbound APIs]
The architecture of SDN consists of three layers, with APIs in between to define the communication. Northbound interfaces define the communication between the controller and the application layer. Southbound interfaces define the communication between the control and infrastructure layers.
Benefit: Centralized control
Provides a single point of control for the entire network infrastructure that may span across data centers. The centralized control plane provides the programming logic for transferring network traffic, which can be uniformly and quickly applied across the network infrastructure. The programming logic can be upgraded centrally to add new features based on application requirements.
Notes
Listed below are some common use cases where SDN is used to strengthen security, automate processes for faster provisioning of network resources, and enable business continuity.
Notes
Introduction
This section highlights technologies that are relevant to the topics covered in this
module.
Concepts in Practice
ViPR Controller also provides a REST-based API making the storage architecture
extensible. It supports multiple vendors enabling organizations to choose storage
platforms from either Dell EMC or third-party. It also supports different cloud stacks
such as VMware, Microsoft, and OpenStack. ViPR Controller development is driven
by the open-source community, which enables expanding its features and
functionalities.
Software that creates a server and IP-based SAN from direct-attached server
storage to deliver flexible and scalable performance and capacity on demand. As
an alternative to a traditional SAN infrastructure, VxFlex OS combines HDDs,
SSDs, and PCIe flash cards to create a virtual pool of block storage with varying
performance tiers. It decouples compute and storage, and scales each resource
together or independently to drive maximum efficiency and to eliminate wasted
CAPEX at scale. Distributed I/O Parallelism vs. Data Locality: uses all resources to
deliver against all I/O requests to drive massive performance. Eliminates
bottlenecks and scales performance linearly.
VxFlex OS is built for workload variability and consolidates many workloads onto a
single system with consistent performance for all. For storage utilization, VxFlex
OS is completely agnostic because the OS and hypervisor enable the sharing of
storage resources across multiple operating systems/clusters.
VMware NSX
VMware NSX lets you create, delete, save, and restore networks without changing
the physical network. This process reduces the time to provision by simplifying
overall network operations. NSX Manager is integrated with vCenter for single pane
management and all these network resources can be deployed whether in a cloud
or a self-service portal environment.
Assessment
1. Which product creates IP-based SAN from direct attached server storage ?
B. VMware NSX
C. VMware vSphere
A. Control
B. Infrastructure
C. Application
D. API
Summary
Introduction
Introduction
Business Continuity
Notes
Business continuity (BC) is a set of processes that includes all activities that a
business must perform to mitigate the impact of planned and unplanned downtime.
BC entails preparing for, responding to, and recovering from a system outage that
adversely affects business operations. It describes the processes and procedures
an organization establishes to ensure that essential functions can continue during
and after a disaster.
In a modern data center, policy-based services can be created that include data
protection through the self-service portal. Consumers can select the class of
service that best meets their performance, cost, and protection requirements on
demand. Once the service is activated, the underlying data protection solutions that
are required to support the service is automatically invoked to meet the required
data protection.
For example, if a service requires a VM backup every six hours, then the VM backup is scheduled automatically every six hours. The goal of a BC solution is to ensure the "information availability" required to conduct vital business operations.
High-risk data: Organizations seek to protect their sensitive data to reduce the risk of financial, legal, and business loss
Notes
Data is the most valuable asset for an organization. An organization can use its
data to efficiently bill customers, advertise relevant products to the existing and
potential customers. It also enables organizations to launch new products and
services, and perform trend analysis to devise targeted marketing plans. These
sensitive data, if lost, may lead to significant financial, legal, and business loss
apart from serious damage to the reputation of an organization. An organization
seeks to reduce the risk of sensitive data loss to operate its business successfully.
It should focus its protection efforts where the need exists—its high-risk data.
Information Availability
Timeliness: Defines the time window (a particular time of the day, week, month, and year as specified) during which information must be accessible. For example, if online access to an application is required between 8:00 am and 10:00 pm each day, any disruption to data availability outside of this time slot is not considered to affect timeliness.
Notes
Data center failure due to disaster (natural or man-made disasters such as flood,
fire, earthquake, and so on) is not the only cause of information unavailability. Poor
application design or resource configuration errors can lead to information
unavailability. For example, if the database server is down for some reason, then
the data is inaccessible to the consumers, which leads to IT service outage.
Even the unavailability of data due to several factors (data corruption and human
error) leads to outage. The IT department is routinely required to take on activities
such as refreshing the data center infrastructure, migration, running routine
maintenance, or even relocating to a new data center. Any of these activities can
have its own significant and negative impact on information availability.
Note: In general, the outages can be broadly categorized into planned and
unplanned outages.
[Figure: Consequences of information unavailability — damaged reputation (customers) and impaired financial performance (revenue recognition)]
MTBF: Average time available for a system or component to perform its normal
operations between failures
MTBF = Total uptime / Number of failures
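A quick worked example of the formula above, with illustrative numbers chosen only for demonstration.

```python
# Worked example of MTBF = total uptime / number of failures, with illustrative numbers.
total_uptime_hours = 8760 - 24      # one year of operation minus 24 hours of downtime
number_of_failures = 4

mtbf_hours = total_uptime_hours / number_of_failures
print(f"MTBF = {total_uptime_hours} / {number_of_failures} = {mtbf_hours} hours")  # 2184.0 hours
```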
Notes
Both RPO and RTO are counted in minutes, hours, or days and are directly related
to the criticality of the IT service and data. Usually, the lower the RTO and RPO,
the higher is the cost of a BC solution or technology.
BC Planning Lifecycle
BC planning must follow a disciplined approach like any other planning process.
Organizations today dedicate specialized resources to develop and maintain BC
plans. From the conceptualization to the realization of the BC plan, a lifecycle of
activities can be defined for the BC process. The BC planning lifecycle includes five
stages:
Establish Objectives
Determine BC requirements
Estimate the scope and budget to achieve requirements
Select a BC team that includes subject matter experts from all areas of
business, whether internal or external
Create BC policies
Analyze
Define the team structure and assign individual roles and responsibilities; for
example, different teams are formed for activities such as emergency response
and infrastructure and application recovery
Design data protection strategies and develop infrastructure
Develop contingency solution and emergency response procedures
Detail recovery and restart procedures
Implement
Train the employees who are responsible for backup and replication of
business-critical data on a regular basis or whenever there is a modification in
the BC plan
Train employees on emergency response procedures when disasters are
declared
Train the recovery team on recovery procedures based on contingency
scenarios
Perform damage-assessment processes and review recovery plans
Test the BC plan regularly to evaluate its performance and identify its limitations
Assess the performance reports and identify limitations
Update the BC plans and recovery/restart procedures to reflect regular changes
within the data center
A disaster may impact the ability of a data center to remain up and provide services
to users. This disaster may cause information unavailability. Disaster recovery (DR)
mitigates the risk of information unavailability due to a disaster. It involves a set of
policies and procedures for restoring IT infrastructure including data. This
infrastructure and data are required to support the ongoing IT services after a
disaster occurs.
[Figure: Replication between the primary data center and the DR site provides data access at both sites]
Notes
Organizations often keep their DR site ready to restart business operations if there
is an outage at the primary data center. This may require the maintenance of a
complete set of IT resources at the DR site that matches the IT resources at the
primary site. Organizations can either build their own DR site, or they can use the cloud to build a DR site.
Notes
With the aim of meeting the required information and service availability, the
organizations should build a resilient IT infrastructure. Building a resilient IT
infrastructure requires the following high availability and data protection solutions:
Deploying redundancy at both the IT infrastructure component level and the site
level to avoid single point of failure
Deploying data protection solutions such as backup, replication, migration, and
archiving
An automatic failover mechanism is another important method. It is an efficient
and cost-effective way to ensure HA. For example, scripts can be
defined to bring up a new VM automatically when the current VM stops
responding or goes down.
Architecting resilient modern applications
application users, VMs, data, and services to the new data center. This process
involves the use of redundant infrastructure across different geographic locations,
live migration, backup, and replication solutions.
Introduction
This lesson presents key requirements for fault tolerance. This lesson also focuses
on component-level and site-level fault tolerance techniques.
Fault tolerance ensures that a single fault or failure does not make an entire system
or a service unavailable. It protects an IT system or a service against various types
of unavailability.
[Figure: Fault tolerance deals with faults, due to hardware failures, software issues, or user errors, that cause outages — transient, intermittent, or permanent unavailability]
Notes
A fault-tolerant IT infrastructure should meet two key requirements: fault isolation and the elimination of single points of failure (SPOF).
Fault Isolation
Fault isolation limits the scope of a fault into local area so that the other areas of a
system are not impacted by the fault. It does not prevent failure of a component but
ensures that the failure does not impact the overall system.
Fault isolation requires a fault detection mechanism that identifies the location of a
fault and a contained system design (like sandbox) that prevents a faulty system
component from impacting other components.
The example represents two I/O paths between a compute system and a storage
system. The compute system uses both the paths to send I/O requests to the
storage system. If an error or fault occurs on a path causing a path failure, the fault
isolation mechanism present in the environment automatically detects the failed
path. It isolates the failed path from the set of available paths and marks it as a
dead path to avoid sending the pending I/Os through it. All pending I/Os are
redirected to the live path. This helps avoid time-out and retry delays.
[Figure: Single points of failure can exist at the compute, network, storage, and site level — for example, a single compute system, FC switch, storage system, or data center]
Notes
Organizations may also create multiple availability zones to avoid single points of
failure at data center level. Usually, each zone is isolated from others, so that the
failure of one zone would not impact the other zones. It is important to have high
availability mechanisms that enable automated application/service failover within
and across the zones if there is a component failure or disaster.
Note:
participate in the service operations. The standby component is active only if any
one of the active components fails.
[Figure: Eliminating single points of failure — redundant NICs (NIC teaming), redundant HBAs, redundant FC switches, redundant network links, redundant storage system ports, and a remote site connected over the LAN/WAN]
Notes
Compute Clustering
[Figure: A compute cluster with heartbeat signals between nodes enables service failover]
Active/active
Active/passive
Notes
Clustering is a technique where at least two compute systems (or nodes) work
together and are viewed as a single compute system to provide high availability
and load balancing. If one of the compute systems fails, the service running in the
compute system can failover to another compute system in the cluster. This
method minimizes or avoids any outage.
In active/active clustering, the nodes in a cluster are all active participants and
run the same service for their clients. The active/active cluster balances requests
for service among the nodes. If one of the nodes fails, the surviving nodes take
the load of the failed one. This method enhances both the performance and the
availability of a service. The nodes in the cluster have access to shared storage
volumes. In active/active clustering only one node can write or update the data
in a shared file system or database at a given time.
In active/passive clustering, the service runs on one or more nodes and the
passive node waits for a failover. If the active node fails, the service that had
been running on the active node is failed over to the passive node.
Active/passive clustering does not provide performance improvement like
active/active clustering.
Clustering uses a heartbeat mechanism to determine the health of each node in the cluster. The exchange of heartbeat signals, which usually happens over a private network, enables participating cluster members to monitor one another's status.
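The sketch below models, under simple assumptions, how a cluster member might use missed heartbeats to decide that a peer has failed and trigger failover of its services. The timeout value, node names, and in-memory bookkeeping are illustrative only.

```python
import time

HEARTBEAT_TIMEOUT = 3.0    # seconds of silence before a node is declared failed (assumed value)

class ClusterMonitor:
    """Tracks the last heartbeat received from each node over the private network."""

    def __init__(self, nodes):
        now = time.time()
        self.last_heartbeat = {node: now for node in nodes}
        self.services = {node: [f"svc-{node}"] for node in nodes}   # services hosted per node

    def receive_heartbeat(self, node):
        self.last_heartbeat[node] = time.time()

    def check_and_failover(self, surviving_node):
        """Declare silent nodes failed and move their services to a surviving node."""
        now = time.time()
        for node, last in list(self.last_heartbeat.items()):
            if node != surviving_node and now - last > HEARTBEAT_TIMEOUT:
                self.services[surviving_node].extend(self.services.pop(node))
                del self.last_heartbeat[node]
                print(f"{node} failed; its services failed over to {surviving_node}")

monitor = ClusterMonitor(["node-a", "node-b"])
monitor.receive_heartbeat("node-a")             # node-a keeps sending heartbeats
monitor.last_heartbeat["node-b"] -= 10          # simulate node-b going silent
monitor.check_and_failover(surviving_node="node-a")
print(monitor.services["node-a"])               # ['svc-node-a', 'svc-node-b']
```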
[Figure: VM fault tolerance — the hypervisor records the primary VM's events and logging traffic, replays them on a secondary VM over the network, and both VMs share the same storage system]
Notes
The hypervisor running the primary VM as shown in the illustration captures the
sequence of events for the primary VM. This includes instructions from the virtual
I/O devices, virtual NICs, and so on. Then it transfers these sequences to the
hypervisor running on another compute system. The hypervisor running the
secondary VM receives these event sequences and sends them to the secondary
VM for execution.
The primary and the secondary VMs share the same storage, but all output
operations are performed only by the primary VM. A locking mechanism ensures
that the secondary VM does not perform write operations on the shared storage.
The hypervisor posts all events to the secondary VM at the same execution point
as they occurred on the primary VM. This way, these VMs “play” the same set of
events and their states are synchronized with each other.
Link Aggregation
Combines links between two switches and also between a switch and a node
Enables network traffic failover in the event of a link failure in the aggregation
NIC Teaming
Groups NICs so that they appear as a single, logical NIC to the operating system or hypervisor
Provides network traffic failover in the event of a NIC/link failure
Distributes network traffic across NICs
Multipathing
Notes
Link aggregation, NIC teaming, multipathing, and load balancing provide fault tolerance mechanisms against link failure.
Link aggregation combines two or more network links into a single logical link,
called port channel, yielding higher bandwidth than a single link could provide.
Link aggregation enables distribution of network traffic across the links and
traffic failover if there is a link failure. If a link in the aggregation is lost, all
network traffic on that link is redistributed across the remaining links.
NIC teaming groups NICs so that they appear as a single, logical NIC to the OS
or hypervisor. NIC teaming provides network traffic failover to prevent
connectivity loss if there is a NIC failure or a network link outage. Sometimes,
NIC teaming enables aggregation of network bandwidth of individual NICs. The
bandwidth aggregation facilitates distribution of network traffic across NICs in
the team.
To use multipathing, multiple paths must exist between the compute and the
storage systems. Each path can be configured as either active or standby. If one or
more active paths fail, the standby paths become active. When an active path fails, the
multipathing process detects the failed path and redirects the I/Os of the failed
path to another active path.
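As a simple illustration of the failover behavior described above, the following Python sketch promotes a standby path when an active path fails; the PathManager class and path names are hypothetical and not drawn from any multipathing product.

```python
# Hypothetical sketch of multipath failover logic: path names and the
# PathManager class are illustrative, not a vendor implementation.
class PathManager:
    def __init__(self, active_paths, standby_paths):
        self.active = list(active_paths)    # paths currently carrying I/O
        self.standby = list(standby_paths)  # paths held in reserve

    def mark_failed(self, path):
        """Remove a failed path and promote a standby path if one exists."""
        if path in self.active:
            self.active.remove(path)
            if self.standby:
                self.active.append(self.standby.pop(0))

    def send_io(self, io):
        """Redirect I/O to any surviving active path (load balancing not shown)."""
        if not self.active:
            raise RuntimeError("No paths available to the storage system")
        return f"I/O {io} sent over {self.active[0]}"

# Example: one active path and one standby path between compute and storage
pm = PathManager(["HBA0->SP-A"], ["HBA1->SP-B"])
pm.mark_failed("HBA0->SP-A")   # active path fails; standby becomes active
print(pm.send_io("write-42"))  # I/O is redirected to the surviving path
```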
Elastic load balancing enables dynamic distribution of application and client I/O
traffic among VM instances. It dynamically scales resources (VM instances) to
meet traffic demands. The load balancer provides fault tolerance by detecting
unhealthy VM instances and automatically redirecting I/Os to healthy VM
instances.
Data centers comprise storage systems with a large number of disk drives and
solid-state drives. These storage systems support various applications and services
running in the environment. The failure of these drives could result in data loss and
information unavailability. The greater the number of drives in use, the greater the
probability of a drive failure.
The following techniques provide data protection in the event of drive failure:
RAID
Erasure Coding
Storage Virtualization
Notes
Dynamic disk sparing is a fault tolerance mechanism in which a spare drive
automatically replaces a failed disk drive by taking on its identity. A spare
drive should be large enough to accommodate data from a failed drive. Some
systems implement multiple spare drives to improve data availability.
In dynamic disk sparing, when the recoverable error rates for a disk exceed a
predetermined threshold, the disk subsystem tries to copy data from the failing disk
to the spare drive automatically. If this task is completed before the damaged disk
fails, the subsystem switches to the spare disk and marks the failing disk as
unusable. Otherwise, it uses parity or the mirrored disk to recover the data.
Instead of being directed to the LUNs on the individual storage systems, the
compute systems are directed to the virtual volume provided by the virtualization
layer.
An availability zone is a location with its own set of resources that is isolated from
other zones. A zone can be an entire data center or a part of a data center.
Enables running multiple service instances within and across zones to survive a
data center or site failure
If there is an outage, the service should seamlessly fail over across the zones
Notes
For example, if two compute systems are deployed, one in zone A and the other in
zone B, the probability that both go down simultaneously due to an external event
is low. This simple strategy enables an organization to construct highly reliable web
services by placing compute systems into multiple zones, so that the failure of one
zone does not disrupt the service, or at least the service can be rapidly
reconstructed in the second zone.
High availability can be achieved by moving services across zones that are located
in different locations without user interruption. The services can be moved across
zones by implementing a stretched cluster.
[Figure: Stretched cluster across Zone A and Zone B — hypervisors in each zone send I/Os to a virtual volume presented by virtualization appliances connected over FC/IP; the virtual volume is created from a storage pool that federates the LUNs on the storage systems at both zones.]
Notes
The illustration also shows that a virtual volume is created from the federated
storage resources across zones. The virtualization appliance has the ability to
mirror the data of a virtual volume between the LUNs located in two different
storage systems at different locations.
Each I/O from a host to the virtual volume is mirrored to the underlying LUNs on the
storage systems. If an outage occurs at one of the data centers, for example at
zone A, then the running VMs at zone A can be restarted at Zone B without
impacting the service availability.
This setup also enables accessing the storage even if one of the storage systems
is unavailable. If the storage system at zone A is unavailable, the hypervisor
running there can still access the virtual volume; it accesses the data from the
available storage system at zone B.
Notes
A reliable application properly manages the failure of one or more modules and
continues operating properly. If a failed operation is retried a few milliseconds later,
the operation may succeed. These types of error conditions are called transient
faults. Fault-resilient applications have logic to detect and handle transient fault
conditions in order to avoid application downtime.
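A minimal sketch of such retry logic is shown below, in Python; the retry count, delay, and exception type are illustrative assumptions only.

```python
import time

# Minimal sketch of retry logic for transient faults; the retry count and
# delay are illustrative values, not a prescribed policy.
def call_with_retry(operation, retries=3, delay_seconds=0.05):
    for attempt in range(1, retries + 1):
        try:
            return operation()
        except ConnectionError:
            # Transient fault: wait briefly and retry before giving up
            if attempt == retries:
                raise
            time.sleep(delay_seconds)
```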
Graceful Degradation
Notes
However, in this same scenario, it is still possible to keep the product catalog
module available so that consumers can view the product catalog. The application
could also allow the customer to place an order and move it into the shopping cart.
This method provides the ability to process the orders when the payment gateway
is available again or after failing over to a secondary payment gateway.
In a persistent application state model, the state information is stored outside the
memory, in a repository such as a database. If a VM running the application
instance fails, the state information is still available in the repository. A new
application instance is created on another VM, which can access the state
information from the database and resume the processing.
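The persistent state model can be sketched as follows; the use of a local SQLite file as the external repository and the session identifiers are assumptions chosen only to illustrate that state survives the failure of a VM instance.

```python
import sqlite3

# Sketch only: a SQLite file stands in for the external state repository
# so that application state survives the failure of any single VM instance.
def save_state(db_path, session_id, state):
    with sqlite3.connect(db_path) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS state (id TEXT PRIMARY KEY, value TEXT)")
        conn.execute("INSERT OR REPLACE INTO state VALUES (?, ?)", (session_id, state))

def load_state(db_path, session_id):
    with sqlite3.connect(db_path) as conn:
        row = conn.execute("SELECT value FROM state WHERE id = ?", (session_id,)).fetchone()
        return row[0] if row else None

# A new instance on another VM can pick up where the failed instance left off
save_state("app_state.db", "order-1001", "awaiting-payment")
print(load_state("app_state.db", "order-1001"))
```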
Introduction
This section highlights technologies that are relevant to the topics covered in this
module.
Concepts In Practice
Concepts in Practice
Dell EMC PowerPath/VE is compatible with VMware vSphere and Microsoft Hyper-
V-based virtual environments. It can be used together with Dell EMC PowerPath to
perform the following functions in both physical and virtual environments:
Concepts in Practice
VMware HA
VMware FT
VMware HA
VMware FT
By providing continuous availability for VM instances in the event of hardware
failure, FT eliminates even the smallest chance of data loss or disruption.
Assessment
1. Which defines the amount of data loss that a business can endure?
A. RTO
B. RPO
D. Availability zone
A. Graceful degradation
B. Retry logic
D. Core-edge topology
Summary
Introduction
This module presents the need for backup, various backup methods, and deduplication
implementation. This module also focuses on different replication types, data
archiving solutions, and data migration solutions.
Replication Lesson
Introduction
This lesson presents the primary uses of replicas and the characteristics of replicas. This
lesson also focuses on replication types.
Replication
Replicas are used to restore and restart operations if data loss occurs
Data can be replicated to one or more locations based on the business
requirements
[Figure: Data replication — servers in Data Center A replicate data over the connectivity to a replica on storage in Data Center B, and also replicate data to the cloud.]
Notes
Data is one of the most valuable assets of any organization. It is being stored,
mined, transformed, and used continuously. It is a critical component in the
operation and function of organizations. Outages, whatever may be the cause, are
costly, and customers are always concerned about data availability. Safeguarding
and keeping the data highly available are some of the top priorities of any
organization.
Data replication is the process of creating an exact copy (replica) of data. If a data
loss occurs, the replicas are used to restore and restart operations. For
example, if a production VM goes down, the replica VM can be used to restart the
production operations with minimal disruption. Based on business requirements,
data can be replicated to one or more locations.
For example, data can be replicated within a data center, between data centers,
from a data center to a cloud, or between clouds. In a replication environment, a
compute system accessing the production data from one or more LUNs on a storage
system is called a production compute system. These LUNs are known as source
LUNs, production LUNs, or the source. A LUN to which the production data is
replicated is called the target LUN, the target, or the replica.
Replicas are created for various purposes which include the following:
Can be used to restart business operations or to recover the data
Used for testing applications
Data migration
Notes
Under normal backup operations, data is read from the production LUNs and
written to the backup device. This places an extra burden on the production
infrastructure because production LUNs are simultaneously involved in production
operations and servicing data for backup operations.
To avoid this situation, a replica can be created from the production LUN and used
as the source for backup operations. This method alleviates the backup
I/O workload on the production LUNs.
For critical applications, replicas can be taken at short, regular intervals. This
enables fast recovery from data loss. If a complete failure of the source LUN
occurs, the replication solution enables restarting the production operations on the
replica. This approach reduces the RTO.
Decision-Support Activities
Running reports using the data on the replicas greatly reduces the I/O burden on
the production device.
Testing Platform
For example, an organization may use the replica to test the production application
upgrade. If the test is successful, the upgrade may be implemented on the
production environment.
Data Migration
Another use for a replica is data migration. Data migrations are performed for
various reasons such as migrating from a smaller capacity LUN to one of a larger
capacity.
Consistency: Ensures the usability of a replica; the replica must be consistent with the source
Continuous replication: Provides a near-zero RPO
Notes
Recoverability: Enables restoration of data from the replicas to the source if data
loss occurs.
Consistency: The replica must be consistent with the source so that it is usable for
both recovery and restart operations. For example, if a service running in a primary
data center is to fail over to a remote site due to a disaster, there must be a
consistent replica available at that site. Ensuring consistency is therefore the
primary requirement for all replication technologies. Replicas can either be
point-in-time (PIT) or continuous, and the choice of replica ties back into the RPO.
PIT replica: The data on the replica is an identical image of the production data at
some specific timestamp. For example, a replica of a file system is created at 4:00 PM
on Monday. This replica would then be referred to as the Monday 4:00 PM PIT copy.
The RPO maps to the time from when the PIT was created to the time when any kind
of failure on the production occurred. If there is a failure on the production at 8:00 PM
and there is a 4:00 PM PIT available, the RPO would be 4 hours (8 - 4 = 4). To
minimize the RPO, take periodic PITs.
Continuous replica: The data on the replica is always in sync with the production data.
The objective with any continuous replication is to reduce the RPO to zero or near-zero.
Types of Replication
Local Replication
Refers to replicating data within the same location
Data is replicated within a storage system in storage system-based replication
Data is replicated within a data center, from one compute system to another, in compute-based replication
Typically used for operational restore of data if there is a data loss
Remote Replication
Refers to replicating data to remote locations (locations can be geographically dispersed)
Data is replicated from a storage system in Data Center A to a storage system in a remote data center (Data Center B)
Data can be synchronously or asynchronously replicated
Helps to mitigate the risks associated with regional outages
Enables organizations to replicate the data to the cloud for DR purposes
Notes
Local replication is the process of replicating data within the same storage system
or the same data center.
Local replicas help to restore the data if there is a data loss or enable restarting the
application immediately to ensure business continuity.
Remote replication helps organizations to mitigate the risks that are associated with
regional outages resulting from natural or human-made disasters. During disasters,
the services can be moved to a remote location to ensure continuous business
operation.
Remote replication also enables organizations to replicate their data to the cloud
for DR purpose. In a remote replication, data can be synchronously or
asynchronously replicated.
– This includes disks, memory, and other devices, such as virtual network
interface cards
– This VM snapshot is useful for quick restore of a VM
For example:
An administrator can create a snapshot of a VM, make changes such as
applying patches and software upgrades to the VM
If anything goes wrong, the administrator can restore the VM to its previous
state using the VM snapshot
The hypervisor provides an option to create and manage multiple snapshots
Taking multiple snapshots provides several restore points for a VM
While more snapshots improve the resiliency of the infrastructure, it is important
to consider the storage space they consume
Notes
When a snapshot is created for a VM, a child virtual disk (delta disk file) is created
from the base image or parent virtual disk. The snapshot mechanism prevents the
guest operating system from writing to the base image or parent virtual disk.
Instead it directs all writes to the delta disk file. Successive snapshots generate a
new child virtual disk from the last child virtual disk in the chain. Snapshots hold
only changed blocks.
Child virtual disks store all the changes that are made to the parent VM after
snapshots are created
When committing snapshot 3, the data on child virtual disk files 1 and 2 is
committed prior to committing the data on child virtual disk 3 to the parent virtual
disk file
After committing the data, child virtual disks 1, 2, and 3 are deleted
However, while rolling back to snapshot 1, child disk file 1 is retained
and snapshots 2 and 3 are discarded
[Figure: Snapshot chain — the VM writes changed blocks to snapshot 3 (child virtual disk 3), which is a child of snapshot 2 and of the base image.]
Notes
When committing snapshot 3, the data on child virtual disk files 1 and 2 is committed
prior to committing the data on child virtual disk 3 to the parent virtual disk file. After
committing the data, child virtual disk 1, child virtual disk 2, and child virtual disk 3
are deleted. However, while rolling back to snapshot 1 (PIT), child disk file 1 is
retained and snapshots 2 and 3 are discarded. Sometimes it may be required to
retain a snapshot for a longer period. It must be noted that larger snapshots take a
longer time to commit and may impact performance. The source (parent VM) must be
healthy in order to use a snapshot for rollback.
Notes
Cloning provides the ability to create fully populated point-in-time copies of LUNs
within a storage system or create a copy of an existing VM.
VM Clone:
A VM clone is a copy of an existing VM. The existing VM is called the parent of
the clone. When the cloning operation completes, the clone becomes a
separate VM. The changes made to a clone do not affect the parent VM.
Changes made to the parent VM do not appear in a clone. A clone's MAC
address is different from that of the parent VM.
In general, installing a guest operating system and applications on a VM is a
time consuming task. With clones, administrators can make many copies of a
virtual machine from a single installation and configuration process. For
example, in an organization, the administrator can clone a VM for each new
employee, with a suite of preconfigured software applications.
Write is committed to both the source and the remote replica before it is
acknowledged to the compute system
Enables restarting business operations at a remote site with zero data loss;
provides near-zero RPO
[Figure: Synchronous replication — (1) the write I/O is received from the production compute system into the cache of the source and placed in a queue.]
Notes
In synchronous replication, writes must be committed to the source and the remote
target prior to acknowledging “write complete” to the production compute system.
Additional writes on the source cannot occur until each preceding write has been
completed and acknowledged.
This approach ensures that data is identical on the source and the target at all
times. Further, writes are transmitted to the remote site exactly in the order in which
they are received at the source. Write ordering is maintained and it ensures
transactional consistency when the applications are restarted at the remote
location. As a result, the remote images are always restartable copies.
The degree of impact on response time depends primarily on the distance and
the network bandwidth between sites. If the bandwidth provided for
synchronous remote replication is less than the maximum write workload, there
will be times during the day when the response time might be excessively
elongated, causing applications to time out. The distances over which
synchronous replication can be deployed depend on the application’s capability
to tolerate the extensions in response time. Typically synchronous remote
replication is deployed for distances less than 200 kilometers (125 miles)
between the two sites.
[Figure: Asynchronous replication — (1) the write I/O is received from the production compute system into the cache of the source and placed in a queue.]
Notes
Replicating data across sites that are thousands of kilometers apart helps an
organization withstand a disaster. If a disaster strikes one of the regions, the
data is still available in another region, and the service can be moved to that
location. Asynchronous replication enables replicating data across sites that are
thousands of kilometers apart.
In asynchronous replication, compute system writes are collected into a buffer (delta
set) at the source. This delta set is transferred to the remote site at regular
intervals. Adequate buffer capacity should be provisioned to perform asynchronous
replication. Some storage vendors offer a feature called delta set extension, which
enables offloading the delta set from the buffer (cache) to specially configured drives.
This feature makes asynchronous replication resilient to a temporary increase in write
workload or a loss of the network link.
In asynchronous replication, RPO depends on the size of the buffer, the available
network bandwidth, and the write workload to the source. This replication can take
advantage of locality of reference (repeated writes to the same location). If the
same location is written multiple times in the buffer prior to transmission to the
remote site, only the final version of the data is transmitted. This feature conserves
link bandwidth.
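The dependence of RPO on buffer size, bandwidth, and write workload can be illustrated with a rough, back-of-the-envelope calculation; the figures below are assumptions, not measurements from any product.

```python
# Rough, illustrative estimate of RPO exposure in asynchronous replication.
# All values are assumptions; real RPO also depends on locality of reference,
# delta-set scheduling, and link latency.
write_workload_mbps = 400      # average write workload at the source (MB/s)
link_bandwidth_mbps = 250      # available replication bandwidth (MB/s)
buffer_capacity_mb = 90_000    # delta-set buffer capacity at the source (MB)

# Data accumulates in the buffer at the rate the workload exceeds the link
backlog_rate = max(write_workload_mbps - link_bandwidth_mbps, 0)
if backlog_rate:
    seconds_until_buffer_full = buffer_capacity_mb / backlog_rate
    print(f"Buffer absorbs the burst for ~{seconds_until_buffer_full / 60:.0f} minutes;"
          " the unreplicated data in the buffer bounds the RPO.")
else:
    print("Link keeps up with the workload; RPO stays near the transfer interval.")
```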
Data from the source site is replicated to multiple remote sites for DR purposes
Disaster recovery protection is always available if any one-site failure occurs
Mitigates the risk in two-site replication
[Figure: Three-site replication — the production compute system writes to the source storage at the source site; data is replicated synchronously to storage (Target 1) at one remote site and asynchronously to storage (Target 2) at another remote site.]
Notes
In a two-site synchronous replication, the source and target sites are usually within
a short distance. If a regional disaster occurs, both the source and the target sites
might become unavailable. This can lead to extended RPO and RTO, since the
last known good copy of data would need to come from another source, such as an
offsite tape. A regional disaster will not affect the target site in a two-site
asynchronous replication since the sites are typically several hundred or several
thousand kilometers apart. If the source site fails, production can be shifted to the
target site. However, there is no further remote protection of data until the failure is
resolved.
Multisite replication mitigates the risks that are identified in two-site replication. In a
multisite replication, data from the source site is replicated to two or more remote
sites. The illustration provides an example of a three-site remote replication
solution. In this approach, data at the source is replicated to two different storage
systems at two different sites. The source-to-bunker site (target 1) replication is
synchronous with a near-zero RPO. The source-to-remote site (target 2) replication
is asynchronous with an RPO in the order of minutes. The key benefit of this
replication is the ability to fail over to either of the two remote sites in the case of
source-site failure.
Notes
CDP supports both local and remote replication of data and VMs to meet
operational and disaster recovery requirements, respectively. In a CDP implementation, data can
be replicated to more than two sites using synchronous and asynchronous
replication. CDP supports various WAN optimization techniques (deduplication,
compression). These techniques reduce bandwidth requirements, and also
optimally use the available bandwidth.
Journal Volume
Contains all the data that has changed on the production volume from the time the
replication session started
CDP Appliance
Write Splitter
Intercepts writes to the production volume from the compute system and splits
each write into two copies
Can be implemented at the compute, fabric, or storage system
Notes
CDP uses a journal volume to store all the data that has changed on the production
volume from the time the replication session started. The journal contains the
metadata and data that enable roll back to any recovery points. The amount of
space that is configured for the journal determines how far back the recovery points
can go.
CDP also uses an appliance and a write splitter. A CDP appliance is an intelligent
hardware platform that runs the CDP software and manages local and remote data
replications. Some vendors offer virtual appliance where the CDP software is
running inside VMs.
Write splitters intercept writes to the production volume from the compute system
and split each write into two copies. Write splitting can be performed at the
compute, fabric, or storage system.
Notes
Typically the replica is synchronized with the source, and then the replication
process starts. After the replication starts, all the writes from the compute system to
the source (production volume) are split into two copies. One copy is sent to the
local CDP appliance at the source site, and the other copy is sent to the production
volume. Then the local appliance writes the data to the journal at the source site
and the data in turn is written to the local replica. If a file is accidentally deleted, or
the file is corrupted, the local journal enables organizations to recover the
application data to any PIT.
In remote replication, the local appliance at the source site sends the received write
I/O to the appliance at the remote (DR) site. Then, the write is applied to the journal
volume at the remote site. As a next step, data from the journal volume is sent to
the remote replica at predefined intervals. CDP operates in either synchronous or
asynchronous mode.
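A conceptual sketch of the write-splitting flow described above follows; the object and function names are hypothetical and only mirror the idea that each write is sent to the production volume and, through the appliance, to the journal and replica.

```python
# Conceptual sketch of CDP write splitting; names are illustrative only.
import time

journal = []        # stands in for the journal volume (metadata + data)
production = {}     # stands in for the production volume
replica = {}        # stands in for the local replica

def split_write(block_address, data):
    # One copy goes to the production volume...
    production[block_address] = data
    # ...and the other copy goes to the CDP appliance, which journals it
    journal.append({"time": time.time(), "addr": block_address, "data": data})
    # The appliance then applies the journaled write to the replica
    replica[block_address] = data

split_write(0x10, b"new data")
# Every journal entry is a potential recovery point for rollback to a PIT
print(len(journal), "recovery point(s) recorded")
```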
Hypervisor-based CDP
Protects a single VM or multiple VMs locally or remotely
Enables restoring a VM to any PIT
Virtual appliance is running on a hypervisor
Write splitter is embedded in the hypervisor
Notes
Some vendors offer continuous data protection for VMs through a hypervisor-based
CDP implementation. In this deployment, the specialized hardware-based appliance is
replaced with a virtual appliance running on a hypervisor. The write splitter is
embedded in the hypervisor. This option protects a single VM or multiple VMs locally
or remotely and enables restoring VMs to any PIT. The local and remote replication
operations are similar to network-based CDP replication.
Introduction
This lesson presents the need for backup, backup architecture, backup target, and
backup operation. This lesson also focuses on backup granularity and various
backup methods.
Definition: Backup
An additional copy of production data, which is created and retained
for the sole purpose of recovering lost or corrupted data.
Notes
For example, when a service is failed over to another zone (data center), the data
should be available at the destination. This approach helps to successfully fail over
the service and minimize the outage. One of the key data protection solutions that
is widely implemented is backup.
A backup is an additional copy of production data, which is created and retained for
the sole purpose of recovering the lost or corrupted data. With the growing
business and the regulatory demands for data storage, retention, and availability,
organizations face the task of backing up an ever-increasing amount of data. This
task becomes more challenging with the growth of data, reduced IT budgets, and
less time available for taking backups.
Moreover, organizations need fast backup and recovery of data to meet their
service level agreements. Most organizations spend a considerable amount of time
and money protecting their application data but give less attention to protecting
their server configurations. During disaster recovery, server configurations must be
re-created before the application and data are accessible to the user.
Backup Architecture
The role of a backup client is to gather the data that needs to be backed up and send it to
the storage node. The backup client can be installed on application servers, mobile
clients, and desktops. It also sends the tracking information to the backup server.
Backup client
Backup server
Storage node
Backup device (backup target)
Cloud
[Figure: Backup architecture — backup clients (including VMs on a hypervisor) send backup data to the storage node and tracking information to the backup server; the storage node writes the backup data to the backup device.]
Notes
The backup server manages the backup operations and maintains the backup
catalog, which contains information about the backup configuration and backup
metadata. The backup configuration contains information about when to run
backups, which client data to be backed up, and so on. The backup metadata
contains information about the backed up data. The storage node is responsible for
organizing the client’s data and writing the data to a backup device. A storage node
controls one or more backup devices.
In most implementations, the storage node and the backup server run on the same
system. Backup devices may be attached directly or through a network to the
storage node. The storage node sends the tracking information about the data that
is written to the backup device to the backup server. Typically this information is
used for recoveries. A wide range of backup targets are available, such as tape,
disk, and virtual tape library. Now, organizations can also back up their data to
cloud storage. Many service providers offer backup as a service, which enables an
organization to reduce its backup management overhead.
Backup Targets
Tape Library
Tapes are portable and can be used for long-term offsite storage
Must be stored in locations with a controlled environment
Not optimized to recognize duplicate content
Data integrity and recoverability are major issues with tape-based backup media
Notes
A tape library contains one or more tape drives that records and retrieves data on a
magnetic tape. Tape is portable, and one of the primary reasons for the use of tape
is long-term, offsite storage. Backups that are implemented using tape devices
involve several hidden costs. Tapes must be stored in locations with a controlled
environment to ensure preservation of the media and to prevent data corruption.
Physical transportation of the tapes to offsite locations also adds management
overhead and increases the possibility of loss of tapes during offsite shipment.
The traditional backup process, using tapes, is not optimized to recognize duplicate
content. Due to its sequential data access, both backing up of data and restoring it
take more time with tape. This data access may impact the backup window and
RTO. A backup window is a period during which a production volume is available to
perform backup. Data integrity and recoverability are also major issues with tape-
based backup media.
Disk density has increased dramatically over the past few years, lowering the cost
per GB. So, it became a viable backup target for organizations. When used in a
highly available configuration in a storage array, disks offer a reliable and fast
backup target medium. One way to implement a backup to disk system is by using
it as a staging area. This approach offloads backup data to a secondary backup
target such as tape after a period of time.
Virtual tape libraries use disks as backup media. Virtual tapes are disk drives that
are emulated and presented as tapes to the backup software. Compared to
physical tapes, virtual tapes offer better performance, better reliability, and random
disk access. A virtual tape drive does not require the usual maintenance tasks that
are associated with a physical tape drive, such as periodic cleaning and drive
calibration. Compared to the disk library, a virtual tape library offers easy
installation and administration because it is preconfigured by the manufacturer. A
key feature that is available on virtual tape library appliances is replication.
Backup Operation
[Figure: Backup operation — numbered interactions (1 through 7) among the backup clients (VMs on a hypervisor), backup server, storage node, and backup device.]
(2) Backup server retrieves backup-related information from the backup catalog.
(3a) Backup server instructs storage node to load backup media in the backup
device.
(3b) Backup server instructs backup clients to send data to be backed up to the
storage node.
(4) Backup clients send data to storage node and update the backup catalog on the
backup server.
(6) Storage node sends metadata and media information to the backup server
Notes
The backup operation is typically initiated by a server, but it can also be initiated by
a client. The backup server initiates the backup process for different clients based
on the backup schedule configured for them.
For example: the backup for a group of clients may be scheduled to start at 3:00
a.m. every day. The backup server coordinates the backup process with all the
components in a backup environment. The backup server maintains the information
about backup clients to be backed up and storage nodes to be used in a backup
operation. The backup server retrieves the backup related information from the
backup catalog. Based on this information, the backup server instructs the storage
node to load the appropriate backup media into the backup devices.
Hot backup and cold backup are the two methods that are deployed for backup.
They are based on the state of the application when the backup is performed. In a
hot backup, the application is up-and-running, with users accessing their data
during the backup process. This method of backup is also referred to as online
backup. The hot backup of online production data is challenging because data is
actively being used and changed. If a file is open, it is normally not backed up
during the backup process.
In such situations, an open file agent is required to back up the open file. These
agents interact directly with the operating system or application and enable the
creation of consistent copies of open files. The disadvantage that is associated with
a hot backup is that the agents usually affect the overall application performance. A
cold backup requires the application to be shut down during the backup process.
Hence, this method is also referred to as offline backup. Consistent backups of
databases can also be done by using a cold backup. The disadvantage of a cold
backup is that the database is inaccessible to users during the backup process.
Recovery Operation
[Figure: Recovery operation — numbered interactions (1 through 6) among the backup clients (VMs on a hypervisor), backup server, storage node, and backup device.]
(2) Backup server scans backup catalog to identify data to be restored and the
client that will receive data
(3) Backup server instructs storage node to load backup media in the backup
device
Notes
After the data is backed up, it can be restored when required. A restore process
can be manually initiated from the client. A recovery operation restores data to its
original state at a specific PIT. Typically backup applications support restoring one
or more individual files, directories, or VMs. The illustration depicts a restore
operation.
The administrator then selects the data to be restored and the specified point in
time to which the data has to be restored based on the RPO. Because all this
information comes from the backup catalog, the restore application needs to
communicate with the backup server. The backup server instructs the appropriate
storage node to mount the specific backup media onto the backup device. Data is
then read and sent to the client that has been identified to receive the restored
data. Some restorations are successfully accomplished by recovering only the
requested production data. For example, the recovery process of a spreadsheet is
completed when the specific file is restored. In database restorations, additional
data, such as log files, must be restored along with the production data. This
approach ensures consistency of the restored data. In these cases, the RTO is
extended due to the additional steps in the restore operation. It is also important for
the backup and recovery applications to have security mechanisms to avoid
recovery of data by nonauthorized users.
Backup Granularity
Full backup
Incremental backup
Cumulative (differential) backup
Notes
Backup granularity depends on business needs and the required RTO/RPO. Based
on the granularity, backups can be categorized as full, incremental, and cumulative
(or differential). Most organizations use a combination of these backup types to
meet their backup and recovery requirements.
Full Backup: It is a full copy of the entire data set. Organizations typically use full
backup on a periodic basis because it requires more storage space and also takes
more time to back up. The full backup provides a faster data recovery.
Incremental Backup: It copies the data that has changed since the last backup.
For example, a full backup is created on Monday, and incremental backups are
created for the rest of the week. Tuesday's backup would only contain the data that
has changed since Monday. Wednesday's backup would only contain the data that
has changed since Tuesday.The primary disadvantage to incremental backups is
that they can be time-consuming to restore. Suppose an administrator wants to
restore the backup from Wednesday. To do so, the administrator has to first restore
Monday's full backup. After that, the administrator has to restore Tuesday's copy,
followed by Wednesday's.
Cumulative Backup: It copies the data that has changed since the last full backup.
Suppose, for example, the administrator wants to create a full backup on Monday
and differential backups for the rest of the week. Tuesday's backup would contain
all of the data that has changed since Monday. It would therefore be identical to an
incremental backup at this point. On Wednesday, however, the differential backup
would back up any data that had changed since Monday (the full backup). The
advantage that differential backups have over incremental backups is shorter restore
times: restoring a differential backup never requires more than two copies. The
tradeoff is that, as time progresses, a differential backup can grow to contain more
data than an incremental backup.
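The difference in restore effort can be sketched as follows; the weekly schedule in the example is an assumption used only to show which copies must be read for incremental versus cumulative (differential) restores.

```python
# Illustrative sketch: which backup copies are needed to restore Wednesday's data.
# The schedule below is an assumption, not a recommended policy.
schedule = ["Mon-full", "Tue", "Wed", "Thu"]

def incremental_restore_chain(target_day):
    """Full backup plus every incremental up to and including the target day."""
    return schedule[: schedule.index(target_day) + 1]

def cumulative_restore_chain(target_day):
    """Full backup plus only the latest cumulative (differential) copy."""
    return [schedule[0], target_day]

print(incremental_restore_chain("Wed"))  # ['Mon-full', 'Tue', 'Wed']
print(cumulative_restore_chain("Wed"))   # ['Mon-full', 'Wed'] - never more than two copies
```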
Agent-Based Backup
[Figure: Agent-based backup — backup agents (A) installed in the VMs and on the application servers send backup data to the backup server/storage node and the backup device.]
Notes
This backup does not capture virtual machine configuration files. The agent running
on the compute system consumes CPU cycles and memory resources. If multiple
VMs on a compute system are backed up simultaneously, then the combined I/O
and bandwidth demands that are placed on the compute system by the various
backup operations can deplete the compute system resources.
This approach may impact the performance of the services or applications running
on the VMs. To overcome these challenges, the backup process can be offloaded
from the VMs to a proxy server. This can be achieved by using the image-based
backup approach.
Image-Based Backup
Image-based backup makes a copy of the virtual drive and configuration that are
associated with a particular VM.
Backup is saved as a single entity called a VM image
Enables quick restoration of a VM
Supports recovery at VM-level and file-level
No agent is required inside the VM to perform backup
Backup processing is offloaded from VMs to a proxy server
Notes
Image-based backup makes a copy of the virtual drive and configuration that are
associated with a particular VM. The backup is saved as a single entity called a
VM image. This type of backup is suitable for restoring an entire VM if there is a
hardware failure or a human error such as the accidental deletion of the VM. The
image-based backup also supports file-level recovery.
In an image-level backup, the backup software can back up VMs without installing
backup agents inside the VMs or at the hypervisor level. The backup processing is
performed by a proxy server that acts as the backup client, thereby offloading the
backup processing from the VMs. The proxy server communicates with the
management server responsible for managing the virtualized compute
environment. It sends commands to create a snapshot of the VM to be backed up
and to mount the snapshot to the proxy server. A snapshot captures the
configuration and virtual drive data of the target VM and provides a point-in-time
view of the VM. The proxy server then performs the backup by using the mounted snapshot.
Definition: Recovery-in-place
A term that refers to running a VM directly from the backup device,
using a backed up copy of the VM image instead of restoring that
image file.
Eliminates the need to transfer the image from the backup device to the primary
storage before it is restarted
Provides an almost instant recovery of a failed VM
Requires a random access device to work efficiently
Disk-based backup target
Reduces the RTO and network bandwidth to restore VM files
Notes
One of the primary benefits of recovery in place is that it eliminates the need to
transfer the image from the backup area to the primary storage area before it is
restarted. So, the application that is running on those VMs can be accessed more
quickly. This method not only saves time for recovery, but also reduces network
bandwidth to restore files.
NDMP-Based Backup
Definition: NDMP
An open standard TCP/IP-based protocol that is designed for a
backup in a NAS environment.
Notes
To maintain its operational efficiency, a NAS device generally does not support the
hosting of third-party applications such as backup clients. This forced backup
administrators to back up data from the application server, or to mount each NAS
volume through CIFS or NFS from another server across the network that hosted a
backup agent. These approaches may lead to performance degradation of the
application server and the production network during backup operations, due to overhead.
Further, security structures differ on the two network file systems, NFS and CIFS.
Backups that are implemented through one of the file systems would not effectively
backup any data security attributes on the NAS head that was accessed through a
different file system. For example, CIFS backup, when restored, would not be able
to restore NFS file attributes and vice versa. These backup challenges of the NAS
environment can be addressed with the use of Network Data Management Protocol
(NDMP).
This backup approach backs up data directly from primary storage system to
backup target without requiring additional backup software.
Eliminates the backup impact on application servers
Improves the backup and recovery performance to meet SLAs
Notes
Typically, an agent runs on the application servers that control the backup process.
This agent stores configuration data for mapping the LUNs on the primary storage
system to the backup device to orchestrate backup (the transfer of changed blocks
and creation of backup images) and recovery operations. This backup information
(metadata) is stored in a catalog which is local to the application server.
When a backup is triggered through the agent running on the application server, the
application momentarily pauses simply to mark the point in time for that backup.
The data blocks that have changed since the last backup are sent across the
network to the backup device. The direct movement from primary storage to the
backup device eliminates the LAN impact by isolating all backup traffic to the SAN.
This approach eliminates backup impact on application servers and provides faster
backup and recovery to meet the application protection SLAs.
For data recovery, the backup administrator triggers recovery operation and then
the primary storage reads the backup image from the backup device. The primary
storage replaces production LUN with the recovered copy.
Cloud
Provides the capability to perform backup and recovery at any time, from
anywhere
Reduces the backup management overhead
Transforms from CAPEX to OPEX
Pay-per-use/subscription-based pricing
Enables organizations to meet long-term retention requirements
Backing up to cloud ensures regular and automated backup of data
Gives consumers the flexibility to select a backup technology based on their
current requirements
Notes
Data is important for businesses of all sizes. Organizations need to regularly back
up data to avoid losses, stay compliant, and preserve data integrity. IT
organizations today are dealing with the explosion of data, particularly with the
development of third platform technologies. Data explosion poses the challenge of
data backup and quick data restore. It strains the backup windows, IT budget, and
IT management. The growth and complexity of the data environment, added with
proliferation of virtual machines and mobile devices constantly outpaces the
existing data backup plans.
Many organizations’ remote and branch offices have limited or no backup in place.
Mobile workers represent a particular risk because of the increased possibility of
lost or stolen devices. Backing up to cloud ensures regular and automated backup
of data. Cloud computing gives consumers the flexibility to select a backup
technology based on their requirements. It also enables them to quickly move to a
different technology when their backup requirements change.
Data can be restored from the cloud using two methods, namely web-based restore
and media-based restore. In web-based restore, the requested data is gathered
and sent to the server running the cloud backup agent. The agent software restores
data on the server. This method is considered if sufficient bandwidth is available. If
a large amount of data needs to be restored and sufficient bandwidth is not
available, then the consumer may request data restoration using backup media
such as DVD or disk drives. In this option, the service provider gathers the data to
restore, stores data to a set of backup media, and ships it to the consumer.
Introduction
This lesson presents the need for data deduplication, and the factors affecting
deduplication ratio. This lesson also focuses on source-based and target-based
deduplication.
Data Deduplication
Deduplication process:
Chunk the dataset
Identify duplicate chunks
The effectiveness of deduplication is expressed as a deduplication ratio
Notes
To identify redundant blocks, the data deduplication system creates a hash value
or digital signature, like a fingerprint, for each data block. It also creates an index of
the signatures for a given repository. The index provides the reference list to
determine whether blocks exist in a repository.
When the data deduplication system sees a block it has processed before, instead
of storing the block again, it inserts a pointer to the original block in the repository.
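The chunk-fingerprint-and-index behavior can be illustrated with a short sketch; fixed-size chunks and SHA-256 fingerprints are assumptions made for illustration and do not describe any particular deduplication product.

```python
import hashlib

# Minimal sketch of block-level deduplication using fixed-size chunks and
# SHA-256 fingerprints; real systems vary in chunking and hashing choices.
CHUNK_SIZE = 8  # unrealistically small, just to keep the example visible

def deduplicate(data, repository, index):
    """Store unique chunks once; duplicates become pointers into the index."""
    pointers = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in index:          # new chunk: store it
            index[fingerprint] = len(repository)
            repository.append(chunk)
        pointers.append(index[fingerprint])   # duplicate: reference only
    return pointers

repo, idx = [], {}
refs = deduplicate(b"ABCDEFGHABCDEFGHABCDEFGH", repo, idx)
print(len(repo), "unique chunk(s) stored for", len(refs), "chunk references")
```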
It is important to note that the data deduplication can be performed in backup as
well as in production environment. In production environment, the deduplication is
Every data deduplication vendor claims that their product offers a certain ratio of
data reduction. However, the actual data deduplication ratio varies, based on many
factors.
Limited budget
Limited backup window
Network bandwidth constraints
Longer retention periods
Notes
With the growth of data and 24x7 service availability requirements, organizations
are facing challenges in protecting their data. Typically, a large amount of redundant
data is backed up. This increases the backup window size and also results in
unnecessary consumption of resources, such as backup storage space and network
bandwidth. There are also requirements to preserve data for longer periods, whether
driven by the needs of consumers or by legal and regulatory concerns. Backing up a
large amount of duplicate data to a remote site or to the cloud for DR purposes is
also cumbersome and requires a lot of bandwidth.
Retention period: The longer the data retention period, the greater the chance of
identical data existing in the backup
Frequency of full backup: The more frequently full backups are conducted, the greater
the advantage of deduplication
Change rate: The fewer the changes to the content between backups, the greater the
efficiency of deduplication
Data type: The more unique the data, the less intrinsic duplication exists
Notes
Retention period: This is the period of time that defines how long the backup
copies are retained. The longer the retention, the greater is the chance of identical
data existence in the backup set which would increase the deduplication ratio and
storage space savings.
Frequency of full backup: As more full backups are performed, it increases the
amount of same data being repeatedly backed-up. So, it results in high
deduplication ratio.
Change rate: This is the rate at which the data received from the backup
application changes from backup to backup. Client data with a few changes
between backups produces higher deduplication ratios.
are good deduplication candidates. Other data, such as audio, video, and scanned
images, is highly unique and typically does not yield a good deduplication ratio.
The level at which data is identified as duplicate affects the amount of redundancy
or commonality. The operational levels of deduplication include file-level
deduplication and sub-file deduplication.
File-level Deduplication
Sub-file Deduplication
Two methods:
Fixed-length block
Variable-length block
Notes
File-level deduplication (also called single instance storage) detects and removes
redundant copies of identical files in a backup environment. Only one copy of the
file is stored; the subsequent copies are replaced with a pointer to the original file.
By removing all of the subsequent copies of a file, a significant amount of space
savings can be achieved. File-level deduplication is simple but does not address
the problem of duplicate content inside the files. A change in any part of a file also
results in classifying that as a new file and saving it as a separate copy. For
example, two 10-MB presentations with a difference in just the title page are not
considered as duplicate files, and each file is stored separately.
Sub-file deduplication breaks the file into smaller blocks and then uses a standard
hash algorithm to detect redundant data within and across the file. As a result, sub-
file deduplication eliminates duplicate data across files. There are two forms of sub-
file deduplication: fixed-length block and variable-length block.
Notes
However, a deduplication agent running on the client may impact the backup
performance, especially when a large amount of data needs to be backed-up.
When image-level backup is implemented, the backup workload is moved to a
proxy server. The deduplication agent is installed on the proxy server to perform
deduplication without impacting the VMs running applications. Organizations can
implement source-based deduplication when performing backup (backup as a
service) from their location to provider’s location.
Notes
Target-based data deduplication occurs at the backup device, which offloads the
deduplication process and its performance impact from the backup client. In target-
based deduplication, the backup application sends data to the target backup device
where the data is deduplicated, either immediately (inline) or at a scheduled time
(post-process).
With inline data deduplication, the incoming backup stream is divided into small
chunks, and then compared to data that has already been deduplicated. The inline
deduplication method requires less storage space than the post process approach.
However, inline deduplication may slow down the overall data backup process.
In post-process deduplication, the backup data is first stored to the disk in its native
backup format and deduplicated after the backup is complete. In this approach, the
deduplication process is separated from the backup process and the deduplication
happens outside the backup window. However, the full backup dataset is
transmitted across the network to the storage target before the redundancies are
eliminated. So, this approach requires adequate storage capacity to accommodate
the full backup dataset.
Introduction
This lesson presents data archiving operations and the difference between backup and
archiving. This lesson also focuses on purpose-built archive storage and cloud-
based archiving.
Notes
In the information life cycle, data is created, accessed, and changed. As data ages, it is less
likely to be changed and eventually becomes “fixed” but remains accessed by
applications and users. This data is called fixed content. Assets such as X-rays,
MRIs, CAD/CAM designs, surveillance videos, MP3s, and financial documents are
examples of fixed data. This data is growing at over 90% annually.
Data archiving is the process of moving data (fixed content) that is no longer
actively accessed to a separate low-cost archival storage tier for long-term
retention and future reference. A data archive is a storage repository that is used to
store this data. Organizations set their own policies for qualifying data to move
into archives. These policy settings are used to automate the process of identifying
and moving the appropriate data into the archive system. Organizations implement
archiving processes and technologies to reduce primary storage cost. With
archiving, the capacity on expensive primary storage can be reclaimed by moving
infrequently accessed data to lower-cost archive tier. Archiving fixed content before
taking backup helps to reduce the backup window and backup storage acquisition
costs.
For instance, all publicly traded companies are subject to the Sarbanes-Oxley
(SOX) Act. This act defines email retention requirements, among other things
related to data storage and security.
For example, new product innovation can be fostered if engineers can access
archived project materials such as designs, test results, and requirement
documents. Besides meeting governance and compliance requirements,
organizations retain data for business intelligence and competitive advantage. Both
active and archived information can help data scientists drive innovations or help to
improve current business processes.
Data archiving is often confused with data backup. Backups are used to restore
data in case it is lost, corrupted, or destroyed. In contrast, data archives protect
older data that is not required for everyday business operations but may
occasionally need to be accessed.
The table compares some of the significant differences between backup and
archiving.
Archiving agent scans primary storage to find files that meet the archiving
policy. The archive server indexes the files.
Once the files have been indexed, they are moved to archive storage and small
stub files are left on the primary storage.
[Figure: Data archiving environment — clients, primary storage, the archive server (with its index), and archive storage connected over the network.]
Notes
The data archiving operation has an archiving agent, archive server/policy engine,
and archive storage. The archiving agent scans the primary storage to find files that
meet the archiving policy. This policy is defined on the archive server (policy
engine).
After the files are identified for archiving, the archive server creates the index for
the files. Once the files have been indexed, they are moved to the archive storage
and small stub files are left on the primary storage. In other words, each archived
file on primary storage is replaced with a stub file. The stub file contains the
address of the archived file. As the size of the stub file is small, it saves space on
primary storage.
From the perspective of a client, the data movement from primary storage to
secondary storage is transparent.
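The scan, index, move, and stub sequence can be sketched as follows; the 180-day policy, directory layout, and stub format are assumptions made only to illustrate the operation described above.

```python
import os, shutil, time

# Illustrative sketch of the archiving operation; the 180-day policy,
# paths, and stub format are assumptions, not a product behavior.
ARCHIVE_AGE_SECONDS = 180 * 24 * 3600

def archive_old_files(primary_dir, archive_dir, index):
    now = time.time()
    for name in os.listdir(primary_dir):
        path = os.path.join(primary_dir, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > ARCHIVE_AGE_SECONDS:
            target = os.path.join(archive_dir, name)
            shutil.move(path, target)          # move the file to archive storage
            index[name] = target               # archive server indexes the file
            with open(path, "w") as stub:      # small stub left on primary storage
                stub.write(f"ARCHIVED-AT:{target}\n")
```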
Legal Dispute
Government Compliance
For example, an organization may need to produce all emails from all individuals
that are involved in stock sales or transfers. Failure to comply with these
requirements could cause an organization to incur penalties.
Email archiving provides more mailbox space by moving old emails to archive
storage.
For example, an organization may configure a quota on each mailbox to limit its
size. A fixed quota for a mailbox forces users to delete emails as they approach the
quota size. However, users often need to access emails that are weeks, months, or
even years old. With email archiving, organizations can free up space in user
mailboxes and still provide user access to older emails.
Each object that is stored in CAS is assigned a globally unique content address
(digital fingerprint of the content).
Application server accesses the CAS device through the CAS API.
[Figure: CAS environment — the client reaches the application server over the network, and the application server accesses the CAS device through the CAS API.]
Notes
CAS stores user data and its attributes as an object. The stored object is assigned
a globally unique address, which is known as a content address (CA). This address
is derived from the binary representation of an object. Content addressing
eliminates the need for application servers to understand and manage the physical
location of objects on a storage system.
Content address (digital fingerprint of the content) not only simplifies the task of
managing huge number of objects, but also ensures content authenticity. The
application server can access the CAS device only through the CAS API.
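A content address can be illustrated as a cryptographic digest of the object's binary content; the SHA-256 digest and in-memory store below are illustrative assumptions, showing why identical content always maps to the same address.

```python
import hashlib

# Sketch of content addressing: the address is derived from the object's
# binary content, so identical objects share one address. SHA-256 is an
# illustrative choice of fingerprint, not a statement about any product.
def content_address(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

store = {}

def put_object(data: bytes) -> str:
    address = content_address(data)
    store[address] = data          # the address, not a path, locates the object
    return address

ca = put_object(b"X-ray image bytes ...")
print(ca[:16], "->", len(store[ca]), "bytes")  # retrieve by content address
```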
Cloud-Based Archiving
Organizations prefer hybrid cloud options. Archived data that may require high-
speed access is retained internally (private cloud) while lower-priority archive data
is moved to low-cost, public cloud-based archive storage.
[Figure: Cloud-based archiving — the archiving server (policy engine) in the data center moves archive data from primary storage and the email/file server over the WAN to the cloud.]
Notes
Cloud computing provides highly scalable and flexible computing that is available
on demand. It empowers self-service requesting through a fully automated request-
fulfillment process in the background. It provides capital cost savings and agility to
Migration Lesson
Introduction
This lesson presents the importance of data migration and various types of data
migration. This lesson also focuses on Disaster Recovery as a Service (DRaaS).
Migration
Data Migration
Notes
[Figure: Data migration — push and pull operations between the control device and the remote device across the SAN.]
Notes
Data migration solutions perform push and pull operations for data movement.
These terms are defined from the perspective of the control storage system. In the
push operation, data is moved from the control storage system to the remote
storage system. In the pull operation, data is moved from the remote storage
system to the control storage system.
During the push and pull operations, the compute system’s access to the remote
device is not enabled, because the control storage system has no control over the
remote storage and cannot track any changes on the remote device. Data integrity
cannot be guaranteed if changes are made to the remote device during the push
and pull operations. The push/pull operations can be either hot or cold. These
terms apply to the control devices only.
In a cold operation, the control device is inaccessible to the compute system during
migration. Cold operations guarantee data consistency because both the control
and the remote devices are offline. In a hot operation, the control device is online
for compute system operations. During hot push/pull operations, changes can be
made to the control device. The control storage system keeps track of all these
changes and thus ensures data integrity.
Notes
Data migration can also be implemented using a virtualization appliance at the SAN.
The virtualization appliance provides a translation layer in the SAN, between the
compute systems and the storage systems.
A virtual volume is created from the storage pool and assigned to the compute
system. When an I/O is sent to a virtual volume, it is redirected through the
virtualization layer to the mapped LUNs. The key advantage of using virtualization
appliance is to support data migration between multivendor heterogeneous storage
systems.
Services running on VMs are moved from one physical compute system to another without any downtime:
Enables scheduled maintenance without any downtime
Facilitates VM load balancing
[Figure: VM migration — running VMs and their services are migrated from one hypervisor to another, with both compute systems accessing the same storage system]
Notes
The migration can also be used for disaster recovery, or consolidating VMs onto
fewer physical compute systems. The ideal virtual infrastructure platform should
enable organizations to move the running VMs as quickly as possible and with
minimal impact on the users. This can be achieved by implementing VM live migration.
In a VM live migration, the entire active state of a VM is moved from one hypervisor to another. The state information includes memory contents and all other information that identifies the VM. This method involves copying the contents of VM memory from the source hypervisor to the target and then transferring control of the VM's disk files to the target hypervisor. Next, the VM is suspended on the source hypervisor and resumed on the target hypervisor.
Live migration with stretched cluster provides the ability to move VMs and
applications to a location that is closest to the consumer for faster/reliable access.
Migrates VM files from one storage system to another without any service
disruption
[Figure: VM storage migration — the VM's files are moved across the network from one storage system to another while the VM continues to run on the compute system]
Notes
In a VM storage migration, VM files are moved from one storage system to another without any downtime. This approach enables the administrator to move VM files across dissimilar storage systems. VM storage migration starts by copying the metadata about the VM from the source storage system to the target storage system. The metadata essentially consists of configuration, swap, and log files.
After the metadata is copied, the VM disk file is moved to the new location. During the migration, there is a chance that the source is updated, so it is necessary to track the changes on the source to maintain data integrity. After the migration is completed, the blocks that have changed since the migration started are transferred to the new location.
Notes
pricing model and the use of automated virtual platforms. This can lower costs and
minimize the recovery time after a failure. During normal production operations, IT
services run at the organization’s production data center. Replication of data occurs
from the organization’s production environment to the cloud over the network.
Introduction
This section highlights technologies that are relevant to the topics covered in this
module.
Concepts In Practice
Concepts in Practice
Remote replication solution that provides DR and data mobility solutions for
PowerMax (VMAX) storage system
Provides the ability to maintain multiple, host-independent, remotely mirrored
copies of data
NetWorker is backup and recovery software that centralizes, automates, and accelerates data backup and recovery operations.
DELL EMC Avamar provides a variety of options for backup, including guest OS-
level backup and image-level backup. The three major components of an Avamar
system include Avamar server, Avamar backup clients, and Avamar administrator.
Avamar server provides the essential processes and services required for client
access and remote system administration. The Avamar client software runs on
each compute system that is being backed up. Avamar administrator is a user
DELL EMC Data Domain Extended Retention is a solution for long-term retention
of backup data. It is designed with an internal tiering approach to enable cost-
effective, long-term retention of data on disk by implementing deduplication
technology. Data Domain provides secure multi-tenancy that enables data
protection-as-a-service for large enterprises and service providers who are looking
to offer services based on Data Domain in a private or public cloud. With secure
multi-tenancy, a Data Domain system will logically isolate tenant data, ensuring that
each tenant’s data is only visible and accessible to them.
DELL EMC Data Domain Replicator software transfers only the deduplicated and
compressed unique changes across any IP network, requiring a fraction of the
bandwidth, time, and cost, compared to traditional replication methods. Data
Domain Cloud DR (DD CDR) allows enterprises to copy backed-up VMs from their
on-premise Data Domain and Avamar environments to the public cloud.
IDPA is an innovative solution that provides support for modern applications like
MongoDB and MySQL, and is optimized for VMware. It is also built on industry
proven data invulnerability architecture, delivering encryption, fault detection, and
healing.
SRDF (Symmetrix Remote Data Facility) is a family of software that is the industry standard for remote replication in mission-critical environments. Built for the industry-leading high-end PowerMax (VMAX) hardware architecture, the SRDF family of solutions is trusted globally for disaster recovery and business continuity.
The SRDF family offers unmatched deployment flexibility and massive scalability to
deliver a wide range of distance replication capabilities.
SnapVX also provides a new option to secure snaps against accidental or internal
deletion. It provides instant restore which means when a LUN level restore is
Concepts in Practice
Enable continuous data protection for any PIT recovery to optimize RPO and
RTO
Ensure recovery consistency for interdependent applications
Provide synchronous or asynchronous replication policies
Reduce WAN bandwidth consumption and utilize available bandwidth optimally
Offer multisite support
Simplifies data backup and archive by easily integrating the LTO family of tape
drives into your data center
Its lower power consumption makes it an ideal part of a cloud physical infrastructure build-out
Linear Tape File System (LTFS) support removes software incompatibilities,
creating portability between different vendors and operating systems
Archiving software that helps organizations to archive aging emails, files, and
the Microsoft SharePoint content to the appropriate storage tiers
VMware Storage vMotion
Enables live migration of VM disk files within and across storage systems
without any downtime
Performs zero-downtime storage migrations with complete transaction integrity
Migrates the disk files of VMs running any supported operating system on any
supported server hardware
Simplifies data backup and archive by easily integrating the LTO family of tape
drives into your data center. Supporting TBs of native capacity on a single
cartridge, LTO drives provide decades of shelf life for industries and tasks that
need reliable, long-term, large-capacity data retention, such as:
Healthcare imaging
Media and entertainment
Video surveillance
Geophysical (oil and gas) data
Computational analysis, such as genome mapping and event simulations
VMware vMotion
Performs live migration of a running virtual machine from one physical server to
another, without downtime. The virtual machine retains its network identity and
connections, ensuring a seamless migration process. Transferring the virtual machine's active memory and precise execution state over a high-speed network allows the virtual machine to move from one host to another. The entire process completes quickly over a gigabit Ethernet network.
Enables live migration of virtual machine disk files within and across storage
systems without service disruptions. Storage vMotion performs zero-downtime
storage migrations with complete transaction integrity. It migrates the disk files of
virtual machines running any supported operating system on any supported server
hardware. It performs live migration of virtual machine disk files across any Fibre
Channel, iSCSI, FCoE, and NFS storage system supported by VMware vSphere. It
allows to redistribute VMs or virtual disks to different storage systems or volumes to
balance capacity or improve performance.
Assessment
A. RTO
B. RPO
C. Backup Window
D. Backup Media
2. Which provides the ability to create fully populated point-in-time copies of LUNs
within a storage system or create a copy of an existing VM?
A. Clone
B. Snapshot
D. LUN masking
A. Retention period
D. Value of data
Summary
Introduction
This module focuses on information security goals and key terminologies. This
module also focuses on the three storage security domains and key threats across
the domains. Further, this module focuses on the various security controls that
enable an organization to mitigate these threats. Finally, this module focuses on
the governance, risk, and compliance (GRC) aspect in a data center environment.
Introduction
This lesson covers goals of information security, security concepts and their
relations, and defense-in-depth strategy. The lesson also focuses on the
governance, risk, and compliance (GRC) aspect in a data center environment.
Notes
Confidentiality
Integrity
Availability: Ensures that the resources are always available to authorized users
Accountability: Users or the applications are responsible for their actions
Notes
Ensuring confidentiality, integrity, and availability is the primary objective of any IT security implementation. These goals are supported by using authentication, authorization, and auditing processes.
Notes
For example, in cloud computing, a customer can access the cloud service catalog using their credentials. Once the customer is authenticated, a different view of the catalog is provided along with different options, based on the privileges assigned. An administrator can have a different view of the catalog compared to a normal user. The number of times a customer has logged in to the catalog is audited for monitoring purposes.
[Figure: Relationships among security concepts — a threat agent gives rise to a threat that exploits vulnerabilities, leading to risk to the assets the owner values; the owner imposes countermeasures to reduce the risk]
Notes
The figure shows the relationships among various security concepts in a data center environment. An organization (the owner of the asset) wants to safeguard the asset from threat agents (attackers) who seek to abuse it. Risk arises from the likelihood that a threat agent (an attacker) will exploit a vulnerability. Therefore, organizations deploy various countermeasures to minimize risk by reducing the vulnerabilities.
Risk assessment is the first step to determine the extent of potential threats and
risks in an infrastructure. The process assesses risk and helps to identify
appropriate controls to mitigate or eliminate risks. Organizations must apply their
basic information security and risk-management policies and standards to their
infrastructure.
Some of the key security areas that an organization must focus on while building
the infrastructure are: authentication, identity and access management, data loss
prevention and data breach notification, governance, risk, and compliance (GRC),
privacy, network monitoring and analysis, security information and event logging,
incident management, and security management.
Security Concepts
Security Assets
Security Threats
Security Vulnerabilities
Security Controls
Preventive
Detective
Corrective
Information is one of the most important assets for any organization. Other assets
include hardware, software, and other infrastructure components required to
access the information. To protect these assets, organizations deploy security
controls. These security controls have two objectives.
The first objective is to ensure that the resources are easily accessible to
authorized users.
The second objective is to make it difficult for potential attackers to access and
compromise the system.
The effectiveness of a security control can be measured by two key criteria. One, the cost of implementing the control should be a fraction of the value of the protected data. Two, it should cost a potential attacker heavily, in terms of money, effort, and time, to compromise and access the assets.
Threats are the potential attacks that can be carried out on an IT infrastructure.
These attacks can be classified as active or passive. Passive attacks are attempts
to gain unauthorized access into the system. Passive attacks pose threats to
confidentiality of information. Active attacks include data modification, denial of
service (DoS), and repudiation attacks. Active attacks pose threats to data integrity,
availability, and accountability.
Attack surface, attack vector, and work factor are the three factors to consider
when assessing the extent to which an environment is vulnerable to security
threats. Attack surface refers to the various entry points that an attacker can use to
launch an attack, which includes people, process, and technology. For example,
each component of a storage infrastructure is a source of potential vulnerability. An
attack vector is a step or a series of steps necessary to complete an attack. For
example, an attacker might exploit a bug in the management interface to execute a
snoop attack. Work factor refers to the amount of time and effort required to exploit
an attack vector.
The security controls are directed at reducing vulnerability by minimizing the attack
surfaces and maximizing the work factors. These controls can be technical or non-
technical. Controls are categorized as preventive, detective, and corrective.
Preventive: Avoid problems before they occur
Detective: Detect a problem that has occurred
Corrective: Correct the problem that has occurred
Defense-in-Depth
Definition: Defense-in-Depth
A strategy in which multiple layers of defense are deployed
throughout the infrastructure to help mitigate the risk of security
threats in case one layer of the defense is compromised.
Storage Security (Encryption, Zoning, etc.)
Compute Security (Hardening, Malware Protection Software, etc.)
Network Security (Firewall, DMZ, etc.)
Perimeter Security (Physical Security)
Notes
This potentially reduces the scope of a security breach. However, the overall cost of deploying defense-in-depth is often higher compared to single-layered security controls. An example of defense-in-depth is a virtual firewall installed on a hypervisor when a network-based firewall is already deployed within the same environment. This provides an additional layer of security, reducing the chance of the hypervisor being compromised if the network-level firewall is breached.
Definition: GRC
A term encompassing processes that help an organization to ensure
that their acts are ethically correct and in accordance with their risk
appetite (the risk level an organization chooses to accept), internal
policies, and external regulations.
Notes
secured multi-tenancy, the jurisdictions where data should be stored, data privacy,
and ownership.
Introduction
This lesson covers the storage security domains and the key security threats
across domains.
[Figure: Three storage security domains — application access to data storage through the storage network, management access, and secondary storage (backup, replication, and archive) access]
In the illustration:
The first security domain involves application access to the stored data through
the storage network. Application access domain may include only those
applications that access the data through the file system or a database
interface.
The second security domain includes management access to storage and
interconnecting devices and to the data residing on those devices. Management
access, whether monitoring, provisioning, or managing storage resources, is
associated with every device within the storage environment. Most
management software supports some form of CLI, system management
console, or a web-based interface. Implementing appropriate controls for
securing management applications is important because the damage that can
be caused by using these applications can be far more extensive.
The third domain consists of backup, replication, and archive access. This domain is primarily accessed by storage administrators who configure and manage the environment. Along with the access points in this domain, the backup and replication media also need to be secured.
Notes
Prevents legitimate users from accessing resources or services. DoS attacks can be targeted against compute systems, networks, or storage resources in a storage environment. The intent of a DoS attack is always to exhaust key resources, such as network bandwidth or CPU cycles, thereby impacting production use. For example, an attacker may send massive quantities of data over the network to the storage system with the intention of consuming bandwidth. This prevents legitimate users from using the bandwidth, and they may not be able to access the storage system over the network. Such an attack may be carried out by exploiting weaknesses of a communication protocol. For example, an attacker may cause DoS to a legitimate user by resetting TCP sessions. Apart from a DoS attack, an attacker may also carry out a distributed DoS (DDoS) attack.
The principal control that can minimize the impact of DoS and DDoS attacks is to impose restrictions and limits on network resource consumption. For example, when the amount of data being sent from a given IP address exceeds the configured limits, the traffic from that IP address may be blocked. This provides a first line of defense, as illustrated in the sketch below. Further, restrictions and limits may be imposed on resources consumed by each compute system, providing an additional line of defense.
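The following Python sketch shows the idea of blocking traffic from a source once it exceeds a configured byte limit; the limit, the window handling, and the data structures are simplified assumptions for illustration, not a production rate limiter.

from collections import defaultdict

LIMIT_BYTES = 10 * 1024 * 1024          # assumed per-source limit for one monitoring window
bytes_seen = defaultdict(int)           # bytes received per source IP in the window

def admit(src_ip: str, nbytes: int) -> bool:
    # Block further traffic from a source that exceeds its configured limit.
    bytes_seen[src_ip] += nbytes
    return bytes_seen[src_ip] <= LIMIT_BYTES

print(admit("203.0.113.7", 4 * 1024 * 1024))   # True  -- within the limit
print(admit("203.0.113.7", 8 * 1024 * 1024))   # False -- over the limit, traffic blocked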
Loss of Data
Notes
Data loss can occur in a storage environment due to various reasons other than malicious attacks. Some of the causes of data loss include accidental deletion by an administrator or destruction resulting from natural disasters. Deploying appropriate measures, such as data backup or replication, can reduce the impact of such events. Organizations need to develop strategies that avoid, or at least minimize, data loss due to such events. Examples of such strategies include the choice of backup media, frequency of backup, synchronous/asynchronous replication, and the number of copies.
Further, if the organization is a cloud service provider, then it must publish the protection controls deployed to protect the data stored in the cloud. The provider must also ensure appropriate terms and conditions related to data loss and the associated penalties as part of the service contract. The service contract should also include the various BC/DR options, such as backup and replication, offered to the consumers.
Malicious Insiders
Notes
Today, most organizations are aware of the security threats posed by outsiders.
Countermeasures such as firewalls, malware protection software, and intrusion
detection systems can minimize the risk of attacks from outsiders. However, these
measures do not reduce the risk of attacks from malicious insiders.
the malicious insider may exploit the security weakness. Control measures that can
minimize the risk due to malicious insiders include strict access control policies,
disabling employee accounts immediately after separation from the company,
security audit, encryption, and segregation of duties (role-based access control,
which is discussed later in this module). A background investigation of a candidate
before hiring is another key measure that can reduce the risk due to malicious
insiders.
Account Hacking
Notes
the attacker to take over the user's account. For example, an employee of an organization may receive an email that is designed to appear as if the IT department of that organization has sent it. The email may ask users to click the link provided in the email and update their details. After clicking the link, the user is directed to a malicious website where their details are captured.
Intrusion detection and prevention systems and firewalls are additional controls that
may reduce the risk of such attacks.
Notes
Technologies that are used to build today’s storage infrastructure provide a multi-
tenant environment enabling the sharing of resources. Multi-tenancy is achieved by
using controls that provide separation of resources such as memory and storage
for each application. Failure of these controls may expose the confidential data of
one business unit to users of other business units, raising security risks.
Introduction
This lesson covers physical security and focuses on key security controls.
Security Controls
Any security control should account for three aspects: people, process, and
technology, and the relationships among them.
Notes
At the compute system level, security controls are deployed to secure hypervisors
and hypervisor management systems, virtual machines, guest operating systems,
and applications. Security at the network level commonly includes firewalls,
demilitarized zones, intrusion detection and prevention systems, virtual private
networks, and VLAN. At the storage level, security controls include data shredding,
and data encryption. Apart from these security controls, the storage infrastructure
also requires identity and access management, role-based access control, and
physical security arrangements.
Physical Security
The physical security measures that are deployed to secure the organization’s
storage infrastructure are:
Disabling all unused devices and ports
24/7/365 onsite security
Biometric or security badge-based authentication to grant access to the facilities
Surveillance cameras to monitor activity throughout the facility
Sensors and alarms to detect motion and fire
Notes
The key traditional authentication and authorization controls that are deployed in a
storage environment are Windows ACLs, UNIX permissions, Kerberos, and
Challenge-Handshake Authentication Protocol (CHAP). Alternatively, the
organization can use Federated Identity Management (FIM) for authentication. A
federation is an association of organizations (referred to as trusted parties) that
come together to exchange information about their users and resources to enable
collaboration.
Federation includes the process of managing the trust relationships among the
trusted parties beyond internal networks or administrative boundaries. FIM enables
the organizations (especially cloud service providers) to offer services without
implementing their own authentication system. The organization can choose an
identity provider to authenticate their users. This involves exchanging identity
attributes between the organizations and the identity provider in a secure way. The
identity and access management controls used by organizations include OpenID
and OAuth.
OAuth
Definition: OAuth
An open authorization control that enables a client to access protected resources from a resource server on behalf of a resource owner.
[Figure: OAuth flow — (1) the client sends an authorization request to the resource owner; (2) the resource owner returns an authorization grant; (3) the client presents the grant to the authorization server; (4) the authorization server issues an access token; (5) the client presents the access token to the resource server; (6) the resource server serves the request]
Notes
The illustration shows the steps involved in the OAuth process, as described in Request for Comments (RFC) 6749 published by the Internet Engineering Task Force (IETF); a minimal client-side sketch follows the numbered steps:
1. The client requests authorization from the resource owner. The authorization
request can be made directly to the resource owner, or indirectly through the
authorization server.
2. The client receives an authorization grant, which is a credential representing the
resource owner's authorization to access its protected resources. It is used by
the client to obtain an access token. Access tokens are credentials that are
used to access protected resources. An access token is a string representing
an authorization issued to the client. The string is usually opaque to the client.
Tokens represent specific scopes and durations of access, granted by the
resource owner, and enforced by the resource server and authorization server.
3. The client requests an access token by authenticating with the authorization
server and presenting the authorization grant.
4. The authorization server authenticates the client and validates the authorization
grant, and if valid, issues an access token.
5. The client requests the protected resource from the resource server and
authenticates by presenting the access token.
6. The resource server validates the access token, and if valid, serves the request.
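The sketch below illustrates the client side of steps 3 through 6 in Python, using the third-party requests library; the endpoints, client credentials, and redirect URI are hypothetical placeholders, and a real deployment would follow the exact parameters required by its authorization server.

import requests   # third-party HTTP library

TOKEN_URL = "https://auth.example.com/oauth/token"            # hypothetical
RESOURCE_URL = "https://api.example.com/protected/resource"   # hypothetical

def exchange_grant_for_token(code, client_id, client_secret, redirect_uri):
    # Steps 3-4: present the authorization grant and receive an access token.
    resp = requests.post(TOKEN_URL, data={
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    }, auth=(client_id, client_secret))
    resp.raise_for_status()
    return resp.json()["access_token"]

def call_protected_resource(access_token):
    # Steps 5-6: present the access token; the resource server validates it
    # and, if valid, serves the request.
    resp = requests.get(RESOURCE_URL,
                        headers={"Authorization": "Bearer " + access_token})
    resp.raise_for_status()
    return resp.json()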
OpenID
Definition: OpenID
An open standard for authentication in which an organization uses
authentication services from an OpenID provider.
[Figure: OpenID — a user logs in to the service provider (relying party), which does not require its own authentication control and relies on the OpenID provider (identity provider). Step 1: Login request using OpenID]
Notes
The organization is known as the relying party and the OpenID provider is known as the identity provider. An OpenID provider maintains users' credentials on its authentication system and enables relying parties to authenticate users requesting the use of the relying party's services. This eliminates the need for the relying party to deploy its own authentication systems.
In the OpenID control, a user creates an ID with one of the OpenID providers. This OpenID can then be used to sign on to any organization (relying party) that accepts OpenID authentication. This control can be used in the modern environment to secure the application access domain.
The illustration shows the OpenID concept by considering a user who requires services from the relying party. For the user to use the services provided by the relying party, an identity (user ID and password) is required. The relying party does not provide its own authentication control; however, it supports OpenID from one or more OpenID providers. The user can create an ID with the identity provider and then use this ID with the relying party. The relying party, after receiving the login request, authenticates it with the help of the identity provider and then grants access to the services.
Multifactor Authentication
[Figure: Multifactor authentication at user login — a password combined with the code (459820) displayed on a token]
Notes
To further enhance the authentication process, more factors may also be considered. Examples of additional factors include biometric identity. A multifactor authentication technique may be deployed using any combination of these factors. A user's access to the environment is granted only when all the required factors are validated.
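A minimal Python sketch of the idea follows: access is granted only when both the password (something the user knows) and the token code (something the user has) are validated. The user name, salt, and expected token code are illustrative stand-ins for a real directory service and token infrastructure.

import hashlib, hmac

password_hashes = {"alice": hashlib.sha256(b"salt" + b"CorrectHorse").hexdigest()}
expected_token_codes = {"alice": "459820"}   # code currently displayed on the user's token

def authenticate(user: str, password: str, token_code: str) -> bool:
    # Factor 1: something the user knows (the password).
    pw_ok = hmac.compare_digest(
        password_hashes.get(user, ""),
        hashlib.sha256(b"salt" + password.encode()).hexdigest())
    # Factor 2: something the user has (the code from a physical or soft token).
    token_ok = hmac.compare_digest(expected_token_codes.get(user, ""), token_code)
    # Access is granted only when all required factors are validated.
    return pw_ok and token_ok

print(authenticate("alice", "CorrectHorse", "459820"))   # True
print(authenticate("alice", "CorrectHorse", "000000"))   # False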
CHAP is a basic authentication control that has been widely adopted by network devices and compute systems. It provides a method for initiators and targets to authenticate each other by using a secret code or password.
The figure illustrates the handshake steps that occur between an initiator and a
target:
[Figure: CHAP handshake between the initiator (a compute system running VMs on a hypervisor) and the target (an iSCSI storage system)]
3. Takes the shared secret and calculates a value using a one-way hash function
4. Returns the hash value to the target
5. Computes the expected hash value from the shared secret and compares it with the value received from the initiator
Notes
CHAP secrets are random secrets of 12 to 128 characters. The secret is never exchanged directly over the communication channel. Rather, a one-way hash function converts it into a hash value, which is then exchanged.
A hash function, using the MD5 algorithm, transforms data in such a way that the
result is unique and cannot be changed back to its original form. If the initiator
requires reverse CHAP authentication, the initiator authenticates the target by
using the same procedure. The CHAP secret must be configured on the initiator
and the target. A CHAP entry, which is composed of the name of a node and the
secret associated with the node, is maintained by the target and the initiator.
The same steps are run in a two-way CHAP authentication scenario; after these steps are completed, the initiator authenticates the target. If both authentication steps succeed, data access is enabled. CHAP is often used in iSCSI environments (a minimal hash-computation sketch follows).
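The hash computation at the center of the handshake can be sketched in a few lines of Python; CHAP (RFC 1994) computes an MD5 hash over the message identifier, the shared secret, and the challenge, and only the resulting hash value crosses the wire. The secret and challenge values below are illustrative.

import hashlib, os

def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    # One-way MD5 hash over identifier + secret + challenge, per RFC 1994.
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()

shared_secret = b"a-random-secret-12-to-128-chars"   # configured on initiator and target
challenge = os.urandom(16)                           # random challenge sent by the target

# Initiator: calculates the value and returns it (the secret never leaves the node).
response = chap_response(1, shared_secret, challenge)

# Target: computes the expected value from its copy of the secret and compares.
expected = chap_response(1, shared_secret, challenge)
print(response == expected)   # True -> initiator is authenticated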
Notes
Definition: Firewall
A security control designed to monitor the incoming and the outgoing
network traffic and compare them to a set of filtering rules.
Firewall security rules may use various filtering parameters such as source
address, destination address, port numbers, and protocols. The effectiveness of
a firewall depends on how robustly and extensively the security rules are
defined.
Firewalls can be deployed at:
Network level
Compute level
Hypervisor level
Uses various parameters for traffic filtering
Notes
A network-level firewall is typically used as the first line of defense for restricting certain types of traffic from entering or leaving a network. This type of firewall is typically deployed at the entry point of an organization's network.
To reduce the vulnerability and protect the internal resources and applications, the
compute systems or virtual machines that require the Internet access are placed in
a demilitarized zone.
In a demilitarized zone environment, servers that need Internet access are placed
between two sets of firewalls. The servers in the demilitarized zone may or may not
be allowed to communicate with internal resources. Application-specific ports such
as those designated for HTTP or FTP traffic are allowed through the firewall to the
demilitarized zone servers. However, no Internet-based traffic is allowed to go
through the second set of firewalls and gain access to the internal network.
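A firewall's rule matching can be illustrated with a short Python sketch: packets are compared against an ordered rule table and the first match decides whether the traffic is allowed toward the demilitarized zone servers. Real firewalls use far richer, stateful rule syntax; the rules below are assumptions for illustration only.

rules = [
    {"action": "allow", "dst_port": 80, "protocol": "tcp"},    # HTTP to DMZ servers
    {"action": "allow", "dst_port": 21, "protocol": "tcp"},    # FTP to DMZ servers
    {"action": "deny",  "dst_port": None, "protocol": None},   # default: deny everything else
]

def filter_packet(dst_port: int, protocol: str) -> str:
    # Compare the packet against the rules in order; the first match wins.
    for rule in rules:
        if rule["dst_port"] in (None, dst_port) and rule["protocol"] in (None, protocol):
            return rule["action"]
    return "deny"

print(filter_packet(80, "tcp"))     # allow -- application-specific port permitted
print(filter_packet(3389, "tcp"))   # deny  -- blocked by the default rule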
Scans and analyzes events to detect if they are statistically different from
normal events
Has the ability to detect various events
Notes
Intrusion detection is the process of detecting events that can compromise the
confidentiality, integrity, or availability of IT resources.
An intrusion detection system (IDS) is a security tool that automates the detection
process. An IDS generates alerts, in case anomalous activity is detected. An
intrusion prevention system (IPS) is a tool that has the capability to stop the events
after they have been detected by the IDS. These two controls usually work together
and are generally referred to as intrusion detection and prevention system (IDPS).
The key techniques used by an IDPS to identify intrusion in the environment are
signature-based and anomaly-based detection.
In the anomaly-based detection technique, the IDPS scans and analyzes events to
determine whether they are statistically different from events normally occurring in
the system. This technique can detect various events such as multiple login
failures, excessive process failure, excessive network bandwidth consumed by an
The IDPS can be deployed at the compute system, network, or hypervisor levels.
Notes
In the storage environment, a virtual private network (VPN) can be used to provide a user with a secure connection to the storage resources. A VPN is also used to provide a secure site-to-site connection between a primary site and a DR site when performing remote replication. A VPN can also be used to provide a secure site-to-site connection between an organization's data center and the cloud.
Notes
Malware protection software can also be used to protect the operating system against attacks. A common type of attack carried out on operating systems is modifying sensitive areas, such as registry keys or configuration files, with the intention of causing applications to function incorrectly or fail. This can be prevented by configuring the malware protection software to disallow unauthorized modification of these sensitive areas.
Data Encryption
Notes
Data encryption is one of the most important controls for securing data in-flight and
at-rest. Data in-flight refers to data that is being transferred over a network and
data at-rest refers to data that is stored on a storage medium. Data encryption provides protection from threats such as tampering with data, which violates data integrity; media theft, which compromises data availability and confidentiality; and sniffing attacks, which compromise confidentiality.
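As a simple data-at-rest illustration, the Python sketch below encrypts a record before it is written to the storage medium, using the third-party cryptography package; in practice, storage systems typically rely on self-encrypting drives, controller-based encryption, or an external key manager rather than application code like this.

from cryptography.fernet import Fernet   # third-party 'cryptography' package

key = Fernet.generate_key()     # in practice the key is held by a key manager
cipher = Fernet(key)

plaintext = b"customer record: account 12345"
stored_ciphertext = cipher.encrypt(plaintext)   # what lands on the storage medium

# Only holders of the key can read the data back; tampering with the stored
# ciphertext is detected on decryption, protecting integrity as well as
# confidentiality.
print(cipher.decrypt(stored_ciphertext) == plaintext)   # True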
Data Shredding
Notes
Typically, when data is deleted, it is not made unrecoverable on the storage, and an attacker may use specialized tools to recover it. The threat of unauthorized data recovery is greater when an organization discards failed storage media, such as disk drives, solid-state drives, or tapes. After the organization discards the media, an attacker may gain access to them and recover the data by using specialized tools.
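Data shredding overwrites the data before the media or file is released, so that common recovery tools cannot read the old content. The Python sketch below is only a conceptual illustration of multi-pass overwriting of a single file; it does not account for SSD wear leveling, RAID, snapshots, or file-system remapping, which is why organizations use dedicated shredding tools or physically destroy the media.

import os

def shred_file(path: str, passes: int = 3) -> None:
    # Overwrite the file contents with random data several times, then delete it.
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(size))
            f.flush()
            os.fsync(f.fileno())   # force the overwrite down to the device
    os.remove(path)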
Organizations may create multiple copies (backups and replicas) of their data and
store at multiple locations as part of business continuity and disaster recovery
Introduction
This section highlights technologies that are relevant to the topics covered in this
module.
Concepts in Practice
Concepts in Practice
RSA SecurID
Helps security analysts detect and investigate threats often missed by other
security tools. Security Analytics provides converged network security monitoring
and centralized security information and event management (SIEM). Security
Analytics combines big data security collection, management, and analytics; full
network and log-based visibility; and automated threat intelligence – enabling
security analysts to better detect, investigate, and understand threats they often
could not easily see or understand before. It provides a single platform for
capturing and analyzing large amounts of network, log, and other data. It also
accelerates security investigations by enabling analysts to pivot through terabytes
of metadata, log data, and recreated network sessions. It archives and analyzes
long-term security data through a distributed computing architecture and provides
built-in compliance reports covering a multitude of regulatory regimes.
post-login activities by evaluating a variety of risk indicators. Using a risk and rules-
based approach, the system then requires additional identity assurance, such as
out-of-band authentication, for scenarios that are at high risk and violate a policy.
This methodology provides transparent authentication for organizations that want to
protect users accessing websites and online portals, mobile applications and
browsers, Automated Teller Machines (ATMs), Secure Sockets Layer (SSL), virtual
private network (VPN) applications, web access management (WAM) applications,
and application delivery solutions.
Helps customers to audit, alert on, protect, and report on user activity and configuration and application changes for Active Directory and Windows applications. The software has role-based access, enabling auditors to access only the information they need to quickly perform their job. Change Auditor provides visibility into enterprise-wide activities from one central console, enabling customers to see how data is being handled.
Dell InTrust
An IT data analytics solution that gives organizations the power to search and analyze vast amounts of data in one place. It provides real-time insights into user activity across security, compliance, and operational teams. It helps administrators troubleshoot issues by conducting security investigations regardless of how and where the data is stored. It helps compliance officers produce reports validating compliance across multiple systems. The web interface quickly provides information on who accessed the data, how it was obtained, and how it was used. This helps administrators and security teams discover suspicious event trends.
VMware Airwatch
VMware AppDefense
Assessment
A. Degaussing media
B. Masking
C. Backup
D. Hardening
Summary
Introduction
This module focuses on the key functions and processes of the storage
infrastructure management.
Introduction
Notes
The key storage infrastructure components are compute systems, storage systems,
and storage area networks (SANs). These components could be physical or virtual
and are used to provide services to the users. The storage infrastructure
management includes all the storage infrastructure-related functions that are
necessary for the management of the infrastructure components and services, and
for the maintenance of data throughout its lifecycle. These functions help IT
organizations to align their storage operations and services to their strategic business goals and service level requirements. They ensure that the storage infrastructure is operated optimally by using as few resources as needed. They also ensure better utilization of existing components, thereby limiting the need for excessive ongoing investment in infrastructure.
Modern data center management functions are different in many ways from the
traditional management and have the following set of distinctive characteristics:
Service-focused approach
Software-defined infrastructure-aware
End-to-end visibility
Orchestrated operations
Notes
Further, the traditional management processes and tools may not support a service-oriented infrastructure, especially if the requirement is to provide cloud services. They usually lack the ability to execute management operations in an agile manner, respond to adverse events quickly, coordinate the functions of distributed infrastructure components, and meet sustained service levels. This component-specific, highly manual, time-consuming, and overly complex management approach is simply not appropriate for a modern data center infrastructure.
Service-focused Approach
Notes
Software-Defined Infrastructure-aware
Notes
all the managed components that may be distributed across wide locations. These components may also be proprietary or commodity hardware manufactured by different vendors, but the software controller ensures that the management operations are independent of the underlying hardware.
End-to-end Visibility
Notes
Depending on the size of the storage infrastructure and the number of services
involved, the administrators may have to monitor information about hundreds or
thousands of components located in multiple data centers. In addition, the
configuration, connectivity, and interrelationships of components change as the
storage infrastructure grows, applications scale, and services are updated.
Organizations typically deploy specialized monitoring tools that provide end-to-end
visibility of a storage infrastructure on a digital dashboard. In addition, they are
capable of reporting relevant information in a rapidly changing and varying
workload environment.
Orchestrated Operations
Definition: Orchestration
Automated arrangement, coordination, and management of various
system or component functions in a storage infrastructure.
Notes
Orchestration Example
The example illustrates an orchestrated operation that creates a block volume for a
compute system.
[Figure: An administrator's request is routed to the orchestrator, which coordinates system interactions across the storage infrastructure — the compute system (hypervisor), the FC switch, and the storage system]
Notes
In this example, an administrator logs on to the management portal and initiates the
volume creation operation from the portal. The operation request is routed to the
orchestrator which triggers a workflow, as shown on the slide, to fulfill this request.
The workflow programmatically integrates and sequences the required compute,
storage, and network component functions to create the block volume.
The orchestrator interacts with the software-defined storage (SDS) controller to let the controller carry out the operation according to the workflow. The SDS controller interacts with the infrastructure components to enable the execution of component functions such as zoning, LUN creation, and bus rescan. Through the workflow, the management portal receives the response on the outcome of the operation; a minimal sketch of such a sequenced workflow follows.
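The sequencing performed by the orchestrator can be sketched as a short Python workflow; the component functions below are hypothetical stand-ins for the calls the SDS controller would make against the storage system, the FC switch, and the compute system.

# Hypothetical component functions -- stand-ins for SDS controller operations.
def create_lun(storage_system: str, size_gb: int) -> str:
    print("Creating a", size_gb, "GB LUN on", storage_system)
    return "lun-042"

def zone_initiator_to_target(switch: str, initiator_wwn: str, target_wwn: str) -> None:
    print("Zoning", initiator_wwn, "to", target_wwn, "on", switch)

def rescan_bus(compute_system: str) -> None:
    print("Rescanning the storage bus on", compute_system)

def create_block_volume_workflow(request: dict) -> str:
    # The workflow integrates and sequences the storage, network, and compute steps.
    lun = create_lun(request["storage_system"], request["size_gb"])
    zone_initiator_to_target(request["switch"],
                             request["initiator_wwn"], request["target_wwn"])
    rescan_bus(request["compute_system"])
    return "Volume " + lun + " presented to " + request["compute_system"]

print(create_block_volume_workflow({
    "storage_system": "array-01", "size_gb": 100, "switch": "fc-sw-1",
    "initiator_wwn": "10:00:00:90:FA:18:0D:CF",
    "target_wwn": "50:06:01:6F:08:60:1E:BD",
    "compute_system": "esx-161",
}))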
Definition: Discovery
A management function that creates an inventory of infrastructure
components and provides information about the components
including their configuration, connectivity, functions, performance,
capacity, availability, utilization, and physical-to-virtual dependencies.
Infrastructure Discovery
Operations Management
Monitoring
Configuration management
Change management
Capacity management
Performance management
Availability management
Incident management
Problem management
Security management
Notes
Infrastructure discovery provides the visibility needed to monitor and manage the infrastructure components. Discovery is performed using a specialized tool that commonly interacts with the infrastructure components through their native APIs. Through this interaction, it collects information from the infrastructure components.
Operations Management
Introduction
Operations Management
Introduction to Monitoring
Monitoring provides visibility into the storage infrastructure and forms the basis
for performing management operations
It helps to
Notes
Such procedures can reduce downtime due to known infrastructure errors and the
level of manual intervention needed to recover from them. Further, monitoring
helps in generating reports for service usage and trends. It also helps to trigger
alerts when thresholds are reached, security policies are violated, and service
performance deviates from SLA. Alerting and reporting are detailed later in this
module. Additionally, monitoring of the data center environment parameters such
Monitoring Parameters
Monitoring Configuration
[Figure: Storage infrastructure with compute systems running hypervisors, an FC switch, and storage systems; the zone esx161_vnx_152_1 contains the WWNs 10:00:00:90:FA:18:0D:CF and 50:06:01:6F:08:60:1E:BD]
The table lists configuration changes in the storage infrastructure shown in the image.
Notes
Monitoring Availability
Identifies the failure of any component or process that may lead to service
unavailability or degraded performance.
[Figure: Three compute systems running hypervisors and VMs, each connected through FC switches SW1 and SW2 to the storage systems]
Notes
A storage infrastructure includes three compute systems (H1, H2, and H3) that
are running hypervisors
All the compute systems are configured with two FC HBAs, each connected to
the production storage system through two FC switches, SW1 and SW2. All the
compute systems share two storage ports on the storage system.
Multipathing software has also been installed on the hypervisors running on all three compute systems. If one of the switches, SW1, fails, the multipathing software initiates a path failover, and all the compute systems continue to access data through the other switch, SW2.
Due to the absence of a redundant switch, a second switch failure could result in unavailability of the storage system. Monitoring for availability enables detecting the switch failure and helps the administrator take corrective action before another failure occurs. In most cases, the administrator receives symptom alerts for a failing component and can initiate actions before the component fails.
Monitoring Capacity
The figure provides an example that illustrates the importance of monitoring NAS
file system capacity.
[Figure: Used and free capacity of a NAS file system at two points in time, with used capacity growing over time]
Notes
The figure provides an example that illustrates the importance of monitoring NAS
file system capacity:
If the file system is full and no space is available for applications to perform
write I/O, it may result in application/service outage
Monitoring tools can be configured to issue a notification when thresholds are
reached on the file system capacity; for example:
When the file system reaches 66 percent of its capacity, a warning message
is issued, and a critical message is issued when the file system reaches 80
percent of its capacity
This enables the administrator to take actions to provision additional LUNs
to the NAS and extend the NAS file system before it runs out of capacity
Proactively monitoring the file system can prevent service outages caused by a lack of file system space; a minimal threshold-check sketch follows this list
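A minimal threshold check, matching the 66 percent warning and 80 percent critical levels used in the example above, might look like the Python sketch below; monitoring tools implement this with configurable thresholds and notification channels.

def capacity_alert(used_gb: float, total_gb: float,
                   warning_pct: float = 66.0, critical_pct: float = 80.0) -> str:
    # Thresholds follow the example above: warn at 66%, critical at 80%.
    used_pct = used_gb / total_gb * 100
    if used_pct >= critical_pct:
        return "critical: extend the file system now"
    if used_pct >= warning_pct:
        return "warning: plan to provision additional LUNs"
    return "ok"

print(capacity_alert(700, 1000))   # warning  (70% used)
print(capacity_alert(850, 1000))   # critical (85% used)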
Monitoring Performance
[Figure: Compute systems H1, H2, and H3 connected through switches SW1 and SW2 to the storage systems and sharing a storage port; a new compute system is being added, and a graph shows the shared port utilization (%) for H1 + H2 + H3 approaching 100%]
Notes
performance level. This helps to identify performance bottlenecks. It also deals with
the utilization of resources, which affects the way resources behave and respond.
Utilization of the shared storage port is shown by the solid and dotted lines in the
graph. If the port utilization prior to deploying the new compute system is close to
100 percent, then deploying the new compute system is not recommended
because it might impact the performance of the other compute systems. However,
if the utilization of the port prior to deploying the new compute system is closer to
the dotted line, then there is room to add a new compute system.
Monitoring Security
[Figure: Two workgroups, WG1 and WG2, each running VMs on hypervisors and sharing a storage system through switches SW1 and SW2; a replication command issued by a WG1 user against WG2 devices is blocked — Warning: Attempted replication of WG2 devices by WG1 user – Access denied]
Notes
IT organizations typically comply with various information security policies that may
be specific to government regulations, organizational rules, or deployed services.
Monitoring detects all operations and data movement that deviate from predefined
Alerting
Notes
policy, or a soft media error on storage drives, are considered warning signs and
may also require administrative attention.
Reporting
Notes
Reports are commonly displayed on a digital dashboard, which provides real-time tabular or graphical views of the gathered information. Dashboard reporting helps administrators make instantaneous and informed decisions on resource procurement, plans for modifications to the existing infrastructure, policy enforcement, and improvements in management processes.
The ability to measure storage resource consumption per business unit or user group and charge them back accordingly.
To perform chargeback, the storage usage data is collected by a billing system that generates a chargeback report for each business unit or user group.
The billing system is responsible for accurately measuring the number of units of storage used and reporting the cost/charge for the consumed units.
The figure shows the assignment of storage resource as services to two business
units, Payroll_1 and Engineering_1, and presents a sample chargeback report.
[Figure: Storage assigned as services to the Payroll_1 compute systems (50 GB LUNs) and the Engineering_1 compute systems (100 GB LUNs), with production, local replica, and remote replica copies, along with a sample chargeback report]
Notes
In this example, each business unit is using a set of compute systems that are
running hypervisor. The VMs hosted on these compute systems are used by the
business units. LUNs are assigned to the hypervisor from the production storage
system. Storage system-based replication technology is used to create both local
and remote replicas. A chargeback report documenting the exact amount of
storage resources used by each business unit is created by a billing system. If the
unit for billing is GB of raw storage, the exact amount of raw space (usable capacity
plus protection provided) configured for each business unit must be reported.
Consider that the Payroll_1 unit has consumed two production LUNs, each 50 GB
in size. Therefore, the storage allocated to the hypervisor is 100 GB (50 + 50). The
allocated storage for local replication is 100 GB and for remote replication is also
100 GB. From the allocated storage, the raw storage configured for the hypervisor
is determined based on the RAID protection that is used for various storage pools.
If the Payroll_1 production LUNs are RAID 1-protected, the raw space used by the
production volumes is 200 GB.
Assume that the local replicas are on unprotected LUNs, and the remote replicas
are protected with a RAID 5 configuration, then 100 GB of raw space is used by the
local replica and 125 GB by the remote replica. Therefore, the total raw capacity
used by the Payroll_1 unit is 425 GB. The total cost of storage provisioned for the Payroll_1 unit will be $2,125 (assuming a cost of $5 per GB of raw storage). The Engineering_1 unit also uses two LUNs, but each 100 GB in size. Considering the same RAID protection and per-unit cost, the chargeback for the Engineering_1 unit will be $3,500.
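The Payroll_1 arithmetic above can be reproduced with a short Python sketch; the RAID factors are simply the usable-to-raw conversion ratios assumed in the example (2.0 for RAID 1 mirroring, 1.25 for the RAID 5 configuration, 1.0 for unprotected LUNs).

def raw_capacity_gb(allocated_gb: float, raid_factor: float) -> float:
    # Convert allocated (usable) capacity to raw capacity for a given protection scheme.
    return allocated_gb * raid_factor

COST_PER_GB = 5   # dollars per GB of raw storage, as assumed above

production = raw_capacity_gb(2 * 50, 2.0)    # two 50 GB LUNs, RAID 1 -> 200 GB raw
local_rep  = raw_capacity_gb(100, 1.0)       # unprotected local replica -> 100 GB raw
remote_rep = raw_capacity_gb(100, 1.25)      # RAID 5 remote replica -> 125 GB raw

total_raw = production + local_rep + remote_rep
print(total_raw)                 # 425.0 GB of raw capacity
print(total_raw * COST_PER_GB)   # 2125.0 -> $2,125 chargeback for Payroll_1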
– Configuration management
– Change management
– Capacity management
– Performance management
– Availability management
– Incident management
– Problem management
– Security management
Configuration Management
Key functions:
Discovers and maintains information on CIs in a configuration management
system (CMS)
Updates CMS when new CIs are deployed, or CI attributes change
Examples of CI information:
Attributes of CIs such as CI’s name, manufacturer name, serial number, license
status, version, location, and inventory status
Used and available capacity of CIs
Issues linked to CIs
Inter-relationships among CIs such as service-to-user, storage pool-to-service,
storage system-to-storage pool, and storage system-to-SAN switch
Notes
All information about CIs is usually collected and stored by the discovery tools in a
single database or in multiple autonomous databases mapped into a federated
database called a configuration management system (CMS). Discovery tools also
update the CMS when new CIs are deployed or when attributes of CIs change.
CMS provides a consolidated view of CI attributes and relationships, which is used
by other management processes for their operations. For example, CMS helps the
security management process to examine the deployment of a security patch on
VMs, the problem management to resolve a connectivity issue, or the capacity
management to identify the CIs affected on expansion of a storage pool.
Change Management
Key function:
Assesses potential risks of all changes to the CIs and makes a decision to
approve/reject the requested changes
Notes
With changing business requirements, ongoing changes to the CIs become an almost daily task. Relevant changes could range from the introduction of a new service, to the modification of an existing service's attributes, to the retirement of a service; from replacing a SAN switch, to expanding a storage pool, to a software upgrade, and even to a change in process or procedural documentation. Change management standardizes change-related procedures in a storage infrastructure to respond to changing business requirements in an agile way. It oversees all changes to the CIs to minimize the adverse impact of those changes on the business and the users of services.
Capacity Management
Key functions:
Determines optimal amount of storage needed to meet SLA
Maximizes capacity utilization without impacting service levels
Establishes capacity consumption trends and plans for additional capacity
Notes
serves as input to the capacity planning activities and enables the procurement and
provisioning of additional capacity in the most cost effective and least disruptive
manner.
This example illustrates the expansion of a NAS file system using an orchestrated
workflow. The file system is expanded to meet the capacity requirement of a
compute cluster that accesses the file system.
[Figure: Orchestrated file system expansion workflow — the administrator requests expansion of the file system to a specific size through the management portal; if approval is required, change management reviews and approves or rejects the change request; on approval, the SDS controller adds the required capacity to the file system, discovery updates the CMS (configuration management), and the portal is notified; on rejection, the portal is updated that the operation was rejected]
Notes
If the file system expansion request is approved, the orchestrator interacts with the
SDS controller to invoke the expansion. Thereafter, the SDS controller interacts
with the storage infrastructure components to add the required capacity to the file
system. The orchestrated workflow also invokes the discovery operation which
updates the CMS with information on the modified file system size. The
orchestrator responds by sending updates to the management portal appropriately
following completion or rejection of the expansion operation.
Performance Management
Key functions:
Measures and analyzes the response time and throughput of components
Identifies components that are performing below the expected level
Makes configuration changes to optimize performance and address issues
Notes
Availability Management
Key functions:
Establishes guideline to meet stated availability levels at a justifiable cost
Identifies availability-related issues and areas for improvement
Proposes changes in existing BC solutions or architects new BC solutions
Notes
Based on the service availability requirements and areas found for improvement,
the availability management team may propose new business continuity (BC)
solutions or changes in the existing BC solutions. For example, when a set of
compute systems is deployed to support a service or any critical business function,
it requires high availability. The availability management team proposes
redundancy at all levels, including components, data, or even site levels. This is
generally accomplished by deploying two or more HBAs per system, multipathing
software, and compute clustering.
The compute systems must be connected to the storage systems using at least two
independent fabrics and switches that have built-in redundancy and hot-swappable
components. The VMs running on these compute systems must be protected from
hardware failure/unavailability through VM failover mechanisms. Deployed
applications should have built-in fault resiliency features. The storage systems
should also have built-in redundancy for various components and should support
local and remote replication. RAID-protected LUNs should be provisioned to the
compute systems using at least two front-end ports. In addition, multiple availability
zones may be created to support fault tolerance at the site level.
Incident Management
Key functions:
Detects and records all incidents in a storage infrastructure
Investigates incidents and provides solutions to resolve the incidents
Documents incident history
The table provides a sample list of incidents that are captured by an incident
management tool.
Notes
Incidents are commonly detected and logged by incident management tools. They
also help administrators to track, escalate, and respond to the incidents from their
initiation to closure. Incidents may also be registered by the users through a self-
service portal, emails, or a service desk. The service desk may consist of a call
center to handle a large volume of telephone calls and a help desk as the first line
of service support. If the service desk is unsuccessful in providing solutions against
the incidents, they are escalated to other incident management support groups or
to problem management.
Problem Management
Key functions:
Reviews incident history to detect problems in a storage infrastructure
Identifies the underlying root cause that creates a problem
Integrated incident and problem management tools may mark specific
incidents as problem and perform root cause analysis
Provides most appropriate solution/preventive remediation for problems
Analyzes and solves errors proactively before they become an incident/problem
Notes
Security Management
Key functions:
Develops information security policies
Deploys required security architecture, processes, mechanisms, and tools
Notes
Introduction
Concepts In Practice
Concepts in Practice
future growth, and proactively discover and remediate issues from any browser or mobile device.
vRealize Operations
vRealize Orchestrator
Orchestration software that helps to automate and coordinate the service delivery
and operational functions in a storage infrastructure. It comes with a built-in library
of pre-defined workflows as well as a drag-and-drop feature for linking actions
together to create customized workflows. These workflows can be launched from
the VMware vSphere client, from various components of VMware vCloud Suite, or
through various triggering mechanisms. vRealize Orchestrator can execute
hundreds or thousands of workflows concurrently.
Assessment
Summary