INFORMATION STORAGE AND MANAGEMENT (ISM) V4
Revision [1.0]

PARTICIPANT GUIDE

Dell Confidential and Proprietary

Copyright © 2019 Dell Inc. or its subsidiaries. All Rights Reserved. Dell Technologies,
Dell, EMC, Dell EMC and other trademarks are trademarks of Dell Inc. or its
subsidiaries. Other trademarks may be trademarks of their respective owners.

Information Storage and Management (ISM) v4

© Copyright 2019 Dell Inc. Page i


Table of Contents

Course Introduction
  Information Storage and Management (ISM) v4
    Prerequisite Skills
    Course Agenda

Introduction to Information Storage
  Introduction to Information Storage
    Introduction to Information Storage
    Assessment
  Summary

Modern Technologies Driving Digital Transformation
  Cloud Computing Lesson
    Cloud Computing
  Big Data Analytics Lesson
    Big Data Analytics
  Internet of Things Lesson
    Internet of Things
  Machine Learning Lesson
    Machine Learning
  Concepts in Practice Lesson
    Concepts in Practice
    Assessment
  Summary

Modern Data Center Environment
  Compute System Lesson
    Compute System
  Compute and Desktop Virtualization Lesson
    Compute and Desktop Virtualization
  Storage and Network Lesson
    Storage and Network
  Applications Lesson
    Applications
  Software-Defined Data Center (SDDC) Lesson
    Software-Defined Data Center (SDDC)
  Modern Data Center Infrastructure Lesson
    Modern Data Center Infrastructure
  Concepts in Practice Lesson
    Concepts in Practice
    Assessment
  Summary

Intelligent Storage Systems
  Components of Intelligent Storage Systems Lesson
    ISMv4 Source - Intelligent Storage Systems - Components
  RAID Techniques Lesson
    RAID Techniques
  Types of Intelligent Storage Systems Lesson
    Types of Intelligent Storage Systems
    Assessment
  Summary

Block-Based Storage System
  Components of a Block-Based Storage System Lesson
    Components of a Block-Based Storage System
  Storage Provisioning Lesson
    Storage Provisioning
  Storage Tiering Lesson
    Storage Tiering
  Concepts in Practice Lesson
    Concepts in Practice
    Assessment
  Summary

Fibre Channel SAN
  Introduction to SAN Lesson
    Introduction to SAN
  FC SAN Overview Lesson
    FC SAN Overview
  FC Architecture Lesson
    FC SAN Architecture
  Topologies, Link Aggregation and Zoning Lesson
    Topologies, Link Aggregation and Zoning
  SAN Virtualization Lesson
    SAN Virtualization
  Concepts in Practice Lesson
    Concepts in Practice
    Assessment
  Summary

IP and FCoE SAN
  Overview of TCP/IP Lesson
    Overview of TCP/IP
  Overview of IP SAN Lesson
    Overview of IP SAN
  iSCSI Lesson
    iSCSI
  FCIP Lesson
    FCIP
  FCoE Lesson
    FCoE
  Concepts in Practice Lesson
    Concepts in Practice
    Assessment
  Summary

File-Based and Object-Based Storage System
  NAS Components and Architecture Lesson
    NAS Components and Architecture
  File-Level Virtualization and Tiering Lesson
    File-Level Virtualization and Tiering
  Object-Based and Unified Storage Lesson
    Object-Based and Unified Storage Overview
  Concepts in Practice Lesson
    Concepts in Practice
    Assessment
  Summary

Software-Defined Storage and Networking
  Software-Defined Storage (SDS) Lesson
    Software-Defined Storage (SDS)
  Software-Defined Networking (SDN) Lesson
    Software-Defined Networking (SDN)
  Concepts in Practice Lesson
    Concepts in Practice
    Assessment
  Summary

Introduction to Business Continuity
  Business Continuity Overview Lesson
    Business Continuity Overview
  Business Continuity Fault Tolerance Lesson
    Fault Tolerance IT Infrastructure
  Concepts in Practice Lesson
    Concepts in Practice
    Assessment
  Summary

Data Protection Solutions
  Replication Lesson
    Replication
  Backup and Recovery Lesson
    Backup and Recovery Overview
  Data Deduplication Lesson
    Data Deduplication
  Data Archiving Lesson
    ISMv4 Source - Data Protection Solutions - Data Archiving
  Migration Lesson
    Migration
  Concepts in Practice Lesson
    Concepts in Practice
    Assessment
  Summary

Storage Infrastructure Security
  Introduction to Information Security Lesson
    Introduction to Information Security
  Storage Security Domains and Threats Lesson
    Storage Security Domains and Threats
  Security Controls Lesson
    Security Controls
  Concepts in Practice Lesson
    Concepts in Practice
    Assessment
  Summary

Storage Infrastructure Management
  Introduction to Storage Infrastructure Management Lesson
    Introduction to Storage Infrastructure Management
  Operations Management
    Operations Management
  Concepts in Practice Lesson
    Concepts in Practice
    Assessment
  Summary

Course Conclusion
  Information Storage and Management (ISM) v4
    Summary

Course Introduction

Information Storage and Management (ISM) v4

Introduction

Information Storage and Management (ISM) is a unique course that provides a
comprehensive understanding of the various storage infrastructure components in
a modern data center environment. Participants will learn the architectures,
features, and benefits of intelligent storage systems, including block-based,
file-based, object-based, and unified storage; software-defined storage; storage
networking technologies such as FC SAN, IP SAN, and FCoE SAN; business
continuity solutions such as backup and replication; the highly critical area of
information security; and storage infrastructure management. This course takes an
open approach to describing all the concepts and technologies, which are further
illustrated and reinforced with Dell products and real-world use cases.
This course aligns with the Associate-level Proven Professional certification, which
serves as a baseline for a number of additional product specializations.

Prerequisite Skills

The following skills are prerequisites:

- To understand the content and successfully complete this course, a participant
  must have a basic understanding of computer architecture, operating systems,
  networking, and databases
- Participants with experience in specific segments of storage infrastructure
  would also be able to assimilate the course material

Course Agenda

Introductions

Introduction to Information Storage

Introduction

This module presents digital data, types of digital data, and information. This
module also focuses on data center characteristics and the technologies driving
digital transformation.

Upon completing this module, you will be able to:

- Describe digital data, types of digital data, and information
- Describe a data center and its key characteristics
- Describe the technologies driving digital transformation

Introduction to Information Storage

Growth of the Digital Universe

- The digital universe is created and defined by software
- Digital data is continuously generated, collected, stored, and analyzed through
  software
- An IDC report predicts that worldwide data creation will grow to 163 zettabytes
  (ZB) by 2025
- Technologies driving digital transformation add to data growth

Notes

We live in a digital universe: a world that software creates and defines. A massive
amount of digital data is continuously generated, collected, stored, and analyzed
through software in the digital universe. An IDC report predicts that worldwide data
creation will grow to an enormous 163 zettabytes (ZB) by 2025.
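To put the 163 ZB figure in perspective, the following minimal sketch converts it into smaller units. It assumes decimal (SI) storage units, where each step from B to KB to MB and so on is a factor of 1,000, so 1 ZB is 10^21 bytes. The `convert` helper and the `SI_UNITS` list are hypothetical, written only for this illustration.

```python
# Assumption: decimal (SI) units, each step up is a factor of 1,000.
SI_UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def convert(value: int, from_unit: str, to_unit: str) -> int:
    """Convert a whole number between SI storage units using exact integer math.

    Converting to a larger unit uses floor division, so any remainder is dropped.
    """
    shift = SI_UNITS.index(from_unit) - SI_UNITS.index(to_unit)
    if shift >= 0:
        return value * 1000 ** shift
    return value // 1000 ** (-shift)


zb_2025 = 163
print(convert(zb_2025, "ZB", "B"))   # 163 followed by 21 zeros
print(convert(zb_2025, "ZB", "TB"))  # 163000000000, that is, 163 billion TB
```

Integer arithmetic is used instead of floating point so that very large values such as 10^21 stay exact.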

The data in the digital universe comes from diverse sources, including both
individuals and organizations. Individuals constantly generate and consume
information through numerous activities, such as web searches, emails, uploading
and downloading content, and sharing media files. In organizations, the volume and
importance of information for business operations continue to grow at astounding
rates. Technologies driving digital transformation, including the Internet of Things
(IoT), have contributed significantly to the growth of the digital universe.

In the past, individuals created most of the data in the world. IDC now predicts
that organizations will create 60 percent of the world’s data through applications
relying on machine learning, automation, machine-to-machine technologies, and
embedded devices.

Why Information Storage and Management

- Organizations are dependent on continuous and reliable access to information
- Organizations seek to store, protect, process, manage, and use information
- Organizations are increasingly implementing intelligent storage solutions:
  - To efficiently store and manage information
  - To gain a competitive advantage
  - To derive new business opportunities

Notes

Organizations have become increasingly information-dependent in the 21st century,
and information must be available whenever and wherever it is required. It is
critical for users and applications to have continuous, fast, reliable, and secure
access to information so that business operations run as required. Examples of
such organizations and processes include banking and financial institutions, online
retailers, airline reservations, social networks, stock trading, scientific research,
and healthcare.

Data is the lifeblood of a rapidly growing digital existence, opening up new
opportunities for businesses to gain a competitive edge. For example, an online
retailer may need to identify customers' preferred product types and brands by
analyzing their search, browsing, and purchase patterns. This information helps
the retailer maintain a sufficient inventory of popular products and advertise
relevant products to existing and potential customers. It is essential for
organizations to store, protect, process, and manage information in an efficient and
cost-effective manner. Legal, regulatory, and contractual obligations regarding the
availability, retention, and protection of data further add to the challenges of storing
and managing information.

To meet these requirements and more, organizations are increasingly undertaking
digital transformation initiatives to implement intelligent storage solutions. These
solutions enable efficient and optimized storage and management of information.
They also enable the extraction of value from information to derive new business
opportunities, gain a competitive advantage, and create new sources of revenue.

Digital Data

Definition: Digital Data


A collection of facts that is transmitted and stored in electronic form,
and processed through software.

[Figure: Desktops, laptops, tablets, and mobile devices generate digital data (text, photos, video) that is stored on internal or external storage]

Notes

A generic definition of data is that it is a collection of facts, typically collected for
analysis or reference. Data can exist in various forms, such as facts stored in a
person's mind, photographs and drawings, a bank ledger, and the tabulated results
of a scientific survey. Digital data is a collection of facts that is transmitted and
stored in electronic form and processed through software. Devices such as
desktops, laptops, tablets, mobile phones, and electronic sensors generate digital
data.

Digital data is stored as strings of binary values on a storage medium. This storage
medium is either internal or external to the devices generating or accessing the
data. The storage devices may be of different types, such as magnetic, optical, or
flash-based (SSD). Examples of digital data are electronic documents, text files,
emails, ebooks, digital images, digital audio, and digital video.
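As a minimal sketch of what "strings of binary values" means, the following example turns a short text into the bytes a device would store and shows each byte as eight binary digits. It assumes UTF-8 as the character encoding, which this guide does not specify, and it shows only the logical representation; how each bit is physically recorded differs between magnetic, optical, and flash media. The helper names are hypothetical.

```python
def to_bit_strings(text: str, encoding: str = "utf-8") -> list[str]:
    """Encode text to bytes and return each byte as an 8-digit binary string."""
    return [format(byte, "08b") for byte in text.encode(encoding)]


def from_bit_strings(bits: list[str], encoding: str = "utf-8") -> str:
    """Rebuild the original text from its per-byte binary strings."""
    return bytes(int(b, 2) for b in bits).decode(encoding)


bits = to_bit_strings("Hi")
print(bits)                    # ['01001000', '01101001']
print(from_bit_strings(bits))  # Hi
```

Characters outside ASCII take more than one byte in UTF-8; for example, "é" is stored as two bytes.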

Types of Digital Data

[Figure: Types of digital data: unstructured, quasi-structured, semi-structured, and structured]

- Unstructured data has no inherent structure and is usually stored as different
  types of files
  - Text documents, PDFs, images, and videos
- Quasi-structured data consists of textual data with erratic formats that can be
  formatted with effort and software tools
  - Clickstream data
- Semi-structured data consists of textual data files with an apparent pattern,
  enabling analysis
  - Spreadsheets and XML files
- Structured data has a defined data model, format, and structure
  - Databases
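The difference between semi-structured and structured data can be illustrated by loading a small XML document into a relational table. In this minimal sketch, the XML tags describe the data but nothing enforces a schema, whereas the SQLite table imposes a fixed data model of named, typed columns. The order records, the `ORDERS_XML` string, and the `load_orders` helper are hypothetical; only Python's standard `xml.etree.ElementTree` and `sqlite3` modules are used.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical semi-structured input: self-describing tags, no enforced table schema.
ORDERS_XML = """
<orders>
  <order id="1001"><customer>Alice</customer><amount>25.50</amount></order>
  <order id="1002"><customer>Bob</customer><amount>40.00</amount></order>
</orders>
"""


def load_orders(xml_text: str) -> sqlite3.Connection:
    """Parse the XML and load it into a table that follows a fixed data model."""
    conn = sqlite3.connect(":memory:")
    # The data model: a named table with fixed, typed columns.
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)"
    )
    root = ET.fromstring(xml_text.strip())
    for order in root.findall("order"):
        conn.execute(
            "INSERT INTO orders (id, customer, amount) VALUES (?, ?, ?)",
            (
                int(order.get("id")),
                order.findtext("customer"),
                float(order.findtext("amount")),
            ),
        )
    conn.commit()
    return conn


conn = load_orders(ORDERS_XML)
print(conn.execute("SELECT customer, amount FROM orders ORDER BY id").fetchall())
# [('Alice', 25.5), ('Bob', 40.0)]
```

Once the data is in rows and columns, it can be queried with SQL, which is one reason structured data is easier to process.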

Notes

Based on how it is stored and managed, digital data can be broadly categorized
into structured, semi-structured, quasi-structured, and unstructured data.
- Structured data is organized in fixed fields within a record or file. Structuring
  data requires a data model. A data model specifies the format for organizing
  data, and also specifies how different data elements are related to each other.
  For example, in a relational database, data is organized in rows and columns
  within named tables.
- Semi-structured data does not have a formal data model but has an apparent,
  self-describing pattern and structure that enable its analysis. Examples of
  semi-structured data include spreadsheets that have a row and column structure,
  and XML files that are defined by an XML schema.
- Quasi-structured data consists of textual data with erratic data formats, and
  can be formatted with effort, software tools, and time. An example of
  quasi-structured data is a “clickstream” that includes data about which webpages
  a user visited and in what order, which is the result of the successive mouse
  clicks the user made. A clickstream shows when a user entered a website, the
  pages viewed, the time spent on each page, and when the user exited.
- Unstructured data does not have a data model and is not organized in any
  particular format. Some examples of unstructured data include text documents,
  PDF files, emails, presentations, images, and videos.
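The clickstream example above can be made concrete with a minimal sketch that formats raw click records into a structured session summary: entry time, pages in order, and time spent on each page. The log format (`<ISO timestamp> <user id> <page>`), the sample records, and the `sessionize` helper are all hypothetical. Real clickstream formats vary, which is what makes this data quasi-structured. The sketch treats the last click as an approximation of the exit time, because a click log alone does not record when the user left.

```python
from datetime import datetime

# Hypothetical raw clickstream: one "<ISO timestamp> <user id> <page>" entry per line.
RAW_CLICKS = """\
2019-06-01T10:00:00 u42 /home
2019-06-01T10:00:45 u42 /products
#truncated
2019-06-01T10:03:15 u42 /products/ssd
2019-06-01T10:04:00 u42 /checkout
"""


def sessionize(raw: str, user: str) -> dict:
    """Turn one user's raw clicks into a structured session summary."""
    clicks = []
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip malformed or truncated records
        timestamp, user_id, page = parts
        if user_id == user:
            clicks.append((datetime.fromisoformat(timestamp), page))
    if not clicks:
        raise ValueError(f"no clicks found for user {user!r}")
    clicks.sort()  # chronological order gives the sequence of pages visited
    # Time on a page is the gap until the next click; the last page has no end time.
    seconds_on_page = [
        (page, int((clicks[i + 1][0] - ts).total_seconds()))
        for i, (ts, page) in enumerate(clicks[:-1])
    ]
    return {
        "entered": clicks[0][0],
        "last_click": clicks[-1][0],  # approximates the exit time
        "pages": [page for _, page in clicks],
        "seconds_on_page": seconds_on_page,
    }


summary = sessionize(RAW_CLICKS, "u42")
print(summary["pages"])
print(summary["seconds_on_page"])
```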

The majority of the data generated in the digital universe today, more than 90
percent, is non-structured data (semi-structured, quasi-structured, and
unstructured). Although the illustration shows four separate types of data, in
practice a mixture of these types is typically generated.

What is Information?

Definition: Information
Processed data that is presented in a specific context to enable useful
interpretation and decision-making.

- Example: Annual sales data processed into a sales report
  - Enables calculation of the average sales for a product and the comparison
    of actual sales to projected sales
- Emerging architectures and technologies enable extracting information from
  non-structured data

Notes

The terms “data” and “information” are closely related, and the two are often used
interchangeably. However, it is important to understand the difference between
them. Data, by itself, is simply a collection of facts that requires processing to be
useful. For example, an organization's annual sales figures are data. When data is
processed and presented in a specific context, it can be interpreted in a useful
manner. This processed and organized data is called information.

For example, when you process the annual sales data into a sales report, it
provides useful information, such as the average sales for a product (indicating
product demand and popularity), and a comparison of the actual sales to the
projected sales.
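The sales example can be sketched in a few lines: raw monthly figures (data) are processed into a report with the average sales per product and a comparison of actual to projected sales (information). All product names and figures below are hypothetical, and the `sales_report` helper is written only for this illustration.

```python
from statistics import mean

# Hypothetical raw data: monthly units sold per product for one year.
annual_sales = {
    "Product A": [120, 135, 128, 140, 150, 160, 155, 149, 170, 165, 180, 190],
    "Product B": [80, 75, 90, 85, 70, 95, 100, 88, 92, 97, 105, 110],
}
# Hypothetical projected annual totals for the same products.
projected_totals = {"Product A": 1800, "Product B": 1200}


def sales_report(sales: dict, projections: dict) -> list[dict]:
    """Process raw sales figures into averages and actual-versus-projected comparisons."""
    report = []
    for product, monthly in sales.items():
        actual = sum(monthly)
        projected = projections[product]
        report.append({
            "product": product,
            "average_monthly": round(mean(monthly), 1),  # indicates demand
            "actual_total": actual,
            "projected_total": projected,
            # Positive means sales beat the projection; negative means they fell short.
            "variance_pct": round(100 * (actual - projected) / projected, 1),
        })
    return report


for row in sales_report(annual_sales, projected_totals):
    print(row)
```

With these sample figures, Product A exceeds its projection by about 2.3 percent, while Product B falls short by about 9.4 percent.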

Information thus creates knowledge and enables decision-making. Processing and
analyzing data is vital to any organization. It enables organizations to derive value
from data and create intelligence that supports decision-making and organizational
effectiveness. Structured data is easier to process because of its organized form.
On the other hand, processing non-structured data and extracting information from
it using traditional applications is difficult, time-consuming, and requires
considerable resources. Emerging architectures, technologies, and techniques
enable storing, managing, analyzing, and deriving value from unstructured data
coming from numerous sources.

Information Storage

• Information is stored on storage devices that use non-volatile media
  - Magnetic storage devices: Hard disk drive and magnetic tape drive
  - Optical storage devices: Blu-ray, DVD, and CD
  - Flash-based storage devices: Solid-state drive (SSD), memory card, and
    USB thumb drive
• Storage devices are assembled within a storage system or “array”
  - Provides high capacity, scalability, performance, reliability, and security
• Storage systems, along with other IT infrastructure, are housed in a data center

Notes

In a computing environment, storage devices (or storage) are devices consisting of
nonvolatile recording media on which digital data or information can be persistently
stored. Storage may be internal or external to a compute system. Based on the
nature of the storage media used, storage devices are classified as magnetic
storage devices, optical storage devices, or flash-based storage devices.

Storage is a core component in an organization’s IT infrastructure. Various factors
such as the media, architecture, capacity, addressing, reliability, and performance
influence the choice and use of storage devices in an enterprise environment. For
example, disk drives and SSDs are used for storing business-critical information
that needs to be continuously accessible to applications. Magnetic tapes and
optical storage are typically used for backing up and archiving data.
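The classification and typical roles described in the two paragraphs above can be captured in a small lookup table. This is only a sketch: the device list mirrors the text, while the `Media` enum, the `suggest_devices` helper, and its two-way split between continuously accessed and backup/archive data are hypothetical simplifications. Real selection also weighs capacity, performance, reliability, and cost.

```python
from enum import Enum


class Media(Enum):
    MAGNETIC = "magnetic"
    OPTICAL = "optical"
    FLASH = "flash"


# Device type -> media type, following the classification in the text.
DEVICE_MEDIA = {
    "hard disk drive": Media.MAGNETIC,
    "magnetic tape drive": Media.MAGNETIC,
    "Blu-ray": Media.OPTICAL,
    "DVD": Media.OPTICAL,
    "CD": Media.OPTICAL,
    "solid-state drive": Media.FLASH,
    "memory card": Media.FLASH,
    "USB thumb drive": Media.FLASH,
}

# Typical enterprise roles from the text (hypothetical grouping).
CONTINUOUS_ACCESS = {"hard disk drive", "solid-state drive"}
BACKUP_AND_ARCHIVE = {"magnetic tape drive", "Blu-ray", "DVD", "CD"}


def suggest_devices(needs_continuous_access: bool) -> list[str]:
    """Return candidate device types for a workload (simplified rule)."""
    pool = CONTINUOUS_ACCESS if needs_continuous_access else BACKUP_AND_ARCHIVE
    return sorted(pool)
```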

In enterprise environments, information is typically stored on storage
systems/storage arrays. A storage system is a hardware component that contains a
group of homogeneous/heterogeneous storage devices that are assembled within
a cabinet. These enterprise-class storage systems are designed for high capacity,
scalability, performance, reliability, and security to meet business requirements.

The compute systems that run business applications are provided storage capacity
from storage systems. Storage systems are covered in the module ‘Intelligent Storage
Systems (ISS)’. Organizations typically house their IT infrastructure, including
compute systems, storage systems, and network equipment, within a data center.

Data Center

• A data center typically comprises:
  - Facility: The building and floor space where the data center is constructed
  - IT equipment: Compute system, storage, and connectivity elements
  - Support infrastructure: Power supply, fire detection, HVAC, and security systems
• Organizations are moving towards modern data centers to overcome business
  and IT challenges
  - Helps them succeed in their digital transformation journey

Notes

A data center is a dedicated facility where an organization houses, operates, and
maintains its IT infrastructure along with other supporting infrastructures. It
centralizes an organization’s IT equipment and data-processing operations. A data
center may be constructed in-house and located in an organization’s own facility.
The data center may also be outsourced, with the equipment located at a third-party site.

A data center typically consists of the following:

• Facility: It is the building and floor space where organizations construct the data
  center. It typically has a raised floor with ducts underneath holding power and
  network cables.
• IT equipment: It includes components such as compute systems, storage, and
  connectivity elements along with cabinets for housing the IT equipment.
• Support infrastructure: It includes power supply, fire detection, and heating,
  ventilation, and air conditioning (HVAC) systems. It also includes security
  systems such as biometrics, badge readers, and video surveillance systems.

Digital transformation is disrupting every industry, and with the evolution of modern
technologies, organizations are facing many business challenges.

Organizations must operate in real time, develop smarter products, and deliver a
great user experience. They must be agile, operate efficiently, and make decisions
quickly to be successful. However, traditional IT infrastructure and services are
poorly suited to supporting these disruptive technologies and agile methodologies.
An organization’s IT department also faces several challenges in meeting these
business demands. So, organizations are moving towards modern data centers to
overcome these challenges and be successful in their digital transformation journey.

Key Characteristics of a Data Center

Data centers are designed and built to fulfill the key characteristics shown in the
figure. Although the characteristics are applicable to almost all data center
components, the details here primarily focus on storage systems.

[Figure: Key characteristics of a data center: availability, security, capacity, scalability, performance, data integrity, and manageability]

Notes

Data center characteristics are:

• Availability: Availability of information as and when required should be
  ensured. Unavailability of information can severely affect business operations,
  lead to substantial financial losses, and damage the reputation of an
  organization.
• Security: Policies and procedures should be established, and control measures
  should be implemented to prevent unauthorized access to and alteration of
  information.
• Capacity: Data center operations require adequate resources to efficiently store
  and process large and increasing amounts of data. When capacity requirements
  increase, additional capacity should be provided either without interrupting
  availability or with minimal disruption. Capacity may be managed by adding new
  resources or by reallocating existing resources.
• Scalability: Organizations may need to deploy additional resources such as
  compute systems, new applications, and databases to meet the growing
  requirements. Data center resources should scale to meet the changing
  requirements, without interrupting business operations.
• Performance: Data center components should provide optimal performance
  based on the required service levels.
• Data integrity: Data integrity refers to mechanisms, such as error correction
  codes or parity bits, which ensure that data is stored and retrieved exactly as it
  was received.
• Manageability: A data center should provide easy, flexible, and integrated
  management of all its components. Efficient manageability can be achieved
  through automation that reduces manual intervention in common, repeatable
  tasks.
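As an illustration of the data integrity characteristic above, the sketch below uses a single even-parity bit. The writer stores the bit alongside the data, and the reader recomputes it to detect corruption. This is a simplified, hypothetical example (the `store` and `verify` helpers are illustrative). A single parity bit detects any odd number of flipped bits but cannot detect an even number of flips or correct anything, which is why storage systems also rely on stronger error correction codes.

```python
def parity_bit(data: bytes) -> int:
    """Even parity: return 1 if data contains an odd number of 1-bits, else 0,
    so that data plus the parity bit always holds an even count of 1-bits."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2


def store(data: bytes) -> tuple[bytes, int]:
    # Save the parity bit alongside the data when it is written.
    return data, parity_bit(data)


def verify(stored: tuple[bytes, int]) -> bool:
    # On read, recompute parity and compare it with the stored bit.
    data, saved_parity = stored
    return parity_bit(data) == saved_parity
```

For example, `verify(store(b"ISM"))` returns `True`, but flipping any single bit of the stored bytes before calling `verify` makes it return `False`.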

Digital Transformation

Digital transformation puts technology at the heart of an organization’s products,
services, and operations.

Notes

Digital transformation is imperative for all businesses. Businesses of all shapes and
sizes are changing to a more digital mindset. This digital mindset is being driven by
the need to innovate more quickly. Digital transformation puts technology at the
heart of an organization’s products, services, and operations.

In general terms, digital transformation is defined as the integration of digital
technology into all areas of a business. This results in fundamental changes to how
businesses operate and how they deliver value to customers, improve efficiency,
reduce business risks, and uncover new opportunities.

With people, customers, businesses, and things communicating, transacting, and
negotiating with each other, a new world comes into being. It is the world of the
digital business that uses data as a way to create value. According to Gartner, by
2020, more than seven billion people and businesses, and at least 30 billion
devices, will be connected to the Internet. Organizations need to accelerate their
digital transformation journeys to avoid being left behind in an increasingly digital
world.

Key Technologies Driving Digital Transformation

In this digital world, organizations need to develop new applications using agile
processes and new tools to ensure rapid time-to-market. At the same time,
organizations still expect IT to operate and manage the traditional applications
that generate much of their revenue.

To survive, organizations have to transform and adopt modern technologies to
support digital transformation. Some of the key technologies that drive digital
transformation are listed in the figure.

[Figure: Key technologies driving digital transformation: cloud, Big Data analytics, Internet of Things, and machine learning]

Assessment

1. Which data asset is an example of unstructured data?

A. XML data file

B. News article text

C. Database table

D. Webserver log

2. Why are businesses undergoing digital transformation?

A. To innovate more quickly

B. To avoid security risks

C. To avoid compliance penalties

D. To eliminate management costs

Summary

Modern Technologies Driving Digital Transformation

Introduction

This module presents an overview of the modern technologies that are driving
digital transformation in today’s world. The modern technologies covered in this
module include cloud computing, Big Data analytics, Internet of Things (IoT), and
machine learning.

Upon completing this module, you will be able to:

• Describe cloud computing
• Describe Big Data analytics
• Describe Internet of Things
• Describe machine learning

Cloud Computing Lesson

Introduction

This lesson presents an overview of cloud computing along with its essential
characteristics, various cloud deployment and service models, and use cases.

This lesson covers the following topics:

• Cloud computing and its essential characteristics
• Cloud service models
• Cloud deployment models
• Use cases of cloud computing

Cloud Computing

Cloud Computing: An Overview

Definition: Cloud Computing

A model for enabling convenient, on-demand network access to a
shared pool of configurable computing resources (for example,
networks, servers, storage, applications, and services) that can be
rapidly provisioned and released with minimal management effort or
service provider interaction.
Source: The National Institute of Standards and Technology (NIST), a part
of the U.S. Department of Commerce, in its Special Publication 800-145

[Figure: Cloud infrastructure (compute, network, storage, applications, platform, and software resources, including applications and operating systems running in VMs on a hypervisor) accessed over a LAN/WAN from desktops, laptops, tablets, and mobile devices]

Notes

The term “cloud” originates from the cloud-like bubble that is commonly used in
technical architecture diagrams to represent a system. This system may be the
Internet, a network, or a compute cluster. In cloud computing, a cloud is a collection
of IT resources, including hardware and software resources. You can deploy these
resources either in a single data center, or across multiple geographically
dispersed data centers that are connected over a network.

A cloud service provider is responsible for building, operating, and managing cloud
infrastructure. The cloud computing model enables consumers to hire IT resources
as a service from a provider. A cloud service is a combination of hardware and
software resources that are offered for consumption by a provider. The cloud
infrastructure contains IT resource pools, from which you can provision resources
to consumers as services over a network, such as the Internet or an intranet.
Resources are returned to the pool when the consumer releases them.
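The provision-and-release cycle described above can be sketched as a simple shared pool. The `ResourcePool` class, its method names, and the idea of counting capacity in whole units (for example, vCPUs) are hypothetical. A real cloud platform tracks many resource types and enforces far richer allocation policies.

```python
class ResourcePool:
    """Toy shared pool: consumers provision units and return them on release."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.allocations: dict[str, int] = {}

    @property
    def available(self) -> int:
        # Free capacity is whatever is not currently allocated to a consumer.
        return self.capacity - sum(self.allocations.values())

    def provision(self, consumer: str, units: int) -> bool:
        # Grant the request only if the pool still has enough free units.
        if units <= 0 or units > self.available:
            return False
        self.allocations[consumer] = self.allocations.get(consumer, 0) + units
        return True

    def release(self, consumer: str) -> int:
        # Return everything this consumer holds to the pool.
        return self.allocations.pop(consumer, 0)
```

Once one consumer releases its units, they become available to any other consumer sharing the pool.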

Example: The cloud model is similar to utility services such as electricity, water,
and telephone. When consumers use these utilities, they are typically unaware of
how the utilities are generated or distributed. The consumers periodically pay for
the utilities based on usage. Similarly, in cloud computing, the cloud is an
abstraction of an IT infrastructure. Consumers hire IT resources as services from
the cloud without the risks and costs that are associated with owning the resources.
Cloud services are accessed from different types of client devices over wired and
wireless network connections. Consumers pay only for the services that they use,
either based on a subscription or based on resource consumption.
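In the consumption-based case, a bill is essentially metered usage multiplied by unit rates, as in the utility analogy above. The resource names, units, rates, and the `usage_charge` function below are hypothetical and do not reflect any provider's actual pricing.

```python
# Hypothetical unit rates (currency units per metered unit).
RATES = {
    "vm_hours": 0.05,          # per VM-hour
    "storage_gb_month": 0.02,  # per GB-month stored
    "egress_gb": 0.09,         # per GB transferred out
}


def usage_charge(usage: dict[str, float], rates: dict[str, float] = RATES) -> float:
    """Compute a pay-per-use bill from metered usage quantities."""
    unknown = set(usage) - set(rates)
    if unknown:
        raise KeyError(f"no rate defined for: {sorted(unknown)}")
    return round(sum(qty * rates[item] for item, qty in usage.items()), 2)
```

For example, 720 VM-hours, 500 GB-months of storage, and 100 GB of egress would cost 36 + 10 + 9 = 55 currency units under these assumed rates.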

Essential Cloud Characteristics

In SP 800-145, NIST specifies that a cloud infrastructure should have five
essential characteristics.

[Figure: Essential cloud characteristics: measured service, resource pooling, rapid elasticity, on-demand self-service, and broad network access]

Notes

The five characteristics are:

• Measured Service: “Cloud systems automatically control and optimize
  resource use by leveraging a metering capability at some level of abstraction
  appropriate to the type of service (for example, storage, processing, bandwidth,
  and active user accounts). Resource usage can be monitored, controlled, and
  reported, providing transparency for both the provider and consumer of the
  utilized service.” – NIST
• Resource Pooling: “The provider’s computing resources are pooled to serve
  multiple consumers using a multitenant model, with different physical and virtual
  resources that are dynamically assigned and reassigned according to consumer
  demand. There is a sense of location independence in that the customer
  generally has no control or knowledge over the exact location of the provided
  resources but may be able to specify location at a higher level of abstraction (for
  example, country, state, or datacenter). Examples of resources include storage,
  processing, memory, and network bandwidth.” – NIST
• Rapid Elasticity: “Capabilities can be rapidly and elastically provisioned, in
  some cases automatically, to scale rapidly outward and inward commensurate
  with demand. To the consumer, the capabilities available for provisioning often
  appear to be unlimited and can be appropriated in any quantity at any time.” –
  NIST
• On-demand Self-service: “A consumer can unilaterally provision computing
  capabilities, such as server time or networked storage, as needed automatically
  without requiring human interaction with each service provider.” – NIST
• Broad Network Access: “Capabilities are available over the network and
  accessed through standard mechanisms that promote use by heterogeneous
  thin or thick client platforms (for example, mobile phones, tablets, laptops, and
  workstations).” – NIST
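Rapid elasticity is often implemented as an automatic scaling rule that adds or removes capacity to track demand. The target-utilization rule below is one common approach, shown as a hypothetical sketch. The `desired_instances` function, the default target utilization of 0.5, and the minimum and maximum limits are illustrative assumptions, not values from NIST or any provider.

```python
import math


def desired_instances(current: int, avg_utilization: float,
                      target: float = 0.5, minimum: int = 1,
                      maximum: int = 20) -> int:
    """Scale outward or inward so that average utilization moves toward target.

    avg_utilization is the current mean load per instance (0.0 to 1.0).
    All default limits are hypothetical.
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    # Total demand expressed as a number of fully loaded instances.
    demand = current * avg_utilization
    # Instances needed so that each runs at roughly the target utilization.
    wanted = math.ceil(demand / target)
    # Clamp to the configured limits.
    return max(minimum, min(maximum, wanted))
```

With 4 instances at 75% utilization, the rule scales outward to 6; with 10 instances at 25%, it scales inward to 5.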

Cloud Service Models

[Figure: Cloud service models: Infrastructure as a Service, Platform as a Service, and Software as a Service]

• A cloud service model specifies the services and the capabilities that are
  provided to consumers
• In SP 800-145, NIST classifies cloud service offerings into three primary
  models:
  - Infrastructure as a Service (IaaS)
  - Platform as a Service (PaaS)
  - Software as a Service (SaaS)

Notes

Cloud administrators or architects assess and identify potential cloud service
offerings. The assessment includes evaluating what services to create and
upgrade, and the necessary feature set for each service. It also includes defining
the service level objectives (SLOs) of each service in line with consumer needs and
market conditions. SLOs are specific measurable characteristics such as
availability, throughput, frequency, and response time. They provide a
measurement of the performance of the service provider. SLOs are key elements of a
service level agreement (SLA). An SLA is a legal document that describes items such
as what service level will be provided, how it will be supported, service location,
and the responsibilities of the consumer and the provider.
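Because SLOs are measurable, they can be checked against monitoring data. The sketch below computes measured availability over a period and compares it with a target. The 99.9% target, the use of minutes as the unit, and the function names are hypothetical examples, not values from any particular SLA.

```python
def availability_percent(period_minutes: int, downtime_minutes: float) -> float:
    """Availability = uptime / total time, expressed as a percentage."""
    if period_minutes <= 0:
        raise ValueError("period must be positive")
    uptime = period_minutes - downtime_minutes
    return 100.0 * uptime / period_minutes


def meets_availability_slo(period_minutes: int, downtime_minutes: float,
                           target_percent: float = 99.9) -> bool:
    # Compare measured availability with the (hypothetical) SLO target.
    return availability_percent(period_minutes, downtime_minutes) >= target_percent
```

Over a 30-day period (43,200 minutes), a 99.9% target tolerates about 43 minutes of downtime. An outage of 40 minutes meets it, but an outage of 50 minutes does not.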

Many alternate cloud service models based on IaaS, PaaS, and SaaS are defined
in various publications and by different industry groups. These service models are
specific to the cloud services and capabilities that are provided. Examples of such
service models include Network as a Service (NaaS), Database as a Service
(DBaaS), Big Data as a Service (BDaaS), Security as a Service (SECaaS), and
Disaster Recovery as a Service (DRaaS). However, each of these models ultimately
falls under one of the three primary cloud service models.

Infrastructure as a Service (IaaS)

Definition: Infrastructure as a Service


“The capability provided to the consumer is to provision processing,
storage, networks, and other fundamental computing resources where
the consumer is able to deploy and run arbitrary software, which can
include operating systems and applications. The consumer does not
manage or control the underlying cloud infrastructure but has control
over operating systems, storage, and deployed applications; and
possibly limited control of select networking components (for example,
host firewalls).” – NIST

• IaaS pricing may be subscription-based or based on resource usage
• Provider pools the underlying IT resources and multiple consumers share
  these resources through a multitenant model
• Organizations can even implement IaaS internally, where internal IT manages
  the resources and services

[Figure: IaaS. Consumer's resources: application, database, programming framework, and operating system. Provider's resources (cloud infrastructure): compute, storage, and network.]

Examples:
- Amazon EC2, S3
- Virtustream
- Google Compute Engine

Platform as a Service (PaaS)

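Definition: Platform as a Service

“The capability provided to the consumer is to deploy onto the cloud
infrastructure consumer-created or acquired applications created using
programming languages, libraries, services, and tools supported by the
provider. The consumer does not manage or control the underlying cloud
infrastructure including network, servers, operating systems, or storage,
but has control over the deployed applications and possibly configuration
settings for the application-hosting environment.” – NIST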

• In the PaaS model, a cloud service includes compute, storage, and network
  resources along with platform software
• Platform software includes software such as:
  - Operating system, database, programming frameworks, middleware
  - Tools to develop, test, deploy, and manage applications
• Most PaaS offerings support multiple operating systems and programming
  frameworks for application development and deployment
• PaaS usage fees are typically calculated based on the following factors:
  - Number of consumers
  - Types of consumers (developer, tester, and so on)
  - The time for which the platform is in use
  - The compute, storage, or network resources that the platform consumes
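The fee factors listed above can be combined in many ways. The sketch below assumes one simple, hypothetical scheme: each consumer pays an hourly platform rate that depends on the consumer type, plus a charge for the underlying resources the platform consumes during those hours. All rates, names, and the `paas_fee` function are illustrative, not any provider's actual pricing.

```python
from dataclasses import dataclass

# Hypothetical hourly platform rates by consumer type.
HOURLY_RATE_BY_TYPE = {"developer": 0.10, "tester": 0.05}

# Hypothetical rate per resource unit per hour consumed by the platform.
RESOURCE_RATE_PER_UNIT_HOUR = 0.02


@dataclass
class PlatformUsage:
    consumer_type: str     # type of consumer (developer, tester, ...)
    hours: float           # time for which the platform was in use
    resource_units: float  # average compute/storage/network units consumed


def paas_fee(usages: list[PlatformUsage]) -> float:
    """Sum the fee over all consumers (the number of consumers is len(usages))."""
    total = 0.0
    for u in usages:
        platform = HOURLY_RATE_BY_TYPE[u.consumer_type] * u.hours
        resources = RESOURCE_RATE_PER_UNIT_HOUR * u.resource_units * u.hours
        total += platform + resources
    return round(total, 2)
```

Under these assumed rates, a developer using the platform for 100 hours at 2 resource units and a tester using it for 40 hours at 1 unit would together be charged 16.8.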

[Figure: PaaS. Consumer's resources: application. Provider's resources: database, programming framework, operating system, and cloud infrastructure (compute, storage, and network).]

Examples:
- Pivotal Cloud Foundry
- Google App Engine
- AWS Elastic Beanstalk
- Microsoft Azure

Software as a Service (SaaS)

Definition: Software as a Service


“The capability provided to the consumer is to use the provider’s
applications running on a cloud infrastructure. The applications are
accessible from various client devices through either a thin client
interface, such as a web browser (for example, web-based email), or
a program interface. The consumer does not manage or control the
underlying cloud infrastructure including network, servers, operating
systems, storage, or even individual application capabilities, except
limited user-specific application configuration settings.” – NIST

• In the SaaS model, a provider offers a cloud-hosted application to multiple
  consumers as a service
• The consumers do not own or manage any aspect of the cloud infrastructure
• Some SaaS applications may require installing a client interface locally on an
  end-point device
• Examples of applications that are delivered through SaaS:
  - Customer Relationship Management (CRM)
  - Enterprise Resource Planning (ERP)
  - Email and Office Suites

[Figure: SaaS. Provider's resources: application, database, programming framework, operating system, and cloud infrastructure (compute, storage, and network).]

Examples:
- Salesforce
- Google Apps
- Microsoft Office 365
- Oracle
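The IaaS, PaaS, and SaaS stack figures differ mainly in where the boundary between consumer-managed and provider-managed layers falls. The sketch below encodes that boundary for each model as it appears in the figures. The layer list and the `managed_by` helper are illustrative. Real offerings can shift individual layers; for example, the NIST IaaS definition notes possible limited consumer control of select networking components.

```python
# Layers from top (closest to the user) to bottom, as in the figures.
LAYERS = [
    "Application",
    "Database",
    "Programming Framework",
    "Operating System",
    "Compute",
    "Storage",
    "Network",
]

# Number of top layers that the consumer manages in each service model.
CONSUMER_LAYER_COUNT = {"IaaS": 4, "PaaS": 1, "SaaS": 0}


def managed_by(model: str) -> dict[str, str]:
    """Map each layer to 'consumer' or 'provider' for the given service model."""
    split = CONSUMER_LAYER_COUNT[model]
    return {
        layer: ("consumer" if i < split else "provider")
        for i, layer in enumerate(LAYERS)
    }
```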

Cloud Deployment Models

• A cloud deployment model provides a basis for how cloud infrastructure is built,
  managed, and accessed
• In SP 800-145, NIST specifies the four primary cloud deployment models
  listed in the figure
• Each cloud deployment model may be used for any of the cloud service models:
  IaaS, PaaS, and SaaS
• The different deployment models present several tradeoffs in terms of control,
  scale, cost, and availability of resources

[Figure: Cloud deployment models: public cloud, private cloud, hybrid cloud, and community cloud]

Public Cloud

Definition: Public Cloud

“The cloud infrastructure is provisioned for open use by the general
public. It may be owned, managed, and operated by a business,
academic, or government organization, or some combination of them.
It exists on the premises of the cloud provider.” – NIST

• Public cloud services may be free, subscription-based, or provided on a
  pay-per-use model
• A public cloud provides the benefits of low upfront expenditure on IT
  resources and enormous scalability
• Some concerns for the consumers include:
  - Network availability
  - Risks associated with multitenancy
  - Visibility and control over the cloud resources and data
  - Restrictive default service levels

[Figure: Public cloud. Enterprise P, Enterprise Q, and Individual R consume applications running on VMs and a hypervisor, using the resources of the cloud provider.]

Private Cloud

Definition: Private Cloud


“The cloud infrastructure is provisioned for exclusive use by a single
organization comprising multiple consumers (for example, business
units). It may be owned, managed, and operated by the organization,
a third party, or some combination of them, and it may exist on or off
premises.” – NIST

• Many organizations may not want to adopt public clouds due to concerns
  related to privacy, external threats, and lack of control over the IT resources and
  data
• When compared to a public cloud, a private cloud offers organizations a
  greater degree of privacy and control over the cloud infrastructure,
  applications, and data
• There are two variants of private cloud: on-premise and externally hosted
  - An organization deploys an on-premise private cloud in its own data center,
    within its own premises

[Figure: 1. On-premise private cloud built on the resources of Enterprise P; 2. Externally hosted private cloud built on cloud provider resources dedicated for Enterprise P]

Notes

In the externally hosted private cloud (or off-premise private cloud) model:

• An organization outsources the implementation of the private cloud to an
  external cloud service provider
• The cloud infrastructure is hosted on the premises of the provider and may be
  shared by multiple tenants
  - However, the organization’s private cloud resources are securely separated
    from other cloud tenants by access policies implemented by the provider

Community Cloud

Definition: Community Cloud


“The cloud infrastructure is provisioned for exclusive use by a specific
community of consumers from organizations that have shared
concerns (for example, mission, security requirements, policy, and
compliance considerations). It may be owned, managed, and
operated by one or more of the organizations in the community, a
third party, or some combination of them, and it may exist on or off
premises.” – NIST

• The organizations participating in the community cloud typically share the cost
  of deploying the cloud and offering cloud services
  - This enables them to lower their individual investments
  - Since the costs are shared by fewer consumers than in a public cloud, this
    option may be more expensive
  - However, a community cloud may offer a higher level of control and protection
    than a public cloud
• There are two variants of a community cloud: on-premise and externally hosted

[Figure: Community cloud. Community users from Enterprise P, Enterprise Q, and Enterprise R share cloud provider resources dedicated for the community.]

Hybrid Cloud

Definition: Hybrid Cloud

“The cloud infrastructure is a composition of two or more distinct cloud
infrastructures (private, community, or public) that remain unique
entities, but are bound by standardized or proprietary technology that
enables data and application portability (for example, cloud bursting
for load balancing between clouds).” – NIST

[Figure: Hybrid cloud combining Enterprise P's private cloud with a public cloud that also serves Enterprise Q and Individual R]
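The cloud bursting mentioned in the NIST definition can be sketched as a placement rule: run work on private capacity first, and send only the overflow to a public cloud. The `place_workload` function and the idea of measuring demand in abstract units are hypothetical. A real hybrid cloud also has to move data and handle security and networking between the two clouds.

```python
def place_workload(demand_units: int, private_capacity: int) -> dict[str, int]:
    """Split demand between the private cloud and a public cloud burst."""
    if demand_units < 0 or private_capacity < 0:
        raise ValueError("values must be non-negative")
    # Fill private capacity first.
    private = min(demand_units, private_capacity)
    # Any overflow bursts to the public cloud.
    burst = demand_units - private
    return {"private": private, "public": burst}
```

At normal load, everything stays private. During a peak of 130 units against 100 units of private capacity, 30 units burst to the public cloud.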

Evolution of Hybrid Cloud: Multicloud

• To create the best possible solution for their businesses, organizations today
  want to choose among different public cloud service providers
• To achieve this goal, some organizations have started adopting a multicloud
  approach

[Figure: Multicloud environment combining a private cloud with multiple public clouds]

Notes

The drivers for adopting this approach include avoiding vendor lock-in, data control,
cost savings, and performance optimization. This approach helps meet business
demands because no single cloud model may suit the varied requirements and
workloads across an organization. Some application workloads run better on one
cloud platform, while other workloads achieve higher performance and lower cost on
another platform.

Also, certain compliance, regulation, and governance policies require an
organization’s data to reside in particular locations. A multicloud strategy can help
organizations meet those requirements because different cloud models from
various cloud service providers can be selected. Each cloud vendor offers different
service options at different prices.

Organizations can also analyze the performance of their various application
workloads and compare them to what is available from other vendors. This method
helps to analyze both workload performance and cost for various services in each
cloud. Options can then be identified that meet the workload performance and cost
requirements of the organization.
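The comparison described above can be sketched as choosing, for each workload, the cheapest offering that still meets its performance requirement. The provider names, latency figures, and prices below are invented for illustration, and `choose_provider` is a hypothetical helper. Real evaluations would also weigh data-location rules, lock-in, and migration cost.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Offering:
    provider: str
    latency_ms: float    # measured or benchmarked response time
    monthly_cost: float  # estimated cost for this workload


def choose_provider(offerings: list[Offering],
                    max_latency_ms: float) -> Optional[Offering]:
    """Return the cheapest offering that meets the latency requirement."""
    eligible = [o for o in offerings if o.latency_ms <= max_latency_ms]
    if not eligible:
        return None
    return min(eligible, key=lambda o: o.monthly_cost)


# Hypothetical benchmark results for one workload.
OFFERINGS = [
    Offering("cloud-a", latency_ms=20, monthly_cost=1200),
    Offering("cloud-b", latency_ms=35, monthly_cost=800),
    Offering("cloud-c", latency_ms=15, monthly_cost=1500),
]
```

A workload that tolerates 40 ms lands on the cheapest provider, "cloud-b". A stricter 25 ms requirement shifts it to "cloud-a", and no offering satisfies a 10 ms limit.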

Cloud Computing Use Cases

Use Case                             Description
-----------------------------------  -----------
Cloud bursting                       Provisioning resources for a limited time from a public cloud to handle peak workloads
Web application hosting              Hosting less critical applications on the public cloud
Migrating packaged applications      Migrating standard packaged applications such as e-mail to the public cloud
Application development and testing  Developing and testing applications in the public cloud before launching them
Big Data analytics                   Using the cloud to analyze voluminous data to gain insights and derive business value
Disaster recovery                    Adopting the cloud for a disaster recovery (DR) solution can provide cost benefits, scalability, and faster recovery of data
Internet of Things                   IoT in the cloud provides infrastructure for enhanced network connectivity, storage space, and tools for data analysis

Big Data Analytics Lesson

Introduction

This lesson presents an overview of Big Data along with its characteristics, data
repositories, components of a Big Data analytics solution, and use cases.

This lesson covers the following topics:

• Big Data and its key characteristics
• Data repositories
• Components of a Big Data analytics solution
• Use cases of Big Data

Big Data Analytics

Big Data: An Overview

Definition: Big Data

Information assets whose high volume, high velocity, and high variety
require the use of new technical architectures and analytical methods
to gain insights and derive business value.

[Figure: Big Data: characteristics of data, data processing needs, and business value]

Big Data:
• Represents the information assets whose high volume, high velocity, and high
  variety require the use of new technical architectures and analytical methods to
  gain insights and derive business value.
• Many organizations, such as government departments, retailers,
  telecommunications providers, healthcare organizations, social networks, banks,
  and insurance companies, employ data science techniques to benefit from Big Data
  analytics.

The definition of Big Data has three principal aspects:

Characteristics of Data

Big Data includes data sets of considerable sizes containing both structured and
non-structured digital data. Apart from its size, the data is generated and
changes rapidly, and it comes from diverse sources. These and other
characteristics are covered next.

Data Processing Needs

Big Data also exceeds the storage and processing capability of conventional IT
infrastructure and software systems. It not only needs a highly scalable architecture
for efficient storage, but also requires new and innovative technologies and
methods for processing.

These technologies typically make use of approaches such as distributed processing,
massively parallel processing, and machine learning. The emerging discipline of
Data Science represents the synthesis of several existing disciplines, such as
statistics, mathematics, data visualization, and computer science, for Big Data
analytics.

Business Value

Big Data analytics has tremendous business importance to organizations.
Searching, aggregating, and cross-referencing large data sets in real time or near
real time enables organizations to gain valuable insights from the data. This enables
better data-driven decision-making.

Characteristics of Big Data

Apart from volume, velocity, and variety, popularly known as “the 3 Vs,” Big Data has three other characteristics: variability, veracity, and value.

Volume: Massive volumes of data; challenges in storage and analysis

Velocity: Rapidly changing data; challenges in real-time analysis

Variety: Diverse data from numerous sources; challenges in integration and analysis

Variability: Constantly changing meaning of data; challenges in gathering and interpretation

Veracity: Varying quality and reliability of data; challenges in transforming and trusting data

Value: Cost-effectiveness and business value

Notes

• Volume: The word “Big” in Big Data refers to the massive volumes of data. Organizations are witnessing ever-increasing growth in data of all types, including transaction-based data stored over the years, sensor data, and unstructured data streaming in from social media. The volume of data has already reached petabyte and exabyte scales, and it is still growing every day. This volume not only requires substantial, cost-effective storage, but also raises challenges in data analysis.
• Velocity: Velocity refers to the rate at which data is produced and changes, and also how fast the data must be processed to meet business requirements. Today, data is generated at exceptional speed, and real-time or near-real-time analysis of the data is a challenge for many organizations. It is essential to process and analyze the data, and to deliver the results, in a timely manner. An example of such a requirement is real-time face recognition for screening passengers at airports.
• Variety: Variety refers to the diversity in the formats and types of data. Numerous sources generate data in various structured and unstructured forms. Organizations face the challenge of managing, merging, and analyzing the different varieties of data in a cost-effective manner. Combining data from a variety of sources and in a variety of formats is a key requirement in Big Data analytics. An example of such a requirement is an autonomous vehicle that must handle data from various sources and in various formats to operate safely.
• Variability: Variability refers to the constantly changing meaning of data. It highlights the importance of deriving the right information in all possible contexts. For example, analysis of natural language search queries and social media posts requires interpretation of complex and highly variable grammar. The inconsistency in the meaning of data creates challenges in gathering the data and in interpreting its context.
• Veracity: Veracity refers to the reliability and verifiability of the data. The quality of the data being gathered can differ greatly, and the accuracy of analysis depends on the veracity of the source data. Establishing trust in Big Data is a major challenge because, as the variety and number of sources grow, the likelihood of noise and errors in the data increases. Therefore, significant effort may go into cleaning data to remove noise and errors and to produce accurate datasets before analysis can begin. For example, a retail organization may have gathered customer behavior data from across systems to analyze product purchase patterns and to predict purchase intent. The organization would have to clean and transform the data to make it consistent and reliable.
• Value: Value refers to the cost-effectiveness of the Big Data analytics technology that is used and the business value that is derived from it. Many large enterprises have maintained large data repositories, such as data warehouses, managed unstructured data, and carried out real-time data analytics for many years. With hardware and software becoming more affordable and more providers emerging, Big Data analytics technologies are now available to a broader market. Organizations are also gaining the benefits of business process enhancements, increased revenues, and better decision making.
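
The data-cleaning step described under veracity can be sketched in Python. This is a minimal illustration, not a production pipeline. The record fields (`customer_id`, `product`, `amount`), the sample values, and the cleaning rules are all hypothetical. The rules are: trim and normalize case, drop missing or non-positive amounts, and remove duplicates.

```python
# Hypothetical customer-behavior records gathered from different systems.
raw_records = [
    {"customer_id": " C001 ", "product": "Laptop", "amount": "1200.00"},
    {"customer_id": "c001", "product": "laptop", "amount": "1200"},
    {"customer_id": "C002", "product": "Phone", "amount": None},
    {"customer_id": "C003", "product": "TABLET", "amount": "450.5"},
    {"customer_id": "C003", "product": "Tablet", "amount": "-5"},
]


def clean(records):
    """Normalize fields, drop invalid rows, and remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        if rec["amount"] is None:
            continue  # missing value: unusable for purchase analysis
        amount = float(rec["amount"])
        if amount <= 0:
            continue  # implausible value treated as noise
        row = (
            rec["customer_id"].strip().upper(),  # consistent ID format
            rec["product"].strip().lower(),      # consistent product names
            round(amount, 2),
        )
        if row in seen:
            continue  # same purchase reported by another system
        seen.add(row)
        cleaned.append(row)
    return cleaned


print(clean(raw_records))  # [('C001', 'laptop', 1200.0), ('C003', 'tablet', 450.5)]
```

Real cleaning rules depend on the sources involved. The point is that consistency and reliability have to be established before analysis starts.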

Data Repositories

Data Warehouse
• Central repository of data gathered from different sources
• Stores current and historical data in a structured format
• Designed for query and analysis

Data Lake
• Collection of structured and unstructured data assets
• Uses a 'store everything' approach to Big Data
• Presents an unrefined view of data

Notes

Data for analytics typically comes from repositories such as enterprise data
warehouses and data lakes.

A data warehouse is a central repository of integrated data that is gathered from multiple sources. It stores current and historical data in a structured format and is designed for query and analysis to support the decision-making process of an organization. For example, a data warehouse may contain current and historical sales data that is used to generate trend reports for sales comparisons.

A data lake is a collection of structured and unstructured data assets that are stored as exact or near-exact copies of the source formats. The data lake architecture is a “store everything” approach to Big Data. Unlike in conventional data warehouses, data is not classified when it is stored in the repository, because the value of the data may not be clear at the outset. The data is also not arranged according to a specific schema and is stored using an object-based storage architecture. As a result, upfront data preparation is eliminated, and a data lake is less structured than a data warehouse. Data is classified, organized, or analyzed only when it is accessed. When a business need arises, the data lake is queried, and the resulting subset of data is then analyzed to provide a solution. The purpose of a data lake is to present an unrefined view of data to highly skilled analysts and to enable them to implement their own data refinement and analysis techniques.
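
The difference between the two repositories is often described as schema-on-write versus schema-on-read. A warehouse applies its structure when data is loaded, while a lake applies structure only when data is queried. The Python sketch below illustrates only the data lake side, under stated assumptions: a plain list of JSON strings stands in for the object store, and the record fields (`type`, `store`, `amount`) are hypothetical.

```python
import json

# Raw objects kept exactly as they arrived; no schema is enforced on write.
lake = [
    '{"type": "sale", "store": "north", "amount": 120}',
    '{"type": "click", "page": "/home"}',
    '{"type": "sale", "store": "south", "amount": 80}',
    '{"type": "sale", "store": "north", "amount": 45}',
]


def sales_by_store(raw_objects):
    """Apply a schema only at read time: keep sales, total them per store."""
    totals = {}
    for obj in raw_objects:
        record = json.loads(obj)
        if record.get("type") != "sale":
            continue  # records that don't fit this query's schema are skipped
        store = record["store"]
        totals[store] = totals.get(store, 0) + record["amount"]
    return totals


print(sales_by_store(lake))  # {'north': 165, 'south': 80}
```

The click record is kept in the lake even though this query ignores it, because a later analysis might find it valuable.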

Components of a Big Data Analytics Solution

• The technology layers in a Big Data analytics solution include storage, MapReduce, and query technologies
• These components are collectively called the ‘SMAQ stack’
• SMAQ solutions may be implemented as a combination of multi-component systems
• They may also be offered as a self-contained product comprising storage, MapReduce, and query, all in one

Storage
• Foundational layer of the stack
• Distributed architecture

MapReduce
• Enables distribution of computation
• Uses multiple compute systems for parallel processing

Query
• Implements NoSQL database
• Provides platform for analytics and reporting

Notes

The technology layers in a Big Data analytics solution include storage, MapReduce
technologies, and query technologies. These components are collectively called
the ‘SMAQ stack’.

• Storage: It is the foundational layer of the stack. It has a distributed architecture and holds primarily unstructured content in non-relational form.
• MapReduce: It is an intermediate layer in the stack. It enables the distribution of computation across multiple generic compute systems for parallel processing to gain speed and cost advantages. It also supports a batch-oriented processing model of data retrieval and computation, as opposed to the record-set orientation of most SQL-based databases.
• Query: This layer typically implements a NoSQL database for storing, retrieving, and processing data. It also provides a user-friendly platform for analytics and reporting.

Storage

• Storage systems consist of multiple nodes that are collectively called a “cluster”
• Based on distributed file systems
• Each node has processing capability and storage capacity
• Highly scalable architecture
• You may implement a NoSQL database on top of the distributed file system

Notes

A storage system in the SMAQ stack is based on either a proprietary or an open-source distributed file system, such as the Hadoop Distributed File System (HDFS). The storage system may also support multiple file systems for client access. The storage system consists of multiple nodes, collectively called a “cluster,” and the file system is distributed across all the nodes in the cluster. Each node in the cluster has processing capability and storage capacity. The system has a highly scalable architecture, and you can add extra nodes dynamically to meet workload and capacity needs.
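
The way a distributed file system spreads a file over a cluster can be sketched as follows. This is a simplified model, not how HDFS actually places data. The tiny block size, the replication factor of 2, and the round-robin placement are assumptions chosen for readability. HDFS uses much larger blocks and a rack-aware placement policy.

```python
BLOCK_SIZE = 4   # bytes per block; tiny so the output is easy to read (assumption)
REPLICATION = 2  # copies kept of each block (assumption)


def place_blocks(data: bytes, nodes: list[str]) -> dict[int, tuple[bytes, list[str]]]:
    """Split data into fixed-size blocks and assign each block's replicas to nodes."""
    if REPLICATION > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    return {
        # Rotate through the cluster so the replicas of a block land on different nodes.
        block_id: (block, [nodes[(block_id + r) % len(nodes)] for r in range(REPLICATION)])
        for block_id, block in enumerate(blocks)
    }


cluster = ["node1", "node2", "node3"]
for block_id, (block, replicas) in place_blocks(b"big data example", cluster).items():
    print(block_id, block, replicas)
# 0 b'big ' ['node1', 'node2']
# 1 b'data' ['node2', 'node3']
# 2 b' exa' ['node3', 'node1']
# 3 b'mple' ['node1', 'node2']
```

Appending another node to `cluster` spreads newly written blocks over more machines. A real system would also rebalance the blocks that already exist.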

A distributed file system such as HDFS typically provides only an interface similar to that of a regular file system. Unlike a database, it can only store and retrieve data; it does not index the data, and indexing is essential for fast data retrieval. To mitigate this challenge and gain the advantages of a database system, SMAQ solutions may implement a NoSQL database on top of the distributed file system. NoSQL databases may have built-in MapReduce features that enable processing to be parallelized over their data stores. In many applications, the primary source of data is a relational database. Therefore, SMAQ solutions may also support interfacing MapReduce with relational database systems.
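
A small Python sketch shows why an index matters for retrieval. It assumes a hypothetical flat file of comma-separated `key,value` records, with an in-memory `io.BytesIO` standing in for a file in the distributed file system. The sketch compares a sequential scan with a lookup through a key-to-byte-offset index. That index is the kind of structure a database layer maintains and a plain file system does not.

```python
import io

# Hypothetical flat file of "key,value" records, one per line.
FLAT_FILE = b"C001,Alice\nC002,Bob\nC003,Carol\n"


def scan(f, key):
    """Without an index: read records one by one until the key matches."""
    f.seek(0)
    for line in f:
        k, value = line.decode().rstrip("\n").split(",", 1)
        if k == key:
            return value
    return None


def build_index(f):
    """Map each key to the byte offset where its record starts."""
    index, offset = {}, 0
    f.seek(0)
    for line in f:
        index[line.split(b",", 1)[0].decode()] = offset
        offset += len(line)
    return index


def lookup(f, index, key):
    """With an index: jump straight to the record's offset and read one line."""
    if key not in index:
        return None
    f.seek(index[key])
    return f.readline().decode().rstrip("\n").split(",", 1)[1]


f = io.BytesIO(FLAT_FILE)
idx = build_index(f)
print(scan(f, "C003"), lookup(f, idx, "C003"))  # Carol Carol
```

Both calls return the same record. The scan's cost grows with the file size, while the indexed lookup reads a single record.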

MapReduce fetches datasets from storage and stores the results of the computation back in storage. The data must be available in a distributed fashion to serve each processing node. The design and features of the storage layer are important not just because of the interface with MapReduce, but also because they affect the ease with which data can be loaded and the results of computation extracted and searched.

MapReduce

• MapReduce is the driving force behind most Big Data processing solutions
• A parallel programming framework for processing large datasets on a compute cluster
• The key innovation of MapReduce is the ability to take a query over a dataset, divide it, and run it in parallel over multiple compute systems or nodes
– This distribution solves the issue of processing data that is too large for a single machine to process

Notes

MapReduce is the driving force behind most Big Data processing solutions. It is a
parallel programming framework for processing large datasets on a compute
cluster. The key innovation of MapReduce is the ability to take a query over a
dataset, divide it, and run it in parallel over multiple compute systems or nodes.
This distribution solves the issue of processing data that is too large for a single
machine to process.

As the name suggests, MapReduce works in two phases: Map and Reduce. An input dataset is split into independent chunks, which are distributed to multiple compute systems. The Map function processes the chunks in parallel and transforms them into multiple smaller intermediate datasets. The Reduce function condenses the intermediate results into a summarized dataset, which is the desired end result. Typically, both the input and the output datasets are stored on a file system. The MapReduce framework is highly scalable and supports the addition of processing nodes to process chunks. Apache Hadoop MapReduce is the predominant open-source, Java-based implementation of MapReduce.

The illustration depicts a generic representation of how MapReduce works, and it can be used to represent various examples. A classic example of MapReduce is the task of counting the occurrences of each unique word in a large body of data that includes millions of documents. In the Map phase, each word is identified and given a count of 1. In the Reduce phase, the counts are added for each word.
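
The word-count example can be sketched in Python. This is a single-process illustration, not Hadoop code. In a real cluster, each chunk would run on a different node. The grouping step between Map and Reduce, which Hadoop calls the shuffle, is written out explicitly here as the hypothetical `shuffle` function.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk: str) -> list[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one chunk of the input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Group the intermediate (word, 1) pairs by word."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce: add up the counts for one word."""
    return word, sum(counts)


# Each chunk stands for a split that a separate node would map in parallel.
chunks = ["the cat sat", "the dog sat", "The end"]
intermediate = chain.from_iterable(map_phase(c) for c in chunks)
word_counts = dict(reduce_phase(w, c) for w, c in shuffle(intermediate).items())
print(word_counts)  # {'the': 3, 'cat': 1, 'sat': 2, 'dog': 1, 'end': 1}
```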

Another example is the task of grouping customer records within a dataset into multiple age groups, such as 20-30, 30-40, 40-50, and so on. In the Map phase, the records are split and processed in parallel to generate intermediate groups of records. In the Reduce phase, the intermediate datasets are summarized to obtain the distinct groups of customer records (depicted as colored groups in the illustration).
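
The age-grouping example can be sketched in Python as a Map step per chunk and a Reduce step that merges the results. The customer records are invented, and the input is split into two chunks to stand in for parallel Map tasks. Ages are grouped into ten-year bands labeled 20-29, 30-39, and so on, so that each age falls into exactly one group. This banding is an assumption, because the ranges in the text share their boundary values.

```python
from collections import defaultdict

# Hypothetical customer records: (name, age)
customers = [("Ann", 24), ("Raj", 37), ("Li", 45), ("Sam", 29), ("Eva", 41)]


def age_band(age: int) -> str:
    """Label a ten-year band, for example 24 -> '20-29'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def map_chunk(chunk):
    """Map: turn one chunk of records into intermediate groups of names."""
    partial = defaultdict(list)
    for name, age in chunk:
        partial[age_band(age)].append(name)
    return partial


def reduce_groups(partials):
    """Reduce: merge the intermediate groups into the final, distinct groups."""
    final = defaultdict(list)
    for partial in partials:
        for band, names in partial.items():
            final[band].extend(names)
    return dict(sorted(final.items()))


# Two chunks stand in for splits that separate nodes would map in parallel.
chunks = [customers[:3], customers[3:]]
groups = reduce_groups(map_chunk(c) for c in chunks)
print(groups)  # {'20-29': ['Ann', 'Sam'], '30-39': ['Raj'], '40-49': ['Li', 'Eva']}
```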

MapReduce Example

• A classic example of MapReduce is the task of counting the occurrences of each unique word in a large body of data that includes millions of documents
– In the Map phase, each word is identified and given a count of 1
– In the Reduce phase, the counts are added for each word
• Another example is the task of grouping customer records within a dataset into multiple age groups, such as 20-30, 30-40, 40-50, and so on
– In the Map phase, the records are split and processed in parallel to generate intermediate groups of records
– In the Reduce phase, the intermediate datasets are summarized to obtain the distinct groups of customer records
• The illustration depicts a generic representation of how MapReduce works; it can be used to represent various examples

[Figure: Input Data flows through the Map Phase and the Reduce Phase to produce Output Data]

Query

• Simplifies the specification of MapReduce operations, and the retrieval and analysis of the results
• It is non-intuitive and inconvenient to specify MapReduce jobs in terms of distinct Map and Reduce functions in a programming language
• SMAQ systems help mitigate this challenge by incorporating a higher-level query layer to simplify both the:
o Specification of the MapReduce operations
o Analysis of the results
• The query layer implements high-level languages that enable users to describe, run, and monitor MapReduce jobs
o These languages are designed to handle not only the processing, but also the loading and saving of data from and to the MapReduce cluster
o These languages typically support integration with NoSQL databases that you implement on the MapReduce cluster
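
The idea of a query layer can be illustrated with a small Python sketch. The caller names a grouping column and an aggregate, and the Map and Reduce functions are generated automatically. Real query layers use their own languages; Apache Hive's HiveQL and Apache Pig's Pig Latin are well-known examples. The `query` and `run_mapreduce` functions here are hypothetical and do not model either of them.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Minimal engine: map every record, group values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


AGGREGATES = {"count": len, "sum": sum, "max": max}


def query(records, group_by, aggregate, column=None):
    """Roughly 'SELECT group_by, aggregate(column) ... GROUP BY group_by'.
    The user never writes the Map or Reduce function directly."""
    def mapper(record):
        # Emit the grouping key with the column value, or a 1 when counting rows.
        yield record[group_by], (record[column] if column else 1)

    return run_mapreduce(records, mapper, AGGREGATES[aggregate])


sales = [
    {"region": "east", "amount": 100},
    {"region": "west", "amount": 250},
    {"region": "east", "amount": 50},
]
print(query(sales, "region", "count"))          # {'east': 2, 'west': 1}
print(query(sales, "region", "sum", "amount"))  # {'east': 150, 'west': 250}
```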

Big Data Use Cases

Healthcare: Provides consolidated diagnostic information and improves patient care

Finance: Effective sales promotion and fraud detection

Retail and eCommerce: Understand customer buying patterns and anticipate future demand

Government: Improves efficiency and effectiveness across various domains

Social Network Analysis: Discovery and analysis of communities, and personalization of solitary and social activities

Gaming: Improves gaming industry revenue and the gaming experience

Geolocation Services: Improves service and customer experience, and helps gain competitive advantage

Notes

• Healthcare: In healthcare, Big Data analytics solutions provide consolidated diagnostic information and enable healthcare providers to analyze patient data, improve patient care and outcomes, minimize errors, increase patient engagement, and improve operations and services. These solutions also enable healthcare providers to monitor patients and analyze their experiences in real time.
• Finance: In finance, organizations use Big Data analytics for activities such as correlating purchase history, profiling customers, and analyzing behavior on social networks. This also helps organizations control customer acquisition costs and target sales promotions more effectively. Big Data analytics is also used extensively to detect credit card fraud.
• eCommerce: eCommerce organizations use Big Data analytics to gain valuable insights from data. They use it to understand customer buying patterns and anticipate future demand, and also to run effective marketing campaigns, optimize inventory assortment, and improve distribution. This enables them to provide optimal prices and services to customers, and to improve operations and revenue.
• Government: In government organizations, Big Data analytics improves efficiency and effectiveness across a variety of domains, such as social services, education, defense, national security, crime prevention, transportation, tax compliance, and revenue management.
• Social Network Analysis: The increasing use of online social networking services has led to massive growth of data in the digital universe. Through Big Data analytics, organizations can gain valuable insights from the data that is generated through social networking. This analysis enables the discovery and analysis of communities, and personalization of solitary activities (for example, search) and social activities (for example, discovery of potential friends). It also involves the analysis of user behavior in open forums (for example, conventional sites, blogs, and communities) and on commercial platforms (for example, eCommerce).
• Gaming: Big Data plays a very important role in the gaming industry because billions of video game players around the world generate a massive amount of data through offline and online games. Many factors contribute to the rapid growth of data in the gaming industry, including which games gamers play and with whom, advertisements, and real-time information about the gamer. Gaming companies use Big Data technologies to improve their revenue and the gaming experience.
• Geolocation Services: Businesses in sectors such as finance, social media, retail, and transport use geolocation services in their applications to locate their customers. These services generate a huge amount of data, which requires Big Data algorithms to derive meaningful information. Businesses use this information to improve their service and customer experience, and to gain competitive advantage.

Internet of Things Lesson

Introduction

This lesson presents an overview of the Internet of Things (IoT), along with its components and the protocols it uses. It also focuses on the impact of IoT on the data center, and its use cases.

This lesson covers the following topics:

• Internet of Things and its components
• Use cases of Internet of Things

Internet of Things

Internet of Things: An Overview

• Concept of networking objects and people for real-time applications
• Allows real-life objects to independently share and process information
• Enables Machine to Machine communication to provide real-time results

Notes

In this rapidly transforming digital landscape, the speed of communication has become a critical metric for every organization that needs to access its information. The evolution of the Internet and the rise of Internet-connected devices provide new opportunities for smarter decision making, gaining a competitive edge, and improving the lives of customers. These devices, which range from laptops and mobile phones to irrigation systems and cars, generate digital data.

The Internet of Things (IoT) is the concept of networking things such as objects and people to collect and exchange data. The idea is that real-life objects can independently share and process information, without humans being involved in the data input stage.

IoT supports Machine to Machine (M2M) communication, enabling devices to communicate with each other to provide faster, more accurate, and timely data-driven results. The use of IoT requires organizations to store large volumes of data, and to process and analyze data in real time. It also requires a transformation in the data center to meet network, security, and data storage and management requirements.

Components of Internet of Things

• IoT implementation requires a proper understanding of its components

Sensors: Detect changes in the surrounding environment; produce and transmit digital data

Actuators: Collect data from sensors to perform the required action; in IoT, they help to automate operations

Gateways: Manage data traffic and translate network protocols; ensure that the devices are interoperable

IoT Example: Modern Irrigation System

• IoT devices are used to monitor the crop field and automate the irrigation system to increase the efficiency and productivity of the overall agricultural processes
• Soil moisture sensors detect the moisture levels in the soil and send the appropriate data to the actuator
• Based on the data, the actuator device controls the flow of water through the valves
• Because these devices generate a lot of data, gateways help to transfer this data to the cloud for storage
– Gateways communicate with sensors using various protocols and translate the data into a format that is appropriate for cloud transmission
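
The sensor-to-actuator loop in the irrigation example can be sketched as follows. The zone names, the readings, and the 30 percent moisture threshold are hypothetical values chosen for illustration. A real controller would read live sensor data and drive physical valves rather than return strings.

```python
MOISTURE_THRESHOLD = 30.0  # percent; hypothetical level below which a zone is irrigated


def valve_command(moisture_percent: float) -> str:
    """Actuator logic: open the valve when the soil is drier than the threshold."""
    return "OPEN" if moisture_percent < MOISTURE_THRESHOLD else "CLOSED"


def control_cycle(readings: dict[str, float]) -> dict[str, str]:
    """One control cycle: turn each zone's sensor reading into a valve command."""
    return {zone: valve_command(value) for zone, value in readings.items()}


# Readings as soil moisture sensors in three zones might report them.
readings = {"zone-1": 22.5, "zone-2": 41.0, "zone-3": 29.9}
print(control_cycle(readings))  # {'zone-1': 'OPEN', 'zone-2': 'CLOSED', 'zone-3': 'OPEN'}
```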

Notes

The main components include:

• Sensors: Smart devices that detect changes in their surrounding environment, and produce and transmit digital data. Sensors should be able to detect a wide range of physical phenomena, from temperature and pressure to motion and magnetic fields. Examples of sensors include thermostats, moisture sensors, accelerometers, and gas/smoke sensors. In IoT, different sensors are used for different applications to produce and transfer data for processing.
• Actuators: Devices that collect data from sensors and perform the required action. Actuators consume energy to produce a physical action, such as creating motion or controlling a system. Examples include electric motors, which use electric power to generate motion, and hydraulic actuators, which use fluid pressure to generate motion. In IoT, actuators help to automate operations by applying a force based on the dynamics of the data generated by sensors.
• Gateways: IoT involves billions of devices on various networks that connect for data communication. Gateways are devices that manage data traffic between networks by translating their network protocols. This process ensures that devices operating in various networks are interoperable. In IoT, these gateway devices can also be designed to analyze and secure the data that is collected from sensors before transmitting it to the next phase.
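
Protocol translation at a gateway can be illustrated with a small Python sketch. The frame layout is hypothetical: a 2-byte sensor ID followed by a 4-byte floating-point reading, both big-endian, the kind of compact format a low-power sensor might send. The gateway decodes the frame with the standard `struct` module and applies a simple range check as an example of analyzing data at the edge. It then re-encodes the reading as JSON for transmission to the cloud.

```python
import json
import struct

# Hypothetical sensor frame: unsigned 2-byte ID + 4-byte float, big-endian.
FRAME_FORMAT = ">Hf"


def sensor_encode(sensor_id: int, value: float) -> bytes:
    """What the sensor transmits: a compact binary frame."""
    return struct.pack(FRAME_FORMAT, sensor_id, value)


def gateway_translate(frame: bytes) -> str:
    """Decode the binary frame, validate it, and re-encode it as JSON for the cloud."""
    sensor_id, value = struct.unpack(FRAME_FORMAT, frame)
    if not 0.0 <= value <= 100.0:
        raise ValueError("moisture reading out of range")  # simple check at the edge
    return json.dumps({"sensor_id": sensor_id, "moisture_percent": round(value, 1)})


frame = sensor_encode(7, 22.5)
print(len(frame), gateway_translate(frame))  # 6 {"sensor_id": 7, "moisture_percent": 22.5}
```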

Internet of Things Use Cases

Home Automation: Allows home owners to monitor and control home appliances anytime, irrespective of location

Smart Cities: Highlights the need to enhance the quality of life of citizens using IoT

Wearables: Helps to collect data about users' health; also helps to detect and report crimes

Manufacturing Industries: Helps industries to identify optimization possibilities in their day-to-day operations

Notes

• Home Automation: IoT has entered the residential environment with the introduction of smart home technology. Various electronic objects at home, such as air conditioners, lights, refrigerators, security cameras, and kitchen stoves, can be connected to the Internet with the help of sensors. This allows home owners to efficiently monitor and control these objects anytime, irrespective of location.
• Smart Cities: The smart cities concept highlights the need to enhance the quality of life of citizens using smart public infrastructure. IoT sensors enable optimized power usage, efficient water supply, managed waste collection, and reliable public transportation. All this data is collected and sent to a control center, which directs the necessary actions. This application of IoT can also be extended to build a smarter environment through early detection of earthquakes, air pollution, and forest fires.
• Wearables: With wearables and embedded devices on people, IoT sensors can collect data about users' health, heartbeat, and exercise patterns. For example, embedded chips enable doctors to monitor patients who are in critical care by constantly tracking and charting all their vital signs. Wearables also have applications in detecting and reporting crimes in the city.
• Manufacturing Industries: IoT helps manufacturing industries identify optimization possibilities in their day-to-day operations. By applying IoT, these industries can not only monitor but also automate the complex tasks involved.

Machine Learning Lesson

Introduction

This lesson presents an overview of machine learning and the different types of machine learning algorithms. It focuses on the impact of machine learning on the data center, and its use cases.

This lesson covers the following topics:

• Overview of Machine Learning
• Types of Machine Learning Algorithms
• Impact of Machine Learning on the Data Center
• Use cases of Machine Learning

Machine Learning

Machine Learning (ML) Overview

• Automation can provide faster, better, and deeper data insights
• Intelligent machines are being built to automatically learn from data and to make decisions
– Intelligent machines help to process the data in real time

[Figure: nested concepts, with Deep Learning inside Machine Learning, inside Artificial Intelligence]

Notes

Artificial intelligence, machine learning, and deep learning are three intertwined concepts that help to build human-like abilities into computer systems. Artificial Intelligence (AI) is an umbrella term, while machine learning and deep learning are techniques that make AI possible. AI is the technology of creating intelligent systems that work and think like humans.

Machine learning refers to the process of ‘training’ the machine by feeding large amounts of data into algorithms that give it the ability to learn how to perform a task without being explicitly programmed. Instead of being given a program, the machine is provided with data. With the help of algorithms, machines learn from the data and complete a specific task. When the machine is provided with a new dataset, it adapts by learning from previous experience to produce reliable outputs.

Deep learning is a machine learning technique that uses neural networks as the underlying architecture for training models. Fast compute and storage with a lot of memory and high-bandwidth networking enable machines to learn faster and provide accurate results. A neural network is a set of algorithms that establishes relationships in a dataset by imitating the way a human brain works. A training model is an object that is provided with an algorithm along with a set of data from which it can learn.
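
The basic unit of a neural network is an artificial neuron that combines weighted inputs and learns its weights from examples. The Python sketch below trains a single neuron (a perceptron) on the logical AND function using the classic perceptron learning rule. It is a deliberately tiny illustration, not deep learning: deep learning stacks many layers of such units and trains them with gradient-based methods. The learning rate and epoch count here are arbitrary choices.

```python
# Training examples for logical AND: (inputs, expected output)
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def predict(weights, bias, inputs):
    """Fire (output 1) when the weighted sum of the inputs exceeds zero."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train(data, epochs=10, learning_rate=0.1):
    """Perceptron rule: nudge weights and bias in the direction of each error."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in data:
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


weights, bias = train(samples)
print([predict(weights, bias, inputs) for inputs, _ in samples])  # [0, 0, 0, 1]
```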

Algorithm Types

• A machine learning process involves creating mathematical and statistical algorithms that can accept input data and use some form of analysis to predict the output
• In this process, the first step is to collect the datasets for analysis
• Once the data is collected, select the type of algorithm to be used, then build a model
• Train the model with training datasets, evaluate it with test datasets, and improve the model accordingly for future decision making
• Most machine learning algorithms can be classified into the following three types:

Supervised Learning: Models are trained to predict future events; algorithms try to find patterns using a labeled dataset; inputs and outputs can be clearly defined

Unsupervised Learning: Models are left to discover the information; algorithms use unlabeled data and try to find similarities or differences; only input data is given, and output data is not available

Reinforcement Learning: Algorithms or models interact with their environment; they produce results based on a trial-and-error method and use rewards and errors as feedback to learn
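
The supervised learning workflow can be sketched with the bikes-and-cars example used in this lesson. The workflow is: collect labeled data, build a model, train it, then evaluate it on test data. The text does not name a specific algorithm, so this sketch assumes a 1-nearest-neighbor classifier, one of the simplest supervised methods. The two features (number of wheels and number of seats) and all data values are hypothetical.

```python
import math

# Hypothetical labeled training data: (wheels, seats) -> label
training_set = [
    ((2, 1), "bike"), ((2, 2), "bike"),
    ((4, 2), "car"), ((4, 4), "car"), ((4, 5), "car"),
]
# Labeled examples held back to evaluate the trained model
test_set = [((2, 1), "bike"), ((4, 7), "car")]


def predict(features):
    """1-nearest neighbor: return the label of the closest training example."""
    _, label = min(training_set, key=lambda example: math.dist(example[0], features))
    return label


accuracy = sum(predict(x) == y for x, y in test_set) / len(test_set)
print(predict((4, 3)), accuracy)  # car 1.0
```

Here "training" is simply remembering the labeled examples. Other supervised algorithms build a more compact model, but they follow the same train-then-evaluate pattern.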

Notes

• Supervised Learning: Models are trained to predict future events by learning from previous datasets. In supervised learning, you teach the models by training them on a labeled dataset. The algorithm then tries to find patterns in the data to predict events on new datasets. This labeled dataset includes attributes (properties) and observations (values). This type of learning is used when the inputs and outputs can be clearly identified. The learning algorithm can compare its predictions with the correct output; if it recognizes errors, it can correct itself and improve the model accordingly.
– Example: A dataset consists of bikes and cars. The machine is trained by providing the features of each: a bike has two wheels, and a car has four wheels. When the machine is then provided with a new dataset, it can identify either of them based on its previous experience.
• Unsupervised Learning: Models are left to discover the information or structure that is hidden in unlabeled data. This type of learning is used when only the input data is available and there is no output data. The model itself has to identify the output by grouping the unlabeled data by similarities or differences, without any prior training.
– Example: A machine is provided with an image containing dogs and cats, without being told which features belong to a dog and which belong to a cat. By comparing similarities and differences, the machine sorts them into two groups: one with all the dogs and one with all the cats.
• Reinforcement Learning: The learning algorithm interacts with its environment and produces results based on a trial-and-error method. The model continues to train itself with the help of reward and error feedback. The machine both learns from previous feedback and discovers new strategies to complete a specific task. By determining which actions result in greater rewards, it produces its output and maximizes its performance.
– Example: Reinforcement learning is similar to playing a video game. When a gamer completes a level, the gamer is rewarded, and if the gamer fails to complete it within a certain number of chances, the game is over. The gamer then has to devise new strategies based on previous game experience to improve performance.
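
Trial-and-error learning from rewards can be sketched with an epsilon-greedy agent, one of the simplest reinforcement learning strategies. The text does not prescribe a method, so this choice is an assumption. In this hypothetical game, the agent can pick one of three strategies, each of which wins a level with a hidden probability. The agent mostly replays the strategy with the best estimated reward but occasionally explores a random one.

```python
import random

# Hypothetical game: three strategies, each winning a level with a hidden probability.
WIN_PROBABILITY = [0.2, 0.5, 0.8]


def play(strategy: int, rng: random.Random) -> int:
    """Environment: reward 1 if the level is completed, otherwise 0."""
    return 1 if rng.random() < WIN_PROBABILITY[strategy] else 0


def train(episodes: int = 2000, epsilon: float = 0.1, seed: int = 42):
    """Epsilon-greedy agent: exploit the best-known strategy, sometimes explore."""
    rng = random.Random(seed)
    estimates = [0.0] * len(WIN_PROBABILITY)  # estimated reward per strategy
    plays = [0] * len(WIN_PROBABILITY)
    for _ in range(episodes):
        if rng.random() < epsilon:
            strategy = rng.randrange(len(estimates))  # explore: try any strategy
        else:
            strategy = max(range(len(estimates)), key=estimates.__getitem__)  # exploit
        reward = play(strategy, rng)
        plays[strategy] += 1
        # Move the estimate toward the observed reward (running average).
        estimates[strategy] += (reward - estimates[strategy]) / plays[strategy]
    return estimates, plays


estimates, plays = train()
print([round(e, 2) for e in estimates], plays)
```

Over many episodes, the estimates approach the hidden win probabilities, and most plays go to the strongest strategy. This mirrors the gamer who refines a strategy from previous experience.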

Impact on the Data Center

• Improves the efficiency of the data center and its management
• Helps identify security issues that would otherwise be challenging to detect
  using manual operations
• Requires sufficient storage capacity to manage and store datasets
• Requires high-end microprocessors and modern storage solutions

Notes

Artificial Intelligence and Machine Learning are providing new opportunities for
the data center, but they also create challenges if organizations are not
prepared to support these technologies in their infrastructure.

Machine learning helps make the data center and its management more efficient
by reducing energy usage, maximizing resource utilization, automating
operations, and preventing downtime.

Machine learning algorithms can be applied to data logs collected from
infrastructure resources to identify problems or security issues that would
otherwise be challenging to detect using manual operations. As this operational
log data grows into a large dataset for machine learning systems, sufficient
storage capacity is required to manage and store it efficiently.
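As a minimal sketch of applying an algorithm to infrastructure logs, the Python below flags intervals whose values deviate sharply from the rest of the data. It swaps in a simple statistical baseline (a z-score threshold) in place of a trained machine learning model, and it assumes the logs have already been parsed into per-interval counts of failed logins; the data, the threshold of 2.0, and the function name are hypothetical.

```python
from statistics import mean, pstdev


def flag_anomalies(counts, threshold=2.0):
    """Return indices whose value lies more than `threshold` standard deviations from the mean."""
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]


# Hypothetical failed-login counts per 5-minute interval, parsed from device logs
failed_logins = [3, 4, 2, 5, 3, 4, 3, 48, 4, 2]
print(flag_anomalies(failed_logins))  # [7]
```

A single large outlier also inflates the standard deviation, so real systems typically use more robust statistics or learned models; the sketch only shows the shape of the task: collect log-derived metrics, then let an algorithm, rather than an operator, pick out the unusual interval.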

Machine learning applications require high-end microprocessors for faster data
processing and modern storage solutions that can keep up with the processing
speed. Organizations can consider hybrid cloud storage options to reduce the
data center footprint, balance load, and improve cost-effectiveness.

Machine Learning Use Cases

Use Case                 Description

Energy                   The energy industry generates large amounts of data that
                         are processed using machine learning solutions to
                         increase productivity. Machine learning helps use energy
                         storage efficiently by tracking usage, handles different
                         types of energy sources using autonomous grids, and helps
                         predict component failures and consumption demand.

Media and Entertainment  Content from the media and entertainment industry can be
                         automatically tagged with metadata by applying machine
                         learning solutions. This enhances content-based search by
                         finding the right content quickly, and helps content
                         developers tailor content to specific audiences based on
                         their search data. Machine learning also plays an
                         important role in creating video subtitles using natural
                         language processing.

Sports                   Machine learning can be applied to sports to predict game
                         results, help coaches gain insights into players'
                         performance, and organize games with appropriate
                         strategies by analyzing performance and game data.

Financial Services       Banks and other businesses use machine learning to detect
                         and prevent fraudulent activity on credit cards and bank
                         accounts. It also helps traders identify investment
                         opportunities by monitoring market changes, and it is used
                         to provide risk management solutions, such as predicting
                         financial crises, assessing customers' loan repayment
                         capability, and securing financial data.

Concepts in Practice Lesson

Introduction

This section highlights technologies that are relevant to the topics covered in this
module.

This lesson covers the following topics:

• Dell EMC Cloud for Microsoft Azure
• Dell EMC Ready Solution for Artificial Intelligence
• Dell Edge Gateway
• VMware Cloud on AWS
• Pivotal Cloud Foundry

Concepts in Practice

Dell EMC Cloud for Microsoft Azure

Delivers Infrastructure as a Service and Platform as a Service with a consistent
Azure experience on-premises and in the public cloud. This platform is built on
the VxRack AS hyper-converged architecture, whose modular building blocks are
called nodes, and is powered by Microsoft Windows software-defined storage and
networking capabilities. It is managed using the Microsoft Azure Stack interface.
Cloud for Microsoft Azure Stack provides a simple, cost-effective solution that
offers multiple performance and capacity options to match a range of use cases,
and it supports a wide variety of cloud-native applications and workloads.

Pivotal Cloud Foundry

An enterprise Platform as a Service solution built on the foundation of the Cloud
Foundry open-source PaaS project. Pivotal CF, powered by Cloud Foundry,
streamlines application development, deployment, and operations in both private
and public clouds. It supports multiple programming languages and frameworks,
and it lets developers deploy their applications without configuring and
managing the underlying cloud infrastructure. It provides zero-downtime stack
updates while migrating applications to a new stack. Developers can also use the
security controls offered by PCF.

Dell Edge Gateway

An intelligent device that is designed to aggregate, secure, analyze, and relay data
from diverse sensors and equipment at the edge of the network. These gateways
bridge both legacy systems and modern sensors to the internet, helping
organizations gain business insights from the real-time, pervasive data in their
machines and equipment. It is compact, consumes little power, and is suitable for
challenging field and mobile use cases. It is designed for flexible manageability
using Dell Edge Device Manager or a third-party on-premises console.

Dell EMC Ready Solution for Artificial Intelligence

These solutions shorten deployment time from months to days. They include
software that streamlines the setup of data science environments to just a few
clicks, boosting data scientist productivity. These solutions are optimized with
software, servers, networking, storage, and services to help organizations get
faster and deeper insights. These solutions include:

• Dell EMC Machine Learning with Hadoop: Builds on the power of tested and
  proven Dell EMC Ready Bundles for Hadoop, created in partnership with
  Cloudera®. This solution includes an optimized solution stack along with data
  science and framework optimization. It consists of Cloudera Data Science
  Workbench with the added ease of a Dell EMC Data Science Engine.
• Dell EMC Deep Learning with Intel: Simplifies and accelerates the adoption of
  deep learning technology with an optimized solution stack that simplifies the
  entire workflow, from model building to training to inferencing. It consists
  of PowerEdge C servers and Dell EMC H-series networking based on Intel
  Omni-Path technology.
• Dell EMC Deep Learning with NVIDIA: Provides a GPU-optimized solution stack
  that can shave valuable time from deep learning projects. It consists of
  PowerEdge servers with NVIDIA GPUs and Isilon scale-out NAS storage.

VMware Cloud on AWS

Extends the VMware Software-Defined Data Center (SDDC) software onto the AWS
cloud. This SDDC software consists of several products, including vCenter Server
for data center management, vSAN for software-defined storage, and NSX for
software-defined networking. It enables customers to run their VMware
vSphere-based applications across private, public, and hybrid cloud
environments with optimized access to AWS services, allowing virtual machines in
the SDDC to access AWS EC2 and S3 services. This solution supports workload
migration and lets customers use the global presence of AWS data centers with
flexible management.

Assessment

1. What is a machine learning technique that uses neural networks as the
underlying architecture for training models?

A. Deep learning

B. Big data analytics

C. Edge computing

D. Internet of Things

2. Identify the cloud computing characteristic that controls and optimizes resource
use by leveraging a metering capability.

A. Measured services

B. On-demand self-service

C. Resource pooling

D. Rapid elasticity

Summary

Modern Data Center Environment

Introduction

This module focuses on the compute system, its components, and its types. It
also covers compute virtualization and application virtualization, provides an
overview of storage and connectivity in a data center, and concludes with an
overview of the software-defined data center.

Upon completing this module, you will be able to:

• Describe a compute system, its components, and its types
• Describe compute virtualization, desktop virtualization, and application
  virtualization
• Provide an overview of storage and connectivity in a data center
• Provide an overview of the software-defined data center

Compute System Lesson

Introduction

This lesson covers the compute system and its key physical and logical
components. It also covers the types of compute systems.

This lesson covers the following topics:

• Explain physical and logical components of a compute system
• Explain types of compute systems

Compute System

What is a Compute System?

• A computing platform (hardware and system software) that runs applications
• Physical components include processor, memory, internal storage, and I/O
  devices
• Logical components include OS, device drivers, file system, and logical
  volume manager

Compute System

Notes

A compute system is a computing device (a combination of hardware, firmware, and
system software) that runs business applications.

Examples of compute systems include physical servers, desktops, laptops, and
mobile devices. In the data center context, the term compute system refers to
the physical servers and hosts on which an organization's platform software,
management software, and business applications are deployed.

A compute system’s hardware consists of processor(s), memory, internal storage,
and I/O devices. The logical components of a compute system include the
operating system (OS), file system, logical volume manager, and device drivers.
The OS may include these other components, or they can be installed separately.

In an enterprise data center, applications are typically deployed on compute
clusters for high availability and for balancing computing workloads. A compute
cluster is a group of two or more compute systems that function together, share
certain network and storage resources, and are logically viewed as a single
system.

Types of Compute Systems

The compute systems used in building data centers are typically classified into
three categories: tower compute systems, rack-mounted compute systems, and blade
compute systems.

(Figure: tower, rack-mounted, and blade compute systems)

Tower

A tower compute system, also known as a tower server, is a compute system built
in an upright stand-alone enclosure called a “tower”, which looks similar to a
desktop cabinet. Tower servers have a robust build, and have integrated power
supply and cooling. They typically have individual monitors, keyboards, and mice.

Tower servers occupy significant floor space and require complex cabling when
deployed in a data center. They are also bulky, and a group of tower servers
generates considerable noise from their cooling units. Tower servers are typically
used in smaller environments. Deploying many tower servers in large environments
may involve substantial expenditure.

Rack-mounted

A rack-mounted compute system, also known as a rack server, is a compute
system designed to be fixed inside a frame called a “rack”. A rack is a standardized
enclosure containing multiple mounting slots called “bays”, each of which holds a
server in place with the help of screws. A single rack contains multiple servers
stacked vertically in bays, thereby simplifying network cabling, consolidating
network equipment, and reducing floor space usage. Each rack server has its own
power supply and cooling unit. Typically, a console is mounted on a rack to enable
administrators to manage all the servers in the rack.

Some concerns with rack servers are that they are cumbersome to work with and
generate a lot of heat, which requires more cooling and, in turn, increases
power costs. A “rack unit” (denoted by U or RU) is a unit of measure of
the height of a server designed to be mounted on a rack. One rack unit is 1.75
inches (44.45 mm). A 1 U rack server is typically 19 inches (482.6 mm) wide.

The standard rack cabinets are 19 inches wide and the common rack cabinet sizes
are 42U, 37U, and 27U. The rack cabinets are also used to house network,
storage, telecommunication, and other equipment modules. A rack cabinet may
also contain a combination of different types of equipment modules.
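Because one rack unit is 1.75 inches (44.45 mm), the mounting height a rack provides and the number of servers that fit in it are simple arithmetic; a 42U rack, for instance, offers 42 × 1.75 = 73.5 inches of mounting space. The Python sketch below assumes every slot is filled with equipment of the same height and ignores space reserved for consoles, switches, or power distribution units, as well as the extra external height of the cabinet itself; the function names are illustrative.

```python
INCHES_PER_U = 1.75  # one rack unit
MM_PER_U = 44.45


def rack_height(units):
    """Return the mounting height (inches, millimeters) provided by `units` rack units."""
    return units * INCHES_PER_U, units * MM_PER_U


def servers_per_rack(rack_units, server_units):
    """Number of whole servers of height `server_units` that fit in `rack_units`."""
    return rack_units // server_units


for size in (42, 37, 27):  # common rack cabinet sizes
    inches, mm = rack_height(size)
    print(f"{size}U: {inches:.2f} in ({mm:.2f} mm) of mounting space")

print(servers_per_rack(42, 1))  # 42
print(servers_per_rack(42, 2))  # 21
```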

Blade

A blade compute system, also known as a blade server, is an electronic circuit
board containing only core processing components, such as processor(s), memory,
integrated network controllers, storage drive, and essential I/O cards and ports.
Each blade server is a self-contained compute system and is typically dedicated to
a single application.

A blade server is housed in a slot inside a blade enclosure (or chassis), which
holds multiple blades and provides integrated power supply, cooling, networking,
and management functions. The blade enclosure enables interconnection of the
blades through a high-speed bus and also provides connectivity to external storage
systems.

The modular design of the blade servers makes them smaller, which minimizes the
floor space requirements, increases the compute system density and scalability,
and provides better energy efficiency as compared to the tower and the rack
servers. It also reduces the complexity of the compute infrastructure and simplifies
compute infrastructure management. It provides these benefits without
compromising on any capability that a non-blade compute system provides.

Some concerns with blade servers include the high cost of a blade system (blade
servers and chassis), and the proprietary architecture of most blade systems due to
which a blade server can typically be plugged only into a chassis from the same
vendor.

Physical Components of a Compute System

Component                Description

Processor                An IC that executes software programs by performing
                         arithmetical, logical, and input/output operations

Random-Access Memory     Volatile data storage that contains the programs for
                         execution and the data that are used by the processor

Read-Only Memory         Semiconductor memory containing boot, power management,
                         and other device-specific firmware

Motherboard              A PCB that holds the processor, RAM, ROM, network and I/O
                         ports, and other integrated components, such as GPU and
                         NIC

Chipset                  A collection of microchips on a motherboard to manage
                         specific functions, such as processor access to RAM and
                         to peripheral ports

Secondary Storage        A persistent storage device, such as an HDD or SSD

Notes

Key components are:

• Processor: A processor, also known as a Central Processing Unit (CPU), is an
  integrated circuit (IC). This processor executes the instructions of a software
  program by performing fundamental arithmetical, logical, and input/output
  operations. A common processor/instruction set architecture is the x86
  architecture with 32-bit and 64-bit processing capabilities. Modern processors
  have multiple cores (independent processing units), each capable of functioning
  as an individual processor. A socket is a single package that can hold one or
  more processor cores, with one or more logical processors in each core. A
  dual-core processor, for example, can provide almost double the performance of
  a single-core processor for workloads that can run in parallel, because its two
  cores can execute instructions at the same time.
• Random-Access Memory (RAM): The RAM, or main memory, is an IC that serves as
  volatile data storage internal to a compute system. The RAM is directly
  accessible by the processor and holds the software programs for execution and
  the data that are used by the processor.
• Read-Only Memory (ROM): A ROM is a type of non-volatile semiconductor memory
  from which data can only be read but not written to. It contains the boot
  firmware (that enables a compute system to start), power management firmware,
  and other device-specific firmware.
• Motherboard: A motherboard is a printed circuit board (PCB) to which all
  compute system components connect. It has sockets to hold components such as
  the microprocessor chip, RAM, and ROM. It also has network ports, I/O ports to
  connect devices such as keyboards, mice, and printers, and essential circuitry
  to carry out computing operations. A motherboard may also have integrated
  components, such as a graphics processing unit (GPU), a network interface card
  (NIC), and adapters to connect to external storage devices.
• Chipset: A chipset is a collection of microchips on a motherboard that is
  designed to perform specific functions. The two key chipset components are the
  Northbridge and the Southbridge. The Northbridge manages processor access to
  the RAM and the GPU, while the Southbridge connects the processor to different
  peripheral ports, such as USB ports.
• Secondary storage: Secondary storage is a persistent storage device, such as a
  hard disk drive or a solid-state drive, on which the OS and application
  software are installed. The processor cannot directly access secondary
  storage; the desired applications and data are loaded from secondary storage
  into RAM so that the processor can access them.

Logical Components of a Compute System

The key logical components of a compute system are:

• Operating system
• Virtual memory
• Logical volume manager
• File system

Logical Components: Operating System

An operating system (OS) is software that acts as an intermediary between a user
of a compute system and the compute system hardware.

The OS manages hardware functions and application execution, and provides a user
interface (UI) for users to operate and use the compute system.

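(Figure: generic OS architecture. User applications run on top of the operating
system, which provides a user interface (GUI and command line), system calls
(APIs), and services for program execution, memory management, resource
management, I/O operations, file system management, networking, and security,
all running on the compute system hardware.)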

Notes

The image depicts a generic architecture of an OS. Some functions (or services) of
an OS include program execution, memory management, resources management
and allocation, and input/output management. An OS also provides networking and
basic security for the access and usage of all managed resources. It also performs
basic storage management tasks while managing other underlying components,
such as the device drivers, logical volume manager, and file system. An OS also
contains high-level Application Programming Interfaces (APIs) to enable programs
to request services.
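Programs request OS services through APIs that wrap system calls. As an illustration, Python's standard `os` module exposes thin wrappers around POSIX-style calls such as open, write, lseek, read, and close. The sketch below uses them to ask the OS to create a file, write bytes, and read them back; letting `tempfile.mkstemp` choose the file location is an illustrative choice, and the function name is hypothetical.

```python
import os
import tempfile


def write_and_read(data: bytes) -> bytes:
    """Use low-level OS file APIs to store `data` in a new file and read it back."""
    fd, path = tempfile.mkstemp()  # OS creates and opens a new temporary file
    try:
        os.write(fd, data)              # request: write bytes through the file descriptor
        os.lseek(fd, 0, os.SEEK_SET)    # request: move the file offset back to the start
        return os.read(fd, len(data))   # request: read the bytes back
    finally:
        os.close(fd)     # release the file descriptor
        os.remove(path)  # delete the temporary file


print(write_and_read(b"hello, OS"))  # b'hello, OS'
```

Each call crosses from the application into the OS, which in turn drives the file system, the device drivers, and the storage hardware on the program's behalf.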

Logical Components: Virtual Memory

The amount of physical memory (RAM) in a compute system determines both the
size and the number of applications that can run on the compute system.

(Figure: address translation from virtual memory to physical memory and to the
storage drive)

Notes

Memory virtualization presents physical memory to applications as a single
logical collection of contiguous memory locations called virtual memory. While
executing applications, the processor generates logical addresses (virtual
addresses) that map into the virtual memory. The memory management unit (MMU) of
the processor then maps the virtual address to the physical address. An OS
utility known as the virtual memory manager (VMM) manages the virtual memory and
the allocation of physical memory to virtual memory.
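The virtual-to-physical mapping described above can be sketched with a page table. In this minimal Python model, a virtual address is split into a page number and an offset; the page number is looked up in a per-process page table to find the physical frame, and the offset is carried over unchanged. The 4 KiB page size, the table contents, and the names `translate` and `PageFault` are assumptions for illustration; real MMUs perform this lookup in hardware, typically with multi-level page tables and a translation cache.

```python
PAGE_SIZE = 4096  # assumed page size: 4 KiB

# Hypothetical page table for one process: virtual page number -> physical frame number
page_table = {0: 5, 1: 9, 2: 1}


class PageFault(Exception):
    """Raised when the referenced page is not currently in physical memory."""


def translate(virtual_address):
    """Map a virtual address to a physical address using the page table."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        # the VMM would now fetch the page from swap space and update the table
        raise PageFault(f"virtual page {page_number} is not resident")
    frame_number = page_table[page_number]
    return frame_number * PAGE_SIZE + offset  # the offset within the page is unchanged


print(translate(4100))  # page 1, offset 4 -> frame 9 -> 36868
```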

An additional memory virtualization feature of an OS enables the capacity of
secondary storage devices to be allocated to virtual memory. This feature creates
a virtual memory with an address space that is larger than the physical memory
present in the compute system. This enables multiple applications and processes
whose aggregate memory requirement is greater than the available physical memory
to run on a compute system without impacting each other.

The VMM manages the virtual-to-physical memory mapping. It fetches data from
secondary storage when a process references a virtual address that points to
data on secondary storage. The space used by the VMM on secondary storage is
known as swap space. Swap space (also known as a page file or swap file) is a
portion of the storage drive that is used as an extension of physical memory.

In a virtual memory implementation, the memory of a system is divided into
contiguous blocks of fixed-size pages. A process known as paging moves inactive
physical memory pages onto the swap file and brings them back to physical memory
when required.
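Paging can be sketched as a small simulation: physical memory holds a fixed number of page frames, and when a process touches a page that is not resident (a page fault), the least recently used page is moved to swap space to make room and the requested page is brought in. The Python below assumes an LRU replacement policy and a three-frame memory purely for illustration; real operating systems typically use approximations of LRU or other replacement policies, and the class name is hypothetical.

```python
from collections import OrderedDict


class PagedMemory:
    """Toy demand-paging model with least-recently-used (LRU) page replacement."""

    def __init__(self, frames):
        self.frames = frames            # number of physical page frames
        self.resident = OrderedDict()   # resident pages, least recently used first
        self.swap = set()               # pages currently paged out to the swap file
        self.page_faults = 0

    def access(self, page):
        if page in self.resident:
            self.resident.move_to_end(page)  # hit: mark as most recently used
            return
        self.page_faults += 1                # miss: the page is not in physical memory
        if len(self.resident) >= self.frames:
            victim, _ = self.resident.popitem(last=False)  # least recently used page
            self.swap.add(victim)                          # page it out to swap space
        self.swap.discard(page)              # page it in (no-op if it was never swapped)
        self.resident[page] = None


mem = PagedMemory(frames=3)
for p in [1, 2, 3, 1, 4, 2]:
    mem.access(p)
print(list(mem.resident), sorted(mem.swap), mem.page_faults)  # [1, 4, 2] [3] 5
```

Page 1 survives because it was touched again just before memory filled up, while pages 2 and then 3 are paged out; page 2 is brought back from swap when it is referenced again.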

Logical Components: Logical Volume Manager (LVM)

• Creates and controls compute-level logical storage:
  - Provides a logical view of physical storage
  - Logical data blocks are mapped to physical data blocks
• Physical volumes form a volume group:
  - LVM manages volume groups as a single entity
  - Logical volumes are created from a volume group

(Figure: physical volumes are grouped into a volume group, from which logical
volumes are created)

Notes

Logical Volume Manager (LVM) is software that runs on a compute system and
manages logical and physical storage. LVM is an intermediate layer between the
file system and the physical drives. It can partition a larger-capacity disk into virtual,
smaller-capacity volumes (partitioning) or aggregate several smaller disks to form a
larger virtual volume (concatenation). LVMs are mostly offered as part of the OS.
The evolution of LVMs enabled dynamic extension of file system capacity and
efficient storage management. The LVM provides optimized storage access and
simplifies storage resource management. It hides details about the physical disk
and the location of data on the disk. It enables administrators to change the storage
allocation even when the application is running.

The basic LVM components are physical volumes, volume groups, and logical
volumes. In LVM terminology, each physical disk that is connected to the compute
system is a physical volume (PV). A volume group is created by grouping one or
more PVs. A unique physical volume identifier (PVID) is assigned to each PV when
it is initialized for use by the LVM. Physical volumes can be added to or removed
from a volume group dynamically. Each PV is divided into equal-sized data blocks
called physical extents when the volume group is created.

Logical volumes (LVs) are created within a given volume group. An LV can be
thought of as a disk partition, whereas the volume group itself can be thought of
as a disk. An LV consists of a whole number of physical extents, so its size is a
multiple of the extent size. The LV appears as a physical device to the OS. An LV
can be made up of noncontiguous physical extents and may span multiple physical
volumes. A file system is created on a logical volume.
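The relationship between physical volumes, physical extents, volume groups, and logical volumes can be sketched as follows. This Python model is a simplification, not the behavior of any particular LVM implementation: it assumes a 4 MiB extent size, allocates free extents first-fit across PVs, and rounds each LV up to a whole number of extents, which is why an LV's size is always a multiple of the extent size and why one LV may span more than one PV. The class, PV names, and sizes are hypothetical.

```python
EXTENT_MIB = 4  # assumed physical extent size in MiB


class VolumeGroup:
    """Toy volume group: PVs are split into extents; LVs are built from free extents."""

    def __init__(self, pvs):
        # pvs maps a PV name to its size in MiB; each PV is divided into equal-sized extents
        self.free = {name: list(range(size // EXTENT_MIB)) for name, size in pvs.items()}
        self.lvs = {}

    def create_lv(self, name, size_mib):
        """Allocate whole extents for an LV, spanning PVs if needed; return the actual size."""
        needed = -(-size_mib // EXTENT_MIB)  # round up to a whole number of extents
        if needed > sum(len(extents) for extents in self.free.values()):
            raise ValueError("not enough free extents in the volume group")
        allocated = []
        for pv, extents in self.free.items():  # first fit, PV by PV
            while extents and len(allocated) < needed:
                allocated.append((pv, extents.pop(0)))  # (physical volume, extent number)
        self.lvs[name] = allocated
        return len(allocated) * EXTENT_MIB  # always a multiple of the extent size


vg = VolumeGroup({"pv_disk1": 100, "pv_disk2": 100})  # two hypothetical 100 MiB disks
print(vg.create_lv("lv_data", 90))  # 92 (23 extents; 90 MiB rounded up)
print(vg.create_lv("lv_logs", 40))  # 40 (10 extents)
print(sorted({pv for pv, _ in vg.lvs["lv_logs"]}))  # ['pv_disk1', 'pv_disk2']
```

The second LV starts on the last two free extents of `pv_disk1` and continues on `pv_disk2`, illustrating an LV built from extents on more than one physical volume.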

Logical Components: LVM Example

• Disk partitioning was introduced to improve the flexibility and utilization of
  disk drives
• In partitioning, a disk drive is divided into logical containers called
  logical volumes

(Figure: partitioning and concatenation. Logical volumes presented to compute
systems are mapped to physical volumes.)

Notes

For example, a large physical drive can be partitioned into multiple LVs to maintain
data according to the file system and application requirements. The partitions are
created from groups of contiguous cylinders when the hard disk is initially set up on
the host. The host’s file system accesses the logical volumes without any
knowledge of partitioning and physical structure of the disk. Concatenation is the
process of grouping several physical drives and presenting them to the host as one
large logical volume.
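Concatenation can be illustrated by how a logical block address on the combined volume maps to a block on one of the underlying drives. In this Python sketch the drives are simply appended in order, so the mapping walks the list of drive capacities; the drive names, block counts, and function name are hypothetical, and a real volume manager keeps such mappings in its own metadata.

```python
def map_concatenated_block(lba, drives):
    """Map a logical block address on a concatenated volume to (drive, block on that drive).

    drives: list of (drive_name, capacity_in_blocks) in the order they were concatenated.
    """
    if lba < 0:
        raise ValueError("logical block address must be non-negative")
    for name, capacity in drives:
        if lba < capacity:
            return name, lba
        lba -= capacity  # skip past all blocks on this drive
    raise ValueError("logical block address is beyond the end of the volume")


drives = [("disk_a", 1000), ("disk_b", 500), ("disk_c", 2000)]  # hypothetical drives
print(map_concatenated_block(999, drives))   # ('disk_a', 999)
print(map_concatenated_block(1000, drives))  # ('disk_b', 0)
print(map_concatenated_block(1700, drives))  # ('disk_c', 200)
```

The host only ever sees a single volume of 3500 blocks; which physical drive holds a given block is hidden behind this mapping.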

Logical Components: File System

• A file is a collection of related records stored as a single named unit in a
  contiguous logical address space
• A file system controls and manages the storage and retrieval of files
• Enables users to perform various operations on files
• Groups and organizes files in a hierarchical structure
• File systems may be broadly classified as:
  - Disk-based file system
  - Network-based file system
  - Virtual file system

Notes

Files are of different types, such as text, executable, image, audio/video, binary,
library, and archive. Files have various attributes, such as name, unique identifier,
type, size, location, owner, and protection.

A file system is an OS component that controls and manages the storage and
retrieval of files in a compute system. A file system enables easy access to the files
residing on a storage drive, a partition, or a logical volume. It consists of logical
structures and software routines that control access to files. It enables users to
perform various operations on files, such as create, access (sequential/random),
write, search, edit, and delete.

A file system typically groups and organizes files in a tree-like hierarchical
structure. It enables users to group files within a logical collection called a
directory, which is a container for storing pointers to multiple files. A file
system maintains a pointer map to the directories, subdirectories (if any), and
files that are part of the file system. It also stores all the metadata (file
attributes) associated with the files.

A file system block is the smallest unit allocated for storing data. Each file
system block is a contiguous area on the physical disk. The block size of a file
system is fixed at the time of its creation. The file system size depends on the
block size and the total number of file system blocks.
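Because the block is the smallest unit of allocation, both the file system's capacity and the space a file actually consumes follow directly from the block size. A short Python illustration, assuming a 4 KiB block size (a common choice, but an assumption here) and ignoring the space used for file system metadata; the function names are illustrative.

```python
def fs_capacity_bytes(block_size, block_count):
    """File system size = block size x total number of blocks."""
    return block_size * block_count


def blocks_for_file(file_size, block_size):
    """A file occupies whole blocks, so round up; the unused tail of the last block is slack."""
    return -(-file_size // block_size)  # integer ceiling division


BLOCK = 4096  # assumed 4 KiB block size
print(fs_capacity_bytes(BLOCK, 262_144))  # 1073741824 bytes, i.e. 1 GiB
print(blocks_for_file(10_000, BLOCK))     # 3 blocks (12288 bytes allocated for 10000 bytes)
```

The rounding shows the trade-off in choosing a block size: larger blocks mean fewer blocks to track for a given capacity, but more wasted space in the last block of each small file.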

File systems may be broadly classified as follows:

Disk-based

A disk-based file system manages the files stored on storage devices such as
solid-state drives, disk drives, and optical drives. Examples of disk-based file
systems are Microsoft NT File System (NTFS), Apple Hierarchical File System
(HFS) Plus, Extended File System family for Linux, Oracle ZFS, and Universal Disk
Format (UDF).

Network-based

A network-based file system uses networking to enable file system access between
compute systems. Network-based file systems may use either the client/server
model, or may be distributed/clustered. In the client/server model, the file system
resides on a server, and is accessed by clients over the network. The client/server
model enables clients to mount the remote file systems from the server.

NFS for UNIX environments and CIFS for Windows environments (both covered in the
module ‘File-based Storage System (NAS)’) are two standard client/server file
sharing protocols. Examples of network-based file systems are: Microsoft
Distributed File System (DFS), Hadoop Distributed File System (HDFS), VMware
Virtual Machine File System (VMFS), Red Hat GlusterFS, and Red Hat CephFS.

Virtual

A virtual file system is a memory-based file system. It enables compute systems
to transparently access different types of file systems on local and network
storage devices. It provides an abstraction layer that enables applications to
access different types of file systems in a uniform way, bridging the differences
between the file systems of different operating systems without applications
needing to know which type of file system they are accessing. Examples of virtual
file systems are Linux Virtual File System (VFS) and Oracle CacheFS.
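The idea of a virtual file system (one uniform interface that dispatches each request to whichever concrete file system holds a path) can be sketched in Python with an abstract base class and a mount table. This is a conceptual model only: the `InMemoryFS` backend, the mount points, and all class and method names are hypothetical and do not reflect the actual data structures of the Linux VFS.

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """Operations every concrete file system must provide to the VFS layer."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class InMemoryFS(FileSystem):
    """Hypothetical backend; a disk- or network-based file system would implement the same methods."""

    def __init__(self):
        self.files = {}

    def read(self, path):
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data


class VirtualFileSystem:
    """Gives applications one interface and routes each call to the right mounted file system."""

    def __init__(self):
        self.mounts = {}  # mount point -> FileSystem

    def mount(self, mount_point, fs):
        self.mounts[mount_point.rstrip("/") or "/"] = fs

    def _resolve(self, path):
        # the longest mount point that is a prefix of the path wins
        matches = [m for m in self.mounts
                   if path == m or path.startswith(m.rstrip("/") + "/")]
        if not matches:
            raise FileNotFoundError(path)
        mount_point = max(matches, key=len)
        return self.mounts[mount_point], path[len(mount_point):].lstrip("/")

    def read(self, path):
        fs, relative = self._resolve(path)
        return fs.read(relative)

    def write(self, path, data):
        fs, relative = self._resolve(path)
        fs.write(relative, data)


vfs = VirtualFileSystem()
local_disk, nas_share = InMemoryFS(), InMemoryFS()
vfs.mount("/", local_disk)
vfs.mount("/mnt/nas", nas_share)
vfs.write("/etc/app.conf", b"mode=prod")  # lands on the local file system
vfs.write("/mnt/nas/report.txt", b"Q3")   # lands on the network share
print(sorted(local_disk.files), sorted(nas_share.files))  # ['etc/app.conf'] ['report.txt']
```

The application code only ever calls `vfs.read` and `vfs.write` with ordinary paths; which backend serves the request is decided by the mount table, which is the uniform-access property the paragraph describes.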

Compute and Desktop Virtualization Lesson

Introduction

This lesson covers compute virtualization, hypervisor, and virtual machine. This
lesson also covers desktop virtualization.

This lesson covers the following topics:

• Explain compute virtualization, hypervisor, and virtual machine
• Explain desktop virtualization

Compute and Desktop Virtualization

What is Compute Virtualization?

Definition: Compute Virtualization

The technique of abstracting the physical compute hardware from the
operating system and applications, enabling multiple operating
systems to run concurrently on a single or clustered physical compute
system(s).

(Figure: VMs, each running an OS and applications, on a compute virtualization
layer (hypervisor) over a pool of physical compute capacity (x86 hardware))

Notes

Compute virtualization is a technique of abstracting the physical hardware of a
compute system from the operating system (OS) and applications. The decoupling
of the physical hardware from the OS and applications enables multiple operating
systems to run concurrently on a single or clustered physical compute system(s).
Compute virtualization enables the creation of virtual compute systems called
virtual machines (VMs). Each VM runs an OS and applications, and is isolated from
the other VMs on the same compute system. Compute virtualization is achieved by
a hypervisor, which is virtualization software that is installed on a physical
compute system. The hypervisor provides virtual hardware resources, such as CPU,
memory, storage, and network resources, to all the VMs. Depending on the
hardware capabilities, many VMs can be created on a single physical compute
system.

A VM is a logical entity, but to the OS running on the VM, it appears as a
physical compute system with its own processor, memory, network controller, and
disks. However, all VMs share the same underlying physical hardware of the
compute system. The hypervisor allocates the compute system’s hardware resources
dynamically to each VM. From a hypervisor’s perspective, each VM is a discrete
set of files.

Need for Compute Virtualization

Before Virtualization

[Figure: a single application (APP) on a single OS running directly on x86 hardware]

Drawbacks:
 IT silos and underutilized resources
 Inflexible and expensive
 Management inefficiencies
 Risk of downtime

After Virtualization

[Figure: multiple VMs, each running an application (APP) on its own OS, on a compute virtualization layer (hypervisor) over a pool of physical compute capacity (x86 hardware)]

Benefits:
 Server consolidation and improved resource utilization
 Flexible infrastructure at lower costs
 Increased management efficiency
 Increased availability and improved business continuity


Notes

In an x86-based physical compute system, the software and hardware are tightly
coupled, and the system can run only one OS at a time. A physical compute system
often faces resource conflicts when multiple applications running on it have
conflicting requirements. Moreover, many applications do not take full advantage
of the hardware capabilities available to them.
Resources such as processors, memory, and storage frequently remain
underutilized. Many compute systems also require complex network cabling,
considerable floor space, and substantial power. Hardware configuration,
provisioning, and management become complex and require more time. A physical
compute system is a single point of failure because its failure leads to application
unavailability.

Compute virtualization overcomes these challenges by allowing multiple
operating systems and applications to run on a single compute system. Physical
machines are converted to virtual machines, and the converted machines are
consolidated onto a single compute system. Server consolidation improves resource
utilization and enables organizations to run their data center with fewer machines.
This, in turn, reduces hardware acquisition and operational costs, and saves data
center space and energy.

Compute virtualization increases management efficiency and reduces maintenance
time. Creating a VM takes less time than setting up a physical compute system.
Organizations can provision compute resources faster and with greater ease to
meet growing resource requirements. Individual VMs can be restarted, upgraded, or
even fail without affecting the other VMs on the same physical compute system.
Moreover, VMs are portable and can be copied or moved from one physical compute
system to another without causing application unavailability.

What is a Hypervisor?

Definition: Hypervisor
Software that provides a virtualization layer for abstracting compute
system hardware, and enables the creation of multiple virtual
machines.

 There are two key components to a hypervisor:
 Hypervisor kernel
o Provides functionality similar to an OS kernel
o Presents resource requests to physical hardware
 Virtual machine manager (VMM)
o Each VM is assigned a VMM
 There are also two types of hypervisor:
 Bare-metal
 Hosted

[Figure: two VMs, each running an application (APP) on its own OS and each served by its own VMM, on the hypervisor kernel of a physical compute system]

Notes

A hypervisor is compute virtualization software that is installed on a compute
system. It provides a virtualization layer that abstracts the processor, memory,
network, and storage of the compute system and enables the creation of multiple
virtual machines. Each VM runs its own OS, which essentially enables multiple
operating systems to run concurrently on the same physical compute system. The
hypervisor provides standardized hardware resources to all the VMs.

A hypervisor has two key components: kernel and virtual machine manager (VMM).
A hypervisor kernel provides the same functionality as the kernel of any OS,
including process management, file system management, and memory
management. It is designed and optimized to run multiple VMs concurrently. It
receives requests for resources through the VMM, and presents the requests to the
physical hardware. Each virtual machine is assigned a VMM that gets a share of
the processor, memory, I/O devices, and storage from the physical compute
system to successfully run the VM.
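
The division of labor between the hypervisor kernel and the per-VM VMM can be modeled as a toy Python sketch. It is an illustration under stated assumptions, not how any particular hypervisor is built: the `HypervisorKernel` and `VMM` classes are hypothetical, only CPU cores and memory are modeled, and requests are granted from fixed physical capacity without overcommitment.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class HypervisorKernel:
    """Owns the physical resources and acts on requests presented by VMMs."""
    cpu_cores: int
    memory_gb: int
    allocations: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def request(self, vm_name: str, cores: int, memory_gb: int) -> bool:
        used_cores = sum(c for c, _ in self.allocations.values())
        used_mem = sum(m for _, m in self.allocations.values())
        if used_cores + cores > self.cpu_cores or used_mem + memory_gb > self.memory_gb:
            return False  # not enough unallocated physical capacity
        self.allocations[vm_name] = (cores, memory_gb)
        return True

@dataclass
class VMM:
    """One VMM per VM: passes that VM's resource requests to the kernel."""
    vm_name: str
    kernel: HypervisorKernel

    def power_on(self, vcpus: int, vram_gb: int) -> bool:
        return self.kernel.request(self.vm_name, vcpus, vram_gb)

kernel = HypervisorKernel(cpu_cores=16, memory_gb=64)
vmms = {name: VMM(name, kernel) for name in ("web", "db", "batch")}

results = {
    "web": vmms["web"].power_on(vcpus=4, vram_gb=8),
    "db": vmms["db"].power_on(vcpus=8, vram_gb=32),
    # Would bring the total to 20 cores, more than the 16 available.
    "batch": vmms["batch"].power_on(vcpus=8, vram_gb=16),
}
print(results)
print("allocated:", kernel.allocations)
```

Real hypervisors commonly overcommit virtual CPUs and memory and schedule them onto physical resources, so they would not simply refuse the third VM; the refusal keeps the sketch short.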

Hypervisors are categorized into two types: bare-metal (Type I) and hosted (Type
II). A bare-metal hypervisor is installed directly on the physical compute hardware
in the same way as an OS. It has direct access to the hardware resources of the
compute system and is therefore more efficient than a hosted hypervisor. A
bare-metal hypervisor is designed for enterprise data centers and third platform
infrastructure. It also supports advanced capabilities such as resource
management, high availability, and security. The image represents a bare-metal
hypervisor. A hosted hypervisor is installed as an application on an operating
system. The hosted hypervisor does not have direct access to the hardware, and
all requests pass through the OS running on the physical compute system.

What is a Virtual Machine?

Definition: Virtual Machine (VM)


A logical compute system with virtual hardware on which a supported
guest OS and its applications run.

Important points about a VM:


 Created by a hypervisor installed on a physical compute system
 Comprises virtual hardware, such as virtual processor, virtual storage, and
virtual network resources
 Appears as a physical compute system to the guest OS
 Hypervisor maps the virtual hardware to the physical hardware
 VMs on a compute system are isolated from each other

Notes

A virtual machine (VM) is a logical compute system with virtual hardware on which
a supported guest OS and its applications run. A VM is created by a hosted or a
bare-metal hypervisor installed on a physical compute system. An OS, called a
“guest OS”, is installed on the VM in the same way it is installed on a physical
compute system. From the perspective of the guest OS, the VM appears as a
physical compute system.

A VM has a self-contained operating environment, comprising OS, applications,
and virtual hardware, such as a virtual processor, virtual memory, virtual storage,
and virtual network resources. As discussed previously, a dedicated virtual
machine manager (VMM) is responsible for the execution of a VM. Each VM has its
own configuration for hardware, software, network, and security. The hypervisor
translates the VM’s resource requests and maps the virtual hardware of the VM to
the hardware of the physical compute system. For example, a VM’s I/O requests
to a virtual disk drive are translated by the hypervisor and mapped to a file on
the physical compute system’s disk drive.

Compute virtualization software enables the creation and management of several
VMs, each running its own OS, on a physical compute system or on a compute
cluster. VMs are created on a compute system and provisioned to different users
to deploy their applications. The VM hardware and software are configured to meet
the application’s requirements. The different VMs are isolated from each other, so
that the applications and the services running on one VM do not interfere with
those running on other VMs. The isolation also provides fault tolerance so that if
one VM crashes, the other VMs remain unaffected.

VM Hardware

 When a VM is created, it is presented with virtual hardware components that
appear as physical hardware components to the guest OS
 Within a given vendor’s environment, each VM has standardized hardware
components that make it portable across physical compute systems
 The image shows the typical hardware components of a VM

[Figure: typical VM hardware components: processor, RAM, storage device, SCSI/IDE controllers, HBA, NIC, graphics card, floppy/optical drives and controllers, USB controller, keyboard, and mouse]

Notes

Based on requirements, virtual components can be added to or removed from
a VM. However, not all components are available for addition and configuration.
Some hardware devices are part of the virtual motherboard and cannot be modified
or removed. For example, the video card and the PCI controllers are available by
default and cannot be removed.

A VM can be configured with one or more virtual processors. Each VM is assigned
a virtual motherboard with the standardized devices essential for a compute system
to function. Virtual RAM is the amount of physical memory allocated to a VM, and it
can be configured based on the requirements. The virtual disk is a large physical
file, or a set of files that stores the VM’s OS, program files, application data, and
other data associated with the VM. A virtual network adapter functions like a
physical network adapter. It provides connectivity between VMs running on the
same or different compute systems, and between a VM and physical compute
systems.

Virtual optical drives and floppy drives can be configured to connect either to
physical devices or to image files, such as ISO files, on storage. SCSI/IDE virtual
controllers provide a way for the VMs to connect to the storage devices. The virtual
USB controller is used to connect to a physical USB controller and to access the
connected USB devices. Serial and parallel ports provide an interface for
connecting peripherals to the VM.
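
The rule that some devices belong to the virtual motherboard and cannot be removed can be captured in a small Python sketch. The `VirtualMachineHardware` class and the device names are hypothetical; the fixed set follows the example in the text (video card and PCI controllers), and the default device selection is an assumption, not any vendor's template.

```python
class VirtualMachineHardware:
    """Hypothetical model of a VM's virtual hardware inventory."""

    # Part of the virtual motherboard: present by default and not removable.
    FIXED_DEVICES = frozenset({"video card", "PCI controller"})

    def __init__(self, vcpus: int = 1, vram_mb: int = 1024) -> None:
        self.vcpus = vcpus        # number of virtual processors
        self.vram_mb = vram_mb    # physical memory allocated to the VM
        # Configurable devices on top of the fixed ones; this default is an assumption.
        self.devices = set(self.FIXED_DEVICES) | {"SCSI controller", "virtual disk", "NIC"}

    def add_device(self, device: str) -> None:
        self.devices.add(device)

    def remove_device(self, device: str) -> None:
        if device in self.FIXED_DEVICES:
            raise PermissionError(f"{device} is part of the virtual motherboard")
        self.devices.discard(device)

vm = VirtualMachineHardware(vcpus=2, vram_mb=4096)
vm.add_device("USB controller")
vm.add_device("optical drive")  # may be backed by a physical drive or an ISO image file
vm.remove_device("NIC")
try:
    vm.remove_device("video card")
except PermissionError as err:
    print("refused:", err)
print(sorted(vm.devices))
```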

VM Files

From a hypervisor’s perspective, a VM is a discrete set of files on a storage device.
These files are:

File                 Description
Configuration file   Stores information, such as VM name, BIOS information, guest OS type, memory size
Virtual disk file    Stores the contents of the VM’s disk drive
Memory state file    Stores the memory contents of a VM in a suspended state
Snapshot file        Stores the VM settings and virtual disk of a VM
Log file             Keeps a log of the VM’s activity and is used in troubleshooting

Notes

From a hypervisor’s perspective, a VM is a discrete set of files on a storage device.
Some of the key files that make up a VM are the configuration file, the virtual disk
file, the memory state file, and the log file. The configuration file stores the VM’s
configuration information, including VM name, location, BIOS information, guest OS
type, virtual disk parameters, number of processors, memory size, number of
adapters and associated MAC addresses, SCSI controller type, and disk drive type.
The virtual disk file stores the contents of a VM’s disk drive. A VM can have
multiple virtual disk files, each of which appears as a separate disk drive to the VM.

The memory state file stores the memory contents of a VM and is used to resume a
VM that is in a suspended state. The snapshot file stores the running state of the
VM including its settings and the virtual disk, and may optionally include the
memory state of the VM. It is typically used to revert the VM to a previous state.
Log files keep a record of the VM’s activity and are often used for
troubleshooting purposes.
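
The set of files that make up one VM can be sketched as a small Python structure. The file names follow a hypothetical naming scheme chosen for illustration; real hypervisors use their own extensions (VMware ESXi, for example, uses `.vmx` for the configuration file and `.vmdk` for virtual disks). The `vm_file_set` helper is an assumption, not a real API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class VMFiles:
    """Hypothetical listing of the files that make up one VM."""
    config: str
    virtual_disks: List[str]
    memory_state: str
    snapshots: List[str]
    logs: List[str]

    def all_files(self) -> List[str]:
        return [self.config, *self.virtual_disks, self.memory_state, *self.snapshots, *self.logs]

def vm_file_set(name: str, disk_count: int = 1, snapshot_count: int = 0) -> VMFiles:
    # Each virtual disk file appears to the guest OS as a separate disk drive.
    disks = [f"{name}-disk{i}.vdisk" for i in range(disk_count)]
    snaps = [f"{name}-snap{i}.snapshot" for i in range(snapshot_count)]
    return VMFiles(
        config=f"{name}.config",          # VM name, BIOS info, guest OS type, memory size, ...
        virtual_disks=disks,
        memory_state=f"{name}.memstate",  # used to resume a suspended VM
        snapshots=snaps,                  # used to revert the VM to a previous state
        logs=[f"{name}.log"],             # activity record for troubleshooting
    )

files = vm_file_set("crm01", disk_count=2, snapshot_count=1)
for path in files.all_files():
    print(path)
```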

For managing VM files, a hypervisor may use a native clustered file system or the
Network File System (NFS). A hypervisor’s native clustered file system is optimized
to store VM files. It may be deployed on Fibre Channel and iSCSI storage, in addition
to local storage. The virtual disks are stored as files on the native clustered
file system. NFS enables VM files to be stored on remote file servers
(NAS devices) accessed over an IP network. The NFS client built into the hypervisor
uses the NFS protocol to communicate with the NAS device.

What is Desktop Virtualization?

Definition: Desktop Virtualization


Technology that decouples the OS, applications, and user state from
a physical compute system to create a virtual desktop environment
that can be accessed from any client device.

 Desktops are hosted and managed centrally
 Desktop virtualization benefits include:
 Simplified desktop infrastructure management
 Improved data protection and compliance
 Flexibility of access

Notes

With the traditional desktop machine, the OS, applications, and user profiles are all
tied to a specific piece of hardware. With legacy desktops, business productivity is
impacted greatly when a client device is broken or lost. Managing a vast desktop
environment is also a challenging task.

Desktop virtualization decouples the OS, applications, and user state (profiles,
data, and settings) from a physical compute system. These components,
collectively called a virtual desktop, are hosted on a remote compute system. The
virtual desktop can be accessed from any client device, such as laptops, desktops,
thin clients, or mobile devices. A user accesses the virtual desktop environment over
a network on a client through a web browser or a client application.

The OS and applications of the virtual desktop execute on the remote compute
system, while a view of the virtual desktop’s user interface (UI) is presented to the
end-point device. Desktop virtualization uses a remote display protocol to transmit
the virtual desktop’s UI to the end-point devices. The remote display protocol also
sends back keystrokes and graphical input information from the end-point device,
enabling the user to interact with the virtual desktop.
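
The two-way flow of a remote display protocol can be modeled as a toy Python sketch: the remote compute system executes the desktop and sends UI updates, while the end-point device sends keystrokes back. The message format and the `VirtualDesktop` class are hypothetical; real remote display protocols compress, encrypt, and send only changed screen regions, none of which is modeled here.

```python
from collections import deque

class VirtualDesktop:
    """Runs on the remote compute system; the end-point only sees its UI."""

    def __init__(self) -> None:
        self.text = ""

    def handle_input(self, event: dict) -> dict:
        # Apply a keystroke received from the end-point device.
        if event["type"] == "key":
            if event["key"] == "BACKSPACE":
                self.text = self.text[:-1]
            else:
                self.text += event["key"]
        # Return a display update describing the new UI state.
        return {"type": "frame", "screen": f"[editor] {self.text}"}

to_server: deque = deque()   # input events: end-point device -> remote system
to_client: deque = deque()   # display updates: remote system -> end-point device
desktop = VirtualDesktop()

# End-point device: capture keystrokes and send them upstream.
for key in ["h", "i", "x", "BACKSPACE", "!"]:
    to_server.append({"type": "key", "key": key})

# Remote compute system: execute each input and send a UI update downstream.
while to_server:
    to_client.append(desktop.handle_input(to_server.popleft()))

# End-point device: render the most recent frame.
latest = to_client[-1]
print(latest["screen"])
```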

Some key benefits of desktop virtualization are:


 Simplified desktop infrastructure management: Desktop virtualization
simplifies desktop infrastructure management, and creates an opportunity to
reduce the maintenance costs. New virtual desktops can be configured and
deployed faster than physical machines. The patches, updates, and upgrades
can be centrally applied to the OS and applications. This process simplifies or
eliminates many redundant, manual, and time-consuming tasks.
 Improved data protection and compliance: Applications and data are located
centrally, which ensures that business-critical data is not at risk if there is loss or
theft of the device. Virtual desktops are also easier to back up compared to
deploying backup solutions on end-point devices.
 Flexibility of access: Desktop virtualization enables users to access their
desktops and applications without being bound to a specific end-point device.
The virtual desktops can be accessed remotely from different end-point devices.
This creates a flexible work scenario and enables user productivity
from remote locations. Desktop virtualization also enables Bring Your Own
Device (BYOD), which creates an opportunity to reduce acquisition and
operational costs.

Use Cases for Compute and Desktop Virtualization

Compute and desktop virtualization provide several benefits to organizations and
facilitate the transformation to the modern data center. Two use cases are
described below.

Use Case                       Description

Cloud Application Streaming     Streaming applications from the cloud to diverse client devices
                                Applications flexibly scale to meet growth in processing and storage needs
                                Applications can be delivered to devices on which they may not run natively

Desktop as a Service (DaaS)     Cloud service in which a VDI is hosted by a cloud service provider
                                Provider manages VDI and OS updates
                                Facilitates CAPEX and OPEX savings

Cloud application streaming: Cloud application streaming employs application
virtualization to stream applications from the cloud to client devices. Streaming
applications from the cloud enables organizations to reach more users on multiple
devices, without modifying the application code. The application is deployed on a
cloud infrastructure, and the output is streamed to client devices, such as desktops,
tablets, and mobile phones. Because the application runs in the cloud, it can
flexibly scale to meet the massive growth in processing and storage needs,
regardless of the client devices the end users are using. The cloud service can
stream either all or portions of the application from the cloud. Cloud application
streaming enables an application to be delivered to client devices on which it may
not be possible to run the application natively.

Desktop as a Service: Desktop as a Service (DaaS) is a cloud service in which a
virtual desktop infrastructure (VDI) is hosted by a cloud service provider. The
provider offers a complete, business-ready VDI solution, delivered as a cloud
service with either subscription-based, or pay-as-you-go billing. The service
provider (internal IT or public) manages the deployment of the virtual desktops,
data storage, backup, security, and OS updates/upgrades. The virtual desktops are
securely hosted in the cloud and managed by the provider. DaaS has a multitenant
architecture, wherein virtual desktops of multiple users share the same underlying
infrastructure. However, individual virtual desktops are isolated from each other
and protected against unauthorized access and crashes on other virtual desktops.
The virtual desktops can be easily provisioned by consumers, and they are
delivered over the Internet to any client device. DaaS provides organizations with a
simple, flexible, and efficient approach to IT, and enables them to lower the CAPEX
and OPEX of acquiring and managing end-user computing infrastructure.

Storage and Network Lesson

Introduction

This lesson covers the evolution of storage architecture and the types of storage
devices. It also covers compute-to-compute and compute-to-storage connectivity,
as well as different storage connectivity protocols.

This lesson covers the following topics:


 Explain evolution of storage architecture
 List types of storage devices
 Explain compute-to-compute and compute-to-storage connectivity
 Explain storage connectivity protocols

Storage and Network

Evolution of Storage Architecture: Server-Centric (Internal DAS)

 In a traditional environment, business units/departments in an organization have
their own servers running the business applications of the respective business
unit/department:
 Storage devices are connected directly to the servers and are typically
internal to the server
 These storage devices cannot be shared with any other server
 This is called server-centric storage architecture (Internal DAS)
 In this architecture:
 Each server has a limited number of storage devices
 The storage device exists only in relation to the server to which it is
connected
 The figure depicts an example of server-centric architecture; in the image:
 The servers of different departments in an organization have directly
connected storage
 Clients connect to the servers over a local area network (LAN) or a wide
area network (WAN)

[Figure: Sales, Finance, and R&D servers, each with its own directly connected storage device, accessed by clients over a LAN/WAN]

Notes

Traditional server-centric architecture (Internal DAS) has several limitations and is
inadequate to satisfy the growing demand for storage capacity in modern
application environments. The number of storage devices that can be connected
to one server is limited, so storage capacity cannot be scaled beyond that limit.
Moreover, a server cannot directly access the unused storage space available on
other servers.

A server failure or any administrative task, such as server maintenance or
increasing a server’s storage capacity, also results in information unavailability.
Furthermore, the proliferation of departmental servers in an organization results in
silos of information. These devices are difficult to manage and lead to an increase
in capital expenditure (CAPEX) and operating expenditure (OPEX).
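
The capacity limitation of internal DAS, compared with a shared pool, can be shown with a short Python sketch. The server names come from the figure, but the capacities are hypothetical, and the `can_allocate_*` helpers are illustrative assumptions that ignore RAID overhead and other real-world constraints.

```python
# Hypothetical capacities in TB: directly attached storage per server.
das = {
    "Sales": {"capacity": 10, "used": 9},
    "Finance": {"capacity": 10, "used": 3},
    "R&D": {"capacity": 10, "used": 4},
}

def can_allocate_das(server: str, request_tb: int) -> bool:
    """With internal DAS, a server can use only its own storage devices."""
    disk = das[server]
    return disk["capacity"] - disk["used"] >= request_tb

def can_allocate_pooled(request_tb: int) -> bool:
    """With a shared pool, any server can draw on the combined free capacity."""
    free = sum(d["capacity"] - d["used"] for d in das.values())
    return free >= request_tb

total = sum(d["capacity"] for d in das.values())
used = sum(d["used"] for d in das.values())
print(f"Overall utilization: {used / total:.0%}")  # 16 of 30 TB used
print("Sales needs 5 TB more (DAS):   ", can_allocate_das("Sales", 5))
print("Sales needs 5 TB more (pooled):", can_allocate_pooled(5))
```

The Sales server runs out of space even though about half of the organization's capacity sits unused on the Finance and R&D servers; the information-centric architecture addresses exactly this stranded capacity.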

Evolution of Storage Architecture: Information-Centric (SAN)

 To overcome the challenges of the server-centric architecture, storage evolved
to the information-centric architecture
 In information-centric architecture (SAN), storage devices exist independently of
servers, and are managed centrally and shared between multiple compute
systems
 The figure depicts an example of information-centric architecture; in the image:
 The servers of different departments in an organization are connected to the
shared storage over a SAN
 The clients connect to the servers over a LAN or a WAN
 When a new server is deployed in the environment, storage is assigned to
the server from the same shared pool of storage devices
 The storage capacity can be increased dynamically and without impacting
information availability by adding storage devices to the pool
 This architecture improves the overall storage capacity utilization, while making
management of information and storage more flexible and cost-effective

[Figure: Sales, Finance, and R&D servers connected to shared storage devices over a storage area network; clients access the servers over a LAN/WAN]

Notes

Storage devices assembled within storage systems form a storage pool, and
several compute systems access the same storage pool over a specialized,
high-speed storage area network (SAN). A SAN is used for information exchange
between compute systems and storage systems, and for connecting storage
systems. It enables compute systems to share storage resources, improves the
utilization of storage systems, and facilitates centralized storage management.

SANs are classified based on the protocols they support. Common SAN deployment
types are Fibre Channel SAN (FC SAN), Internet Protocol SAN (IP SAN), and Fibre
Channel over Ethernet SAN (FCoE SAN). These topics are covered later in the
course.

Types of Storage Devices

Storage Type                Description

Magnetic disk drive          Stores data on a circular disk with a ferromagnetic coating
                             Provides random read/write access
                             Most popular storage device with large storage capacity

Solid-state (flash) drive    Stores data on semiconductor-based memory
                             Very low latency per I/O, low power requirements, and very high throughput

Magnetic tape drive          Stores data on a thin plastic film with a magnetic coating
                             Provides only sequential data access
                             Low-cost solution for long-term data storage

Optical disc drive           Stores data on a polycarbonate disc with a reflective coating
                             Write Once and Read Many (WORM) capability: CD, DVD, BD
                             Low-cost solution for long-term data storage

Overview of Storage Virtualization

 Abstracts physical storage resources to create virtual storage resources:
 Virtual volumes
 Virtual disk files
 Virtual storage systems
 Storage virtualization software can be:
 Built into the operating environment of a storage system
 Installed on an independent compute system
 Built into a hypervisor
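
The mapping behind a virtual volume can be sketched in Python: capacity from several physical drives is pooled, and each virtual volume is a list of extents drawn from that pool. The 1 GB extent size, the drive names, and the `StoragePool` class are hypothetical choices for illustration; real storage virtualization software adds metadata, data protection, and thin provisioning.

```python
from typing import Dict, List, Tuple

Extent = Tuple[str, int]  # (physical drive name, extent index on that drive)

class StoragePool:
    """Pools fixed-size extents from physical drives and carves virtual volumes."""

    def __init__(self, drives: Dict[str, int]) -> None:
        # drives maps a physical drive name to its size in 1 GB extents.
        self.free: List[Extent] = [
            (drive, i) for drive, size in drives.items() for i in range(size)
        ]
        self.volumes: Dict[str, List[Extent]] = {}

    def create_volume(self, name: str, size_gb: int) -> List[Extent]:
        if size_gb > len(self.free):
            raise ValueError("not enough free capacity in the pool")
        # Interleave extents across drives so a volume spans several drives.
        self.free.sort(key=lambda e: (e[1], e[0]))
        extents, self.free = self.free[:size_gb], self.free[size_gb:]
        self.volumes[name] = extents
        return extents

    def locate(self, volume: str, offset_gb: int) -> Extent:
        """Translate a virtual volume offset (in GB) to its physical location."""
        return self.volumes[volume][offset_gb]

pool = StoragePool({"disk-a": 4, "disk-b": 4, "disk-c": 4})
vol = pool.create_volume("vol1", 5)
print(vol)
print("vol1 offset 4 GB ->", pool.locate("vol1", 4))
```

The compute system sees only `vol1` as a single 5 GB volume; the translation from virtual offset to physical drive and extent stays inside the virtualization layer.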

Introduction to Connectivity

 Communication paths between IT infrastructure components for information
exchange and resource sharing
 Types of connectivity:
 Compute-to-compute connectivity
 Compute-to-storage connectivity

Compute-to-Compute Connectivity

 Compute-to-compute connectivity typically uses protocols based on the Internet
Protocol (IP)
 Each physical compute system is connected to a network through one or
more host interface devices, called network interface controllers (NICs)
 Physical switches and routers are the commonly used interconnecting
devices
 A switch enables different compute systems in the network to communicate
with each other
 A router is an OSI Layer-3 device that enables different networks to
communicate with each other
 Commonly used network cables are copper cables and optical fiber cables
 The figure shows a network (LAN or WAN) that provides interconnections
among the physical compute systems:
 It is necessary to ensure that appropriate switches and routers, with
adequate bandwidth and ports, are available to provide the required network
performance

[Figure: client compute systems and two physical compute systems, each running VMs (APP and OS) on a hypervisor, interconnected through Ethernet switches and an IP router]
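
Whether traffic between two compute systems stays within a switch or must pass through a router depends on whether they are on the same IP network. The sketch below makes that decision with Python's standard `ipaddress` module; the addresses and the /24 prefix are hypothetical, and the sketch ignores VLANs, ARP, and routing tables.

```python
import ipaddress

def next_hop(src: str, dst: str, prefix: int = 24) -> str:
    """Decide how traffic from src reaches dst on a simple IP network.

    Hosts on the same network are reached through the switch; hosts on a
    different network require the router (an OSI Layer-3 device).
    """
    src_net = ipaddress.ip_interface(f"{src}/{prefix}").network
    dst_addr = ipaddress.ip_address(dst)
    return "switch (same network)" if dst_addr in src_net else "router (different network)"

print(next_hop("192.168.10.21", "192.168.10.35"))
print(next_hop("192.168.10.21", "192.168.20.7"))
```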

Compute-to-Storage Connectivity

 Enabled through physical components and interface protocols
 Physical connectivity components
 Host bus adapter, port, and cable
 Storage may be connected directly or over a SAN

[Figure: clients, an Ethernet switch (LAN), servers, an iSCSI target, an FC switch, and a storage system]

Notes

Storage may be connected directly to a compute system or over a SAN.
Connectivity and communication between compute and storage are enabled
through physical components and interface protocols. The physical components
that connect compute to storage are the host interface device, port, and cable.

Host bus adapter: A host bus adapter (HBA) is a host interface device that
connects a compute system to storage or to a SAN. It is an application-specific
integrated circuit (ASIC) board. It performs I/O interface functions between a
compute system and storage, relieving the processor of additional I/O processing
workload. A compute system typically contains multiple HBAs.

Port: A port is a specialized outlet that enables connectivity between the compute
system and storage. An HBA may contain one or more ports to connect the
compute system to the storage.

Cable: Cables connect compute systems to internal or external devices using
copper or fiber optic media.

What is a Protocol?

Definition: Protocols
Define formats for communication between devices. Protocols are
implemented using interface devices (or controllers) at both the
source and the destination devices.

Protocol                  Description

Fibre Channel (FC)         Widely used protocol for high-speed compute-to-storage communication
                           Provides serial data transmission that operates over copper wire and/or optical fiber

Internet Protocol (IP)     Existing IP-based network leveraged for storage communication
                           Examples: iSCSI and FCIP protocols

Overview of Network Virtualization

Abstracts physical network resources to create virtual network resources:
 Virtual switch
 Virtual LAN
 Virtual SAN

Network virtualization software can be:
 Built into the operating environment of a network device
 Installed on an independent compute system
 Built into a hypervisor
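
A virtual switch built into a hypervisor forwards frames between VMs much like a physical Ethernet switch. The Python sketch below shows classic MAC-address learning under simplifying assumptions: the `VirtualSwitch` class, port names, and MAC addresses are hypothetical, and VLAN tagging, table aging, and uplinks to physical NICs are omitted.

```python
from typing import Dict, List

class VirtualSwitch:
    """Learns which virtual port each MAC address is behind and forwards frames."""

    def __init__(self, ports: List[str]) -> None:
        self.ports = ports
        self.mac_table: Dict[str, str] = {}

    def receive(self, in_port: str, src_mac: str, dst_mac: str) -> List[str]:
        # Learn: the source MAC is reachable through the ingress port.
        self.mac_table[src_mac] = in_port
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]  # known destination: forward to one port
        # Unknown destination: flood to every port except the ingress port.
        return [p for p in self.ports if p != in_port]

vswitch = VirtualSwitch(["vm1-vnic", "vm2-vnic", "vm3-vnic"])
MAC1, MAC2 = "02:00:00:00:00:01", "02:00:00:00:00:02"

print(vswitch.receive("vm1-vnic", MAC1, MAC2))  # MAC2 not learned yet, so the frame is flooded
print(vswitch.receive("vm2-vnic", MAC2, MAC1))  # MAC1 was learned from the first frame
```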

Applications Lesson

Introduction

This lesson covers traditional and modern applications. Further, this lesson covers
microservices and application virtualization.

This lesson covers the following topics:


 Explain traditional and modern applications
 Explain microservices
 Describe application virtualization

Applications

Application Overview

Definition: Application
A software program or set of programs that is designed to perform a group of
coordinated tasks.

 Examples of applications:
– Customer relationship management (CRM)
– Enterprise Resource Planning (ERP)
– Email, such as Microsoft Outlook

Notes

Anyone who uses computers or smartphones uses applications every day. Whether
you are reading email, posting pictures on Facebook, or writing a tweet on Twitter,
you are using an application.

For businesses, applications unlock value from the digital world. A great
application reshapes user experiences and creates new touch points through which
users get the information they want. Applications are crucial to how businesses
provide value to their customers, which drives fundamental business objectives. Applications
manage the information and provide it in a form that is useful to the business to
meet specific requirements.

Modern Applications

Modern applications consist of a set of business-related functional parts, called
microservices, that are assembled with specific rules and best practices.

Modern Applications

 Deliver services in hours rather than weeks or months, as is common in the new
world of digital business
 Long-term technology commitments are reduced
 Components are loosely coupled, making updates much easier and seamless from
the end user’s perspective
 Examples: Facebook, Uber, and Netflix

Microservices

 Each microservice runs in its own process and communicates with other services
through REST APIs
 The microservices architecture is a distinctive method of developing software
systems that has grown in popularity in recent years
 In this architecture, the application is decomposed into small, loosely coupled,
and independently operating services
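
The idea that microservices communicate through REST APIs can be sketched with Python's standard library alone. For brevity, the hypothetical inventory service below runs in a background thread on an ephemeral localhost port rather than in its own process, and the `/stock/<item>` endpoint, its JSON payload, and the `can_order` function are assumptions made for illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

STOCK = {"widget": 12, "gadget": 0}  # hypothetical data owned by the inventory service

class InventoryHandler(BaseHTTPRequestHandler):
    """Inventory microservice: answers GET /stock/<item> with JSON."""

    def do_GET(self) -> None:
        parts = self.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "stock" and parts[1] in STOCK:
            body = json.dumps({"item": parts[1], "quantity": STOCK[parts[1]]}).encode()
            self.send_response(200)
        else:
            body = json.dumps({"error": "not found"}).encode()
            self.send_response(404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:  # keep the demo output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), InventoryHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_port}"

def can_order(item: str) -> bool:
    """Order service logic: depends on inventory only through its REST API."""
    with urllib.request.urlopen(f"{base_url}/stock/{item}") as resp:
        return json.load(resp)["quantity"] > 0

widget_ok = can_order("widget")
gadget_ok = can_order("gadget")
print("widget orderable:", widget_ok)
print("gadget orderable:", gadget_ok)

server.shutdown()
server.server_close()
```

Because the order logic knows only the URL and the JSON shape, the inventory service could be rewritten in another language or redeployed independently without changing the caller, which is the loose coupling described above.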

Traditional vs. Modern Applications

Traditional Application Characteristics             Modern Application Characteristics

Monolithic                                          Distributed

Common programming language                         Multiple programming languages

Resiliency and scale are infrastructure managed     Resiliency and scale are application managed

Infrastructure is application-specific              Infrastructure is application-agnostic

PC-based devices                                    Large variety of devices (BYOD)

Separate Build/Test/Run                             DevOps, continuous development and deployment

Examples: CRM, ERP, and Email (Microsoft Outlook)   Examples: Facebook, Uber, and Netflix

Notes

Traditional applications are monolithic, which means that their modules are
interdependent: changing one affects the others. Modern applications are designed
as modules that run independently. These independent and distributed runtime
modules that make up an application are termed microservices.

Traditional applications are generally built using a single programming language and framework. Because modern applications are decomposed into independent modules, multiple programming languages can be used to develop them.

The source code for a traditional application is commonly commercial off-the-shelf, such as Oracle Financials, or custom-developed in-house. Modern applications often use open-source code or follow a freemium model, where the code is available as open source but support and enhancements can be purchased.

In a traditional application environment, the infrastructure manages resiliency against hardware failure and the scalability of the application. A modern application handles component failure and scalability itself, using distributed system architectures that provide high availability.

What is Application Virtualization?

Definition: Application Virtualization


The technique of decoupling an application from the underlying
computing platform (operating system and hardware) to enable the
application to be used on a compute system without installation.

An application is either delivered from a remote compute system or encapsulated in a virtualized container.

Application virtualization benefits are:
 Simplified application deployment and management
 Eliminate OS modifications
 Resolve application conflicts and compatibility issues
 Flexibility of application access

Notes

Some key benefits of application virtualization are described below.


 Simplified application management: Application virtualization provides a
solution to meet an organization’s need for simplified and improved application
deployment, delivery, and manageability.
 Eliminate OS modifications: Since application virtualization decouples an application from the OS, it leaves the underlying OS unaltered. This provides additional security and protects the OS from potential corruption and problems that may arise from changes to the file system and registry.
 Resolve application conflicts and compatibility issues: Application virtualization enables conflicting applications to be used on the same end-point device. It also enables the use of applications that would otherwise not execute on an end-point device due to incompatibility with the underlying computing platform.

 Simplified OS image management: Application virtualization simplifies OS image management. Since application delivery is separated from the OS, there is no need to include "standard" applications in end-point images. As a result, managing images is simpler, especially in the context of OS patches and upgrades.
 Flexibility of access: Application virtualization enables an organization’s workforce and customers to access applications hosted on a remote compute system from any location and through diverse end-point device types.

Application Virtualization Techniques

The three techniques for application virtualization are:

 Application encapsulation
   Application is converted into a standalone, self-contained executable package
   Application packages may run directly from a local drive, USB drive, or optical disc
 Application presentation
   Application is hosted and executes remotely, and the application’s UI data is transmitted to the client
   A locally installed agent on the client manages the exchange of UI information with the user’s remote application session
 Application streaming
   Application-specific data is transmitted in portions to clients for local execution
   Requires a locally installed agent, client software, or web browser plug-in

Application Encapsulation

In application encapsulation, an application is aggregated within a virtualized container, along with the assets, such as files, virtual registry, and class libraries that it requires for execution. This process, known as packaging or sequencing, converts an application into a standalone, self-contained executable package that can directly run on a compute system. The assets required for execution are included within the virtual container. Therefore, the application does not have any dependency on the underlying OS, and does not require a traditional installation on the compute system.

The application’s virtual container isolates it from the underlying OS and other
applications, thereby minimizing application conflicts. During application execution,
all function calls made by the application to the OS for assets get redirected to the
assets within the virtual container. The application is thus restricted from writing to
the OS file system or registry, or modifying the OS in any other way.
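
The redirection described above can be illustrated with a small, hypothetical Python model. It does not show how any packaging product is implemented. `VirtualContainer`, its methods, and the example paths and registry keys are illustrative assumptions, and the host OS is simulated with plain dictionaries. The model shows two things: reads resolve to the container's own assets, and writes are captured inside the container, so the host copies stay unchanged.

```python
# Hypothetical model of application encapsulation: every file or registry
# call the packaged application makes is served from its virtual container.

class VirtualContainer:
    def __init__(self, files: dict[str, bytes], registry: dict[str, str]):
        self.files = dict(files)            # packaged files
        self.registry = dict(registry)      # virtual registry
        self.writes: dict[str, bytes] = {}  # writes captured inside the container

    def read_file(self, path: str) -> bytes:
        # Redirected lookup: the container's own writes first, then packaged assets.
        if path in self.writes:
            return self.writes[path]
        if path in self.files:
            return self.files[path]
        raise FileNotFoundError(path)

    def write_file(self, path: str, data: bytes) -> None:
        # The write lands in the container, never in the host file system.
        self.writes[path] = data

    def get_registry(self, key: str) -> str:
        return self.registry[key]

    def set_registry(self, key: str, value: str) -> None:
        self.registry[key] = value  # only the virtual registry changes


host_fs = {"C:/app/settings.ini": b"host copy"}         # simulated OS file system
host_registry = {"HKLM/Software/App/Version": "1.0"}    # simulated OS registry

app = VirtualContainer(
    files={"C:/app/settings.ini": b"packaged copy"},
    registry={"HKLM/Software/App/Version": "2.0"},
)
app.write_file("C:/app/settings.ini", b"user change")
app.set_registry("HKLM/Software/App/Version", "2.1")

print(app.read_file("C:/app/settings.ini"))          # b'user change'
print(host_fs["C:/app/settings.ini"])                 # b'host copy' (unchanged)
print(host_registry["HKLM/Software/App/Version"])     # 1.0 (unchanged)
```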

Application Presentation

In application presentation, an application’s user interface (UI) is separated from its execution. The application executes on a remote compute system, while its UI is presented to an end-point client device over a network. When a user accesses the application, the screen pixel information and the optional sound for the application are transmitted to the client. A software agent installed on the client receives this information and updates the client’s display. The agent also transmits the keystrokes and graphical input information back from the client, allowing the user to control the application.

This process makes it appear as if the application is running on the client when, in
fact, it is running on the remote compute system. Application presentation enables
the delivery of an application on devices that have less computing power than what
is normally required to execute the application. In application presentation,
application sessions are created in the remote compute system and a user
connects to an individual session from a client by means of the software agent.
Individual sessions are isolated from each other, which secures the data of each user and also protects each session from application crashes in other sessions.
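
The following minimal, hypothetical Python model shows the session behavior described above. The class names, the text "screen" strings that stand in for pixel data, and the restart-on-crash behavior are all illustrative assumptions. Real presentation products use their own remote display protocols.

```python
# Hypothetical model of application presentation: the application runs on the
# remote host; the client only sends input events and receives screen updates.

class RemoteApp:
    """Toy application whose state lives on the remote compute system."""

    def __init__(self):
        self.text = ""

    def handle_input(self, key: str) -> str:
        if key == "CRASH":
            raise RuntimeError("application crashed")
        self.text += key
        return f"[screen] {self.text}"  # stands in for pixel data sent to the client


class PresentationHost:
    def __init__(self):
        self.sessions: dict[str, RemoteApp] = {}  # one isolated session per user

    def connect(self, user: str) -> None:
        self.sessions[user] = RemoteApp()

    def send_input(self, user: str, key: str) -> str:
        try:
            return self.sessions[user].handle_input(key)
        except RuntimeError:
            # Only this user's session is restarted; other sessions keep running.
            self.sessions[user] = RemoteApp()
            return "[screen] session restarted"


host = PresentationHost()
host.connect("alice")
host.connect("bob")
print(host.send_input("alice", "h"))    # [screen] h
print(host.send_input("bob", "CRASH"))  # [screen] session restarted
print(host.send_input("alice", "i"))    # [screen] hi
```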

Application Streaming

In application streaming, an application is deployed on a remote compute system, and is downloaded in portions to an end-point client device for local execution. A user typically launches the application from a shortcut, which causes the client to connect to the remote compute system to start the streaming process. Initially, only a limited portion of the application is downloaded into memory. This portion is sufficient to start the execution of the application on the client.

Since a limited portion of the application is delivered to the client before the
application starts, the user experiences rapid application launch. The streaming
approach also reduces network traffic. As the user accesses different application
functions, more of the application is downloaded to the client. The additional
portions of the application may also be downloaded in the background without user
intervention. Application streaming requires an agent or client software on clients.

Alternatively, the application may be streamed to a web browser by using a plug-in installed on the client. In some cases, application streaming enables offline access to the application by caching it locally on the client.
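
The streaming behavior described above can be sketched as on-demand block fetching with a local cache. This is a hypothetical Python model. The block names, the `StreamingServer` and `StreamingClient` classes, and the request counter are illustrative assumptions and do not model any real streaming protocol.

```python
# Hypothetical model of application streaming: the application is split into
# blocks on the remote compute system and fetched by the client only when needed.

class StreamingServer:
    def __init__(self, blocks: dict[str, bytes]):
        self.blocks = blocks
        self.requests = 0  # how many blocks have crossed the network

    def fetch(self, block_id: str) -> bytes:
        self.requests += 1
        return self.blocks[block_id]


class StreamingClient:
    def __init__(self, server: StreamingServer, launch_blocks: list[str]):
        self.server = server
        self.cache: dict[str, bytes] = {}  # local cache (can enable offline use)
        for block_id in launch_blocks:     # only enough to start the application
            self.load(block_id)

    def load(self, block_id: str) -> bytes:
        if block_id not in self.cache:     # download on first use only
            self.cache[block_id] = self.server.fetch(block_id)
        return self.cache[block_id]

    def use_feature(self, block_id: str) -> bytes:
        return self.load(block_id)


server = StreamingServer({"core": b"...", "editor": b"...", "print": b"...", "export": b"..."})
client = StreamingClient(server, launch_blocks=["core"])
print(server.requests)          # 1: the application starts after one block
client.use_feature("editor")
client.use_feature("editor")    # second use is served from the local cache
print(server.requests)          # 2
print(sorted(client.cache))     # ['core', 'editor']
```

Background prefetching of the remaining blocks, which the text also mentions, would amount to calling `load` for unused blocks without waiting for the user.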

Software-Defined Data Center (SDDC) Lesson

Introduction

This lesson covers the software-defined data center (SDDC) and its architecture. It also covers the software-defined controller and the benefits of software-defined architecture.

This lesson covers the following topics:


 Explain software-defined data center architecture
 Explain software-defined controller
 List benefits of software-defined data center architecture

Software-Defined Data Center (SDDC)

What is a Software-Defined Data Center?

Definition: Software-Defined Data Center (SDDC)


An architectural approach to IT infrastructure that extends
virtualization concepts such as abstraction, pooling, and automation
to all of the data center’s resources and services to achieve IT as a
service.

 Compute, storage, network, security, and availability services are pooled and
delivered as a service
 SDDC services are managed by intelligent, policy-driven software
 Regarded as the foundational infrastructure for the modern data center

Notes

In an SDDC, compute, storage, networking, security, and availability services are pooled, aggregated, and delivered as a service. SDDC services are managed by intelligent, policy-driven software. SDDC is a vision that can be interpreted in many ways and can be implemented by numerous concrete architectures.

Typically, an SDDC is viewed as a conglomeration of virtual infrastructure components, among which are software-defined compute (compute virtualization), software-defined network (SDN), and software-defined storage (SDS).

SDDC is viewed as an important step toward a completely virtualized data center (VDC), and is regarded as the necessary foundational infrastructure for the modern data center.

SDDC Architecture

 The software-defined approach separates the control or management functions from the underlying components and provides them to external software
 The external software takes over the control operations and enables the central management of multi-vendor infrastructure components

[Figure: SDDC architecture. Applications communicate through APIs with the software-defined compute, software-defined storage, and software-defined network controllers, which manage the underlying compute, storage, and network components through APIs.]

Notes

Principally, a physical infrastructure component (compute, network, or storage) has a control path and a data path. The control path sets and manages the policies for the resources, and the data path performs the transmission of data. The software-defined approach decouples the control path from the data path. By abstracting the control path, the resource management function operates at the control layer. This layer gives the ability to partition the resource pools and manage them uniquely by policy.

This decoupling of the control path and data path enables the centralization of data provisioning and management tasks through software that is external to the infrastructure components. The software runs on a centralized compute system or a stand-alone device, called the software-defined controller. The figure illustrates the software-defined architecture, where the management function is abstracted from the underlying infrastructure components using a controller.
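
The separation of the control path from the data path can be sketched as follows. This is a hypothetical Python model. `StorageDevice`, `Controller`, and the policy keys are invented for illustration and do not correspond to any product API. Policies flow only through the controller, while application data goes straight to each device.

```python
# Hypothetical sketch: each device keeps its own data path, while its control
# path (policy settings) is driven by an external software-defined controller.

class StorageDevice:
    def __init__(self, name: str):
        self.name = name
        self.policy: dict[str, str] = {}  # set through the control path
        self.data: dict[int, bytes] = {}  # served through the data path

    def apply_policy(self, policy: dict[str, str]) -> None:
        # Control path: only the controller calls this.
        self.policy = dict(policy)

    def write(self, block: int, payload: bytes) -> None:
        # Data path: applications write directly, without involving the controller.
        self.data[block] = payload


class Controller:
    """Centralizes control of devices that may come from different vendors."""

    def __init__(self, devices: list[StorageDevice]):
        self.devices = devices

    def set_policy(self, policy: dict[str, str]) -> None:
        for device in self.devices:
            device.apply_policy(policy)


array_a, array_b = StorageDevice("vendor-a"), StorageDevice("vendor-b")
controller = Controller([array_a, array_b])
controller.set_policy({"tier": "gold", "replication": "on"})  # control path
array_a.write(0, b"app data")                                  # data path
print(array_b.policy["tier"])  # gold
```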

Software-Defined Controller

 Discovers underlying resources and provides an aggregated view of resources
 Abstracts the underlying hardware resources and pools them
 Enables the rapid provisioning of resources based on predefined policies
 Enables policies to be applied uniformly across the infrastructure components, all from a software interface
 Provides interfaces that enable applications external to the controller to request resources and access them as services

Notes

A software-defined controller is software with built-in intelligence that automates provisioning and configuration based on the defined policies. It enables organizations to dynamically, uniformly, and easily modify and manage their infrastructure.

The controller discovers the available underlying resources and provides an aggregated view of resources. It abstracts the underlying hardware resources (compute, storage, and network) and pools them. This enables the rapid provisioning of resources from the pool based on predefined policies that align to the service level agreements for different consumers.

The controller provides a single control point to the entire infrastructure enabling
policy-based infrastructure management. The controller enables an administrator to
use a software interface to manage the resources, node connectivity, and traffic
flow; control behavior of underlying components; apply policies uniformly across
the infrastructure components; and enforce security.

The controller also provides interfaces that enable applications, external to the
controller, to request resources and access these resources as services.
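
The controller behaviors listed above can be sketched in Python: discovery, pooling, policy-based provisioning, and an interface for external requests. This is a hypothetical model. `SoftwareDefinedController`, `Resource`, the `POLICIES` table, and the capacity units are illustrative assumptions, not the interface of any real controller product.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    kind: str      # "compute", "storage", or "network"
    capacity: int  # free capacity in illustrative units (e.g., GB)
    tier: str      # illustrative service tier, e.g. "gold" or "silver"


# Hypothetical policies: each service level maps to a resource tier.
POLICIES = {"gold-sla": "gold", "silver-sla": "silver"}


class SoftwareDefinedController:
    def __init__(self, discovered: list[Resource]):
        # Discovery and pooling: group discovered resources into pools by kind and tier.
        self.pools: dict[tuple[str, str], list[Resource]] = {}
        for resource in discovered:
            self.pools.setdefault((resource.kind, resource.tier), []).append(resource)

    def pool_capacity(self, kind: str, tier: str) -> int:
        """Aggregated view of one pool's free capacity."""
        return sum(r.capacity for r in self.pools.get((kind, tier), []))

    def provision(self, kind: str, amount: int, sla: str) -> str:
        """Interface for external applications: request resources by service level."""
        tier = POLICIES[sla]  # the policy, not the requester, selects the pool
        for resource in self.pools.get((kind, tier), []):
            if resource.capacity >= amount:
                resource.capacity -= amount
                return f"{amount} units of {kind} from {resource.name}"
        raise RuntimeError(f"no {tier} {kind} pool member can supply {amount} units")


ctl = SoftwareDefinedController([
    Resource("array-1", "storage", 500, "gold"),
    Resource("array-2", "storage", 300, "gold"),
    Resource("nas-1", "storage", 1000, "silver"),
])
print(ctl.pool_capacity("storage", "gold"))       # 800
print(ctl.provision("storage", 400, "gold-sla"))  # 400 units of storage from array-1
print(ctl.pool_capacity("storage", "gold"))       # 400
```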

Benefits of Software-Defined Architecture

By extending virtualization throughout the data center, SDDC provides several benefits to organizations. Some key benefits are described here:

Benefit                   Description
Agility                    On-demand self-service
                           Faster resource provisioning
Cost efficiency            Use of the existing infrastructure and commodity hardware lowers CAPEX
Improved control           Policy-based governance
                           Automated business continuity (BC) / disaster recovery (DR)
                           Support for operational analytics
Centralized management     Unified management platform for centralized monitoring and administration
Flexibility                Use of commodity and advanced hardware technologies
                           Hybrid cloud support

Modern Data Center Infrastructure Lesson

Introduction

This lesson covers the building blocks of a data center infrastructure. It covers the
components and functions of the five layers of a data center. It also covers the
three cross-layer functions in a data center.

This lesson covers the following topics:


 List layers of a data center infrastructure
 Explain components and functions of each layer
 Explain cross-layer functions in a data center

Modern Data Center Infrastructure

The image is a block diagram depicting the core IT infrastructure building blocks that make up a data center:

 Applications: internal applications, business applications, modern applications, and cloud extensibility (connection to an external cloud)
 Data center infrastructure, in five layers from top to bottom:
   Services: self-service portal and service catalog
   Orchestration: orchestration software
   Software-defined infrastructure: software-defined compute, software-defined storage, and software-defined network
   Virtual infrastructure: virtual compute, virtual storage, and virtual network
   Physical infrastructure (do-it-yourself/converged): compute, storage, and network
 Cross-layer functions spanning the five layers:
   Management: storage operation management
   Business continuity: fault tolerance mechanisms, backup and archive, and replication
   Security: security mechanisms; governance, risk, and compliance

Notes

The IT infrastructure is arranged in five logical layers and three cross-layer functions. The five layers are physical infrastructure, virtual infrastructure, software-defined infrastructure, orchestration, and services. Each of these layers has various types of hardware and/or software components as shown in the figure. The three cross-layer functions are business continuity, security, and management. Business continuity and security functions include mechanisms and processes that are required to provide reliable and secure access to applications, information, and services. The management function includes various processes that enable the efficient administration of the data center and the services for meeting business requirements. Applications that are deployed in the data center may be a combination of internal applications, business applications, and modern applications that are either custom-built or off-the-shelf. The fulfillment of the five essential cloud characteristics ensures the infrastructure can be transformed into a cloud infrastructure that could be either private or public. Further, by integrating cloud extensibility, the infrastructure can be connected to an external cloud to leverage the hybrid cloud model.

Physical Infrastructure

[Figure: Modern data center infrastructure diagram, physical infrastructure layer]

Foundation layer of the data center infrastructure

Physical components are: compute systems, storage, and network devices; they require operating systems, system software, and protocols for their functions

Executes the requests generated by the virtual and software-defined layers

Notes

The physical infrastructure forms the foundation layer of a data center. It includes equipment such as compute systems, storage systems, and networking devices, along with the operating systems, system software, protocols, and tools that enable the physical equipment to perform its functions. A key function of the physical infrastructure is to execute the requests generated by the virtual and software-defined infrastructure. Additional functions include storing data on the storage devices, performing compute-to-compute communication, executing programs on compute systems, and creating backup copies of data.

Virtual Infrastructure

[Figure: Modern data center infrastructure diagram, virtual infrastructure layer]

Virtualization abstracts physical resources and creates virtual resources.

Virtual components: virtual compute, virtual storage, and virtual network.

Created from physical resource pools using virtualization software

Benefits of virtualization:

 Resource consolidation and multitenant environment
 Improved resource utilization and increased ROI
 Flexible resource provisioning and rapid elasticity

Notes

Virtualization is the process of abstracting physical resources, such as compute, storage, and network, and creating virtual resources from them. Virtualization is achieved by using virtualization software that is deployed on compute systems, storage systems, and network devices.

Virtualization software aggregates physical resources into resource pools from which it creates virtual resources. A resource pool is an aggregation of computing resources, such as processing power, memory, storage, and network bandwidth.

For example, storage virtualization software pools the capacity of multiple storage devices to create a single large storage capacity. Similarly, compute virtualization software pools the processing power and memory capacity of physical compute systems, creating an aggregation of the power of all processors (in megahertz) and all memory (in megabytes). Examples of virtual resources include virtual compute (virtual machines), virtual storage (LUNs), and virtual networks.
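
The pooling arithmetic above can be made concrete with a short Python sketch. The host sizes are invented for illustration. The pool is only an accounting view: in typical compute virtualization, each virtual machine still runs on a single physical host, and this sketch does not model placement.

```python
# Hypothetical host inventory: processor power in MHz, memory in MB.
hosts = [
    {"name": "host-1", "cpu_mhz": 2 * 8 * 2_400, "mem_mb": 256 * 1024},  # 2 sockets x 8 cores x 2.4 GHz
    {"name": "host-2", "cpu_mhz": 2 * 8 * 2_400, "mem_mb": 256 * 1024},
    {"name": "host-3", "cpu_mhz": 1 * 16 * 3_000, "mem_mb": 512 * 1024},
]


def pool(hosts: list[dict]) -> dict[str, int]:
    """Aggregate processor power (MHz) and memory (MB) across all hosts."""
    return {
        "cpu_mhz": sum(h["cpu_mhz"] for h in hosts),
        "mem_mb": sum(h["mem_mb"] for h in hosts),
    }


def allocate_vm(resource_pool: dict[str, int], cpu_mhz: int, mem_mb: int) -> None:
    """Carve a virtual machine's share out of the pooled capacity."""
    if cpu_mhz > resource_pool["cpu_mhz"] or mem_mb > resource_pool["mem_mb"]:
        raise ValueError("insufficient pooled capacity")
    resource_pool["cpu_mhz"] -= cpu_mhz
    resource_pool["mem_mb"] -= mem_mb


cluster = pool(hosts)
print(cluster)  # {'cpu_mhz': 124800, 'mem_mb': 1048576}
allocate_vm(cluster, cpu_mhz=4_800, mem_mb=16_384)
print(cluster)  # {'cpu_mhz': 120000, 'mem_mb': 1032192}
```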

Virtualization enables a single hardware resource to support multiple concurrent instances of systems, or multiple hardware resources to support a single instance of a system. For example, a single disk drive can be partitioned and presented as multiple disk drives to a compute system. Similarly, multiple disk drives can be concatenated and presented as a single disk drive to a compute system.
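
The two mappings in the example above, partitioning one disk and concatenating several, reduce to simple address arithmetic. This Python sketch is a hypothetical illustration that works in abstract block numbers. `partition` and `concat_lookup` are invented names, not functions from any volume manager.

```python
# Hypothetical block-level sketch; all sizes and addresses are in blocks.

def partition(disk_size: int, sizes: list[int]) -> list[tuple[int, int]]:
    """Split one disk into (start, size) extents, each presented as a separate disk."""
    if sum(sizes) > disk_size:
        raise ValueError("partitions exceed disk size")
    extents, start = [], 0
    for size in sizes:
        extents.append((start, size))
        start += size
    return extents


def concat_lookup(disk_sizes: list[int], lba: int) -> tuple[int, int]:
    """Map an address on a concatenated volume to (disk index, address on that disk)."""
    for index, size in enumerate(disk_sizes):
        if lba < size:
            return index, lba
        lba -= size
    raise IndexError("address beyond concatenated capacity")


print(partition(1000, [400, 600]))            # [(0, 400), (400, 600)]
print(concat_lookup([500, 500, 1000], 750))   # (1, 250)
print(concat_lookup([500, 500, 1000], 1200))  # (2, 200)
```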

Note: While deploying a data center, an organization may choose not to deploy virtualization. In such an environment, the software-defined layer is deployed directly over the physical infrastructure. It is also possible for part of the infrastructure to be virtualized while the rest is not.

Software-Defined Infrastructure

[Figure: Modern data center infrastructure diagram, software-defined infrastructure layer]

Deployed either on virtual layer or on physical layer

All infrastructure components are virtualized and aggregated into pools.

 Underlying resources are abstracted from applications
 Enables ITaaS

Centralized, automated, and policy-driven management and delivery of heterogeneous resources

Components:

 Software-defined compute
 Software-defined storage
 Software-defined network

Notes

The software-defined infrastructure layer is deployed either on the virtual layer or on the physical layer. In the software-defined approach, all infrastructure components are virtualized and aggregated into pools. This layer abstracts all underlying resources from applications.

The software-defined approach enables ITaaS, in which consumers provision all infrastructure components as services. It centralizes and automates the management and delivery of heterogeneous resources based on policies. The key architectural components in the software-defined approach include software-defined compute (equivalent to compute virtualization), software-defined storage (SDS), and software-defined network (SDN).

Orchestration

[Figure: Modern data center infrastructure diagram, orchestration layer]

Component: orchestration software, which provides:

 Workflows for executing automated tasks
 Interaction with various components across layers and functions to invoke provisioning tasks

Notes

The orchestration layer includes the orchestration software. The key function of this layer is to provide workflows for executing automated tasks to accomplish a desired outcome. Workflow refers to a series of interrelated tasks that perform a business operation. The orchestration software enables this automated arrangement, coordination, and management of the tasks. This function helps to group and sequence tasks with dependencies among them into a single, automated workflow.

Each service listed in the service catalog has an associated orchestration workflow. When a service is selected from the service catalog, the associated workflow in the orchestration layer is triggered. Based on this workflow, the orchestration software interacts with the components across the software-defined layer and the BC, security, and management functions, and these entities execute the provisioning tasks.
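
Grouping and sequencing dependent tasks into one workflow is essentially a dependency-ordering problem. The Python sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to order tasks. The workflow name, the task names, and the `run` placeholder are hypothetical. They stand in for the calls an orchestrator would make to the software-defined layer and to the BC and security functions.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow definitions keyed by catalog service.
# Each task maps to the set of tasks that must finish before it starts.
WORKFLOWS: dict[str, dict[str, set[str]]] = {
    "provision-vm-with-storage": {
        "create_vm": set(),                           # software-defined compute
        "allocate_lun": set(),                        # software-defined storage
        "attach_lun": {"create_vm", "allocate_lun"},
        "configure_backup": {"attach_lun"},           # business continuity function
        "apply_firewall_rules": {"create_vm"},        # security function
    },
}


def run(task: str) -> None:
    # Placeholder: a real orchestrator would call the relevant component here.
    print(f"executing {task}")


def trigger(service: str) -> list[str]:
    """Run the workflow associated with a catalog service in dependency order."""
    order = list(TopologicalSorter(WORKFLOWS[service]).static_order())
    for task in order:
        run(task)
    return order


executed = trigger("provision-vm-with-storage")
```

Independent tasks, such as `create_vm` and `allocate_lun`, may run in either order. Only the dependency constraints are guaranteed.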

Services

[Figure: Modern data center infrastructure diagram, services layer]

Delivers IT resources as services to users:

 Enables users to achieve desired business results
 Users have no liabilities associated with owning the resources

Components:

 Service catalog
 Self-service portal

Functions of service layer:

 Stores service information in service catalog and presents them to the users
 Enables users to access services using a self-service portal

Notes

Similar to a cloud service, an IT service is a means of delivering IT resources to the end users to enable them to achieve the desired business results and outcomes without having any liabilities such as risks and costs associated with owning the resources. Examples of services are application hosting, storage capacity, file services, and email. The service layer is accessible to applications and end users.

This layer includes a service catalog that presents information about all the IT resources being offered as services. The service catalog is a database of information about the services, including their descriptions, service types, cost, supported SLAs, and security mechanisms.

Provisioning and management requests are passed on to the orchestration layer, where the orchestration workflows that fulfill the requests are defined.
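
A service catalog entry and the hand-off from the self-service portal can be sketched as a small data structure. In this hypothetical Python sketch, the field names mirror the catalog contents listed above: description, type, cost, supported SLA, and security mechanisms. The entries, prices, and workflow names are invented placeholders, and the portal function is an assumption about how a request might be packaged for the orchestration layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    description: str
    service_type: str
    monthly_cost: float        # illustrative price, not real pricing
    sla_availability: str      # supported SLA, e.g. "99.9%"
    security: tuple[str, ...]  # security mechanisms included with the service
    workflow: str              # orchestration workflow triggered on request


# Hypothetical catalog contents.
CATALOG = {
    "vm-with-storage": CatalogEntry(
        "Virtual machine with 100 GB of block storage", "compute",
        120.0, "99.9%", ("firewall", "anti-virus"), "provision-vm-with-storage"),
    "email": CatalogEntry(
        "Hosted email mailbox", "application",
        5.0, "99.5%", ("anti-virus",), "provision-mailbox"),
}


def self_service_portal(user: str, service: str) -> dict[str, str]:
    """Validate a user's selection and package it for the orchestration layer."""
    if service not in CATALOG:
        raise KeyError(f"{service!r} is not offered in the service catalog")
    entry = CATALOG[service]
    # The portal provisions nothing itself; it only passes the request on.
    return {"user": user, "service": service,
            "workflow": entry.workflow, "sla": entry.sla_availability}


request = self_service_portal("alice", "vm-with-storage")
print(request["workflow"])  # provision-vm-with-storage
```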

Business Continuity

[Figure: Modern data center infrastructure diagram, business continuity cross-layer function]

Ensures the availability of services in line with the SLA

Supports all the layers to provide uninterrupted services

Includes the adoption of measures to mitigate the impact of downtime

Measure       Description
Proactive      Business impact analysis
               Risk assessment
               Technology solutions deployment (backup and replication)
Reactive       Disaster recovery
               Disaster restart

Notes

The business continuity (BC) cross-layer function specifies the adoption of proactive and reactive measures that enable an organization to mitigate the impact of downtime due to planned and unplanned outages.

The proactive measures include activities and processes such as business impact
analysis, risk assessment, and technology solutions such as backup, archiving, and
replication.

The reactive measures include activities and processes such as disaster recovery
and disaster restart to be invoked in the event of a service failure.

This function supports all the layers (physical, virtual, software-defined, orchestration, and services) to provide uninterrupted services to the consumers.

The BC cross-layer function of a cloud infrastructure enables a business to ensure the availability of services in line with the service level agreement (SLA).

Security

[Figure: Modern data center infrastructure diagram, security cross-layer function]

Supports all the layers to provide secure services

Specifies the adoption of administrative mechanisms

 Security and personnel policies
 Standard procedures to direct safe execution of operations

Specifies the adoption of technical mechanisms

 Firewall
 Intrusion detection and prevention systems
 Anti-virus

Security mechanisms enable an organization to meet governance, risk, and compliance (GRC) requirements

Notes

The security cross-layer function supports all the infrastructure layers (physical, virtual, software-defined, orchestration, and services) to provide secure services to the consumers. Security specifies the adoption of administrative and technical mechanisms that mitigate or minimize security threats and provide a secure data center environment.

Administrative mechanisms include security and personnel policies or standard procedures to direct the safe execution of various operations. Technical mechanisms are usually implemented through tools or devices deployed on the IT infrastructure. Examples of technical mechanisms include firewalls, intrusion detection and prevention systems, and anti-virus software.


Governance, risk, and compliance (GRC) specify processes that help an
organization ensure that its acts are ethically correct and in accordance with
its risk appetite (the risk level an organization chooses to accept), internal
policies, and external regulations.

Security mechanisms should be deployed to meet the GRC requirements. Security
and GRC are covered in the module ‘Storage Infrastructure Security’.


Management

[Figure: Modern data center infrastructure layers and cross-layer functions]

Enables the following:
 Storage infrastructure configuration and capacity provisioning
 Problem resolution
 Capacity and availability management
 Compliance conformance
 Monitoring services

Notes

The management cross-layer function specifies the adoption of activities related to
data center operations management. Adoption of these activities enables an
organization to align the creation and delivery of IT services with its business
objectives. This course focuses on storage infrastructure management.

Storage operation management enables IT administrators to manage the data
center infrastructure and services. Storage operation management tasks include
handling infrastructure configuration, resource provisioning, problem resolution,
capacity, availability, and compliance conformance.

This function supports all the layers to perform monitoring, management, and
reporting for the entities of the infrastructure.


Do-It-Yourself Infrastructure

[Figure: DIY infrastructure assembled from Vendor A and Vendor B products, such as routers, switches, load balancers, storage, and servers]

In the Do-It-Yourself (DIY) approach, organizations integrate best-in-class
infrastructure components, including hardware and software, that are purchased
from different vendors.

This approach enables organizations to use the advantages of high-quality
products and services from the respective leading vendors. It provides specific
functions, with more options and configurations for organizations to build their
cloud infrastructure.

Using the DIY approach, you can build the infrastructure for cloud in two ways:

 Greenfield
 Brownfield

Notes

Two do-it-yourself approaches are:


Greenfield Method

Greenfield environments enable architects to design exactly what is required to
meet the business needs using new infrastructure that is built specifically for a
purpose. Greenfield environments can avoid some of the older and less efficient
processes, rules, methods, misconfigurations, constraints, and bottlenecks that
exist in the current environment. Greenfield environments also have the added
benefit of enabling a business to migrate infrastructure to a different technology or
vendor and to build in technologies that help avoid future lock-in. But greenfield
environments also have some downsides, such as higher cost, lack of staff
expertise, and possibly increased implementation time.

Brownfield Method

Brownfield involves upgrading or adding new cloud infrastructure elements to the
already existing infrastructure. This method allows organizations to repurpose the
existing infrastructure components, providing a cost benefit. At the same time, the
organization may face integration issues, which can compromise the stability of the
overall system. Existing infrastructure or processes, such as resource type,
available capacity, provisioning processes, and resource management, may place
extra constraints on the architect’s design. These constraints may negatively affect
performance or functionality.


Converged and Hyper-Converged Infrastructure

There are two types of converged systems:

Converged Infrastructure (CI)

CI brings together distinct infrastructure components into a single
package, including compute, network, storage, virtualization, and
management. CI systems are hardware-focused systems in which the compute
systems access storage over a SAN.

The infrastructure components are integrated, tested, optimized, and delivered to
the customers as a single block. This solution offers single management software
capable of managing all of the components within the package.

Hyper-converged Infrastructure (HCI)

HCI offers efficiency using modular building blocks that are known as
nodes. A node consists of a server with Direct Attached Storage. HCI systems
are software-defined systems that decouple the compute, storage, and
networking functions and run these functions on a common set of
physical resources. Unlike converged infrastructure, they do not have a physical
Storage Area Network (SAN) or a distinct physical storage controller.

The storage controller function runs as a software-based service on each compute
system.

Concepts in Practice Lesson

Introduction

This section highlights technologies that are relevant to the topics covered in this
module.

This lesson covers the following topics:

 Dell EMC VxBlock
 Dell EMC VxRail
 Dell EMC VxRack FLEX
 Dell EMC VxRack SDDC
 Dell EMC PowerEdge Server
 Dell EMC XC Series Appliance
 Dell Wyse Thin Clients
 VMware Horizon
 VMware ESXi
 VMware Cloud Foundation

Concepts in Practice

Dell EMC VxBlock

VxBlock simplifies all aspects of IT and enables customers to modernize their
infrastructure and achieve better business outcomes faster. By seamlessly
integrating enterprise-class compute, network, storage, and virtualization
technologies, it delivers the most advanced converged infrastructure. It is designed
to support large-scale consolidation, peak performance, and high availability for
traditional and cloud-based workloads. It is a converged system optimized for data
reduction and copy data management. Customers can quickly deploy, easily scale,
and simply and effectively manage their systems. With its all-flash design,
enterprise features, and support for a broad spectrum of general-purpose
workloads, VxBlock delivers on both midrange and enterprise requirements.

Dell EMC VxRail

 Consists of the following software:
 VMware vSphere (ESXi, vCenter)
 VxRail Manager
 VMware vSAN
 Consists of the following hardware:
 Nodes based on industry-leading PowerEdge servers
 High-density general-purpose nodes
 Designed, purchased, and supported as one product
 Fastest-growing hyper-converged system
 Transforms VMware infrastructures by simplifying IT operations
 Accelerates transformation
 Drives operational efficiency
 Lowers capital and operational costs

Dell EMC VxRail Appliances are the fastest-growing hyper-converged systems
worldwide. They are the standard for transforming VMware infrastructures,
dramatically simplifying IT operations while lowering overall capital and operational
costs.

It is important to remember that although VxRail is composed of many
industry-standard components, it is treated as a single entity. You don’t need to
worry about updating VMware or the PowerEdge microcode; VxRail handles all of
that. This makes VxRail the simplest way to stand up VMware clusters. The details
can make VxRail seem more complex than it is. VxRail gives you VMware clusters:
you can run whatever runs on a normal VMware cluster on a VxRail.

VxRail Appliances accelerate transformation and reduce risk with automated
lifecycle management. For example, after deployment, users can apply software
and firmware updates with a single click.

VxRail drives operational efficiency for a 30% TCO advantage versus HCI systems
built using vSAN Ready Nodes. It unifies support for all VxRail hardware and
software, delivering 42% lower total cost of serviceability. VxRail is engineered,
manufactured, managed, supported, and sustained as ONE for single end-to-end
lifecycle support. It is fully loaded with enterprise data services for built-in data
protection, cloud storage, and disaster recovery.

Dell EMC VxRack FLEX

VxRack FLEX is a Dell EMC engineered and manufactured rack-scale hyper-converged
system that delivers an unmatched combination of performance, resiliency, and
flexibility to address enterprise data center needs. VxRack FLEX creates a
server-based SAN by combining virtualization software, known as VxFlex OS, with
Dell EMC PowerEdge servers to deliver flexible, scalable performance and capacity
on demand. Local storage resources are combined to create a virtual pool of block
storage with varying performance tiers.

The architecture enables you to scale from as few as four nodes to over a
thousand nodes. In addition, it provides enterprise-grade data protection,
multitenant capabilities, and add-on enterprise features such as QoS, thin
provisioning, and snapshots. VxRack FLEX delivers the scalability, flexibility,
performance, and time-to-value required to meet the demands of the modern
enterprise data center.

Dell EMC VxRack SDDC

VxRack SDDC is the ultimate infrastructure foundation for realizing a multi-cloud
vision. It creates IT certainty, improves service outcomes, and reduces operational
risk by leveraging known, trusted technologies and operational processes.
Optimized for predictable performance, scalability, optimal user experience, and
cost savings, VxRack SDDC delivers the simplest path to hybrid cloud with an
automated elastic cloud infrastructure at rack scale. The industry’s most advanced
integrated system for VMware Cloud Foundation, VxRack SDDC is a hyper-converged
rack-scale system engineered with automation and serviceability extensions,
offering integrated end-to-end lifecycle management and 24x7 single-vendor
support.

 Easily creates a foundation for a complete VMware private cloud
 Fully integrated with VMware vSphere, vSAN, and NSX
 Includes physical and virtual network infrastructure for multi-rack scaling and
growth
 Automated management and serviceability extensions integrated with VMware
Cloud Foundation for single-pane-of-glass management
 Full lifecycle management and support for the entire engineered system

Dell EMC PowerEdge Server

As the foundation for a complete, adaptive, and scalable solution, the 13th
generation of Dell EMC PowerEdge servers delivers outstanding operational
efficiency and top performance at any scale. It increases productivity with
processing power, exceptional memory capacity, and highly scalable internal
storage. PowerEdge servers provide insight from data, support environment
virtualization, and enable a mobile workforce. Major benefits of PowerEdge
servers are:

 Scalable Business Architecture: Maximizes performance across the widest
range of applications with highly scalable architectures and flexible internal
storage.
 Intelligent Automation: Automates the entire server lifecycle from deployment
to retirement with embedded intelligence that dramatically increases
productivity.
 Integrated Security: Protects customers and business with a deep layer of
defense built into the hardware and firmware of every server.

Dell EMC XC Series Appliance

The XC Series is a hyper-converged appliance that integrates Dell EMC PowerEdge
servers, Nutanix software, and a choice of hypervisors to run any virtualized
workload. It is ideal for enterprise business applications, server virtualization,
hybrid or private cloud projects, and virtual desktop infrastructure (VDI). Users can
deploy an XC Series cluster in 30 minutes and manage it without specialized IT
resources. The XC Series makes managing infrastructure efficient with a unified
HTML5-based management interface, enterprise-class data management
capabilities, cloud integration, and comprehensive diagnostics and analytics.

The features of Dell EMC XC Series are:

 Available in flexible combinations of CPU, memory, and SSD/HDD
 Includes thin provisioning and cloning, replication, and tiering
 Validated, tested, and supported globally by Dell EMC
 Able to grow one node at a time with nondisruptive, scale-out expansion

Dell Wyse Thin Clients

Dell offers a wide selection of secure, reliable, cost-effective Wyse thin clients
designed to integrate into any virtualized or web-based infrastructure, while
meeting the budget and performance requirements for any application. Wyse thin
and zero clients are built for easy integration into VDI or web-based environments,
with instant, hands-free operation and performance that meets demands. They
simplify security and scalability with simple deployment and remote management
in an elegant, space-saving design, and they are malware-resistant and tailored for
Citrix, Microsoft, and VMware.


VMware Horizon

VMware Horizon is a VDI solution for delivering virtualized or hosted desktops and
applications through a single platform to the end users. These desktop and
application services, including RDS, hosted apps, packaged apps with VMware
ThinApp, and SaaS apps, can all be accessed from one unified workspace across
devices and locations. Horizon provides IT with a streamlined approach to deliver,
protect, and manage desktops and applications while containing costs and
ensuring that end users can work anytime, anywhere, on any device. Horizon
supports both Windows and Linux-based desktops.

VMware ESXi

VMware ESXi is a bare-metal hypervisor. ESXi has a compact architecture that is
designed for integration directly into virtualization-optimized compute system
hardware, enabling rapid installation, configuration, and deployment. ESXi
abstracts processor, memory, storage, and network resources into multiple VMs
that run unmodified operating systems and applications. The ESXi architecture
comprises an underlying operating system, called VMkernel, that provides a means
to run management applications and VMs. VMkernel controls all hardware
resources on the compute system and manages resources for the applications. It
provides core OS functionality, such as process management, file system, resource
scheduling, and device drivers.

VMware Cloud Foundation

VMware Cloud Foundation makes it easy to deploy and run a hybrid cloud. It
provides integrated cloud infrastructure (compute, storage, networking, and
security) and cloud management services to run enterprise applications in both
private and public environments.

Cloud Foundation provides a complete set of software-defined services for
compute, storage, networking, security, and cloud management to run
enterprise apps, traditional or containerized, in private or public environments.
Cloud Foundation simplifies the path to the hybrid cloud by delivering a single
integrated solution that is easy to operate, with integrated automated lifecycle
management. Cloud Foundation is built on VMware’s leading hyperconverged
architecture (vSAN) with all-flash performance and enterprise-class storage
services, including deduplication, compression, and erasure coding. vSAN
implements a hyperconverged storage architecture that delivers elastic storage
and drastically simplifies storage management.

Cloud Foundation delivers end-to-end security for all applications:
microsegmentation, distributed firewalls, and VPN (NSX); VM, hypervisor, and
vMotion encryption (vSphere); and data-at-rest, cluster, and storage encryption
(vSAN).

Cloud Foundation delivers self-driving operations (vRealize Operations, vRealize
Log Insight) from applications to infrastructure to help organizations plan, manage,
and scale their SDDC. Users can perform application-aware monitoring and
troubleshooting along with automated and proactive workload management,
balancing, and remediation. It automatically deploys all of the building blocks of the
Software-Defined Data Center: compute, storage, networking, and cloud
management.


Assessment

1. Which cross-layer function enables an organization to mitigate the impact of
downtime?

A. Security

B. Service

C. Business continuity

D. Management

2. Which layer function provides workflows for executing automated tasks to
accomplish a desired outcome?

A. Orchestration

B. Security

C. Services

D. Management

Summary



Intelligent Storage Systems

Introduction

This module focuses on the key components of an intelligent storage system. This
module also focuses on storage subsystems and provides details on components,
addressing, and performance parameters of a hard disk drive (HDD), solid-state
drive (SSD), and hybrid storage drives. Then, this module focuses on RAID
techniques and their use to improve performance and protection. Finally, this
module focuses on the types of intelligent storage systems and their architectures.

Upon completing this module, you will be able to:

 Describe the key components of an intelligent storage system
 Describe hard disk drive, solid-state drive, and hybrid drive components
 Describe RAID techniques
 Discuss the types of intelligent storage systems

Components of Intelligent Storage Systems Lesson

Introduction

This lesson covers components of intelligent storage systems. This lesson also
covers components, addressing, and performance of hard disk drives, solid-state
drives, and hybrid drives.

This lesson covers the following topics:

 List the components of intelligent storage systems
 Explain hard disk drive, solid-state drive, and hybrid drive components


Video: Components of Intelligent Storage System

The video is located at
https://edutube.emc.com/Player.aspx?vno=8xUkRMX6cXIsfToijT6gaw


What Is an Intelligent Storage System?

Definition: Intelligent Storage System

A feature-rich storage array that provides highly optimized I/O
processing capabilities.

 Has a purpose-built operating environment that provides intelligent resource
management capability
 Provides a large amount of cache
 Provides multiple I/O paths

Key Features of ISS:

 Supports a combination of HDDs and SSDs
 Services a massive number of IOPS
 Scale-out architecture
 Deduplication, compression, and encryption
 Automated storage tiering
 Virtual storage provisioning
 Supports APIs to integrate with SDDC and cloud
 Data protection

Notes

Intelligent storage systems are feature-rich storage arrays that provide highly
optimized I/O processing capabilities. These intelligent storage systems have the
capability to meet the requirements of today’s I/O intensive modern applications.
These applications require high levels of performance, availability, security, and
scalability.

Therefore, to meet the requirements of the applications, many vendors of intelligent
storage systems now support SSDs, hybrid drives, encryption, compression,
deduplication, and scale-out architecture.

The storage systems have an operating environment that intelligently and optimally
handles the management, provisioning, and utilization of storage resources. The
storage systems are configured with a large amount of memory (called cache) and
multiple I/O paths and use sophisticated algorithms to meet the requirements of
performance-sensitive applications. The storage systems also support various
technologies such as automated storage tiering and virtual storage provisioning.
These capabilities have added a new dimension to storage system performance.
Further, the intelligent storage systems support APIs to enable integration with
SDDC and cloud environments.

ISS Components

[Figure: Intelligent storage system consisting of storage controller(s) and storage]

Two key components of an ISS:

Controller:
 Block-based
 File-based
 Object-based
 Unified

Storage:
 All HDDs
 All SSDs
 Combination of both

Notes

An intelligent storage system has two key components: controller and storage. A
controller is a compute system that runs a purpose-built operating system that is
responsible for performing several key functions for the storage system. Examples
of such functions are serving I/Os from the application servers, storage
management, RAID protection, local and remote replication, provisioning storage,
automated tiering, data compression, data encryption, and intelligent cache
management. An intelligent storage system typically has more than one controller
for redundancy. Each controller consists of one or more processors and a certain
amount of cache memory to process a large number of I/O requests. These
controllers are connected to the compute system either directly or via a storage
network. The controllers receive I/O requests from the compute systems and read
data from, or write data to, the storage. Depending on the data access method
used by a storage system, the controller can be classified as block-based,
file-based, object-based, or unified. A storage system can have all hard disk drives,
all solid-state drives, or a combination of both.


Hard Disk Drive Components

A hard disk drive is a persistent storage device that stores and retrieves data using
rapidly rotating disks (platters) coated with magnetic material.

[Figure: Hard disk drive showing the controller board, head disk assembly (HDA), platter and read/write head, interface, and power connectors]

The key components of a hard disk drive (HDD) are:

 Platter
 Spindle
 Read/write head
 Actuator arm assembly
 Controller board

Notes

I/O operations in hard drives are performed by rapidly moving the arm across the
rotating flat platters that are coated with magnetic material.

Data is transferred between the disk controller and magnetic platters through the
read/write (R/W) head, which is attached to the arm. Data can be recorded and
erased on magnetic platters any number of times.


Platter

A typical hard disk drive consists of one or more flat circular disks called platters.
The data is recorded on these platters in binary code (0s and 1s). The set of
rotating platters is sealed in a case, called Head Disk Assembly (HDA). A platter is
a rigid, round disk coated with magnetic material on both surfaces (top and
bottom).

The data is encoded by polarizing the magnetic area or domains of the disk
surface. Data can be written to or read from both surfaces of the platter. The
number of platters and the storage capacity of each platter determine the total
capacity of the drive.

Spindle

A spindle connects all the platters and is connected to a motor. The spindle
motor rotates at a constant speed. The disk platter spins at a speed of several
thousand revolutions per minute (rpm).

Read/Write head

Read/write (R/W) heads read and write data from or to the platters. Drives have
two R/W heads per platter, one for each surface of the platter. The R/W head
changes the magnetic polarization on the surface of the platter when writing data.
While reading data, the head detects the magnetic polarization on the surface of
the platter.

During reads and writes, the R/W head senses the magnetic polarization and never
touches the surface of the platter. When the spindle rotates, a microscopic air gap
is maintained between the R/W heads and the platters, known as the head flying
height. This air gap is removed when the spindle stops rotating and the R/W head
rests on a special area on the platter near the spindle. This area is called the
landing zone.

Actuator Arm Assembly

R/W heads are mounted on the actuator arm assembly, which positions the R/W
head at the location on the platter where the data needs to be written or read. The
R/W heads for all platters on a drive are attached to one actuator arm assembly
and move across the platters simultaneously.


Drive Controller Board

The controller is a printed circuit board, mounted at the bottom of a disk drive. It
consists of a microprocessor, internal memory, circuitry, and firmware.

The firmware controls the power supplied to the spindle motor as well as the
speed of the motor. It also manages the communication between the drive and
the compute system.

In addition, it controls the R/W operations by moving the actuator arm and
switching between different R/W heads, and performs the optimization of data
access.


Physical Disk Structure and Logical Block Addressing

[Figure: Physical disk structure showing the spindle, platters, tracks, sectors, and cylinders]

In the illustration, the drive shows eight sectors per track, six heads, and four cylinders. This means
a total of 8 × 6 × 4 = 192 blocks. The block number ranges from 0 to 191. Each block has its own
unique address. Assuming that each sector holds 512 bytes, a 500 GB drive with a formatted capacity
of 465.7 GB has in excess of 976,000,000 blocks.
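The block arithmetic in the illustration can be checked with a short Python sketch. Like the text, it assumes 512-byte sectors and a uniform geometry. It also assumes that the "500 GB" figure uses decimal units (10^9 bytes per GB), which would account for the same drive appearing as roughly 465.7 GB when capacity is reported in binary units (2^30 bytes). The function names are illustrative only.

```python
SECTOR_SIZE = 512  # bytes per sector, as assumed in the text


def total_blocks(sectors_per_track: int, heads: int, cylinders: int) -> int:
    """Every (cylinder, head, sector) combination is one addressable block."""
    return sectors_per_track * heads * cylinders


def blocks_for_capacity(capacity_bytes: int, sector_size: int = SECTOR_SIZE) -> int:
    """Number of whole sectors that fit in the given number of bytes."""
    return capacity_bytes // sector_size


# Geometry from the illustration: 8 sectors/track, 6 heads, 4 cylinders.
n = total_blocks(8, 6, 4)
print(n, "blocks, numbered 0 to", n - 1)  # 192 blocks, numbered 0 to 191

# Assumption: "500 GB" is a decimal figure (1 GB = 10**9 bytes).
capacity = 500 * 10**9
print(blocks_for_capacity(capacity))  # 976562500
# The same byte count expressed in binary units (2**30 bytes each), rounded.
print(round(capacity / 2**30, 1))  # 465.7
```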

Notes

Data on the disk is recorded on tracks, which are concentric rings on the platter
around the spindle. The tracks are numbered, starting from zero, from the outer
edge of the platter. The number of tracks per inch (TPI) on the platter (or the track
density) measures how tightly the tracks are packed on a platter.

Each track is divided into smaller units called sectors. A sector is the smallest,
individually addressable unit of storage. The track and sector structure is written on
the platter by the drive manufacturer using a low-level formatting operation. The
number of sectors per track varies according to the drive type. Typically, a sector
holds 512 bytes of user data. Besides user data, a sector also stores other
information, such as the sector number, head number or platter number, and track
number. This information helps the controller to locate the data on the drive.

A cylinder is a set of identical tracks on both surfaces of each drive platter. The
location of R/W heads is referred to by the cylinder number, not by the track
number. Earlier drives used physical addresses consisting of cylinder, head, and
sector (CHS) number. These addresses referred to specific locations on the disk,
and the OS had to be aware of the geometry of each disk used.


Logical block addressing (LBA) has simplified the addressing by using a linear
address to access physical blocks of data. The disk controller translates LBA to a
CHS address; the compute system needs to know only the size of the disk drive in
terms of the number of blocks. The logical blocks are mapped to physical sectors
on a 1:1 basis.
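The LBA-to-CHS translation can be illustrated with the classic formula LBA = (C × heads + H) × sectors per track + (S − 1). The Python sketch below is a simplified model, not any drive's firmware. It assumes a uniform geometry (the same number of sectors on every track) and 1-based sector numbers, as in the traditional CHS convention. Real drives vary the number of sectors per track across zones, so their internal mapping is more complex. The type and function names are hypothetical.

```python
from typing import NamedTuple


class Geometry(NamedTuple):
    cylinders: int
    heads: int              # heads per cylinder (one per platter surface)
    sectors_per_track: int


class CHS(NamedTuple):
    cylinder: int
    head: int
    sector: int             # 1-based, following the classic CHS convention


def chs_to_lba(addr: CHS, geo: Geometry) -> int:
    """Linear block number for a cylinder/head/sector triple."""
    if not (0 <= addr.cylinder < geo.cylinders
            and 0 <= addr.head < geo.heads
            and 1 <= addr.sector <= geo.sectors_per_track):
        raise ValueError(f"address {addr} is outside the drive geometry")
    return (addr.cylinder * geo.heads + addr.head) * geo.sectors_per_track + (addr.sector - 1)


def lba_to_chs(lba: int, geo: Geometry) -> CHS:
    """Inverse mapping: the kind of translation the disk controller performs."""
    total = geo.cylinders * geo.heads * geo.sectors_per_track
    if not 0 <= lba < total:
        raise ValueError(f"LBA {lba} is outside 0..{total - 1}")
    cylinder, rest = divmod(lba, geo.heads * geo.sectors_per_track)
    head, sector_index = divmod(rest, geo.sectors_per_track)
    return CHS(cylinder, head, sector_index + 1)


# Geometry from the illustration: 4 cylinders, 6 heads, 8 sectors per track.
geo = Geometry(cylinders=4, heads=6, sectors_per_track=8)
print(chs_to_lba(CHS(0, 0, 1), geo))  # 0   (first block)
print(chs_to_lba(CHS(3, 5, 8), geo))  # 191 (last block)
print(lba_to_chs(100, geo))           # CHS(cylinder=2, head=0, sector=5)
```

The compute system only ever sees the numbers 0 to 191; the geometry stays hidden behind `lba_to_chs`. That separation is what lets the OS work with a drive knowing only its size in blocks.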


HDD Performance

 A disk drive is an electromechanical device that governs the overall
performance of the storage system environment
 The various factors that affect the performance of disk drives are:
 Seek time
 Rotational latency
 Disk transfer rate
 Disk service time = Seek time + Rotational latency + Data transfer time
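As a rough worked example of the disk service time formula, the Python sketch below adds the three components for a single I/O. The input values (a 5 ms average seek, a 15,000-rpm spindle, a 4 KB I/O, and a 40 MB/s transfer rate) are assumptions chosen for illustration, not the specifications of any particular drive. The sketch also assumes decimal units, so that a size in KB divided by a rate in MB/s gives a time in milliseconds.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """One-half of the time for a full rotation (60,000 ms per minute / rpm)."""
    return 0.5 * 60_000 / rpm


def transfer_time_ms(io_size_kb: float, transfer_rate_mb_s: float) -> float:
    """Time to move one I/O's data; with decimal units, KB / (MB/s) = ms."""
    return io_size_kb / transfer_rate_mb_s


def disk_service_time_ms(avg_seek_ms: float, rpm: float,
                         io_size_kb: float, transfer_rate_mb_s: float) -> float:
    """Disk service time = seek time + rotational latency + data transfer time."""
    return (avg_seek_ms
            + avg_rotational_latency_ms(rpm)
            + transfer_time_ms(io_size_kb, transfer_rate_mb_s))


# Illustrative values only (assumptions, not a real drive's datasheet).
t = disk_service_time_ms(avg_seek_ms=5, rpm=15_000, io_size_kb=4, transfer_rate_mb_s=40)
print(f"{t:.1f} ms")  # 7.1 ms
```

With these numbers, seek time and rotational latency account for 7 of the 7.1 ms. Transferring the small 4 KB block adds only 0.1 ms, which is why small random I/Os are dominated by the mechanical components.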


Seek Time

[Figure: Radial movement of the R/W head across the platter]

 The time to position the read/write head
 The lower the seek time, the faster the I/O operation
 Seek time specifications include:
 Full stroke
 Average
 Track-to-track
 The drive manufacturer specifies the seek time of a disk

Notes

The seek time (also called access time) describes the time taken to position the
R/W heads across the platter with a radial movement (moving along the radius of
the platter). In other words, it is the time taken to position and settle the arm and
the head over the correct track. Therefore, the lower the seek time, the faster the
I/O operation.


Disk vendors publish the following seek time specifications:

 Full Stroke: It is the time taken by the R/W head to move across the entire width
of the disk, from the innermost track to the outermost track.
 Average: It is the average time taken by the R/W head to move from one
random track to another, normally listed as the time for one-third of a full stroke.
 Track-to-Track: It is the time taken by the R/W head to move between adjacent
tracks.

Each of these specifications is measured in milliseconds (ms). The seek time of a
disk is typically specified by the drive manufacturer. The average seek time on a
modern disk is typically in the range of 3 to 15 ms. Seek time has more impact on
I/O operations to random tracks than on I/O operations to adjacent tracks.

To minimize the seek time, data can be written to only a subset of the available
cylinders. This results in lower usable capacity than the actual capacity of the drive.
For example, a 500 GB disk drive that is set up to use only the first 40 percent of
the cylinders is effectively treated as a 200 GB drive. This is known as
short-stroking the drive.

Rotational Latency

 The time the platter takes to rotate and position the data under the R/W head
 Depends on the rotation speed of the spindle
 Average rotational latency: One-half of the time taken for a full rotation

Notes

To access data, the actuator arm moves the R/W head over the platter to a particular track while the platter spins to position the requested sector under the R/W head. The time taken by the platter to rotate and position the data under the R/W head is called rotational latency.

This latency depends on the rotation speed of the spindle and is measured in
milliseconds. The average rotational latency is one-half of the time taken for a full
rotation. Similar to the seek time, rotational latency has more impact on the
reading/writing of random sectors on the disk than on the same operations on
adjacent sectors.

Average rotational latency is approximately 5.6 ms for a 5,400-rpm drive and around 2 ms for a 15,000-rpm drive.
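Average rotational latency follows directly from the spindle speed. One full rotation takes 60,000/rpm milliseconds, and the average latency is half of that. The short Python sketch below reproduces the figures quoted above. The function name is illustrative only.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in ms: half the time of one full rotation."""
    full_rotation_ms = 60_000.0 / rpm  # 60,000 ms per minute / revolutions per minute
    return full_rotation_ms / 2


for rpm in (5_400, 7_200, 10_000, 15_000):
    print(f"{rpm:>6} rpm: {avg_rotational_latency_ms(rpm):.2f} ms")
```

A 5,400-rpm drive works out to about 5.56 ms, and a 15,000-rpm drive to exactly 2 ms.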

Data Transfer Rate

Average amount of data per unit time that the drive can deliver to the HBA:
 Internal transfer rate: Speed at which data moves from the surface of a platter to the internal buffer of the disk
 External transfer rate: Rate at which data moves through the interface to the HBA

Figure: Data path of a disk drive, from the head disk assembly to the buffer and then through the interface to the HBA. The internal transfer rate is measured between the head disk assembly and the buffer. The external transfer rate is measured at the interface.

Notes

The data transfer rate (also called transfer rate) refers to the average amount of
data per unit time that the drive can deliver to the HBA. In a read operation, the
data first moves from disk platters to R/W heads; then it moves to the drive’s
internal buffer. Finally, data moves from the buffer through the interface to the
compute system’s HBA.

In a write operation, the data moves from the HBA to the internal buffer of the disk
drive through the drive’s interface. The data then moves from the buffer to the R/W
heads. Finally, it moves from the R/W heads to the platters. The data transfer rates
during the R/W operations are measured in terms of internal and external transfer
rates.

Internal transfer rate is the speed at which data moves from a platter’s surface to
the internal buffer (cache) of the disk. The internal transfer rate takes into account
factors such as the seek time and rotational latency. External transfer rate is the
rate at which data can move through the interface to the HBA.

The external transfer rate is generally the advertised speed of the interface, such
as 133 MB/s for ATA.
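Disk service time combines three components: seek time, rotational latency, and the time needed to move the data at the transfer rate. The Python sketch below estimates service time and the resulting maximum IOPS of a single disk serving one I/O at a time. The inputs (5 ms average seek, 15,000 rpm, 4 KB I/O, 40 MB/s transfer rate) are hypothetical and chosen only for illustration. KB and MB are treated as decimal units.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    return 0.5 * 60_000.0 / rpm


def disk_service_time_ms(seek_ms: float, rpm: float,
                         io_size_kb: float, transfer_rate_mb_s: float) -> float:
    """Service time = seek time + rotational latency + data transfer time (all in ms)."""
    # KB / (MB/s) yields milliseconds when KB and MB are decimal (1 MB = 1,000 KB).
    transfer_time_ms = io_size_kb / transfer_rate_mb_s
    return seek_ms + avg_rotational_latency_ms(rpm) + transfer_time_ms


# Hypothetical drive and workload parameters.
service_ms = disk_service_time_ms(seek_ms=5, rpm=15_000, io_size_kb=4, transfer_rate_mb_s=40)
max_iops = 1_000.0 / service_ms  # I/Os per second if the disk serves one request at a time
print(f"service time = {service_ms:.1f} ms, max IOPS = {max_iops:.0f}")
```

With these numbers, the seek time and rotational latency account for almost all of the 7.1 ms. The transfer term becomes significant only for large I/Os.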

I/O Controller Utilization vs. Response Time

 Based on fundamental laws of disk drive performance
 For performance-sensitive applications, disks are commonly utilized below 70% of their I/O serving capability

Figure: Response time (ms) versus utilization (0% to 100%). Response time stays low while the queue size is low, then rises sharply beyond the knee of the curve at about 70% utilization.

Notes

The utilization of a disk I/O controller has a significant impact on the I/O response time. Consider a disk as a black box consisting of two elements: the queue and the disk I/O controller. The queue is the location where an I/O request waits before the I/O controller processes it, and the disk I/O controller processes the I/Os waiting in the queue one by one.

The I/O requests arrive at the controller at the rate generated by the application. The I/O arrival rate, the queue length, and the time taken by the I/O controller to process each request determine the I/O response time. If the controller is busy or heavily utilized, the queue size will be large and the response time will be high.

As the utilization reaches 100 percent, that is, as the I/O controller saturates, the response time approaches infinity. In essence, the saturated component, or bottleneck, forces the serialization of I/O requests: each I/O request must wait for the completion of the I/O requests that preceded it.
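A simple queuing model illustrates the relationship between utilization and response time described above. The sketch below assumes the classic single-queue, single-server (M/M/1) model, in which average response time R = S / (1 - U) for service time S and utilization U. Real controllers and workloads deviate from this model, and the 5 ms service time is a hypothetical value.

```python
def response_time_ms(service_time_ms: float, utilization: float) -> float:
    """Average response time under an M/M/1 queuing assumption: R = S / (1 - U)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in the range [0, 1)")
    return service_time_ms / (1.0 - utilization)


SERVICE_TIME_MS = 5.0  # hypothetical per-I/O service time

for u in (0.3, 0.5, 0.7, 0.8, 0.9, 0.95):
    print(f"U = {u:.0%}: R = {response_time_ms(SERVICE_TIME_MS, u):.1f} ms")
```

Under this model, response time is about 3.3 times the service time at 70 percent utilization but 10 times at 90 percent. This is consistent with the sharp rise past the knee of the curve.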

Solid State Drive Components

 Solid state drives (SSDs) are storage devices that contain non-volatile flash memory
 Internally, a solid state drive’s hardware architecture consists of the following
components:
 I/O interface
 Controller
 Mass storage
 The I/O interface enables connecting the power and data connectors to the
solid state drives.
 Solid state drives typically support standard connectors such as SATA, SAS, or
FC.

Figure: SSD hardware architecture. The I/O interfaces connect to the controller (drive controller, RAM cache, and non-volatile memory), which connects to mass storage made up of multiple flash memory chips.

Notes

Solid state drives are especially well suited for low-latency applications that require
consistent, low (less than 1 millisecond) read/write response times.

An HDD servicing small-block, highly concurrent, and random workloads incurs considerable rotational and seek latency, which significantly reduces throughput. Externally, solid state drives have the same physical format and connectors as mechanical hard disk drives. This uniformity maintains compatibility in both form and format with mechanical hard disk drives, and it allows a mechanical drive to be easily replaced with a solid state drive.

The controller includes a drive controller, RAM, and non-volatile memory (NVRAM).
The drive controller manages all drive functions.

The non-volatile RAM (NVRAM) is used to store the SSD’s operational software
and data. Not all SSDs have separate NVRAM. Some models store their programs
and data to the drive’s mass storage.

The RAM is used as a cache for managing data being read from and written to the SSD, and it also holds the SSD's operational programs and data. SSDs include many features such as encryption and write coalescing.

The mass storage is an array of non-volatile memory chips, which retain their contents when powered off. These chips are commonly called flash memory. The number and capacity of the individual chips vary directly with the SSD's capacity: the larger the capacity of the SSD, the larger the capacity and the greater the number of flash memory chips.

SSDs consume less power than hard disk drives. Because SSDs do not have moving parts, they generate less heat than HDDs. This reduces the need for cooling in the storage enclosure, which further reduces overall system power consumption.

SSDs have multiple parallel I/O channels from their drive controller to the flash memory storage chips. Generally, the larger the number of flash memory chips in the drive, the larger the number of channels.

SSD Addressing

Solid state memory chips have different capacities; for example, a solid state memory chip can be 32 GB or 4 GB per chip. However, all memory chips share the same logical organization, that is, pages and blocks.

Figure: An 8 KB write to the SSD (LBA 0x2000 and LBA 0x3000) is saved as two 4 KB pages, logically mapped to pages by SSD metadata, within a 128 KB block (32 x 4 KB pages).

Notes

At the lowest level, a solid state drive stores bits. Eight bits make up a byte, and while on the typical mechanical hard drive 512 bytes make up a sector, solid state drives do not have sectors. Solid state drives have a similar physical data object called a page. Solid state memory chips have different capacities; for example, a solid state memory chip can be 32 GB or 4 GB per chip. However, all memory chips share the same logical organization, that is, pages and blocks.

Like a mechanical hard drive sector, the page is the smallest object that can be
read or written on a solid state drive. Unlike mechanical hard drives, pages do not
have a standard capacity. A page’s capacity depends on the architecture of the
solid state memory chip. Typical page capacities are 4 KB, 8 KB, and 16 KB.

A solid state drive block is made up of pages. A block may have 32, 64, or 128 pages; 32 is a common block size. The total capacity of a block depends on the solid state chip's page size. Only entire blocks can be erased on a solid state memory chip.

Individual pages may be read or invalidated (a logical function). For a block to be written, pages are assembled into full blocks in the solid state drive's cache RAM and then written to the block storage object.
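The page and block arithmetic above can be sketched in Python. The 4 KB page size and 32 pages per block are assumptions taken from the example figure; real chips use other sizes. The function names are hypothetical.

```python
import math

PAGE_SIZE_KB = 4        # assumed page size (8 KB and 16 KB are also typical)
PAGES_PER_BLOCK = 32    # assumed pages per block (64 and 128 are also used)


def pages_needed(write_size_kb: float, page_size_kb: int = PAGE_SIZE_KB) -> int:
    """Whole pages consumed by a write, since a page is the smallest writable unit."""
    return math.ceil(write_size_kb / page_size_kb)


def block_size_kb(page_size_kb: int = PAGE_SIZE_KB,
                  pages_per_block: int = PAGES_PER_BLOCK) -> int:
    """Block capacity = page size x pages per block."""
    return page_size_kb * pages_per_block


print(pages_needed(8))           # 2: an 8 KB write is saved as two 4 KB pages
print(block_size_kb())           # 128: a block of 32 x 4 KB pages holds 128 KB
print(block_size_kb(16, 128))    # 2048: 16 KB pages, 128 pages per block
```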

Flash Memory Page States

A page has three possible states, erased (empty), valid, and invalid.

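Figure: Page state transitions. A page starts in the Erased state and becomes Valid when written. It becomes Invalid when it is rewritten or deleted, and it returns to Erased through an (electrical) erase.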

Notes

In order to write any data to a page, its owning block location on the flash memory
chip must be electrically erased. This function is performed by the SSD’s hardware.
Once a page has been erased, new data can be written to it.

For example, when 4 KB of data is written to a 4 KB page, the state of that page changes to valid because it holds valid data. A valid page's data can be read any number of times. If the drive receives a write request for a valid page, the page is marked invalid and the write goes to another page. This is because erasing blocks is time consuming and may increase the response time.

Once a page is marked invalid, its data can no longer be read. An invalid page must be erased before it can once again be written with new data. Garbage collection handles this process: it reclaims blocks that contain invalid pages and erases them to provide new erased blocks.
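The page life cycle above can be modeled as a small state machine. The Python sketch below is a toy model, not how any particular SSD firmware works. The class and function names are hypothetical and data contents are not tracked. Garbage collection is simplified to copying a block's valid pages to a spare erased block and then erasing the whole block.

```python
from enum import Enum


class PageState(Enum):
    ERASED = "erased"
    VALID = "valid"
    INVALID = "invalid"


class FlashBlock:
    """Toy flash block: pages are written one at a time, but erase applies to the whole block."""

    def __init__(self, pages_per_block: int = 32) -> None:
        self.pages = [PageState.ERASED] * pages_per_block

    def write(self) -> int:
        """Program the next erased page (Erased -> Valid) and return its index."""
        for index, state in enumerate(self.pages):
            if state is PageState.ERASED:
                self.pages[index] = PageState.VALID
                return index
        raise RuntimeError("no erased pages left; the block needs garbage collection")

    def invalidate(self, index: int) -> None:
        """A rewrite or delete marks the page invalid (Valid -> Invalid)."""
        if self.pages[index] is not PageState.VALID:
            raise ValueError("only a valid page can be invalidated")
        self.pages[index] = PageState.INVALID

    def erase(self) -> None:
        """Electrical erase of the whole block (every page -> Erased)."""
        self.pages = [PageState.ERASED] * len(self.pages)


def rewrite(block: FlashBlock, old_index: int) -> int:
    """Updated data goes to another erased page; the old page becomes invalid."""
    new_index = block.write()
    block.invalidate(old_index)
    return new_index


def garbage_collect(victim: FlashBlock, spare: FlashBlock) -> None:
    """Copy the victim's valid pages to a spare erased block, then erase the victim."""
    for state in victim.pages:
        if state is PageState.VALID:
            spare.write()
    victim.erase()


blk, spare = FlashBlock(4), FlashBlock(4)
first = blk.write()               # page 0: Erased -> Valid
second = rewrite(blk, first)      # page 1 holds the new data; page 0 is now Invalid
garbage_collect(blk, spare)       # valid data moves to spare; blk is fully erased again
print([s.value for s in blk.pages])    # ['erased', 'erased', 'erased', 'erased']
print([s.value for s in spare.pages])  # ['valid', 'erased', 'erased', 'erased']
```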

SSD Performance

 Access type
 SSD performs random reads the best
 SSDs use all internal I/O channels in parallel for multithreaded large block I/Os
 Drive state
 New SSD or SSD with substantial unused capacity offers best performance
 Workload duration
 SSDs are ideal for most workloads

Notes

Solid state drives are semiconductor, random-access devices, which results in very low response times compared to hard disk drives. This, combined with the multiple parallel I/O channels on the back end, gives SSDs performance characteristics that are better than those of hard drives. SSD performance depends on access type, drive state, and workload duration. SSDs perform random reads the best.

In carefully tuned multithreaded, small-block random I/O workload storage environments, SSDs can deliver much lower response times and higher throughput than hard drives. Because they are random-access devices, SSDs pay no penalty for retrieving I/O that is stored in more than one area. As a result, their response time is an order of magnitude faster than that of hard drives.

A new SSD or an SSD with substantial unused capacity has the best performance.
Drives with substantial amounts of their capacity consumed will take longer to
complete the read-modify-write cycle. SSDs are best for workloads with short
bursts of activity.

Solid State Hybrid Drive

Definition: Solid-State Hybrid Drive


Hybrid storage technologies combine NAND flash memory or SSDs with HDD technology.

Figure: NAND flash memory combined with an HDD

Optimized performance is ensured by placing "hot data", or data that is most directly associated with improved performance, on the "faster" part of the storage architecture.

Notes

In SSHDs, the data elements that are associated with performance, such as the most frequently accessed data items, are stored in the NAND flash memory. This method provides a significant performance improvement over traditional hard drives.

In hybrid storage technology, the objective is to achieve a balance of improved performance and high-capacity storage availability by combining hard drives and SSDs.

Non-Volatile Memory Express (NVMe)

Definition: NVMe
NVMe (Non-Volatile Memory Express) is a new device interface for
Non-Volatile Memory (NVM) storage technologies using PCIe
connectivity.

 A standard developed by an open industry consortium, directed by a 13-company promoter group that includes Dell
 Core design objective is to achieve high levels of parallelism, concurrency, and
scalability and realize the performance benefits of NAND flash and emerging
Storage Class Memory (SCM)

Notes

NVM stands for non-volatile memory such as NAND flash memory. NVMe has
been designed to capitalize on the low latency and internal parallelism of solid-state
storage devices.

The previous interface protocols like SCSI were developed for use with far slower
hard disk drives where a very lengthy delay exists between a request and data
transfer, where data speeds are much slower than RAM speeds, and where disk
rotation and seek time give rise to further optimization requirements.

NVMe is a command set and associated storage interface standards that specify
efficient access to storage devices and systems based on Non-Volatile Memory
(NVM) media. NVMe is broadly applicable to NVM storage technology, including
current NAND-based flash and higher-performance, Storage Class Memory (SCM).

Storage Class Memory (SCM)

Definition: Storage Class Memory


A solid-state memory that blurs the boundaries between storage and
memory by being low-cost, fast, and nonvolatile.

Features:

 Non-volatile
 Short access time like DRAM
 Low cost per bit like disk
 Solid-state, no moving parts

Notes

Despite the emergence of flash storage and, more recently, the NVMe stack, external storage systems are still orders of magnitude slower than server memory technologies (RAM). They can also be a barrier to achieving the highest end-to-end system performance.

The memory industry has been aiming towards something that has the speed of
DRAM but the capacity, cost, and persistence of NAND flash memory. The shift
from SATA to faster interfaces such as SAS and PCI-Express using the NVMe
protocol has made SSDs much faster, but nowhere near the speed of DRAM.

Now, a new frontier in storage media bridges the latency gap between server storage and external storage: storage-class memory (SCM). This new class of memory technology has performance characteristics that fall between those of DRAM and flash. The figure highlights where SCM fits into the storage media hierarchy.

SCM is slower than DRAM, but its read and write speeds are over 10 times faster than those of flash, and it can support higher IOPS while offering comparable throughput.

Furthermore, data access in flash is at the block and page levels, but SCM can be
addressed at the bit or word level. This granularity eliminates the need to erase an
entire block to program it, and it also simplifies random access.

However, because the price per gigabyte is expected to be substantially higher, SCM is unlikely to be a replacement for flash in enterprise storage. With new storage media, price per gigabyte is a key contributor to adoption. For example, in spite of the clear advantages of flash over HDDs, the industry hasn't yet completely converted from HDDs to flash.

Other persistent memory technologies are also in development, some with the
potential for broad adoption in enterprise and embedded applications, such as
nanotube RAM (NRAM) and resistive RAM (ReRAM).

RAID Techniques Lesson

Introduction

This lesson covers RAID and its use to improve performance and protection. It
covers various RAID implementations, techniques, and levels commonly used.
This lesson also covers the erasure coding technique and its advantages.

This lesson covers the following topics:


 Describe RAID techniques and implementation methods
 Describe commonly used RAID levels
 Compare RAID levels based on their cost, performance, and protection

RAID Techniques

RAID Overview

Definition: RAID (Redundant Array of Independent Disks)


A technique that combines multiple disk drives into a logical unit
(RAID set) and provides protection, performance, or both.

 Provides data protection against drive failures
 Improves storage system performance by serving I/Os from multiple drives simultaneously
 Two implementation methods
 Software RAID
 Hardware RAID

Notes

RAID is a technique in which multiple disk drives are combined into a logical unit
called a RAID set and data is written in blocks across the disks in the RAID set.
RAID protects against data loss when a drive fails, by using redundant drives and
parity. RAID also helps in improving the storage system performance as read and
write operations are served simultaneously from multiple disk drives.

RAID is typically implemented by using a specialized hardware controller present either on the compute system or on the storage system. The key functions of a RAID controller are: management and control of drive aggregations, translation of I/O requests between logical and physical drives, and data regeneration in the event of drive failures.

Software RAID uses compute system-based software to provide RAID functions and is implemented at the operating-system level. Software RAID implementations offer cost and simplicity benefits when compared with hardware RAID.

However, they have the following limitations:

 Performance: Software RAID affects the overall system performance. This is due to additional CPU cycles required to perform RAID calculations.
 Supported features: Software RAID does not support all RAID levels.
 Operating system compatibility: Software RAID is tied to the operating system; hence, upgrades to software RAID or to the operating system should be validated for compatibility. This leads to inflexibility in the data-processing environment.

RAID Array Components

A RAID array is an enclosure that contains various disk drives and supporting
hardware to implement RAID.

A subset of disks within a RAID array can be grouped to form logical associations
called logical arrays, also known as a RAID set or a RAID group.

Figure: RAID array components. A compute system (VMs running on a hypervisor) connects to a RAID array that contains a RAID controller and hard disks grouped into logical arrays (RAID sets).

RAID Techniques

Three different RAID techniques form the basis for defining various RAID levels: striping, mirroring, and parity.

Figure: Striping, mirroring, and parity. In each panel, VMs on a hypervisor send data A to a RAID controller. Striping splits A into strips A1 to A4 across four disks, forming a stripe. Mirroring writes A to two disks. Parity writes A1 to A3 to data disks D1 to D3 and Ap to parity disk P. Rebuilding data of the failed D3 drive: D1 + D2 + ? = P, so ? = P – D1 – D2 = D3.

Notes

Striping

Striping is a technique of spreading data across multiple drives (more than one) in
order to use the drives in parallel. All the read/write heads work simultaneously,
allowing more data to be processed in a shorter time and increasing performance,
compared to reading and writing from a single disk. Within each disk in a RAID set,
a predefined number of contiguously addressable disk blocks is defined as a strip.

The set of aligned strips that spans across all the disks within the RAID set is called a stripe. The illustration shows representations of a striped RAID set. Strip size (also called stripe depth) describes the number of blocks in a strip (the strips are represented as “A1, A2, A3, and A4”). It is the maximum amount of data that can be written to or read from a single disk in the set, assuming that the accessed data starts at the beginning of the strip.

All strips in a stripe have the same number of blocks. Having a smaller strip size
means that the data is broken into smaller pieces while it is spread across the
disks. Stripe size (represented as A) is the strip size multiplied by the number of data disks in the RAID set.

For example, in a four-disk striped RAID set with a strip size of 64 KB, the stripe size is 256 KB (64 KB x 4). In other words, A = A1 + A2 + A3 + A4. Stripe width refers to the number of data strips in a stripe. Striped RAID does not provide any data protection unless parity or mirroring is used.
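The relationship between strips, stripes, and disks can be expressed as a simple address calculation. The Python sketch below assumes a plain round-robin layout with no parity, as in a striped set, and 512-byte disk blocks. The function names are hypothetical, and real RAID controllers may lay out data differently.

```python
BLOCK_SIZE_BYTES = 512  # assumed disk block size


def stripe_size_kb(strip_size_kb: int, data_disks: int) -> int:
    """Stripe size = strip size x number of data disks (parity strips excluded)."""
    return strip_size_kb * data_disks


def locate_block(lba: int, strip_size_kb: int, data_disks: int) -> tuple[int, int, int]:
    """Map a logical block address to (stripe number, disk index, block offset in strip)."""
    blocks_per_strip = strip_size_kb * 1024 // BLOCK_SIZE_BYTES
    strip_number = lba // blocks_per_strip       # strips are numbered across the whole set
    offset_in_strip = lba % blocks_per_strip
    stripe_number = strip_number // data_disks   # one stripe fills before the next begins
    disk_index = strip_number % data_disks       # strips go round-robin across the disks
    return stripe_number, disk_index, offset_in_strip


print(stripe_size_kb(64, 4))        # 256 KB stripe in a four-disk set
print(locate_block(300, 64, 4))     # (0, 2, 44): first stripe, third disk
print(locate_block(600, 64, 4))     # (1, 0, 88): second stripe, first disk
```

For a parity set, only data disks count toward the stripe size. For example, stripe_size_kb(64, 3) gives 192 KB for a 3 + 1 set.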

Mirroring

Mirroring is a technique whereby the same data is stored on two different disk
drives, yielding two copies of the data. If one disk drive failure occurs, the data
remains intact on the surviving disk drive and the controller continues to service the
compute system’s data requests from the surviving disk of a mirrored pair.

When the failed disk is replaced with a new disk, the controller copies the data from
the surviving disk of the mirrored pair. This activity is transparent to the compute
system. In addition to providing complete data redundancy, mirroring enables fast
recovery from disk failure. However, disk mirroring provides only data protection
and is not a substitute for data backup.

Mirroring constantly captures changes in the data, whereas a backup captures point-in-time images of the data. Mirroring involves duplication of data: the amount of storage capacity needed is twice the amount of data being stored. Therefore, mirroring is considered expensive and is preferred for mission-critical applications that cannot afford the risk of any data loss. Mirroring improves read performance because read requests can be serviced by both disks.

However, write performance is slightly lower than that in a single disk because
each write request manifests as two writes on the disk drives. Mirroring does not
deliver the same levels of write performance as a striped RAID.

Parity

Parity is a method to protect striped data from disk drive failure without the cost of
mirroring. An additional disk drive is added to hold parity, a mathematical construct
that allows re-creation of the missing data. Parity is a redundancy technique that
ensures protection of data without maintaining a full set of duplicate data.

Calculation of parity is a function of the RAID controller. Parity information can be stored on separate, dedicated disk drives, or distributed across all the drives in a RAID set. The first three disks in the figure, labeled D1 to D3, contain the data. The fourth disk, labeled P, stores the parity information, which, in this case, is the sum of the elements in each row. Now, if one of the data disks fails, the missing value can be calculated by subtracting the sum of the rest of the elements from the parity value. In the diagram, for simplicity, the computation of parity is represented as an arithmetic sum of the data. In practice, however, parity is calculated with a bitwise XOR operation.
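Because XOR is its own inverse, the "subtract" step in the diagram becomes another XOR in practice. XOR-ing the parity with the surviving strips regenerates the missing strip. The Python sketch below illustrates this with hypothetical two-byte strips; a RAID controller applies the same operation to full strips.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical two-byte strips on data disks D1 to D3.
d1, d2, d3 = b"\x0f\xa0", b"\x33\x01", b"\xc4\x5e"
parity = xor_blocks(d1, d2, d3)        # P = D1 XOR D2 XOR D3, stored on disk P

# D3 fails: XOR the parity with the surviving data strips to regenerate it.
rebuilt_d3 = xor_blocks(parity, d1, d2)
print(parity.hex(), rebuilt_d3 == d3)  # f8ff True
```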

Compared to mirroring, parity implementation considerably reduces the cost associated with data protection. Consider an example of a parity RAID configuration with four disks where three disks hold data, and the fourth holds the parity information. In this example, parity requires only 33 percent extra disk space compared to mirroring, which requires 100 percent extra disk space.

However, there are some disadvantages of using parity. Parity information is generated from data on the data disks. Therefore, parity is recalculated every time there is a change in data. This recalculation is time-consuming and affects the performance of the RAID array.

For parity RAID, the stripe size calculation does not include the parity strip.

For example, in a four-disk (3 + 1) parity RAID set with a strip size of 64 KB, the stripe size is 192 KB (64 KB x 3).

RAID Levels

Commonly used RAID levels are:


 RAID 0 – Striped set with no fault tolerance
 RAID 1 – Disk mirroring
 RAID 1 + 0 – Mirroring and Striping RAID
 RAID 3 - Striped set with parallel access and dedicated parity
 RAID 5 – Striped set with independent disk access and a distributed parity
 RAID 6 – Striped set with independent disk access and dual distributed parity

The RAID level selection depends on parameters such as application performance, data availability requirements, and cost.

These RAID levels are defined based on striping, mirroring, and parity techniques.
Some RAID levels use a single technique, whereas others use a combination of
techniques.

The commonly used RAID levels are RAID 0, 1, 1+0, 3, 5, and 6.
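A short calculation shows the capacity cost of each level. The Python sketch below assumes identical disks and ignores hot spares and metadata. The minimum disk counts for RAID 0, 3, and 5 are common conventions rather than figures stated in this lesson, and the names are hypothetical.

```python
# level -> (minimum disks, number of disks' worth of capacity that holds data)
RAID_LEVELS = {
    "0":   (2, lambda n: n),       # striping only, no redundancy
    "1":   (2, lambda n: n / 2),   # mirrored pair
    "1+0": (4, lambda n: n / 2),   # striped mirrored pairs
    "3":   (3, lambda n: n - 1),   # one dedicated parity disk
    "5":   (3, lambda n: n - 1),   # one disk's worth of distributed parity
    "6":   (4, lambda n: n - 2),   # two disks' worth of distributed parity
}


def usable_capacity_tb(level: str, disks: int, disk_tb: float) -> float:
    """Usable capacity of a RAID set built from identical disks."""
    min_disks, data_disks = RAID_LEVELS[level]
    if disks < min_disks:
        raise ValueError(f"RAID {level} needs at least {min_disks} disks")
    if level == "1" and disks != 2:
        raise ValueError("a RAID 1 set consists of two disks")
    if level == "1+0" and disks % 2:
        raise ValueError("RAID 1+0 needs an even number of disks")
    return data_disks(disks) * disk_tb


RAW_TB = 4 * 2.0  # hypothetical set of four 2 TB disks
for level in ("0", "1+0", "5", "6"):
    usable = usable_capacity_tb(level, 4, 2.0)
    print(f"RAID {level}: {usable:.0f} of {RAW_TB:.0f} TB usable ({usable / RAW_TB:.0%})")

print(5 / usable_capacity_tb("3", 5, 1.0))  # 1.25: five disks hold four disks' worth of data
```

The last line matches the RAID 3 example later in this lesson, where the total disk space is 1.25 times the size of the data disks.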

Video: RAID

The video is located at


https://edutube.emc.com/Player.aspx?vno=41vs6WVDSGBt6uD6g2erTw

RAID 0

RAID 0 configuration uses data striping techniques, where data is striped across all
the disks within a RAID set.

Figure: RAID 0. The RAID controller stripes data A, B, and C from the compute system across five data disks (A1 to A5, B1 to B5, C1 to C5).

Notes

RAID 0 utilizes the full storage capacity of a RAID set.

To read data, all the strips are gathered by the controller. When the number of
drives in the RAID set increases, the performance improves because more data
can be read or written simultaneously.

RAID 0 is a good option for applications that need high I/O throughput. However, if
these applications require high availability during drive failures, RAID 0 does not
provide data protection and availability.

RAID 1

A RAID 1 set consists of two disk drives and every write is written to both disks.

Figure: RAID 1. The RAID controller writes data A, B, and C from the compute system to both disks of a mirror set.

Notes

In RAID 1, the mirroring is transparent to the compute system. During disk failure, the impact on data recovery in RAID 1 is the least among all RAID implementations. This is because the RAID controller uses the mirror drive for data recovery.

RAID 1 is suitable for applications that require high availability and for which cost is not a constraint.

RAID 1+0 (Mirroring and Striping)

Most data centers require data redundancy and performance from their RAID
arrays.

RAID 1+0 combines the performance benefits of RAID 0 with the redundancy
benefits of RAID 1.

Figure: RAID 1+0. The RAID controller stripes data A, B, and C from the compute system across three mirror sets (A, B, and C). Each strip is written to both disks of its mirror set; for example, A1, B1, and C1 are each stored twice on mirror set A.

Notes

RAID 1+0 uses mirroring and striping techniques and combines their benefits. This
RAID type requires an even number of disks, the minimum being four.

RAID 1+0 is also known as RAID 10 (Ten) or RAID 1/0. RAID 1+0 is also called
striped mirror. The basic element of RAID 1+0 is a mirrored pair. This means that
data is first mirrored and then both copies of the data are striped across multiple
disk drive pairs in a RAID set.

When replacing a failed drive, only the mirror is rebuilt. In other words, the storage
system controller uses the surviving drive in the mirrored pair for data recovery and
continuous operation. Data from the surviving disk is copied to the replacement
disk.

RAID 3

Parity information is stored on a dedicated drive so that the data can be reconstructed if a drive fails in a RAID set. For example, in a set of five disks, four are used for data and one for parity.

Note: RAID 3 is not typically used in practice.

Figure: RAID 3. The RAID controller stripes data A, B, and C from the compute system across four data disks (A1 to A4, B1 to B4, C1 to C4) and stores the parity strips Ap, Bp, and Cp on a dedicated parity disk.

Notes

In RAID 3, parity information is stored on a dedicated drive so that the data can be
reconstructed if a drive fails in a RAID set. For example, in a set of five disks, four
are used for data and one for parity.

Therefore, the total disk space that is required is 1.25 times the size of the data disks. RAID 3 always reads and writes complete stripes of data across all disks because the drives operate in parallel. There are no partial writes that update one out of many strips in a stripe.

RAID 5

RAID 5 is a versatile RAID implementation. It is similar to RAID 4 because it uses striping, and the drives (strips) are independently accessible.

Figure: RAID 5. The RAID controller stripes data A, B, and C from the compute system across five disks with distributed parity: A1 A2 A3 A4 Ap, B1 B2 B3 Bp B4, and C1 C2 Cp C3 C4.

Notes

The difference between RAID 4 and RAID 5 is the parity location. In RAID 4, parity
is written to a dedicated drive, creating a write bottleneck for the parity disk.

In RAID 5, parity is distributed across all disks to overcome the write bottleneck of a
dedicated parity disk.
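The rotating parity placement in the RAID 5 figure can be generated programmatically. The Python sketch below reproduces that layout: parity starts on the last disk and moves one disk to the left on each following stripe. This is only one common rotation scheme, often called left-asymmetric, and actual storage systems may place parity differently. The function name is hypothetical.

```python
def raid5_stripe_layout(stripe: int, disks: int) -> list[str]:
    """Strip labels on each disk for one stripe of a RAID 5 set.

    Parity starts on the last disk and moves one disk to the left for each
    following stripe; data strips fill the remaining disks from left to right.
    """
    parity_disk = (disks - 1 - stripe) % disks
    name = chr(ord("A") + stripe)      # stripe 0 -> "A", 1 -> "B", ... (labels only)
    layout, data_number = [], 1
    for disk in range(disks):
        if disk == parity_disk:
            layout.append(f"{name}p")
        else:
            layout.append(f"{name}{data_number}")
            data_number += 1
    return layout


for stripe in range(3):
    print(" ".join(raid5_stripe_layout(stripe, 5)))
# A1 A2 A3 A4 Ap
# B1 B2 B3 Bp B4
# C1 C2 Cp C3 C4
```

Over any five consecutive stripes, every disk holds parity exactly once. This is what removes the single parity-disk write bottleneck of RAID 4.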

RAID 6

RAID 6 works the same way as RAID 5, except that RAID 6 includes a second
parity element to enable survival if two disk failures occur in a RAID set. Therefore,
a RAID 6 implementation requires at least four disks.

Figure: RAID 6. The RAID controller stripes data A, B, and C from the compute system across five disks with dual distributed parity: A1 A2 A3 Ap Aq, B1 B2 Bp Bq B3, and C1 Cp Cq C2 C3.

Notes

RAID 6 distributes the parity across all the disks. The write penalty (explained later
in this module) in RAID 6 is more than that in RAID 5; therefore, RAID 5 writes
perform better than RAID 6.

The rebuild operation in RAID 6 may take longer than that in RAID 5 due to the
presence of two parity sets.


RAID Impacts on Performance

• In RAID 5, every write (update) to a disk manifests as four I/O operations (2 reads and 2 writes)
• In RAID 6, every write (update) to a disk manifests as six I/O operations (3 reads and 3 writes)
• In RAID 1, every write manifests as two I/O operations (2 writes)

The figure illustrates a single write operation on RAID 5 that contains a group of
five disks.

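[Figure: the RAID controller reads C4 old and Cp old from the disks and writes C4 new and Cp new back to them, in four numbered I/O steps; the disks hold stripes A1 A2 A3 A4 Ap / B1 B2 B3 Bp B4 / C1 C2 Cp C3 C4.]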

Notes

When choosing a RAID type, it is imperative to consider its impact on disk
performance and application IOPS. In both mirrored and parity RAID
configurations, every write operation translates into more I/O overhead for the
disks, which is referred to as a write penalty.

In a RAID 1 implementation, every write operation must be performed on two disks
configured as a mirrored pair, whereas in a RAID 5 implementation, a write
operation may manifest as four I/O operations. When performing I/Os to a disk
configured with RAID 5, the controller has to read, recalculate, and write a parity
segment for every data write operation.


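The figure illustrates a single write operation on RAID 5 that contains a group of
five disks. The parity of stripe C (Cp) is calculated at the controller as follows,
where $\oplus$ denotes bitwise XOR:

$C_p = C_1 \oplus C_2 \oplus C_3 \oplus C_4$

Whenever the controller performs a write I/O, parity must be computed by reading
the old parity (Cp old) and the old data (C4 old) from the disks, which means two
read I/Os. Then, the new parity (Cp new) is computed as follows:

$C_{p,\text{new}} = C_{p,\text{old}} \oplus C_{4,\text{old}} \oplus C_{4,\text{new}}$

Because XOR is its own inverse, XOR-ing C4 old removes the old data's contribution
from the parity, and XOR-ing C4 new adds the new data's contribution.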

After computing the new parity, the controller completes the write I/O by writing the
new data and the new parity onto the disks, amounting to two write I/Os. Therefore,
the controller performs two disk reads and two disk writes for every write operation,
and the write penalty is 4.
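The read-modify-write sequence above can be traced with a small Python sketch. It assumes integers stand in for strip contents and a dictionary stands in for stripe C; the names `xor` and `small_write` are hypothetical. It records each disk I/O and checks that the incrementally updated parity equals a full recomputation.

```python
from functools import reduce


def xor(*values: int) -> int:
    """Bitwise XOR of any number of strip values."""
    return reduce(lambda a, b: a ^ b, values, 0)


def small_write(stripe: dict, target: str, new_value: int) -> list:
    """Read-modify-write of one data strip plus parity strip Cp; returns the disk I/Os."""
    ios = []
    old_data = stripe[target]
    ios.append(("read", target))       # read old data (C4 old)
    old_parity = stripe["Cp"]
    ios.append(("read", "Cp"))         # read old parity (Cp old)
    new_parity = xor(old_parity, old_data, new_value)
    stripe[target] = new_value
    ios.append(("write", target))      # write new data (C4 new)
    stripe["Cp"] = new_parity
    ios.append(("write", "Cp"))        # write new parity (Cp new)
    return ios


stripe_c = {"C1": 0x11, "C2": 0x22, "C3": 0x33, "C4": 0x44}
stripe_c["Cp"] = xor(*(stripe_c[k] for k in ("C1", "C2", "C3", "C4")))

ios = small_write(stripe_c, "C4", 0x5A)
full = xor(*(stripe_c[k] for k in ("C1", "C2", "C3", "C4")))
print(len(ios), stripe_c["Cp"] == full)  # 4 True
```

Only C4 and Cp are touched; C1, C2, and C3 are never read, which is why a small write costs four I/Os regardless of how many disks are in the RAID 5 set.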

In RAID 6, which maintains dual parity, a disk write requires three read operations:
two parity and one data. After calculating both new parities, the controller
performs three write operations: two parity and one data. Therefore, in a RAID 6
implementation, the controller performs six I/O operations for each write I/O, and
the write penalty is 6.
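To see how the write penalty affects application IOPS, the back-end disk load can be estimated as the host reads plus the write penalty times the host writes. The Python sketch below assumes that every host write is a small random write that pays the full penalty (full-stripe writes and cache coalescing reduce this in practice); the workload figures and the name `disk_load_iops` are hypothetical.

```python
WRITE_PENALTY = {"RAID 1": 2, "RAID 1+0": 2, "RAID 5": 4, "RAID 6": 6}


def disk_load_iops(host_iops: float, read_fraction: float, raid_level: str) -> float:
    """Back-end disk I/Os per second generated by a host workload."""
    reads = host_iops * read_fraction
    writes = host_iops * (1 - read_fraction)
    # Each host read costs one disk I/O; each host write costs the write penalty.
    return reads + WRITE_PENALTY[raid_level] * writes


# Hypothetical workload: 1,000 host IOPS, 60% reads and 40% writes.
for level in WRITE_PENALTY:
    print(level, round(disk_load_iops(1000, 0.6, level)))
# RAID 1 1400
# RAID 1+0 1400
# RAID 5 2200
# RAID 6 3000
```

The same host workload generates more than twice the disk load on RAID 6 as on RAID 1, so the disks behind a parity RAID set must be sized for the back-end figure, not the host figure.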


RAID Comparison

RAID Level | Minimum Number of Disks | Available Storage Capacity (%) | Write Penalty | Protection
1          | 2                       | 50                             | 2             | Mirror
1+0        | 4                       | 50                             | 2             | Mirror
3          | 3                       | [(n-1)/n] * 100                | 4             | Parity (supports single disk failure)
5          | 3                       | [(n-1)/n] * 100                | 4             | Parity (supports single disk failure)
6          | 4                       | [(n-2)/n] * 100                | 6             | Parity (supports two disk failures)

n = number of disks in the RAID set
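The capacity column of the table can be expressed as a small Python function. This is a sketch that encodes only the formulas and minimum disk counts from the table; the function name and the extra even-count check for RAID 1+0 are assumptions added for illustration.

```python
MIN_DISKS = {"1": 2, "1+0": 4, "3": 3, "5": 3, "6": 4}


def available_capacity_pct(raid_level: str, n: int) -> float:
    """Available storage capacity (%) for a RAID set of n disks, per the table."""
    if n < MIN_DISKS[raid_level]:
        raise ValueError(f"RAID {raid_level} needs at least {MIN_DISKS[raid_level]} disks")
    if raid_level == "1+0" and n % 2:
        raise ValueError("RAID 1+0 needs an even number of disks")
    if raid_level in ("1", "1+0"):
        return 50.0                  # every strip is stored twice
    if raid_level in ("3", "5"):
        return 100 * (n - 1) / n     # one disk's worth of parity
    return 100 * (n - 2) / n         # RAID 6: two disks' worth of parity


print(available_capacity_pct("5", 5))            # 80.0
print(round(available_capacity_pct("6", 6), 1))  # 66.7
```

Parity overhead shrinks as n grows, whereas mirroring always gives up half of the raw capacity.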


Dynamic Disk Sparing (Hot Sparing)

Hot sparing refers to a process that temporarily replaces a failed disk drive with a
spare drive in a RAID array; the spare takes the identity of the failed disk drive.

With the hot spare, one of the following methods of data recovery is performed
depending on the RAID implementation:
• If parity RAID is used, the data is rebuilt onto the hot spare from the parity and
the data on the surviving disk drives in the RAID set.
• If mirroring is used, the data is copied from the surviving mirror onto the hot
spare.

[Figure: a RAID controller with a failed disk, the hot spare that takes its place, and the replaced failed disk.]

Notes

When a new disk drive is added to the system, data from the hot spare is copied to
it. The hot spare returns to its idle state, ready to replace the next failed drive.


Alternatively, the hot spare replaces the failed disk drive permanently. This means
that it is no longer a hot spare, and a new hot spare must be configured on the
storage system.

A hot spare should be large enough to accommodate data from a failed drive.
Some systems implement multiple hot spares to improve data availability. A hot
spare can be configured as automatic or user initiated, which specifies how it will
be used in the event of disk failure.

In an automatic configuration, when the recoverable error rates for a disk exceed a
predetermined threshold, the disk subsystem tries to copy data from the failing disk
to the hot spare automatically. If this task is completed before the damaged disk
fails, the subsystem switches to the hot spare and marks the failing disk as
unusable.

Otherwise, it uses parity or the mirrored disk to recover the data. In the case of a
user-initiated configuration, the administrator has control of the rebuild process. For
example, the rebuild could occur overnight to prevent any degradation of system
performance. However, until the rebuild completes, the system is at risk of data
loss if another disk failure occurs.
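The automatic configuration described above amounts to a small decision flow. The Python sketch below is hypothetical: the function name, the string results, and the idea of passing the error threshold as a parameter are illustrative assumptions, since real systems use vendor-specific thresholds and internal state.

```python
def automatic_sparing_action(recoverable_errors: int, threshold: int,
                             copy_finished_before_failure: bool,
                             protection: str) -> str:
    """Hypothetical decision flow for an automatically configured hot spare."""
    if recoverable_errors <= threshold:
        return "keep disk in service"
    # Error rate exceeded the threshold: the subsystem copies the disk to the hot spare.
    if copy_finished_before_failure:
        return "switch to hot spare; mark failing disk unusable"
    # The disk failed before the copy finished: recover from redundancy instead.
    if protection == "parity":
        return "rebuild onto hot spare from parity and surviving disks"
    return "copy data from surviving mirror onto hot spare"


print(automatic_sparing_action(120, 100, True, "parity"))
# switch to hot spare; mark failing disk unusable
```

The proactive copy is preferred because it reads only the failing disk, whereas a parity rebuild must read every surviving disk in the RAID set.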


Types of Intelligent Storage Systems Lesson

Introduction

This lesson covers different types of data access methods. It also covers types of
intelligent storage systems. Finally, this lesson covers the scale-up and scale-out
architectures.

This lesson covers the following topics:

• Explain data access methods
• Describe types of intelligent storage systems
• Compare scale-up and scale-out architectures


Types of Intelligent Storage Systems

Video: Types of Intelligent Storage Systems

The video is located at https://edutube.emc.com/Player.aspx?vno=QCuidmacmU3QZVzB5fsBvQ


Types of Intelligent Storage Systems

Based on the type of data access, a storage system can be classified as:

• Block-based
• File-based
• Object-based
• Unified

A unified storage system provides block-based, file-based, and object-based data
access in a single system.


Scale-up Vs. Scale-out Architecture

An intelligent storage system may be built based on either scale-up or scale-out
architecture:
• Scale-up storage architecture provides the capability to scale the capacity and
performance of a single storage system based on requirements
• Scale-out storage architecture provides the capability to scale capacity and
performance by adding nodes to a cluster

[Figure: scale-up (a single storage system whose controller(s) and storage are upgraded or expanded) compared with scale-out (a cluster of Node 1, Node 2, and Node 3, each with its own controller(s) and storage).]

Notes

Scaling up a storage system involves upgrading or adding controllers and storage.
These systems have a fixed capacity ceiling, which limits their scalability, and
performance also starts to degrade as the capacity limit is reached.

In scale-out, nodes can be added quickly to the cluster when more performance
and capacity are needed, without causing any downtime. This provides the flexibility
to use many nodes of moderate performance and availability characteristics to
produce a total system that has better aggregate performance and availability.
Scale-out architecture pools the resources in the cluster and distributes the
workload across all the nodes. This results in linear performance improvements as
more nodes are added to the cluster.


Assessment

1. Which one of the following is a characteristic of RAID 5?

A. All parity in a single disk

B. Distributed parity

C. No parity

D. Double parity

2. What is the stripe size of a five-disk parity RAID 5 set that has a strip size of 64 KB?

A. 256 KB

B. 64 KB

C. 128 KB

D. 320 KB


Summary



Block-Based Storage System

Introduction

This module focuses on the key components of a block-based storage system. It
details the function of each component, including cache management and
protection techniques. This module also focuses on the two storage provisioning
methods. Finally, this module focuses on the storage tiering mechanisms.

Upon completing this module, you will be able to:

• Describe the components of a block-based storage system
• Describe traditional and virtual storage provisioning
• Describe storage tiering mechanisms


Components of a Block-Based Storage System Lesson

Introduction

This lesson covers block-based storage system components, intelligent cache
algorithms, and cache protection mechanisms.

This lesson covers the following topics:

• Explain block-based storage system components
• Describe intelligent cache algorithms
• List cache protection mechanisms


Components of a Block-Based Storage System

Video: Components of a Block-Based Storage System

The video is located at https://edutube.emc.com/Player.aspx?vno=Tqdv5ScTt8OHZXsSamWsoA


What is a Block-Based Storage System?

A block-based storage system provides compute systems with block-level access
to the storage volumes. In this environment:
• The file system is created on the compute systems, and data is accessed on a
network at the block level
• Block-based storage systems can be based on either scale-up or scale-out
architecture
• A block-based storage system consists of one or more controllers and storage


Components of a Controller

• A controller of a block-based storage system consists of three key components:
  ◦ Front end, cache, and back end
• An I/O request that is received from the compute system at the front-end port is
processed through cache and the back end
  ◦ This enables retrieval of data from the storage
  ◦ A read request can be serviced directly from cache if the requested data is
found in the cache
• In modern intelligent storage systems, front end, cache, and back end are
typically integrated on a single board
  ◦ Referred to as a storage processor or storage controller

[Figure: a compute system (VMs on a hypervisor) connects through the storage network to the controller's front end; I/O passes through cache to the back end, which connects to storage.]


Component: Front End

• The front end provides the interface between the storage system and the
compute system. It consists of two components:
  ◦ Front-end ports
  ◦ Front-end controllers
[Figure: the controller architecture with the front end highlighted, showing its ports and front-end controllers between the storage network and cache.]

Notes

Typically, a front end has redundant controllers for high availability. In addition,
each controller contains multiple ports that enable large numbers of compute
systems to connect to the intelligent storage system.

Each front-end controller has processing logic that executes the appropriate
transport protocol, such as Fibre Channel, iSCSI, FICON, or FCoE for storage
connections. Front-end controllers route data to and from cache through the
internal data bus. When the cache receives the write data, the controller sends an
acknowledgment message back to the compute system.


Component: Cache

Cache is semiconductor memory where data is placed temporarily to reduce the
time that is required to service I/O requests from the compute system.

[Figure: the controller architecture with cache highlighted between the front end and the back end.]

Notes

Cache improves storage system performance by isolating compute systems from
the storage (HDDs and SSDs). With HDDs in particular, cache isolates compute
systems from the mechanical delays that are associated with rotating disks.

Rotating disks are the slowest component of an intelligent storage system. Data
access on rotating disks usually takes several milliseconds because of seek time
and rotational latency. Accessing data from cache is fast and typically takes less
than a millisecond. On intelligent storage systems, write data is first placed in
cache and then written to the storage.


Read Operation with Cache

When a compute system issues a read request, the storage controller reads the tag
RAM to determine whether the required data is available in cache.
• If the requested data is found in the cache, it is called a read cache hit or read hit
• If the requested data is not found in cache, it is called a cache miss

[Figure: read operation with cache. Read hit: (1) the compute system sends a read request; (2) the data is found in cache and sent to the compute system. Read miss: (1) the compute system sends a read request; (2) the data is not found in cache, so the request is sent to storage; (3) the data is copied to cache; (4) the data is sent to the compute system.]

Notes

When a compute system issues a read request, the storage controller reads the tag
RAM to determine whether the required data is available in cache. If the requested
data is found in the cache, it is called a read cache hit or read hit and data is sent
directly to the compute system, without any back-end storage operation. This
provides a fast response time to the compute system (about a millisecond).

If the requested data is not found in cache, it is called a cache miss and the data
must be read from the storage. The back end accesses the appropriate storage
device and retrieves the requested data. Data is then placed in cache and finally
sent to the compute system through the front end. Cache misses increase the I/O
response time.

Read performance is measured in terms of the read hit ratio, or the hit rate,
expressed as a percentage. This ratio is the number of read hits with respect to the
total number of read requests. A higher read hit ratio improves the read
performance.
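The read hit ratio, and its effect on average response time, can be computed with a short Python sketch. The block addresses and the latencies (1 ms for a hit, 8 ms for a miss) are hypothetical values, and the cache contents are held fixed rather than updated on each miss, to keep the arithmetic visible; the function names are also hypothetical.

```python
def read_hit_ratio(requests: list, cached_blocks: set) -> float:
    """Percentage of read requests whose block is found in cache."""
    if not requests:
        raise ValueError("no read requests")
    hits = sum(1 for block in requests if block in cached_blocks)
    return 100 * hits / len(requests)


def mean_response_ms(hit_pct: float, hit_ms: float, miss_ms: float) -> float:
    """Average read response time for a given hit ratio."""
    hit = hit_pct / 100
    return hit * hit_ms + (1 - hit) * miss_ms


reads = [1, 2, 3, 1, 2, 4, 1, 2, 5, 1]   # hypothetical block addresses
cache = {1, 2, 3}                        # blocks currently held in cache
ratio = read_hit_ratio(reads, cache)
print(ratio)                                                        # 80.0
print(round(mean_response_ms(ratio, hit_ms=1.0, miss_ms=8.0), 2))   # 2.4
```

Under these assumed latencies, raising the hit ratio from 80% to 90% would cut the average response time from 2.4 ms to 1.7 ms, which is why a higher read hit ratio improves read performance.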


Write Operation with Cache

• Write operations with cache provide performance advantages over writing
directly to storage.
• A write operation with cache is implemented in the following ways:

[Figure: write operation with cache. Write-through: (1) data write to cache; (2) data write from cache to storage; (3) acknowledgement from storage; (4) acknowledgement to the compute system. Write-back: (1) data write to cache; (2) acknowledgement to the compute system; (3) data write from cache to storage; (4) acknowledgement from storage.]

Notes

When an I/O is written to cache and acknowledged, it is completed in less time
(from the compute system’s perspective) than it would take to write directly to
storage. Sequential writes also offer opportunities for optimization because many
smaller writes can be coalesced for larger transfers to storage with the use of
cache.

A write operation with cache is implemented in the following ways:

Write-through cache

Data is placed in the cache and immediately written to the storage, and an
acknowledgment is sent to the compute system. Because data is committed to
storage as it arrives, the risks of data loss are low, but the write-response time is
longer because of the storage operations.

Write-back cache

Data is placed in cache and an acknowledgment is sent to the compute system
immediately. Later, data from several writes are committed (de-staged) to the
storage. Write response times are much faster because the write operations are
isolated from the storage devices. However, uncommitted data is at risk of loss if
cache failures occur.

With write aside (cache bypass), if the size of an I/O request exceeds a predefined
size, called the write aside size, writes are sent directly to storage. This reduces
the impact of large writes consuming a large amount of cache space. This is
particularly useful in an environment where cache resources are constrained and
cache is required for small random I/Os.
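The three write paths (write-through, write-back, and write aside) can be contrasted in a toy Python model. Everything here is an illustrative assumption: the class name `WriteCache`, the string acknowledgements, dictionaries standing in for cache and storage, and the 64 KB write aside size in the usage example.

```python
from dataclasses import dataclass, field


@dataclass
class WriteCache:
    """Toy model of write-through, write-back, and write-aside handling."""
    policy: str                  # "write-through" or "write-back"
    write_aside_size: int        # bytes; larger writes bypass cache
    cache: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)
    storage: dict = field(default_factory=dict)

    def write(self, block: int, data: bytes) -> str:
        if len(data) > self.write_aside_size:
            self.cache.pop(block, None)       # drop any stale cached copy
            self.dirty.discard(block)
            self.storage[block] = data        # bypass cache entirely
            return "ack after storage write (write aside)"
        self.cache[block] = data
        if self.policy == "write-through":
            self.storage[block] = data        # commit before acknowledging
            return "ack after storage write"
        self.dirty.add(block)                 # write-back: commit later
        return "ack from cache"

    def destage(self) -> int:
        """Commit all uncommitted (dirty) blocks to storage; return how many."""
        for block in self.dirty:
            self.storage[block] = self.cache[block]
        count = len(self.dirty)
        self.dirty.clear()
        return count


wb = WriteCache(policy="write-back", write_aside_size=64 * 1024)
print(wb.write(7, b"x" * 4096))           # ack from cache
print(7 in wb.storage)                    # False: not yet de-staged
print(wb.destage(), 7 in wb.storage)      # 1 True
print(wb.write(8, b"y" * (1024 * 1024)))  # ack after storage write (write aside)
```

The window between "ack from cache" and `destage()` is exactly where write-back data is uncommitted and exposed to a cache failure.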

With dedicated cache, separate sets of memory locations are reserved for reads
and writes. In global cache, both reads and writes can use any of the available
memory addresses. Cache management is more efficient in a global cache
implementation because only one global set of addresses has to be managed.

Global cache enables users to specify the percentages of cache available for reads
and writes for cache management. Typically, the read cache is small, but it should
be increased if the application being used is read-intensive. In other global cache
implementations, the ratio of cache available for reads versus writes is dynamically
adjusted based on the workloads.


Cache Management: Algorithms

Pre-fetch
• Used when read requests are sequential
• Contiguous set of associated blocks is retrieved
• Significantly improves the response time experienced by the compute system

Least recently used (LRU)
• Discards data that has not been accessed for a long time

[Figure: new data entering cache while LRU data is discarded.]

Notes

Cache is an expensive resource that needs proper management to improve
performance and to proactively maintain a set of free pages.

Even though modern intelligent storage systems come with a large amount of
cache, when all cache pages are filled, some pages have to be freed up to
accommodate new data and avoid performance degradation.

Various cache management algorithms are implemented in intelligent storage
systems to proactively maintain a set of free pages. A list of pages that can be
potentially freed up whenever required may also be maintained.

Least Recently Used (LRU): An algorithm that continuously monitors data access
in cache and identifies the cache pages that have not been accessed for a long
time. LRU either frees up these pages or marks them for reuse. This algorithm is
based on the assumption that data that has not been accessed for a while will not
be requested by the compute system. However, if a page contains write data that
has not yet been committed to storage, the data is first written to the storage before
the page is reused.
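A minimal Python sketch of LRU page replacement, including the rule that dirty (uncommitted) write data is committed to storage before its page is reused. It uses `collections.OrderedDict` to keep pages in access order; the class name `LRUCache`, the page capacity, and the dictionary used as backing storage are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Page cache that evicts the least recently used page (illustrative sketch)."""

    def __init__(self, capacity_pages: int, storage: dict):
        self.capacity = capacity_pages
        self.storage = storage             # backing store: block -> data
        self.pages = OrderedDict()         # block -> (data, dirty); oldest first

    def read(self, block):
        if block in self.pages:
            self.pages.move_to_end(block)  # mark as most recently used
            return self.pages[block][0]
        self._make_room()
        data = self.storage[block]         # read miss: stage from storage
        self.pages[block] = (data, False)
        return data

    def write(self, block, data):
        if block in self.pages:
            self.pages.move_to_end(block)
        else:
            self._make_room()
        self.pages[block] = (data, True)   # dirty until committed

    def _make_room(self):
        if len(self.pages) >= self.capacity:
            victim, (data, dirty) = self.pages.popitem(last=False)
            if dirty:
                self.storage[victim] = data  # commit write data before reuse


storage = {1: "a", 2: "b", 3: "c"}
cache = LRUCache(capacity_pages=2, storage=storage)
cache.read(1)
cache.write(2, "b-new")
cache.read(1)                          # page 2 is now least recently used
cache.read(3)                          # evicts page 2, committing "b-new" first
print(list(cache.pages), storage[2])   # [1, 3] b-new
```

`popitem(last=False)` removes the oldest entry, and `move_to_end` refreshes a page on every hit, so the front of the ordered dictionary is always the LRU candidate.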

Prefetch: A prefetch or read-ahead algorithm is used when read requests are
sequential. In a sequential read request, a contiguous set of associated blocks is
retrieved. Several other blocks that have not yet been requested by the compute
system can be read from the storage and placed into cache in advance. When the
compute system subsequently requests these blocks, the read operations will be
read hits. This process significantly improves the response time experienced by the
compute system.
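The read-ahead behavior can be sketched in Python as follows. The sequential-detection rule (two consecutive block numbers in a row), the prefetch depth of 4 blocks, and the unbounded cache are simplifying assumptions for illustration; the class name `PrefetchingReader` is hypothetical.

```python
class PrefetchingReader:
    """Read-ahead sketch: after two consecutive block reads, stage the next blocks."""

    def __init__(self, storage: dict, prefetch_depth: int = 4):
        self.storage = storage          # backing store: block -> data
        self.depth = prefetch_depth     # hypothetical read-ahead window, in blocks
        self.cache = {}                 # unbounded here to keep the sketch short
        self.last_block = None
        self.hits = 0
        self.misses = 0

    def read(self, block: int):
        if block in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            self.cache[block] = self.storage[block]
        if self.last_block is not None and block == self.last_block + 1:
            # Sequential pattern detected: pre-fetch contiguous blocks not yet requested.
            for ahead in range(block + 1, block + 1 + self.depth):
                if ahead in self.storage and ahead not in self.cache:
                    self.cache[ahead] = self.storage[ahead]
        self.last_block = block
        return self.cache[block]


storage = {n: f"block-{n}" for n in range(32)}
reader = PrefetchingReader(storage)
for n in range(10):                    # sequential scan of blocks 0..9
    reader.read(n)
print(reader.hits, reader.misses)      # 8 2
```

Only the first two reads miss; once the pattern is recognized, every later block is already in cache when the compute system asks for it. A random access pattern never triggers the read-ahead, so no cache space is spent on blocks that are unlikely to be requested.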


Cache Data Protection

Cache is volatile memory, so a power failure or any kind of cache failure will cause
loss of the data that is not yet committed to the storage drive. This risk of losing
uncommitted data that is held in cache can be mitigated using cache mirroring and
cache vaulting:

Cache mirroring

Each write to cache is held in two different memory locations on two independent
memory cards. If a cache failure occurs, the write data will still be safe in the
mirrored location and can be committed to the storage drive. Reads are staged
from the storage drive to the cache; therefore, if a cache failure occurs, the data
can still be accessed from the storage drives. Because only writes are mirrored,
this method results in better utilization of the available cache.

In cache mirroring approaches, the problem of maintaining cache coherency is
introduced. Cache coherency means that data in two different cache locations must
be identical at all times. It is the responsibility of the storage system's operating
environment to ensure coherency.

Cache vaulting

The risk of data loss due to power failure can be addressed in various ways:
powering the memory with a battery until the AC power is restored or using battery
power to write the cache content to the storage drives. If an extended power failure
occurs, using batteries is not a viable option. This is because in intelligent storage
systems, large amounts of data might need to be committed to numerous storage
drives, and batteries might not provide power for sufficient time to write each piece
of data to its intended storage drive.

Therefore, storage vendors use a set of physical storage drives to dump the
contents of cache during power failure. This is called cache vaulting and the
storage drives are called vault drives. When power is restored, data from these
storage drives is written back to write cache and then written to the intended drives.


Component: Back End

• The back end provides an interface between cache and the physical storage
drives; it consists of two components:
  ◦ Back-end ports
  ◦ Back-end controllers
• The back end controls data transfers between cache and the physical drives
  ◦ From cache, data is sent to the back end and then routed to the destination
storage drives

[Figure: the controller architecture with the back end highlighted, showing back-end controllers and ports between cache and storage.]

Notes

Physical drives are connected to ports on the back end. The back-end controller
communicates with the storage drives when performing reads and writes and also
provides additional, but limited, temporary data storage. The algorithms that are
implemented on back-end controllers provide error detection and correction, along
with RAID functionality.

For high data protection and high availability, storage systems are configured with
dual controllers with multiple ports. Such configurations provide an alternative path
to physical storage drives if a controller or port failure occurs. This reliability is
further enhanced if the storage drives are also dual-ported. In that case, each drive
port can connect to a separate controller. Multiple controllers also facilitate load
balancing.


Storage

Physical storage drives are connected to the back-end storage controller and
provide persistent data storage.

[Figure: the controller architecture with storage highlighted, connected to the back end.]

Notes

Modern intelligent storage systems support a variety of storage drives with
different speeds and types, such as FC, SATA, SAS, and solid state drives (SSDs).
They also support the use of a mix of SSD, FC, or SATA drives within the same
storage system.

Workloads that have predictable access patterns typically work well with a
combination of HDDs and SSDs. If the workload changes, or constant high
performance is required for all the storage being presented, using SSDs can meet
the required performance.


Storage Provisioning Lesson

Introduction

This lesson covers traditional and virtual provisioning processes. This lesson also
covers LUN expansion and LUN masking mechanisms.

This lesson covers the following topics:

• Explain traditional and virtual provisioning
• Describe LUN expansion
• Explain the importance of LUN masking


Storage Provisioning

Video: Storage Provisioning

The video is located at https://edutube.emc.com/Player.aspx?vno=UwaDbHyIxAL3UAgyyKt0yg


Overview of Storage Provisioning

Definition: Storage Provisioning


The process of assigning storage resources to compute systems
based on capacity, availability, and performance requirements.

• Storage provisioning can be performed in two ways:
  ◦ Traditional
  ◦ Virtual
• Virtual provisioning leverages virtualization technology for provisioning storage
for applications


Logical Unit Number (LUN)

Definition: LUN
Each logical unit created from the RAID set is assigned a unique ID,
called a LUN. A LUN is also referred to as a volume, partition, or
device.

• LUNs hide the organization and composition of the RAID set from the compute
systems
• LUNs created by traditional storage provisioning methods are also referred to
as thick LUNs
• Once allocated, a LUN appears to a host as an internal physical disk

Notes

RAID sets usually have a large capacity because they combine the total capacity of
individual drives in the set. Logical units are created from the RAID sets by
partitioning the available capacity into smaller units (seen as slices of the RAID
set). These units are then assigned to the compute systems based on their storage
requirements. Logical units are spread across all the physical drives that belong to
that set.

Each logical unit created from the RAID set is assigned a unique ID, called a logical
unit number (LUN). LUNs hide the organization and composition of the RAID set
from the compute systems. LUNs created by traditional storage provisioning
methods are also referred to as thick LUNs to distinguish them from the LUNs
created by virtual provisioning methods.

When a LUN is configured and assigned to a non-virtualized compute system, a
bus scan is required to identify the LUN. This LUN appears as a raw storage drive
to the operating system. To make this drive usable, it is formatted with a file system
and then the file system is mounted. In a virtualized compute system environment,
the LUN is assigned to the hypervisor, which recognizes it as a raw storage drive.
This drive is configured with the hypervisor file system, and then virtual storage
drives are created on it.


Virtual storage drives are files on the hypervisor file system. The virtual storage
drives are then assigned to virtual machines and appear as raw storage drives to
them. To make the virtual storage drive usable to the virtual machine, similar steps
are followed as in a non-virtualized environment. Here, the LUN space may be
shared and accessed simultaneously by multiple virtual machines.

Virtual machines can also access a LUN directly on the storage system. In this
method the entire LUN is allocated to a single virtual machine. Storing data in this
way is recommended when the applications running on the virtual machine are
response-time sensitive, and sharing storage with other virtual machines may
impact their response time. The direct access method is also used when a virtual
machine is clustered with a physical machine. In this case, the virtual machine is
required to access the LUN that is being accessed by the physical machine.


Traditional Provisioning

In traditional storage provisioning, physical storage drives are logically grouped
together, and a required RAID level is applied to them to form a set, called a RAID
set.

The illustration shows a RAID set consisting of five storage drives that have been
sliced or partitioned into two LUNs: LUN 0 and LUN 1. These LUNs are then
assigned to Compute 1 and Compute 2 for their storage requirements.

[Figure: traditional provisioning. A RAID set behind the controller's back end is partitioned into LUN 0 and LUN 1, which are presented through the front end and the storage network to Compute 1 and Compute 2.]

Notes

For traditional provisioning, the number of drives in the RAID set and the RAID
level determine the availability, capacity, and performance of the RAID set. It is
highly recommended to create the RAID set from drives of the same type, speed,
and capacity to ensure maximum usable capacity, reliability, and consistency in
performance.

For example, if drives of different capacities are mixed in a RAID set, the capacity
of the smallest drive is used from each drive in the set to make up the RAID set’s
overall capacity. The remaining capacity of the larger drives remains unused.
Likewise, mixing higher speed drives with lower speed drives lowers the overall
performance of the RAID set.
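The effect of mixing drive capacities can be worked through in Python. This sketch assumes hypothetical drive sizes (three 600 GB and two 900 GB drives) and adds a RAID 5 usable figure using the [(n-1)/n] rule from the RAID comparison table; the function name `raid_set_capacity` is hypothetical.

```python
def raid_set_capacity(drive_sizes_gb: list) -> dict:
    """Capacity of a RAID set built from drives of possibly different sizes."""
    smallest = min(drive_sizes_gb)
    used = smallest * len(drive_sizes_gb)   # only `smallest` GB is used on each drive
    return {
        "used_raw_gb": used,
        "unused_gb": sum(drive_sizes_gb) - used,
        # RAID 5 gives up one drive's worth of the used capacity to parity.
        "raid5_usable_gb": smallest * (len(drive_sizes_gb) - 1),
    }


print(raid_set_capacity([600, 600, 600, 900, 900]))
# {'used_raw_gb': 3000, 'unused_gb': 600, 'raid5_usable_gb': 2400}
```

Of the 3,600 GB purchased, 600 GB on the larger drives is never used, which is why RAID sets are built from drives of the same capacity.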


Virtual Provisioning

• Virtual provisioning enables creating and presenting a LUN with more capacity
than is physically allocated to it on the storage system
• The LUN created using virtual provisioning is called a thin LUN to distinguish it
from the traditional LUN
• Thin LUNs do not require physical storage to be completely allocated to them at
the time they are created and presented to a compute system

[Figure: virtual provisioning. Thin LUN 0 (assigned to Compute 1) and Thin LUN 1 (assigned to Compute 2) each report 10 TB to the compute system, but only 3 TB and 4 TB, respectively, are allocated from the storage pool.]

Notes

Physical storage is allocated to the compute system “on-demand” from a shared
pool of physical capacity. A shared pool consists of physical storage drives. A
shared pool in virtual provisioning is analogous to a RAID set, which is a collection
of drives on which LUNs are created. Similar to a RAID set, a shared pool supports
a single RAID protection level. However, unlike a RAID set, a shared pool might
contain large numbers of drives. Shared pools can be homogeneous (containing a
single drive type) or heterogeneous (containing mixed drive types, such as SSD,
FC, SAS, and SATA drives).

Virtual provisioning enables more efficient allocation of storage to compute systems. Virtual provisioning also enables oversubscription, where more capacity is presented to the compute systems than is actually available on the storage system. Both the shared pool and the thin LUN can be expanded non-disruptively as the storage requirements of the compute systems grow. Multiple shared pools can be created within a storage system, and a shared pool may be shared by multiple thin LUNs.
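A minimal Python model makes on-demand allocation and oversubscription concrete. It is a sketch under stated assumptions and does not show how any particular storage system implements thin LUNs. The class names `StoragePool` and `ThinLUN`, the 1 GB extent size, and all capacities are hypothetical.

```python
class StoragePool:
    """Shared pool of physical capacity, handed out in fixed-size extents."""

    def __init__(self, physical_gb, extent_gb=1):
        self.physical_gb = physical_gb
        self.extent_gb = extent_gb
        self.allocated_gb = 0

    def allocate_extent(self):
        if self.allocated_gb + self.extent_gb > self.physical_gb:
            raise RuntimeError("storage pool is out of physical capacity")
        self.allocated_gb += self.extent_gb


class ThinLUN:
    """Reports a fixed capacity but consumes pool extents only when written."""

    def __init__(self, pool, reported_gb):
        self.pool = pool
        self.reported_gb = reported_gb
        self.extents = set()  # indexes of extents that already hold data

    def write(self, offset_gb):
        if not 0 <= offset_gb < self.reported_gb:
            raise ValueError("write beyond the LUN's reported capacity")
        extent = int(offset_gb // self.pool.extent_gb)
        if extent not in self.extents:  # first write to this extent
            self.pool.allocate_extent()  # allocate physical capacity on demand
            self.extents.add(extent)

    @property
    def allocated_gb(self):
        return len(self.extents) * self.pool.extent_gb


pool = StoragePool(physical_gb=8000)
lun0 = ThinLUN(pool, reported_gb=10000)
lun1 = ThinLUN(pool, reported_gb=10000)
for offset in (0, 0.5, 1.2, 2.9):
    lun0.write(offset)
lun1.write(500)
print(lun0.allocated_gb, lun1.allocated_gb, pool.allocated_gb)  # 3 1 4
ratio = (lun0.reported_gb + lun1.reported_gb) / pool.physical_gb
print(f"oversubscription ratio: {ratio}")  # oversubscription ratio: 2.5
```

Together the two thin LUNs report 20,000 GB, which is 2.5 times the pool's 8,000 GB of physical capacity. So far the pool has handed out only 4 GB of extents. If writes kept arriving, `allocate_extent` would eventually raise an error, which models a pool running out of physical capacity.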


Expand Thin LUNs and Storage Pool

 A storage pool comprises physical drives that provide the physical storage used by thin LUNs
 A storage pool is created by specifying a set of drives and a RAID type for that pool
 Thin LUNs are then created out of that pool (similar to how traditional LUNs are created on a RAID set)
 All the thin LUNs created from a pool share the storage resources of that pool
 Adding drives to a storage pool increases the available shared capacity for all the thin LUNs in the pool
 Drives can be added to a storage pool while the pool is used in production
 The allocated capacity is reclaimed by the pool when thin LUNs are destroyed
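The pool behavior in these bullets can be modeled in a short sketch: shared capacity, non-disruptive expansion, and reclamation when a thin LUN is destroyed. `ThinPool` and its methods are hypothetical names. For simplicity, capacity is tracked in whole GB rather than in extents.

```python
class ThinPool:
    """Storage pool whose drives provide shared capacity for all thin LUNs."""

    def __init__(self, drive_capacities_gb):
        self.drives = list(drive_capacities_gb)
        self.luns = {}  # thin LUN name -> allocated GB

    @property
    def capacity_gb(self):
        return sum(self.drives)

    @property
    def free_gb(self):
        return self.capacity_gb - sum(self.luns.values())

    def add_drives(self, *capacities_gb):
        # Expanding the pool grows the free space shared by every thin LUN.
        self.drives.extend(capacities_gb)

    def allocate(self, lun, gb):
        if gb > self.free_gb:
            raise RuntimeError(f"pool exhausted: {self.free_gb} GB free, {gb} GB requested")
        self.luns[lun] = self.luns.get(lun, 0) + gb

    def destroy_lun(self, lun):
        # Capacity allocated to a destroyed thin LUN is reclaimed by the pool.
        return self.luns.pop(lun, 0)


pool = ThinPool([1000] * 4)  # four hypothetical 1,000 GB drives
pool.allocate("lun_a", 2500)
pool.allocate("lun_b", 1400)
try:
    pool.allocate("lun_b", 200)  # only 100 GB left
except RuntimeError as err:
    print(err)  # pool exhausted: 100 GB free, 200 GB requested
pool.add_drives(1000, 1000)  # expand the pool while it is in use
pool.allocate("lun_b", 200)
pool.destroy_lun("lun_a")
print(pool.capacity_gb, pool.free_gb)  # 6000 4400
```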

Figure: Storage pool expansion (adding storage drives to the storage pool, followed by thin pool rebalancing) and thin LUN expansion (user capacity before and after expansion, relative to in-use capacity).

When a storage pool is expanded, the sudden introduction of new, empty drives alongside relatively full drives causes a data imbalance. This imbalance is resolved by an automated, one-time data relocation referred to as rebalancing. Storage pool rebalancing automatically relocates extents across the physical drives of the entire pool when new drives are added. An extent is the minimum amount of physical storage capacity that is allocated to a thin LUN from the pool.


Storage pool rebalancing restripes data across all the drives (both existing and new) in the storage pool. This spreads the data equally across all the physical drives within the pool, so that the used capacity of each drive is uniform. After the storage pool capacity is increased, the capacity of the existing LUNs can be expanded.
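Rebalancing can be illustrated by computing a target extent count for each drive and the number of extents that must move. This sketch assumes that all extents are the same size and only counts them. The function `rebalance_plan` is hypothetical, and it does not choose which specific extents to relocate.

```python
def rebalance_plan(extents_per_drive):
    """Return (target extents per drive, number of extents to relocate).

    New, empty drives appear as 0. Targets differ by at most one extent, so
    used capacity becomes (nearly) uniform across the pool.
    """
    total, n = sum(extents_per_drive), len(extents_per_drive)
    base, remainder = divmod(total, n)
    target = [base] * n
    # Give any leftover extents to the fullest drives, which minimizes moves.
    fullest_first = sorted(range(n), key=lambda i: extents_per_drive[i], reverse=True)
    for i in fullest_first[:remainder]:
        target[i] += 1
    moves = sum(max(0, cur - tgt) for cur, tgt in zip(extents_per_drive, target))
    return target, moves


# Four relatively full drives plus two new, empty drives.
target, moves = rebalance_plan([90, 90, 90, 90, 0, 0])
print(target, moves)  # [60, 60, 60, 60, 60, 60] 120
```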


Traditional Provisioning vs. Virtual Provisioning

 Administrators typically allocate storage capacity based on anticipated storage requirements
 This generally results in over-provisioning of storage capacity, which leads to higher costs and lower capacity utilization
 Administrators often over-provision storage to an application for reasons such as to:
 Avoid frequent provisioning of storage when the LUN capacity is exhausted
 Reduce disruption to application availability
 Virtual provisioning:
 Addresses these challenges
 Improves storage capacity utilization and simplifies storage management
The illustration compares virtual provisioning with traditional storage provisioning.

Figure: The same 2 TB storage system under two schemes. Traditional provisioning: LUN 1 (500 GB), LUN 2 (550 GB), and LUN 3 (800 GB) hold 100 GB, 50 GB, and 200 GB of actual data, leaving 1,500 GB allocated but unused and 150 GB available. Virtual provisioning: Thin LUN 1, 2, and 3 have only 100 GB, 50 GB, and 200 GB allocated, for 350 GB of actual data and 1,650 GB available.


Notes

With traditional provisioning, three LUNs are created and presented to one or more
compute systems. The total storage capacity of the storage system is 2 TB. The
allocated capacity of LUN 1 is 500 GB, of which only 100 GB is consumed, and the
remaining 400 GB is unused. The size of LUN 2 is 550 GB, of which 50 GB is
consumed, and 500 GB is unused. The size of LUN 3 is 800 GB, of which 200 GB
is consumed, and 600 GB is unused.

In total, the storage system has 350 GB of data, 1.5 TB of allocated but unused capacity, and only 150 GB of remaining capacity available for other applications.

Now consider the same 2 TB storage system with virtual provisioning. Here, three thin LUNs of the same sizes are created. However, no capacity is allocated and left unused, because each thin LUN consumes only the capacity its data occupies. In total, the storage system with virtual provisioning holds the same 350 GB of data, but 1.65 TB of capacity is available for other applications. With traditional provisioning, only 150 GB is available.
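The arithmetic in this example can be checked directly. The sketch below treats 2 TB as 2,000 GB, as the example does. It uses the LUN sizes and consumed capacities given above.

```python
SYSTEM_GB = 2000
luns = {  # LUN: (allocated size in GB, data actually written in GB)
    "LUN 1": (500, 100),
    "LUN 2": (550, 50),
    "LUN 3": (800, 200),
}

data_gb = sum(used for _, used in luns.values())
# Traditional provisioning: the full LUN size is carved out up front.
trad_allocated_unused = sum(size - used for size, used in luns.values())
trad_available = SYSTEM_GB - sum(size for size, _ in luns.values())
# Virtual provisioning: thin LUNs consume only what has been written.
thin_available = SYSTEM_GB - data_gb

print(data_gb, trad_allocated_unused, trad_available, thin_available)  # 350 1500 150 1650
```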

Virtual provisioning and thin LUNs offer many benefits, although in some cases a traditional LUN is better suited for an application. Thin LUNs are appropriate for applications that can tolerate performance variations. In some cases, a thin LUN may even perform better, because data is striped across a large number of drives in the pool. However, performance can degrade when multiple thin LUNs contend for shared storage resources in a given pool and utilization reaches higher levels. Thin LUNs provide the best storage space efficiency and suit applications whose space consumption is difficult to forecast. Using thin LUNs helps organizations reduce power and acquisition costs and simplify storage management.

Traditional LUNs are suited for applications that require predictable performance.
Traditional LUNs provide full control for precise data placement and allow an
administrator to create LUNs on different RAID groups if there is any workload
contention. Organizations that are not highly concerned about storage space
efficiency may still use traditional LUNs. Both traditional and thin LUNs can coexist
in the same storage system. Based on the requirement, an administrator may
migrate data between thin and traditional LUNs.


LUN Masking

Definition: LUN Masking


A process that provides data access control by defining which LUNs a
compute system can access.

 Implemented on a storage system
 Prevents unauthorized or accidental use of LUNs in a shared environment

Notes

The LUN masking function is implemented on the storage system. This ensures that volume access by a compute system is controlled appropriately, preventing unauthorized or accidental use in a shared environment.

For example, consider a storage system with two LUNs that store data of the sales
and finance departments. Without LUN masking, both departments can easily see
and modify each other’s data, posing a high risk to data integrity and security. With
LUN masking, LUNs are accessible only to the designated compute systems.
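Conceptually, LUN masking is a lookup that the storage system performs before it exposes a LUN. The sketch below models it as a table that maps each compute system's initiator identifier to the LUNs it may access. The identifiers are hypothetical. FC HBA port WWNs are a common kind of identifier, but the values here are purely illustrative. The function names do not come from any product.

```python
# Hypothetical initiator identifiers for the two departments' compute systems.
SALES_HOST = "10:00:00:00:c9:aa:bb:01"
FINANCE_HOST = "10:00:00:00:c9:aa:bb:02"

# Masking table kept on the storage system: initiator -> LUNs it may access.
MASKING_TABLE = {
    SALES_HOST: {0},    # LUN 0 stores sales data
    FINANCE_HOST: {1},  # LUN 1 stores finance data
}


def visible_luns(initiator):
    """LUNs reported to an initiator; unknown initiators see nothing."""
    return sorted(MASKING_TABLE.get(initiator, set()))


def can_access(initiator, lun):
    return lun in MASKING_TABLE.get(initiator, set())


print(visible_luns(SALES_HOST))      # [0]
print(can_access(SALES_HOST, 1))     # False
print(visible_luns("unknown-host"))  # []
```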


Storage Tiering Lesson

Introduction

This lesson covers FAST VP and cache tiering.

This lesson covers the following topics:


 Explain Fully Automated Storage Tiering for Virtual Provisioning (FAST VP)
 Discuss cache tiering


Storage Tiering

Video: Storage Tiering

The video is located at https://edutube.emc.com/Player.aspx?vno=lH431R/R5rQM6ICC5+VYFQ


Storage Tiering Overview

Definition: Storage Tiering


A technique of establishing a hierarchy of storage types and
identifying the candidate data to relocate to the appropriate storage
type to meet service level requirements at a minimal cost.

 Each tier has different levels of protection, performance, and cost
 Efficient storage tiering requires defining tiering policies
 The tiering options in block-based storage systems are FAST VP and cache tiering

Notes

Storage tiering is a technique of establishing a hierarchy of different storage types (tiers). This enables storing the right data in the right tier, based on service level requirements, at a minimal cost. Each tier has different levels of protection, performance, and cost. For example, high-performance solid-state drives (SSDs) or FC drives can be configured as tier 1 storage to keep frequently accessed data. Low-cost SATA drives can then serve as tier 2 storage to keep less frequently accessed data.

Keeping frequently used data on SSD or FC drives improves application performance. Moving less frequently accessed data to SATA drives frees up capacity on the high-performance drives and reduces the cost of storage. This data movement is governed by defined tiering policies. A tiering policy might be based on parameters such as frequency of access.

For example, if a policy states “move the data that has not been accessed in the last 30 minutes to the lower tier,” then all the data matching this condition is moved to the lower tier.
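The example policy can be expressed as a small function. This is a simplified sketch that assumes each unit of data records its tier and its last access time. The names `apply_policy` and `INACTIVITY_THRESHOLD` are hypothetical. Tier 1 and tier 2 follow the SSD/FC and SATA example above. The sketch only demotes data; promoting active data would be the mirror-image rule.

```python
from datetime import datetime, timedelta

INACTIVITY_THRESHOLD = timedelta(minutes=30)


def apply_policy(extents, now):
    """Demote tier 1 data that has not been accessed within the threshold.

    extents maps an id to {"tier": 1 or 2, "last_access": datetime}.
    Tier 1 is high-performance (SSD/FC); tier 2 is low-cost (SATA).
    Returns the ids that were moved to tier 2.
    """
    demoted = []
    for ext_id, info in extents.items():
        idle = now - info["last_access"]
        if info["tier"] == 1 and idle >= INACTIVITY_THRESHOLD:
            info["tier"] = 2
            demoted.append(ext_id)
    return demoted


now = datetime(2019, 1, 1, 12, 0)
extents = {
    "e1": {"tier": 1, "last_access": now - timedelta(minutes=5)},
    "e2": {"tier": 1, "last_access": now - timedelta(minutes=45)},
    "e3": {"tier": 2, "last_access": now - timedelta(hours=3)},
}
print(apply_policy(extents, now))  # ['e2']
```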


The process of moving data from one tier to another is typically automated. In automated storage tiering, the application workload is proactively monitored. Active data is automatically moved to a higher-performance tier, and inactive data is moved to a higher-capacity, lower-performance tier. The data movement between tiers is performed non-disruptively.

The storage tiering techniques implemented in a block-based storage system are FAST VP and cache tiering.


LUN and Sub-LUN (FAST VP) Tiering

The process of storage tiering within a storage system is called intra-array storage tiering. It enables the efficient use of SSD, FC, and SATA drives within a system and provides performance and cost optimization. The goal is to keep the SSDs busy by storing the most frequently accessed data on them, while moving the less frequently accessed data out to the SATA drives.

Figure: LUN tiering compared with sub-LUN tiering. In LUN tiering, an entire LUN with inactive data is moved from tier 0 to tier 1, and an entire LUN with active data is moved from tier 1 to tier 0 for improved performance. In sub-LUN tiering, only the active data is moved from tier 1 to tier 0 for improved performance, and only the inactive data is moved from tier 0 to tier 1.
Data movements that are executed between tiers can be performed at the LUN
level or at the sub-LUN level. The performance can be further improved by
implementing tiered cache.

LUN Tiering
 Moves an entire LUN from one tier to another
 Does not give effective cost and performance benefits

Sub-LUN Tiering
 A LUN is broken down into smaller segments and tiered at that level
 Provides effective cost and performance benefits


Notes

Traditionally, storage tiering operated at the LUN level, moving an entire LUN from one tier of storage to another. This movement includes both the active and the inactive data in that LUN, so the method does not give effective cost and performance benefits.

Today, storage tiering can be implemented at the sub-LUN level. In sub-LUN level
tiering, a LUN is broken down into smaller segments and tiered at that level.
Movement of data with much finer granularity, for example 8 MB, greatly enhances
the value proposition of automated storage tiering. Tiering at the sub-LUN level
effectively moves active data to faster drives and less active data to slower drives.
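Sub-LUN tiering can be sketched in three steps: map each I/O to its 8 MB segment, count accesses per segment, and place the hottest segments on the fast tier. This illustration rests on several assumptions. The access-count heuristic, the function names, and the fixed number of tier 0 slots are all hypothetical. It does not describe FAST VP's actual placement algorithm.

```python
from collections import Counter

SEGMENT_BYTES = 8 * 1024 * 1024  # 8 MB tiering granularity


def segment_of(offset_bytes):
    """Index of the 8 MB segment that contains a byte offset."""
    return offset_bytes // SEGMENT_BYTES


def plan_placement(io_offsets, total_segments, tier0_slots):
    """Map each segment to tier 0 (fast) or tier 1 (capacity).

    Only segments that saw I/O are candidates for tier 0, hottest first;
    ties are broken by segment number so the plan is deterministic.
    """
    heat = Counter(segment_of(off) for off in io_offsets)
    ranked = sorted(heat, key=lambda s: (-heat[s], s))
    hot = set(ranked[:tier0_slots])
    return {s: (0 if s in hot else 1) for s in range(total_segments)}


# A hypothetical 64 MB LUN (8 segments) with skewed access.
ios = [3 * SEGMENT_BYTES + 4096] * 50 + [6 * SEGMENT_BYTES] * 20 + [100]
plan = plan_placement(ios, total_segments=8, tier0_slots=2)
print(sorted(s for s, tier in plan.items() if tier == 0))  # [3, 6]
```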


Cache Tiering

Figure: Tiered cache in a storage system, with the DRAM cache as tier 0 and SSDs as tier 1.

 Enables creation of a large-capacity secondary cache using SSDs
 Enables tiering between DRAM cache and SSDs (secondary cache)
 Most reads are served directly from the high-performance tiered cache
 Enhances performance during peak workloads
 Non-disruptive and transparent to applications

Notes

Tiering is also implemented at the cache level. A large cache in a storage system improves performance by retaining a large amount of frequently accessed data, so most reads are served directly from the cache. However, configuring a large cache in the storage system increases cost.


An alternative way to increase the size of the cache is to use the SSDs on the storage system. In cache tiering, SSDs are used to create a large-capacity secondary cache and to enable tiering between DRAM (primary cache) and SSDs (secondary cache).

Server flash caching is another tier of cache, in which a flash-cache card is installed in the server to further enhance application performance.
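A two-level read cache illustrates how a DRAM primary cache and an SSD secondary cache interact. This is a simplified model under stated assumptions. Both levels use least-recently-used (LRU) replacement. Blocks evicted from DRAM are demoted to the SSD cache, and an SSD hit promotes the block back to DRAM. These policies and the class name `TieredCache` are hypothetical and do not describe any specific storage system.

```python
from collections import OrderedDict


class TieredCache:
    """Two-level read cache: DRAM (tier 0) in front of an SSD cache (tier 1)."""

    def __init__(self, dram_blocks, ssd_blocks, backend):
        self.dram = OrderedDict()  # block -> data, least recently used first
        self.ssd = OrderedDict()
        self.dram_blocks = dram_blocks
        self.ssd_blocks = ssd_blocks
        self.backend = backend  # reads a block from the storage drives
        self.stats = {"dram": 0, "ssd": 0, "drive": 0}

    def _put_ssd(self, block, data):
        self.ssd[block] = data
        if len(self.ssd) > self.ssd_blocks:
            self.ssd.popitem(last=False)  # evicted from both cache tiers

    def _put_dram(self, block, data):
        self.dram[block] = data
        if len(self.dram) > self.dram_blocks:
            old_block, old_data = self.dram.popitem(last=False)
            self._put_ssd(old_block, old_data)  # demote instead of discarding

    def read(self, block):
        if block in self.dram:
            self.stats["dram"] += 1
            self.dram.move_to_end(block)  # mark as most recently used
            return self.dram[block]
        if block in self.ssd:
            self.stats["ssd"] += 1
            data = self.ssd.pop(block)  # promote the block back to DRAM
        else:
            self.stats["drive"] += 1
            data = self.backend(block)
        self._put_dram(block, data)
        return data


cache = TieredCache(dram_blocks=2, ssd_blocks=4, backend=lambda b: f"data-{b}")
for block in [1, 2, 3, 1, 1, 4, 2]:
    cache.read(block)
print(cache.stats)  # {'dram': 1, 'ssd': 2, 'drive': 4}
```

In this trace, blocks pushed out of the small DRAM tier are still served from the SSD tier on later reads instead of going back to the drives.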


Use Case - Block-Based Storage in a Cloud

To gain a cost advantage, organizations may move their applications to a cloud. To ensure that the applications function properly and perform acceptably, service providers offer block-based storage in the cloud.

The service providers enable consumers to create block-based storage volumes and attach them to virtual machine instances. After the volumes are attached, consumers can create a file system on these volumes and run applications the way they would in an on-premises data center.

Figure: Storage as a Service. VM instances running business applications (each VM with its own application and OS) are attached to block-based storage volumes provided by a block-based storage system.


Concepts in Practice Lesson

Introduction

This section highlights technologies that are relevant to the topics covered in this
module.

This lesson covers the following topics:


 Dell EMC XtremIO
 Dell EMC FAST VP
 Dell EMC PowerMax
 Dell EMC SC Series


Concepts in Practice


Dell EMC XtremIO

Dell EMC XtremIO is an all-flash, block-based, scale-out enterprise storage system that provides substantial improvements to I/O performance. It is purpose-built to leverage flash media and delivers new levels of real-world performance, administrative ease, and advanced data services for applications. It uses a scale-out clustered design that grows capacity and performance linearly to meet any requirement.

XtremIO storage systems are created from building blocks called "X-Bricks" that
are each a high-availability, high-performance, fully active/active storage system
with no single point of failure. XtremIO's powerful operating system, XIOS,
manages the XtremIO storage cluster. XIOS ensures that the system remains
balanced and always delivers the highest levels of performance with no
administrator intervention.

XtremIO helps administrators become more efficient by enabling system configuration in a few clicks, provisioning storage in seconds, and monitoring the environment with real-time metrics.

Dell EMC FAST VP

FAST VP performs storage tiering at the sub-LUN level in a virtually provisioned environment. FAST VP automatically moves more active data (data that is accessed more frequently) to the best-performing storage tier. It moves less active data to a lower-performance, less expensive tier.

Data movement between the tiers is based on user-defined policies and is executed automatically and non-disruptively by FAST VP.


Dell EMC PowerMax

Dell EMC PowerMax is a fast storage array that delivers unprecedented levels of performance, with up to 10M IOPS and 150 GB per second of sustained bandwidth. The key to unlocking the next level of performance is NVMe. NVMe removes the SAS storage bottleneck and maximizes the power of flash drives. Most importantly, it opens the door to the next media disruption: storage class memory (SCM).

PowerMax will deliver up to 25% better response times with NVMe Flash drives.
The combination of NVMe and SCM will unlock even greater performance reaching
up to 50% better response times.

The array offers a flexible scale-up and scale-out architecture. Configuration management is simple with Unisphere for PowerMax. The intuitive HTML5 GUI provides a simple and feature-rich user experience.

The easiest way to describe CloudIQ is that it is like a fitness tracker for your storage environment. It provides a single, simple display to monitor and predict the health of your storage environment. CloudIQ makes it simple to track storage health, report on historical trends, plan for future growth, and proactively discover and remediate issues from any browser or mobile device.

Dell EMC SC Series

SC Series offers two categories of arrays: SC Hybrid (SSD and HDD) and SC All-Flash. SC Series was one of the original pioneers of auto-tiering and has the most full-featured, powerful implementation. This helps you get great flash performance with less hardware and with a less expensive mix of hardware. SC arrays also provision RAID dynamically to help cut costs and increase performance.

In addition to leading platform efficiency (auto-tiering, RAID tiering, thin methods), SC arrays also offer the most comprehensive data reduction, with Intelligent Deduplication and Compression on:

 SSDs in all-flash configurations
 SSDs and HDDs in hybrid configurations

SC Series provides users with advanced thin provisioning technologies that optimize storage utilization within their environments. Unlike traditional SANs, Storage Center does not require users to pre-allocate space. Storage is pooled, ensuring that space is available when and where it is needed. You can even reclaim capacity that applications no longer use. You can also automatically reduce the space needed for virtual OS volumes, and thin import volumes on legacy storage, to improve capacity utilization.

SC Series Remote Instant Replay software efficiently replicates periodic snapshots between local and remote sites, helping to ensure business continuity at a fraction of the cost of other replication solutions.


Assessment

1. What is the process of storage tiering within a storage system called?

A. Intra-array storage tiering

B. Inter-array storage tiering

C. LUN tiering

D. Sub-LUN tiering

2. Which is a process that provides data access control by defining which LUNs a
compute system can access?

A. LUN masking

B. Tiering

C. Virtual provisioning

D. Thin LUN


Summary



Fibre Channel SAN

Introduction

This module presents an overview of Fibre Channel Storage Area Network (FC SAN), its components, and its architecture. It also covers FC SAN topologies and zoning, and it describes the virtualization process in an FC SAN environment.

Upon completing this module, you will be able to:


 Describe Fibre Channel (FC) SAN and its components
 Describe FC architecture
 Describe FC SAN topologies and zoning
 Describe a Virtual SAN (VSAN)


Introduction to SAN Lesson

Introduction

This lesson presents the definition of a SAN, its benefits, and its requirements.

This lesson covers the following topics:


 Definition of SAN
 Benefits of SAN
 Requirements for a SAN


Introduction to SAN

Storage Area Network (SAN) Overview

Definition: SAN
A network whose primary purpose is the transfer of data between
computer systems and storage devices and among storage devices.
Source: Storage Networking Industry Association

A Storage Area Network (SAN) is a network that primarily connects the storage systems with the compute systems and also connects the storage systems with each other. It enables multiple compute systems to access and share storage resources. It also enables data transfer between storage systems. With a long-distance SAN, data transfer over the SAN can be extended across geographic locations. A SAN usually provides access to block-based storage systems.

Figure: Two data centers connected over a WAN. Each data center has clients and NAS on a LAN, compute systems running a hypervisor with VMs, and storage systems connected to the compute systems through a SAN.


Benefits of SAN

 Enables both consolidation and sharing of storage resources across multiple compute systems
 Improves utilization of storage resources
 Centralizes management
 Enables connectivity across geographically dispersed locations
 Enables compute systems across locations to access shared data
 Enables replication of data between storage systems that reside in separate locations
 Facilitates remote backup of application data

Notes

SAN addresses the limitations of a Direct-Attached Storage (DAS) environment. In a DAS environment, the compute systems own the storage. SANs, by contrast, enable both consolidation and sharing of storage resources across multiple compute systems. This improves the utilization of storage resources compared to a DAS environment, and it reduces the total amount of storage that an organization needs to purchase and manage. With consolidation, storage management becomes centralized and less complex, which further reduces the cost of managing information.

A SAN may span wide geographic areas. This flexibility enables organizations to connect geographically dispersed compute systems and storage systems. Long-distance SAN connectivity enables compute systems across locations to access shared data. It also enables the replication of data between storage systems that reside in separate locations. Replication over long distances helps protect data against local and regional disasters. Further, long-distance SAN connectivity facilitates remote backup of application data. Backup data can be transferred through a SAN to a backup device that may reside at a remote location. This avoids having to ship tapes (backup media) from the primary site to the remote site. It also avoids the associated pitfalls, such as packing and shipping expenses and tapes lost in transit.


Requirements for a SAN

An effective SAN infrastructure must provide:


 High throughput to support high-performance computing
 Interconnectivity among many devices over wide locations to transfer massively
distributed, high volume of data
 Elastic and non-disruptive scaling to support applications that are horizontally
scaled
 Automated and policy-driven infrastructure configuration
 Simplified, flexible, and agile management operations


FC SAN Overview Lesson

Introduction

This lesson presents the components of FC SAN, three FC interconnectivity options, and FC port types.

This lesson covers the following topics:


 Components of FC SAN
 FC interconnectivity options
 FC port types


FC SAN Overview

Video: FC SAN Overview

The video is located at https://edutube.emc.com/Player.aspx?vno=iU0awg65X0P0YpgkPgH5Lw


FC SAN Overview

Figure: An FC SAN connecting compute systems (each running a hypervisor) to storage systems.

 A SAN that uses FC protocol for communication
 A high-speed network that runs on high-speed optical fiber cables and serial copper cables
 FC speeds commonly run at 1, 2, 4, 8, 16, 32, and 128 Gb/s
 Provides high scalability

Notes

Fibre Channel SAN (FC SAN) uses the Fibre Channel (FC) protocol for communication. FC protocol (FCP) is used to transport data, commands, and status information between the compute systems and the storage systems. It is also used to transfer data between storage systems. FC is a high-speed network technology that runs on high-speed optical fiber cables and serial copper cables. FC technology was developed to meet the demand for faster data transfer between compute systems and mass storage systems. Compared with Ultra Small Computer System Interface (Ultra-SCSI), which is commonly used in DAS environments, FC is a significant leap in storage networking technology. Note: FibRE refers to the protocol, whereas fibER refers to the media.


Components of FC SAN

The key FC SAN components are network adapters, cables, and interconnecting devices. These components are described in the following:
 Network adapters
 FC HBAs in compute systems
 Front-end adapters in storage systems
 Cables
 Copper cables for short distances
 Optical fiber cables for long distances
 Two types:
o Multimode
o Single-mode
 Interconnecting devices
 FC hubs, FC switches, and FC directors

Figure: Cross-sections of multimode fibre and single-mode fibre, each showing the core, the cladding, and light entering the core.

Notes

Network Adapters

In an FC SAN, the end devices, such as compute systems and storage systems, are all referred to as nodes. Each node is a source or destination of information. Each node requires one or more network adapters to provide a physical interface for communicating with other nodes. Examples of network adapters are FC host bus adapters (HBAs) and storage system front-end adapters. An FC HBA has SCSI-to-FC processing capability. It encapsulates operating system or hypervisor storage I/Os (usually SCSI I/O) into FC frames before sending the frames to the FC storage systems over an FC SAN.
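The HBA's encapsulation step can be pictured as cutting one SCSI I/O's data into frame-sized pieces. The sketch below assumes the Fibre Channel maximum frame data field of 2,112 bytes. It models each frame only as a (sequence count, payload) pair. Real frames also carry a header, a CRC, and start-of-frame and end-of-frame delimiters, all omitted here. The function name `encapsulate` is hypothetical.

```python
MAX_FC_PAYLOAD = 2112  # maximum data field of one FC frame, in bytes


def encapsulate(scsi_data: bytes, max_payload: int = MAX_FC_PAYLOAD):
    """Split the data of one SCSI I/O into FC-frame-sized payloads.

    Returns (sequence count, payload) pairs in transmission order. Headers,
    CRCs, and frame delimiters are deliberately left out of this sketch.
    """
    return [
        (seq_cnt, scsi_data[start:start + max_payload])
        for seq_cnt, start in enumerate(range(0, len(scsi_data), max_payload))
    ]


frames = encapsulate(bytes(8192))  # a hypothetical 8 KB write
print(len(frames), [len(payload) for _, payload in frames])  # 4 [2112, 2112, 2112, 1856]
```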

Cables

FC SAN implementations primarily use optical fiber cabling. Copper cables may be used for shorter distances because they provide an acceptable signal-to-noise ratio for distances up to 30 meters. Optical fiber cables carry data in the form of light. There are two types of optical cables: multimode and single-mode. Multimode fiber (MMF) cable carries multiple beams of light that are projected at different angles simultaneously onto the core of the cable. In an MMF transmission, the multiple light beams traveling inside the cable tend to disperse and collide. This collision weakens the signal strength after it travels a certain distance, a process known as modal dispersion. Because of modal dispersion, an MMF cable is typically used for short distances, commonly within a data center.

Single-mode fiber (SMF) carries a single ray of light that is projected at the center of the core. The small core and the single light wave help to limit modal dispersion. Single-mode fiber provides minimum signal attenuation over maximum distance (up to 10 km). A single-mode cable is used for long-distance cable runs. The distance usually depends on the power of the laser at the transmitter and the sensitivity of the receiver. A connector is attached at the end of a cable to enable swift connection and disconnection of the cable to and from a port. A standard connector (SC) and a Lucent connector (LC) are two commonly used connectors for fiber optic cables.

Interconnecting Devices

The commonly used interconnecting devices in FC SANs are FC hubs, FC switches, and FC directors.


FC Interconnecting Devices

FC Hub
 Nodes are connected in a logical loop
 Nodes share the loop
 Provides limited connectivity and scalability

FC Switch
 Each node has a dedicated communication path
 Provides a fixed port count (active or unused ports)
 Active ports can be scaled up non-disruptively
 Some components are redundant and hot-swappable

FC Director
 High-end switch with a higher port count
 Has a modular architecture
 Port count is scaled up by inserting line cards/blades
 All key components are redundant and hot-swappable

Notes

FC hubs are used as communication devices in Fibre Channel Arbitrated Loop (FC-AL) implementations (discussed later). Hubs physically connect nodes in a logical loop or a physical star topology. All the nodes must share the loop because data travels through all the connection points. Because low-cost, high-performance switches are available, FC switches are preferred over FC hubs in FC SAN deployments.

FC switches are more intelligent than FC hubs and directly route data from one physical port to another. Therefore, the nodes do not share the data path; instead, each node has a dedicated communication path. FC switches are commonly available with a fixed port count. Some of the ports can be active for operational purposes while the rest remain unused. The number of active ports can be scaled up non-disruptively. Some components of a switch, such as power supplies and fans, are redundant and hot-swappable. Hot-swappable means that components can be replaced while a device is powered on and remains in operation.


FC directors are high-end switches with a higher port count. A director has a modular architecture, and its port count is scaled up by inserting extra line cards or blades into the director’s chassis. Directors contain redundant components with automated failover capability. Their key components, such as switch controllers, blades, power supplies, and fan modules, are all hot-swappable. These features ensure high availability for business-critical applications.


FC Interconnecting Options

The FC architecture supports three basic interconnectivity options: point-to-point, Fibre Channel Arbitrated Loop (FC-AL), and Fibre Channel switched fabric (FC-SW). These interconnectivity options are described in the following:

Point-to-Point

In this configuration, two nodes are connected directly to each other. This
configuration provides a dedicated connection for data transmission between
nodes. However, the point-to-point configuration offers limited connectivity and
scalability and is used in a DAS environment.

Figure: Point-to-point connection between a compute system (a hypervisor hosting VMs) and a storage system.

FC Arbitrated Loop (FC-AL)

In this configuration, the devices are attached to a shared loop. Each device
contends with other devices to perform I/O operations. The devices on the loop
must “arbitrate” to gain control of the loop. At any given time, only one device can
perform I/O operations on the loop. Because each device in a loop must wait for its
turn to process an I/O request, the overall performance in FC-AL environments is
low.

Further, adding or removing a device results in loop re-initialization, which can
cause a momentary pause in loop traffic. As a loop configuration, FC-AL can be
implemented without any interconnecting devices by directly cabling each device
to two other devices to form a ring. However, FC-AL implementations may also use
FC hubs, through which the arbitrated loop is physically connected in a star
topology.

Figure: FC-AL implementations: compute systems and a storage system cabled directly into a loop, and compute systems and a storage system attached to a loop through an FC hub.

FC Switched Fabric (FC-SW)

This configuration uses a single FC switch or a network of FC switches (including
FC directors) to interconnect the nodes. It is also referred to as fabric connect. A
fabric is a logical space in which all nodes communicate with one another in a
network. In a fabric, the link between any two switches is called an interswitch
link (ISL). ISLs enable switches to be connected together to form a single, larger
fabric. They enable the transfer of both storage traffic and fabric management
traffic from one switch to another.

In FC-SW, nodes do not share a loop. Instead, data is transferred through a
dedicated path between the nodes. Unlike a loop configuration, an FC-SW
configuration provides high scalability. The addition or removal of a node in a
switched fabric is minimally disruptive. It does not affect the ongoing traffic between
other nodes.

Figure: Compute systems and a storage system attached to two FC switches that are connected by an interswitch link.
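The difference between a shared loop and a switched fabric can be illustrated with a small Python model. This is a sketch under simplifying assumptions, not a model of real FC-AL arbitration: the link rate and transfer size are hypothetical, each device sends the same amount of data to a different peer, and arbitration and switching overheads are ignored.

```python
# Simplified comparison of an arbitrated loop and a switched fabric.
# Assumptions (illustrative only): every link runs at the same rate, each
# device moves the same amount of data to a different peer, and arbitration
# and switching overheads are ignored.

def loop_completion_time(num_devices: int, transfer_gbit: float, link_gbps: float) -> float:
    """FC-AL: only one device can perform I/O on the loop at a time,
    so the devices' transfers happen one after another."""
    return num_devices * transfer_gbit / link_gbps


def fabric_completion_time(transfer_gbit: float, link_gbps: float) -> float:
    """FC-SW: each node pair has a dedicated path, so transfers between
    different node pairs proceed in parallel and finish together."""
    return transfer_gbit / link_gbps


if __name__ == "__main__":
    for n in (2, 4, 8):
        loop = loop_completion_time(n, transfer_gbit=16.0, link_gbps=8.0)
        fabric = fabric_completion_time(transfer_gbit=16.0, link_gbps=8.0)
        print(f"{n} devices: loop {loop:.1f} s, fabric {fabric:.1f} s")
```

In this model the loop's completion time grows with every device that contends for the loop, while the fabric's stays constant, which reflects why FC-AL performance is described as low and FC-SW as highly scalable.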

Port Types in Switched Fabric

Port     Description

N_Port   An end point in the fabric. This port is also known as the node port.
         Typically, it is a compute system port (FC HBA port) or a storage system
         port that is connected to a switch in a switched fabric.

E_Port   A port that forms the connection between two FC switches. This port is
         also known as the expansion port. The E_Port on an FC switch connects
         to the E_Port of another FC switch in the fabric through an ISL.

F_Port   A port on a switch that connects an N_Port. It is also known as a fabric
         port.

G_Port   A generic port on a switch that can operate as an E_Port or an F_Port
         and determines its functionality automatically during initialization.

Figure: N_Ports on a compute system and on two storage systems connect to F_Ports on two FC switches; the switches are connected E_Port to E_Port over an ISL.
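The pairing rules in the port table can be expressed as a short Python sketch. It is a simplification for illustration: only the four port types in the table are modeled, and the rule that a G_Port becomes an E_Port when a switch is attached and an F_Port when a node is attached is an assumption drawn from the G_Port description, not a vendor implementation.

```python
# Simplified pairing rules taken from the port table: an N_Port attaches to
# an F_Port, and two switches connect E_Port to E_Port over an ISL.
# Other port types and vendor-specific behavior are not modeled.
VALID_LINKS = {
    frozenset({"N_Port", "F_Port"}),   # node to switch
    frozenset({"E_Port"}),             # switch to switch (E_Port on both ends)
}


def resolve_g_port(peer_is_switch: bool) -> str:
    """A G_Port settles its role at initialization: E_Port if the attached
    device is another switch, F_Port if it is a node."""
    return "E_Port" if peer_is_switch else "F_Port"


def is_valid_link(port_a: str, port_b: str) -> bool:
    """Check whether two port types may form a link in a switched fabric."""
    return frozenset({port_a, port_b}) in VALID_LINKS


if __name__ == "__main__":
    print(is_valid_link("N_Port", resolve_g_port(peer_is_switch=False)))  # True
    print(is_valid_link("E_Port", resolve_g_port(peer_is_switch=True)))   # True
    print(is_valid_link("N_Port", "E_Port"))                              # False
```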

NVMe over Fibre Channel

 Organizations are adopting the NVMe protocol to access SSDs over the PCIe bus
 NVMe over FC is designed to transfer NVMe-based data over an FC network
 Reduces latency and improves the performance of SSDs
 FC protocol maps NVMe (upper layer protocol) to the lower layers for data
transfer

Definition: SAN
A network whose primary purpose is the transfer of data between
computer systems and storage devices and among storage devices.
Source: Storage Networking Industry Association

Storage Area Network (SAN) is a network that primarily connects the storage
systems with the compute systems and also connects the storage systems with
each other. It enables multiple compute systems to access and share storage
resources. It also enables data transfer between storage systems. With a
long-distance SAN, data transfer over the SAN can be extended across geographic
locations. A SAN usually provides access to block-based storage systems.

FC Architecture Lesson

Introduction

This lesson presents the FC protocol stack, FC addressing, the structure and
organization of FC data, and fabric login types.

This lesson covers the following topics:

 FC protocol stack
 FC addressing
 Structure and organization of FC data
 Fabric login types

FC SAN Architecture

Video: Fibre Channel Architecture

The video is located at
https://edutube.emc.com/Player.aspx?vno=t9/lhF5Hze0I9ybdsceHpg

FC Architecture Overview

 Provides benefits of both channel and network technologies
 Provides high performance with low protocol overheads
 Provides high scalability with long-distance capability
 Implements SCSI over FC network
 Transports SCSI data through FC network
 Storage devices attached to FC SAN appear as locally attached to the
operating system or hypervisor

Notes

Traditionally, compute operating systems have communicated with peripheral
devices over channel connections, such as Enterprise Systems Connection
(ESCON) and SCSI. Channel technologies provide high levels of performance with
low protocol overheads. Such performance is achievable due to the static nature of
channels and the high level of hardware and software integration that is provided
by the channel technologies. However, these technologies suffer from inherent
limitations in terms of the number of devices that can be connected and the
distance between these devices.

In contrast to channel technology, network technologies are more flexible and
provide greater distance capabilities. Network connectivity provides greater
scalability and uses shared bandwidth for communication. This flexibility results in
greater protocol overhead and reduced performance.

The FC architecture represents true channel and network integration and captures
some of the benefits of both channel and network technology. FC protocol provides
both the channel speed for data transfer with low protocol overhead and the
scalability of network technology. FC provides a serial data transfer interface that
operates over copper wire and optical fiber.

FC protocol forms the fundamental construct of the FC SAN infrastructure. FC
protocol is predominantly the implementation of SCSI over an FC network. SCSI
data is encapsulated and transported within FC frames. SCSI over FC overcomes
the distance and the scalability limitations that are associated with traditional
direct-attached storage. Storage devices attached to the FC SAN appear as locally
attached devices to the operating system (OS) or hypervisor running on the
compute system.

FC Protocol Stack

It is easier to understand a communication protocol by viewing it as a structure of
independent layers. FCP defines the communication protocol in five layers, FC-0
through FC-4; the FC-3 layer is not implemented.

Figure: FC protocol stack. FC-4: upper layer protocol mapping (examples: SCSI, HIPPI, ESCON, ATM, IP); FC-2: framing/flow control; FC-1: encode/decode; FC-0: physical layer (1 Gb/s, 2 Gb/s, 4 Gb/s, 8 Gb/s, 16 Gb/s).

The following table lists each layer with its function and features.

FC Layer   Function                Features Specified by FC Layer
FC-4       Mapping interface       Mapping upper layer protocol (for example, SCSI) to
                                   lower FC layers
FC-3       Common services         Not implemented
FC-2       Routing, flow control   Frame structure, FC addressing, flow control
FC-1       Encode/decode           8b/10b or 64b/66b encoding; bit and frame
                                   synchronization
FC-0       Physical layer          Media, cables, connectors

Notes

FC-4 Layer: It is the uppermost layer in the FCP stack. This layer defines the
application interfaces and the way Upper Layer Protocols (ULPs) are mapped to
the lower FC layers. The FC standard defines several protocols that can operate on
the FC-4 layer. Some of the protocols include SCSI, High Performance Parallel
Interface (HIPPI) Framing Protocol, ESCON, Asynchronous Transfer Mode (ATM),
and IP.

FC-2 Layer: It provides FC addressing, structure, and organization of data (frames,
sequences, and exchanges). It also defines fabric services, classes of service, flow
control, and routing.

FC-1 Layer: It defines how data is encoded prior to transmission and decoded
upon receipt. With 8b/10b encoding, an 8-bit character is encoded into a 10-bit
transmission character at the transmitter node. This character is then transmitted to
the receiver node. At the receiver node, the 10-bit character is passed to the FC-1
layer, which decodes it into the original 8-bit character. FC links with a speed of
10 Gb/s and above use the 64b/66b encoding algorithm instead. This layer also
defines the transmission words, such as FC frame delimiters, which identify the
start and the end of a frame, and primitive signals, which indicate events at a
transmitting port. In addition, the FC-1 layer performs link initialization and error
recovery.

FC-0 Layer: It is the lowest layer in the FCP stack. This layer defines the physical
interface, media, and transmission of bits. The FC-0 specification includes cables,
connectors, and optical and electrical parameters for various data rates. The FC
transmission can use both electrical and optical media.
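The cost of the two FC-1 encoding schemes described above follows directly from their ratios of data bits to transmitted bits. The following Python sketch assumes a hypothetical 10 GBd signaling rate for comparison; it is not an official FC line rate.

```python
from fractions import Fraction

# Data bits carried per transmitted bits for the two FC-1 encoding schemes.
ENCODING_EFFICIENCY = {
    "8b/10b": Fraction(8, 10),
    "64b/66b": Fraction(64, 66),
}


def overhead_percent(scheme: str) -> float:
    """Percentage of transmitted bits spent on encoding rather than data."""
    return float((1 - ENCODING_EFFICIENCY[scheme]) * 100)


def data_rate(line_rate_gbaud: float, scheme: str) -> float:
    """Data rate (Gb/s) left after encoding for a given signaling rate."""
    return line_rate_gbaud * float(ENCODING_EFFICIENCY[scheme])


if __name__ == "__main__":
    for scheme in ENCODING_EFFICIENCY:
        print(f"{scheme}: {overhead_percent(scheme):.1f}% overhead, "
              f"{data_rate(10.0, scheme):.2f} Gb/s of data at a hypothetical 10 GBd")
```

Under these assumptions, 8b/10b spends 20% of the transmitted bits on encoding, while 64b/66b spends about 3%.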

FC Addressing in Switched Fabric

 FC address is assigned to node ports during fabric login
 Used for communication between nodes in an FC SAN
 FC address size is 24 bits:

Domain ID        Area ID          Port ID
Bits (23-16)     Bits (15-08)     Bits (07-00)

 Main purpose of an FC address is routing data through the fabric

Notes

An FC address is dynamically assigned when a node port logs on to the fabric. The
FC address has a distinct format, as shown on the image. The first field of the FC
address contains the domain ID of the switch. A domain ID is a unique number that
is provided to each switch in the fabric. The area ID is used to identify a group of
switch ports that are used for connecting nodes.

An example of a group of ports with common area ID is a port card on the switch.
The last field, the port ID, identifies the port within the group. The FC address size
is 24 bits. The primary purpose of an FC address is routing data through the fabric.
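Because each field of the FC address occupies one byte, packing and unpacking an address is simple bit shifting. The following Python sketch is illustrative; the function names and the example domain, area, and port values are hypothetical.

```python
def make_fc_address(domain_id: int, area_id: int, port_id: int) -> int:
    """Pack three 8-bit fields into a 24-bit FC address: domain ID in
    bits 23-16, area ID in bits 15-08, and port ID in bits 07-00."""
    for name, value in (("domain_id", domain_id), ("area_id", area_id), ("port_id", port_id)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {value}")
    return (domain_id << 16) | (area_id << 8) | port_id


def split_fc_address(address: int) -> tuple:
    """Recover (domain ID, area ID, port ID) from a 24-bit FC address."""
    if not 0 <= address <= 0xFFFFFF:
        raise ValueError("an FC address is 24 bits")
    return (address >> 16) & 0xFF, (address >> 8) & 0xFF, address & 0xFF


if __name__ == "__main__":
    # Hypothetical values: switch domain 15, port card (area) 5, port 3.
    address = make_fc_address(domain_id=15, area_id=5, port_id=3)
    print(f"{address:06X}")           # 0F0503
    print(split_fc_address(address))  # (15, 5, 3)
```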

World Wide Name

 Unique 64-bit identifier
 Static to node ports on an FC network
 Similar to MAC address of NIC
 WWNN and WWPN are used to physically identify FC network adapters and
node ports respectively

World Wide Name - Array

Hex:     5 | 0 0 6 0 1 6 | 0 | 0 0 6 0 0 1 B 2
Binary:  0101 | 0000 0000 0110 0000 0001 0110 | 0000 | 0000 0000 0110 0000 0000 0001 1011 0010
Field:   Format Type (4 bits) | Company ID (24 bits) | Port (4 bits) | Model Seed (32 bits)

World Wide Name - HBA

Hex:     1 | 0 0 0 | 0 0 0 0 c 9 | 2 0 d c 4 0
Field:   Format Type (4 bits) | Reserved (12 bits) | Company ID (24 bits) | Company Specific (24 bits)

Notes

Each device in the FC environment is assigned a 64-bit unique identifier that is
called the World Wide Name (WWN). The FC environment uses two types of
WWNs: World Wide Node Name (WWNN) and World Wide Port Name (WWPN).
WWNN is used to physically identify FC network adapters, and WWPN is used to
physically identify FC adapter ports or node ports. For example, a dual-port FC
HBA has one WWNN and two WWPNs.

Unlike an FC address, which is assigned dynamically, a WWN is a static name for
each device on an FC network. WWNs are similar to the Media Access Control
(MAC) addresses used in IP networking. WWNs are burned into the hardware or
assigned through software. Several configuration definitions in an FC SAN use
WWNs to identify storage systems and FC HBAs. WWNs are critical for FC SAN
configuration because each node port has to be registered by its WWN before the
FC SAN recognizes it.

The name server in an FC SAN environment keeps the association of WWNs to the
dynamically created FC addresses for node ports. The illustration shows example
WWN structures for a storage system and an HBA.
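The two WWN examples in the illustration can be decoded with a few bit shifts. This Python sketch assumes the field widths shown in those examples (a 4-bit format type, a 24-bit company ID, and so on) and handles only format types 5 and 1; other WWN formats exist and are not covered. The function and key names are hypothetical.

```python
def parse_wwn(wwn: str) -> dict:
    """Split a WWN into the fields shown in the two WWN examples.
    Only format types 5 (storage system example) and 1 (HBA example)
    are handled; the bit layouts follow those examples."""
    digits = wwn.replace(":", "")
    if len(digits) != 16:
        raise ValueError("a WWN is 64 bits (16 hex digits)")
    value = int(digits, 16)
    fmt = value >> 60                                          # top 4 bits: format type
    if fmt == 5:
        return {
            "format": fmt,
            "company_id": f"{(value >> 36) & 0xFFFFFF:06X}",   # 24 bits
            "port": (value >> 32) & 0xF,                       # 4 bits
            "model_seed": f"{value & 0xFFFFFFFF:08X}",         # 32 bits
        }
    if fmt == 1:
        return {
            "format": fmt,
            "reserved": (value >> 48) & 0xFFF,                 # 12 bits
            "company_id": f"{(value >> 24) & 0xFFFFFF:06X}",   # 24 bits
            "company_specific": f"{value & 0xFFFFFF:06X}",     # 24 bits
        }
    raise ValueError(f"format type {fmt} is not covered by this sketch")


if __name__ == "__main__":
    print(parse_wwn("50:06:01:60:00:60:01:B2"))   # storage system example
    print(parse_wwn("10:00:00:00:c9:20:dc:40"))   # HBA example
```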

Structure and Organization of FC Data

Frame layout:

SOF        Frame Header   Data Field       CRC        EOF
4 Bytes    24 Bytes       0-2112 Bytes     4 Bytes    4 Bytes

FC Data Structure   Description

Exchange    Enables two N_Ports to identify and manage a set of
            information units
            Information unit: upper layer protocol-specific
            information that is sent to another port to perform
            a certain operation
            Each information unit maps to a sequence
            Includes one or more sequences

Sequence    Contiguous set of frames that correspond to an
            information unit

Frame       Fundamental unit of data transfer
            Each frame consists of five parts: SOF, frame header,
            data field, CRC, and EOF

Notes

Exchange: An exchange operation enables two node ports to identify and manage
a set of information units. Each upper layer protocol (ULP) has its protocol-specific
information that must be sent to another port to perform certain operations. This
protocol-specific information is called an information unit. The structure of these
information units is defined in the FC-4 layer. This unit maps to a sequence. An
exchange is composed of one or more sequences.

Sequence: A sequence refers to a contiguous set of frames that are sent from one
port to another. A sequence corresponds to an information unit, as defined by the
ULP.

Frame: A frame is the fundamental unit of data transfer at FC-2 layer. An FC frame
consists of five parts: start of frame (SOF), frame header, data field, cyclic
redundancy check (CRC), and end of frame (EOF). The SOF and EOF act as
delimiters. The frame header is 24 bytes long and contains addressing information
for the frame. The data field in an FC frame contains the data payload, up to 2,112
bytes of actual data – usually the SCSI data. The CRC checksum facilitates error
detection for the content of the frame. This checksum verifies data integrity by
checking whether the content of the frames is received correctly. The CRC
checksum is calculated by the sender before encoding at the FC-1 layer. Similarly,
it is calculated by the receiver after decoding at the FC-1 layer.
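Using the frame sizes described above, the following Python sketch estimates how one information unit is split into the frames of a sequence. It assumes that every frame except the last carries the full 2,112-byte data field and ignores anything transmitted between frames; the 8 KiB payload is a hypothetical example.

```python
SOF_BYTES, HEADER_BYTES, CRC_BYTES, EOF_BYTES = 4, 24, 4, 4
MAX_DATA_FIELD = 2112                                               # bytes
FRAME_OVERHEAD = SOF_BYTES + HEADER_BYTES + CRC_BYTES + EOF_BYTES   # 36 bytes


def frame_payloads(info_unit_bytes: int) -> list:
    """Data-field sizes of the frames in one sequence, assuming every frame
    except the last is filled to the 2,112-byte maximum."""
    if info_unit_bytes < 1:
        raise ValueError("this sketch only handles non-empty information units")
    full, rest = divmod(info_unit_bytes, MAX_DATA_FIELD)
    return [MAX_DATA_FIELD] * full + ([rest] if rest else [])


def frame_bytes(info_unit_bytes: int) -> int:
    """Payload plus the SOF, header, CRC, and EOF bytes of every frame
    (before FC-1 encoding; anything sent between frames is ignored)."""
    payloads = frame_payloads(info_unit_bytes)
    return sum(payloads) + FRAME_OVERHEAD * len(payloads)


if __name__ == "__main__":
    size = 8192   # hypothetical 8 KiB of SCSI data in one information unit
    print(frame_payloads(size))   # [2112, 2112, 2112, 1856]
    print(frame_bytes(size))      # 8336
```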

Fabric Login Types

Fabric services define three login types:

 Fabric login (FLOGI)
 Occurs between an N_Port and an F_Port
 Node sends a FLOGI frame with WWN to Fabric Login Server on switch
 Node obtains FC address from switch
 Immediately after FLOGI, N_Port registers with Name Server on switch
 N_Port queries name server about all other logged in ports
 Port login (PLOGI)
 Occurs between two N_Ports to establish a session
 Exchange service parameters relevant to the session
 Process login (PRLI)
 Occurs between two N_Ports to exchange ULP related parameters

Notes

Fabric Login (FLOGI): It is performed between an N_Port and an F_Port. To log
on to the fabric, a node sends a FLOGI frame with the WWNN and WWPN
parameters to the login service at the predefined FC address FFFFFE (Fabric
Login Server). In turn, the switch accepts the login and returns an Accept (ACC)
frame with the assigned FC address for the node. Immediately after the FLOGI, the
N_Port registers itself with the local Name Server on the switch, indicating its
WWNN, WWPN, port type, class of service, assigned FC address, and so on. After
the N_Port has logged in, it can query the name server database for information
about all other logged in ports.

Port Login (PLOGI): It is performed between two N_Ports to establish a session.
The initiator N_Port sends a PLOGI request frame to the target N_Port, which
accepts it. The target N_Port returns an ACC to the initiator N_Port. Next, the
N_Ports exchange service parameters relevant to the session.

Process Login (PRLI): It is also performed between two N_Ports. This login
relates to the FC-4 ULPs, such as SCSI. If the ULP is SCSI, N_Ports exchange
SCSI-related service parameters.
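The order of the three logins can be traced with a toy Python model. Everything here is hypothetical and simplified: the classes, the address allocation policy (one area ID per logged-in port), and the example WWNs are illustrative, and real service parameters are not modeled. Only the Fabric Login Server address FFFFFE and the FLOGI, PLOGI, PRLI ordering come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

FABRIC_LOGIN_SERVER = 0xFFFFFE   # well-known address that receives FLOGI


@dataclass
class Switch:
    """Toy switch: accepts FLOGI, assigns FC addresses, runs a name server."""
    domain_id: int
    next_area: int = 0
    name_server: Dict[str, int] = field(default_factory=dict)   # WWPN -> FC address

    def flogi(self, destination: int, wwnn: str, wwpn: str) -> int:
        """The WWNN and WWPN travel in the FLOGI frame; this sketch does not
        use them when choosing an address."""
        if destination != FABRIC_LOGIN_SERVER:
            raise ValueError("FLOGI must be sent to the Fabric Login Server")
        # Hypothetical allocation policy: one area ID per logged-in port.
        address = (self.domain_id << 16) | (self.next_area << 8)
        self.next_area += 1
        return address   # returned to the node in the ACC frame


@dataclass
class NPort:
    wwnn: str
    wwpn: str
    address: Optional[int] = None
    sessions: Set[str] = field(default_factory=set)    # peers after PLOGI
    ulp_peers: Set[str] = field(default_factory=set)   # peers after PRLI


def fabric_login(port: NPort, switch: Switch) -> Dict[str, int]:
    port.address = switch.flogi(FABRIC_LOGIN_SERVER, port.wwnn, port.wwpn)
    switch.name_server[port.wwpn] = port.address   # register right after FLOGI
    return dict(switch.name_server)                 # query other logged-in ports


def port_login(initiator: NPort, target: NPort) -> None:
    """PLOGI: the target accepts (ACC) and both sides record the session."""
    initiator.sessions.add(target.wwpn)
    target.sessions.add(initiator.wwpn)


def process_login(initiator: NPort, target: NPort) -> None:
    """PRLI: exchange ULP (for example, SCSI) parameters over an existing session."""
    if target.wwpn not in initiator.sessions:
        raise RuntimeError("PRLI needs a PLOGI session first")
    initiator.ulp_peers.add(target.wwpn)
    target.ulp_peers.add(initiator.wwpn)


if __name__ == "__main__":
    switch = Switch(domain_id=0x0A)
    host = NPort(wwnn="20:00:00:00:c9:20:dc:40", wwpn="10:00:00:00:c9:20:dc:40")
    array = NPort(wwnn="50:06:01:60:80:60:01:b2", wwpn="50:06:01:60:00:60:01:b2")
    fabric_login(host, switch)
    print(fabric_login(array, switch))   # both WWPNs and their FC addresses
    port_login(host, array)
    process_login(host, array)
    print(array.wwpn in host.ulp_peers)  # True
```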

Topologies, Link Aggregation and Zoning Lesson

Introduction

This lesson presents FC SAN topologies such as single-switch, mesh, and
core-edge. This lesson also focuses on link aggregation and the types of zoning.

This lesson covers the following topics:

 Single-switch topology
 Mesh topology
 Core-edge topology
 Link aggregation
 Types of zoning

Topologies, Link Aggregation and Zoning

Video: FC Topologies Including Link Aggregation and Zoning

The video is located at
https://edutube.emc.com/Player.aspx?vno=ZvP288ft8adIH2wQDuFL6w

Single-switch Topology

Figure: Single-switch fabric: two compute systems and a storage system connected to one FC director.

 Fabric consists of only a single switch
 Both compute systems and storage systems connect to the same switch
 No ISLs are required for compute-to-storage traffic
 Every switch port is usable for node connectivity

Notes

FC switches (including FC directors) may be connected in various ways to form
different fabric topologies. Each topology provides certain benefits.

In a single-switch topology, the fabric consists of only a single switch. Both the
compute systems and the storage systems are connected to the same switch. A
key advantage of a single-switch fabric is that it does not need to use any switch
port for ISLs. Therefore, every switch port is usable for compute system or storage
system connectivity. Further, because FC frames never have to travel over ISLs,
this topology eliminates ISL delays.

A typical implementation of a single-switch fabric would involve the deployment of
an FC director. FC directors are high-end switches with a high port count. When
extra switch ports are needed over time, new ports can be added through add-on
line cards (blades) in spare slots available on the director chassis. To some extent,
a bladed solution alleviates the port count scalability problem inherent in a
single-switch topology.

Mesh Topology

Full Mesh Topology

Figure: Full mesh fabric: FC switches, each connected to every other switch, with compute systems and a storage system attached.

 Each switch is connected to every other switch
 A maximum of one ISL (hop) is required for compute-to-storage traffic
 Compute systems and storage systems can be connected to any switch

Partial Mesh Topology

Figure: Partial mesh fabric: FC switches that are not all connected to each other, with compute systems and a storage system attached.

 Not all the switches are connected to every other switch
 Several ISLs may be required

Notes

In a full mesh, every switch is connected to every other switch in the topology.

A full mesh topology may be appropriate when the number of switches that are
involved is small. A typical deployment would involve up to four switches or
directors, with each of them servicing highly localized compute-to-storage traffic. In
a full mesh topology, a maximum of one ISL or hop is required for compute-to-
storage traffic.

However, as the number of switches increases, the number of switch ports that
are used for ISLs also increases. This reduces the number of switch ports available
for node connectivity.

In a partial mesh topology, not all the switches are connected to every other switch.
In this topology, several hops or ISLs may be required for the traffic to reach its
destination.

Partial mesh offers more scalability than full mesh topology. However, without
proper placement of compute and storage systems, traffic management in a partial
mesh fabric might be complicated. Also, ISLs could become overloaded due to
excessive traffic aggregation.
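The port cost of a full mesh can be worked out with a short Python sketch. It assumes one ISL between each pair of switches and hypothetical 32-port switches; real deployments may use more than one ISL per switch pair.

```python
def full_mesh_isls(num_switches: int) -> int:
    """Every switch has one ISL to every other switch: n(n-1)/2 links."""
    return num_switches * (num_switches - 1) // 2


def node_ports_available(num_switches: int, ports_per_switch: int) -> int:
    """Ports left for compute and storage systems in a full mesh, given
    that each ISL uses one port on each of the two switches it joins."""
    return num_switches * ports_per_switch - 2 * full_mesh_isls(num_switches)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        # Hypothetical 32-port switches, one ISL per switch pair.
        print(f"{n} switches: {full_mesh_isls(n)} ISLs, "
              f"{node_ports_available(n, 32)} of {n * 32} ports left for nodes")
```

Because the number of ISLs grows roughly with the square of the number of switches, a full mesh consumes ports quickly, which is consistent with full mesh being suited to fabrics with few switches.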

Core-Edge Topology

Notes

The edge tier is composed of switches and offers an inexpensive approach to
adding more compute systems in a fabric. The edge-tier switches are not
connected to each other. Each switch at the edge tier is attached to a switch at the
core tier through ISLs.

The core tier is composed of directors that ensure high fabric availability. Also,
typically all traffic must either traverse this tier or terminate at this tier. In this
configuration, all storage systems are connected to the core tier, enabling
compute-to-storage traffic to traverse only one ISL. Compute systems that require
high performance may be connected directly to the core tier and therefore avoid
ISL delays. The core-edge topology increases connectivity within the FC SAN
while conserving overall port utilization. It eliminates the need to connect edge
switches to other edge switches over ISLs.

Reduction of ISLs can greatly increase the number of node ports that can be
connected to the fabric. If fabric expansion is required, then administrators would
need to connect extra edge switches to the core. The core of the fabric is also
extended by adding more switches or directors at the core tier. Based on the
number of core-tier switches, this topology has different variations, such as single-
core topology and dual-core topology. To transform a single-core topology to dual-
core, new ISLs are created to connect each edge switch to the new core switch in
the fabric.
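The single-ISL path between an edge switch and the core can be checked with a breadth-first search over a switch graph. This Python sketch is illustrative only: switch names are hypothetical, and each graph edge stands for the ISL connection between a pair of switches.

```python
from collections import deque
from typing import Dict, List


def build_core_edge(num_edge: int, cores: List[str]) -> Dict[str, List[str]]:
    """Adjacency list of switches: every edge switch has an ISL to every
    core switch, and edge switches are not connected to each other."""
    links: Dict[str, List[str]] = {core: [] for core in cores}
    for i in range(num_edge):
        edge = f"edge{i}"
        links[edge] = list(cores)
        for core in cores:
            links[core].append(edge)
    return links


def isl_hops(links: Dict[str, List[str]], src: str, dst: str) -> int:
    """Fewest ISLs between two switches (breadth-first search)."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        switch, hops = queue.popleft()
        if switch == dst:
            return hops
        for neighbor in links.get(switch, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    raise ValueError(f"{dst} is not reachable from {src}")


if __name__ == "__main__":
    single = build_core_edge(num_edge=4, cores=["core0"])
    # Compute system on edge2, storage system on core0: one ISL.
    print(isl_hops(single, "edge2", "core0"))   # 1
    # Dual-core: add core1 with a new ISL from every edge switch.
    dual = build_core_edge(num_edge=4, cores=["core0", "core1"])
    print(isl_hops(dual, "edge3", "core1"))     # 1
```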

Link Aggregation

 Combines multiple ISLs into a single logical ISL (port-channel)
– Provides higher throughput than a single ISL could provide
– Distributes network traffic over ISLs, ensuring even ISL utilization

Figure: Port-pairs {H1, S1} at 5 Gb/s, {H2, S2} at 1.5 Gb/s, {H3, S3} at 2 Gb/s, and {H4, S4} at 4.5 Gb/s communicate between two FC switches. Left: three ISLs with no aggregation (ISL bandwidth = 8 Gb/s). Right: the ISLs aggregated into a port-channel (port-channel bandwidth = 24 Gb/s).

Notes

Link aggregation combines two or more parallel ISLs into a single logical ISL,
called a port-channel, yielding higher throughput than a single ISL could provide.

For example, the aggregation of 10 ISLs into a single port-channel provides up to
160 Gb/s throughput, assuming the bandwidth of each ISL is 16 Gb/s. Link
aggregation optimizes fabric performance by distributing network traffic across the
shared bandwidth of all the ISLs in a port-channel. This allows the network traffic
for a pair of node ports to flow through all the available ISLs in the port-channel
rather than restricting the traffic to a specific, potentially congested ISL. The
number of ISLs in a port-channel can be scaled depending on an application’s
performance requirements.

Example Notes

This image illustrates two examples.

The example on the left is based on an FC SAN infrastructure with no link
aggregation enabled.

 Four HBA ports H1, H2, H3, and H4 have been configured to generate I/O
activity to four storage system ports S1, S2, S3, and S4, respectively.
 The HBAs and the storage systems are connected to two separate FC switches
with three ISLs between the switches.
 Let us assume that the bandwidth of each ISL is 8 Gb/s and that the data
transmission rates for the port-pairs {H1,S1}, {H2,S2}, {H3,S3}, and {H4,S4} are
5 Gb/s, 1.5 Gb/s, 2 Gb/s, and 4.5 Gb/s, respectively.

Without link aggregation, the fabric typically assigns a particular ISL to each of the
port-pairs in a round-robin fashion. It is possible that port-pairs {H1,S1} and {H4,S4}
are assigned to the same ISL in their respective routes. The other two ISLs are
assigned to the port-pairs {H2,S2} and {H3,S3}. Together, {H1,S1} and {H4,S4}
need 9.5 Gb/s, which exceeds the 8 Gb/s bandwidth of their shared ISL. As a
result, two of the three ISLs are underutilized, whereas the third ISL is saturated
and becomes a performance bottleneck for the port-pairs assigned to it.

The example on the right has aggregated the three ISLs into a port-channel that
provides throughput up to 24 Gb/s. Network traffic for all the port-pairs is
distributed over the ISLs in the port-channel, which ensures even ISL utilization.
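The two examples can be reproduced numerically. The following Python sketch assumes that port-pairs are pinned to ISLs round-robin in the listed order, which happens to place {H1,S1} and {H4,S4} on the same ISL as in the text, and that a port-channel spreads traffic perfectly evenly. Real switches use their own path selection and frame distribution methods.

```python
ISL_GBPS = 8.0                    # bandwidth of each ISL in the example
DEMANDS = {                       # port-pair -> transmission rate (Gb/s)
    "H1-S1": 5.0, "H2-S2": 1.5, "H3-S3": 2.0, "H4-S4": 4.5,
}


def pinned_isl_loads(demands: dict, num_isls: int) -> list:
    """No aggregation: each port-pair is pinned to one ISL, handed out in
    round-robin order, so an ISL carries the sum of its port-pairs."""
    loads = [0.0] * num_isls
    for i, rate in enumerate(demands.values()):
        loads[i % num_isls] += rate
    return loads


def port_channel_utilization(demands: dict, num_isls: int, isl_gbps: float) -> float:
    """Aggregation: all traffic shares the port-channel's combined bandwidth."""
    return sum(demands.values()) / (num_isls * isl_gbps)


if __name__ == "__main__":
    loads = pinned_isl_loads(DEMANDS, num_isls=3)
    print(loads)                                                    # [9.5, 1.5, 2.0]
    print([load > ISL_GBPS for load in loads])                      # [True, False, False]
    print(f"{port_channel_utilization(DEMANDS, 3, ISL_GBPS):.0%}")  # 54%
```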

Zoning

Definition: Zoning
A logical private path between node ports in a fabric.

Figure: FC HBA ports on compute systems and storage system ports in an FC SAN, grouped into Zone 1 and Zone 2.

 Each zone contains members (FC HBA and storage system ports)
 Benefits:
 Security
 Restricts RSCN traffic

Notes

Zoning is a logical private path between node ports in a fabric. Whenever a change
takes place in the name server database, the fabric controller sends a Registered
State Change Notification (RSCN) to all the nodes impacted by the change. If
zoning is not configured, the fabric controller sends the RSCN to all the nodes in
the fabric. Sending RSCNs to nodes that are not impacted by the change increases
the amount of fabric-management traffic.

For a large fabric, the amount of FC traffic generated due to this process can be
significant and might impact the compute-to-storage data traffic. Zoning helps to
limit the number of RSCNs in a fabric. In the presence of zoning, a fabric sends the
RSCN to only those nodes in a zone where the change has occurred.

Zoning also provides access control, along with other access control mechanisms,
such as LUN masking. Zoning provides control by enabling only the members in
the same zone to establish communication with each other.

Zone members, zones, and zone sets form the hierarchy that is defined in the
zoning process. A zone set is composed of a group of zones that can be activated
or deactivated as a single entity in a fabric. Multiple zone sets may be defined in a
fabric, but only one zone set can be active at a time.

Members are the nodes within the FC SAN that can be included in a zone. FC
switch ports, FC HBA ports, and storage system ports can be members of a zone.
A port or node can be a member of multiple zones. Nodes that are distributed
across multiple switches in a switched fabric may also be grouped into the same
zone. Zone sets are also referred to as zone configurations.
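The zoning hierarchy and its effect on RSCNs can be sketched in Python. The class, zone set, zone, and node names are hypothetical. The sketch follows the behavior described above (only one zone set active at a time; RSCNs go to all nodes when zoning is not configured and only to zone partners when it is) and does not model any vendor's default-zone behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

ZoneSet = Dict[str, Set[str]]   # zone name -> member node ports


@dataclass
class Fabric:
    nodes: Set[str]                                        # logged-in node ports
    zone_sets: Dict[str, ZoneSet] = field(default_factory=dict)
    active: Optional[str] = None                           # at most one active zone set

    def activate(self, name: str) -> None:
        """Activate a zone set as a single entity; any previously active
        zone set is implicitly deactivated."""
        if name not in self.zone_sets:
            raise KeyError(f"unknown zone set {name!r}")
        self.active = name

    def rscn_recipients(self, changed: str) -> Set[str]:
        """Nodes notified about a change to the port `changed`."""
        if self.active is None:
            return self.nodes - {changed}          # no zoning: every other node
        recipients: Set[str] = set()
        for members in self.zone_sets[self.active].values():
            if changed in members:
                recipients |= members               # only zones where the change occurred
        return recipients - {changed}

    def zoned_together(self, a: str, b: str) -> bool:
        """True if some zone in the active zone set contains both ports.
        This sketch returns False when no zone set is active."""
        if self.active is None:
            return False
        return any(a in m and b in m for m in self.zone_sets[self.active].values())


if __name__ == "__main__":
    fabric = Fabric(nodes={"host1", "host2", "array1"})
    fabric.zone_sets["cfg_a"] = {"zone1": {"host1", "array1"}, "zone2": {"host2", "array1"}}
    print(sorted(fabric.rscn_recipients("host1")))   # ['array1', 'host2']
    fabric.activate("cfg_a")
    print(sorted(fabric.rscn_recipients("host1")))   # ['array1']
    print(fabric.zoned_together("host1", "host2"))   # False
```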

Types of Zoning

The illustration shows three types of zoning on an FC network.

Figure: Three compute systems with FC HBA WWNs 10:00:00:00:C9:20:DC:40,
10:00:00:00:C9:20:DC:56, and 10:00:00:00:C9:20:DC:82, and a storage system
port with WWN 50:06:04:82:E8:91:2B:9E, connected to ports 1, 5, 9, and 12 of an
FC switch with domain ID 15. The zones are defined as follows:

Zone 1 (WWN Zone) = 10:00:00:00:C9:20:DC:82; 50:06:04:82:E8:91:2B:9E
Zone 2 (Port Zone) = 15,5; 15,12
Zone 3 (Mixed Zone) = 10:00:00:00:C9:20:DC:56; 15,12

The three types of zoning are:

WWN Zoning

Uses World Wide Names to define zones. The zone members are the unique
WWN addresses of the FC HBA and its targets (storage systems). A major
advantage of WWN zoning is its flexibility. If an administrator moves a node to
another switch port in the fabric, the node maintains connectivity to its zone
partners without having to modify the zone configuration. This functionality is
possible because the WWN is static to the node port.

Port Zoning

Uses the switch port ID to define zones. In port zoning, access to a node is
determined by the physical switch port to which the node is connected. The zone
members are the port identifiers (switch domain ID and port number) to which the
FC HBA and its targets (storage systems) are connected. If a node is moved to
another switch port in the fabric, port zoning must be modified to enable the node,
in its new port, to participate in its original zone. However, if an FC HBA or storage
system port fails, an administrator can simply replace the failed device without
changing the zoning configuration, because the replacement connects to the same
switch port.

Mixed Zoning

Combines the qualities of both WWN zoning and port zoning. Using mixed zoning
enables a node port, identified by the switch port it is connected to, to be tied to
the WWN of another node.
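The three zoning types differ only in what a zone member identifies, which the following Python sketch illustrates with the zone definitions from the illustration. The cabling table that says which switch port each WWN is plugged into is hypothetical, as are the function and variable names.

```python
from typing import Dict, Set, Tuple, Union

SwitchPort = Tuple[int, int]          # (switch domain ID, port number)
Member = Union[str, SwitchPort]       # WWN member or switch-port member

ZONE_1: Set[Member] = {"10:00:00:00:C9:20:DC:82", "50:06:04:82:E8:91:2B:9E"}   # WWN zone
ZONE_2: Set[Member] = {(15, 5), (15, 12)}                                     # port zone
ZONE_3: Set[Member] = {"10:00:00:00:C9:20:DC:56", (15, 12)}                   # mixed zone

# Hypothetical cabling: which switch port each node port is plugged into.
ATTACHED: Dict[str, SwitchPort] = {
    "10:00:00:00:C9:20:DC:40": (15, 5),
    "10:00:00:00:C9:20:DC:56": (15, 1),
    "10:00:00:00:C9:20:DC:82": (15, 9),
    "50:06:04:82:E8:91:2B:9E": (15, 12),
}


def in_zone(zone: Set[Member], wwn: str, attached: Dict[str, SwitchPort]) -> bool:
    """A node port matches if its WWN is a member (WWN zoning) or the switch
    port it is currently attached to is a member (port zoning). A mixed zone
    simply contains both kinds of member."""
    return wwn in zone or attached.get(wwn) in zone


if __name__ == "__main__":
    moved = dict(ATTACHED)
    moved["10:00:00:00:C9:20:DC:82"] = (15, 20)   # WWN-zoned node moves ports
    moved["10:00:00:00:C9:20:DC:40"] = (15, 21)   # port-zoned node moves ports
    print(in_zone(ZONE_1, "10:00:00:00:C9:20:DC:82", moved))  # True: WWN is static
    print(in_zone(ZONE_2, "10:00:00:00:C9:20:DC:40", moved))  # False: zone must be edited
```

Moving the WWN-zoned node leaves its zone membership intact, while moving the port-zoned node drops it out of Zone 2 until the zone is modified, matching the behavior described for WWN zoning and port zoning.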

SAN Virtualization Lesson

Introduction

This lesson presents an overview of Virtual SAN (VSAN), its configuration, VSAN
trunking, and VSAN tagging. It also focuses on concepts in practice for FC SAN
connectivity.

This lesson covers the following topics:

 Block-level storage virtualization
 Virtual SAN (VSAN) overview

SAN Virtualization

Video: Virtualization in FC SAN

The video is located at
https://edutube.emc.com/Player.aspx?vno=guMa8RQ1SE/1aPg/kAy3Gg

Block-level Storage Virtualization

Figure: Two compute systems, each running a hypervisor with VMs and assigned a virtual volume. A virtualization appliance in the FC SAN maps the virtual volumes to LUNs in a storage pool built from two storage systems.

The figure shows two compute systems, each of which has one virtual volume
assigned. These virtual volumes are mapped to the LUNs in the storage systems.
When an I/O is sent to a virtual volume, it is redirected to the mapped LUNs
through the virtualization layer at the FC SAN. Depending on the capabilities of
the virtualization appliance, the architecture may allow for more complex mapping
between the LUNs and the virtual volumes.

 Provides a virtualization layer in SAN
 Abstracts block-based storage systems
 Aggregates LUNs to create storage pool
 Virtual volumes from storage pool are assigned to compute systems
 Virtualization layer maps virtual volumes to LUNs
 Benefits:
 Online expansion of virtual volumes
 Non-disruptive data migration

Notes

Block-level storage virtualization aggregates block storage devices (LUNs) and
enables provisioning of virtual storage volumes, independent of the underlying
physical storage. A virtualization layer, which exists at the SAN, abstracts the
identity of block-based storage systems and creates a storage pool by aggregating
LUNs from the storage systems.

Virtual volumes are created from the storage pool and assigned to the compute
systems. Instead of being directed to the LUNs on the individual storage systems,
the compute systems are directed to the virtual volumes provided by the
virtualization layer. The virtualization layer maps the virtual volumes to the LUNs on
the individual storage systems.

The compute systems remain unaware of the mapping operation and access the
virtual volumes as if they were accessing the physical storage attached to them.
Typically, the virtualization layer is managed via a dedicated virtualization
appliance to which the compute systems and the storage systems are connected.

Block-level storage virtualization enables extending the virtual volumes
non-disruptively to meet an application’s capacity scaling requirements. It also
provides the advantage of non-disruptive data migration. In a traditional SAN
environment, LUN migration from one storage system to another is an offline event.

After migration, the compute systems must be updated to reflect the new storage
system configuration. In other instances, processor cycles at the compute system
are required to migrate data from one storage system to the other, especially in a
multivendor environment.

With a block-level storage virtualization solution in place, the virtualization layer
handles the migration of data, which enables LUNs to remain online and accessible
while data is migrating. No physical changes are required because the compute
system still points to the same virtual volume on the virtualization layer. Only the
mapping information on the virtualization layer changes, and these changes can be
executed dynamically and are transparent to the end user.
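The mapping, expansion, and migration behavior described above can be illustrated with a minimal Python sketch. It assumes a hypothetical virtualization layer that splits each virtual volume into fixed-size extents and maps each extent to a (storage system, LUN, starting block) location. The names `VirtualVolume`, `Backend`, and `EXTENT_BLOCKS` are illustrative only and do not describe how any particular appliance stores its mappings.

```python
from dataclasses import dataclass, field
from typing import Callable

EXTENT_BLOCKS = 1024  # hypothetical extent size, in blocks


@dataclass
class Backend:
    """Physical location of one extent (hypothetical model)."""
    system: str  # storage system name
    lun: int     # LUN number on that storage system
    start: int   # first block of the extent on the LUN


@dataclass
class VirtualVolume:
    name: str
    extents: list[Backend] = field(default_factory=list)

    def resolve(self, block: int) -> tuple[str, int, int]:
        """Redirect a virtual block address to (system, lun, physical block)."""
        index, offset = divmod(block, EXTENT_BLOCKS)
        backend = self.extents[index]
        return (backend.system, backend.lun, backend.start + offset)

    def expand(self, backend: Backend) -> None:
        """Online expansion: append an extent; existing addresses do not move."""
        self.extents.append(backend)

    def migrate_extent(self, index: int, target: Backend,
                       copy: Callable[[Backend, Backend], None]) -> None:
        """Copy one extent to a new LUN, then switch the mapping.

        The compute system keeps addressing the same virtual volume;
        only the mapping inside the virtualization layer changes.
        """
        copy(self.extents[index], target)  # data copy done by the layer
        self.extents[index] = target


if __name__ == "__main__":
    vv = VirtualVolume("vv1", [Backend("SystemA", 5, 0)])
    print(vv.resolve(10))              # ('SystemA', 5, 10)
    vv.expand(Backend("SystemB", 2, 4096))
    print(vv.resolve(1030))            # ('SystemB', 2, 4102)
    vv.migrate_extent(0, Backend("SystemB", 7, 0), lambda src, dst: None)
    print(vv.resolve(10))              # ('SystemB', 7, 10)
```

The compute system only ever calls `resolve` with virtual addresses, so after `migrate_extent` the same virtual block lands on the new LUN without any change on the compute side.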

Virtual SAN/Virtual Fabric

Definition: VSAN

Figure: VSAN 10 and VSAN 20 defined on one physical FC SAN. Each VSAN contains its
own group of compute systems (VMs on hypervisors) and its own storage system.

 Each VSAN has its own fabric services, configuration, and set of FC addresses
 VSANs improve SAN security, scalability, availability, and manageability

Notes

In a VSAN, a group of node ports communicate with each other using a virtual
topology that is defined on the physical SAN. Multiple VSANs may be created on a
single physical SAN. Each VSAN behaves and is managed as an independent
fabric. Each VSAN has its own fabric services, configuration, and set of FC
addresses. Fabric-related configurations in one VSAN do not affect the traffic in
another VSAN. A VSAN may be extended across sites, enabling communication
among a group of nodes that share a common set of requirements, regardless of
which site they are in.

VSANs improve SAN security, scalability, availability, and manageability. VSANs
provide enhanced security by isolating sensitive data in a VSAN and by restricting
access to the resources located within that VSAN. For example, a cloud provider
typically isolates the storage pools for multiple cloud services by creating multiple
VSANs on an FC SAN.

Further, the same FC address can be assigned to nodes in different VSANs, thus
increasing fabric scalability. Events causing traffic disruptions in one VSAN are
contained within that VSAN and are not propagated to other VSANs. VSANs
provide an easy, flexible, and less expensive way to manage networks.

Configuring VSANs is easier and quicker than building separate physical FC SANs
for various node groups. To regroup nodes, an administrator changes the VSAN
configurations without moving nodes or recabling.

VSAN Configuration

 Define VSANs on fabric switch with specific VSAN IDs
 Assign VSAN IDs to F_Ports to include them in the VSANs
 An N_Port connecting to an F_Port in a VSAN becomes a member of that VSAN
 Switch forwards FC frames between F_Ports that belong to the same VSAN

Notes

To configure VSANs on a fabric, an administrator first needs to define VSANs on
fabric switches. Each VSAN is identified with a specific number called the VSAN ID.
The next step is to assign a VSAN ID to the F_Ports on the switch. By assigning a
VSAN ID to an F_Port, the port is included in the VSAN. In this manner, multiple
F_Ports can be grouped into a VSAN. For example, an administrator may group
switch ports (F_Ports) 1 and 2 into the VSAN with ID 10 and ports 6–12 into the
VSAN with ID 20. If an N_Port connects to an F_Port that belongs to a VSAN, it
becomes a member of that VSAN. The switch transfers FC frames only between
switch ports that belong to the same VSAN.
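The two configuration steps and the forwarding rule can be sketched in a few lines of Python, reusing the port example above. The `FabricSwitch` class is a hypothetical model for illustration; it is not a switch vendor's CLI or API, and it assumes each F_Port belongs to exactly one VSAN.

```python
class FabricSwitch:
    """Hypothetical model of VSAN configuration on one FC switch."""

    def __init__(self) -> None:
        self.vsans: set[int] = set()         # defined VSAN IDs
        self.port_vsan: dict[int, int] = {}  # F_Port number -> VSAN ID

    def define_vsan(self, vsan_id: int) -> None:
        """Step 1: define a VSAN on the switch."""
        self.vsans.add(vsan_id)

    def assign(self, ports, vsan_id: int) -> None:
        """Step 2: include F_Ports in a VSAN; the VSAN must be defined first."""
        if vsan_id not in self.vsans:
            raise ValueError(f"VSAN {vsan_id} is not defined")
        for port in ports:
            self.port_vsan[port] = vsan_id

    def can_forward(self, src_port: int, dst_port: int) -> bool:
        """Frames are forwarded only between F_Ports in the same VSAN."""
        src = self.port_vsan.get(src_port)
        return src is not None and src == self.port_vsan.get(dst_port)


sw = FabricSwitch()
sw.define_vsan(10)
sw.define_vsan(20)
sw.assign([1, 2], 10)         # ports 1 and 2 -> VSAN 10
sw.assign(range(6, 13), 20)   # ports 6-12 -> VSAN 20
print(sw.can_forward(1, 2))   # True: both ports are in VSAN 10
print(sw.can_forward(1, 7))   # False: VSAN 10 versus VSAN 20
```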

VSAN versus Zone:

 Both VSANs and zones enable node ports within a fabric to be logically
segmented into groups. However, they are not the same, and their purposes
differ. There is a hierarchical relationship between them: an administrator first
assigns physical ports to VSANs and then configures independent zones for each
VSAN. A VSAN has its own independent fabric services, but fabric services
are not available on a per-zone basis.
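The VSAN-then-zone hierarchy can be expressed as a two-stage check. This Python sketch assumes a default-deny zoning policy: two node ports communicate only if they are in the same VSAN and share at least one zone defined within that VSAN. The port names and zone names are hypothetical.

```python
# Step 1: physical node ports assigned to VSANs (names are illustrative)
port_vsan = {
    "host_a": 10, "host_b": 10, "array_1": 10,
    "host_c": 20, "array_2": 20,
}

# Step 2: independent zones configured separately inside each VSAN
zones = {
    10: {"zone_a": {"host_a", "array_1"}},
    20: {"zone_c": {"host_c", "array_2"}},
}


def can_communicate(p1: str, p2: str) -> bool:
    """Allowed only within one VSAN, and only if both ports share a zone there."""
    vsan = port_vsan.get(p1)
    if vsan is None or vsan != port_vsan.get(p2):
        return False  # ports in different VSANs never communicate
    return any(p1 in members and p2 in members
               for members in zones.get(vsan, {}).values())


print(can_communicate("host_a", "array_1"))  # True: same VSAN, same zone
print(can_communicate("host_b", "array_1"))  # False: same VSAN, no shared zone
print(can_communicate("host_a", "array_2"))  # False: different VSANs
```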

VSAN Trunking

 Allows network traffic from multiple VSANs to traverse a single ISL (trunk link)
 Enables an E_Port (trunk port) to send or receive multiple VSAN traffic over a
trunk link
 Reduces the number of ISLs between switches that are configured with multiple
VSANs

Figure: Two FC switches, each configured with VSANs 10, 20, and 30. Without VSAN
trunking, separate ISLs carry VSAN 10, VSAN 20, and VSAN 30 traffic. With VSAN
trunking, a single trunk link carries traffic for VSANs 10, 20, and 30.

The illustration contrasts a VSAN trunking configuration with a network
configuration without VSAN trunking. In both cases, the switches have VSAN 10,
VSAN 20, and VSAN 30 configured. If VSAN trunking is not used, three ISLs are
required to transfer traffic between the three distinct VSANs. When trunking is
configured, a single ISL is used to transfer all VSAN traffic.

Notes

VSAN trunking allows network traffic from multiple VSANs to traverse a single ISL
along the same path. The ISL through which traffic from multiple VSANs travels is
called a trunk link. VSAN trunking enables a single E_Port to be used for sending
or receiving traffic from multiple VSANs over a trunk link. An E_Port capable of
transferring traffic from multiple VSANs is called a trunk port. The sending and
receiving switches must each have at least one trunk E_Port configured for all or a
subset of the VSANs defined on the switches.

VSAN trunking eliminates the need to create dedicated ISL(s) for each VSAN. It
reduces the number of ISLs when the switches are configured with multiple
VSANs. As the number of ISLs between the switches decreases, the number of
E_Ports used for the ISLs also reduces. By eliminating needless ISLs, the
utilization of the remaining ISLs increases. The complexity of managing the FC
SAN is also minimized with a reduced number of ISLs.
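The ISL savings can be made concrete with a small Python sketch. It rests on two simplifying assumptions: without trunking, each VSAN needs its own dedicated ISL (one E_Port at each end), and with trunking, the trunk carries only the VSANs that are allowed on the trunk E_Ports at both ends of the link. The function names are hypothetical.

```python
def isls_without_trunking(vsans: set[int]) -> int:
    """One dedicated ISL per VSAN."""
    return len(vsans)


def trunk_carries(local_allowed: set[int], remote_allowed: set[int]) -> set[int]:
    """VSANs whose traffic can traverse the trunk link (allowed at both ends)."""
    return local_allowed & remote_allowed


vsans = {10, 20, 30}
n = isls_without_trunking(vsans)
print(f"Without trunking: {n} ISLs, {2 * n} E_Ports")  # 3 ISLs, 6 E_Ports
print("With trunking: 1 ISL, 2 E_Ports")
print(sorted(trunk_carries(vsans, {10, 20, 30})))      # [10, 20, 30]
print(sorted(trunk_carries(vsans, {10, 30})))          # [10, 30]
```

The second call shows the "subset of the VSANs" case: if the remote trunk port is configured for only VSANs 10 and 30, VSAN 20 traffic cannot use that trunk.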

VSAN Tagging

Definition: VSAN Tagging


A process of adding a tag that contains VSAN-specific information to FC frames,
or removing it from them.

Associated with VSAN trunking, it helps isolate FC frames from multiple VSANs
that travel through and share a trunk link.

Whenever an FC frame enters an FC switch, it is tagged with a VSAN header
indicating the VSAN ID of the switch port (F_Port) before the frame is sent down
a trunk link. The receiving FC switch reads the tag and forwards the frame to the
destination port that corresponds to that VSAN ID. The tag is removed once the
frame leaves a trunk link to reach an N_Port.
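The tag and untag steps can be sketched in Python as follows. This is a hypothetical model: a frame is a plain dict, the `"vsan"` key stands in for the VSAN header, and the port numbers and `d_id` value are made up. Real VSAN headers have a defined binary format that this sketch does not reproduce.

```python
def ingress_tag(frame: dict, f_port: int, port_vsan: dict[int, int]) -> dict:
    """Tag a frame with the VSAN ID of the F_Port it arrived on."""
    return {**frame, "vsan": port_vsan[f_port]}


def egress_untag(tagged: dict, dest_port: int, port_vsan: dict[int, int]) -> dict:
    """Receiving switch: read the tag, check that the destination port belongs
    to that VSAN, and strip the tag before the frame reaches the N_Port."""
    if port_vsan.get(dest_port) != tagged["vsan"]:
        raise PermissionError("destination port is not in the frame's VSAN")
    return {k: v for k, v in tagged.items() if k != "vsan"}


switch1 = {1: 10, 2: 20}  # F_Port -> VSAN ID on the sending switch
switch2 = {5: 10, 6: 20}  # F_Port -> VSAN ID on the receiving switch

frame = {"d_id": "0x020500", "payload": b"SCSI data"}
on_trunk = ingress_tag(frame, 1, switch1)       # tagged before the trunk link
print(on_trunk["vsan"])                         # 10
delivered = egress_untag(on_trunk, 5, switch2)  # untagged after the trunk link
print(delivered == frame)                       # True: tag removed, frame intact
```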

Figure: VSAN tagging. VSAN 10 and VSAN 20 traffic from compute systems enters an
FC switch, where VSAN tags are added to FC frames before transmission through the
trunk link. The ISL (trunk link) carries tagged traffic from multiple VSANs. At the
second FC switch, VSAN tags are removed when FC frames exit the trunk link, and
VSAN 10 and VSAN 20 traffic is delivered to the corresponding storage systems.

Concepts in Practice Lesson

Introduction

This section highlights technologies that are relevant to the topics covered in this
module.

This lesson covers the following topics:


 Connectrix
 Dell EMC VPLEX

Concepts In Practice

Connectrix

 Group of networked storage connectivity products that support NVMe over FC
technology
 Products under Connectrix brand:
– Directors: Ideal for the largest mission-critical storage area network
environments
– Switches: Ideal for departmental or edge storage area networks

Dell EMC VPLEX

 Provides a solution for block-level storage virtualization and data migration
both within and across data centers
 Provides the capability to mirror data of a virtual volume both within and across
locations
 VS6 engine with VPLEX for all-flash model provides the fastest and most
scalable VPLEX solution for all-flash systems
 Enables organizations to move cold data to inexpensive cloud storage

Dell EMC Connectrix

Connectrix: A group of networked storage connectivity products. Dell EMC offers
the following connectivity products under the Connectrix brand:

Directors: Ideal for the largest mission-critical storage area network environments.
They offer high port density and high component redundancy. They allow physical
and virtual servers to share storage resources securely. They provide up to 32
Gbps Fibre Channel connectivity. They provide high availability and maximum
scalability, and deliver high performance to keep pace with all-flash storage
environments.

Switches: Ideal for departmental or edge storage area networks. They provide a
foundation for growth, from smaller environments to deployment in large data
centers. They support up to 32 Gbps Fibre Channel connectivity. They provide high
availability through redundant connections and scale with 1U and 2U models.

Dell EMC VPLEX

VPLEX provides a solution for block-level storage virtualization and data mobility
both within and across data centers. It forms a pool of distributed block storage
resources and enables creating virtual storage volumes from the pool. These virtual
volumes are then allocated to the compute systems.

VPLEX provides nondisruptive data mobility among storage systems to balance the
application workload and to enable both local and remote data access. It uses a
unique clustering architecture and advanced data caching techniques, which enable
multiple compute systems located across two locations to access a single copy of
data. Data migration with VPLEX can be done without any downtime, saving
countless weekends of maintenance downtime and IT resources. VPLEX enables IT
organizations to build modern data center infrastructure that is:

 Always available, even in the face of disasters
 Agile in responding to business requirements
 Non-disruptive when adopting the latest storage technology

The new VS6 engine with VPLEX for all-flash model provides the fastest and most
scalable VPLEX solution for all-flash systems. VPLEX also enables organizations
to move cold data to inexpensive cloud storage.

Assessment

1. Which layer of the FC protocol stack provides FC addressing, structure, and
organization of data?

A. FC - 0 - Layer

B. FC - 1 - Layer

C. FC - 2 - Layer

D. FC - 4 - Layer

2. Identify the topologies that require a maximum of one ISL for compute-to-storage
communication. Select all that apply.

A. Full mesh topology

B. Single-switch topology

C. Partial mesh topology

D. Core-edge topology

Summary

IP and FCoE SAN

Introduction

This module focuses on IP SAN protocols such as Internet SCSI (iSCSI) and Fibre
Channel over IP (FCIP), their components, and connectivity. It also covers virtual
LANs (VLANs), reference models for network communication, and the Fibre Channel
over Ethernet (FCoE) protocol.

Upon completing this module, you will be able to:


 Describe the reference models
 Explain iSCSI protocol, network components, and connectivity
 Explain VLANs
 Explain FCIP protocol, connectivity, and configuration
 Explain FCoE protocol

Overview of TCP/IP Lesson

Introduction

This lesson presents the Open Systems Interconnection (OSI) and the Transmission
Control Protocol/Internet Protocol (TCP/IP) reference models. It also covers network
protocols and the connection establishment process.

This lesson covers the following topics:


 Reference models for network communication
 Network protocols
 Three-way handshake process

Overview of TCP/IP

OSI Reference Model

The OSI reference model is a logical structure for network operations standardized
by the International Organization for Standardization (ISO). Each layer in the OSI
reference model interacts directly only with the layer immediately beneath it, and
provides facilities for use by the layer above it. The following layers make up the
OSI model:

 A logical structure for network operations
 The OSI model organizes the communications process into seven different layers
 Protocols are within the layers
 Layers 4-7 provide end-to-end communication
 Layers 1-3 are used for network access, providing packet, frame, and bit level
communication

Figure: OSI layers. L7 Application Layer, L6 Presentation Layer, L5 Session Layer,
and L4 Transport Layer (end to end); L3 Network Layer, L2 Data Link Layer, and
L1 Physical Layer (network).

Notes

Each layer is described as follows:

1. Physical Layer - Defines the electrical and physical specifications for devices.
2. Data Link Layer - Provides the functional and procedural means to transfer data
between network entities. It also detects and possibly corrects errors that may
occur in the Physical Layer.
3. Network Layer - Transfers variable length data sequences from a source to a
destination through one or more networks while also maintaining a quality of
service requested by the Transport Layer.
4. Transport Layer - Provides transparent transfer of data between end users,
providing reliable data transfer services to the upper layers.
5. Session Layer - Controls the connections between computers. It establishes,
manages, and terminates the connections between the local and remote
application.
6. Presentation Layer - Establishes a context between Application Layer entities,
in which the higher-layer entities can use different syntax and semantics.
7. Application Layer - Provides a user interface that enables users to access the
network and applications.

TCP/IP Reference Model

Application Layer

Transport Layer

Network Layer

Link Layer

TCP/IP is a hierarchical protocol suite that is named after its two primary
protocols: Transmission Control Protocol (TCP) and Internet Protocol (IP). It is made
up of the four layers shown in the figure.

 TCP/IP is a 4-layer hierarchical model
 Maps roughly onto the layers of the OSI reference model
 Also known as the Internet Protocol Suite

Notes

The four layers are described as follows:

 The link layer describes the local network topology and the interfaces needed to
effect transmission of Internet layer datagrams to next-neighbor hosts.
 The network (Internet) layer is responsible for end-to-end communications and
delivery of packets across multiple network links.
 The transport layer provides process-to-process delivery of the entire message.
 The application layer enables users to access the network.

Comparing Reference Models

The purpose of the reference models is to show how to facilitate communication
between different systems without requiring changes to the logic of the underlying
architecture.

Figure: Comparing the OSI and TCP/IP reference models

OSI Reference Model                      TCP/IP Reference Model
Application, Presentation, Session  ->   Application Layer
Transport Layer                     ->   Transport Layer
Network Layer                       ->   Internet Layer
Data Link, Physical                 ->   Link Layer

 Facilitates communication between different systems
 Layered architecture
 Standard protocols and interfaces
 Examples:
– OSI
– TCP/IP

Notes

The purpose of the reference models is to show how to facilitate communication
between different systems without requiring changes to the logic of the underlying
architecture. To simplify the understanding of complex systems, the reference
models are implemented as a layered structure.

The OSI and the TCP/IP reference models have much in common. The
architectural layers form a hierarchy, and items are listed in order by rank. Higher
layers depend upon services from lower layers, and lower layers provide services
for upper layers. The functionality of the layers is also roughly similar, with a few
exceptions. The presentation and session layers of the OSI reference model are
combined with the application layer and represented as the application layer in
the TCP/IP model. The TCP/IP model also does not distinguish between the
physical and data link layers; both are covered by its link layer.

The OSI and TCP/IP reference models are widely adopted, important network
architectures. Both define the essential features of network services and enhanced
functionality. The OSI model is a logical structure for network operations
standardized by the International Organization for Standardization (ISO).

 The OSI model is a layered framework for the design of a network system that
enables communication between all types of systems.
 TCP/IP is a hierarchical protocol suite that is made up of interactive modules,
providing specific functionality.
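The layer correspondence described above can be captured as a simple lookup table. This Python sketch encodes the conventional, rough mapping (session, presentation, and application onto the TCP/IP application layer; data link and physical onto the link layer); the mapping is approximate, not an exact equivalence, and the helper name `tcpip_layer` is hypothetical.

```python
OSI_LAYERS = [  # index 0 is L1
    "Physical", "Data Link", "Network", "Transport",
    "Session", "Presentation", "Application",
]

OSI_TO_TCPIP = {
    "Physical": "Link",
    "Data Link": "Link",
    "Network": "Internet",
    "Transport": "Transport",
    "Session": "Application",
    "Presentation": "Application",
    "Application": "Application",
}


def tcpip_layer(osi_number: int) -> str:
    """Return the TCP/IP layer that covers OSI layer L<osi_number> (1-7)."""
    if not 1 <= osi_number <= 7:
        raise ValueError("OSI layers are numbered 1 to 7")
    return OSI_TO_TCPIP[OSI_LAYERS[osi_number - 1]]


for n in range(7, 0, -1):
    print(f"L{n} {OSI_LAYERS[n - 1]:<13} -> {tcpip_layer(n)}")
```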

Network Layer and IP

IP is one of the major protocols in the Transmission Control Protocol
(TCP)/Internet Protocol (IP) protocol suite. It works at layer 3, the network layer of
the OSI model, and at the Internet layer of the TCP/IP model. It is responsible for
end-to-end communication and delivery of packets across multiple network links
based on their logical addresses.

The current versions are:

 Internet Protocol version 4 (IPv4)
– 32-bit address (example: 192.168.1.12)
 Internet Protocol version 6 (IPv6)
– 128-bit address (example: 2002:ac18:af02:00f4:020e:cff:fe6e:d527)
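The address sizes listed above can be checked with Python's standard `ipaddress` module. This short sketch parses the two example addresses and reports each address version, its size in bits, and its compressed textual form.

```python
import ipaddress

EXAMPLES = ("192.168.1.12", "2002:ac18:af02:00f4:020e:cff:fe6e:d527")

for text in EXAMPLES:
    addr = ipaddress.ip_address(text)  # returns IPv4Address or IPv6Address
    bits = addr.max_prefixlen          # 32 for IPv4, 128 for IPv6
    print(f"IPv{addr.version}: {bits}-bit address, written as {addr.compressed}")
```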

Connection Establishment: Three-way handshake

The transport layer is the heart of the TCP/IP protocol suite. Because it uses the
connection-oriented protocol TCP, the layer provides reliable, process-to-process,
full-duplex service.

Transmission Control Protocol (TCP) explicitly defines the connection
establishment process, which is called three-way handshaking. Three-way
handshaking negotiates the sequence and acknowledgment fields and starts the
session. In the illustration, SYN stands for synchronize (synchronize sequence
numbers) and ACK stands for acknowledgment. The process consists of the
following steps:

 The client initiates the connection by sending a TCP SYN segment to the
destination host.
– The segment contains a random sequence number, which marks the beginning
of the sequence numbers of data that the client will transmit
– This sequence number is called the initial sequence number
 The server, which is the destination host, receives the segment and responds
with its own initial sequence number. The response also includes the
acknowledgment number, which is the client’s sequence number incremented
by 1. That is, a SYN+ACK segment is sent
 The client acknowledges the server’s response by sending an ACK segment. It
acknowledges receipt of the second segment with the ACK flag, and its
acknowledgment number is the server’s sequence number incremented by 1

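Figure: Three-way handshake. The server is in the Listening state. The client sends
SYN and enters SYN_SENT; the server replies with SYN ACK and enters SYN_RCVD;
the client sends ACK, and both client and server reach the Established state.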
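The three steps can be traced with a small Python simulation. This is a teaching model, not a real TCP stack: segments are plain dicts, the initial sequence numbers are drawn from a seeded random generator, and only the flags, sequence numbers, acknowledgment numbers, and connection states described above are modeled. Sequence arithmetic wraps modulo 2^32 because TCP sequence numbers are 32-bit.

```python
import random

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(rng: random.Random):
    """Return the three segments exchanged and the final (client, server) states."""
    # Step 1: client -> server. SYN carries the client's initial sequence number.
    client_isn = rng.randrange(SEQ_SPACE)
    syn = {"flags": {"SYN"}, "seq": client_isn}
    client_state = "SYN_SENT"

    # Step 2: listening server -> client. SYN+ACK carries the server's own ISN
    # and acknowledges the client's ISN incremented by 1.
    server_isn = rng.randrange(SEQ_SPACE)
    syn_ack = {"flags": {"SYN", "ACK"}, "seq": server_isn,
               "ack": (syn["seq"] + 1) % SEQ_SPACE}
    server_state = "SYN_RCVD"

    # Step 3: client checks the acknowledgment, then sends ACK that
    # acknowledges the server's ISN incremented by 1.
    if syn_ack["ack"] != (client_isn + 1) % SEQ_SPACE:
        raise RuntimeError("unexpected acknowledgment number from server")
    client_state = "ESTABLISHED"
    ack = {"flags": {"ACK"}, "seq": (client_isn + 1) % SEQ_SPACE,
           "ack": (server_isn + 1) % SEQ_SPACE}

    # Server verifies the final acknowledgment before establishing.
    if ack["ack"] != (server_isn + 1) % SEQ_SPACE:
        raise RuntimeError("unexpected acknowledgment number from client")
    server_state = "ESTABLISHED"
    return [syn, syn_ack, ack], (client_state, server_state)


segments, states = three_way_handshake(random.Random(0))
for segment in segments:
    print(sorted(segment["flags"]), segment.get("seq"), segment.get("ack"))
print(states)  # ('ESTABLISHED', 'ESTABLISHED')
```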

Overview of IP SAN Lesson

Introduction

This lesson covers IP SAN and its protocols. It also focuses on the role of TCP/IP
in IP SAN.

This lesson covers the following topics:


 Describe IP SAN
 Describe the role of TCP/IP in IP SAN
 List IP SAN protocols

Overview of IP SAN

IP SAN

Figure: IP SAN. Compute systems (VMs on hypervisors) with iSCSI HBAs connect over
an IP network to the iSCSI ports of storage systems.

Uses Internet Protocol (IP) for the transport of storage traffic. It transports block I/O
over an IP-based network.

Provides an efficient and dedicated point-to-point storage solution.

Typically runs over a standard IP-based network and uses TCP/IP for
communication. Common IP SAN protocols are:

 Internet SCSI (iSCSI)
 Fibre Channel over IP (FCIP)

Drivers of IP SAN

The following drivers have led to the adoption of IP SAN:

 Existing IP-based network infrastructure can be leveraged
 Reduced cost compared to deploying new FC SAN infrastructure
 IP networks make it possible to extend or connect SANs over long distances
 Many long-distance disaster recovery solutions already leverage IP-based
networks
 Many robust and mature security options are available for IP networks

Notes

The advantages of FC SAN, such as scalability and high performance, come with
the additional cost of buying FC components, such as FC HBAs and FC switches.
On the other hand, IP is a mature technology, and using IP as a storage networking
option provides several advantages. These are listed below:

 Most organizations have an existing IP-based network infrastructure, which
could be used for storage networking. Using the existing network may be a more
economical option than deploying a new FC SAN infrastructure.
 IP networks have no distance limitation, which makes it possible to extend or
connect SANs over long distances. With IP SAN, organizations can extend the
geographical reach of their storage infrastructure and transfer data that is
distributed over wide locations.
 Many long-distance disaster recovery (DR) solutions already leverage IP-based
networks. In addition, many robust and mature security options are available
for IP networks.

Role of TCP/IP in IP SAN

IP SAN protocols typically run over a standard Ethernet network and use the
Transmission Control Protocol/Internet Protocol (TCP/IP) for communication and for
the transport of storage traffic.

Communication is carried out by encapsulating SCSI commands into TCP segments.
As depicted in the figure, iSCSI fits into the network protocol stack on top of the
TCP/IP protocol stack. It takes SCSI commands, data, and responses and
encapsulates them into TCP segments for transportation. Upon receiving iSCSI TCP
segments, the iSCSI layer extracts the SCSI information and passes it to the SCSI
driver software.

Figure: iSCSI Stack
Application: Volume managers, file systems, and so forth
SCSI: SCSI Command Descriptor Blocks, data, and responses
iSCSI: Build and receive iSCSI PDUs
TCP: Provides reliable transport and delivery, flow control, ACKs; uses TCP port numbers
IP: IP routing to help get data through the network; uses IP addresses
Ethernet: Frame switching, MAC address, transport connection to the physical layer
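The layering in the iSCSI stack can be made concrete by adding up the headers each layer wraps around the SCSI payload. This Python sketch is a simplification: it uses minimum header sizes (iSCSI Basic Header Segment 48 bytes, TCP 20 bytes, IPv4 20 bytes, Ethernet 14-byte header plus 4-byte FCS), assumes no header options, digests, or additional header segments, treats the payload as the data segment of a single iSCSI PDU, and ignores TCP segmentation across multiple frames. TCP port 3260 is the well-known iSCSI port; the `Layer` class and `encapsulate` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    header_bytes: int
    trailer_bytes: int = 0


# Innermost first: each entry wraps everything listed before it.
ISCSI_STACK = [
    Layer("iSCSI PDU (BHS)", 48),
    Layer("TCP segment (port 3260)", 20),
    Layer("IPv4 packet", 20),
    Layer("Ethernet frame", 14, trailer_bytes=4),
]


def encapsulate(scsi_payload_bytes: int) -> list[tuple[str, int]]:
    """Return (layer name, total size in bytes) after each layer wraps the payload."""
    size = scsi_payload_bytes
    sizes = []
    for layer in ISCSI_STACK:
        size += layer.header_bytes + layer.trailer_bytes
        sizes.append((layer.name, size))
    return sizes


# 1024 bytes keeps the IPv4 packet within a standard 1500-byte Ethernet MTU.
for name, total in encapsulate(1024):
    print(f"{name:<25} {total:>5} bytes")
```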

IP SAN Protocols

Two primary protocols that leverage IP as the transport mechanism for block-level
data transmission are Internet SCSI (iSCSI) and Fibre Channel over IP (FCIP).

iSCSI

Figure: iSCSI. Compute systems (VMs on hypervisors) with iSCSI HBAs connect over
an IP network to the iSCSI ports of storage systems.

 IP-based protocol that enables transporting SCSI data over an IP network
 Encapsulates SCSI I/O into IP packets and transports them using TCP/IP

FCIP

Figure: FCIP. Two FC SANs, each with compute systems (VMs on hypervisors) and a
storage system, are connected by FCIP gateways through an FCIP tunnel over a
LAN/WAN.

 IP-based protocol that is used to interconnect distributed FC SAN islands over
an IP network
 Encapsulates FC frames into IP packets and transports them over an existing IP
network
 Enables transmission by tunneling data between FC SAN islands

Notes

iSCSI: It is widely adopted for transferring SCSI data over IP between compute
systems and storage systems and among the storage systems. It is relatively
inexpensive and easy to implement, especially in environments in which an FC SAN
does not exist.

FCIP: Organizations are looking for ways to transport data over a long distance
between their disparate FC SANs at multiple geographic locations. One of the best
ways to achieve this goal is to interconnect geographically dispersed FC SANs
through reliable, high-speed links. This approach involves transporting the FC block
data over the IP infrastructure.

The FCIP standard has rapidly gained acceptance as a manageable, cost-effective
way to blend the best of the two worlds: FC SAN and the proven, widely deployed IP
infrastructure.
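The tunneling idea behind FCIP can be sketched in Python. This is a hypothetical model: an FCIP gateway wraps each FC frame, unchanged, as the payload of an IP packet addressed to the peer gateway, and the peer gateway unwraps it into its local FC SAN. The gateway IP addresses and FC IDs are made up, and the real FCIP encapsulation header and TCP session handling are not modeled. TCP port 3225 is the well-known FCIP port.

```python
from dataclasses import dataclass

FCIP_PORT = 3225  # well-known TCP port for FCIP


@dataclass(frozen=True)
class FCFrame:
    s_id: str       # source FC address
    d_id: str       # destination FC address
    payload: bytes


@dataclass(frozen=True)
class IPPacket:
    src_ip: str
    dst_ip: str
    dst_port: int
    data: FCFrame   # the encapsulated FC frame travels intact


class FCIPGateway:
    """One end of an FCIP tunnel between two FC SAN islands (hypothetical)."""

    def __init__(self, ip: str, peer_ip: str) -> None:
        self.ip, self.peer_ip = ip, peer_ip

    def encapsulate(self, frame: FCFrame) -> IPPacket:
        """Wrap an FC frame for transport across the IP network."""
        return IPPacket(self.ip, self.peer_ip, FCIP_PORT, frame)

    def decapsulate(self, packet: IPPacket) -> FCFrame:
        """Unwrap an arriving packet back into an FC frame for the local SAN."""
        if packet.dst_ip != self.ip or packet.dst_port != FCIP_PORT:
            raise ValueError("packet is not for this FCIP tunnel endpoint")
        return packet.data


site_a = FCIPGateway("10.0.1.1", "10.0.2.1")
site_b = FCIPGateway("10.0.2.1", "10.0.1.1")
frame = FCFrame(s_id="0x010100", d_id="0x020200", payload=b"replicated block")
packet = site_a.encapsulate(frame)
delivered = site_b.decapsulate(packet)
print(delivered == frame)  # True: the FC frame crosses the IP network unchanged
```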

iSCSI Lesson

Introduction

This lesson covers iSCSI network components and connectivity. It also covers
iSCSI protocol stack, iSCSI address and name, and iSCSI discovery. The lesson
also focuses on the virtual LAN (VLAN) and stretched VLAN.

This lesson covers the following topics:


 iSCSI network components
 iSCSI connectivity
 iSCSI protocol stack
 iSCSI address and name
 iSCSI discovery
 Virtual LAN (VLAN) and stretched VLAN

iSCSI

Video: iSCSI

The video is located at


https://edutube.emc.com/Player.aspx?vno=bB5O5rcjrZ447ADdHPxC0A

iSCSI Overview

iSCSI is an IP-based protocol that establishes and manages connections between
compute systems and storage systems over IP.

It encapsulates SCSI commands and data into IP packets and transports them using
TCP/IP.

It is widely adopted for transferring SCSI data over IP between compute systems
and storage systems and among the storage systems. iSCSI is relatively
inexpensive and easy to implement, especially in environments in which an FC SAN
does not exist.

Components of iSCSI Network

Key components for iSCSI communication are:

 iSCSI initiators
– Example: iSCSI HBA
 iSCSI targets
– Example: Storage system with iSCSI port
 IP-based network
– Example: Gigabit Ethernet LAN

Figure: Compute systems with iSCSI HBAs (initiators) connect over an IP network to
storage systems with iSCSI ports (targets).

Types of iSCSI Initiator

Hardware and software initiators are types of iSCSI initiators that are used by the
host to access iSCSI targets.

Initiator Types

iSCSI software initiators:

 Standard NIC with software iSCSI adapter
– NIC provides network interface
– Software adapter provides iSCSI functionality
– Both iSCSI and TCP/IP processing require CPU cycles of compute system
 TCP Offload Engine (TOE) NIC with software iSCSI adapter
– TOE NIC performs TCP/IP processing
– Software adapter provides iSCSI functionality
– iSCSI processing requires CPU cycles of compute system

iSCSI hardware initiator:

 iSCSI HBA
– Performs both iSCSI and TCP/IP processing
– Frees up CPU cycles of compute system for business applications

Notes

The computing operations of a software iSCSI initiator are performed by the
server’s operating system. In contrast, a hardware iSCSI initiator is a dedicated,
host-based network interface card (NIC) with integrated resources to handle the
iSCSI processing functions. Common examples of iSCSI initiators are:

 Standard NIC with software iSCSI adapter: The software iSCSI adapter is
operating system or hypervisor kernel-resident software. It uses an existing NIC
of the compute system to emulate an iSCSI initiator. It is the least expensive and
easiest option to implement because most compute systems come with at least
one, and often two, embedded NICs, and it requires only a software initiator for
iSCSI functionality. Because NICs provide only standard networking functions,
both the TCP/IP processing and the encapsulation of SCSI data into IP packets are
carried out by the CPU of the compute system. This places additional overhead on
the CPU. If a standard NIC is used in heavy I/O load situations, the CPU of the
compute system might become a bottleneck.
 TOE NIC with software iSCSI adapter: A TOE NIC offloads the TCP/IP
processing from the CPU of a compute system and leaves only the iSCSI
functionality to the CPU. The compute system passes the iSCSI information to
the TOE NIC and then the TOE NIC sends the information to the destination
using TCP/IP. Although this solution improves performance, the iSCSI
functionality is still handled by a software adapter that requires CPU cycles of
the compute system.
 iSCSI HBA: An iSCSI HBA is a hardware adapter with built-in iSCSI
functionality. It is capable of providing performance benefits over software iSCSI
adapters by offloading the entire iSCSI and TCP/IP processing from the CPU of
a compute system.
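The three initiator types differ only in where TCP/IP and iSCSI processing run, so that difference can be written down as data. The following Python sketch is illustrative only. `InitiatorType`, `OFFLOAD`, and `host_cpu_tasks` are hypothetical names and are not part of any driver or operating system API. For each initiator type described above, the sketch records which component performs each kind of processing and derives which tasks still consume host CPU cycles.

```python
from enum import Enum
from typing import List


class InitiatorType(Enum):
    """The three common iSCSI initiator configurations described above."""
    STANDARD_NIC = "Standard NIC with software iSCSI adapter"
    TOE_NIC = "TOE NIC with software iSCSI adapter"
    ISCSI_HBA = "iSCSI HBA"


# Which component performs (TCP/IP processing, iSCSI processing) for each type.
OFFLOAD = {
    InitiatorType.STANDARD_NIC: ("host CPU", "host CPU"),
    InitiatorType.TOE_NIC: ("TOE NIC", "host CPU"),
    InitiatorType.ISCSI_HBA: ("iSCSI HBA", "iSCSI HBA"),
}


def host_cpu_tasks(initiator: InitiatorType) -> List[str]:
    """Return the processing tasks that still consume host CPU cycles."""
    tcp_ip, iscsi = OFFLOAD[initiator]
    tasks = []
    if tcp_ip == "host CPU":
        tasks.append("TCP/IP")
    if iscsi == "host CPU":
        tasks.append("iSCSI")
    return tasks


for kind in InitiatorType:
    print(f"{kind.value}: host CPU handles {host_cpu_tasks(kind) or 'no protocol processing'}")
```

Reading the table from top to bottom shows the trade-off described in the notes. Each step moves more protocol work off the compute system's CPU and onto dedicated adapter hardware.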

iSCSI Connectivity

iSCSI implementations support two types of connectivity: native and bridged.
Both types are described here:

Native

Figure: Native iSCSI. A compute system (VMs on a hypervisor) with an iSCSI HBA
connects over an IP network to the iSCSI port of a storage system.

 iSCSI initiators connect to iSCSI targets directly or through an IP network
 No FC components are required

Bridged

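Figure: Bridged iSCSI. A compute system (VMs on a hypervisor) with an iSCSI HBA
connects over an IP network to an iSCSI gateway, which connects through an FC SAN
to the FC port of a storage system.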

 iSCSI initiators are attached to an IP network
 Storage systems are attached to an FC SAN
 An iSCSI gateway provides bridging functionality

Native iSCSI: In this type of connectivity, the compute systems with iSCSI initiators
may be either directly attached to the iSCSI targets or connected through an IP-
based network. FC components are not required for native iSCSI connectivity. The
figure on the left shows a native iSCSI implementation that includes a storage
system with an iSCSI port. The storage system is connected to an IP network. After
an iSCSI initiator logs in to the target over the network, it can access the
available LUNs on the storage system.

Bridged iSCSI: This type of connectivity enables the initiators to exist in an IP
environment while the storage systems remain in an FC SAN environment. It
enables FC and IP to coexist by providing iSCSI-to-FC bridging functionality.
The figure on the right illustrates a bridged iSCSI implementation. It shows
connectivity between a compute system with an iSCSI initiator and a storage
system with an FC port. Because the storage system does not have any iSCSI
port, a gateway or a multiprotocol router is used. The gateway facilitates the
communication between the compute system with iSCSI ports and the storage
system with only FC ports. The gateway converts IP packets to FC frames and
vice versa, thus bridging the connectivity between the IP and FC environments.
The gateway contains both FC and Ethernet ports to facilitate the communication
between the FC and the IP environments. The iSCSI initiator is configured with the
gateway’s IP address as its target destination. On the other side, the gateway is
configured as an FC initiator to the storage system.

Combining FC and Native iSCSI Connectivity

A storage system typically comes with both FC and iSCSI ports. This
combination enables both native iSCSI connectivity and FC connectivity in the
same environment without requiring a bridge device.

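Figure: Combined FC and native iSCSI connectivity. One compute system connects
through an iSCSI HBA and an IP network to an iSCSI port on the storage system;
another compute system connects through an FC HBA and an FC SAN to an FC port on
the same storage system.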

iSCSI Protocol Stack

The image displays a model of iSCSI protocol layers and depicts the encapsulation
order of the SCSI commands for their delivery through a physical carrier.

 SCSI is the command protocol that works at the application layer of the Open
Systems Interconnection (OSI) model
 The initiators and the targets use SCSI commands and responses to talk to
each other
 The SCSI commands, data, and status messages are encapsulated into TCP/IP
and transmitted across the network between the initiators and the targets

Protocol layers at the iSCSI initiator and the iSCSI target (OSI layer: protocol,
unit exchanged):

 Layer 7 Application: SCSI (SCSI commands and data)
 Layer 5 Session: iSCSI (iSCSI login and discovery)
 Layer 4 Transport: TCP (TCP windows and segments)
 Layer 3 Network: IP (IP packets)
 Layer 2 Data Link: Ethernet (Ethernet frames)

Frame on the interconnect: Ethernet | IP | TCP | iSCSI | SCSI Data

Notes

The figure on the slide displays a model of iSCSI protocol layers and depicts the
encapsulation order of the SCSI commands for their delivery through a physical
carrier. SCSI is the command protocol that works at the application layer of the
Open Systems Interconnection (OSI) model. The initiators and the targets use SCSI
commands and responses to talk to each other. The SCSI commands, data, and
status messages are encapsulated into TCP/IP and transmitted across the network
between the initiators and the targets.

iSCSI is the session-layer protocol that initiates a reliable session between devices
that recognize SCSI commands and TCP/IP. The iSCSI session-layer interface is
responsible for handling login, authentication, target discovery, and session
management.

TCP is used with iSCSI at the transport layer to provide reliable transmission. TCP
controls message flow, windowing, error recovery, and retransmission. It relies
upon the network layer of the OSI model to provide global addressing and
connectivity. The OSI Layer 2 protocols at the data link layer of this model enable
node-to-node communication through a physical network.
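The encapsulation order can be sketched in Python. This is a conceptual model and does not build real packets. Each header is represented by a hypothetical bracketed label such as `[TCP hdr]`, whereas real headers are binary structures defined by their respective standards. The sketch wraps a SCSI payload from the iSCSI layer outward to Ethernet. It then unwraps the result in the reverse order, as the target does on receipt.

```python
# Layers below SCSI, from the innermost wrapper (iSCSI) to the outermost (Ethernet),
# matching the frame on the interconnect: Ethernet | IP | TCP | iSCSI | SCSI data.
WRAPPING_LAYERS = ["iSCSI", "TCP", "IP", "Ethernet"]


def header(layer: str) -> str:
    """Hypothetical text stand-in for a protocol header."""
    return f"[{layer} hdr]"


def encapsulate(scsi_data: str) -> str:
    """Initiator side: each lower layer prepends its own header."""
    frame = scsi_data
    for layer in WRAPPING_LAYERS:
        frame = header(layer) + frame
    return frame


def decapsulate(frame: str) -> str:
    """Target side: strip headers from the outermost (Ethernet) inward."""
    for layer in reversed(WRAPPING_LAYERS):
        h = header(layer)
        if not frame.startswith(h):
            raise ValueError(f"expected {h} at the front of the frame")
        frame = frame[len(h):]
    return frame


wire = encapsulate("SCSI data")
print(wire)               # [Ethernet hdr][IP hdr][TCP hdr][iSCSI hdr]SCSI data
print(decapsulate(wire))  # SCSI data
```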

iSCSI Address and Name

An iSCSI address is the path to an iSCSI initiator or target, which consists of:

– Location of the iSCSI initiator or target
o Combination of IP address and TCP port number
– iSCSI name
o Unique identifier for the initiator or target in an iSCSI network

Common Types of iSCSI Name

– IQN: iSCSI Qualified Name, for example iqn.2008-02.com.example:optional_string
– EUI: Extended Unique Identifier, for example eui.0300732A32598D26
– NAA: Network Address Authority, for example naa.52004567BA64678D

Notes

An iSCSI address consists of the location of an iSCSI initiator or target on the
network and the iSCSI name. The location is a combination of the host name or IP
address and the TCP port number. For iSCSI initiators, the TCP port number is
omitted from the address.
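An iSCSI address has two parts, and both can be modelled in a few lines of Python. This is a sketch under stated assumptions. The `IscsiAddress` class is hypothetical. IPv6 addresses, which need brackets, are not handled. The default of 3260 is the IANA-registered well-known TCP port for iSCSI, a value the text above does not mention. As the text describes, the port is omitted for initiators.

```python
from dataclasses import dataclass
from typing import Optional

# IANA-registered TCP port for iSCSI; assumed when a target's port is not given.
ISCSI_DEFAULT_PORT = 3260


@dataclass(frozen=True)
class IscsiAddress:
    """Illustrative model: an iSCSI address = network location + iSCSI name."""
    host: str                   # host name or IPv4 address (IPv6 not handled)
    name: str                   # iSCSI name, for example an IQN
    is_target: bool             # True for a target, False for an initiator
    port: Optional[int] = None  # TCP port; only meaningful for targets

    def location(self) -> str:
        """Return the network location part of the address."""
        if not self.is_target:
            # For initiators, the TCP port number is omitted.
            return self.host
        port = self.port if self.port is not None else ISCSI_DEFAULT_PORT
        return f"{self.host}:{port}"


target = IscsiAddress("192.168.10.20", "iqn.2008-02.com.example:array1", is_target=True)
initiator = IscsiAddress("192.168.10.5", "iqn.2008-02.com.example:host1", is_target=False)
print(target.location(), target.name)
print(initiator.location(), initiator.name)
```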

iSCSI name is a unique worldwide iSCSI identifier that is used to identify the
initiators and targets within an iSCSI network to facilitate communication. The
unique identifier can be a combination of the names of the department, application,
manufacturer, serial number, asset number, or any tag that can be used to
recognize and manage the iSCSI nodes. The following are three types of iSCSI
names commonly used:

 iSCSI Qualified Name (IQN): An organization must own a registered domain
name to generate iSCSI Qualified Names. This domain name does not need to
be active or resolve to an address; it only needs to be reserved to prevent other
organizations from using the same domain name to generate iSCSI names. A
date is included in the name to avoid potential conflicts caused by the transfer
of domain names. An example of an IQN is iqn.2015-04.com.example:optional_string.
The optional string can contain a serial number, an asset number, or any other
device identifier. IQN enables storage administrators to assign meaningful
names to the iSCSI initiators and the iSCSI targets, and therefore to manage
those devices more easily.
 Extended Unique Identifier (EUI): An EUI is a globally unique identifier based on
the IEEE EUI-64 naming standard. An EUI is composed of the eui prefix
followed by a 16-character hexadecimal name, such as
eui.0300732A32598D26.
 Network Address Authority (NAA): NAA is another worldwide unique naming
format as defined by the International Committee for Information Technology
Standards (INCITS) T11 – Fibre Channel (FC) protocols and is used by Serial
Attached SCSI (SAS). This format enables the SCSI storage devices that
contain both iSCSI ports and SAS ports to use the same NAA-based SCSI
device name. An NAA is composed of the naa prefix followed by a hexadecimal
name, such as naa.52004567BA64678D. The hexadecimal representation has
a maximum size of 32 characters (128 bit identifier).
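The three name formats can be recognized with simple patterns. The following Python sketch is a simplification, and `classify_iscsi_name` is a hypothetical helper. The IQN pattern assumes lower-case names and does not apply the full character rules of the iSCSI specification. The NAA pattern accepts the 16- or 32-hexadecimal-character forms, which correspond to 64-bit and 128-bit identifiers.

```python
import re
from typing import Optional

# Simplified patterns; real validation under the iSCSI specification is stricter.
_PATTERNS = {
    # iqn.<yyyy-mm>.<reversed domain name>[:<optional string>]
    "IQN": re.compile(r"iqn\.\d{4}-\d{2}\.[a-z0-9][a-z0-9.-]*(:.+)?"),
    # eui.<16 hexadecimal characters> (IEEE EUI-64)
    "EUI": re.compile(r"eui\.[0-9A-Fa-f]{16}"),
    # naa.<16 or 32 hexadecimal characters> (64-bit or 128-bit identifier)
    "NAA": re.compile(r"naa\.(?:[0-9A-Fa-f]{16}|[0-9A-Fa-f]{32})"),
}


def classify_iscsi_name(name: str) -> Optional[str]:
    """Return 'IQN', 'EUI', or 'NAA' if the whole name matches that format, else None."""
    for kind, pattern in _PATTERNS.items():
        if pattern.fullmatch(name):
            return kind
    return None


for candidate in ["iqn.2015-04.com.example:optional_string",
                  "eui.0300732A32598D26",
                  "naa.52004567BA64678D",
                  "eui.123"]:
    print(candidate, "->", classify_iscsi_name(candidate))
```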

iSCSI Discovery

For iSCSI communication, an initiator must discover the location and name of the
targets on the network.

iSCSI discovery commonly takes place in two ways:

– SendTargets discovery
o Initiator is manually configured with the target’s network portal
o Initiator issues the SendTargets command; the target responds with the
required parameters
– Internet Storage Name Service (iSNS)
o Initiators and targets register themselves with an iSNS server
o Initiator may query the iSNS server for a list of available targets
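In SendTargets discovery, the target returns its list of targets as text key-value pairs. The Python sketch below parses such a response. It assumes the layout used by the iSCSI specification (RFC 7143): NUL-separated `TargetName=` and `TargetAddress=` entries, where each address has the form `host[:port],portal-group-tag`. The function name `parse_send_targets` is hypothetical. The sketch also assumes that every `TargetAddress` carries a portal group tag. When the port is missing, it falls back to 3260.

```python
from typing import Dict, List, Optional, Tuple

ISCSI_DEFAULT_PORT = 3260  # assumed when a TargetAddress omits the port


def parse_send_targets(response: str) -> Dict[str, List[Tuple[str, int, int]]]:
    """Group each TargetAddress under the TargetName that precedes it.

    Returns {target name: [(host, TCP port, portal group tag), ...]}.
    """
    targets: Dict[str, List[Tuple[str, int, int]]] = {}
    current: Optional[str] = None
    for pair in response.split("\0"):  # key=value pairs are NUL-separated
        if not pair:
            continue
        key, _, value = pair.partition("=")
        if key == "TargetName":
            current = value
            targets.setdefault(current, [])
        elif key == "TargetAddress" and current is not None:
            location, _, tag = value.rpartition(",")
            # A trailing ":port" is present unless the location is a bare host
            # name, an IPv4 address, or a bracketed IPv6 address.
            if ":" in location and not location.endswith("]"):
                host, _, port = location.rpartition(":")
                targets[current].append((host, int(port), int(tag)))
            else:
                targets[current].append((location, ISCSI_DEFAULT_PORT, int(tag)))
    return targets


response = (
    "TargetName=iqn.2008-02.com.example:array1\0"
    "TargetAddress=192.168.10.20:3260,1\0"
    "TargetAddress=192.168.20.20:3260,2\0"
    "TargetName=iqn.2008-02.com.example:array2\0"
    "TargetAddress=192.168.10.30,1\0"
)
for name, portals in parse_send_targets(response).items():
    print(name, portals)
```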

iSNS Discovery Domain

iSNS discovery domains function in the same way as FC zones. Discovery
domains provide functional groupings of devices (including iSCSI initiators and
targets) in an IP SAN. The iSNS server is configured with discovery domains.

For devices to communicate with one another, they must be configured in the same
discovery domain. The iSNS server may send state change notifications (SCNs) to
the registered devices. SCNs inform the registered devices about network events
that affect the operational state of devices, such as the addition or removal of
devices from a discovery domain.
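A toy, in-memory model can illustrate how discovery domains and state change notifications work. `ToyIsnsServer` and its methods are hypothetical and do not implement the iSNS protocol. The model only demonstrates the behavior stated above. An initiator's query returns only the targets that share a discovery domain with it, and membership changes send notifications to the affected devices. Short names such as `host1` stand in for full iSCSI names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class ToyIsnsServer:
    """Hypothetical in-memory model of iSNS registration, discovery domains,
    queries, and state change notifications (SCNs). Not an iSNS implementation."""

    def __init__(self) -> None:
        self.roles: Dict[str, str] = {}  # iSCSI name -> "initiator" or "target"
        self.domains: Dict[str, Set[str]] = defaultdict(set)    # domain -> members
        self.scn_log: Dict[str, List[str]] = defaultdict(list)  # member -> SCNs

    def register(self, name: str, role: str) -> None:
        self.roles[name] = role

    def add_to_domain(self, domain: str, name: str) -> None:
        self.domains[domain].add(name)
        self._notify(self.domains[domain], f"{name} added to {domain}")

    def remove_from_domain(self, domain: str, name: str) -> None:
        recipients = set(self.domains[domain])  # also notify the removed device
        self.domains[domain].discard(name)
        self._notify(recipients, f"{name} removed from {domain}")

    def _notify(self, recipients: Iterable[str], message: str) -> None:
        for member in recipients:
            self.scn_log[member].append(message)

    def query_targets(self, initiator: str) -> List[str]:
        """Targets that share at least one discovery domain with the initiator."""
        visible: Set[str] = set()
        for members in self.domains.values():
            if initiator in members:
                visible.update(m for m in members if self.roles.get(m) == "target")
        return sorted(visible)


isns = ToyIsnsServer()
for name, role in [("host1", "initiator"), ("host2", "initiator"),
                   ("array1", "target"), ("array2", "target")]:
    isns.register(name, role)
isns.add_to_domain("DD-A", "host1")
isns.add_to_domain("DD-A", "array1")
isns.add_to_domain("DD-B", "host2")
isns.add_to_domain("DD-B", "array2")
print(isns.query_targets("host1"), isns.query_targets("host2"))
```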

Figure: Discovery domains grouping iSCSI initiators (VMs on hypervisors) with
iSCSI targets; the iSNS server exchanges queries and notifications with the
registered devices.

Virtual LAN (VLAN)

Definition: VLAN
A logical network created on a LAN enabling communication between
a group of nodes with a common set of functional requirements,
independent of their physical location in the network.

VLANs are well suited for iSCSI deployments because they help isolate specific
network traffic, such as iSCSI traffic, from other network traffic (for example,
compute-to-compute traffic) in a physical Ethernet network.

Configuring a VLAN:

 Define VLANs on switches with specific VLAN IDs
 Configure VLAN membership based on a supported technique
– Port-based
– MAC-based
– Protocol-based
– IP subnet address-based
– Application-based

Notes

A VLAN conceptually functions in the same way as a VSAN. Each VLAN behaves
and is managed as an independent LAN. Two nodes connected to a VLAN can
communicate with each other without routing of frames, even if they are in
different physical locations. When two nodes in different VLANs communicate,
the traffic must be forwarded through a router or an OSI Layer 3 switching device,
even if the nodes are connected to the same physical LAN. Network broadcasts within a
VLAN generally do not propagate to nodes that belong to a different VLAN, unless
configured to cross a VLAN boundary.

To configure VLANs, an administrator first defines the VLANs on the switches.
Each VLAN is identified by a unique 12-bit VLAN ID (as per the IEEE 802.1Q
standard). The next step is to configure the VLAN membership based on an
appropriate technique supported by the switches. The supported techniques can be
port-based, MAC-based, protocol-based, IP subnet address-based, or
application-based. In the port-based technique, membership in a VLAN is defined
by assigning a VLAN ID to a switch port. When a node connects to a switch port
that belongs to a VLAN, the node becomes a member of that VLAN.

In the MAC-based technique, the membership in a VLAN is defined by the MAC
address of the node. In the protocol-based technique, different VLANs are
assigned to different protocols based on the protocol type field found in the OSI
Layer 2 header. In the IP subnet address-based technique, the VLAN membership
is based on the IP subnet address. All the nodes in an IP subnet are members of
the same VLAN. In the application-based technique, a specific application, for
example, a file transfer protocol (FTP) application can be configured to execute on
one VLAN. A detailed discussion on these VLAN configuration techniques is
beyond the scope of this course.
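Port-based membership and the forwarding rule from the notes can be sketched in Python. `PortBasedVlanSwitch` is a hypothetical model and is not a switch API. The sketch assumes the IEEE 802.1Q convention that VLAN IDs 0 and 4095 are reserved, which leaves 1 to 4094 for configured VLANs. It shows three behaviors. Traffic within a VLAN is switched at Layer 2. Traffic between VLANs needs a Layer 3 device. Broadcasts stay inside the sender's VLAN.

```python
# IEEE 802.1Q VLAN IDs are 12 bits; 0 and 4095 are reserved, leaving 1-4094.
VLAN_ID_BITS = 12
MIN_VLAN_ID, MAX_VLAN_ID = 1, 2**VLAN_ID_BITS - 2


class PortBasedVlanSwitch:
    """Hypothetical switch model that uses port-based VLAN membership."""

    def __init__(self) -> None:
        self.port_vlan = {}  # switch port number -> VLAN ID

    def assign(self, port: int, vlan_id: int) -> None:
        """Assigning a VLAN ID to a port makes the attached node a VLAN member."""
        if not MIN_VLAN_ID <= vlan_id <= MAX_VLAN_ID:
            raise ValueError(f"VLAN ID {vlan_id} is outside {MIN_VLAN_ID}-{MAX_VLAN_ID}")
        self.port_vlan[port] = vlan_id

    def forwarding_path(self, src_port: int, dst_port: int) -> str:
        """Same VLAN: switched at Layer 2. Different VLANs: needs a Layer 3 device."""
        if self.port_vlan[src_port] == self.port_vlan[dst_port]:
            return "switched (Layer 2)"
        return "routed (Layer 3)"

    def broadcast_ports(self, src_port: int) -> list:
        """Ports reached by a broadcast from src_port: members of the same VLAN only."""
        vlan = self.port_vlan[src_port]
        return sorted(p for p, v in self.port_vlan.items() if v == vlan and p != src_port)


switch = PortBasedVlanSwitch()
switch.assign(1, 10)  # iSCSI initiator
switch.assign(2, 10)  # iSCSI target
switch.assign(3, 20)  # compute-to-compute traffic
print(switch.forwarding_path(1, 2), switch.forwarding_path(1, 3), switch.broadcast_ports(1))
```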

VLAN Trunking and Tagging

 VLAN trunking allows a single network link (trunk link) to carry multiple VLAN
traffic
 To enable trunking, trunk ports must be configured on both sending and
receiving network components
 Sending network component inserts a tag field containing VLAN ID into an
Ethernet frame before sending through a trunk link
 Receiving network component reads the tag and forwards the frame to
destination port(s)
 Tag is removed once a frame leaves trunk link to reach a node port

Notes

Similar to the VSAN trunking, network traffic from multiple VLANs may traverse a
trunk link. A single network port, called trunk port, is used for sending or receiving
traffic from multiple VLANs over a trunk link. Both the sending and the receiving
network components must have at least one trunk port configured for all or a
subset of the VLANs defined on the network component.

As with VSAN tagging, VLAN has its own tagging mechanism. The tagging is
performed by inserting a 4-byte tag field containing 12-bit VLAN ID into the
Ethernet frame (as per IEEE 802.1Q standard) before it is transmitted through a
trunk link. The receiving network component reads the tag and forwards the frame
to the destination port(s) that correspond to that VLAN ID. The tag is removed
once the frame leaves a trunk link to reach a node port.
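The tagging step can be shown at the byte level. In this Python sketch, the function names are hypothetical. The 4-byte IEEE 802.1Q tag consists of two 16-bit fields. The first is the Tag Protocol Identifier 0x8100. The second is the Tag Control Information field, which holds a 3-bit priority, a 1-bit drop eligible indicator, and the 12-bit VLAN ID. The tag is inserted directly after the destination and source MAC addresses. Recomputing the frame check sequence is deliberately left out.

```python
import struct
from typing import Tuple

TPID_8021Q = 0x8100  # Tag Protocol Identifier of an IEEE 802.1Q tag
MAC_ADDRS_LEN = 12   # destination MAC (6 bytes) + source MAC (6 bytes)
TAG_LEN = 4


def insert_vlan_tag(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Sending side of a trunk link: insert a 4-byte tag after the MAC addresses."""
    if not 0 <= vlan_id <= 0x0FFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    tci = (priority << 13) | vlan_id  # drop eligible indicator (bit 12) left at 0
    tag = struct.pack("!HH", TPID_8021Q, tci)
    return frame[:MAC_ADDRS_LEN] + tag + frame[MAC_ADDRS_LEN:]


def read_and_strip_vlan_tag(frame: bytes) -> Tuple[int, bytes]:
    """Receiving side: read the VLAN ID, then remove the tag before delivery to a node port."""
    tpid, tci = struct.unpack_from("!HH", frame, MAC_ADDRS_LEN)
    if tpid != TPID_8021Q:
        raise ValueError("frame does not carry an 802.1Q tag")
    return tci & 0x0FFF, frame[:MAC_ADDRS_LEN] + frame[MAC_ADDRS_LEN + TAG_LEN:]


untagged = bytes.fromhex("ffffffffffff" "001122334455" "0800") + b"payload"
tagged = insert_vlan_tag(untagged, vlan_id=100)
vlan_id, restored = read_and_strip_vlan_tag(tagged)
print(tagged[12:16].hex(), vlan_id, restored == untagged)  # 81000064 100 True
```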

IEEE 802.1ad Multi-tagging: IEEE 802.1ad is an amendment to IEEE 802.1Q that
enables inserting multiple VLAN tags into an Ethernet frame. IEEE 802.1Q mandates
a single tag with a 12-bit VLAN ID field, which theoretically limits the number of
VLANs in an environment to 4096. In a large environment such as a cloud
infrastructure, this limitation may restrict VLAN scalability. IEEE 802.1ad provides
the flexibility to accommodate a larger number of VLANs. For example, by using a
double tag, theoretically 16777216 (4096 × 4096) VLANs may be configured.
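Double tagging places an outer service tag in front of the inner 802.1Q tag. The Python sketch below assumes the IEEE 802.1ad Tag Protocol Identifier 0x88A8 for the outer tag and 0x8100 for the inner tag. The helper names are hypothetical, and the sketch ignores priority bits and the frame check sequence. It also reproduces the arithmetic from the text: two independent 12-bit VLAN ID fields give 4096 × 4096 combinations.

```python
import struct

S_TAG_TPID = 0x88A8  # outer (service) tag, IEEE 802.1ad
C_TAG_TPID = 0x8100  # inner (customer) tag, IEEE 802.1Q
MAC_ADDRS_LEN = 12   # destination MAC + source MAC


def vlan_tag(tpid: int, vlan_id: int) -> bytes:
    """Build one 4-byte tag; priority and drop eligible bits are left at 0."""
    if not 0 <= vlan_id <= 0x0FFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    return struct.pack("!HH", tpid, vlan_id)


def insert_double_tag(frame: bytes, outer_vid: int, inner_vid: int) -> bytes:
    """Insert the outer S-tag followed by the inner C-tag after the MAC addresses."""
    tags = vlan_tag(S_TAG_TPID, outer_vid) + vlan_tag(C_TAG_TPID, inner_vid)
    return frame[:MAC_ADDRS_LEN] + tags + frame[MAC_ADDRS_LEN:]


# Theoretical ID space from the text: one 12-bit field vs. two independent fields.
SINGLE_TAG_IDS = 2 ** 12              # 4096
DOUBLE_TAG_IDS = SINGLE_TAG_IDS ** 2  # 16777216

frame = bytes(12) + bytes.fromhex("0800") + b"payload"
print(insert_double_tag(frame, outer_vid=300, inner_vid=10)[12:20].hex())
print(SINGLE_TAG_IDS, DOUBLE_TAG_IDS)
```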

Stretched VLAN

Definition: Stretched VLAN
A VLAN that spans multiple sites and enables OSI Layer 2
communication between a group of nodes over an OSI Layer 3 WAN
infrastructure, independent of their physical location.

Figure: Stretched VLAN. VLAN 10 and VLAN 20 span Site 1 and Site 2. At each site,
compute systems (VMs on hypervisors) and storage systems connect through an
Ethernet switch and an Ethernet director, and VLAN 10 and 20 traffic crosses the
WAN between the sites.

Notes

In a typical multisite environment, network traffic between sites is routed through an
OSI Layer 3 WAN connection. Because of the routing, it is not possible to transmit
OSI Layer 2 traffic between the nodes in two sites. A stretched VLAN extends a
VLAN across the sites and enables nodes in two different sites to communicate
over a WAN as if they were connected to the same network.

Stretched VLANs also enable the movement of virtual machines (VMs) between
sites without the need to change their network configurations. This simplifies the
creation of high-availability clusters, VM migration, and application and workload
mobility across sites. The clustering across sites, for example, enables moving
VMs to an alternate site in the event of a disaster or during the maintenance of one
site. Without a stretched VLAN, the IP addresses of the VMs must be changed to
match the addressing scheme at the other site.

Advantages of IP SAN in Modern Data Center

Advances in IP-based networked storage technology such as IP SAN have created
an opportunity for organizations of all sizes to cost-effectively build, manage, and
maintain their data centers. Compared to internal server storage or DAS, IP SAN
efficiently handles the complexity of the modern data center by using existing IP
networks and components.

In a data center, IP SAN offers multiple advantages that are relevant to midsize
businesses, including the following:

 Increased utilization: Consolidated IP-based storage enables servers to access
and share storage, helping maximize utilization of these resources
 Reduced management costs: Consolidated storage enables centralized
management, helping simplify administrative tasks and reduce management costs
 Increased reliability: A shared set of dedicated IP-based storage systems can
help significantly increase the reliability and availability of application data
 Simplified backup and recovery: IP SAN enables administrators to easily
implement consistent, common, and simple backup and recovery processes

FCIP Lesson

Introduction

This lesson covers FCIP connectivity, FCIP tunnel configuration, and the FCIP
protocol stack.

This lesson covers the following topics:

 FCIP connectivity
 FCIP tunnel configuration
 FCIP protocol stack

FCIP

Video: FCIP

The video is located at
https://edutube.emc.com/Player.aspx?vno=TXNVxHBRJHNl2rK3Su0SZw

FCIP Overview

FC SAN provides a high-performance infrastructure for localized data movement.
FCIP:

 Is an IP-based protocol that is used to interconnect distributed FC SAN islands
over an IP network
 Encapsulates FC frames into IP packets and transports them over an existing IP
network
 Enables transmission by tunneling data between FC SAN islands
 Provides a disaster recovery solution by enabling replication of FC data
across an IP network
 Facilitates data sharing and data collaboration across worldwide locations

FCIP Connectivity

Figure: FCIP connectivity. Two FC SANs, each with compute systems (VMs on
hypervisors) and a storage system, connect through FCIP gateways over a LAN/WAN;
the gateways form an FCIP tunnel.

 An FCIP entity (for example, an FCIP gateway) is connected to each fabric to
enable tunneling through an IP network
 An FCIP tunnel consists of one or more independent connections between two
FCIP ports
– Transports encapsulated FC frames over TCP/IP

Notes

In an FCIP environment, an FCIP entity such as an FCIP gateway is connected to
each fabric through a standard FC connection. The FCIP gateway at one end of the
IP network encapsulates the FC frames into IP packets. The gateway at the other
end removes the IP wrapper and sends the FC data to the adjoined fabric. The
fabric treats these gateways as fabric switches. An IP address is assigned to the
gateway port that is connected to the IP network. After the IP connectivity is
established, the nodes in the two independent fabrics can communicate with each
other.

An FCIP tunnel consists of one or more independent connections between two
FCIP ports on gateways (tunnel endpoints). Each tunnel transports encapsulated
FC frames over a TCP/IP network. The nodes in either fabric are unaware of the
existence of the tunnel. Multiple tunnels may be configured between the fabrics
based on connectivity requirements. Some implementations enable aggregating
FCIP links (tunnels) to increase throughput and to provide link redundancy and
load balancing.

FCIP Tunnel Configuration - Merged Fabric

An FCIP tunnel may be configured to merge interconnected fabrics into a single
large fabric. In the merged fabric, FCIP transports existing fabric services across
the IP network.

The image illustrates a merged fabric deployment. In this deployment:

 The E_Port on an FCIP gateway connects to the E_Port of an FC switch in the
adjoined fabric
 The FCIP gateway is also configured with a VE_Port that behaves like an
E_Port, except that the VE_Port is used to transport data through an FCIP
tunnel
 The FCIP tunnel has VE_Ports on both ends
 The VE_Ports establish virtual ISLs through the FCIP tunnel, which enable
fabrics on either side of the tunnel to merge

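Figure: Merged fabric. In each FC SAN, an FC switch E_Port connects to an E_Port
on an FCIP gateway; the gateways' VE_Ports form an FCIP tunnel across the
LAN/WAN, and storage systems are attached to each FC SAN.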

FCIP Tunnel Configuration – Separate Fabric

When only a small subset of nodes in either fabric requires connectivity across an
FCIP tunnel, the FCIP tunnel may use vendor-specific features to route network
traffic between specific nodes without merging the fabrics.

The image illustrates an FC-FC routing solution in which the FCIP tunnel is
configured so that the fabrics do not merge. In this deployment:

 An EX_Port and a VE_Port are configured on each FCIP gateway
 The EX_Port on the FCIP gateway connects to an E_Port on an FC switch in
the adjoined fabric
 The EX_Port functions similarly to an E_Port, but does not propagate fabric
services from one fabric to another
 The EX_Port enables FC-FC routing through the FCIP tunnel, but the fabrics
remain separate

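Figure: Separate fabrics. In each FC SAN, an FC switch E_Port connects to an
EX_Port on an FCIP gateway; the gateways' VE_Ports form an FCIP tunnel across the
LAN/WAN, and storage systems are attached to each FC SAN.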

FCIP Protocol Stack

Protocol Stack

The FCIP protocol stack is shown in the image.

 Applications generate SCSI commands and data, which are processed by
various layers of the protocol stack
 The upper-layer protocol SCSI includes the SCSI driver program that executes
the read and write commands
 Below the SCSI layer is the FC protocol (FCP) layer, which is simply an FC
frame whose payload is SCSI
 The FC frames can be encapsulated into IP packets and sent to a remote FC
SAN over the IP network
 The FCIP layer encapsulates the FC frames into the IP payload and passes
them to the TCP layer
 TCP and IP are used for transporting the encapsulated information across
Ethernet, wireless, or other media that support TCP/IP traffic

Figure: FCIP protocol stack. Application (SCSI commands, data, and status), FCP
(SCSI over FC), FCIP, TCP, IP, and physical media, with FC-to-IP encapsulation
taking place at the FCIP layer.

Encapsulation of an FC frame into an IP packet could cause the IP packet to be
fragmented. Fragmentation occurs when the IP packet is larger than the maximum
transmission unit (MTU) that the data link can support. When an IP packet is
fragmented, the required parts of the header must be copied into all fragments.

Figure: FCIP encapsulation.
FC frame: SOF | FC Header | SCSI Data | CRC | EOF
IP packet: IP Header | TCP Header | FCIP Header | IP Payload (the encapsulated FC frame)

 When a TCP packet is segmented, normal TCP operations are responsible for
receiving and resequencing the data
 The receiving and resequencing are performed before the data is passed to
the FC processing portion of the device
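The fragmentation concern can be checked with simple arithmetic. The following Python sketch rests on several assumptions. It uses the FC frame fields shown above with a maximum data field of 2112 bytes, and the 28-byte FC encapsulation header defined in RFC 3643. It also assumes minimum 20-byte IPv4 and TCP headers without options. The function names are hypothetical. The result is that one full-size FC frame does not fit in a single packet on a link with the standard 1500-byte MTU. In practice, TCP segments the FCIP byte stream to fit the path, which is the segmentation and resequencing described in the bullets above.

```python
# Field sizes in bytes (assumptions listed in the lead-in).
FC_SOF, FC_HEADER, FC_CRC, FC_EOF = 4, 24, 4, 4
FC_MAX_DATA = 2112
FCIP_HEADER = 28   # FC encapsulation header (RFC 3643)
TCP_HEADER = 20    # minimum TCP header, no options
IPV4_HEADER = 20   # minimum IPv4 header, no options


def fc_frame_size(data_len: int) -> int:
    """Size of one FC frame: SOF + FC header + data + CRC + EOF."""
    if not 0 <= data_len <= FC_MAX_DATA:
        raise ValueError(f"FC data field is 0-{FC_MAX_DATA} bytes")
    return FC_SOF + FC_HEADER + data_len + FC_CRC + FC_EOF


def fcip_packet_size(data_len: int) -> int:
    """Size of one IP packet carrying a single encapsulated FC frame."""
    return IPV4_HEADER + TCP_HEADER + FCIP_HEADER + fc_frame_size(data_len)


def fits_in_mtu(data_len: int, mtu: int = 1500) -> bool:
    return fcip_packet_size(data_len) <= mtu


print(fc_frame_size(FC_MAX_DATA))            # 2148
print(fcip_packet_size(FC_MAX_DATA))         # 2216
print(fits_in_mtu(FC_MAX_DATA))              # False (standard 1500-byte MTU)
print(fits_in_mtu(FC_MAX_DATA, mtu=9000))    # True (jumbo frames)
```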

FCoE Lesson

Introduction

This lesson focuses on FCoE components and FCoE connectivity. It also covers the
FCoE switch and the CNA.

This lesson covers the following topics:

 FCoE components and connectivity
 FCoE switch and CNA

FCoE

Video: FCoE

The video is located at
https://edutube.emc.com/Player.aspx?vno=g+QJTBgku2x0CRTtmGqyow

FCoE Overview

 FCoE is a protocol that transports FC data along with regular Ethernet traffic
over a Converged Enhanced Ethernet (CEE) network
 The FCoE protocol, defined by the T11 standards committee, encapsulates FC
frames into Ethernet frames
 Ensures lossless transmission of FC traffic over Ethernet

Drivers for FCoE

 Multi-function network components are used to transfer both compute-to-compute
and FC storage traffic
– Reduce the complexity of managing multiple discrete networks
– Reduce the number of network adapters, cables, and switches required in a
data center
– Reduce power and space consumption in a data center

Components of FCoE

The key FCoE components are:

 Network adapters
– Example: Converged Network Adapter (CNA) and software FCoE adapter
 Cables
– Example: Copper cables and optical fiber cables
 FCoE switch

Figure: FCoE components (compute systems with VMs on hypervisors and CNAs, CEE
links, an FCoE switch connected to the LAN and to an FC SAN through FC ports, and
storage systems).

What is CNA?

Figure: CNA architecture (10GE/FCoE port, FCoE ASIC, 10GE ASIC, FC ASIC, and PCIe
bus).

 A physical adapter that provides functionality of both NIC and FC HBA
 Encapsulates FC frames into Ethernet frames and forwards them over CEE
links
 Contains separate modules for 10 GE, FC, and FCoE ASICs

FCoE Switch

An FCoE switch has both Ethernet switch and FC switch functionalities. It has a
Fibre Channel Forwarder (FCF), an Ethernet Bridge, and a set of ports that can be
used for FC and Ethernet connectivity:

Figure: FCoE switch architecture (FC ports, Fibre Channel Forwarder (FCF),
Ethernet bridge, and Ethernet ports).

 The FCF functions as the communication bridge between CEE and FC networks
 Encapsulates and decapsulates FC frames
 The FCoE switch inspects the Ethertype of incoming frames and forwards them
to the appropriate destination
 FCoE frames that contain an FC payload are forwarded to the FCF
 Non-FCoE frames are handled as typical Ethernet traffic and forwarded over
the Ethernet ports
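The Ethertype check that an FCoE switch performs can be sketched as follows. The Python code is illustrative only, and `forwarding_destination` is a hypothetical function. The constant 0x8906 is the EtherType assigned to FCoE, a value the text above does not state. The sketch skips a single 802.1Q VLAN tag if one is present. It ignores any control protocols that a real FCoE switch also handles.

```python
import struct

ETHERTYPE_FCOE = 0x8906  # EtherType assigned to FCoE
TPID_8021Q = 0x8100      # an 802.1Q VLAN tag may precede the EtherType
ETHERTYPE_OFFSET = 12    # after destination and source MAC addresses


def ethertype(frame: bytes) -> int:
    """Return the frame's EtherType, looking past one 802.1Q tag if present."""
    (value,) = struct.unpack_from("!H", frame, ETHERTYPE_OFFSET)
    if value == TPID_8021Q:
        (value,) = struct.unpack_from("!H", frame, ETHERTYPE_OFFSET + 4)
    return value


def forwarding_destination(frame: bytes) -> str:
    """Hypothetical switch decision: FCoE frames go to the FCF; other frames
    are bridged as ordinary Ethernet traffic."""
    if ethertype(frame) == ETHERTYPE_FCOE:
        return "FCF"
    return "Ethernet bridge"


macs = bytes(12)  # placeholder destination and source MAC addresses
fcoe_frame = macs + bytes.fromhex("8906") + b"encapsulated FC frame"
ipv4_frame = macs + bytes.fromhex("0800") + b"IP packet"
tagged_fcoe = macs + bytes.fromhex("81000064" "8906") + b"encapsulated FC frame"
for frame in (fcoe_frame, ipv4_frame, tagged_fcoe):
    print(forwarding_destination(frame))
```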

FCoE SAN Connectivity

The most common FCoE connectivity uses FCoE switches:

 To interconnect a CEE network containing compute systems with an FC SAN
containing storage systems
 To consolidate FC traffic and Ethernet traffic onto CEE links

This type of FCoE connectivity is suitable when an organization has an existing FC
SAN environment. Connecting FCoE compute systems to the FC storage systems
through FCoE switches does not require any change in the FC environment.

Figure: FCoE SAN connectivity (compute systems with VMs on hypervisors and CNAs,
connected over CEE links to the FCoE ports of FCoE switches, which connect to the
LAN and, through FC ports, to an FC SAN with a storage system).

Concepts in Practice Lesson

Introduction

This section highlights technologies that are relevant to the topics covered in this
module.

This lesson covers the following topics:


 Dell PowerConnect B-8000 Network Switch
 Dell EMC Networking S-Series 10GbE switches
 Dell Networking Z-Series core/aggregation switches
 Dell EMC S4148U

Concepts in Practice

Dell PowerConnect B-8000 Network Switch

 Provides a unified FCoE solution
 Supports 10-GbE and FC ports
 Supports comprehensive Layer 2 LAN capabilities with high performance and
availability
 Provides a versatile solution for server I/O consolidation

Dell EMC Networking S-Series 10GbE switches

 Provides high-performance open networking top-of-rack switches
 Provides support for iSCSI storage area networks
 Provides flexibility and is cost-effective
 Flexible, powerful 10-GbE ToR switches for data centers of all sizes

Dell Networking Z-Series core/aggregation switches

 Provides optimal flexibility, performance, density, and power efficiency
 Includes 10/25/40/50/100GbE options

Dell EMC S4148U

 Offers various port speed choices for Fibre Channel and Ethernet connectivity
 Provides flexibility and high performance for modern workloads
 Can be used in the following use cases:
 End-to-end FC switch connectivity
 NPIV Gateway Edge switch in large multi-vendor SAN environments
 Supports up to 32 Gbps FC and 100 GbE Ethernet connectivity

Dell PowerConnect B-8000 Network Switch

A top-of-rack, link-layer CEE/DCB and FCoE switch. It comprises 24 10-Gigabit
Ethernet ports for LAN connections and 8 Fibre Channel ports with speeds of up to
8 Gbps for Fibre Channel SAN connections. The network switch supports
comprehensive Layer 2 LAN capabilities and provides high performance and
availability. The PowerConnect B-8000 Network Switch also supports server I/O
consolidation.

Dell EMC Networking S-Series 10GbE switches

High-performance open networking top-of-rack switches with multirate Gigabit
Ethernet and unified ports. They offer flexibility and cost-effectiveness for
enterprises and Tier 2 cloud service providers with demanding compute and storage
traffic environments. The switches support iSCSI and FC storage deployment,
including DCB converged lossless transactions. The series comprises the 10GbE
S4000-ON Series switches and the 1/10GBASE-T S4048T-ON, S4128T-ON, and S4148T-ON
switches. Dell EMC Networking S-Series 10GbE switches offer active fabric
designs that use S- or Z-Series core switches to create a two-tier 1/20/40/100-GbE
data center network.

Dell Networking Z-Series core/aggregation switches

Open networking, SDN-ready, fixed form factor switches that are purpose-built
for applications in modern computing environments. They simplify manageability
and provide optimal flexibility, performance, density, and power efficiency for the
data center. They support both VLAN tagging and double VLAN tagging and are
available with 10/25/40/50/100GbE options.

Dell EMC S4148U

A feature-rich, multifunctional switch offering various port speed choices for Fibre
Channel and Ethernet connectivity. It is designed for flexibility and high
performance for today's demanding modern workloads. It can be used as an
end-to-end FC switch and as an NPIV Gateway Edge switch in a large
multi-vendor SAN environment. It supports up to 32 Gbps FC and 100 GbE
Ethernet connectivity.

Assessment

1. Which of the following functions is supported by the Internet Control Message
Protocol (ICMP)?

A. Handles error and control messages

B. Flow control

C. Monitors computers

D. Buffers packets

2. Which protocol is used by IP SAN for the transport of block-level data?

A. iSCSI

B. ARP

C. ICMP

D. Ethernet

Summary

File-Based and Object-Based Storage System

Introduction

This module focuses on the NAS components and architecture. This module also
focuses on object-based storage components and operations. Finally, this module
focuses on unified storage architecture.

Upon completing this module, you will be able to:

 Describe NAS components and architecture
 Describe object-based storage components and operations
 Describe unified storage architecture

NAS Components and Architecture Lesson

Introduction

This lesson focuses on the components and architectures of a file-based storage
system. This lesson also focuses on various file access methods supported by a
file-based storage system. Finally, this lesson focuses on NAS I/O operations.

This lesson covers the following topics:

 Describe NAS components and architectures
 Discuss NAS file access methods
 Discuss NAS I/O operations

NAS Components and Architecture

Video: NAS Components and Architecture

The video is located at https://edutube.emc.com/Player.aspx?vno=eRMzm7xSHuP3ffzyGz0mQQ

File Sharing Environment

 File sharing enables users to share files with other users
 A file-sharing environment ensures data integrity when multiple users access a
shared file simultaneously
 Examples of file sharing methods
– File Transfer Protocol (FTP)
– Peer-to-Peer (P2P)
– Network File System (NFS) and Common Internet File System (CIFS)
– Distributed File System (DFS)

Notes

In a file-sharing environment, a user who creates the file (the creator or owner of a
file) determines the type of access (such as read, write, execute, append, delete) to
be given to other users. When multiple users try to access a shared file
simultaneously, a locking scheme is required to maintain data integrity and
simultaneously make this sharing possible.
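The Python sketch below is a minimal illustration of the locking scheme described above. It uses POSIX advisory file locks (fcntl.flock) on a local file. While one user holds an exclusive lock, a second attempt to lock the same file is refused until the first lock is released.

The sketch makes two assumptions:
- It runs on Linux or another UNIX-like system.
- The file name is hypothetical.

NFS and CIFS provide their own locking mechanisms over the network. This example shows only the general idea, not how either protocol implements it.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path: str):
    """Open path and try to take a non-blocking exclusive lock on it.

    Returns the open file object if the lock is granted, or None if a
    conflicting lock is already held through another open of the same file.
    """
    f = open(path, "a+")
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        return f
    except BlockingIOError:  # raised when the lock would block (EWOULDBLOCK)
        f.close()
        return None


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as d:
        shared = os.path.join(d, "report.txt")  # hypothetical shared file
        user_a = try_exclusive_lock(shared)     # user A opens the file for update
        user_b = try_exclusive_lock(shared)     # user B tries at the same time
        result = [user_a is not None, user_b is not None]
        if user_b is not None:
            user_b.close()
        fcntl.flock(user_a.fileno(), fcntl.LOCK_UN)  # user A finishes and unlocks
        user_a.close()
        retry = try_exclusive_lock(shared)           # user B tries again
        result.append(retry is not None)
        if retry is not None:
            retry.close()
        return tuple(result)


print(demo())  # (True, False, True) on Linux
```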

Some examples of file-sharing methods are the peer-to-peer (P2P) model, File
Transfer Protocol (FTP), client/server models that use file-sharing protocols such
as NFS and CIFS, and Distributed File System (DFS). FTP is a client/server
protocol that enables data transfer over a network. An FTP server and an FTP
client communicate with each other using TCP as the transport protocol.

A peer-to-peer (P2P) file-sharing model uses a peer-to-peer network. P2P enables
client machines to directly share files with each other over a network. Clients use
file-sharing software that searches for other peer clients. This model differs from
the client/server model, which uses file servers to store files for sharing.

The standard client/server file-sharing protocols are NFS and CIFS. These
protocols enable the owner of a file to set the required type of access, such as
read-only or read/write, for a particular user or group of users. Using these
protocols, clients mount remote file systems that are available on dedicated file
servers.

A distributed file system (DFS) is a file system that is distributed across several
compute systems. A DFS can provide compute systems with direct access to the
entire file system, while ensuring efficient management and data security. The
Hadoop Distributed File System (HDFS) is an example of a distributed file system
and is discussed later in this module. Vendors now support HDFS on their NAS
systems to support the scale-out architecture, which helps meet big data analytics
requirements.

What is NAS?

Definition: NAS
An IP-based, dedicated, high-performance file sharing and storage
device.

 Enables NAS clients to share files over an IP network
 Uses a specialized operating system that is optimized for file I/O
 Enables both UNIX and Windows users to share data

[Figure: Clients and application servers (VMs running on a hypervisor) access a NAS system over a LAN]

Notes

NAS provides the advantages of server consolidation by eliminating the need for
multiple file servers. It also consolidates the storage used by the clients onto a
single system, making it easier to manage the storage. NAS uses network and
file-sharing protocols to provide access to the file data. These protocols include
TCP/IP for data transfer, and Common Internet File System (CIFS) and Network File
System (NFS) for network file service. NAS systems may also use HDFS and its
associated protocols (discussed later in the module) over TCP/IP to access files.
NAS enables both UNIX and Microsoft Windows users to share the same data
seamlessly.

A NAS device uses its own operating system and integrated hardware and
software components to meet specific file-service needs. Its operating system is
optimized for file I/O and, therefore, performs file I/O better than a general-purpose
server. As a result, a NAS device can serve more clients than general-purpose
servers and provide the benefit of server consolidation.

General Purpose Servers Vs. NAS Devices

 A general-purpose server can be used to host any application because it runs a
general-purpose operating system
 Unlike a general-purpose server, a NAS device is dedicated to file serving
 It has a specialized operating system dedicated to file serving using
industry-standard protocols. NAS vendors also support features such as clustering
for high availability, scalability, and performance
 The clustering feature enables multiple NAS controllers/heads/nodes to function
as a single entity. The workload can be distributed across all the available
nodes. Therefore, NAS devices support massive workloads

[Figure: General Purpose Server stack (Applications, Print Drivers, File System, Operating System, Network Interface) compared with NAS System stack (File System, Operating System, Network Interface)]

Components of NAS System

 Controller/NAS head consists of:
– CPU, memory, network adapter, and so on
– Specialized operating system installed
 Storage
– Supports different types of storage devices
 Scalability of the components depends on NAS architecture
– Scale-up NAS
– Scale-out NAS

Notes

A NAS system consists of two components: controller and storage. A controller is a
compute system that contains components such as network, memory, and CPU
resources. A specialized operating system optimized for file serving is installed on
the controller. Each controller may connect to all storage in the system. The
controllers can be active/active, with all controllers accessing the storage, or
active/passive, with some controllers performing all the I/O processing while others
act as spares. A spare is used for I/O processing if an active controller fails. The
controller is responsible for configuring RAID sets, creating LUNs, installing the
file system, and exporting file shares on the network.
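The active/passive arrangement can be modeled in a few lines. In the hypothetical Python sketch below, the class and controller names are invented for illustration and do not correspond to any vendor API. When an active controller fails, a healthy spare is promoted so that I/O processing continues.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    name: str
    role: str            # "active" or "spare"
    healthy: bool = True


@dataclass
class ControllerGroup:
    """Toy model of the controllers in one NAS system (hypothetical)."""
    controllers: list = field(default_factory=list)

    def serving(self) -> list:
        """Names of healthy controllers currently processing I/O."""
        return [c.name for c in self.controllers if c.role == "active" and c.healthy]

    def fail(self, name: str) -> None:
        """Mark a controller as failed; if it was active, promote a healthy spare."""
        failed = next((c for c in self.controllers if c.name == name), None)
        if failed is None:
            raise KeyError(name)
        failed.healthy = False
        if failed.role != "active":
            return
        for c in self.controllers:
            if c.role == "spare" and c.healthy:
                c.role = "active"   # the spare now takes over I/O processing
                break


group = ControllerGroup([Controller("ctrl-A", "active"), Controller("ctrl-B", "spare")])
print(group.serving())  # ['ctrl-A']
group.fail("ctrl-A")
print(group.serving())  # ['ctrl-B']
```

Giving every controller the role "active" models an active/active configuration instead. In that case all controllers serve I/O, and a failure leaves the remaining controllers to carry the load.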

Storage is used to persistently store data. The NAS system may have different
types of storage devices to support different requirements. The NAS system may
support SSD, SAS, and SATA in a single system.

The extent to which the components, such as CPU, memory, network adapters,
and storage, can be scaled depends upon the type of NAS architecture used.
There are two types of NAS architecture: scale-up and scale-out. Both
architectures are detailed in the following pages.

Scale-Up NAS

[Figure: Scale-up NAS, in which one or more NAS heads connect to storage]

A scale-up NAS architecture provides the capability to scale the capacity and
performance of a single NAS system based on requirements. Scaling up a NAS
system involves upgrading or adding NAS heads and storage.

These NAS systems have a fixed capacity ceiling, which limits their scalability. The
performance of these systems starts degrading when reaching the capacity limit.

Scale-Up NAS Implementations

There are two types of scale-up NAS implementations:

Unified NAS

A unified NAS system contains one or more NAS heads and storage in a single
system. NAS heads are connected to the storage. The storage may consist of
different drive types, such as SAS, ATA, FC, and solid-state drives, to meet
different workload requirements.

Each NAS head in a unified NAS has front-end Ethernet ports, which connect to
the IP network. The front-end ports provide connectivity to the clients. Each NAS
head has back-end ports to provide connectivity to the attached storage. Unified
NAS systems have NAS management software that can be used to perform all the
administrative tasks for the NAS head and storage.

[Figure: Unified NAS. FC hosts (VMs on a hypervisor) get block data access through an FC SAN to the FC ports; iSCSI hosts get block data access through an iSCSI SAN to the iSCSI ports; NAS clients get file data access over Ethernet to the Ethernet ports]

Gateway NAS

A gateway NAS system consists of one or more NAS heads and uses external and
independently managed storage. In a gateway NAS implementation, the NAS
gateway shares the storage from a block-based storage system. The management
functions in this type of solution are more complex than those in a unified NAS
environment, because there are separate administrative tasks for the NAS head
and the storage.

The administrative tasks of the NAS gateway are performed by the NAS
management software. The storage system is managed with the management
software of the block-based storage system. A gateway solution can use the FC
infrastructure, such as switches and directors, for accessing SAN-attached storage
arrays or direct-attached storage arrays.

[Figure: Gateway NAS. NAS clients and application servers (VMs on hypervisors) connect over an IP network to the NAS gateway, which accesses a storage system through an FC SAN]

Scale-Out NAS

[Figure: Scale-out NAS cluster of three nodes (Node1, Node2, Node3), each with a controller and storage, connected to an external switch and to two internal InfiniBand switches (Switch1, Switch2)]

 Pools multiple nodes in a cluster to work as a single NAS device
 Scales performance and/or capacity non-disruptively
 Creates a single file system that runs on all nodes in the cluster
 File system grows dynamically as nodes are added
 Stripes data across nodes with mirror or parity protection

Notes

The scale-out NAS implementation pools multiple NAS nodes together in a cluster.
A node may consist of the NAS head, the storage, or both. The cluster
performs the NAS operation as a single entity. A scale-out NAS provides the
capability to scale its resources by simply adding nodes to a clustered NAS
architecture. The cluster works as a single NAS device and is managed centrally.
Nodes can be added to the cluster when more performance or more capacity is
needed, without causing any downtime. Scale-out NAS provides the flexibility to
use many nodes of moderate performance and availability to produce a total
system that has better aggregate performance and availability. It also provides
ease of use, low cost, and theoretically unlimited scalability.

Scale-out NAS uses a distributed clustered file system that runs on all nodes in the
cluster. All information is shared among nodes, so the entire file system is
accessible by clients connecting to any node in the cluster. Scale-out NAS stripes
data across all nodes in a cluster along with mirror or parity protection. As data is
sent from clients to the cluster, the data is divided and allocated to different nodes
in parallel. When a client sends a request to read a file, the scale-out NAS retrieves
the appropriate blocks from multiple nodes. It recombines the blocks into a file and
presents the file to the client. As nodes are added, the file system grows
dynamically, and data is evenly distributed to every node. Each node added to the
cluster increases the aggregate storage, memory, CPU, and network capacity.
Hence, cluster performance is also increased.
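The striping behavior described above can be illustrated with a toy model. This Python sketch assumes a simple RAID-5-like layout that is not taken from any particular scale-out NAS product:
- Each stripe holds one data unit on every node except one, plus an XOR parity unit.
- The position of the parity unit rotates across the nodes.
- The 4-byte stripe unit is deliberately tiny so that the layout is easy to inspect.

A read recombines the units in order. If one node is unavailable, the read rebuilds that node's units from the surviving data and parity.

```python
from functools import reduce

STRIPE_UNIT = 4  # bytes per stripe unit; tiny so the layout is easy to inspect (assumption)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def write_striped(data: bytes, num_nodes: int) -> list:
    """Stripe data across num_nodes nodes with one rotating XOR parity unit per stripe.

    Returns one dict per node mapping stripe index -> the unit stored on that node.
    """
    if num_nodes < 2:
        raise ValueError("need at least two nodes for parity protection")
    width = num_nodes - 1                      # data units per stripe
    stripe_bytes = width * STRIPE_UNIT
    nodes = [{} for _ in range(num_nodes)]
    for s, start in enumerate(range(0, len(data), stripe_bytes)):
        chunk = data[start:start + stripe_bytes].ljust(stripe_bytes, b"\0")  # pad last stripe
        units = [chunk[i * STRIPE_UNIT:(i + 1) * STRIPE_UNIT] for i in range(width)]
        parity_node = s % num_nodes            # rotate parity so no node holds all of it
        data_nodes = [n for n in range(num_nodes) if n != parity_node]
        for n, unit in zip(data_nodes, units):
            nodes[n][s] = unit
        nodes[parity_node][s] = reduce(xor_bytes, units)
    return nodes


def read_striped(nodes: list, length: int, failed=None) -> bytes:
    """Recombine the original bytes; rebuild the units of one failed node from parity."""
    num_nodes = len(nodes)
    stripe_bytes = (num_nodes - 1) * STRIPE_UNIT
    num_stripes = -(-length // stripe_bytes)   # ceiling division
    out = bytearray()
    for s in range(num_stripes):
        parity_node = s % num_nodes
        for n in range(num_nodes):
            if n == parity_node:
                continue
            if n == failed:
                # Missing data unit = XOR of every surviving unit (data and parity) in the stripe.
                survivors = [nodes[m][s] for m in range(num_nodes) if m != failed]
                out += reduce(xor_bytes, survivors)
            else:
                out += nodes[n][s]
    return bytes(out[:length])


payload = b"scale-out NAS stripes files across nodes"
cluster = write_striped(payload, num_nodes=4)
print(read_striped(cluster, len(payload)) == payload)            # True
cluster[2] = {}                                                  # simulate losing node 2
print(read_striped(cluster, len(payload), failed=2) == payload)  # True
```

With only two nodes, the parity unit is a copy of the single data unit. The layout then degenerates to mirroring, which is the other protection option the text mentions.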

Scale-out NAS clusters use separate internal and external networks for back-end
and front-end connectivity respectively. An internal network provides connections
for intra-cluster communication, and an external network connection enables clients
to access and share file data. Each node in the cluster connects to the internal
network. The internal network offers high throughput and low latency and uses
high-speed networking technology, such as InfiniBand or Gigabit Ethernet. To
enable clients to access a node, the node must be connected to the external
Ethernet network. Redundant internal or external networks may be used for high
availability.

Tip: InfiniBand is a networking technology that provides a low-latency,
high-bandwidth communication link between hosts and peripherals. It
provides a serial connection and is often used for inter-server
communications in high-performance computing environments.
InfiniBand supports remote direct memory access (RDMA), which
enables a device (host or peripheral) to access data directly from the
memory of a remote device. InfiniBand also enables a single physical
link to carry multiple channels of data simultaneously by using a
multiplexing technique.

NAS File Access Methods

Different methods can be used to access files on a NAS system. The most
common methods are:
 Common Internet File System / Server Message Block (CIFS/SMB)
 Network File System (NFS)
 Hadoop Distributed File System (HDFS)

CIFS/SMB

 Client-server application protocol
 An open variation of the Server Message Block (SMB) protocol, which is
used for Windows file sharing
 Enables clients to access files that are on a server over TCP/IP
 Stateful protocol
– Maintains connection information regarding every connected client
– Can automatically restore connections and reopen files that were open prior
to interruption

NFS

 Client-server application protocol
 Enables clients to access files that are on a server
 Uses Remote Procedure Call (RPC) mechanism to provide access to remote
file system

HDFS

 A file system that spans multiple nodes in a cluster and enables user data to be
stored in files
 Presents a traditional hierarchical file organization so that users or applications
can manipulate (create, rename, move, or remove) files and directories
 Presents a streaming interface to run any application of choice using the
MapReduce framework

[Figure: Hadoop cluster. Clients connect over an Ethernet LAN to a NameNode and multiple DataNodes]

Notes

Common Internet File System (CIFS) is a client/server application protocol that
enables client programs to make requests for files and services on remote
computers over TCP/IP. It is a public or open variation of the Server Message Block
(SMB) protocol.

The CIFS protocol enables remote clients to gain access to files on a server. CIFS
enables file sharing with other clients by using special locks. Filenames in CIFS are
encoded using Unicode characters. CIFS provides the following features to ensure
data integrity:
 It uses file and record locking to prevent users from overwriting the work of
another user on a file or a record.
 It supports fault tolerance and can automatically restore connections and
reopen files that were open prior to an interruption.

The fault tolerance features of CIFS depend on whether an application is written to
take advantage of these features. Moreover, CIFS is a stateful protocol because
the CIFS server maintains connection information regarding every connected client.
If a network failure or CIFS server failure occurs, the client receives a
disconnection notification. User disruption is minimized if the application has the
embedded intelligence to restore the connection. However, if the embedded
intelligence is missing, the user must take steps to reestablish the CIFS
connection.

Users refer to remote file systems with an easy-to-use file-naming scheme:
\\server\share or \\servername.domain.suffix\share.
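The UNC naming scheme above has a simple structure: two leading backslashes, a server name, a share name, and an optional path within the share. The Python sketch below splits such a name into those parts. The server and share names in the example are hypothetical.

The parser handles only this basic form. For example, it does not handle the \\?\ extended-length prefix. Python's standard ntpath.splitdrive function can also separate the \\server\share part from the rest of a path.

```python
from typing import NamedTuple


class UncPath(NamedTuple):
    server: str
    share: str
    path: str  # remainder inside the share, "" if none


def parse_unc(unc: str) -> UncPath:
    r"""Split a CIFS/SMB name such as \\server\share\dir\file.txt into its parts."""
    if not unc.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {unc!r}")
    parts = unc[2:].split("\\", 2)  # server, share, and (optionally) the rest
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"UNC path needs both a server and a share: {unc!r}")
    rest = parts[2] if len(parts) == 3 else ""
    return UncPath(parts[0], parts[1], rest)


print(parse_unc(r"\\nas01.example.com\projects\2019\plan.docx"))
# UncPath(server='nas01.example.com', share='projects', path='2019\\plan.docx')
```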

Network File System (NFS) is a client/server protocol for file sharing that is
commonly used on UNIX systems. NFS was originally based on the connectionless
User Datagram Protocol (UDP). It uses a machine-independent model to represent
user data. It also uses Remote Procedure Call (RPC) for interprocess
communication between two computers.

The NFS protocol provides a set of RPCs to access a remote file system for the
following operations:
 Searching files and directories
 Opening, reading, writing to, and closing a file
 Changing file attributes
 Modifying file links and directories

NFS creates a connection between the client and the remote system to transfer
data.

HDFS is supported by many scale-out NAS vendors. HDFS requires
programmatic access because the file system cannot be mounted. All HDFS
communication is layered on top of the TCP/IP protocol. HDFS has a master/slave
architecture. An HDFS cluster consists of a single NameNode that acts as a master
server. The NameNode keeps in-memory maps of every file and its location, as well
as all the blocks within each file and the DataNodes on which they reside. The
NameNode is responsible for managing the file system namespace and controlling
access to the files by clients. DataNodes act as slaves that serve read/write requests and
perform block creation, deletion, and replication as directed by the NameNode.
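The following hypothetical Python model makes the NameNode's role concrete. It keeps the kind of in-memory map described above: each file path maps to its blocks, and each block maps to the DataNodes that hold its replicas.

The 128 MB block size and the replication factor of 3 are the usual HDFS defaults, configurable through dfs.blocksize and dfs.replication. The class, the round-robin placement, and the node names are illustrative assumptions, not the HDFS API.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 128 * 1024 * 1024  # default dfs.blocksize in recent Hadoop releases
REPLICATION = 3                 # default dfs.replication


@dataclass
class ToyNameNode:
    """In-memory namespace: file path -> [(block id, DataNodes holding a replica)].

    Replicas are placed round-robin for simplicity; real HDFS uses a rack-aware policy.
    """
    datanodes: list
    namespace: dict = field(default_factory=dict)
    _next_block_id: int = field(default=0, init=False)
    _cursor: int = field(default=0, init=False)

    def create(self, path: str, size: int) -> None:
        """Record a new file and decide which DataNodes store each of its blocks."""
        if path in self.namespace:
            raise FileExistsError(path)
        n_blocks = -(-size // BLOCK_SIZE)          # ceiling division
        copies = min(REPLICATION, len(self.datanodes))
        blocks = []
        for _ in range(n_blocks):
            replicas = [self.datanodes[(self._cursor + i) % len(self.datanodes)]
                        for i in range(copies)]
            blocks.append((self._next_block_id, replicas))
            self._next_block_id += 1
            self._cursor += 1
        self.namespace[path] = blocks

    def locate(self, path: str) -> list:
        """Block locations a client asks for before reading directly from DataNodes."""
        return self.namespace[path]


nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
nn.create("/logs/2019-06-01.log", 300 * 1024 * 1024)  # 300 MiB -> 3 blocks
for block_id, replicas in nn.locate("/logs/2019-06-01.log"):
    print(block_id, replicas)
# 0 ['dn1', 'dn2', 'dn3']
# 1 ['dn2', 'dn3', 'dn4']
# 2 ['dn3', 'dn4', 'dn1']
```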


Scale-Up NAS I/O Operation

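[Figure: Scale-up NAS I/O operation. The client stack (Application, Operating System, NFS/CIFS, TCP/IP Stack, Network Interface) sends file I/O over the network to the NAS head stack (Network Interface, TCP/IP Stack, NFS/CIFS, NAS Operating System, Storage Interface), which issues block I/O to storage. Numbered callouts correspond to the steps below.]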

The figure illustrates an I/O operation in a scale-up NAS system. The process of
handling I/Os in a scale-up NAS environment is as follows:
1. The requestor (client) packages an I/O request into TCP/IP and forwards it
through the network stack. The NAS system receives this request from the
network.
2. The NAS system converts the I/O request into an appropriate physical storage
request, which is a block-level I/O. This system then performs the operation on
the physical storage.
3. When the NAS system receives data from the storage, it processes and
repackages the data into an appropriate file protocol response.
4. The NAS system packages this response into TCP/IP again and forwards it to
the client through the network.
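Step 2 is where file-level I/O becomes block-level I/O. The minimal sketch below shows how a NAS operating system might translate a read of (byte offset, length) into runs of physical blocks.

The sketch assumes:
- A file system block size of 4 KiB.
- A hypothetical extent map for one file.

Real NAS file systems use their own on-disk layouts and caching.

```python
BLOCK_SIZE = 4096  # file system block size in bytes (assumption)

# Hypothetical extent map for one file: (first physical block, number of blocks).
# Logical blocks 0-2 live at physical 1000-1002, logical blocks 3-4 at physical 5000-5001.
EXTENTS = [(1000, 3), (5000, 2)]


def file_read_to_block_io(offset: int, length: int, extents=EXTENTS) -> list:
    """Translate a file-level read (byte offset, length) into block-level requests.

    Returns a list of (physical_block, block_count) runs covering the bytes.
    """
    if length <= 0:
        return []
    first = offset // BLOCK_SIZE                   # first logical block touched
    last = (offset + length - 1) // BLOCK_SIZE     # last logical block touched
    runs = []
    logical_start = 0
    for phys_start, count in extents:
        lo = max(first, logical_start)
        hi = min(last, logical_start + count - 1)
        if lo <= hi:                               # this extent overlaps the request
            runs.append((phys_start + (lo - logical_start), hi - lo + 1))
        logical_start += count
    if last >= logical_start:                      # logical_start is now the file size in blocks
        raise ValueError("read extends beyond the end of the file")
    return runs


print(file_read_to_block_io(6000, 10000))  # [(1001, 2), (5000, 1)]
```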

Scale-Out NAS I/O Operation

[Figure: Clients connect over an Ethernet LAN to a cluster of NAS nodes]

The figure illustrates the I/O operation in a scale-out NAS system. A scale-out NAS
consists of multiple NAS nodes, and each of these nodes has functionality
similar to a NameNode or a DataNode. In some proprietary scale-out NAS
implementations, each node may function as both a NameNode and DataNode,
typically to provide Hadoop integration. All the NAS nodes in scale-out NAS are
clustered.

Write Operation
1. Client sends a file to the NAS node
2. Node to which the client is connected receives the file
3. File is striped across the nodes

Read Operation
1. Client requests a file
2. Node to which the client is connected receives the request
3. The node retrieves and rebuilds the file and gives it to the client

Notes

New nodes can be added as required. As new nodes are added, the file system
grows dynamically and is evenly distributed to each node. As the client sends a file
to store to the NAS system, the file is evenly striped across the nodes. When a
client writes data, even though that client is connected to only one node, the write
operation occurs in multiple nodes in the cluster. The same is true for read
operations. A client is connected to only one node at a time. However, when that
client requests a file from the cluster, the node to which the client is connected
does not have the entire file locally on its drives. The node to which the client is
connected retrieves and rebuilds the file using the back-end InfiniBand network.
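The even redistribution that follows a node addition can be sketched as follows. This hypothetical Python model tracks only which stripe units each node stores. It moves units from the fullest node to the emptiest one until all nodes are within one unit of each other. Actual scale-out NAS systems use their own restriping algorithms.

```python
def rebalance(placement: dict, new_node: str) -> int:
    """Add new_node, then move stripe units until node loads differ by at most one.

    placement maps node name -> list of stripe-unit ids stored on that node.
    Returns the number of units moved over the back-end network.
    """
    placement[new_node] = []
    moved = 0
    while True:
        fullest = max(placement, key=lambda n: len(placement[n]))
        emptiest = min(placement, key=lambda n: len(placement[n]))
        if len(placement[fullest]) - len(placement[emptiest]) <= 1:
            return moved
        placement[emptiest].append(placement[fullest].pop())
        moved += 1


cluster = {"node1": [0, 1, 2, 3], "node2": [4, 5, 6, 7], "node3": [8, 9, 10, 11]}
print(rebalance(cluster, "node4"))                      # 3
print({n: len(units) for n, units in cluster.items()})
# {'node1': 3, 'node2': 3, 'node3': 3, 'node4': 3}
```

Each move goes from the fullest node to the emptiest one. In this example, that keeps back-end traffic to the minimum needed: only the three units that end up on the new node are moved.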

File-Level Virtualization and Tiering Lesson

Introduction

This lesson covers file-level virtualization, storage tiering, and NAS use cases.

This lesson covers the following topics:

 Explain file-level virtualization
 Discuss storage tiering
 Discuss NAS use cases

File-Level Virtualization and Tiering

Video: File-level Virtualization and Tiering

The video is located at https://edutube.emc.com/Player.aspx?vno=DBo/fa0JMAj05Z4rWTktxQ

What is File-Level Virtualization?

A network-based file sharing environment is composed of multiple file servers or
NAS devices. It might be required to move the files from one device to another due
to reasons such as cost or performance. File-level virtualization, which is
implemented in NAS or the file server environment, provides a simple,
non-disruptive file-mobility solution.

 Eliminates dependency between data accessed at the file-level and the location
where the files are physically stored
 Enables users to use a logical path, rather than a physical path, to access files
 Uses global namespace that maps logical path of file resources to their physical
path
 Provides non-disruptive file mobility across file servers or NAS devices

Before and After File-Level Virtualization

Before virtualization, each client knows exactly where its file resources are located.
This environment leads to underutilized storage resources and capacity problems
because files are bound to a specific NAS device or file server. It may be required
to move the files from one server to another because of performance reasons or
when the file server fills up. Moving files across the environment is not easy and
may make files inaccessible during file movement. Moreover, hosts and
applications need to be reconfigured to access the file at the new location. This
operation makes it difficult for storage administrators to improve storage efficiency
while maintaining the required service level.

File-level virtualization simplifies file mobility. It provides user or application
independence from the location where the files are stored. File-level virtualization
facilitates the movement of files across online file servers or NAS devices. This
means that while the files are being moved, clients can access their files
non-disruptively. Clients can also read their files from the old location and write
them back to the new location without realizing that the physical location has changed.
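A global namespace is essentially a lookup table from logical paths to physical locations. The hypothetical Python sketch below uses invented paths and device names. It shows why a migration is transparent: clients always resolve the same logical path, and only the table entry changes after the data has been copied.

```python
from collections.abc import Callable


class GlobalNamespace:
    """Toy global namespace mapping logical paths to physical locations (hypothetical)."""

    def __init__(self) -> None:
        self._map: dict = {}

    def publish(self, logical: str, physical: str) -> None:
        self._map[logical] = physical

    def resolve(self, logical: str) -> str:
        """Translate the path a client uses into where the file actually lives."""
        return self._map[logical]

    def migrate(self, logical: str, new_physical: str,
                copy: Callable[[str, str], None]) -> None:
        """Move a file to another device; the logical path clients use is unchanged."""
        old = self._map[logical]
        copy(old, new_physical)           # data moves behind the namespace
        self._map[logical] = new_physical


# Stand-in for two NAS devices (hypothetical "device:/path" names).
storage = {"nas1:/vol0/q3.xlsx": b"budget"}


def copy_between_devices(src: str, dst: str) -> None:
    storage[dst] = storage.pop(src)


ns = GlobalNamespace()
ns.publish("/corp/finance/q3.xlsx", "nas1:/vol0/q3.xlsx")
ns.migrate("/corp/finance/q3.xlsx", "nas2:/archive/q3.xlsx", copy_between_devices)
print(ns.resolve("/corp/finance/q3.xlsx"))           # nas2:/archive/q3.xlsx
print(storage[ns.resolve("/corp/finance/q3.xlsx")])  # b'budget'
```

The sketch switches the mapping only after the copy completes. A real virtualization appliance must also handle client reads and writes that arrive while a file is being moved.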

[Figure: Before file-level virtualization, clients access each NAS head and its storage system directly. After file-level virtualization, clients access the NAS heads through a virtualization appliance.]

File-Level Storage Tiering

 Moves files from higher tier to lower tier
 Storage tiers are defined based on cost, performance, and availability
parameters
 Uses policy engine to determine the files that are required to move to the lower
tier
 Predominant use of file tiering is archival

Notes

As unstructured data in the NAS environment grows, organizations deploy a
tiered storage environment. This environment optimizes the primary storage for
performance and the secondary storage for capacity and cost.

Storage tiering works on the principle of Hierarchical Storage Management (HSM).
HSM is a file mobility concept in which a policy engine, which can be software or
hardware, moves files that meet the configured, predefined policies from the
primary storage tier to the secondary storage tier. In HSM, a hierarchy of storage
tiers is defined based on parameters such as cost, performance, and/or
availability of storage. Common reasons to tier data across storage systems, or
between a storage system and the cloud, are archival and meeting compliance
requirements.

As an example, the policy engine might be configured to relocate all the files in the
primary storage tier that have not been accessed in one month and archive those
files to the secondary storage. For each archived file, the policy engine creates a
small space-saving stub file in the primary storage that points to the data on the
secondary storage. When a user tries to access the file from its original location on
the primary storage, the user is transparently provided with the actual file from the
secondary storage.
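The archiving policy in the example above can be sketched as follows. This hypothetical Python model makes three simplifying assumptions:
- One month is treated as 30 days.
- Each storage tier is represented as a dictionary.
- A small Stub object stands in for a real stub file.

It shows the scan, archive, and transparent-recall steps, not any particular product's policy engine.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Stub:
    """Space-saving placeholder left on the primary tier."""
    target: str  # location of the real data on the secondary tier


def run_policy(primary: dict, secondary: dict, last_access: dict,
               now: datetime, max_idle: timedelta = timedelta(days=30)) -> list:
    """Archive files idle longer than max_idle and leave a Stub in their place."""
    moved = []
    for path, content in list(primary.items()):
        if isinstance(content, Stub):
            continue                              # already archived
        if now - last_access[path] > max_idle:
            target = "tier2:" + path              # hypothetical secondary-tier naming
            secondary[target] = content
            primary[path] = Stub(target)
            moved.append(path)
    return moved


def read_file(path: str, primary: dict, secondary: dict) -> bytes:
    """Client read at the original path; stubs are followed transparently."""
    content = primary[path]
    return secondary[content.target] if isinstance(content, Stub) else content


now = datetime(2019, 7, 1)
primary = {"/home/a/old.pdf": b"old", "/home/a/new.pdf": b"new"}
secondary = {}
last_access = {"/home/a/old.pdf": now - timedelta(days=45),
               "/home/a/new.pdf": now - timedelta(days=2)}
print(run_policy(primary, secondary, last_access, now))  # ['/home/a/old.pdf']
print(read_file("/home/a/old.pdf", primary, secondary))  # b'old'
```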

Inter-array Tiering and Cloud Tiering

The figure illustrates the file-level storage tiering. In a file-level storage tiering
environment, a file can be moved to a secondary storage tier or to the cloud.
Before moving a file from primary NAS to secondary NAS or from primary NAS to
cloud, the policy engine scans the primary NAS to identify files that meet the
predefined policies. After identifying the data files, the stub files are created, and
the data files are moved to the destination storage tier.

[Figure: Inter-array and cloud tiering. Application servers (VMs on a hypervisor) access the primary NAS (Tier 1) over a LAN/WAN. 1. The policy engine scans the primary NAS system. 2. It creates a stub file on the primary NAS system. 3. The file is stored in Tier 2/Tier 3 storage (secondary NAS or cloud storage tier).]

Use-Case for Scale-Out NAS: Data Lake

The data lake represents a paradigm shift from the linear data flow model. As data
and the insights gathered from it increase in value, the enterprise-wide
consolidated storage is transformed into a hub around which the ingestion and
consumption systems work (see figure). This enables enterprises to bring analytics
to the data and avoid the cost of multiple systems and storage, as well as the time
spent on ingestion and analysis.

[Figure: Scale-out NAS data lake. Data from sources (velocity, variety, volume) is ingested and stored in the data lake, then analyzed, surfaced, and acted on]

The key characteristics of a scale-out data lake are that it:

 Accepts data from various sources like file shares, archives, web applications,
devices, and the cloud, in both streaming and batch processes
 Enables access to this data for a variety of uses, from conventional purposes to
mobile, analytics, and cloud applications
 Scales to meet the demands of future consolidation and growth as technology
evolves and new possibilities emerge for applying data to gain competitive
advantage in the marketplace
 Provides a tiering ability that enables organizations to manage their costs
without setting up specialized infrastructures for cost optimization

Notes

By eliminating a number of parallel linear data flows, enterprises can
consolidate vast amounts of their data into a single store, a data lake, through a
native and simple ingestion process. Analytics can be performed on this data to
provide insight, and actions can be taken based on this insight in an iterative
manner as the organization and technology mature. Enterprises can thus eliminate
the cost of having silos or islands of information spread across their enterprises.

Scale-out NAS has the ability to provide the storage platform to this data lake. The
scale-out NAS enhances this paradigm by providing scaling capabilities in terms of
capacity, performance, security, and protection.

Object-Based and Unified Storage Lesson

Introduction

This lesson focuses on the key object-based storage components. This lesson also
focuses on the key features of object-based storage. Finally, this lesson focuses on
unified storage architecture.

This lesson covers the following topics:


 Describe the key components of object-based storage
 Explain the key features of object-based storage
 Describe unified storage architecture

Object-Based and Unified Storage Overview

Drivers for Object-Based Storage

 Amount of data created annually is growing exponentially and more than 90% of
data generated is unstructured
 Rapid adoption of third platform technologies leads to significant growth of
data
 Longer data retention due to regulatory compliance also leads to data
growth
 Data must be instantly accessible through a variety of devices from anywhere in
the world
 Traditional storage solutions are inefficient in managing this data and in
handling the growth

Notes

The amount of data created each year is growing exponentially, and recent
studies have shown that more than 90 percent of data generated is unstructured
(e-mail, instant messages, graphics, images, and videos). Today, organizations not
only have to store and protect petabytes of data, but they also have to retain the
data over longer periods of time, for regulation and compliance reasons. They have
also recognized that data can help gain competitive advantages and even support
new revenue streams. In addition to increasing amounts of data, there has also
been a significant shift in how people want and expect to access their data. The
rising adoption rate of smartphones, tablets, and other mobile devices by
consumers, combined with increasing acceptance of these devices in enterprise
workplaces, has resulted in an expectation for on-demand access to data from
anywhere on any device.

Traditional storage solutions like NAS, which is a dominant solution for storing
unstructured data, cannot scale to the capacities required or provide universal
access across geographically dispersed locations. Data growth adds high overhead
to the NAS in terms of managing a large number of permissions and nested
directories. File systems require more management as they scale and are limited in
size. Their performance degrades as file system size increases, and they do not
accommodate metadata beyond file properties, which many new applications
require. These challenges demand a smarter approach (object storage) that
manages data growth at low cost, provides extensive metadata capabilities, and
provides massive scalability to keep up with rapidly growing data storage and
access demands.

Object-Based Storage Device (OSD)

Definition: Object-Based Storage Device

Stores data in the form of objects on a flat address space based on its
content and attributes rather than its name and location.

Figure: An object, comprising data, metadata, and an object ID

 Object contains user data, related metadata, and user-defined attributes
 Objects are uniquely identified using an object ID
 OSD provides APIs to integrate with software-defined data center and cloud

Notes

An object is the fundamental unit of object-based storage. It contains user data,
related metadata (size, date, ownership, etc.), and user-defined attributes of the
data (retention, access pattern, and other business-relevant attributes). The
additional metadata or attributes enable optimized search, retention, and deletion
of objects.

For example, when an MRI scan of a patient is stored as a file in a NAS system,
the metadata is basic and may include information such as file name, date of
creation, owner, and file type. When stored as an object, the metadata component
of the object may include additional information such as patient name, ID, and
attending physician’s name, apart from the basic metadata.
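To make the file-versus-object contrast concrete, here is a minimal Python sketch of an object that carries basic system metadata alongside user-defined attributes, plus a search over those attributes. The class `StoredObject`, the helper `find_objects`, and every field value (patient name, ID, physician, date) are hypothetical illustrations, not the schema of any real OSD product.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """Illustrative object: user data plus system metadata and user-defined attributes."""
    data: bytes
    system_metadata: dict                                # basic metadata a file would also have
    user_attributes: dict = field(default_factory=dict)  # business-relevant, user-defined attributes


def find_objects(objects, **criteria):
    """Return the objects whose user-defined attributes match every given key/value."""
    return [o for o in objects
            if all(o.user_attributes.get(k) == v for k, v in criteria.items())]


# Hypothetical MRI scan stored as an object (all values are made up for illustration).
scan_bytes = b"...MRI image bytes..."
scan = StoredObject(
    data=scan_bytes,
    system_metadata={"size": len(scan_bytes), "created": "2019-06-01",
                     "owner": "radiology", "type": "dicom"},
    user_attributes={"patient_name": "J. Doe", "patient_id": "P-1001",
                     "physician": "Dr. Smith"},
)
```

A file in NAS would expose only the equivalent of `system_metadata`; the extra attributes are what let a query such as `find_objects(objects, patient_id="P-1001")` locate data by business meaning rather than by path.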

Each object stored in the object-based storage system is identified by a unique
identifier called the object ID. The object ID allows easy access to objects without
the need to specify the storage location. The object ID is generated by applying
specialized algorithms (such as a hash function) to the data, so that every object is
uniquely identified. Any change to the object, such as a user edit to the file,
results in a new object ID. Most object storage systems support APIs to integrate
them with software-defined data center and cloud environments.
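The sketch below illustrates a content-derived object ID, assuming SHA-256 as the hash function; real object storage systems may use other algorithms or fold additional attributes into the ID. The function name `object_id` is hypothetical.

```python
import hashlib


def object_id(data: bytes) -> str:
    """Derive an object ID from the object's content using SHA-256 (an assumed choice).

    Identical content always yields the same ID, and any edit to the content
    yields a different ID. Collisions are computationally infeasible rather
    than mathematically impossible.
    """
    return hashlib.sha256(data).hexdigest()


original = object_id(b"quarterly report v1")
edited = object_id(b"quarterly report v2")   # a user edit produces a new object ID
```

Because the ID depends only on content, a client can retrieve the object by ID without knowing which disk or node holds it.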

Hierarchical File System Vs. Flat Address Space

 Hierarchical file system organizes data in the form of files and directories
– Limits the number of files that can be stored
 OSD uses a flat address space that enables storing a large number of objects
– Enables the OSD to meet the scale-out storage requirements of the third platform

Figure: Hierarchical file system (file names/nodes) compared with a flat address space (objects, each with an object ID, data, metadata, and attributes)

Notes

File-based storage systems (NAS) are based on file hierarchies that are complex in
structure. Most file systems have restrictions on the number of files, directories, and
levels of hierarchy that can be supported, which limits the amount of data that can
be stored.

OSD stores data using a flat address space where objects exist at the same level
and one object cannot be placed inside another object. Therefore, there is no
hierarchy of directories and files, and as a result, billions of objects can be stored
in a single namespace. This enables the OSD to meet scale-out storage
requirements.

Components of Object-Based Storage Device

Figure: OSD system components (VMs on a hypervisor, IP network, metadata service, storage service, internal network, storage)

OSD system typically comprises three key components:


 OSD nodes (controllers)
 Internal network
 Storage

Notes

The OSD system is composed of one or more nodes. A node is a server that runs
the OSD operating environment and provides services to store, retrieve, and
manage data in the system. OSD systems are typically architected to work with
inexpensive x86-based nodes. Each node provides both compute and storage
resources, and the system scales linearly in capacity and performance as nodes
are added.

The OSD node has two key services: the metadata service and the storage service.
The metadata service is responsible for generating the object ID from the contents
of a file (and possibly other attributes of the data). It also maintains the mapping
between object IDs and the file system namespace. In some implementations, the
metadata service runs inside an application server. The storage service manages a
set of disks on which the user data is stored.
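A minimal sketch of how the two node services could divide the work, assuming an in-memory dictionary stands in for the disks and SHA-256 for ID generation. The class and method names (`MetadataService`, `StorageService`, `OSDNode`, `put`, `get`) are hypothetical. Note that the storage service keeps objects in a flat, ID-keyed map with no directory hierarchy, while the metadata service holds the name-to-ID mapping.

```python
import hashlib


class StorageService:
    """Manages the 'disks' (an in-memory dict here) that hold object data by ID."""

    def __init__(self):
        self._disks = {}                 # flat address space: object ID -> data

    def write(self, oid: str, data: bytes) -> None:
        self._disks[oid] = data

    def read(self, oid: str) -> bytes:
        return self._disks[oid]


class MetadataService:
    """Generates object IDs and maps file system names to object IDs."""

    def __init__(self):
        self._namespace = {}             # file path -> object ID

    def generate_id(self, data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def bind(self, path: str, oid: str) -> None:
        self._namespace[path] = oid

    def lookup(self, path: str) -> str:
        return self._namespace[path]


class OSDNode:
    """One node running both services; callers never name a disk location."""

    def __init__(self):
        self.metadata = MetadataService()
        self.storage = StorageService()

    def put(self, path: str, data: bytes) -> str:
        oid = self.metadata.generate_id(data)
        self.storage.write(oid, data)
        self.metadata.bind(path, oid)
        return oid

    def get(self, path: str) -> bytes:
        return self.storage.read(self.metadata.lookup(path))
```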

The OSD nodes connect to the storage via an internal network. The internal
network provides node-to-node connectivity and node-to-storage connectivity. The
application server accesses the node to store and retrieve data over an external
network. OSD typically uses low-cost and high-density disk drives to store the
objects. As more capacity is required, more disk drives can be added to the
system.

Key Features of OSD

Typically, the object-based storage device has the following features:

 Scale-out architecture: Provides linear scalability where nodes are
independently added to the cluster to scale massively
 Multitenancy: Enables multiple applications/clients to be served from the same
infrastructure
 Metadata-driven policy: Intelligently drives data placement, protection, and data
services based on the service requirements
 Global namespace: Abstracts storage from the application and provides a
common view that is independent of location, making scaling seamless
 Flexible data access method: Supports REST/SOAP APIs for web/mobile access,
and file sharing protocols (CIFS and NFS) for file service access
 Automated system management: Provides auto-configuring and auto-healing
capabilities to reduce administrative complexity and downtime
 Data protection (geo distribution): An object is protected using either
replication or erasure coding, and the copies are distributed across different
locations

Notes

Additional details for each OSD feature are as follows:

Scale-out architecture: Scalability has always been one of the most important
characteristics of enterprise storage systems, since the rationale for consolidating
storage assumes that the system can easily grow with aggregate demand. OSD is
based on a distributed scale-out architecture where each node in the cluster
contributes its resources to the total amount of space and performance. Nodes are
independently added to the cluster, which provides massive scaling to support
petabytes and even exabytes of capacity with billions of objects. This makes OSD
suitable for cloud environments.

Multitenancy: Enables multiple applications to be securely served from the same
infrastructure. Each application is securely partitioned, and its data is neither
commingled with nor accessible by other tenants. This feature is ideal for
businesses providing cloud services for multiple customers or departments within
an enterprise.
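The partitioning idea can be sketched as a store that keys every object by tenant as well as object ID, so one tenant's requests can never resolve to another tenant's data. This is a minimal illustration under that assumption; the class `MultiTenantStore` and its methods are hypothetical, and a real OSD would add authentication and per-tenant namespaces on top.

```python
class MultiTenantStore:
    """Hypothetical store: objects are keyed by (tenant, object ID), never by ID alone."""

    def __init__(self):
        self._objects = {}

    def put(self, tenant: str, oid: str, data: bytes) -> None:
        self._objects[(tenant, oid)] = data

    def get(self, tenant: str, oid: str) -> bytes:
        try:
            return self._objects[(tenant, oid)]
        except KeyError:
            # Same error whether the object is missing or belongs to another tenant,
            # so one tenant cannot probe for another tenant's object IDs.
            raise PermissionError(f"object {oid} not accessible to tenant {tenant}") from None

    def list_objects(self, tenant: str) -> list:
        """List only the calling tenant's object IDs."""
        return sorted(oid for (t, oid) in self._objects if t == tenant)
```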

Metadata-driven policy: Metadata and policy-based information management
capabilities combine to intelligently and automatically drive data placement, data
protection, and other data services (compression, deduplication, retention, and
deletion) based on the service requirements. For example, when an object is
created, it is created on one node and subsequently copied to one or more
additional nodes, depending on the policies in place. The nodes can be within the
same data center or geographically dispersed.
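A minimal sketch of metadata-driven placement, assuming a hypothetical `class` metadata value selects a policy that fixes the number of copies and whether they must span sites. The `POLICIES` table, the `NODES` cluster layout, and `place_object` are illustrative assumptions, not any product's policy engine.

```python
# Hypothetical policy table: metadata value -> protection and placement rules.
POLICIES = {
    "critical": {"copies": 3, "geo_distributed": True},
    "standard": {"copies": 2, "geo_distributed": False},
}

# Hypothetical cluster layout: (node name, site).
NODES = [("node1", "site-A"), ("node2", "site-A"), ("node3", "site-B"), ("node4", "site-C")]


def place_object(metadata: dict) -> list:
    """Choose the nodes that hold an object's copies, driven by its metadata."""
    policy = POLICIES[metadata.get("class", "standard")]
    if policy["geo_distributed"]:
        # Spread copies: take the first node found at each distinct site.
        chosen, seen_sites = [], set()
        for node, site in NODES:
            if site not in seen_sites:
                chosen.append(node)
                seen_sites.add(site)
    else:
        # Keep all copies within the site where the object was created.
        home = metadata.get("site", "site-A")
        chosen = [node for node, site in NODES if site == home]
    if len(chosen) < policy["copies"]:
        raise RuntimeError("not enough nodes to satisfy the policy")
    return chosen[:policy["copies"]]
```

The first node in the returned list plays the role of the node where the object is created; the rest receive the subsequent copies.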

Global namespace: Another significant value of object storage is that it presents a
single global namespace to the clients. A global namespace abstracts storage from
the application and provides a common view that is independent of location,
making scaling seamless. This relieves client applications of the need to keep track
of where data is stored. The global namespace provides the ability to transparently
spread data across storage systems for greater performance, load balancing, and
non-disruptive operation. The global namespace is especially important when the
infrastructure spans multiple sites and geographies.

Flexible data access method: OSD supports REST/SOAP APIs for web/mobile
access, and file sharing protocols (CIFS and NFS) for file service access. Some
OSD storage systems support HDFS interface for big data analytics.
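For the REST access path, the sketch below builds (but does not send) a PUT that stores an object with its metadata and a GET that retrieves it, using Python's standard `urllib.request.Request`. The endpoint URL, the `/objects/` path, and the `X-Object-Metadata` header are hypothetical; each product defines its own REST resource layout and metadata headers. Passing one of these requests to `urllib.request.urlopen` would perform the actual HTTP call.

```python
import json
import urllib.request

ENDPOINT = "https://osd.example.com/v1"   # hypothetical OSD REST endpoint


def build_put(object_name: str, data: bytes, metadata: dict) -> urllib.request.Request:
    """Build, without sending, a PUT request that stores an object and its metadata."""
    return urllib.request.Request(
        f"{ENDPOINT}/objects/{object_name}",
        data=data,
        method="PUT",
        headers={
            "Content-Type": "application/octet-stream",
            # Hypothetical convention: user metadata travels as a JSON-encoded header.
            "X-Object-Metadata": json.dumps(metadata),
        },
    )


def build_get(object_name: str) -> urllib.request.Request:
    """Build, without sending, a GET request that retrieves an object."""
    return urllib.request.Request(f"{ENDPOINT}/objects/{object_name}", method="GET")
```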

Automated system management: OSD provides self-configuring and auto-healing
capabilities to reduce administrative complexity and downtime. There is no single
point of failure among the services or processes running in the OSD. If a service
goes down, or a node or an entire site becomes unavailable, redundant components
and services maintain normal operations.

Data protection: The objects stored in an OSD are protected using two methods:
replication and erasure coding. Replication provides data redundancy by creating
an exact copy of an object. The replica requires the same storage space as the
source object. Based on the policy configured for the object, one or more replicas
are created and distributed across different locations. The erasure coding
technique is discussed next.

Object Protection: Erasure Coding

 Provides space-optimal data redundancy to protect against data loss from
multiple drive failures
– A set of n disks is divided into m disks to hold data and k disks to hold
coding information
– Coding information is calculated from data

Figure: Erasure coding (data is written, split into 9 fragments, and encoded into encoded fragments; k = 3, m = 9)

The figure illustrates an example of dividing data into nine data fragments (m = 9)
and three coding fragments (k = 3). The maximum number of drive failures
supported in this example is three.

Notes

Object storage systems support the erasure coding technique, which provides
space-optimal data redundancy to protect against data loss from multiple drive
failures. In storage systems, erasure coding can also protect data without using
RAID. This avoids the capacity overhead of keeping multiple copies and the
processing overhead of running RAID calculations on very large data sets. The
result is data protection for very large storage systems without the risk of very
long RAID rebuild cycles.

In general, the erasure coding technique breaks the data into fragments, which are
encoded with redundant data and stored across a set of different locations, such as
disks, storage nodes, or geographic locations. In a typical erasure coded storage
system, a set of n disks is divided into m disks to hold data and k disks to hold
coding information, where n, m, and k are integers and n = m + k. The coding
information is calculated from the data. If up to k of the n disks fail, their contents
can be recomputed from the surviving disks.

Erasure coding tolerates k faults and offers higher fault tolerance than replication
for less storage cost. The additional storage required for coding fragments is k/m
of the data size, so it increases as the value of k/m increases.
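The simplest erasure code has a single coding fragment (k = 1) computed as the XOR of the m data fragments; any one lost fragment is the XOR of all survivors. The sketch below shows that special case plus the overhead arithmetic. It is an illustration only: tolerating k > 1 failures, as in the m = 9, k = 3 example, requires codes such as Reed-Solomon, which this sketch does not implement. All function names are hypothetical.

```python
from functools import reduce


def split_into_fragments(data: bytes, m: int) -> list:
    """Split data into m equal-size data fragments, zero-padding the last one."""
    frag_len = -(-len(data) // m)                # ceiling division
    padded = data.ljust(frag_len * m, b"\x00")
    return [padded[i * frag_len:(i + 1) * frag_len] for i in range(m)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, m: int) -> list:
    """Return m data fragments followed by one XOR coding fragment (k = 1)."""
    fragments = split_into_fragments(data, m)
    return fragments + [reduce(xor_bytes, fragments)]


def recover(fragments: list, lost: int) -> bytes:
    """Recompute any single lost fragment (data or coding) from the survivors."""
    survivors = [f for i, f in enumerate(fragments) if i != lost]
    return reduce(xor_bytes, survivors)


def ec_overhead(m: int, k: int) -> float:
    """Extra capacity for coding fragments as a fraction of the data size."""
    return k / m


def replication_overhead(copies: int) -> float:
    """Extra capacity for full replicas as a fraction of the data size."""
    return copies - 1
```

With m = 9 and k = 3, `ec_overhead(9, 3)` is 1/3 (about 33 percent) for three tolerated failures, whereas three full copies (`replication_overhead(3)`) cost 200 percent extra capacity and tolerate only two failures.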

Use Case: Cloud-Based Storage

The capabilities or features of OSD, such as multitenancy, scalability, geographical
distribution of data, and data sharing across heterogeneous platforms or tenants
while ensuring integrity of data, make it a strong option for cloud-based storage.
Enterprise end users and cloud subscribers are also interested in cloud storage
offerings because they provide better agility, on-demand scalability, lower cost, and
operational efficiency compared to traditional storage solutions.

Cloud storage provides unified and universal access, policy-based data placement,
and massive scalability. It also enables data access through REST/SOAP APIs or file
access protocols and provides automated data protection and efficiency to manage
large amounts of data. With the growing adoption of cloud computing, cloud service
providers can leverage OSD to offer storage-as-a-service, backup-as-a-service,
and archive-as-a-service to their consumers.

Figure: Heterogeneous platforms or tenants (VMs) accessing data from cloud storage over HTTP/S (REST, SOAP), NFS, and CIFS. The cloud storage is global, intelligent, web-based, and self-service, and spans Site #1, Site #2, and Site #3.

Use Case: Cloud-based Object Storage Gateway

Figure: Application servers (VMs on a hypervisor) in the data center connect to a cloud-based object storage gateway over iSCSI/FC/FCoE; the gateway connects to object-based cloud storage over REST.

Gateways provide a translation layer between the standard interfaces (iSCSI, FC,
NFS, CIFS) and cloud provider’s REST API
 Sits in a data center and presents file and block-based storage interfaces to
applications
 Performs protocol conversion to send data directly to cloud storage
 Encrypts the data before transmitting it to cloud storage
 Supports deduplication and compression
 Provides a local cache to reduce latency

Notes

The lack of standardized cloud storage APIs has made the gateway appliance a
crucial component for cloud adoption. Typically, service providers offer cloud-based
object storage with interfaces such as REST or SOAP, but most business
applications expect storage resources with block-based iSCSI or FC interfaces or
file-based interfaces, such as NFS or CIFS. The cloud-based object storage
gateways provide a translation layer between these standard interfaces and the
service provider's REST API.

The gateway device is a physical or virtual appliance that sits in a data center and
presents file and block-based storage interfaces to the applications. It performs
protocol conversion so that data can be sent directly to cloud storage. To provide
security for the data sent to the cloud, most gateways automatically encrypt the
data before it is sent. To speed up data transmission times (as well as to minimize
cloud storage costs), most gateways support data deduplication and compression.

A cloud-based object storage gateway provides a local cache to reduce the latency
associated with having the storage capacity far away from the data center. The
gateway appliances offer not only an interface to the cloud, but also a layer of
management that can even help to determine what data should be sent to the
cloud and what data should be held locally.
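The gateway behavior described above (file-style interface in, object PUT/GET out, deduplication, compression, and a local cache) can be sketched in a few dozen lines. This is a hypothetical model, not any vendor's gateway: `CloudObjectStore` is an in-memory stand-in for a provider's REST API, deduplication is assumed to use SHA-256 content hashes as object keys, compression uses `zlib`, and the cache is a small LRU. Encryption is omitted because it needs a third-party library; its place is marked in a comment.

```python
import hashlib
import zlib
from collections import OrderedDict


class CloudObjectStore:
    """Stand-in for a provider's REST object API (hypothetical): PUT/GET by key."""

    def __init__(self):
        self.objects = {}
        self.put_count = 0
        self.get_count = 0

    def put(self, key: str, body: bytes) -> None:
        self.put_count += 1
        self.objects[key] = body

    def get(self, key: str) -> bytes:
        self.get_count += 1
        return self.objects[key]


class ObjectStorageGateway:
    """Hypothetical gateway: file-style reads/writes in, object PUT/GET out."""

    def __init__(self, cloud: CloudObjectStore, cache_size: int = 2):
        self.cloud = cloud
        self.file_map = {}           # file path -> object key (content hash)
        self.uploaded = set()        # keys already sent to the cloud
        self.cache = OrderedDict()   # object key -> data, least recently used first
        self.cache_size = cache_size

    def write_file(self, path: str, data: bytes) -> None:
        key = hashlib.sha256(data).hexdigest()
        if key not in self.uploaded:                  # deduplication: send unique content once
            # A production gateway would also encrypt the payload here.
            self.cloud.put(key, zlib.compress(data))  # compression cuts transfer and storage
            self.uploaded.add(key)
        self.file_map[path] = key
        self._remember(key, data)

    def read_file(self, path: str) -> bytes:
        key = self.file_map[path]
        if key in self.cache:                         # cache hit: no trip to the cloud
            self.cache.move_to_end(key)
            return self.cache[key]
        data = zlib.decompress(self.cloud.get(key))   # cache miss: fetch and decompress
        self._remember(key, data)
        return data

    def _remember(self, key: str, data: bytes) -> None:
        self.cache[key] = data
        self.cache.move_to_end(key)
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)            # evict the least recently used entry
```

Writing the same content under two paths triggers one upload, and reads served from the cache never reach the cloud store.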

Video: Unified Storage

The video is located at
https://edutube.emc.com/Player.aspx?vno=IkvWeVAqqldarkMDHIfdig

Unified Storage Overview

Definition: Unified Storage


A single integrated (converged) storage infrastructure that
consolidates block (iSCSI, FC, FCoE), file (CIFS/SMB, NFS), and
object (REST, SOAP) access.

 Deploying unified storage provides the following benefits:
– Reduces capital and operational expenses
– Managed through a single management interface
– Increases storage utilization
– Integration with software-defined environment provides storage for mobile,
cloud, big data, and social applications

Notes

In an enterprise data center, typically different storage systems (block-based,
file-based, and object-based storage) are deployed to meet the needs of different
applications. In many cases, this situation has been complicated by mergers and
acquisitions that bring together disparate storage infrastructures. The resulting silos
of storage have increased the overall cost because of complex management, low
storage utilization, and direct data center costs for power, space, and cooling.

An ideal solution would be to have an integrated storage solution that supports
block, file, and object access.

There are numerous benefits associated with deploying unified storage systems:

 Creates a single pool of storage resources that can be managed with a single
management interface.
 Sharing pooled storage capacity across multiple business workloads should lead
to a lower overall system cost and administrative time, thus reducing the total cost
of ownership (TCO).
 Provides the capability to plan overall storage capacity consumption. Deploying
a unified storage system takes away the guesswork associated with planning for
file and block storage capacity separately.
 Increases utilization, with no stranded capacity. Unified storage eliminates the
capacity utilization penalty associated with planning for block and file storage
support separately.
 Provides the capability to integrate with a software-defined storage environment
to provide next-generation storage solutions for mobile, cloud, big data, and
social computing needs.

Unified Storage Architecture

A unified storage architecture enables the creation of a common storage pool that
can be shared across a diverse set of applications with a common set of
management processes. The key component of a unified storage architecture is the
unified controller. The unified controller provides the functionalities of block storage,
file storage, and object storage. It contains iSCSI, FC, FCoE, and IP front-end ports
for direct block access to application servers and file access to NAS clients.

Figure: Unified storage architecture. Block requests from SAN (iSCSI/FC/FCoE), file requests from NAS (CIFS/NFS), and object requests (REST/SOAP) reach a unified controller that provides block storage, NAS, and OSD functionality.

Notes

For block-level access, the controller configures LUNs and presents them to
application servers, where the LUNs appear as local physical disks. A file system is
configured on these LUNs at the server and is made available to applications for
storing data.

For NAS clients, the controller configures LUNs, creates a file system on these
LUNs, creates an NFS, CIFS, or mixed share, and exports the share to the clients.
Some storage vendors offer a REST API to enable object-level access for storing
data from web/cloud applications.

In some implementations, there are dedicated or separate controllers for block
functionality, NAS functionality, and object functionality.
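The common-pool idea behind a unified controller can be sketched as one capacity pool from which LUNs, file shares, and object buckets are all carved, so no capacity is stranded in a protocol-specific silo. This is a hypothetical model: `UnifiedController`, its methods, and the term "bucket" for an object container are assumptions for illustration, and real controllers also handle the LUN, file system, and export steps described above.

```python
from dataclasses import dataclass, field


@dataclass
class UnifiedController:
    """Hypothetical unified controller carving block, file, and object capacity from one pool."""
    pool_capacity_gb: int
    allocations: dict = field(default_factory=dict)   # resource name -> (kind, size in GB)

    @property
    def free_gb(self) -> int:
        return self.pool_capacity_gb - sum(size for _, size in self.allocations.values())

    def _allocate(self, name: str, kind: str, size_gb: int) -> None:
        if name in self.allocations:
            raise ValueError(f"{name} already exists")
        if size_gb > self.free_gb:
            raise ValueError(f"not enough free capacity for {name}")
        self.allocations[name] = (kind, size_gb)

    def create_lun(self, name: str, size_gb: int) -> None:
        """Block access over iSCSI/FC/FCoE."""
        self._allocate(name, "block", size_gb)

    def create_file_share(self, name: str, size_gb: int, protocol: str = "NFS") -> None:
        """File access: stands in for LUN + file system + NFS/CIFS/mixed share."""
        if protocol not in ("NFS", "CIFS", "mixed"):
            raise ValueError("protocol must be NFS, CIFS, or mixed")
        self._allocate(name, f"file/{protocol}", size_gb)

    def create_bucket(self, name: str, size_gb: int) -> None:
        """Object access over REST."""
        self._allocate(name, "object", size_gb)
```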

Concepts in Practice Lesson

Introduction

This section highlights technologies that are relevant to the topics covered in this
module.

This lesson covers the following topics:


 Dell EMC Isilon
 Dell EMC ECS
 Dell EMC Unity

Concepts in Practice

Dell EMC Isilon

A scale-out NAS product powered by the OneFS operating environment. It enables
pooling multiple nodes together to construct a clustered NAS system. OneFS
creates a single file system that spans all nodes in an Isilon cluster. Isilon
provides the capability to manage and store large (petabyte-scale), high-growth
data in a single system with the flexibility to meet a broad range of performance
requirements. It is available in All-Flash, Hybrid, and Archive platforms to support
a wide range of demanding file workloads.

Dell EMC ECS

Provides a hyper-scale storage infrastructure that is specifically designed to
support modern applications with unparalleled availability, protection, simplicity,
and scale. It provides universal accessibility with support for object and HDFS
access. ECS Appliance enables cloud service providers to deliver competitive cloud
storage services at scale. ECS provides a single platform for all web, mobile, Big
Data, and social media applications.

Dell EMC Unity

Delivers a full block and file unified environment in a single enclosure. The
purpose-built Dell EMC Unity system can be configured as an All Flash system with
only solid state drives, or as a Hybrid system with a mix of solid state and spinning
media to deliver the best of both performance and economics. The Unisphere
management interface offers a consistent look and feel whether you are managing
block resources, file resources, or both. Dell EMC Unity offers multiple solutions to
address security and availability. Unified Snapshots provide point-in-time copies of
block and file data that can be used for backup and restoration purposes.
Asynchronous Replication offers an IP-based replication strategy within a system
or between two systems. Synchronous Block Replication benefits FC environments
that are close together and require zero data loss. Data at Rest Encryption ensures
that user data on the system is protected from physical theft and can replace drive
disposal processes, such as shredding.

Assessment

1. Which file access method provides file sharing that is commonly used on UNIX
systems?

A. NTFS

B. NFS

C. CIFS

D. HDFS

2. Which type of storage device stores data on a flat address space based on its
content and attributes?

A. Block-based

B. Scale-up NAS

C. Scale-out NAS

D. Object-based

Summary

Software-Defined Storage and Networking

Introduction

This module presents software-defined storage attributes and architecture. This
module also focuses on the functions of the control plane and of software-defined
storage. Further, this module focuses on the overview and architecture of
software-defined networking.

Upon completing this module, you will be able to:


 Explain software-defined storage attributes and architecture
 Explain functions of the control plane
 Explain extensibility of software-defined storage
 Explain overview and architecture of SDN

Software-Defined Storage (SDS) Lesson

Introduction

This lesson presents the drivers, the attributes, and the architecture of
software-defined storage. Further, this lesson covers asset discovery, resource
abstraction, pooling, and resource provisioning for services. Finally, this lesson
covers the application programming interface (API) and RESTful API.

This lesson covers the following topics:


 List drivers for software-defined storage
 Explain attributes of software-defined storage and architecture
 Explain asset discovery, resource abstraction, pooling, and resource
provisioning
 Explain application programming interface (API) and RESTful API

Software-Defined Storage (SDS)

Video: Introduction to Software-Defined Storage

Drivers for Software-Defined Storage

 In traditional environments, the creation of complex IT silos in data centers
leads to:
– Management overhead, increased costs, and poor resource utilization
 In data centers, critical functionality and management tied to storage systems
limits:
– Resource sharing, automation, and standardization
 Traditional architecture makes it difficult to provide for:
– Data growth, scaling, and self-service

Notes

In a traditional data center, there are several challenges in provisioning and
managing storage in an efficient and cost-effective manner. Some key challenges
are described here. In a traditional environment, each application type normally has
its own vertical stack of compute, networking, storage, and security. This leads to
the creation of a loose collection of IT silos, which increases the infrastructure’s
complexity. This complexity creates management overhead and increases
operating expenses. It also leads to poor resource utilization because capacity
cannot be shared across stacks. Data centers have multi-vendor, heterogeneous
storage systems, and each type of storage system (block-based, file-based, and
object-based) has its own unique value. However, critical functionality is often tied
to specific storage types, and each storage system commonly has its own
monitoring and management tools. There is limited resource sharing, no
centralized management, little automation, and a lack of standards in this
environment. Application workload complexities and higher SLA demands pose a
further challenge to IT, which finds it difficult to allocate storage to satisfy the
capacity requirements of applications in real time. There are also new requirements
and expectations for continuous access and delivery of resources, as in a cloud
environment. Traditional environments are not architected for technologies such as
cloud computing, Big Data analytics, and mobile applications. Therefore, there are
several challenges in managing massive data growth, cost-effective scaling, and
providing self-service access to storage. These challenges have led to the advent
of the software-defined storage model.

What Is Software-Defined Storage?

Definition: Software-Defined Storage (SDS)


Storage infrastructure managed and automated by software, which
pools heterogeneous storage resources, and dynamically allocates
them based on policy to match application needs.

 Abstracts the physical details of storage and delivers storage as software


 Supports multiple types of storage systems and access methods
 Enables storing data on both storage systems and commodity disks
 Provides a unified external view of storage infrastructure
 Enables building cost-effective hyperscale storage infrastructure

Notes

SDS abstracts heterogeneous storage systems and their underlying capabilities,
and pools the storage resources. Storage capacity is dynamically and automatically
allocated from the storage pools based on policies to match the needs of
applications. In general, SDS software abstracts the physical details of storage
(media, formats, location, low-level hardware configuration), and delivers storage
as software. A storage system is a combination of hardware and software. The
software stack exposes the data access method, such as block, file, or object, and
uses persistent media such as HDD or SSD to store the data. SDS software
separates the software layer of a storage system from the hardware. It supports
combinations of multiple storage types and access methods, such as block, file,
and object. It enables storing data on both storage systems and commodity disks,
while providing a unified external view of storage. This functionality allows
organizations to reuse existing storage assets, and mix and match them with
commodity resources. Thus, SDS serves data through a single namespace and
storage system spread across these different assets. For example, in a data center
that contains several distinct file servers, SDS can provide a global file system,
spanning the file servers and allowing location-independent file access.
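A minimal sketch of pooling and policy-based allocation, assuming each physical asset (an existing array or commodity disks) is described by a few hypothetical attributes and a policy is a set of attribute values the application requires. `Backend`, `SDSPool`, `provision`, and the choice to place each request on the matching backend with the most free space are illustrative assumptions, not how any particular SDS product decides placement.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """One physical storage asset abstracted by the SDS layer (hypothetical fields)."""
    name: str
    kind: str       # "array" or "commodity"
    media: str      # "SSD" or "HDD"
    free_gb: int


class SDSPool:
    """Pools heterogeneous backends and allocates capacity by policy."""

    def __init__(self, backends):
        self.backends = list(backends)

    @property
    def total_free_gb(self) -> int:
        """Unified view: one number for the whole pool, regardless of asset type."""
        return sum(b.free_gb for b in self.backends)

    def provision(self, size_gb: int, policy: dict) -> str:
        """Allocate from the matching backend with the most free space; return its name."""
        candidates = [b for b in self.backends
                      if all(getattr(b, k) == v for k, v in policy.items())
                      and b.free_gb >= size_gb]
        if not candidates:
            raise RuntimeError("no backend satisfies the policy")
        target = max(candidates, key=lambda b: b.free_gb)
        target.free_gb -= size_gb
        return target.name
```

The application only states what it needs (for example `{"media": "SSD"}`); which array or disk set serves the request is decided by the pool.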

SDS enables organizations to build modern, hyperscale storage infrastructure in a
cost-effective manner using standardized, commercial off-the-shelf components.
Individually, these components provide lower performance. However, at sufficient
scale and with the use of SDS software, the pool of components provides greater
capacity and performance.

Key Attributes of Software-Defined Storage

SDS transforms existing heterogeneous physical storage into a simple, extensible,
and open virtual storage platform. The key attributes of software-defined storage
are as follows:

Attribute                             Description
Storage abstraction and pooling       Single large storage pool spanning across the
                                      underlying storage infrastructure
Automated, policy-driven storage      Dynamic composition of storage services based on
provisioning                          application policies
Unified management                    Single control point for the entire infrastructure
Self-service                          Users self-provision storage services from a
                                      service catalog
Open and extensible                   Integration of external interfaces and
                                      applications through the use of APIs

Notes

Additional details on the key attributes of software-defined storage are as follows:

Storage abstraction and pooling: SDS abstracts and pools storage resources
across heterogeneous storage infrastructure. SDS software creates a single large
storage pool from the underlying storage resources, from which several virtual
storage pools are created. SDS decouples the storage control path from the data
path; applications connect to storage through the data path.

Automated, policy-driven storage provisioning: A “storage service” is some
combination of capacity, performance, protection, encryption, and replication. In
the SDS model, storage services are dynamically composed from available
resources. SDS uses application policies to create a “just-in-time” model for
storage service delivery: storage assets and capabilities are configured and
assigned to specific applications only when they are needed. If the policy changes,
the storage environment dynamically and automatically responds with the new
requested service level.

Unified management: SDS provides a unified storage management interface that
presents an abstract view of the storage infrastructure. Unified management
provides a single control point for the entire infrastructure across all physical and
virtual resources.

Self-service: Resource pooling enables multi-tenancy, and automated storage
provisioning enables self-service access to storage resources. Users select storage
services from a self-service catalog and self-provision them.

Open and extensible: An SDS environment is open and easy to extend, enabling
new capabilities to be added. An extensible architecture enables integrating
multi-vendor storage, external management interfaces, and applications into the
SDS environment through application programming interfaces (APIs).
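The "just-in-time", policy-driven behaviour can be illustrated with a small Python sketch. It assumes a hypothetical `ServicePolicy` record holding the five service attributes named above and a `reconcile` function that lists what the control plane would have to change when a policy is updated. The tier and protection labels are illustrative, and the sketch does not model how any particular SDS controller applies those changes.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ServicePolicy:
    """Hypothetical storage service: a combination of capacity, performance,
    protection, encryption, and replication requirements."""
    capacity_gb: int
    performance_tier: str   # e.g. "flash", "fc", "sata" (illustrative labels)
    protection: str         # e.g. "raid1", "raid5" (illustrative labels)
    encrypted: bool
    replicas: int


def reconcile(current: ServicePolicy, desired: ServicePolicy) -> list:
    """Return the actions needed to move a provisioned service to a new policy.

    The control plane compares the service as deployed with the requested
    policy and emits one action for each attribute that differs."""
    actions = []
    for f in fields(ServicePolicy):
        old, new = getattr(current, f.name), getattr(desired, f.name)
        if old != new:
            actions.append(f"change {f.name}: {old} -> {new}")
    return actions
```

For example, raising capacity from 100 GB to 200 GB and turning on encryption yields exactly two actions, while an unchanged policy yields none, so the environment only does work when the requested service level actually changes.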

Software-Defined Storage Architecture

[Figure: Software-defined storage architecture. Top: external cloud/object storage services, external management interfaces, written applications, and monitoring and reporting tools. Middle: the REST API over a policy-driven control plane with storage management (automation, tenants, self-service, provisioning) and data services (block, file, object; performance, protection, mobility). Bottom: virtual storage pools (block storage, NAS, object storage, commodity) on multi-vendor heterogeneous storage systems (data plane).]

The image depicts the generic architecture of a software-defined storage
environment. Although the physical storage devices themselves are central to SDS,
they are not a part of the SDS environment. Physical storage may be block-based,
file-based, or object-based storage systems or commodity hardware.

Notes

The fundamental component of the SDS environment is the policy-driven control
plane, which manages and provisions storage. The control plane is implemented
through software called the “SDS controller”, which some SDS products term a
“storage engine”. The SDS controller manages, abstracts, pools, and automates the
physical storage systems into policy-based virtual storage pools. By using
automation and orchestration, the controller enables self-service access to a
catalog of storage resources. Users provision storage using data services, which
may be block, file, or object services. An SDS controller may provide all or only
some of the features and services that are shown in the architecture. For example,
an SDS controller may support only file and block data services. Some controllers
may also support the Hadoop Distributed File System (HDFS). Some SDS products
can create a block-based storage pool from the local direct-attached storage (DAS)
of x86-based commodity servers in a compute cluster. The storage pool is then
shared among the servers in the cluster. The REST API is the core interface to the
SDS controller, and all underlying resources managed by the controller are
accessible through it. The REST API makes the SDS environment open and
extensible, which enables integration of multi-vendor storage, external management
tools, and written applications. The API also integrates with monitoring and
reporting tools, and provides access to external cloud/object storage.

Compute-Based Storage Area Network

[Figure: Compute systems with DAS, interconnected over Ethernet (ETH) or InfiniBand (IB), form a compute-based SAN storage pool.]

• A software-defined SAN created from direct-attached storage
• Creates a large block-based storage pool
• A client program on compute systems exposes shared block volumes
• Compute systems that contribute storage run a server program
• Server program performs I/O requested by client
• Metadata manager configures and monitors the compute-based SAN

Notes

A compute-based storage area network is a software-defined virtual SAN created
from the direct-attached storage located locally on the compute systems in a
cluster. Compute-based SAN software creates a large pool of block-based storage
that can be shared among the compute systems (or nodes) in the cluster. This
software creates a large-scale SAN without storage systems, and enables using the
local storage of existing compute systems. The convergence of storage and
compute ensures that the local storage on compute systems, which often goes
unused, is not wasted.

A compute system that requires access to the block storage volumes runs a client
program. The client program is a block device driver that exposes shared block
volumes to an application on the compute system. The blocks that the client
exposes can come from anywhere within the compute-based SAN, so the
application can issue an I/O request and the client fulfills it regardless of where the
particular blocks reside. The client communicates with other compute systems over
either Ethernet (ETH) or InfiniBand (IB), a high-speed, low-latency communication
standard for compute networking.

The compute systems that contribute their local storage to the shared storage pool
within the virtual SAN run an instance of a server program. The server program
owns the local storage and performs I/O operations as requested by a client from a
compute system within the cluster.

A compute-based SAN’s control component, known as the metadata manager,
serves as the monitoring and configuration agent. It holds cluster-wide mapping
information and monitors capacity, performance, and load balancing. It is also
responsible for decisions regarding migration, rebuilds, and all system-related
functions. The metadata manager is not on the virtual SAN data path: reads and
writes do not traverse it. The metadata manager may communicate with other
compute-based SAN components within the cluster to perform system
maintenance and management operations, but not data operations. It may run on
a compute system within the compute-based SAN, or on an external compute system.
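The division of roles in a compute-based SAN can be illustrated with a toy Python model. The class names, the four-block chunk size, and the round-robin layout are assumptions for illustration only. The point is that the client obtains the mapping from the metadata manager once and then sends reads and writes directly to the server program that owns each block, so the metadata manager stays off the data path.

```python
from dataclasses import dataclass, field

CHUNK_BLOCKS = 4  # blocks per chunk; deliberately tiny for illustration


@dataclass
class ServerProgram:
    """Runs on a node that contributes local DAS and performs I/O on it."""
    node: str
    disk: dict = field(default_factory=dict)  # (volume, block) -> data

    def write(self, volume: str, block: int, data: bytes) -> None:
        self.disk[(volume, block)] = data

    def read(self, volume: str, block: int) -> bytes:
        return self.disk[(volume, block)]


class MetadataManager:
    """Control component: holds the cluster-wide layout but never sees data."""

    def __init__(self, nodes: list) -> None:
        self.nodes = list(nodes)

    def layout(self) -> list:
        # Clients fetch the mapping once; reads and writes then bypass the MDM.
        return list(self.nodes)


class ClientProgram:
    """Block device driver that exposes shared volumes to local applications."""

    def __init__(self, layout: list, servers: dict) -> None:
        self.layout = layout      # cached copy of the metadata manager's mapping
        self.servers = servers    # node name -> ServerProgram

    def _owner(self, block: int) -> str:
        # Chunks are spread round-robin across the contributing nodes.
        return self.layout[(block // CHUNK_BLOCKS) % len(self.layout)]

    def write(self, volume: str, block: int, data: bytes) -> None:
        # I/O goes straight to the owning node's server program.
        self.servers[self._owner(block)].write(volume, block, data)

    def read(self, volume: str, block: int) -> bytes:
        return self.servers[self._owner(block)].read(volume, block)
```

With three nodes, blocks 0 to 3 land on the first node, blocks 4 to 7 on the second, and so on; the application sees one volume while the data is spread across the local disks of the cluster.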

Benefits of Software-Defined Storage

The key benefits of software-defined storage are described below:

Simplified storage environment
• Breaks down storage silos and their associated complexity
• Provides centralized management across all physical and virtual storage
environments
• Simplifies management by enabling administrators to centralize storage
management and provisioning tasks

Operational efficiency
• Automated policy-driven storage provisioning improves quality of service,
reduces errors, and lowers operational cost
• Provides faster, streamlined storage provisioning, which enables new
requirements to be satisfied more rapidly

Agility
• Delivering self-service access to storage through a service catalog provides
agility and reduces time-to-market

Reusing existing infrastructure
• Supports multi-vendor storage systems and commodity hardware, which enables
organizations to work with their existing infrastructure and protects their current
investments

Cloud support
• Enables an enterprise data center to connect to external cloud storage services
for consuming services such as cloud-based backup and disaster recovery
• Facilitates extending object storage to existing file- and block-based storage,
which enables organizations to deploy mobile and cloud applications on their
existing infrastructure

Control Plane Functions and User Interfaces

Key control plane functions are:

• Asset discovery
• Resource abstraction and pooling
• Provisioning resources for services

The SDS controller provides two native user interfaces:

• Command-line interface (CLI)
• Graphical user interface (GUI)
  ◦ Has an administrator view and a user view

Notes

The control plane in software-defined storage is implemented by SDS controller
software, which enables storage management and provisioning. An SDS controller
commonly provides two native user interfaces: a command-line interface (CLI) and
a graphical user interface (GUI). Both interfaces may be integrated into the
controller or may be external to it. If the native user interfaces are external, they
use the REST API to interact with the controller. Compared to the GUI, the CLI
provides more granular access to the controller’s functions and more control over
controller operations.

Asset Discovery

• Controller automatically detects assets when they are added to the SDS
environment
• Controller obtains or confirms asset configuration information
• Examples of asset categories that can be discovered are:
  ◦ Storage systems
  ◦ Storage networks
  ◦ Compute systems and clusters
  ◦ Data protection solutions

Notes

An SDS controller automatically detects an asset when it is added to the SDS
environment. The controller uses the asset’s credentials to connect to it over the
network, and either obtains or confirms its configuration information. This process
is called “discovery”. Discovery can also be initiated manually to verify the status of
an asset. Examples of assets are storage systems, storage networks, compute
systems and clusters, and data protection solutions. If the asset is a storage
system, the controller collects information about the storage ports and the pools
that it provides. If the asset is a compute system, the controller discovers its
initiator ports. Clusters can also be discovered, enabling volumes to be provisioned
to the compute systems in the cluster. The controller can also discover the storage
area networks within a data center.
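The category-specific behaviour of discovery can be sketched in Python. The `Asset` record and `discover` function are hypothetical, and the `credentials_ok` flag stands in for the real step of connecting to the asset over the network with its credentials. The `members` key for clusters is likewise an illustrative assumption. The sketch only shows which configuration items the controller would record for each asset category described above.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """An asset registered with the controller (hypothetical model)."""
    name: str
    kind: str                 # "storage_system", "compute_system", "cluster", ...
    credentials_ok: bool      # stand-in for connecting with the asset's credentials
    config: dict = field(default_factory=dict)   # what the device reports


def discover(asset: Asset) -> dict:
    """Collect the configuration relevant to the asset's category."""
    if not asset.credentials_ok:
        return {"asset": asset.name, "status": "unreachable"}
    if asset.kind == "storage_system":
        wanted = ("storage_ports", "pools")      # ports and pools it provides
    elif asset.kind == "compute_system":
        wanted = ("initiator_ports",)            # where volumes will be presented
    elif asset.kind == "cluster":
        wanted = ("members",)                    # systems that share volumes
    else:
        wanted = tuple(asset.config)             # record everything reported
    found = {key: asset.config.get(key, []) for key in wanted}
    return {"asset": asset.name, "status": "discovered", **found}
```

Running `discover` again on an existing asset corresponds to manually re-initiating discovery to verify its status.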

Resource Abstraction and Pooling

Data centers commonly contain many physical storage systems of different types
and often from multiple manufacturers. Each physical storage system must also be
individually managed, which is time consuming and error prone.

An SDS controller exposes the storage infrastructure through a simplified model,
hiding and handling details such as storage system and disk selection, LUN
creation, LUN masking, and the differences between the storage systems.

The SDS controller leverages the intelligence of individual storage systems. It
abstracts storage across the physical storage systems and manages individual
components. This functionality enables administrators and users to treat storage as
a single large resource and to focus only on the amount of storage needed and the
required performance and protection characteristics.

[Figure: Physical storage abstraction. Flash, FC, and SATA pools labeled A, B, and C on the physical storage side are presented as abstracted storage.]
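The grouping of Flash, FC, and SATA pools in the figure can be illustrated with a short Python sketch. It assumes three hypothetical storage systems named A, B, and C, each exposing one physical pool per media tier, with invented capacities. The controller groups the pools into one virtual pool per tier and reports the combined free capacity, so a user asks for "Flash" capacity instead of choosing an individual array.

```python
from collections import defaultdict

# Physical pools as (storage system, media tier, free capacity in GB).
# The systems and capacities are illustrative values only.
PHYSICAL_POOLS = [
    ("A", "Flash", 500), ("B", "Flash", 300), ("C", "Flash", 200),
    ("A", "FC", 1000), ("B", "FC", 800), ("C", "FC", 700),
    ("A", "SATA", 4000), ("B", "SATA", 3000), ("C", "SATA", 5000),
]


def build_virtual_pools(pools):
    """Group physical pools from every system into one virtual pool per tier."""
    virtual = defaultdict(list)
    for system, tier, free_gb in pools:
        virtual[tier].append((system, free_gb))
    return dict(virtual)


def free_capacity(virtual, tier):
    """Total capacity a user can draw from a virtual pool."""
    return sum(free for _, free in virtual[tier])
```

Here the Flash virtual pool reports 1,000 GB and the SATA virtual pool 12,000 GB, while the member pools on each system remain visible to the controller for placement decisions.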

Resource Provisioning

Service Catalog and Self-Service

• Administrator creates storage services and organizes them into categories in a
service catalog
• Services are block, file, and object data services
• Administrator can restrict services to specific users
• Service catalog provides users with self-service access to predefined storage
services
• Users place service requests through the GUI or client software
• SDS controller automates the provisioning of resources
• Administrators can view details of requests in real time

Block Data Service

• Provides a block volume of the required size, performance level, and protection
level
• Examples of block services:
  ◦ Create a block volume
  ◦ Delete a block volume
  ◦ Bind a block volume to compute
  ◦ Unbind a block volume from compute
  ◦ Mount a block volume
  ◦ Unmount a block volume
  ◦ Expand a block volume

Notes

Service Catalog and Self-Service

After configuring the storage abstractions, an administrator customizes and
exposes storage services by creating service catalogs for tenants. The
administrator uses the GUI’s administrator view to create storage services and
organize them into categories in a service catalog. The service catalog provides
tenant users with access to the set of predefined storage services. An administrator
can create different categories of services, such as block service, file service, and
object service. The administrator can configure the different services within each
category, and also restrict them to specific users or user groups. The user view of
the GUI provides users within a tenant with access to their service catalog. The
user view presents all the services and categories that are available for
provisioning to a specific user. Users can request a service by simply clicking the
service and placing a request to run it. Some SDS platforms may not provide an
interface for users to request services, and require the use of external client
software. An SDS controller automates the provisioning of resources when a user
requests a service. It employs a policy-based placement algorithm to find the
best fit in the infrastructure to fulfill user requests for data services. The SDS
controller uses orchestration to automate the provisioning process. Orchestration
uses workflows to automate the arrangement, coordination, and management of
the various functions required to provision resources. As a result, provisioning does
not require administrator or user interaction.
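The catalog filtering and placement steps can be sketched as follows. The guide does not specify the placement algorithm, so this hypothetical sketch assumes a simple best-fit rule: among pools of the requested tier with enough free space, pick the one that leaves the least capacity unused. `CatalogService`, `Pool`, `user_view`, and `place` are illustrative names, not a product API.

```python
from dataclasses import dataclass


@dataclass
class CatalogService:
    name: str
    category: str              # "block", "file", or "object"
    allowed_groups: set        # empty set = visible to every tenant user


@dataclass
class Pool:
    name: str
    tier: str
    free_gb: int


def user_view(catalog, user_groups):
    """Services shown in a user's self-service view of the catalog."""
    return [s.name for s in catalog
            if not s.allowed_groups or s.allowed_groups & user_groups]


def place(pools, tier, size_gb):
    """Best-fit placement (assumed rule): among pools of the requested tier
    with enough space, choose the one that leaves the least free capacity."""
    candidates = [p for p in pools if p.tier == tier and p.free_gb >= size_gb]
    if not candidates:
        raise RuntimeError("no pool satisfies the request")
    best = min(candidates, key=lambda p: p.free_gb - size_gb)
    best.free_gb -= size_gb    # reserve the capacity for this request
    return best.name
```

Best fit keeps large pools free for large future requests; a real controller would weigh many more policy attributes, such as protection and performance, before choosing a pool.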

Block Data Service

The block data service provides a block volume of the required size, performance
level, and protection level to a user. Examples of the services that an administrator
defines in this service category are as follows:

Create a block volume: A user can create a block storage volume by selecting a
virtual storage system and virtual pool. On receiving the request, the SDS controller
chooses the physical pool from the selected virtual pool and storage system. It
creates a block volume, which corresponds to a LUN on the storage system.

Delete a block volume: A user can delete an existing volume. On receiving the
request, the SDS controller removes the volume from the physical storage pool.

Bind a block volume to compute: A user can assign a block volume to a selected
compute system/cluster. On receiving this request, the SDS controller binds the
block volume to the specified compute system/cluster. However, the volume cannot
be written to or read from unless it is mounted.

Unbind a block volume from compute: A user can unbind a volume from a compute
system/cluster. This block service simply makes the block volume invisible to the
compute system.

Mount a block volume: A user can mount a block volume on a compute
system/cluster. The SDS controller sends commands to the OS to mount the volume.
This operation is specific to the type of OS on the compute system, such as
Windows, Linux, or ESXi.

Unmount a block volume: A user can unmount a block volume from a compute
system/cluster. On receiving the request, the SDS controller sends commands to the
compute system to unmount the volume.

Expand a block volume: A user can expand (extend) a block volume by combining it
either with a newly created volume or with an existing volume. On receiving the
request to expand a volume, the SDS controller commands the storage system to
expand the LUN.
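The dependencies between these block services (a volume must be bound before it is mounted, and can be read or written only while mounted) can be captured in a small state model. `BlockVolume` and its methods are hypothetical; in a real environment each method would correspond to the SDS controller issuing commands to the storage system or to the compute system's OS. The rule that a volume must be unmounted before it is unbound is an added assumption.

```python
class BlockVolume:
    """Simplified, hypothetical model of the states block services move a volume through."""

    def __init__(self, name: str, size_gb: int) -> None:
        self.name, self.size_gb = name, size_gb
        self.bound_to = None     # compute system/cluster the volume is visible to
        self.mounted = False

    def bind(self, compute: str) -> None:
        self.bound_to = compute              # visible, but not yet usable

    def mount(self) -> None:
        if self.bound_to is None:
            raise RuntimeError("bind the volume before mounting it")
        self.mounted = True                  # controller asks the OS to mount

    def io_allowed(self) -> bool:
        # Reads and writes require the volume to be both bound and mounted.
        return self.bound_to is not None and self.mounted

    def unmount(self) -> None:
        self.mounted = False                 # controller asks the OS to unmount

    def unbind(self) -> None:
        if self.mounted:
            raise RuntimeError("unmount the volume before unbinding it")
        self.bound_to = None                 # volume becomes invisible to compute

    def expand(self, extra_gb: int) -> None:
        self.size_gb += extra_gb             # controller asks the array to expand the LUN
```

Modelling the services as guarded transitions makes the ordering explicit, which is also what lets an orchestration workflow run them without user interaction.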

Software-Defined Storage Extensibility

Definition: Application Programming Interface (API)

A set of programmatic instructions and specifications that provides an
interface for software components to communicate with each other. It
specifies a set of routines that can be called from a software
component enabling interaction with the software providing the API.

• Web-based APIs may be implemented as:
  ◦ Simple Object Access Protocol (SOAP)-based web services
  ◦ Representational State Transfer (REST) APIs

Notes

An API specifies a set of routines (operations), input parameters,
outputs/responses, data types, and errors. The routines can be called from a
software component, enabling it to interact with the software providing the API.
Thus, an API provides a programmable interface, which is a means of
communicating with an application without understanding its underlying
architecture. This functionality enables programmers to use a component-based
approach to build software systems. APIs may be precompiled code libraries used
from programming languages, or they may be web-based.

Need for APIs

• APIs enable integrating third-party data services and capabilities into existing
architecture
• In SDDC, APIs enable orchestration and provisioning of resources from pools
  ◦ Ensures that the SLAs organizations require are met
• In SDS, the REST API provides the interface to all underlying resources
  ◦ Enables storage provisioning, management, and metering
  ◦ Enables extension of functionality, and integration with external platforms
and applications

Notes

As modern technologies become more prevalent, the ability to dynamically adapt to
variations in application workloads and storage requirements is becoming
increasingly important. Next-generation software-defined data centers and cloud
stacks are powered by APIs. With advancements in technology, APIs are improving
communication and connectivity between IT systems, and increasing agility through
automation. APIs provide a flexible, easy-to-use means of integrating third-party
applications and capabilities into existing infrastructure. This integration also
provides a layer of security between public (external) and private (internal)
business capabilities, enabling organizations to provide services in the way they see
fit while offering end users various services. For example, a public cloud storage
provider may provide an API that allows a consumer-written application to access
and use cloud storage as regular storage. Similarly, online social networks may
provide APIs that enable developers to access the feeds of their users. Further, with
the advent of the Internet of Things, devices enabled with web-based APIs are
becoming common. APIs enable these smart devices to communicate with each
other and with applications. In a software-defined data center, APIs enable
automated provisioning of resources from compute, storage, and networking pools
to ensure that SLAs are met. The use of APIs enables software-defined storage to
be easily managed and provisioned. In SDS, the REST API provides the interface to
all underlying resources. Management interfaces use the API to provision, manage,
monitor, and meter logical storage resources. The API also provides a means to
integrate with multi-vendor storage systems and external storage platforms. It also
offers a programmable environment that enables developers and users to extend
SDS functionality.

Representational State Transfer (REST)

• REST is a client/server software architecture style
• Leverages HTTP methods for client/server interaction
• Used for developing “RESTful” APIs
• Provides an easy means to consume services, and combine multiple web
resources into applications

Notes

Representational State Transfer (REST) is a client/server software architecture
approach that was originally introduced for building large-scale, distributed
hypermedia (for example, hypertext, audio, video, image, and text) systems. REST
is not a standard but rather an architectural style that has become a popular choice
for developing HTTP-based APIs, called “RESTful” APIs. It leverages HTTP methods
such as GET, POST, PUT, and DELETE for client/server interaction. It supports a
resource-oriented architecture for developing scalable and lightweight web
applications while adhering to a set of constraints. REST-based communication
provides simple, human-readable data access methods. RESTful APIs do not
require XML-based web service protocols such as SOAP to support their
lightweight interfaces; however, they can still use XML as well as JSON data
formats. These services provide an easy means to consume services, and support
the combination of multiple web resources into new applications. Recent trends
show increasing adoption of REST for developing APIs that provide simple and
cost-effective request-based services and support the demand for real-time data.
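The resource-oriented style can be seen in a minimal, framework-free request handler. This hypothetical sketch exposes an in-memory collection at `/volumes` and `/volumes/{id}`. It uses the conventional mapping of POST to create, GET to read, PUT to update, and DELETE to delete, and returns standard HTTP status codes (201 Created, 204 No Content, 404 Not Found, 405 Method Not Allowed). It is not a real SDS API.

```python
import itertools
import json

# Hypothetical in-memory resource collection exposed at /volumes and /volumes/{id}.
volumes: dict = {}
_next_id = itertools.count(1)


def handle(method: str, path: str, body: str = "") -> tuple:
    """Dispatch a request REST-style: the URL names the resource and the
    HTTP method names the operation. Returns (status code, JSON body)."""
    parts = path.strip("/").split("/")
    if parts[0] != "volumes" or len(parts) > 2:
        return 404, ""
    vol_id = parts[1] if len(parts) == 2 else None

    if vol_id is None:                      # the collection: /volumes
        if method == "GET":                 # read: list volume IDs
            return 200, json.dumps(sorted(volumes))
        if method == "POST":                # create: server assigns the ID
            new_id = str(next(_next_id))
            volumes[new_id] = json.loads(body)
            return 201, json.dumps({"id": new_id})
        return 405, ""

    if method == "GET":                     # read one volume
        if vol_id not in volumes:
            return 404, ""
        return 200, json.dumps(volumes[vol_id])
    if method == "PUT":                     # update (replace) its representation
        volumes[vol_id] = json.loads(body)
        return 200, json.dumps(volumes[vol_id])
    if method == "DELETE":                  # delete it
        if volumes.pop(vol_id, None) is None:
            return 404, ""
        return 204, ""
    return 405, ""
```

Because every operation is expressed as a method on a named resource, any HTTP-capable client can drive the handler without knowing how the volumes are stored.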

Integrating External Management Tools and Applications

[Figure: External management interfaces and applications, external cloud/object storage services, and monitoring and reporting tools connect through the REST API to the SDS controller, which manages the storage systems.]

The REST API makes the SDS functionality extensible through integration with
written applications, external management tools, and cloud stacks such as
VMware, Microsoft, and OpenStack. This integration provides an alternative to
provisioning storage from the native management interface. The open platform
enables users and developers to write new data services, which helps build an open
development community around the platform.

The API also integrates with tools for monitoring and reporting system utilization,
performance, and health, which also enables generating chargeback/showback
reports. The API may also support cloud/object storage platforms such as Amazon
S3 and OpenStack Swift. Further, the API may support integration with HDFS
for running Hadoop applications.

The REST API:

• Describes the programmatic interfaces that allow users to create, read, update,
and delete resources through the HTTP methods POST, GET, PUT, and DELETE
• Is accessible using any web browser or programming platform that can issue
HTTP requests

The browser may require a special plugin, such as httpAnalyzer for Internet
Explorer, Poster for Firefox, or Postman for Chrome. The REST API may also be
accessed using scripting platforms such as Perl. Vendors may also provide class
libraries that enable developers to write applications that access the SDS data
services.
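Any platform that can issue HTTP requests can act as a REST client. The following Python sketch uses only the standard library `urllib.request` module to build, without sending, the four CRUD requests a script might issue. The base URL, the `block/volumes` resource path, and the JSON fields are hypothetical placeholders; a real SDS controller documents its own endpoints and also requires authentication, which is omitted here.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical controller endpoint; real products define their own base URL,
# resource paths, authentication, and payload schema.
BASE_URL = "https://sds-controller.example.com/api"


def build_request(method: str, resource: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request against an SDS REST API."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(f"{BASE_URL}/{resource}", data=data, method=method)
    req.add_header("Accept", "application/json")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# Conventional CRUD-to-HTTP mapping; the paths and fields are illustrative only.
create_req = build_request("POST", "block/volumes", {"name": "vol1", "size_gb": 100})
read_req = build_request("GET", "block/volumes/vol1")
update_req = build_request("PUT", "block/volumes/vol1", {"size_gb": 200})
delete_req = build_request("DELETE", "block/volumes/vol1")

# Sending is one call, e.g. urllib.request.urlopen(read_req), once a real
# controller URL and credentials are in place.
```

Keeping request construction separate from sending makes it easy to inspect or log exactly what a script would ask the controller to do before it touches any storage.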

Software-Defined Networking (SDN) Lesson

Introduction

This lesson presents an overview of Software-Defined Networking (SDN), the
architecture of SDN, and an SDN use case.

This lesson covers the following topics:


• Overview of Software-Defined Networking (SDN)
• Architecture of SDN
• Use case of SDN

Software-Defined Networking (SDN)

Software-Defined Networking Overview

Definition: Software-Defined Networking (SDN)

An approach to abstract and separate the control plane functions from
the data plane functions. Instead of integrated control functions at
the network component level, software external to the
components takes over the control functions. The software runs on a
compute system or a stand-alone device and is called the network
controller.

[Figure: VMs, each running an application (APP) on an OS, above a control plane (network OS) that is separated from the data plane.]

• Controller gathers configuration information from network components
• Controller provides instructions to data plane

Notes

Traditionally, a network component such as a switch or a router consists of a data
plane and a control plane. These planes are bundled together and implemented in
the firmware of the network components. The function of the data plane is to
transfer the network traffic from one physical port to another port by following rules
that are programmed into the component. The function of the control plane is to
provide the programming logic that the data plane follows for switching or routing of
the network traffic.

Software-defined networking is an approach to abstract and separate the control
plane functions from the data plane functions. Instead of integrated control
functions at the network component level, software external to the components
takes over the control functions. The software runs on a compute system or a
stand-alone device and is called the network controller. The network controller
interacts with the network components to gather configuration information and to
provide instructions for the data plane to handle the network traffic.
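The split between a controller and the data plane can be illustrated with a toy model. The `Switch` and `NetworkController` classes are hypothetical: each switch only looks up a destination in a flow table that it does not populate itself, and the controller gathers configuration and installs rules. Real controllers use southbound protocols such as OpenFlow, which this sketch does not model.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    """Data plane only: forwards by looking up rules and makes no decisions."""
    name: str
    flow_table: dict = field(default_factory=dict)  # destination -> output port

    def forward(self, dst: str):
        # Unknown destinations have no rule; None stands for "ask the controller".
        return self.flow_table.get(dst)


class NetworkController:
    """Control plane running outside the switches."""

    def __init__(self, switches):
        self.switches = {s.name: s for s in switches}

    def gather_config(self):
        # Collect the current forwarding state from every component.
        return {name: dict(s.flow_table) for name, s in self.switches.items()}

    def install_rule(self, switch: str, dst: str, port: int) -> None:
        # Push an instruction that the data plane will follow.
        self.switches[switch].flow_table[dst] = port
```

The switch's `forward` method never changes its own table; all programming logic lives in the controller, which is the separation of planes described above.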

Software-defined networking versus network virtualization: Network virtualization is
a process of abstracting all the network components and their functions into
software. SDN, by contrast, does not virtualize all the network components; it moves
the decision making to a control plane, and the hardware components execute the
actions based on those decisions. Although both allow flexible network operations,
they perform different roles and functions.

Software-Defined Networking Architecture

[Figure: SDN architecture. The application layer (application plane: applications) communicates through northbound APIs with the control layer (control plane: controllers), which communicates through southbound APIs with the infrastructure layer (data plane: networking devices).]

The architecture of SDN consists of three layers along with APIs in between to
define the communication.

• Infrastructure Layer: This layer consists of networking devices such as
switches and routers. It is responsible for handling data packets, such as
forwarding or dropping packets, and for handling the devices. This layer forms
the data plane and performs actions based on the instructions received.
• Control Layer: This layer consists of controllers and acts as the brain of the
SDN architecture. It is responsible for making decisions, such as how packets
should be forwarded based on the requirements, and relays the decisions to the
networking devices (data plane) for execution. It also extracts information about
the network from the data plane and communicates it to the application layer.
This layer forms the control plane.
• Application Layer: This layer consists of applications and services, such as
business applications and analytics, that define the network behavior through
policies and also define the requirements. It communicates the requirements
through the APIs to the control layer. This layer forms the application plane of
the SDN architecture.
• APIs: In SDN architecture, APIs are referred to as northbound interfaces and
southbound interfaces. Northbound interfaces define the communications
between the controller and the application layer. Southbound interfaces define the
communications between the control and infrastructure layers.
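The flow of a request through the three layers can be sketched as follows. An application calls a hypothetical northbound function, `northbound_connect`, to ask for connectivity. The controller computes a path over its global view of a made-up four-switch topology using breadth-first search (an assumption; real controllers may use other path-selection algorithms) and returns the per-device instructions that a southbound interface would push to the data plane.

```python
from collections import deque

# Hypothetical topology: switch -> {neighbor: local port leading to it}.
TOPOLOGY = {
    "s1": {"s2": 1, "s3": 2},
    "s2": {"s1": 1, "s4": 2},
    "s3": {"s1": 1, "s4": 2},
    "s4": {"s2": 1, "s3": 2},
}


def shortest_path(src, dst):
    """Breadth-first search over the controller's global view of the network."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nbr in TOPOLOGY[node]:
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    if dst not in prev:
        return []                      # unreachable
    path = [dst]
    while prev[path[-1]] is not None:  # walk back from dst to src
        path.append(prev[path[-1]])
    return path[::-1]


def northbound_connect(src, dst, flow):
    """Application-layer request: 'connect src to dst for this flow'.
    Returns southbound instructions, one (switch, flow, out_port) per hop."""
    path = shortest_path(src, dst)
    return [(hop, flow, TOPOLOGY[hop][nxt]) for hop, nxt in zip(path, path[1:])]
```

The application expresses only what it needs; the controller decides how, and each networking device receives nothing more than a rule to forward the flow out of a given port.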

Software-Defined Networking Benefits

Software-defined networking in a SAN provides several benefits, which are as
follows:

Centralized Control
• Provides a single point of control for the entire network infrastructure that may
span across data centers
• Centralized control plane provides the programming logic for transferring the
network traffic, which can be uniformly and quickly applied across the network
infrastructure
• Programming logic can be upgraded centrally to add new features based on
application requirements

Policy-based Automation
• Many hardware-based network management operations such as zoning can be
automated
• Management operations may be programmed in the network controller based on
business policies and best practices
• Reduces the need for manual operations that are repetitive, error-prone, and
time-consuming
• Helps to standardize the management operations

Simplified, Agile Management
• Network controller usually provides a management interface that includes a
limited and standardized set of management functions
• Management functions are available in a simplified form, abstracting the
underlying operational complexity
• Makes it easier to configure a network infrastructure and to modify the network
configuration to respond to changing application requirements


Notes

 Centralized Control: The software-defined approach provides a single point of
control for the entire network infrastructure, which may span multiple data centers.
The centralized control plane provides the programming logic for transferring
the network traffic, which can be uniformly and quickly applied across the
network infrastructure. The programming logic can be upgraded centrally to add
new features based on application requirements.
 Policy-based Automation: With a software-defined approach, many
hardware-based network management operations, such as zoning, can be automated.
Management operations may be programmed in the network controller based
on business policies and best practices. This reduces the need for manual
operations that are repetitive, error-prone, and time-consuming. Policy-based
automation also helps to standardize the management operations.
 Simplified, Agile Management: The network controller usually provides a
management interface that includes a limited and standardized set of
management functions. With policy-based automation in place, these
management functions are available in a simplified form, abstracting the
underlying operational complexity. This makes it easy to configure a
network infrastructure and to modify the network configuration to respond to
changing application requirements.
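
Zoning is the example given of a management operation that a controller can automate. The sketch below assumes a hypothetical policy that maps each host HBA to the storage ports it may reach, and generates one zone per initiator-target pair, a common single-initiator zoning practice. The function name, WWPN values, and zone naming scheme are made up; the resulting zones would still have to be applied to the fabric through the vendor's own interface.

```python
def build_zones(initiators: dict, targets: dict, policy: dict) -> dict:
    """Generate zones from a hypothetical access policy.

    initiators: host HBA name -> WWPN
    targets:    storage port name -> WWPN
    policy:     host HBA name -> list of storage port names it may reach
    """
    zones = {}
    for hba, ports in policy.items():
        for port in ports:
            # Exactly one initiator and one target per zone.
            zones[f"z_{hba}_{port}"] = [initiators[hba], targets[port]]
    return zones


initiators = {"hostA_hba0": "10:00:00:00:c9:00:00:01"}
targets = {
    "array1_p0": "50:06:01:60:00:00:00:01",
    "array1_p1": "50:06:01:61:00:00:00:01",
}
policy = {"hostA_hba0": ["array1_p0", "array1_p1"]}

zones = build_zones(initiators, targets, policy)
for name, members in zones.items():
    print(name, members)
```

Because the zones are derived from the policy, giving a new host access becomes a policy change rather than a manual, repetitive zoning session.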


Software-Defined Networking Use Case

Listed are some common use cases in which SDN is used to strengthen security,
automate processes for faster provisioning of network resources, and enable
business continuity.

Data Center Security
 Security against lateral movements
 Visibility of trends using analytics such as switch data
 Security policies and control for each workload

Automation
 Automated network provisioning
 Programmatically control the entire network environment

Business Continuity
 Hybrid cloud initiatives
 Disaster recovery

Note: Micro-segmentation is a method of isolating and securing the workloads by
defining various security policies and controls for each workload.
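
Micro-segmentation can be pictured as a default-deny check evaluated per workload. In this hypothetical Python sketch, the workload names, security groups, and ports are invented for illustration; products such as VMware NSX express the same idea through their own rule models.

```python
# Hypothetical security groups: workload name -> group.
GROUPS = {"web-01": "web", "web-02": "web", "app-01": "app", "db-01": "db"}

# Allowed flows as (source group, destination group, port). Anything not
# listed is denied, including traffic between workloads in the same group.
ALLOW = {("web", "app", 8443), ("app", "db", 5432)}


def is_allowed(src: str, dst: str, port: int) -> bool:
    """Default-deny check applied per workload, not per network segment."""
    return (GROUPS.get(src), GROUPS.get(dst), port) in ALLOW


print(is_allowed("web-01", "app-01", 8443))  # True
print(is_allowed("web-01", "db-01", 5432))   # False: web may not reach db directly
print(is_allowed("web-01", "web-02", 22))    # False: lateral movement blocked
```

Because unknown workloads map to `None`, they match no rule and are denied by default.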

Notes

 Data Center Security: Protecting information is a strategic necessity for
organizations. With SDN, organizations protect data through embedded security
that helps prevent credential theft and computer infiltration at both the
physical and virtual layers. SDN also provides visibility of trends through
analytics that offer insight into switch traffic. The micro-segmentation feature
of SDN lets organizations define security policies and controls for each workload
based on dynamic security groups. This helps ensure immediate responses to
threats inside the data center.
 Automation: Many organizations cannot change their networks fast enough to
keep up with new applications and workloads. With SDN, organizations can
bring up workloads in seconds or minutes using automated network
provisioning. There is no need to make major revisions to the physical network
every time the organization introduces an application or service. Changes can
be quickly made through software and require few, if any, cabling updates. IT
can programmatically create, snapshot, store, move, delete, and restore entire
networking environments with simplicity and speed. This automation of
networking tasks benefits both new application deployments and changes
to existing applications in the IT infrastructure.
 Business Continuity: SDN also simplifies and accelerates private and hybrid
cloud initiatives. Organizations can rapidly develop, automatically deliver, and
manage all their enterprise applications, whether they reside on-premises or off-
premises, from a single unified platform. IT can easily replicate entire
application environments to remote data centers for disaster recovery. It can
also move them from one corporate data center to another or deploy them into
a hybrid cloud environment, without disrupting the applications or touching the
physical network.


Concepts in Practice Lesson

Introduction

This section highlights technologies that are relevant to the topics covered in this
module.

This lesson covers the following topics:


 Dell EMC ViPR Controller
 Dell EMC VxFlex OS
 VMware NSX


Concepts in Practice


Dell EMC ViPR Controller

A software-defined storage platform that abstracts, pools, and automates a data
center’s physical storage infrastructure. It delivers block and file storage services
on demand through a self-service catalog. It supports data protection across
geographically dispersed data centers. It provides a single control plane to manage
heterogeneous storage environments, including Dell EMC and non-Dell EMC block
and file storage.

ViPR Controller also provides a REST-based API, making the storage architecture
extensible. It supports multiple vendors, enabling organizations to choose storage
platforms from either Dell EMC or third-party vendors. It also supports different
cloud stacks such as VMware, Microsoft, and OpenStack. ViPR Controller
development is driven by the open-source community, which enables expanding its
features and functionalities.

Dell EMC VxFlex OS

Software that creates a server- and IP-based SAN from direct-attached server
storage to deliver flexible and scalable performance and capacity on demand. As
an alternative to a traditional SAN infrastructure, VxFlex OS combines HDDs,
SSDs, and PCIe flash cards to create a virtual pool of block storage with varying
performance tiers. It decouples compute and storage, and scales each resource
together or independently to drive maximum efficiency and to eliminate wasted
CAPEX at scale. VxFlex OS favors distributed I/O parallelism over data locality:
it uses all resources to serve all I/O requests, which drives massive performance,
eliminates bottlenecks, and scales performance linearly.

VxFlex OS is built for workload variability and consolidates many workloads onto a
single system with consistent performance for all. For storage utilization, VxFlex
OS is completely operating system- and hypervisor-agnostic, which enables the
sharing of storage resources across multiple operating systems and clusters.
Regarding compute/RAM utilization, VxFlex OS is extremely lightweight. It provides
massive CAPEX savings at scale for core data center workloads.

VMware NSX

A network virtualization platform for the SDDC architecture. It reproduces the
network and its services in a virtualized environment. NSX provides software that
represents logical network components such as switches and routers, and
distributed services such as firewalls, load balancers, and VPN. It reproduces
Layer 2 to Layer 7 networking services, including switching, routing, firewalling,
and load balancing, in software.

VMware NSX lets you create, delete, save, and restore networks without changing
the physical network. This reduces provisioning time by simplifying overall
network operations. NSX Manager is integrated with vCenter for single-pane
management, and these network resources can be deployed in either a cloud or
a self-service portal environment.


Assessment

1. Which product creates an IP-based SAN from direct-attached server storage?

A. Dell EMC VxFlex OS

B. VMware NSX

C. VMware vSphere

D. Dell EMC NetWorker

2. Which layer represents the ‘brain’ of SDN architecture?

A. Control

B. Infrastructure

C. Application

D. API


Summary



Introduction to Business Continuity

Introduction

Upon completing this module, you will be able to:


 Explain business continuity (BC) and information availability
 Describe the causes and impact of information unavailability
 List various BC technology solutions


Business Continuity Overview Lesson

Introduction

This lesson presents the importance of business continuity, the causes and impact of
information unavailability, and the measurement of information availability. This
lesson also focuses on RPO and RTO, disaster recovery, and various BC technology
solutions.

This lesson covers the following topics:


 Importance of business continuity
 Causes and impact of information unavailability
 Measurement of information availability
 Recovery Point Objective (RPO) and Recovery Time Objective (RTO)
 Disaster recovery
 Business continuity technology solutions


Business Continuity Overview

Video: Business Continuity Overview

The video is located at


https://edutube.emc.com/Player.aspx?vno=0bTlYemOy9CMIfw5AoCgQw


Business Continuity

Definition: Business Continuity (BC)


Process that prepares for, responds to, and recovers from a system
outage that can adversely affect business operations.

 The BC process enables continuous availability of information and services in
the event of a failure, to meet the required SLA
 BC involves various proactive and reactive countermeasures
 It is important to automate the BC process to reduce manual intervention
 The goal of a BC solution is to ensure information availability

Notes

Business continuity (BC) is a set of processes that includes all activities that a
business must perform to mitigate the impact of planned and unplanned downtime.
BC entails preparing for, responding to, and recovering from a system outage that
adversely affects business operations. It describes the processes and procedures
an organization establishes to ensure that essential functions can continue during
and after a disaster.

Business continuity prevents interruption of mission-critical services, and
reestablishes the impacted services as swiftly and smoothly as possible by using
an automated process. BC involves proactive measures such as business impact
analysis, risk assessment, building a resilient IT infrastructure, and deploying data
protection solutions (backup and replication). It also involves reactive
countermeasures such as disaster recovery.

In a modern data center, policy-based services that include data protection can be
created through the self-service portal. Consumers can select the class of
service that best meets their performance, cost, and protection requirements on
demand. Once the service is activated, the underlying data protection solutions
required to support the service are automatically invoked to meet the required
data protection.


For example, if a service requires a VM backup every six hours, then the VM backup
is scheduled automatically every six hours. The goal of a BC solution is to
ensure “information availability” required to conduct vital business operations.
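
The six-hour example can be sketched as a catalog lookup followed by schedule generation. The service class names (`gold`, `silver`, `bronze`), their other intervals, and the `backup_schedule` function are hypothetical; only the six-hour interval comes from the example above.

```python
from datetime import datetime, timedelta

# Hypothetical service classes offered through a self-service catalog.
SERVICE_CLASSES = {
    "gold": {"backup_every_hours": 6},
    "silver": {"backup_every_hours": 12},
    "bronze": {"backup_every_hours": 24},
}


def backup_schedule(service_class: str, start: datetime, count: int) -> list:
    """Return the next `count` backup times implied by a service class."""
    hours = SERVICE_CLASSES[service_class]["backup_every_hours"]
    interval = timedelta(hours=hours)
    return [start + i * interval for i in range(1, count + 1)]


activated = datetime(2019, 1, 1, 0, 0)
for t in backup_schedule("gold", activated, 4):
    print(t.strftime("%Y-%m-%d %H:%M"))
# 2019-01-01 06:00
# 2019-01-01 12:00
# 2019-01-01 18:00
# 2019-01-02 00:00
```

The consumer only picks a class; the interval, and therefore the schedule, follows from the policy without manual setup.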


Importance of Business Continuity

Today, businesses rely on information more than ever. Continuous access to
information is a must for the smooth functioning of business operations for any
organization.

Listed are some important factors:

 Application Dependency: Business applications rely on data protection
techniques for uninterrupted and reliable access to data
 High-risk Data: Organizations seek to protect their sensitive data to reduce the
risk of financial, legal, and business loss
 Data Protection Laws: Legal requirements mandate protection against
unauthorized modification, loss, and unlawful processing of personal data

Notes

For business applications, it is essential to have uninterrupted, fast, reliable, and
secure access to data for enabling these applications to provide services. This
access, in turn, relies on how well the infrastructure and data are protected and
managed.

Data is the most valuable asset for an organization. An organization can use its
data to efficiently bill customers and to advertise relevant products to existing and
potential customers. Data also enables organizations to launch new products and
services, and to perform trend analysis to devise targeted marketing plans. This
sensitive data, if lost, may lead to significant financial, legal, and business loss,
apart from serious damage to the reputation of an organization. An organization
seeks to reduce the risk of sensitive data loss to operate its business successfully.
It should focus its protection efforts where the need exists: its high-risk data.


Many government laws mandate that an organization must be responsible for
protecting its employees’ and customers’ personal data. The data should be safe
from unauthorized modification, loss, and unlawful processing. Examples of such
laws are the U.S. Health Insurance Portability and Accountability Act (HIPAA), the
U.S. Gramm-Leach-Bliley Act (GLBA), and the U.K. Data Protection Act. An
organization must be proficient at protecting and managing personal data in
compliance with legal requirements.


Information Availability

Definition: Information Availability (IA)


The ability of an IT infrastructure to function according to business
requirements and customer expectations, during its specified time of
operation.

Information Availability can be defined in terms of:

 Accessibility: Information should be accessible to the right user when required.
 Reliability: Information should be reliable and correct in all aspects. It is “the
same” as what was stored, and there is no alteration or corruption to the
information.
 Timeliness: Defines the time window (a particular time of the day, week, month,
and year as specified) during which information must be accessible. For
example, if online access to an application is required between 8:00 am and
10:00 pm each day, any disruption to data availability outside of this time slot
is not considered to affect timeliness.
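
The timeliness rule reduces to an interval-overlap test. This sketch assumes the 8:00 am to 10:00 pm window from the example and an outage that starts and ends on the same day; the function name is illustrative.

```python
from datetime import time


def affects_timeliness(outage_start: time, outage_end: time,
                       window_start: time = time(8, 0),
                       window_end: time = time(22, 0)) -> bool:
    """True if a same-day outage overlaps the required access window.

    Assumes outage_start <= outage_end (no outage crosses midnight).
    """
    return outage_start < window_end and outage_end > window_start


print(affects_timeliness(time(23, 0), time(23, 59)))  # False: outside 8:00-22:00
print(affects_timeliness(time(21, 30), time(23, 0)))  # True: overlaps until 22:00
print(affects_timeliness(time(2, 0), time(8, 0)))     # False: ends as window opens
```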


Causes of Information Unavailability

 Application failure (for example: due to catastrophic exceptions caused by bad
logic)
 Data loss
 Infrastructure component failure (for example: due to power failure or disaster)
 Data center or site down (for example: due to power failure or disaster)
 Refreshing IT infrastructure

Notes

Data center failure due to disaster (natural or man-made disasters such as flood,
fire, earthquake, and so on) is not the only cause of information unavailability. Poor
application design or resource configuration errors can lead to information
unavailability. For example, if the database server is down for some reason, then
the data is inaccessible to the consumers, which leads to an IT service outage.

Unavailability of data due to other factors, such as data corruption and human
error, also leads to an outage. The IT department is routinely required to take on
activities such as refreshing the data center infrastructure, migration, running
routine maintenance, or even relocating to a new data center. Any of these
activities can have a significant negative impact on information availability.

Note: In general, the outages can be broadly categorized into planned and
unplanned outages.

 Planned outages may include installation and maintenance of new hardware,
software upgrades or patches, performing application and data restores, facility
operations (renovation and construction), and migration.
 Unplanned outages include failures caused by human errors, database
corruption, failure of physical and virtual components, and natural or
human-made disasters.


Impact of Information Unavailability

An IT service outage due to information unavailability results in loss of
productivity, loss of revenue, poor financial performance, and damage to
reputation. The loss of revenue includes direct loss, compensatory payments,
future revenue loss, billing loss, and investment loss. Damage to reputation
may result in a loss of confidence or credibility with customers, suppliers, financial
markets, banks, and business partners. Other possible consequences of an
outage include the cost of extra rented equipment, overtime, and extra shipping.

Know the downtime costs (per hour, day, two days, and so on):

Lost Productivity
- Number of employees impacted x hours out x hourly rate

Lost Revenue
- Direct loss
- Compensatory payments
- Lost future revenue
- Billing losses
- Investment losses

Damaged Reputation
- Customers
- Suppliers
- Financial markets
- Banks
- Business partners

Financial Performance
- Revenue recognition
- Cash flow
- Lost discounts (A/P)
- Payment guarantees
- Credit rating
- Stock price

Other Expenses
- Temporary employees, equipment rentals, overtime costs, extra shipping
costs, travel expenses, and so on.
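
The lost-productivity formula is simple arithmetic; the sketch below applies it and then adds other quantified losses for the same outage. The function names and all figures are hypothetical and only illustrate the calculation.

```python
def lost_productivity_cost(employees_impacted: int, hours_out: float,
                           hourly_rate: float) -> float:
    """Lost productivity = employees impacted x hours out x hourly rate."""
    return employees_impacted * hours_out * hourly_rate


def total_downtime_cost(productivity: float, other_losses: dict) -> float:
    """Add lost productivity to other quantified losses for the same outage."""
    return productivity + sum(other_losses.values())


# Hypothetical outage: 200 employees idle for 3 hours at $50 per hour.
productivity = lost_productivity_cost(200, 3, 50.0)
print(productivity)  # 30000.0

other = {"direct_revenue_loss": 45000.0, "overtime": 2500.0}  # hypothetical
print(total_downtime_cost(productivity, other))  # 77500.0
```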


Measurement of Information Availability

Information availability relies on the availability of both physical and virtual
components of a data center. The failure of these components might disrupt
information availability. A failure is the termination of a component’s ability to
perform a required function.

The component’s ability can be restored by performing various external corrective
actions, such as a manual reboot, a repair, or replacement of the failed
component(s). Proactive risk analysis, performed as part of the BC planning
process, considers the component failure rate and average repair time, which are
measured by mean time between failures (MTBF) and mean time to repair (MTTR).

[Figure: Incident timeline (axis: elapsed time) showing the incident, detection,
diagnosis, repair, and restoration; labels mark the response time, repair time, and
recovery time, the time to repair ('downtime'), and the time between failures
('uptime').]

MTBF: Average time available for a system or component to perform its normal
operations between failures
 MTBF = Total uptime / Number of failures

MTTR: Average time required to repair a failed component


 MTTR = Total downtime / Number of failures


IA = MTBF / (MTBF + MTTR) or IA = Uptime / (Uptime + Downtime)
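
The MTBF, MTTR, and IA formulas can be checked with a short Python sketch. The observation period (2,996 hours up, 4 hours down, 2 failures) is hypothetical.

```python
def mtbf(total_uptime: float, failures: int) -> float:
    """Mean time between failures = total uptime / number of failures."""
    return total_uptime / failures


def mttr(total_downtime: float, failures: int) -> float:
    """Mean time to repair = total downtime / number of failures."""
    return total_downtime / failures


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """IA = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical observation period: 2,996 hours up, 4 hours down, 2 failures.
up, down, n = 2996.0, 4.0, 2
a = availability(mtbf(up, n), mttr(down, n))
print(f"MTBF={mtbf(up, n)} h, MTTR={mttr(down, n)} h, IA={a:.4%}")

# Both forms of the formula give the same result.
print(abs(a - up / (up + down)) < 1e-12)  # True
```

The two forms agree because the number of failures cancels: MTBF / (MTBF + MTTR) = (uptime / n) / ((uptime + downtime) / n).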


Key BC Concepts: RPO and RTO

Recovery Point Objective (RPO): Point-in-time to which data must be recovered.
RPO = Amount of data loss that a business can endure.

Recovery Time Objective (RTO): Time within which systems and applications must
be recovered. RTO = Amount of downtime that a business can endure.

[Figure: Timeline (axis: time) marking a disaster, with the RPO measured back from
the disaster and the RTO measured forward from it.]

Notes

When designing an information availability strategy for an application or a service,
organizations must consider two important parameters that are closely associated
with recovery.

 Recovery Point Objective: RPO is the point-in-time to which data must be
recovered after an outage. It defines the amount of data loss that a business
can endure. Based on the RPO, organizations plan for the frequency with which
a backup or replica must be made. For example, if the RPO of a particular
business application is 24 hours, then backups are created every midnight. The
corresponding recovery strategy is to restore data from the last set of backups.
An organization can plan for an appropriate BC solution on the basis of the
RPO it sets.
 Recovery Time Objective: RTO is the time within which systems and
applications must be recovered after an outage. It defines the amount of
downtime that a business can endure and survive. Based on the RTO, an
organization can decide which BC technology is best suited. The more critical
the application, the lower the RTO should be.


Both RPO and RTO are measured in minutes, hours, or days and are directly related
to the criticality of the IT service and data. Usually, the lower the RTO and RPO,
the higher the cost of a BC solution or technology.
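
To relate RPO and RTO to a concrete plan, the sketch below checks a periodic-backup plan against both objectives. It assumes that the worst-case data loss equals one backup interval (a failure just before the next backup), ignores how long a backup takes to run, and treats the restore-time estimate as given; the function and parameter names are hypothetical.

```python
def meets_objectives(backup_interval_h: float, restore_time_h: float,
                     rpo_h: float, rto_h: float) -> dict:
    """Check a periodic-backup plan against RPO and RTO (all values in hours).

    Worst-case data loss with periodic backups is one full interval,
    so the interval must not exceed the RPO.
    """
    return {
        "rpo_met": backup_interval_h <= rpo_h,
        "rto_met": restore_time_h <= rto_h,
    }


# Nightly backups (24 h interval) with a 4-hour restore, against RPO 24 h / RTO 8 h.
print(meets_objectives(24, 4, rpo_h=24, rto_h=8))
# A 1-hour RPO would need more frequent backups or replication.
print(meets_objectives(24, 4, rpo_h=1, rto_h=8))
```

Tightening either objective pushes toward more frequent copies or faster recovery technology, which is why lower RPO and RTO values usually cost more.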


BC Planning Lifecycle

BC planning must follow a disciplined approach like any other planning process.
Organizations today dedicate specialized resources to develop and maintain BC
plans. From the conceptualization to the realization of the BC plan, a lifecycle of
activities can be defined for the BC process. The BC planning lifecycle includes five
stages:

Establish Objectives

 Determine BC requirements
 Estimate the scope and budget to achieve requirements
 Select a BC team that includes subject matter experts from all areas of
business, whether internal or external
 Create BC policies

Analyze

 Collect information on data profiles, business processes, infrastructure support,
dependencies, and frequency of using business infrastructure
 Conduct a business impact analysis
 Identify critical business processes and assign recovery priorities
 Perform risk analysis for critical functions and create mitigation strategies
 Perform cost-benefit analysis for available solutions based on the mitigation
strategy
 Evaluate options

Design and Develop


 Define the team structure and assign individual roles and responsibilities; for
example, different teams are formed for activities such as emergency response
and infrastructure and application recovery
 Design data protection strategies and develop infrastructure
 Develop contingency solution and emergency response procedures
 Detail recovery and restart procedures

Implement

 Implement risk management and mitigation procedures that include backup,
replication, and management of resources
 Prepare the DR sites that can be utilized if a disaster affects the primary data
center. The DR site could be one of the organization’s own data centers, or it
could be a cloud
 Implement redundancy for every resource in a data center to avoid single points
of failure

Train, Test, Assess, and Maintain

 Train the employees who are responsible for backup and replication of
business-critical data on a regular basis or whenever there is a modification in
the BC plan
 Train employees on emergency response procedures when disasters are
declared
 Train the recovery team on recovery procedures based on contingency
scenarios
 Perform damage-assessment processes and review recovery plans
 Test the BC plan regularly to evaluate its performance and identify its limitations
 Assess the performance reports and identify limitations
 Update the BC plans and recovery/restart procedures to reflect regular changes
within the data center


Key BC Concepts: Disaster Recovery

Definition: Disaster Recovery (DR)


A part of BC process, which involves a set of policies and procedures
for restoring IT infrastructure, including data that is required to support
ongoing IT services, after a natural or human-induced disaster occurs.

A disaster may impact the ability of a data center to remain up and provide services
to users. This disaster may cause information unavailability. Disaster recovery (DR)
mitigates the risk of information unavailability due to a disaster. It involves a set of
policies and procedures for restoring IT infrastructure including data. This
infrastructure and data are required to support the ongoing IT services after a
disaster occurs.

[Figure: Before a disaster, data is accessed at the primary site and replicated to the
DR site; after a disaster, data is accessed at the DR site.]

Notes

The fundamental principle of DR is to maintain a secondary data center or site,
called a DR site. The primary data center and the DR data center should be located
in different geographical regions to avoid the impact of a regional disaster. The DR
site must house a complete copy of the production data. Commonly, all production
data is replicated from the primary site to the DR site either continuously or
periodically. A backup copy can also be maintained at the DR site. Usually, the IT
infrastructure at the primary site is unlikely to be restored within a short time after a
catastrophic event.

Organizations often keep their DR site ready to restart business operations if there
is an outage at the primary data center. This may require the maintenance of a
complete set of IT resources at the DR site that matches the IT resources at the
primary site. Organizations can either build their own DR site or use the cloud to
build one.


Business Continuity Technology Solutions

High Availability and Data Protection Solutions:
 Implementing fault tolerance mechanisms
 Deploying data protection solutions
 Automatic failover mechanisms
 Architecting resilient modern applications

Notes

To meet the required information and service availability, organizations should
build a resilient IT infrastructure. Building a resilient IT infrastructure requires the
following high availability and data protection solutions:

 Deploying redundancy at both the IT infrastructure component level and the site
level to avoid single points of failure
 Deploying data protection solutions such as backup, replication, migration, and
archiving
 Automatic failover mechanisms are another important method and an efficient,
cost-effective way to ensure high availability (HA). For example, scripts can be
defined to bring up a new VM automatically when the current VM stops
responding or goes down.
 Architecting resilient modern applications
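
The failover script mentioned above can be sketched in Python. This is a simplified model under stated assumptions: the `VM` class stands in for a real VM (its `healthy` flag simulates heartbeat replies), and `provision_vm` is a hypothetical placeholder for a call to the virtualization or cloud platform. Requiring several consecutive missed heartbeats avoids failing over on a single transient timeout.

```python
import time
from dataclasses import dataclass


@dataclass
class VM:
    """Stand-in for a real VM; `healthy` simulates heartbeat replies."""
    name: str
    healthy: bool = True

    def heartbeat(self) -> bool:
        return self.healthy


def provision_vm(name: str) -> VM:
    # Hypothetical: a real script would call the platform's API here
    # to deploy a VM from a template.
    return VM(name)


def failover_if_unresponsive(active: VM, max_missed: int = 3,
                             interval_s: float = 0.0) -> VM:
    """Probe `active` up to `max_missed` times; fail over if every probe fails."""
    for _ in range(max_missed):
        if active.heartbeat():
            return active            # responded: keep the current VM
        time.sleep(interval_s)       # wait before the next probe
    # No response to any probe: bring up a replacement automatically.
    return provision_vm(active.name + "-replacement")


vm = VM("web-01")
print(failover_if_unresponsive(vm).name)  # web-01
vm.healthy = False                         # simulate the VM going down
print(failover_if_unresponsive(vm).name)  # web-01-replacement
```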

For example, when a disaster occurs at one of an organization’s data centers, the
BC process triggers the DR process. This process typically involves both manual
and automated procedures to reactivate the service (application) at a functioning
data center. Reactivating the service requires transferring application users,
VMs, data, and services to the new data center. This process involves the use of
redundant infrastructure across different geographic locations, live migration,
backup, and replication solutions.


Video: Business Continuity Solutions

The video is located at


https://edutube.emc.com/Player.aspx?vno=ENZH/0JvsHUSzSGDh4zpcg


Business Continuity Fault Tolerance Lesson

Introduction

This lesson presents key requirements for fault tolerance. This lesson also focuses
on component-level and site-level fault tolerance techniques.

This lesson covers the following topics:


 Key requirements for fault tolerance
 Component-level fault tolerance techniques
 Site-level fault tolerance techniques


Fault Tolerance IT Infrastructure

Fault Tolerance IT Infrastructure Overview

Definition: Fault Tolerance


Ability of an IT system to continue functioning in the event of a failure.

Fault tolerance ensures that a single fault or failure does not make an entire system
or a service unavailable. It protects an IT system or a service against various types
of unavailability.

[Figure: Fault tolerance deals with a fault, due to hardware failure, software issues,
or user errors, that causes an outage: transient, intermittent, or permanent
unavailability.]

Notes

A fault may cause a complete outage of a component, or it may cause a faulty
component to run but produce only incorrect or degraded output. The common
reasons for a fault or a failure are hardware failures, software issues, and
administrator or user errors. Fault tolerance ensures that a single fault or failure
does not make an entire system or a service unavailable.

Fault tolerance protects an IT system or a service against the following types of
unavailability:
 Transient unavailability: It occurs once for a short time and then disappears. For
example, an online transaction times out but works fine when a user retries the
operation.
 Intermittent unavailability: It is a recurring unavailability that is characterized by
an outage, then availability again, then another outage, and so on.
 Permanent unavailability: It exists until the faulty component is repaired or
replaced. Examples of permanent unavailability are network link outage,
application issues, and manufacturing defects.

Fault tolerance may be provided by software, hardware, or a combination of both.
The closer an organization gets to 100 percent fault tolerance, the more costly
the infrastructure becomes.
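
Transient unavailability is often handled in software by retrying, as in the online-transaction example. The sketch below, with hypothetical names, retries only errors marked as transient and backs off exponentially between attempts; any other exception is treated as non-transient and propagates immediately. Retrying does not fix intermittent or permanent unavailability, which still require diagnosing and repairing or replacing the faulty component.

```python
import time


class TransientError(Exception):
    """Raised for failures expected to clear on their own (e.g., a timeout)."""


def with_retries(operation, attempts: int = 3, base_delay_s: float = 0.1):
    """Retry `operation` with exponential backoff on TransientError only."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise                                # still failing: give up
            time.sleep(base_delay_s * 2 ** attempt)  # 1x, 2x, 4x ... the base delay


# Simulated online transaction that times out once, then succeeds.
calls = {"n": 0}


def submit_transaction():
    calls["n"] += 1
    if calls["n"] == 1:
        raise TransientError("timed out")
    return "committed"


print(with_retries(submit_transaction, base_delay_s=0.01))  # committed
```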


Key Requirements for Fault Tolerance

A fault-tolerant IT infrastructure should meet two key requirements: fault
isolation and eliminating single points of failure (SPOF).



Fault Isolation

Fault isolation limits the scope of a fault to a local area so that other areas of the system are not affected by it. It does not prevent a component from failing, but it ensures that the failure does not affect the overall system.

Fault isolation requires a fault detection mechanism that identifies the location of a fault, and a contained system design (such as a sandbox) that prevents a faulty system component from affecting other components.

[Figure: Fault isolation. A compute system (hypervisor hosting VMs) reaches a storage system through two paths, each with its own HBA, SAN, and storage port. The faulty path is isolated as a dead path, and pending I/Os are redirected to the live path.]

The example shows two I/O paths between a compute system and a storage system. The compute system uses both paths to send I/O requests to the storage system. If an error or fault causes a path to fail, the fault isolation mechanism in the environment automatically detects the failed path. It removes the failed path from the set of available paths and marks it as a dead path so that pending I/Os are not sent through it. All pending I/Os are redirected to the live path, which helps avoid time-out and retry delays.
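The dead-path behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular multipathing driver works: the `PathSet` class, its method names, and the path labels are hypothetical, and a real driver would detect the fault from I/O errors or timeouts rather than from an explicit `report_fault()` call.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PathSet:
    """Hypothetical set of I/O paths between one compute system and one storage system."""
    live: list
    dead: set = field(default_factory=set)
    pending: dict = field(default_factory=dict)  # path -> queue of I/Os not yet completed

    def submit(self, path, io):
        # Queue an I/O on a path, as if it had been issued but not yet completed.
        if path in self.dead:
            raise ValueError(f"{path} is isolated; I/O must use a live path")
        self.pending.setdefault(path, deque()).append(io)

    def report_fault(self, path):
        # Fault detection has located the failure on `path`: isolate it.
        if path not in self.live:
            return
        self.live.remove(path)
        self.dead.add(path)
        if not self.live:
            raise RuntimeError("no live path remains; the fault cannot be contained")
        # Redirect the pending I/Os from the dead path to a surviving live path.
        stranded = self.pending.pop(path, deque())
        self.pending.setdefault(self.live[0], deque()).extend(stranded)


# Usage: two paths, as in the illustration; path-A fails with two I/Os outstanding.
paths = PathSet(live=["path-A", "path-B"])
paths.submit("path-A", "write-1")
paths.submit("path-A", "write-2")
paths.submit("path-B", "read-1")
paths.report_fault("path-A")
```

After the fault, path-B carries its own I/O plus the two I/Os that were stranded on path-A, and no new I/O can be queued on the isolated path.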


Single Point of Failure

Definition: Single Point of Failure


Refers to any individual component or aspect of an infrastructure
whose failure can make the entire system or service unavailable.

Single points of failure may occur at the infrastructure component level and at the site (data center) level.

[Figure: Single points of failure at the compute level (compute system running a hypervisor and VMs), network level (FC switch), storage level (storage system), and site level (data center).]

The illustration shows how various IT infrastructure components, including the compute system, VM instance, network devices, storage, and the site itself, can become single points of failure. Assume that a web application runs on a VM instance and uses a database server, running on another VM, to store and retrieve application data. If the database server is down, the application cannot access its data, which in turn affects the availability of the service.

Consider another example in which a group of compute systems is networked through a single FC switch. The switch is a single point of failure: if it fails, all of the compute systems connected to it become inaccessible, and the service becomes unavailable. It is important for organizations to build a fault-tolerant IT infrastructure that eliminates single points of failure in the environment.


Eliminating Single Points of Failure

• Single points of failure can be avoided by implementing fault tolerance mechanisms such as redundancy
• Implement redundancy at the component level
  – Compute
  – Network
  – Storage
• Implement multiple availability zones
  – Avoid single points of failure at the data center (site) level
  – It is important to have high availability mechanisms that enable automated application/service failover

Notes

Highly available infrastructures are typically configured without single points of failure so that individual component failures do not result in service outages. The general method for avoiding single points of failure is to provide redundant components for each necessary resource, so that a service can continue with the available resources even if a component fails.

Organizations may also create multiple availability zones to avoid single points of failure at the data center level. Usually, each zone is isolated from the others, so that the failure of one zone does not affect the other zones. It is important to have high availability mechanisms that enable automated application/service failover within and across the zones in the event of a component failure or disaster.

Note:

N+1 redundancy is a common fault tolerance mechanism that ensures service availability if a component fails. A set of N components has at least one standby component. This approach is typically implemented as an active/passive arrangement, because the additional component does not actively participate in service operations. The standby component becomes active only if one of the active components fails.

N+1 redundancy with an active/active component configuration is also possible. In that case, all the components remain active. For example, if an active/active configuration is implemented at the site level, a service is fully deployed at both sites and its load is balanced between them. If one of the sites goes down, the surviving site handles the service operations and the entire workload.
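To make the two arrangements concrete, here is a minimal Python sketch. The class and function names, the component and site names, and the even load split are assumptions for illustration only; real systems detect failures and rebalance load with their own mechanisms.

```python
class NPlusOnePool:
    """Hypothetical pool of N active components plus standby components."""

    def __init__(self, active, standby):
        self.active = list(active)    # components serving the workload
        self.standby = list(standby)  # idle components (active/passive arrangement)

    def fail(self, component):
        # An active component fails: promote a standby so N components stay active.
        self.active.remove(component)
        if self.standby:
            self.active.append(self.standby.pop(0))


def share_per_site(total_load, sites):
    # Active/active: the load is balanced evenly across every site that is up.
    return {site: total_load / len(sites) for site in sites}


# Active/passive N+1: three power supplies plus one spare (illustrative names).
pool = NPlusOnePool(["psu-1", "psu-2", "psu-3"], ["psu-spare"])
pool.fail("psu-2")

# Active/active at the site level: both sites serve; one survives a site outage.
before = share_per_site(1000, ["site-1", "site-2"])
after = share_per_site(1000, ["site-2"])
```

In the active/passive case, capacity stays at N because the spare steps in; in the active/active case, nothing has to be promoted, but the surviving site must be sized to absorb the whole load.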


Implementing Redundancy at Component-Level

Organizations should follow stringent guidelines to implement fault tolerance in their data centers for uninterrupted services. The underlying IT infrastructure components (compute, storage, and network) should be highly available, and single points of failure at the component level should be avoided.

Techniques to protect compute, network, and storage: clustering, VM live migration, link and switch aggregation, NIC teaming, multipathing, configuring redundant hot-swappable components, RAID and erasure coding, dynamic disk sparing, and configuring redundant storage system components.

[Figure: Component-level redundancy. Clustered compute systems, each running a hypervisor with VMs, use NIC teaming and redundant HBAs, connect through redundant links and redundant FC switches to a storage system with redundant ports, and reach clients and a remote site over a redundant LAN/WAN network.]

Notes

The example shows an infrastructure designed to mitigate single points of failure at the component level. Single points of failure at the compute level can be avoided by implementing redundant compute systems in a clustered configuration. Single points of failure at the network level can be avoided through path and node redundancy and various fault tolerance protocols.

Multiple independent paths can be configured between nodes so that if a component along the main path fails, traffic is rerouted along another path. The key techniques for protecting storage from single points of failure are RAID, erasure coding, dynamic disk sparing, and configuring redundant storage system components. Many storage systems also support a redundant array of independent nodes (RAIN) architecture to improve fault tolerance.


Compute Clustering

[Figure: Compute cluster whose nodes exchange heartbeat signals; when a node fails, its service fails over to another node.]

• Two or more compute systems/hypervisors are clustered to provide high availability and load balancing
• A service running on a failed compute system moves to another compute system
• Two common clustering implementations are:
  – Active/active
  – Active/passive

Notes

Compute clustering is one of the key fault tolerance mechanisms. It provides continuous availability of a service even when a VM instance, physical compute system, operating system, or hypervisor fails.

Clustering is a technique in which at least two compute systems (or nodes) work together and are viewed as a single compute system to provide high availability and load balancing. If one of the compute systems fails, the service running on it can fail over to another compute system in the cluster. This method minimizes or avoids outages.

The two common cluster implementations are active/active and active/passive.


• In active/active clustering, all the nodes in a cluster are active participants and run the same service for their clients. The active/active cluster balances service requests among the nodes. If one of the nodes fails, the surviving nodes take over its load. This method enhances both the performance and the availability of a service. The nodes in the cluster have access to shared storage volumes, but only one node can write or update a given piece of data in a shared file system or database at any one time.
• In active/passive clustering, the service runs on one or more active nodes while a passive node waits for a failover. If an active node fails, the service that was running on it fails over to the passive node. Active/passive clustering does not provide the performance improvement that active/active clustering does.

Clustering uses a heartbeat mechanism to determine the health of each node in the cluster. The exchange of heartbeat signals, which usually happens over a private network, enables participating cluster members to monitor one another's status.
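A heartbeat check boils down to "how long since we last heard from each node?". The sketch below assumes a fixed timeout and an injectable clock so the example is deterministic; the class, the timeout value, and the node names are hypothetical, and production cluster software typically adds safeguards (such as quorum) before acting on a missed heartbeat.

```python
import time


class HeartbeatMonitor:
    """Hypothetical monitor: a node is presumed failed if it misses its heartbeats."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s   # how long silence is tolerated before failover
        self.clock = clock           # injectable so the example is deterministic
        self.last_seen = {}          # node -> time its last heartbeat arrived

    def heartbeat(self, node):
        self.last_seen[node] = self.clock()

    def failed_nodes(self):
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout_s)


# Usage with a fake clock (seconds): node-2 stops sending heartbeats after t=0.
now = [0.0]
monitor = HeartbeatMonitor(timeout_s=3.0, clock=lambda: now[0])
monitor.heartbeat("node-1")
monitor.heartbeat("node-2")
now[0] = 2.0
monitor.heartbeat("node-1")
now[0] = 4.0
suspects = monitor.failed_nodes()
```

The timeout is a trade-off: too short, and a brief delay on the heartbeat network triggers an unnecessary failover; too long, and a real failure goes unhandled for longer.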

Clustering can be implemented between multiple physical compute systems, between multiple VMs, between a VM and a physical compute system, or between multiple hypervisors.


Compute Cluster Example

• Multiple hypervisors running on different systems are clustered
• Provides continuous availability of services running on VMs

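[Figure: Hypervisor clustering. The hypervisor of the primary VM records VM events and sends them as logging traffic to the hypervisor of the secondary VM, which replays them and returns an acknowledgement; both hypervisors share a storage system over the network.]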

Notes

The illustration shows an example in which multiple hypervisors running on different compute systems are clustered. They access the hypervisor's native file system, which is a clustered file system that enables multiple hypervisors to access the same shared storage resources concurrently. This method provides high availability for services running on VMs by pooling the VMs and the compute systems they reside on into a cluster.

If a physical compute system running a VM fails, the VM is restarted on another compute system in the cluster. This provides rapid recovery of services running on VMs when a compute system fails. In some hypervisor cluster implementations, the hypervisor uses its native technique to provide continuous availability of services running on VMs even if a physical compute system or a hypervisor fails.


In this implementation, a live instance of a primary VM (a secondary VM) is created on another compute system. The primary and secondary VMs exchange heartbeats. If the primary VM fails because of a hardware failure, clustering enables immediate failover to the secondary VM. After this transparent failover, a new secondary VM is created and redundancy is reestablished.

The hypervisor running the primary VM, as shown in the illustration, captures the sequence of events for the primary VM, including instructions from the virtual I/O devices, virtual NICs, and so on. It then transfers these event sequences to the hypervisor running on the other compute system. The hypervisor running the secondary VM receives the event sequences and sends them to the secondary VM for execution.

The primary and the secondary VMs share the same storage, but all output
operations are performed only by the primary VM. A locking mechanism ensures
that the secondary VM does not perform write operations on the shared storage.
The hypervisor posts all events to the secondary VM at the same execution point
as they occurred on the primary VM. This way, these VMs “play” the same set of
events and their states are synchronized with each other.
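The record-and-replay idea can be reduced to a toy Python model. Everything here is hypothetical: the `ReplicaVM` class, the event format, and the integer "state" stand in for real VM state, and a real implementation captures events at precise execution points rather than simply replaying a list right after the primary executes each entry.

```python
class ReplicaVM:
    """Toy VM whose state changes only through the events it executes."""

    def __init__(self, role):
        self.role = role   # "primary" or "secondary"
        self.state = 0     # stand-in for CPU and memory state

    def execute(self, event, shared_storage):
        kind, value = event
        if kind == "input":
            # Deterministic: the same input sequence produces the same state.
            self.state += value
        elif kind == "write" and self.role == "primary":
            # Only the primary performs output; the secondary's write is suppressed,
            # as the locking mechanism on the shared storage would ensure.
            shared_storage.append((value, self.state))


def run_ft_pair(events):
    shared_storage = []
    primary, secondary = ReplicaVM("primary"), ReplicaVM("secondary")
    for event in events:
        primary.execute(event, shared_storage)     # event occurs on the primary
        secondary.execute(event, shared_storage)   # logged, shipped, replayed in order
    return primary, secondary, shared_storage


events = [("input", 5), ("write", "blk-1"), ("input", 2), ("write", "blk-2")]
primary, secondary, storage = run_ft_pair(events)

# The primary's host fails: the secondary takes over with identical state.
secondary.role = "primary"
secondary.execute(("write", "blk-3"), storage)
```

Because both replicas executed the same events in the same order, the secondary's state matches the primary's at the moment of failover, and only one copy of each write ever reaches the shared storage.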


Network Fault Tolerance Mechanisms

Even a brief network interruption could affect many services running in a data center. Therefore, the network infrastructure must be fully redundant and highly available, with no single points of failure. The following techniques provide fault tolerance against link failure:

Link Aggregation

[Figure: Link aggregation between two FC switches]

• Combines links between two switches and also between a switch and a node
• Enables network traffic failover in the event of a link failure in the aggregation

NIC Teaming

[Figure: NIC teaming. Teaming software on a physical compute system groups physical NICs into a logical NIC connected to a physical switch, providing load distribution and failover.]


• Groups NICs so that they appear as a single, logical NIC to the operating system or hypervisor
• Provides network traffic failover in the event of a NIC/link failure
• Distributes network traffic across NICs

Multipathing

• Enables a compute system to use multiple paths for transferring data to a LUN
• Enables failover by redirecting I/O from a failed path to another active path
• Performs load balancing by distributing I/O across active paths

[Figure: Multipathing. Multipathing software in the hypervisor of a physical compute system uses HBA1 and HBA2, two FC switches, and SC1 and SC2 on the storage system to provide four paths (Path1 to Path4) to a LUN.]

Elastic Load Balancing

• Enables dynamic distribution of application and client I/O traffic
• Dynamically scales resources (VM instances) to meet traffic demands
• Provides fault tolerance capability by detecting unhealthy VM instances and automatically redirecting the I/Os to other healthy VM instances

Notes

Even a brief network interruption could affect many services running in a data center. Therefore, the network infrastructure must be fully redundant and highly available, with no single points of failure. Techniques such as link aggregation, NIC teaming, multipathing, and load balancing provide fault tolerance against link failure.

• Link aggregation combines two or more network links into a single logical link, called a port channel, yielding higher bandwidth than a single link could provide. Link aggregation distributes network traffic across the links and provides traffic failover if a link fails: if a link in the aggregation is lost, all network traffic on that link is redistributed across the remaining links.
• NIC teaming groups NICs so that they appear as a single, logical NIC to the OS or hypervisor. NIC teaming provides network traffic failover to prevent connectivity loss if a NIC fails or a network link goes down. Some NIC teaming configurations also aggregate the network bandwidth of the individual NICs, which allows network traffic to be distributed across the NICs in the team.
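A common way to spread traffic across the members of a port channel or NIC team is to hash each flow onto one link, which keeps a flow's packets in order on a single link. The sketch below assumes a CRC32 hash of a made-up flow label and plain modulo selection; actual switches and teaming drivers use their own, often configurable, hash inputs, and the function and link names are hypothetical.

```python
import zlib


def assign_flows(flows, member_links):
    """Map each flow to one link of a port channel by hashing the flow identifier."""
    if not member_links:
        raise ValueError("port channel has no active member links")
    return {
        flow: member_links[zlib.crc32(flow.encode()) % len(member_links)]
        for flow in flows
    }


flows = [f"10.0.0.{i}->10.0.1.9" for i in range(1, 9)]
before = assign_flows(flows, ["link-1", "link-2", "link-3"])
# link-2 is lost: recompute over the remaining members of the aggregation.
after = assign_flows(flows, ["link-1", "link-3"])
```

Plain modulo rehashing may also move some flows that were already on surviving links; the point of the sketch is only that, after the recomputation, no traffic is mapped to the failed link.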

Multipathing enables organizations to meet aggressive availability and performance service levels. It enables a compute system to use multiple paths for transferring data to a LUN on a storage system. Multipathing enables automated path failover, which eliminates the possibility of disrupting an application or service because of the failure of an adapter, cable, port, and so on. When path failover happens, all outstanding and subsequent I/O requests are automatically directed to alternative paths.

To use multipathing, multiple paths must exist between the compute and storage systems. Each path can be configured as either active or standby. If one or more active paths fail, standby paths become active. When an active path fails, the multipathing process detects the failed path and redirects its I/Os to another active path.

Multipathing can be an integrated operating system or hypervisor function, or a third-party software module installed on the operating system or hypervisor. The illustration shows a configuration in which four paths between the physical compute system (with dual-port HBAs) and the LUN enable multipathing. Multipathing can perform load balancing by distributing I/O across all active paths.
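The active/standby path logic can be sketched as follows. The class, the path names, and the round-robin policy are assumptions for illustration. This is not the interface of PowerPath or of any operating system's native multipathing, which offer several load-balancing policies of their own.

```python
class MultipathDevice:
    """Hypothetical multipathing layer for a single LUN."""

    def __init__(self, active, standby=()):
        self.active = list(active)     # paths currently carrying I/O
        self.standby = list(standby)   # paths held in reserve
        self._cursor = 0               # round-robin position

    def path_failed(self, path):
        # Take the failed path out of service; promote a standby path if one exists.
        if path in self.active:
            self.active.remove(path)
            if self.standby:
                self.active.append(self.standby.pop(0))
        elif path in self.standby:
            self.standby.remove(path)

    def route(self, io):
        # Load balancing: spread I/O round-robin over the active paths only.
        if not self.active:
            raise IOError(f"all paths to the LUN have failed; cannot issue {io}")
        path = self.active[self._cursor % len(self.active)]
        self._cursor += 1
        return path


lun = MultipathDevice(active=["path1", "path2"], standby=["path3", "path4"])
first = [lun.route(f"io-{n}") for n in range(4)]
lun.path_failed("path1")          # for example, an HBA, cable, or port failure
second = [lun.route(f"io-{n}") for n in range(4, 6)]
```

Before the failure, I/O alternates between path1 and path2. Afterward, it alternates between path2 and the promoted standby path3, and the application never sees the failed path.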

Elastic load balancing enables dynamic distribution of application and client I/O traffic among VM instances. It dynamically scales resources (VM instances) to meet traffic demands. The load balancer provides fault tolerance by detecting unhealthy VM instances and automatically redirecting I/Os to other, healthy VM instances.
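The health-check-and-redirect behavior, plus a simple scale-out rule, can be sketched as below. The class, the health-check callable, the per-instance capacity figure, and the launch function are hypothetical, and the sketch assumes newly launched instances pass their health checks.

```python
import math


class ElasticLoadBalancer:
    """Hypothetical load balancer: routes only to healthy instances and scales out."""

    def __init__(self, instances, is_healthy):
        self.instances = list(instances)
        self.is_healthy = is_healthy   # health-check callable: instance -> bool
        self._cursor = 0

    def healthy(self):
        return [vm for vm in self.instances if self.is_healthy(vm)]

    def route(self, request):
        targets = self.healthy()
        if not targets:
            raise RuntimeError(f"no healthy instance for {request}")
        # Round-robin across healthy instances only; unhealthy ones are skipped.
        target = targets[self._cursor % len(targets)]
        self._cursor += 1
        return target

    def scale_to(self, demand, capacity_per_instance, launch):
        # Launch enough new instances that healthy capacity covers current demand.
        needed = math.ceil(demand / capacity_per_instance)
        for _ in range(max(0, needed - len(self.healthy()))):
            self.instances.append(launch(len(self.instances) + 1))


down = {"vm-2"}                          # vm-2 fails its health check
lb = ElasticLoadBalancer(["vm-1", "vm-2"], is_healthy=lambda vm: vm not in down)
routed = [lb.route(r) for r in ("req-1", "req-2", "req-3")]
lb.scale_to(demand=250, capacity_per_instance=100, launch=lambda n: f"vm-{n}")
```

All three requests go to vm-1 because vm-2 is unhealthy, and a demand of 250 units at 100 units per instance brings the healthy pool up to three instances.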


Storage Fault Tolerance Mechanisms

Data centers contain storage systems with large numbers of disk drives and solid state drives. These storage systems support the various applications and services running in the environment. The failure of these drives could result in data loss and information unavailability, and the greater the number of drives in use, the greater the probability of a drive failure.

The following techniques provide data protection in the event of drive failure:

RAID

• Provides data protection against one or two drive failures
  – Example: RAID 6 (dual distributed parity), where data is protected against two disk failures

A1  A2  Ap  Aq
B1  Bp  Bq  B2
Cp  Cq  C1  C2

RAID 6 - Dual Distributed Parity (each column is a drive; p and q are the two parity blocks of each stripe)

Erasure Coding

Erasure coding provides space-optimal data redundancy that protects against data loss from multiple drive failures.
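The space advantage of erasure coding can be shown with simple arithmetic. A k+m scheme that places each fragment on a different drive survives the loss of any m fragments while consuming (k+m)/k units of raw capacity per unit of data. The sketch assumes a maximum distance separable (MDS) code such as Reed-Solomon, for which that property holds. The layouts chosen and the function name are illustrative.

```python
from fractions import Fraction


def protection_profile(k, m):
    """k data fragments plus m coding fragments, assuming an MDS code (e.g. Reed-Solomon)."""
    return {
        "drive_failures_tolerated": m,                  # any m of the k + m may be lost
        "raw_capacity_per_usable": Fraction(k + m, k),  # storage overhead factor
    }


mirror_3way = protection_profile(k=1, m=2)   # three full copies of the data
raid6_stripe = protection_profile(k=2, m=2)  # one stripe of the figure: A1 A2 Ap Aq
ec_4_plus_2 = protection_profile(k=4, m=2)
ec_12_plus_4 = protection_profile(k=12, m=4)
```

Three-way mirroring and a 4+2 layout both tolerate two drive failures, but the erasure-coded layout needs 1.5 times the raw capacity of the data instead of 3 times; wider layouts such as 12+4 tolerate more failures at even lower overhead.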

Dynamic Disk Sparing

• Automatically replaces a failed drive with a spare drive to protect against data loss
• Multiple spare drives can be configured to improve availability


Storage Virtualization

[Figure: Storage resiliency using virtualization. A virtualization appliance in the SAN presents a virtual volume, built from a storage pool, to the hypervisor; the volume's I/Os are mirrored to LUNs (mirror legs) on two storage systems.]

• A virtual volume is created using a virtualization appliance
  – Each I/O to the volume is mirrored to the LUNs on the storage systems
• The virtual volume is continuously available to the compute system
  – Even if one of the storage systems is unavailable because of a failure

The illustration shows an example of a virtual volume that is mirrored between LUNs on two different storage systems. Each I/O to the virtual volume is mirrored to the underlying LUNs on the storage systems. If one of the storage systems incurs an outage because of a failure or maintenance, the virtualization appliance continues processing I/O on the surviving mirror leg. When the failed storage system is restored, the data from the surviving LUN is resynchronized to the recovered leg. This method provides protection and high availability for critical services if a storage system fails.
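The mirror-leg behavior can be modeled with two dictionaries standing in for the LUNs. This is a hypothetical sketch: the class and leg names are invented, and for simplicity it resynchronizes by copying the whole surviving leg, whereas a production appliance would more likely track and copy only the regions written during the outage.

```python
class MirroredVirtualVolume:
    """Hypothetical virtual volume mirrored across LUNs on two storage systems."""

    def __init__(self):
        self.legs = {"A": {}, "B": {}}        # storage system -> {block: data}
        self.online = {"A": True, "B": True}

    def _surviving(self):
        legs = [leg for leg, up in self.online.items() if up]
        if not legs:
            raise IOError("both mirror legs are unavailable")
        return legs

    def write(self, block, data):
        for leg in self._surviving():          # each I/O is mirrored to every online leg
            self.legs[leg][block] = data

    def read(self, block):
        return self.legs[self._surviving()[0]][block]

    def fail(self, leg):
        self.online[leg] = False               # outage or maintenance

    def restore(self, leg):
        source = self._surviving()[0]
        self.legs[leg] = dict(self.legs[source])  # resynchronize from the surviving leg
        self.online[leg] = True


vol = MirroredVirtualVolume()
vol.write(0, "alpha")
vol.fail("B")              # storage system B goes down
vol.write(1, "beta")       # I/O continues on the surviving leg
value = vol.read(1)
vol.restore("B")
```

The compute system keeps reading and writing the same virtual volume throughout, and once leg B is back it holds the block written while it was offline.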

Notes

Dynamic disk sparing is a fault tolerance mechanism in which a spare drive automatically replaces a failed disk drive by taking over its identity. A spare drive should be large enough to accommodate the data from a failed drive. Some systems implement multiple spare drives to improve data availability.

In dynamic disk sparing, when the recoverable error rate for a disk exceeds a predetermined threshold, the disk subsystem automatically tries to copy data from the failing disk to the spare drive. If this task completes before the damaged disk fails, the subsystem switches to the spare disk and marks the failing disk as unusable. Otherwise, it uses parity or the mirrored disk to recover the data.
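The decision sequence above maps onto a small function. The enum values, parameter names, and the idea of feeding it a simple error count are assumptions for illustration; a real disk subsystem evaluates these conditions continuously in firmware.

```python
from enum import Enum


class SparingAction(Enum):
    NONE = "keep using the drive"
    PROACTIVE_COPY = "copy data from the failing drive to the spare"
    SWAP_TO_SPARE = "switch to the spare; mark the failing drive unusable"
    REBUILD_FROM_REDUNDANCY = "rebuild the spare from parity or the mirror"


def sparing_action(recoverable_errors, threshold, copy_complete, drive_failed):
    """Hypothetical decision logic for one drive, following the steps described above."""
    if copy_complete:
        # The copy finished before the drive died: the spare takes over its identity.
        return SparingAction.SWAP_TO_SPARE
    if drive_failed:
        # The drive failed first (or without warning): recover data from redundancy.
        return SparingAction.REBUILD_FROM_REDUNDANCY
    if recoverable_errors > threshold:
        return SparingAction.PROACTIVE_COPY
    return SparingAction.NONE
```

For example, `sparing_action(12, 10, copy_complete=False, drive_failed=False)` starts the proactive copy, while the same call with `drive_failed=True` falls back to rebuilding from parity or the mirror.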

Storage resiliency can also be achieved by using a storage virtualization appliance. A virtualization layer created in the SAN by the appliance abstracts the identity of physical storage devices and creates a storage pool from heterogeneous storage systems. A virtual volume is created from the storage pool and assigned to the compute system.

Instead of being directed to the LUNs on the individual storage systems, the compute systems are directed to the virtual volume provided by the virtualization layer.


Fault Tolerance at Site-Level – Availability Zones

An availability zone is a location with its own set of resources that is isolated from other zones. A zone can be an entire data center or a part of a data center.
• Enables running multiple service instances within and across zones to survive a data center or site failure
• If there is an outage, the service should seamlessly fail over across the zones

Zones within a particular region are typically connected through a low-latency network to enable faster service failover.

Notes

An important high availability design best practice is to create availability zones. An availability zone is a location with its own set of resources that is isolated from other zones, so a failure in one zone does not affect the other zones. A zone can be part of a data center or an entire data center.

This method provides redundant computing facilities on which applications or services can be deployed. Organizations can deploy multiple zones within a data center (to run multiple instances of a service), so that if one of the zones incurs an outage for any reason, the service can fail over to the other zone.

For example, if two compute systems are deployed, one in zone A and the other in zone B, the probability that both go down simultaneously because of an external event is low. This simple strategy enables an organization to construct highly reliable web services by placing compute systems in multiple zones, so that the failure of one zone does not disrupt the service or, at least, the service can be rapidly reconstructed in the second zone.
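Why that probability is low can be shown with a short calculation. It assumes, hypothetically, that each zone is unavailable 1 percent of the time, that the two zones fail independently, and that the service stays up as long as either zone is up. A shared dependency such as common power or networking would break the independence assumption, which is why zones are isolated from each other.

```latex
\begin{aligned}
P(\text{A and B down}) &= p_A \, p_B && \text{(independent zones)} \\
                       &= 0.01 \times 0.01 = 10^{-4} \\
P(\text{service available}) &= 1 - 10^{-4} = 99.99\% && \text{vs. } 99\% \text{ with one zone}
\end{aligned}
```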

Organizations also deploy multiple zones across geographically dispersed data centers (to run multiple instances of a service). This helps services survive even a failure at the data center level.

It is also important to have a mechanism that enables seamless (automated) failover of services running in one zone to another. Automated failover provides a lower RTO than a manual process. The failover process also depends on other capabilities, including replication and live migration, and on a reliable network infrastructure between the zones.


Fault Tolerance at Site-Level – Example

High availability can be achieved by moving services, without user interruption, across zones that are located in different places. Services can be moved across zones by implementing a stretched cluster.

A stretched cluster is a cluster with compute systems in different remote locations that provides DR capability if a disaster occurs in one of the data centers. Stretched clusters are typically built to create active/active zones that provide high availability and enable dynamic workload balancing across zones.

[Figure: Stretched cluster across Zone A and Zone B. Each zone has a compute system running a hypervisor and VMs, an FC SAN, a virtualization appliance, and a storage system. The appliances, linked over FC/IP, form a virtualization layer that presents a virtual volume from a storage pool built on LUNs in both storage systems.]

Notes

The illustration also shows a virtual volume created from federated storage resources across zones. The virtualization appliance can mirror the data of a virtual volume between LUNs located in two different storage systems at different locations.

Each I/O from a host to the virtual volume is mirrored to the underlying LUNs on the storage systems. If an outage occurs at one of the data centers, for example in zone A, the VMs running in zone A can be restarted in zone B without affecting service availability.

This setup also provides access to storage even if one of the storage systems is unavailable. If the storage system in zone A is unavailable, the hypervisor running there can still access the virtual volume, because the data is available from the storage system in zone B.


Resilient Application Overview

• Applications have to be designed to deal with IT resource failures to guarantee the required availability
• Fault-resilient applications have logic to detect and handle transient fault conditions to avoid application downtime
• Examples of key application design strategies for improving availability:
  – Graceful degradation of application functionality
  – Retry logic in application code
  – Persistent application state model

Notes

Today, organizations typically build their IT infrastructure from commodity systems to achieve scalability and keep hardware costs down. In this environment, it is assumed that some components will fail. Therefore, the failure of individual resources often has to be anticipated in the design of an application to ensure acceptable availability.

A reliable application properly manages the failure of one or more modules and continues operating properly. If a failed operation is retried a few milliseconds later, it may succeed; such error conditions are called transient faults. Fault-resilient applications have logic to detect and handle transient fault conditions in order to avoid application downtime.


Key Application Design Strategies for Improving Availability

Graceful Degradation

• Application maintains limited functionality even when some of the modules or supporting services are not available
• Unavailability of certain application components or modules should not bring down the entire application

Fault Detection and Retry Logic

• Refers to a mechanism that implements logic in the code of an application to improve availability
• Detects and retries a service that is temporarily down, which may result in successful restoration of the service

Persistent Application State Model

• Application state information is stored outside memory
  – Stored in a data repository
  – If an instance fails, the state information is still available in the repository

Notes

Graceful degradation refers to the ability of an application to maintain limited functionality even when some of its components, modules, or supporting services are not available. The purpose of graceful degradation is to prevent the complete failure of a business application.

For example, consider an eCommerce application that consists of modules such as product catalog, shopping cart, order status, order submission, and order processing. Assume that, because of some problem, the payment gateway is unavailable. The order processing module of the application cannot continue. If the application is not designed to handle this scenario, the entire application might go offline.

However, in the same scenario it is still possible to keep the product catalog module available so that consumers can view the product catalog. The application could also let consumers place orders and hold them in the shopping cart. The orders can then be processed when the payment gateway is available again or after failing over to a secondary payment gateway.
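The eCommerce scenario can be sketched as an application that catches the payment failure instead of going offline. The exception type, class, catalog contents, and gateway callables are all hypothetical stand-ins for a real payment client.

```python
class PaymentGatewayDown(Exception):
    """Raised by the (hypothetical) payment client when the gateway is unreachable."""


class ShopApp:
    def __init__(self, charge):
        self.charge = charge                  # callable that contacts a payment gateway
        self.catalog = ["laptop", "phone"]    # served without the payment gateway
        self.pending_orders = []              # orders accepted while payment is down

    def browse_catalog(self):
        return list(self.catalog)

    def submit_order(self, order):
        try:
            self.charge(order)
            return "processed"
        except PaymentGatewayDown:
            # Degrade instead of going offline: keep the order for later processing.
            self.pending_orders.append(order)
            return "accepted, payment pending"

    def process_pending(self, charge):
        # Run later, once the gateway recovers or a secondary gateway is used.
        processed = []
        while self.pending_orders:
            charge(self.pending_orders[0])   # raises again if payment is still down
            processed.append(self.pending_orders.pop(0))
        return processed


def unavailable_gateway(order):
    raise PaymentGatewayDown("payment gateway unreachable")


shop = ShopApp(charge=unavailable_gateway)
catalog = shop.browse_catalog()
status = shop.submit_order("order-42")

charged = []                                   # stands in for a secondary gateway
done = shop.process_pending(charge=charged.append)
```

The catalog stays browsable and the order is accepted while payment is down; an order is removed from the pending list only after it has actually been charged.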

A key mechanism in application design is to implement retry logic in the code to handle a service that is temporarily down. When applications use other services, errors can occur because of temporary conditions such as intermittent service outages, infrastructure-level faults, or network issues. Often, this kind of problem can be solved by retrying the operation a few milliseconds later, when it may succeed.

To implement retry logic in an application, it is important to detect and identify the particular exceptions that are likely to be caused by a transient fault condition. A retry strategy must be defined to state how many retries can be attempted before deciding that the fault is not transient.
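A retry strategy of this kind is often written as a small wrapper. The sketch assumes the application has already classified transient failures as a dedicated `TransientError` (a hypothetical name), and it uses exponential backoff with random jitter. This is one common strategy; the attempt limit and base delay are illustrative values, not recommendations from the source.

```python
import random
import time


class TransientError(Exception):
    """Errors the application has classified as likely transient (e.g. timeouts)."""


def call_with_retry(operation, max_attempts=4, base_delay_s=0.01, sleep=time.sleep):
    """Retry `operation` on TransientError only; any other exception propagates at once."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # retry budget exhausted: treat the fault as not transient
            delay = base_delay_s * 2 ** (attempt - 1)    # 10 ms, 20 ms, 40 ms, ...
            sleep(delay + random.uniform(0, delay))      # plus up to 100% random jitter


# Usage: an operation that fails twice with a transient error, then succeeds.
attempts = {"count": 0}


def flaky_query():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("temporary timeout")
    return "row-17"


waits = []
result = call_with_retry(flaky_query, sleep=waits.append)
```

The backoff gives a struggling service time to recover, and the jitter keeps many clients from retrying in lockstep. Passing `sleep` as a parameter lets the usage example record the waits instead of actually pausing.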

In a stateful application model, the session state information of an application (for example, the user ID, the products selected in a shopping cart, and so on) is stored in compute system memory. However, information stored in memory can be lost if there is an outage of the compute system on which the application runs.

In a persistent application state model, the state information is stored outside memory, in a repository (database). If a VM running an application instance fails, the state information is still available in the repository. A new application instance is created on another VM, which can access the state information from the database and resume processing.
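The persistent state model can be illustrated with an in-memory dictionary standing in for the external database. The repository and instance classes, session ID, and products are hypothetical; in practice the repository would be a real database reachable from every VM.

```python
import json


class StateRepository:
    """Stand-in for an external database table keyed by session ID."""

    def __init__(self):
        self._rows = {}

    def save(self, session_id, state):
        self._rows[session_id] = json.dumps(state)   # serialized, outside any app instance

    def load(self, session_id):
        row = self._rows.get(session_id)
        return json.loads(row) if row is not None else {"cart": []}


class AppInstance:
    """Stateless application instance: session state lives only in the repository."""

    def __init__(self, vm_name, repo):
        self.vm_name = vm_name
        self.repo = repo

    def add_to_cart(self, session_id, product):
        state = self.repo.load(session_id)
        state["cart"].append(product)
        self.repo.save(session_id, state)
        return state["cart"]


repo = StateRepository()
first = AppInstance("vm-1", repo)
first.add_to_cart("user-7", "laptop")
del first                                   # vm-1 fails; its memory is lost

replacement = AppInstance("vm-2", repo)      # new instance on another VM
cart = replacement.add_to_cart("user-7", "mouse")
```

Because the first instance kept nothing in its own memory, the replacement picks up the user's cart exactly where it was left.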


Concepts in Practice Lesson

Introduction

This section highlights technologies that are relevant to the topics covered in this
module.

This lesson covers the following topics:

• Dell EMC PowerPath
• VMware HA
• VMware FT


Concepts in Practice

Dell EMC PowerPath

• Host-based multipathing software
• Provides path failover and load-balancing functionality
• Automatic detection and recovery from host-to-array path failures
• PowerPath/VE software enables optimizing virtual environments with PowerPath multipathing features

The video is located at https://edutube.emc.com/Player.aspx?vno=YzeGufRuF1rmd9Sw11ku3A

Dell EMC PowerPath is a family of software products that ensures consistent application availability and performance across I/O paths on physical and virtual platforms. It provides automated path management and tools that help satisfy aggressive SLAs without investing in more infrastructure.

Dell EMC PowerPath/VE is compatible with VMware vSphere and Microsoft Hyper-V-based virtual environments. It can be used together with Dell EMC PowerPath to perform the following functions in both physical and virtual environments:

• Standardize Path Management: Optimize I/O paths in physical and virtual environments (PowerPath/VE) and cloud deployments
• Optimize Load Balancing: Adjust I/O paths to dynamically rebalance your application environment for peak performance
• Automate Failover/Recovery: Define failover and recovery rules that route application requests to alternative resources in the event of component failures or user errors


VMware HA

• Provides high availability for applications running in virtual machines
• If there is a fault in a physical compute system, then the affected VMs are automatically restarted on other compute systems

VMware FT

• Provides continuous availability for applications in the event of server failure
• Creates a live shadow instance of a VM that is in virtual lockstep with the primary instance
• FT eliminates even the smallest chance of data loss or disruption

VMware HA

VMware HA provides high availability for applications running in VMs. If there is a fault in a physical compute system, then the affected VMs are automatically restarted on other compute systems.

VMware HA minimizes unplanned downtime and IT service disruption while eliminating the need for dedicated standby hardware and installation of additional software.
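The restart behavior can be sketched with a toy cluster model. This minimal sketch assumes each compute system has a fixed number of VM slots, and that each VM from a failed host is restarted on the surviving host with the most free slots. Real VMware HA admission control and placement are more involved. The function `restart_vms` and its `capacity` parameter are hypothetical.

```python
def restart_vms(hosts, failed_host, capacity):
    """Restart the VMs of a failed compute system on the surviving ones.

    hosts: dict mapping host name -> list of VM names running on it (hypothetical)
    capacity: maximum number of VMs per host (a hypothetical simplification)
    Returns the VMs that could not be restarted for lack of capacity.
    """
    orphaned = hosts.pop(failed_host)
    not_restarted = []
    for vm in orphaned:
        # Place each VM on the surviving host with the most free slots.
        target = max(hosts, key=lambda h: capacity - len(hosts[h]), default=None)
        if target is None or len(hosts[target]) >= capacity:
            not_restarted.append(vm)
        else:
            hosts[target].append(vm)  # the VM is restarted on this host
    return not_restarted


if __name__ == "__main__":
    cluster = {"esx1": ["vm1", "vm2"], "esx2": ["vm3"], "esx3": []}
    leftover = restart_vms(cluster, "esx1", capacity=2)
    print(cluster, leftover)
```

The VMs are restarted rather than kept running, so the application sees a short outage. That outage is the difference between HA and FT.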

VMware FT

VMware FT provides continuous availability for applications in the event of server failures. It creates a live shadow instance of a VM that is in virtual lockstep with the primary VM instance.

VMware FT is used to prevent application disruption due to hardware failures. The downtime that is associated with mission-critical applications can be expensive and disruptive to businesses. By enabling instantaneous failover between the two instances in the event of hardware failure, FT eliminates even the smallest chance of data loss or disruption.
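The lockstep idea can be illustrated with deterministic replay. Suppose the primary forwards every input to the shadow, in order, before acknowledging it. Both instances then reach the same state, so the shadow can take over without losing acknowledged work. This is a simplified model under that assumption. The classes are hypothetical, and the sketch does not reflect how VMware FT actually logs or checkpoints VM execution.

```python
class Instance:
    """A deterministic application instance: its state depends only on its inputs."""

    def __init__(self):
        self.balance = 0

    def apply(self, op, amount):
        if op == "deposit":
            self.balance += amount
        elif op == "withdraw":
            self.balance -= amount
        return self.balance


class FaultTolerantPair:
    """Hypothetical primary/shadow pair kept in lockstep by forwarding every input."""

    def __init__(self):
        self.primary = Instance()
        self.shadow = Instance()
        self.primary_alive = True

    def handle(self, op, amount):
        if not self.primary_alive:
            return self.shadow.apply(op, amount)  # shadow has taken over
        # Apply the input on the shadow before the primary acknowledges it, so no
        # acknowledged operation exists only on the primary.
        self.shadow.apply(op, amount)
        return self.primary.apply(op, amount)

    def fail_primary(self):
        self.primary_alive = False
```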


Assessment

1. Which defines the amount of data loss that a business can endure?

A. RTO

B. RPO

C. Persistent state model

D. Availability zone

2. Which refers to the ability of an application to maintain limited functionality even when some of the components, modules, or supporting services are not available?

A. Graceful degradation

B. Retry logic

C. Partial mesh topology

D. Core-edge topology


Summary



Data Protection Solutions

Introduction

This module presents the need for backup, various backup methods, and deduplication implementation. This module also focuses on different replication types, data archiving solutions, and data migration solutions.

Upon completing this module, you will be able to:

• Explain various backup methods
• Describe deduplication
• Explain different replication methods
• Describe data archiving
• Explain data migration

Replication Lesson

Introduction

This lesson presents the primary uses and characteristics of replicas. This lesson also focuses on replication types.

This lesson covers the following topics:

• Primary uses of replica
• Characteristics of replica
• Types of replication


Replication

Video: Replication Overview

The video is located at
https://edutube.emc.com/Player.aspx?vno=IJ7983uhVoaiWLfvjhttlg


Introduction to Data Replication

Definition: Data Replication

A process of creating an exact copy (replica) of the data to ensure business continuity in the event of a local outage or disaster.

• Replicas are used to restore and restart operations if data loss occurs
• Data can be replicated to one or more locations based on the business requirements

[Figure: Data Replication. Data on primary storage in Data Center A (servers, connectivity, storage) is replicated to a replica within the same data center, to Data Center B, and to the cloud.]

Notes

Data is one of the most valuable assets of any organization. It is stored, mined, transformed, and used continuously, and it is a critical component in the operation and function of organizations. Outages, whatever their cause, are costly, and customers are always concerned about data availability. Safeguarding data and keeping it highly available are among the top priorities of any organization.

To avoid disruptions in business operations, it is necessary to implement data protection technologies in a data center. A data replication solution is one of the key data protection solutions that enables organizations to achieve business continuity, high availability, and data protection.

Data replication is the process of creating an exact copy (replica) of data. If a data loss occurs, the replicas are used to restore and restart operations. For example, if a production VM goes down, the replica VM can be used to restart the production operations with minimum disruption. Based on business requirements, data can be replicated to one or more locations.

For example, data can be replicated within a data center, between data centers, from a data center to a cloud, or between clouds.

In a replication environment, a compute system that accesses the production data from one or more LUNs on a storage system is called a production compute system. These LUNs are known as source LUNs, production LUNs, or the source. A LUN to which the production data is replicated is called the target LUN, the target, or the replica.


Primary Uses of Replicas

Replicas are created for various purposes, which include the following:

• Can act as a source for backup
• Can be used to restart business operations or to recover the data
• Used for running decision support activities
• Used for testing applications
• Data migration

[Figure: Source data is replicated to a replica that serves the purposes listed above]

Notes

Alternative Source for Backup

Under normal backup operations, data is read from the production LUNs and
written to the backup device. This places an extra burden on the production
infrastructure because production LUNs are simultaneously involved in production
operations and servicing data for backup operations.

To avoid this situation, a replica of the production LUN can be created and used as the source for backup operations. This method offloads the backup I/O workload from the production LUNs.

Fast Recovery and Restart

For critical applications, replicas can be taken at short, regular intervals. This enables fast recovery from data loss. If a complete failure of the source LUN occurs, the replication solution enables the production operation to be restarted on the replica. This approach reduces the RTO.


Decision-Support Activities

Running reports using the data on the replicas greatly reduces the I/O burden on
the production device.

Testing Platform

Replicas are also used for testing new applications or upgrades.

For example, an organization may use the replica to test the production application
upgrade. If the test is successful, the upgrade may be implemented on the
production environment.

Data Migration

Another use for a replica is data migration. Data migrations are performed for
various reasons such as migrating from a smaller capacity LUN to one of a larger
capacity.


Replica Characteristics and Types

Replica Characteristics

• Recoverability/Restartability
  – Replica could restore data to the source device
  – Restart business operation from replica
• Consistency
  – Ensures the usability of a replica
  – Replica must be consistent with the source

Replica Types

• Point-in-Time (PIT): Nonzero RPO
• Continuous: Near-zero RPO

Notes

A replica should have the following characteristics:

Recoverability: Enables restoration of data from the replicas to the source if data loss occurs.

Restartability: Enables restarting business operations using the replicas.

Consistency: The replica must be consistent with the source so that it is usable for both recovery and restart operations. For example, if a service running in a primary data center must fail over to a remote site due to a disaster, a consistent replica must be available at that site. So, ensuring consistency is the primary requirement for all replication technologies.

Replicas can be either point-in-time (PIT) or continuous, and the choice of replica ties back to the RPO.


PIT replica: The data on the replica is an identical image of the production at some specific timestamp. For example, a replica of a file system is created at 4:00 PM on Monday. This replica would then be referred to as the Monday 4:00 PM PIT copy. The RPO is measured from the time the PIT was created to the time a failure occurs on the production. If there is a failure on the production at 8:00 PM and a 4:00 PM PIT is available, the RPO would be 4 hours (8 - 4 = 4). To minimize the RPO, take PITs periodically. The shorter the interval between PITs, the smaller the RPO.

Continuous replica: The data on the replica is always in sync with the production data. The objective of any continuous replication is to reduce the RPO to zero or near-zero.
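The 4:00 PM example above can be written as a small calculation. The helper `data_loss_window` is hypothetical. Given the creation times of the available PIT copies and the failure time, it picks the most recent PIT taken before the failure and returns the window of changes that would be lost.

```python
from datetime import datetime, timedelta


def data_loss_window(pit_times, failure_time):
    """Return the time between the latest usable PIT copy and the failure."""
    usable = [t for t in pit_times if t <= failure_time]
    if not usable:
        raise ValueError("no PIT copy exists before the failure")
    return failure_time - max(usable)


if __name__ == "__main__":
    failure = datetime(2019, 1, 7, 20, 0)                  # 8:00 PM
    single_pit = [datetime(2019, 1, 7, 16, 0)]             # 4:00 PM only
    periodic = [datetime(2019, 1, 7, h, 0) for h in (12, 14, 16, 18)]
    print(data_loss_window(single_pit, failure))           # 4 hours of changes lost
    print(data_loss_window(periodic, failure))             # 2 hours with PITs every 2 hours
```

Taking PITs every two hours instead of once halves the window in this example. This is the reason periodic PITs reduce the RPO.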


Types of Replication

Replication can be classified into two major categories:

Local Replication

• Refers to replicating data within the same location
  – Within a data center in compute-based replication
  – Within a storage system in storage system-based replication
• Typically used for operational restore of data if there is a data loss

[Figure: Local replication. In storage-based replication, data is replicated within a storage system. In compute-based replication, data is replicated from one storage system to another within a data center.]

Remote Replication

• Refers to replicating data to remote locations (locations can be geographically dispersed)
• Data can be synchronously or asynchronously replicated
• Helps to mitigate the risks associated with regional outages
• Enables organizations to replicate the data to the cloud for DR purposes

[Figure: Remote replication. Data is replicated from a storage system in Data Center A to a storage system in a remote data center, Data Center B.]

Notes

Local replication is the process of replicating data within the same storage system
or the same data center.


Local replicas help to restore the data if there is a data loss or enable restarting the
application immediately to ensure business continuity.

Remote replication is the process of replicating data to remote locations (locations can be geographically dispersed).

Remote replication helps organizations to mitigate the risks that are associated with regional outages resulting from natural or human-made disasters. During disasters, the services can be moved to a remote location to ensure continuous business operation.

Remote replication also enables organizations to replicate their data to the cloud for DR purposes. In remote replication, data can be replicated synchronously or asynchronously.


Video: Storage-Based Replication

The video is located at
https://edutube.emc.com/Player.aspx?vno=DuaE7ghKQBfkz+5UGLQiDg


Local Replication: VM Snapshot

• A VM snapshot preserves the state and data of a VM at a specific PIT
  – The state includes the VM's power state, for example: powered-on, powered-off, or suspended
  – The data includes all the files that make up the VM, including disks, memory, and other devices, such as virtual network interface cards
• A VM snapshot is useful for quick restore of a VM. For example:
  – An administrator can create a snapshot of a VM and then make changes, such as applying patches and software upgrades to the VM
  – If anything goes wrong, the administrator can restore the VM to its previous state using the VM snapshot
• The hypervisor provides an option to create and manage multiple snapshots
  – Taking multiple snapshots provides several restore points for a VM
  – While more snapshots improve the resiliency of the infrastructure, it is important to consider the storage space they consume

Notes

When a snapshot is created for a VM, a child virtual disk (delta disk file) is created from the base image or parent virtual disk. The snapshot mechanism prevents the guest operating system from writing to the base image or parent virtual disk. Instead, it directs all writes to the delta disk file. Successive snapshots generate a new child virtual disk from the last child virtual disk in the chain. Snapshots hold only changed blocks.

Sometimes a snapshot must be retained for a longer period. Note that larger snapshots take longer to commit and may impact performance. The source (parent VM) must be healthy in order to use a snapshot for rollback.
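The delta-disk behavior described above can be modeled in a few lines. This is a simplified, hypothetical model in which a virtual disk is a dictionary that maps block number to data. It is not a hypervisor's on-disk format. Writes always go to the newest child disk. A read walks the chain from the newest child back to the parent until it finds the block.

```python
class SnapshotChain:
    """Hypothetical model of a VM disk with a chain of child (delta) disks."""

    def __init__(self, base_blocks):
        self.chain = [dict(base_blocks)]  # chain[0] is the parent (base image)

    def take_snapshot(self):
        # The current disk is frozen; a new, empty child disk receives all writes.
        self.chain.append({})

    def write(self, block, data):
        # The guest never writes to the parent or to older child disks.
        self.chain[-1][block] = data

    def read(self, block):
        # Walk from the newest child toward the parent; the first hit wins.
        for disk in reversed(self.chain):
            if block in disk:
                return disk[block]
        return None  # block was never written


if __name__ == "__main__":
    disk = SnapshotChain({0: "os", 1: "app-v1"})
    disk.take_snapshot()          # snapshot before patching
    disk.write(1, "app-v2")       # the patch lands in the child disk only
    print(disk.read(1), disk.chain)
```

The child disk holds only the changed block. This is why snapshots are space-efficient at first and grow as the VM keeps writing.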


Local Replication: VM Snapshot Example

• Child virtual disks store all the changes that are made to the parent VM after snapshots are created
• When committing snapshot 3, the data on child virtual disk files 1 and 2 is committed prior to committing data on child virtual disk 3 to the parent virtual disk file
  – After committing the data, child virtual disks 1, 2, and 3 are deleted
  – However, while rolling back to snapshot 1, child disk file 1 is retained and snapshots 2 and 3 are discarded

[Figure: VM snapshot chain on storage. The VM writes to Snapshot 3 (Child Virtual Disk 3), which holds changed blocks of snapshot 2 and the base image. Snapshot 2 (Child Virtual Disk 2) holds changed blocks of snapshot 1. Snapshot 1 (Child Virtual Disk 1) holds changed blocks of the base image. The Base Image (Parent Virtual Disk) is at the bottom of the chain.]

Notes

Consider an example in which three snapshots of a VM are created, as shown in the illustration. In this example, child virtual disk 1 stores all the changes that are made to the parent VM after snapshot 1 is created. Similarly, child virtual disk 2 and child virtual disk 3 store all the changes made after snapshot 2 and snapshot 3 are created, respectively. When committing snapshot 3 for the VM, the data on child virtual disk files 1 and 2 is committed prior to committing data on child virtual disk 3 to the parent virtual disk file. After committing the data, child virtual disks 1, 2, and 3 are deleted. However, while rolling back to snapshot 1 (PIT), child disk file 1 is retained and snapshots 2 and 3 are discarded.
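The two operations in this example can be contrasted with a hypothetical model in which the chain is a list of dictionaries of changed blocks. `chain[0]` is the parent, and `chain[1:]` are child disks 1 to 3. Committing merges the child disks into the parent in order, oldest first, so newer blocks overwrite older ones, and then drops the children. Following the description above, rolling back to snapshot 1 keeps the parent and child disk 1 and discards child disks 2 and 3. The function names are illustrative only.

```python
def commit_all(chain):
    """Merge child disks 1..n into the parent in order, then delete the children.

    Returns a new one-element chain holding the consolidated parent disk.
    """
    parent = dict(chain[0])
    for child in chain[1:]:  # child 1, then 2, then 3: the newest write wins
        parent.update(child)
    return [parent]


def roll_back(chain, snapshot_number):
    """Keep the parent and child disks up to snapshot_number; discard the rest."""
    return [dict(disk) for disk in chain[: snapshot_number + 1]]
```

The commit order matters. Merging child 3 before child 1 would let older data overwrite newer data.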


Local Replication: Storage System-Based Snapshot - RoW

• Redirects new writes that are destined for the source LUN to a reserved LUN in the storage pool
• Replica (snapshot) still points to the source LUN
• All reads from the replica are served from the source LUN

[Figure: Storage system-based snapshot (RoW). Writes from the source compute system to the source LUN are redirected, and the new data is written to a new location on a reserved LUN in the storage pool. The target compute system reads the snapshot, and all reads from the replica are served from the source.]

Notes

A storage system-based snapshot is a space-optimal, pointer-based virtual replication. At the time of replication session activation, the target (snapshot) contains pointers to the location of the data on the source. The snapshot does not contain data at any time. Therefore, the snapshot is known as a virtual replica.

The snapshot is immediately accessible after the replication session activation. A snapshot is typically recommended when the changes to the source are less than 30 percent. Multiple snapshots can be created from the same source LUN for various business requirements.

Some snapshot software provides the capability of automatic termination of a snapshot upon reaching the expiration date. This approach is useful where a rolling snapshot might be taken and then automatically removed after its time of usefulness has passed. Because the snapshot holds only pointers to data on the source, the unavailability of the source device invalidates the data on the target. The storage system-based snapshot uses a Redirect on Write (RoW) mechanism.


Some pointer-based virtual replication implementations use redirect on write (RoW) technology. RoW redirects new writes that are destined for the source LUN to a reserved LUN in the storage pool. In RoW, a new write from the source compute system is written to a new location (redirected) inside the pool. The original data remains where it is and is untouched by the RoW process. Therefore, in a RoW snapshot, the original data is read from its original location on the source LUN.
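A minimal sketch of redirect on write follows, assuming a hypothetical block-map model. The source LUN's view is a map from logical block to a physical location in the pool. Activating a snapshot copies only these pointers. A new write to the source goes to a freshly allocated location (standing in for the reserved LUN), and only the source's pointer is updated. The snapshot therefore keeps pointing at the original, untouched data. For simplicity, the model redirects every write, including writes made before any snapshot exists. The class `RowLun` and its methods are illustrative.

```python
class RowLun:
    """Hypothetical source LUN in a pool that supports redirect-on-write snapshots."""

    def __init__(self):
        self.pool = {}        # physical location -> data (source data + reserved space)
        self.block_map = {}   # logical block -> physical location (source view)
        self._next_loc = 0

    def _allocate(self, data):
        loc = self._next_loc
        self.pool[loc] = data
        self._next_loc += 1
        return loc

    def write(self, block, data):
        # Redirect the new write to a fresh location; the old data stays in place.
        self.block_map[block] = self._allocate(data)

    def read(self, block):
        return self.pool[self.block_map[block]]

    def create_snapshot(self):
        # Activation copies pointers only: the snapshot holds no data of its own.
        return dict(self.block_map)

    def read_snapshot(self, snapshot, block):
        # Snapshot reads follow the saved pointers to the original data.
        return self.pool[snapshot[block]]
```

The model also shows why losing the source invalidates the snapshot. The snapshot is only a set of pointers into data that the source owns.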


Local Replication: Clone

• Cloning provides the ability to create fully populated point-in-time copies of LUNs within a storage system or create a copy of an existing VM
• Clone of a storage volume
  – Initial synchronization is performed between the source LUN and the replica (clone)
  – Changes made to both the source and the replica can be tracked at some predefined granularity
• VM clone
  – Clone is a copy of an existing virtual machine (parent VM)
    o The clone VM's MAC address is different from that of the parent VM
  – Typically, clones are deployed when many identical VMs are required
    o Reduces the time that is required to deploy a new VM

Notes

Cloning provides the ability to create fully populated point-in-time copies of LUNs
within a storage system or create a copy of an existing VM.

Clone of a storage volume:

• When the replication session is started, an initial synchronization is performed between the source LUN and the replica (clone). Synchronization is the process of copying data from the source LUN to the clone. During the synchronization process, the replica is not available for any compute system access. Once the synchronization is completed, the replica is exactly the same as the source LUN. The replica can then be detached from the source LUN and made available to another compute system for business operations. Subsequent synchronizations involve only a copy of any data that has changed on the source LUN since the previous synchronization.
• Typically, after detachment, changes made to both the source and the replica can be tracked at some predefined granularity. This approach enables incremental resynchronization (source to target) or incremental restore (target to source). The clone must be the same size as the source LUN.
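The following sketch shows full synchronization followed by incremental resynchronization and restore. It assumes a hypothetical model in which a LUN is a list of fixed-size extents. Changes are tracked in a set of extent numbers, which stands in for the predefined tracking granularity. After detachment, a write to either side marks its extent as changed. An incremental resync or restore then copies only those extents, in one direction. `ClonePair` and its methods are illustrative names.

```python
class ClonePair:
    """Hypothetical source/clone pair with extent-level change tracking."""

    def __init__(self, source):
        self.source = list(source)
        self.clone = list(self.source)  # initial full synchronization (same size)
        self.changed = set()            # extents changed on either side since last sync

    def write_source(self, extent, data):
        self.source[extent] = data
        self.changed.add(extent)

    def write_clone(self, extent, data):
        self.clone[extent] = data
        self.changed.add(extent)

    def resync(self):
        """Incremental resynchronization (source to clone). Returns extents copied."""
        copied = sorted(self.changed)
        for extent in copied:
            self.clone[extent] = self.source[extent]
        self.changed.clear()
        return copied

    def restore(self):
        """Incremental restore (clone to source). Returns extents copied."""
        copied = sorted(self.changed)
        for extent in copied:
            self.source[extent] = self.clone[extent]
        self.changed.clear()
        return copied
```

Changes on both sides must be tracked because a resync overwrites any extent that was modified on the clone. A restore overwrites any extent that was modified on the source.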


VM Clone:
• A VM clone is a copy of an existing VM. The existing VM is called the parent of the clone. When the cloning operation completes, the clone becomes a separate VM. The changes made to a clone do not affect the parent VM, and changes made to the parent VM do not appear in a clone. A clone's MAC address is different from that of the parent VM.
• In general, installing a guest operating system and applications on a VM is a time-consuming task. With clones, administrators can make many copies of a virtual machine from a single installation and configuration process. For example, in an organization, the administrator can clone a VM for each new employee, with a suite of preconfigured software applications.


Remote Replication: Synchronous

• A write is committed to both the source and the remote replica before it is acknowledged to the compute system
• Enables restarting business operations at a remote site with zero data loss; provides near-zero RPO

[Figure: Synchronous remote replication from the production compute system and source storage at the source site to the target storage at the remote site]

1. The write I/O is received from the production compute system into the cache of the source and placed in a queue
2. The write I/O is transmitted to the cache of the target storage
3. Receipt acknowledgment is provided by the target storage back to the cache of the source
4. The source storage system sends an acknowledgment back to the production compute system

Notes

A storage-based remote replication solution can avoid downtime by enabling business operations to continue at remote sites. Storage-based synchronous remote replication provides near-zero RPO, because the target is always identical to the source.

In synchronous replication, writes must be committed to the source and the remote target prior to acknowledging “write complete” to the production compute system. Additional writes on the source cannot occur until each preceding write has been completed and acknowledged.

This approach ensures that data is identical on the source and the target at all times. Further, writes are transmitted to the remote site exactly in the order in which they are received at the source. Maintaining write ordering ensures transactional consistency when the applications are restarted at the remote location. As a result, the remote images are always restartable copies.

Note: Application response time is increased with synchronous remote replication, because writes must be committed on both the source and the target before the “write complete” acknowledgment is sent to the compute system. The degree of impact on response time depends primarily on the distance and the network bandwidth between sites. If the bandwidth provided for synchronous remote replication is less than the maximum write workload, there will be times during the day when the response time might be excessively elongated, causing applications to time out. The distances over which synchronous replication can be deployed depend on the application’s capability to tolerate the extensions in response time. Typically, synchronous remote replication is deployed for distances less than 200 kilometers (125 miles) between the two sites.
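The response-time impact described in the note above can be estimated with simple arithmetic. This sketch assumes that light in optical fiber travels at roughly 200,000 km/s, which is about 5 microseconds per kilometer one way. It also assumes that each synchronous write needs one round trip to the remote site (steps 2 and 3). Real links add switching, protocol, and remote write-processing delays, so treat the result as a lower bound. The constant `FIBER_KM_PER_MS` and the function name are illustrative.

```python
# Assumption: light in optical fiber covers roughly 200,000 km/s, i.e. 200 km per ms.
FIBER_KM_PER_MS = 200.0


def sync_write_response_ms(local_write_ms, distance_km, round_trips=1):
    """Rough lower bound on the response time of one synchronous remote write.

    local_write_ms: time to commit the write in the source storage cache
    distance_km: one-way fiber distance between the source and remote sites
    round_trips: assumed source-to-target round trips per write (steps 2 and 3)
    """
    one_way_ms = distance_km / FIBER_KM_PER_MS
    return local_write_ms + round_trips * 2 * one_way_ms


if __name__ == "__main__":
    for km in (10, 100, 200, 1000):
        print(f"{km:>5} km: {sync_write_response_ms(0.5, km):.2f} ms")
```

Under these assumptions, propagation alone adds about 2 ms to every write at 200 km and about 10 ms at 1,000 km. Because each write must wait for the previous one, these delays accumulate for write-intensive applications.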


Remote Replication: Asynchronous

A write is committed to the source and immediately acknowledged to the compute system:
• Data is buffered at the source and sent to the remote site periodically
• Application write response time is not dependent on the latency of the link
• Replica is behind the source by a finite amount (finite RPO)

[Figure: Asynchronous remote replication from the production compute system and source storage at the source site to the target storage at the remote site]

1. The write I/O is received from the production compute system into the cache of the source and placed in a queue
2. Receipt acknowledgment is provided by the source storage back to the production compute system
3. The write I/O is transmitted to the cache of the target storage
4. The target acknowledges back to the source

Notes

It is important for an organization to replicate data across geographical locations to mitigate the risk involved during a disaster. If data is replicated synchronously between sites (which are typically less than 200 kilometers apart) and a disaster strikes, both sites may be impacted. This leads to data loss and service outage.

Replicating data across sites that are thousands of kilometers apart helps an organization withstand such a disaster. If a disaster strikes one region, the data is still available in another region, and the service can move to that location. Asynchronous replication enables replicating data across sites that are thousands of kilometers apart.

In asynchronous remote replication, a write from a production compute system is committed to the source and immediately acknowledged to the compute system. Asynchronous replication also mitigates the impact on the application’s response time because the writes are acknowledged immediately to the compute system.


This method enables replicating data over distances of up to several thousand kilometers between the source site and the secondary site (remote locations). Because writes are buffered at the source, the required bandwidth can be provisioned equal to or greater than the average write workload. Synchronous replication, by contrast, must be provisioned for the maximum write workload.
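The reason the link can be sized to the average write workload rather than the peak can be seen in a hypothetical simulation. The simulation tracks how much data waits in the source buffer when a burst of writes exceeds the link bandwidth. The workload numbers and the function name `buffer_occupancy` are invented for illustration.

```python
def buffer_occupancy(writes_per_interval, link_capacity):
    """Simulate the source-side buffer for asynchronous replication.

    writes_per_interval: MB written by the application in each interval
    link_capacity: MB the replication link can transmit per interval
    Returns the buffered (not yet transmitted) MB at the end of each interval.
    """
    backlog = 0
    history = []
    for written in writes_per_interval:
        # New writes join the backlog; the link drains up to its capacity.
        backlog = max(0, backlog + written - link_capacity)
        history.append(backlog)
    return history


if __name__ == "__main__":
    workload = [40, 10, 10, 20]              # average 20 MB per interval, peak 40
    print(buffer_occupancy(workload, 20))    # link sized to the average
    print(buffer_occupancy(workload, 40))    # link sized to the peak
```

With the link sized to the average, the burst is absorbed by a buffer that peaks at 20 MB and then drains, and the application never waits. A synchronous link would need to carry the 40 MB peak to avoid elongated response times.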

In asynchronous replication, compute system writes are collected into a buffer (delta set) at the source. This delta set is transferred to the remote site at regular intervals. Adequate buffer capacity should be provisioned to perform asynchronous replication. Some storage vendors offer a feature called delta set extension, which enables offloading the delta set from the buffer (cache) to specially configured drives. This feature makes asynchronous replication resilient to a temporary increase in write workload or a loss of the network link.

In asynchronous replication, the RPO depends on the size of the buffer, the available network bandwidth, and the write workload to the source. This replication can take advantage of locality of reference (repeated writes to the same location). If the same location is written multiple times in the buffer prior to transmission to the remote site, only the final version of the data is transmitted. This feature conserves link bandwidth.
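The delta-set behavior can be sketched as follows, including the bandwidth saving from repeated writes to the same location. This is a hypothetical model. Writes are collected in a dictionary keyed by block address, so a later write to the same block replaces the earlier one. Each cycle transmits only the latest version of each block and applies the whole set to the target at once. `DeltaSetReplicator` is an illustrative name, not a vendor API.

```python
class DeltaSetReplicator:
    """Hypothetical asynchronous replication with a source-side delta set."""

    def __init__(self):
        self.source = {}
        self.target = {}
        self.delta_set = {}  # block -> latest data written since the last cycle

    def write(self, block, data):
        # Committed at the source and acknowledged at once (no link latency).
        self.source[block] = data
        self.delta_set[block] = data  # a repeated write to a block replaces the old one
        return "ack"

    def transmit_cycle(self):
        """Send the current delta set to the remote site; return the blocks sent."""
        sent = len(self.delta_set)
        self.target.update(self.delta_set)  # applied as one unit at the target
        self.delta_set = {}
        return sent


if __name__ == "__main__":
    repl = DeltaSetReplicator()
    for version in range(5):
        repl.write(7, f"v{version}")   # five writes to the same block
    repl.write(8, "x")
    print(repl.target)                 # still empty: the replica lags (finite RPO)
    print(repl.transmit_cycle(), repl.target)
```

Six writes were acknowledged, but only two blocks cross the link, because block 7 was rewritten in the buffer before transmission.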


Remote Replication: Multisite

• Data from the source site is replicated to multiple remote sites for DR purposes
• Disaster recovery protection is always available if any one-site failure occurs
• Mitigates the risk in two-site replication
  – No DR protection after source or remote site failure

[Figure: Three-site remote replication. The production compute system at the source site writes to Storage (Source), which replicates synchronously to Storage (Target 1) at the bunker site and asynchronously to Storage (Target 2) at the remote site. The bunker-to-remote link is asynchronous with differential resynchronization.]

Notes

In a two-site synchronous replication, the source and target sites are usually within a short distance. If a regional disaster occurs, both the source and the target sites might become unavailable. This can lead to extended RPO and RTO, because the last known good copy of data would need to come from another source, such as an offsite tape. A regional disaster will not affect the target site in a two-site asynchronous replication, since the sites are typically several hundred or several thousand kilometers apart. If the source site fails, production can be shifted to the target site. However, there is no further remote protection of data until the failure is resolved.

Multisite replication mitigates the risks that are identified in two-site replication. In a
multisite replication, data from the source site is replicated to two or more remote
sites. The illustration provides an example of a three-site remote replication
solution. In this approach, data at the source is replicated to two different storage
systems at two different sites. The source-to-bunker site (target 1) replication is
synchronous with a near-zero RPO. The source-to-remote site (target 2) replication
is asynchronous with an RPO in the order of minutes. The key benefit of this
replication is the ability to fail over to either of the two remote sites in the case of
source-site failure.

Disaster recovery protection is always available if any one-site failure occurs. During normal operations, all three sites are available and the production workload is at the source site. At any given instance, the data at the bunker and the source is identical. The data at the remote site is behind the data at the source and the bunker. The replication network links between the bunker and the remote sites are in place but not in use. The difference in the data between the bunker and the remote sites is tracked. If a source site disaster occurs, operations can be resumed at the bunker or the remote site with incremental resynchronization between these two sites.
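Incremental resynchronization between the bunker and the remote site depends on tracking which blocks differ between them. The sketch below is a hypothetical model with three legs. The source replicates synchronously to the bunker on every write. It ships writes asynchronously to the remote site in cycles. The bunker keeps a set of blocks written since the remote site's last completed cycle. If the source fails, only those tracked blocks are copied from the bunker to the remote site. The class and method names are illustrative.

```python
class ThreeSiteReplication:
    """Hypothetical source -> bunker (sync) and source -> remote (async) model."""

    def __init__(self):
        self.source, self.bunker, self.remote = {}, {}, {}
        self.delta_set = {}          # async writes buffered at the source
        self.bunker_changed = set()  # blocks the remote site may not have yet

    def write(self, block, data):
        self.source[block] = data
        self.bunker[block] = data        # synchronous leg: bunker matches source
        self.bunker_changed.add(block)   # difference tracking, kept at the bunker
        self.delta_set[block] = data     # asynchronous leg: shipped later

    def async_cycle(self):
        # Simplification: assumes no writes arrive while the cycle is in flight.
        self.remote.update(self.delta_set)
        self.delta_set = {}
        self.bunker_changed.clear()      # the remote has caught up with the bunker

    def fail_source_and_resync(self):
        """Source site lost: bring the remote up to date from the bunker incrementally."""
        self.source, self.delta_set = None, {}   # buffered async data is lost too
        copied = sorted(self.bunker_changed)
        for block in copied:
            self.remote[block] = self.bunker[block]
        self.bunker_changed.clear()
        return copied
```

Without the tracked set, the bunker would have to compare or copy the entire volume to the remote site before DR protection could be re-established.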


Video: Network-Based Replication

The video is located at
https://edutube.emc.com/Player.aspx?vno=tdtXEqmgsWqD0pvctet7Rg


Continuous Data Protection (CDP)

• Network-based replication solution
• CDP provides the ability to restore data and VMs to any previous PIT
• Supports heterogeneous compute and storage platforms
• Supports both local and remote replication
• Data can also be replicated to more than two sites (multisite)
• Supports WAN optimization techniques to reduce bandwidth requirements

Notes

Continuous data protection (CDP) is a network-based replication solution that provides the capability to restore data and VMs to any previous PIT.

Traditional data protection technologies offer a limited number of recovery points. If a data loss occurs, the system can be rolled back only to the last available recovery point. CDP tracks all the changes to the production volumes and maintains consistent point-in-time images. This enables CDP to restore data to any previous PIT.

CDP supports both local and remote replication of data and VMs to meet operational recovery and disaster recovery requirements, respectively. In a CDP implementation, data can be replicated to more than two sites using synchronous and asynchronous replication. CDP supports various WAN optimization techniques (deduplication, compression). These techniques reduce bandwidth requirements and make optimal use of the available bandwidth.


Key CDP Components

The following are key CDP components:

Journal Volume

• Contains all the data that has changed on the production volume from the time the replication session started

CDP Appliance

• Intelligent hardware platform that runs the CDP software
• Manages both the local and the remote replications
• Appliance could also be virtual, where CDP software is running inside VMs

Write Splitter

• Intercepts writes to the production volume from the compute system and splits each write into two copies
• Can be implemented at the compute, fabric, or storage system

Notes

CDP uses a journal volume to store all the data that has changed on the production
volume from the time the replication session started. The journal contains the
metadata and data that enable roll back to any recovery points. The amount of
space that is configured for the journal determines how far back the recovery points
can go.
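How far back the recovery points go can be estimated from the journal size and the rate at which the production volume changes. The sketch below assumes a hypothetical steady change rate and ignores compression. It can optionally reserve a fraction of the journal for metadata. The function name and the `metadata_overhead` parameter are illustrative, so the result is only a rough estimate.

```python
def journal_retention_hours(journal_gb, change_rate_gb_per_hour, metadata_overhead=0.0):
    """Approximate protection window (how far back recovery points go).

    journal_gb: capacity configured for the journal volume
    change_rate_gb_per_hour: assumed steady rate of changed data on production
    metadata_overhead: assumed fraction of journal space used by metadata
    """
    usable_gb = journal_gb * (1.0 - metadata_overhead)
    return usable_gb / change_rate_gb_per_hour


if __name__ == "__main__":
    print(journal_retention_hours(500, 10))        # 500 GB journal, 10 GB/h of change
    print(journal_retention_hours(500, 10, 0.2))   # same, with 20% reserved for metadata
```

Doubling the journal, or halving the change rate, doubles the window. In practice, change rates are bursty, so the window shrinks during heavy write periods.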

CDP also uses an appliance and a write splitter. A CDP appliance is an intelligent hardware platform that runs the CDP software and manages local and remote data replications. Some vendors offer a virtual appliance, where the CDP software runs inside VMs.

Write splitters intercept writes to the production volume from the compute system
and split each write into two copies. Write splitting can be performed at the
compute, fabric, or storage system.
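The journal-based rollback described above can be illustrated with a small in-memory model. The following Python sketch is a hypothetical illustration, not any vendor's implementation: volumes are dictionaries of block numbers to contents, a counter stands in for the PIT timestamp, and `WriteSplitter`, `CdpAppliance`, and `restore_to` are invented names. The splitter sends each write to the production volume and to the appliance, which journals it; restoring to a PIT replays the journal onto the starting image up to that point.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Volume = Dict[int, bytes]  # hypothetical model: block number -> block contents


@dataclass
class CdpAppliance:
    """Hypothetical CDP appliance that journals every write it receives."""
    baseline: Volume  # replica contents when the replication session started
    journal: List[Tuple[int, int, bytes]] = field(default_factory=list)  # (pit, block, data)

    def receive(self, pit: int, block: int, data: bytes) -> None:
        # The journal keeps the data plus the metadata (PIT) needed for rollback.
        self.journal.append((pit, block, data))

    def restore_to(self, pit: int) -> Volume:
        # Rebuild the volume as of `pit` by replaying journaled writes, in order,
        # onto the baseline and stopping at the first write newer than `pit`.
        image = dict(self.baseline)
        for entry_pit, block, data in self.journal:
            if entry_pit > pit:
                break
            image[block] = data
        return image


@dataclass
class WriteSplitter:
    """Hypothetical splitter: each write goes to production AND to the appliance."""
    production: Volume
    appliance: CdpAppliance
    clock: int = 0  # stand-in for a PIT timestamp

    def write(self, block: int, data: bytes) -> int:
        self.clock += 1
        self.production[block] = data                    # copy 1: production volume
        self.appliance.receive(self.clock, block, data)  # copy 2: CDP appliance
        return self.clock


prod: Volume = {0: b"v0", 1: b"v0"}
splitter = WriteSplitter(prod, CdpAppliance(baseline=dict(prod)))
t1 = splitter.write(0, b"v1")
t2 = splitter.write(1, b"corrupted")
print(splitter.appliance.restore_to(t1))  # {0: b'v1', 1: b'v0'}
```

A real journal has finite capacity, so the oldest entries are eventually discarded; in this model that would shrink the range of PITs `restore_to` can reach, which mirrors how the journal size determines how far back the recovery points can go.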

CDP Operations: Local and Remote Replication

The illustration provides an example of CDP local and remote replication
operations where the write splitter is deployed at the compute system.

[Illustration: at the source site, a compute system (VMs, hypervisor, and write
splitter) connects over a SAN to the production volume, the local CDP appliance,
a journal, and the local replica. The local CDP appliance connects over WAN/SAN to
the remote CDP appliance, which connects over a SAN to a journal and the remote
replica at the remote site.]

1. Data is "split" and sent to the local CDP appliance and the production volume.
2a. Writes are acknowledged back from the CDP appliance, and data is sent to the
journal and in turn copied to the local replica.
2b. Data is sequenced, compressed, and replicated to the remote appliance.
3. Data is received, uncompressed, and sequenced.
4. Data is written to the journal.
5. Data is copied to the remote replica.
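Steps 2b through 5 can be sketched in Python as follows. This is an assumption-laden illustration: JSON encoding and `zlib` compression stand in for an appliance's own wire format and WAN optimization, and `local_send`, `remote_receive`, and `apply_remote` are hypothetical names. The sequence numbers let the remote side apply writes in their original order even if batches arrive out of order.

```python
import json
import zlib
from typing import Dict, List, Tuple

Write = Tuple[int, int, str]  # (sequence number, block, data): hypothetical record format


def local_send(writes: List[Tuple[int, str]], start_seq: int) -> bytes:
    """Step 2b (sketch): sequence a batch of (block, data) writes and compress it."""
    sequenced = [(start_seq + i, block, data) for i, (block, data) in enumerate(writes)]
    return zlib.compress(json.dumps(sequenced).encode("utf-8"))


def remote_receive(payloads: List[bytes]) -> List[Write]:
    """Step 3 (sketch): uncompress each payload and re-sequence the writes."""
    writes: List[Write] = []
    for payload in payloads:
        for seq, block, data in json.loads(zlib.decompress(payload)):
            writes.append((seq, block, data))
    return sorted(writes)  # tuples sort by sequence number first


def apply_remote(writes: List[Write], journal: List[Write], replica: Dict[int, str]) -> None:
    for write in writes:
        journal.append(write)      # step 4: data is written to the remote journal
    for _, block, data in writes:
        replica[block] = data      # step 5: data is copied to the remote replica


batch1 = local_send([(0, "a"), (1, "b")], start_seq=0)
batch2 = local_send([(0, "c")], start_seq=2)
journal: List[Write] = []
replica: Dict[int, str] = {}
apply_remote(remote_receive([batch2, batch1]), journal, replica)  # arrived out of order
print([seq for seq, _, _ in journal])  # [0, 1, 2]
print(replica)                         # {0: 'c', 1: 'b'}
```

Without the re-sequencing in step 3, the later write `"c"` to block 0 could be overwritten by the earlier `"a"`, leaving the remote replica inconsistent with the source.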

Notes

Typically, the replica is synchronized with the source, and then the replication
process starts. After the replication starts, all the writes from the compute system to
the source (production volume) are split into two copies. One copy is sent to the
local CDP appliance at the source site, and the other copy is sent to the production
volume. The local appliance then writes the data to the journal at the source site,
and the data in turn is written to the local replica. If a file is accidentally deleted
or corrupted, the local journal enables organizations to recover the application
data to any PIT.

In remote replication, the local appliance at the source site sends the received write
I/O to the appliance at the remote (DR) site. The write is then applied to the journal
volume at the remote site. Next, data from the journal volume is sent to the remote
replica at predefined intervals. CDP operates in either synchronous or
asynchronous mode.

In synchronous mode, the application waits for an acknowledgment from the CDP
appliance at the remote site before initiating the next write. In asynchronous mode,
the local CDP appliance acknowledges a write as soon as it is received. If there is a
disaster at the source site, data can be recovered to the required PIT, and the
service can be restarted at the DR site.
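The difference between the two modes comes down to when the write is acknowledged. The following hypothetical Python model (the class names and the `"ack"` return value are invented for illustration) acknowledges a synchronous write only after the remote appliance has it, and an asynchronous write as soon as the local appliance has queued it.

```python
from collections import deque
from typing import Deque, List


class RemoteAppliance:
    def __init__(self) -> None:
        self.journal: List[str] = []

    def receive(self, data: str) -> None:
        self.journal.append(data)


class LocalAppliance:
    """Hypothetical model of when a write is acknowledged in each CDP mode."""

    def __init__(self, remote: RemoteAppliance, mode: str) -> None:
        if mode not in ("sync", "async"):
            raise ValueError("mode must be 'sync' or 'async'")
        self.remote = remote
        self.mode = mode
        self.pending: Deque[str] = deque()  # received locally, not yet at the remote site

    def write(self, data: str) -> str:
        if self.mode == "sync":
            # Application waits: acknowledge only after the remote appliance has the write.
            self.remote.receive(data)
        else:
            # Acknowledge as soon as the local appliance has received the write.
            self.pending.append(data)
        return "ack"

    def drain(self) -> None:
        """Async mode: ship queued writes to the remote site later, in order."""
        while self.pending:
            self.remote.receive(self.pending.popleft())


sync_remote, async_remote = RemoteAppliance(), RemoteAppliance()
LocalAppliance(sync_remote, "sync").write("w1")
async_local = LocalAppliance(async_remote, "async")
async_local.write("w1")
print(sync_remote.journal, async_remote.journal)  # ['w1'] []
async_local.drain()
print(async_remote.journal)  # ['w1']
```

Writes still sitting in `pending` when the source site fails never reach the remote site, which is why asynchronous mode trades some potential data loss for lower write latency.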

Hypervisor-based CDP

The illustration shows a CDP local replication implementation.

[Illustration: CDP - Local Replication. VMs and a virtual appliance run on a
hypervisor with an embedded write splitter; through the SAN, the VM disk files on
the source volume are protected by a journal and a local replica.]

• Protects a single VM or multiple VMs locally or remotely
• Enables restoring a VM to any PIT
• Virtual appliance runs on a hypervisor
• Write splitter is embedded in the hypervisor

Notes

Some vendors offer continuous data protection for VMs through a hypervisor-based
CDP implementation. In this deployment, the specialized hardware-based appliance
is replaced with a virtual appliance that runs on a hypervisor, and the write splitter
is embedded in the hypervisor. This option protects single or multiple VMs locally or
remotely and enables VMs to be restored to any PIT. The local and remote
replication operations are similar to those of network-based CDP replication.

Backup and Recovery Lesson

Introduction

This lesson presents the need for backup, backup architecture, backup target, and
backup operation. This lesson also focuses on backup granularity and various
backup methods.

This lesson covers the following topics:

• Need for backup
• Backup architecture
• Backup target
• Backup granularity
• Backup methods

Backup and Recovery Overview

Definition: Backup
An additional copy of production data, which is created and retained
for the sole purpose of recovering lost or corrupted data.

• Typically both application data and server configurations are backed up to
restore data and servers if there is an outage.
• Businesses also implement backup solutions to comply with regulatory
requirements.
• To implement a successful backup and recovery solution:
   ◦ IT needs to evaluate the backup methods along with their recovery
considerations and retention requirements

Notes

Just as organizations protect IT infrastructure components (compute, storage, and
network), it is critical for them to protect their data. Organizations typically
implement data protection solutions to protect data against accidental file deletion,
application crashes, data corruption, and disasters. Data should be protected at
both local and remote locations to ensure the availability of services.

For example, when a service fails over to another zone (data center), the data must
be available at the destination. This helps ensure a successful failover and
minimizes the outage. One of the most widely implemented data protection solutions
is backup.

A backup is an additional copy of production data, which is created and retained for
the sole purpose of recovering lost or corrupted data. With growing business and
regulatory demands for data storage, retention, and availability, organizations face
the task of backing up an ever-increasing amount of data. This task becomes more
challenging with data growth, reduced IT budgets, and less time available for
taking backups.

Moreover, organizations need fast backup and recovery of data to meet their
service level agreements. Most organizations spend a considerable amount of time
and money protecting their application data but give less attention to protecting
their server configurations. During disaster recovery, server configurations must be
re-created before the application and data are accessible to the user.

The process of system recovery involves reinstalling the operating system,
applications, and server settings and then recovering the data. Therefore, it is
important to back up both application data and server configurations. Evaluating
backup technologies, recovery, and retention requirements for data and applications
is an essential step to ensure successful implementation of a backup and recovery
solution.

Video: Backup and Recovery Overview

The video is located at
http://edutube.emc.com/Player.aspx?vno=mUT1iNrePDRZHxEclMCoCA

Backup Architecture

The role of a backup client is to gather the data that needs to be backed up and
send it to the storage node. The backup client can be installed on application
servers, mobile clients, and desktops. It also sends tracking information to the
backup server.

Key backup components are:

• Backup client
• Backup server
• Storage node
• Backup device (backup target)

[Illustration: backup clients (VMs on a hypervisor) send backup data to the storage
node and tracking information to the backup server; the storage node writes the
backup data to the backup device and sends tracking information to the backup
server. The illustration also depicts the cloud.]

Notes

The backup server manages the backup operations and maintains the backup
catalog, which contains information about the backup configuration and backup
metadata. The backup configuration contains information such as when to run
backups and which client data is to be backed up. The backup metadata contains
information about the backed-up data. The storage node is responsible for
organizing the client’s data and writing it to a backup device. A storage node
controls one or more backup devices.

In most implementations, the storage node and the backup server run on the same
system. Backup devices may be attached to the storage node directly or through a
network. The storage node sends tracking information about the data written to the
backup device to the backup server; this information is typically used for
recoveries. A wide range of backup targets is available, such as tape, disk, and
virtual tape library. Organizations can now also back up their data to cloud storage.
Many service providers offer backup as a service, which enables an organization to
reduce its backup management overhead.

Backup Targets

Tape Library
• Tapes are portable and can be used for long-term offsite storage.
• Must be stored in locations with a controlled environment
• Not optimized to recognize duplicate content
• Data integrity and recoverability are major issues with tape-based backup
media.

Disk Library
• Enhanced backup and recovery performance
• No inherent offsite capability
• Disk-based backup appliances include features such as deduplication,
compression, encryption, and replication to support business objectives

Virtual Tape Library
• Disks are emulated and presented as tapes to backup software.
• Does not require any additional modules or changes in the legacy backup
software
• Provides better performance and reliability than physical tape
• Does not require the usual maintenance tasks that are associated with a
physical tape drive, such as periodic cleaning and drive calibration

Notes

A tape library contains one or more tape drives that record and retrieve data on
magnetic tape. Tape is portable, and one of the primary reasons for using tape is
long-term, offsite storage. Backups that are implemented using tape devices
involve several hidden costs. Tapes must be stored in locations with a controlled
environment to ensure preservation of the media and to prevent data corruption.
Physical transportation of the tapes to offsite locations also adds management
overhead and increases the possibility of losing tapes during offsite shipment.

The traditional backup process using tapes is not optimized to recognize duplicate
content. Because tape provides sequential data access, both backing up and
restoring data take more time with tape, which may affect the backup window and
the RTO. A backup window is a period during which a production volume is available
to perform backup. Data integrity and recoverability are also major issues with
tape-based backup media.
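Whether a backup fits in its backup window is a matter of simple arithmetic: the data volume divided by the sustained throughput of the backup path must not exceed the window. The sketch below assumes 1 GB = 1024 MB and uses hypothetical data sizes and throughput figures; real throughput depends on the backup target, network, and workload.

```python
def backup_hours(data_gb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to copy `data_gb` at a sustained rate (assumes 1 GB = 1024 MB)."""
    return data_gb * 1024 / throughput_mb_per_s / 3600


def fits_window(data_gb: float, throughput_mb_per_s: float, window_hours: float) -> bool:
    """True if the backup can finish inside the backup window."""
    return backup_hours(data_gb, throughput_mb_per_s) <= window_hours


# Hypothetical 2 TB (2048 GB) data set and a 4-hour backup window:
print(round(backup_hours(2048, 200), 2), fits_window(2048, 200, 4))  # 2.91 True
print(round(backup_hours(2048, 100), 2), fits_window(2048, 100, 4))  # 5.83 False
```

Halving the effective throughput, for example because of slower sequential media, is enough here to push the same backup outside its window.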

Disk density has increased dramatically over the past few years, lowering the cost
per GB. As a result, disk has become a viable backup target for organizations. When
used in a highly available configuration in a storage array, disks offer a reliable and
fast backup target medium. One way to implement a backup-to-disk system is to use
disk as a staging area, which offloads backup data to a secondary backup target
such as tape after a period of time.

Some vendors offer purpose-built, disk-based backup appliances, which have
emerged as a preferred backup target solution. These systems are optimized for
backup and recovery operations and offer extensive integration with popular backup
management applications. Integrated features such as replication, compression,
encryption, and data deduplication increase the value of purpose-built backup
appliances.

Virtual tape libraries use disks as backup media. Virtual tapes are disk drives that
are emulated and presented as tapes to the backup software. Compared to
physical tapes, virtual tapes offer better performance, better reliability, and random
disk access. A virtual tape drive does not require the usual maintenance tasks that
are associated with a physical tape drive, such as periodic cleaning and drive
calibration. Compared to the disk library, a virtual tape library offers easy
installation and administration because it is preconfigured by the manufacturer. A
key feature that is available on virtual tape library appliances is replication.

Backup Operation

[Illustration: Backup Operation, showing backup clients (VMs on a hypervisor), the
backup server, the storage node, and the backup device, with arrows numbered 1
through 7 corresponding to the steps below.]

(1) Backup server initiates the scheduled backup process.

(2) Backup server retrieves backup-related information from the backup catalog.

(3a) Backup server instructs the storage node to load backup media in the backup
device.

(3b) Backup server instructs the backup clients to send the data to be backed up to
the storage node.

(4) Backup clients send data to the storage node and update the backup catalog on
the backup server.

(5) Storage node sends data to the backup device.

(6) Storage node sends metadata and media information to the backup server.

(7) Backup server updates the backup catalog.
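The seven steps above can be traced in a compact Python simulation. It is a hypothetical model, not a real backup product's API: `BackupCatalog`, `StorageNode`, and `run_scheduled_backup` are invented names, the catalog holds both the configuration (which clients a schedule covers) and the metadata, and an in-memory list stands in for the backup media.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BackupCatalog:
    config: Dict[str, List[str]]                        # schedule -> clients to back up
    metadata: List[dict] = field(default_factory=list)  # what was backed up, and where


@dataclass
class StorageNode:
    device: List[dict] = field(default_factory=list)    # stand-in for backup media
    loaded_media: str = ""

    def load_media(self, media_id: str) -> None:
        self.loaded_media = media_id

    def store(self, client: str, files: Dict[str, bytes]) -> dict:
        self.device.append({"client": client, "files": files})               # (5)
        return {"media": self.loaded_media, "position": len(self.device) - 1}  # (6)


def run_scheduled_backup(catalog: BackupCatalog, node: StorageNode, schedule: str,
                         client_data: Dict[str, Dict[str, bytes]], media_id: str) -> None:
    """Plays the backup server's role for one scheduled run (step 1)."""
    clients = catalog.config[schedule]          # (2) read backup info from the catalog
    node.load_media(media_id)                   # (3a) storage node loads backup media
    for client in clients:                      # (3b) each client is told to send its data
        files = client_data[client]             # (4) client sends data to the storage node
        entry = {"client": client, "file_count": len(files)}  # (4) client metadata
        entry.update(node.store(client, files))  # (5)-(6) device write + media metadata
        catalog.metadata.append(entry)          # (7) backup server updates the catalog


catalog = BackupCatalog(config={"daily-3am": ["app01", "db01"]})
node = StorageNode()
data = {"app01": {"app.cfg": b"x"}, "db01": {"db.dat": b"y", "db.log": b"z"}}
run_scheduled_backup(catalog, node, "daily-3am", data, media_id="MEDIA-0001")
for entry in catalog.metadata:
    print(entry)
# {'client': 'app01', 'file_count': 1, 'media': 'MEDIA-0001', 'position': 0}
# {'client': 'db01', 'file_count': 2, 'media': 'MEDIA-0001', 'position': 1}
```

Each catalog entry combines the client-side metadata from step 4 with the media location from step 6, which is the information a later restore relies on.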

Notes

The backup operation is typically initiated by a server, but it can also be initiated by
a client. The backup server initiates the backup process for different clients based
on the backup schedule configured for them.

For example, the backup for a group of clients may be scheduled to start at 3:00
a.m. every day. The backup server coordinates the backup process with all the
components in a backup environment. It maintains information about the backup
clients to be backed up and the storage nodes to be used in a backup operation.
The backup server retrieves the backup-related information from the backup
catalog and, based on this information, instructs the storage node to load the
appropriate backup media into the backup devices.

Simultaneously, it instructs the backup clients to gather the data to be backed up
and send it over the network to the assigned storage node. After the backup data
is sent to the storage node, the client sends some backup metadata (the number of
files, names of the files, storage node details, and so on) to the backup server. The
storage node receives the client data, organizes it, and sends it to the backup
device. The storage node sends additional backup metadata (location of the data
on the backup device, time of backup, and so on) to the backup server. The backup
server updates the backup catalog with this information. The backup data from the
client can be sent to the backup device over a LAN or SAN.

Hot backup and cold backup are the two methods that are deployed for backup,
distinguished by the state of the application when the backup is performed. In a
hot backup, the application is up and running, with users accessing their data
during the backup process. This method of backup is also referred to as online
backup. The hot backup of online production data is challenging because the data
is actively being used and changed. If a file is open, it is normally not backed up
during the backup process.

In such situations, an open file agent is required to back up the open file. These
agents interact directly with the operating system or application and enable the
creation of consistent copies of open files. The disadvantage associated with a hot
backup is that the agents usually affect overall application performance. A cold
backup requires the application to be shut down during the backup process; hence,
this method is also referred to as offline backup. Consistent backups of databases
can also be done by using a cold backup. The disadvantage of a cold backup is that
the database is inaccessible to users during the backup process.

Recovery Operation

[Illustration: Recovery Operation, showing backup clients (VMs on a hypervisor), the
backup server, the storage node, and the backup device, with arrows numbered 1
through 6 corresponding to the steps below.]

(1) Backup client requests a data restore from the backup server.

(2) Backup server scans the backup catalog to identify the data to be restored and
the client that will receive the data.

(3) Backup server instructs the storage node to load the backup media in the
backup device.

(4) Data is then read and sent to the backup client.

(5) Storage node sends restore metadata to the backup server.

(6) Backup server updates the backup catalog.

Notes

After the data is backed up, it can be restored when required. A restore process
can be manually initiated from the client. A recovery operation restores data to its
original state at a specific PIT. Typically backup applications support restoring one
or more individual files, directories, or VMs. The illustration depicts a restore
operation.

Upon receiving a restore request, an administrator opens the restore application to
view the list of clients that have been backed up. While selecting the client for
which a restore request has been made, the administrator also needs to identify the
client that will receive the restored data. Data can be restored on the same client
for which the restore request has been made or on any other client.

The administrator then selects the data to be restored and the point in time to
which the data must be restored, based on the RPO. Because all this information
comes from the backup catalog, the restore application needs to communicate with
the backup server. The backup server instructs the appropriate storage node to
mount the specific backup media onto the backup device. Data is then read and
sent to the client that has been identified to receive the restored data.

Some restorations are successfully accomplished by recovering only the requested
production data. For example, the recovery of a spreadsheet is completed when the
specific file is restored. In database restorations, additional data, such as log files,
must be restored along with the production data to ensure the consistency of the
restored data. In these cases, the RTO is extended due to the additional steps in
the restore operation. It is also important for backup and recovery applications to
have security mechanisms that prevent recovery of data by unauthorized users.
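Selecting what to restore is essentially a catalog lookup. The following Python sketch is a hypothetical illustration: it picks the newest data backup at or before the requested PIT and, for a database, also the log backups taken after that backup up to the PIT. `CatalogEntry`, `plan_restore`, the `"data"`/`"log"` labels, and the integer PIT unit are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CatalogEntry:
    client: str
    pit: int      # backup time, e.g. hours since some epoch (hypothetical unit)
    kind: str     # "data" or "log" (hypothetical labels)
    media: str


def plan_restore(catalog: List[CatalogEntry], client: str, pit: int,
                 is_database: bool = False) -> Tuple[Optional[CatalogEntry], List[CatalogEntry]]:
    """Return the data backup to restore and, for databases, the logs to apply after it."""
    candidates = [e for e in catalog
                  if e.client == client and e.kind == "data" and e.pit <= pit]
    if not candidates:
        return None, []
    base = max(candidates, key=lambda e: e.pit)   # newest backup not later than the PIT
    logs: List[CatalogEntry] = []
    if is_database:
        logs = sorted((e for e in catalog
                       if e.client == client and e.kind == "log" and base.pit < e.pit <= pit),
                      key=lambda e: e.pit)
    return base, logs


catalog = [
    CatalogEntry("db01", 0, "data", "MEDIA-1"),
    CatalogEntry("db01", 2, "log", "MEDIA-1"),
    CatalogEntry("db01", 4, "log", "MEDIA-2"),
    CatalogEntry("db01", 6, "data", "MEDIA-2"),
]
base, logs = plan_restore(catalog, "db01", pit=5, is_database=True)
print(base.media, [e.pit for e in logs])  # MEDIA-1 [2, 4]
```

The extra log entries are the reason database restores take longer: every one of them must be read from media and applied after the base backup.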

Backup Granularity

[Illustration: amount of data backed up by full, incremental, and cumulative
(differential) backups.]

Different granularity levels are:

• Full backup
• Incremental backup
• Cumulative backup

Notes

Backup granularity depends on business needs and the required RTO/RPO. Based
on the granularity, backups can be categorized as full, incremental, and cumulative
(or differential). Most organizations use a combination of these backup types to
meet their backup and recovery requirements.

Full Backup: A full copy of the entire data set. Organizations typically perform full
backups periodically because they require more storage space and take more time
to complete. A full backup provides faster data recovery.

Incremental Backup: Copies the data that has changed since the last backup. For
example, a full backup is created on Monday, and incremental backups are created
for the rest of the week. Tuesday's backup would contain only the data that has
changed since Monday, and Wednesday's backup would contain only the data that
has changed since Tuesday. The primary disadvantage of incremental backups is
that they can be time-consuming to restore. Suppose an administrator wants to
restore the backup from Wednesday. To do so, the administrator must first restore
Monday's full backup, then Tuesday's copy, followed by Wednesday's.

Cumulative Backup: Copies the data that has changed since the last full backup.
Suppose, for example, the administrator creates a full backup on Monday and
differential backups for the rest of the week. Tuesday's backup would contain all of
the data that has changed since Monday, so at this point it would be identical to an
incremental backup. On Wednesday, however, the differential backup would back up
any data that had changed since Monday (the full backup). The advantage of
differential backups over incremental backups is shorter restore time: restoring a
differential backup never requires more than two copies (the full backup and the
latest differential backup). The tradeoff is that, as time progresses, a differential
backup can grow to contain more data than an incremental backup.
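The Monday-to-Wednesday examples above can be checked with a short Python sketch. It assumes a full backup on day 0 and models each backup as the set of files it copies; `backup_sets` and `restore_chain` are hypothetical helpers, not part of any backup product.

```python
from typing import List, Set

DAYS = ["Mon", "Tue", "Wed", "Thu"]
MODES = ("incremental", "cumulative")


def backup_sets(all_files: Set[str], daily_changes: List[Set[str]], mode: str) -> List[Set[str]]:
    """Files copied by each day's backup: day 0 is the full backup."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    backups = [set(all_files)]            # Monday: full backup of the entire data set
    since_full: Set[str] = set()
    for changed in daily_changes:
        since_full |= changed
        # incremental: changed since the last backup; cumulative: since the last full backup
        backups.append(set(changed) if mode == "incremental" else set(since_full))
    return backups


def restore_chain(day: int, mode: str) -> List[int]:
    """Day indexes of the backups needed to restore the state as of `day`."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if day == 0:
        return [0]
    if mode == "incremental":
        return list(range(day + 1))       # full backup + every incremental up to `day`
    return [0, day]                       # full backup + the latest cumulative backup


files = {"a", "b", "c"}
changes = [{"a"}, {"b"}, {"a"}]           # files changed on Tue, Wed, Thu
inc = backup_sets(files, changes, "incremental")
cum = backup_sets(files, changes, "cumulative")
print(inc[2], sorted(cum[2]))             # {'b'} ['a', 'b']
print([DAYS[d] for d in restore_chain(2, "incremental")])  # ['Mon', 'Tue', 'Wed']
print([DAYS[d] for d in restore_chain(2, "cumulative")])   # ['Mon', 'Wed']
```

The output shows both sides of the tradeoff: the cumulative backups grow over the week, but a restore never needs more than two of them.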

Agent-Based Backup

In this approach, an agent or client is installed on a virtual machine or a physical
compute system. The agent streams the backup data to the backup device, as
shown in the illustration.

• Agent runs inside the application servers (physical/virtual)
• Performs file-level backup
• Impacts performance of applications running on compute systems
   ◦ Performing backup on multiple VMs on a compute system may consume
more resources and lead to resource contention

[Illustration: agents (A) running on application servers, including VMs on a
hypervisor, send backup data to the backup server/storage node, which writes it
to the backup device.]

Notes

This backup does not capture virtual machine configuration files. The agent running
on the compute system consumes CPU cycles and memory resources. If multiple
VMs on a compute system are backed up simultaneously, then the combined I/O
and bandwidth demands that are placed on the compute system by the various
backup operations can deplete the compute system resources.

This approach may impact the performance of the services or applications running
on the VMs. To overcome these challenges, the backup process can be offloaded
from the VMs to a proxy server. This can be achieved by using the image-based
backup approach.

Image-Based Backup

Image-based backup makes a copy of the virtual drive and configuration that are
associated with a particular VM.
• Backup is saved as a single entity called a VM image
• Enables quick restoration of a VM
• Supports recovery at VM-level and file-level
• No agent is required inside the VM to perform backup
• Backup processing is offloaded from VMs to a proxy server

[Illustration: a VM snapshot is created on the FS volume, the snapshot is mounted
to the proxy server (a VM running on a hypervisor), and the proxy server performs
the backup.]

Notes

Image-based backup makes a copy of the virtual drive and configuration that are
associated with a particular VM. The backup is saved as a single entity called a VM
image. This type of backup is suitable for restoring an entire VM if there is a
hardware failure or a human error such as the accidental deletion of the VM.
Image-based backup also supports file-level recovery.

In an image-level backup, the backup software can back up VMs without installing
backup agents inside the VMs or at the hypervisor level. The backup processing is
performed by a proxy server that acts as the backup client, thereby offloading the
backup processing from the VMs. The proxy server communicates with the
management server responsible for managing the virtualized compute
environment. It sends commands to create a snapshot of the VM to be backed up
and to mount the snapshot to the proxy server. A snapshot captures the
configuration and virtual drive data of the target VM and provides a point-in-time
view of the VM. The proxy server then performs the backup by using the snapshot.

Some vendors support incremental backup through changed block tracking. This
feature identifies and tags any blocks that have changed since the last VM
snapshot, which enables the backup application to back up only the changed
blocks rather than every block. This considerably reduces the amount of data to be
backed up and increases the number of VMs that can be backed up within a
backup window.
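Changed block tracking can be sketched as a per-disk set of block indexes that is cleared at each snapshot. In this hypothetical Python model (`TrackedVirtualDisk` and the helper functions are invented names, and a list of byte strings stands in for a virtual disk), the incremental backup copies only the tagged blocks, and a restore applies the incrementals on top of the full backup.

```python
from typing import Dict, List, Set, Tuple

Image = Dict[int, bytes]  # block index -> contents


class TrackedVirtualDisk:
    """Hypothetical virtual disk that tags blocks changed since the last snapshot."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks: List[bytes] = [b"\x00"] * num_blocks
        self.changed: Set[int] = set()

    def write(self, index: int, data: bytes) -> None:
        self.blocks[index] = data
        self.changed.add(index)  # tag the block as changed

    def snapshot(self) -> Tuple[List[bytes], List[int]]:
        """Point-in-time copy plus the blocks changed since the previous snapshot."""
        snap, changed = list(self.blocks), sorted(self.changed)
        self.changed.clear()
        return snap, changed


def full_backup(snap: List[bytes]) -> Image:
    return dict(enumerate(snap))  # every block


def incremental_backup(snap: List[bytes], changed: List[int]) -> Image:
    return {i: snap[i] for i in changed}  # only the tagged blocks


def restore(full: Image, *incrementals: Image) -> List[bytes]:
    image = dict(full)
    for inc in incrementals:  # apply incrementals oldest first
        image.update(inc)
    return [image[i] for i in sorted(image)]


disk = TrackedVirtualDisk(num_blocks=8)
base = full_backup(disk.snapshot()[0])
disk.write(3, b"x")
disk.write(7, b"y")
inc = incremental_backup(*disk.snapshot())
print(sorted(inc))                         # [3, 7]
print(restore(base, inc) == disk.blocks)   # True
```

Here the incremental backup moves 2 of 8 blocks; the saving grows with disk size, which is what lets more VMs fit in the same backup window.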

Image-Based Backup: Recovery-In-Place

Definition: Recovery-in-place
A term that refers to running a VM directly from the backup device,
using a backed-up copy of the VM image instead of restoring that
image file.

• Eliminates the need to transfer the image from the backup device to the primary
storage before it is restarted
• Provides almost instant recovery of a failed VM
• Requires a random access device to work efficiently
   ◦ Disk-based backup target
• Reduces the RTO and the network bandwidth needed to restore VM files

Notes

One of the primary benefits of recovery in place is that it eliminates the need to
transfer the image from the backup area to the primary storage area before it is
restarted. As a result, the applications running on those VMs can be accessed more
quickly. This method not only reduces recovery time but also reduces the network
bandwidth needed to restore files.

NDMP-Based Backup

Definition: NDMP
An open standard TCP/IP-based protocol that is designed for backup
in a NAS environment.

• Data can be backed up using NDMP regardless of the operating system or
platform
• Backup data is sent directly from NAS to the backup device
• No longer necessary to transport data through application servers
• Backs up and restores data while preserving the security attributes of file
systems (NFS and CIFS) and maintaining data integrity

Notes

As the amount of unstructured data continues to grow exponentially, organizations
face the daunting task of ensuring that critical data on NAS systems is protected.
Most NAS heads run on proprietary operating systems that are designed for
serving files.

To maintain operational efficiency, these operating systems generally do not
support hosting third-party applications such as backup clients. This forced backup
administrators either to back up data from the application server or to mount each
NAS volume through CIFS or NFS on another server across the network that hosted
a backup agent. Due to the added overhead, these approaches may degrade the
performance of the application server and the production network during backup
operations.

Further, security structures differ between the two network file systems, NFS and
CIFS. Backups that are implemented through one of the file systems would not
effectively back up the data security attributes on the NAS head that were set
through the other file system. For example, a CIFS backup, when restored, would
not be able to restore NFS file attributes, and vice versa. These backup challenges
of the NAS environment can be addressed with the Network Data Management
Protocol (NDMP).

NDMP is an industry-standard TCP/IP-based protocol that is designed for backup
in a NAS environment. It communicates with several elements in the backup
environment (NAS head, backup devices, backup server, and so on) for data
transfer and enables vendors to use a common protocol for the backup
architecture. Data can be backed up using NDMP regardless of the operating
system or platform. NDMP backs up and restores data without losing data integrity
or file system structure (including the different rights and permissions used in
different file systems).

Because of this flexibility, it is no longer necessary to transport data through the
application server, which reduces the load on the application server and improves
backup speed. NDMP optimizes backup and restore by using the high-speed
connection between the backup devices and the NAS head. In NDMP, backup data
is sent directly from the NAS head to the backup device, whereas metadata is sent
to the backup server.
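The data-path difference can be shown by counting the bytes each hop carries. The Python sketch below is a hypothetical traffic model, not an implementation of the NDMP protocol or its messages: it compares a traditional backup, in which a server mounts the NAS share and forwards the data, with an NDMP-style backup, in which the NAS head sends data directly to the backup device and only metadata goes to the backup server. The hop names and the one-metadata-entry-per-file rule are assumptions.

```python
from collections import Counter
from typing import Dict


def backup_traffic(files: Dict[str, bytes], ndmp: bool) -> Counter:
    """Bytes moved on each hop for one backup of a NAS share (hypothetical model)."""
    traffic: Counter = Counter()
    for data in files.values():
        if ndmp:
            # NDMP-style: the NAS head sends data straight to the backup device.
            traffic["nas_head->backup_device"] += len(data)
        else:
            # Traditional: share mounted via CIFS/NFS on a server hosting the backup agent.
            traffic["nas_head->app_server"] += len(data)
            traffic["app_server->backup_device"] += len(data)
        traffic["metadata_entries->backup_server"] += 1  # metadata goes to the backup server
    return traffic


share = {"/vol1/report.doc": b"x" * 1000, "/vol1/budget.xls": b"y" * 500}
legacy, direct = backup_traffic(share, ndmp=False), backup_traffic(share, ndmp=True)
print(legacy["nas_head->app_server"], direct["nas_head->app_server"])  # 1500 0
```

In the traditional path every byte crosses the application server twice (in and out); in the NDMP-style path the application server carries none of the backup data.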

Primary Storage-Based Backup

[Illustration: application servers (VMs on a hypervisor) running an agent (A)
connect through the storage network to the primary storage system, which sends
backup data through the storage network to the backup device.]

This backup approach backs up data directly from the primary storage system to
the backup target without requiring additional backup software.
• Eliminates the backup impact on application servers
• Improves the backup and recovery performance to meet SLAs

Notes

Typically, an agent that controls the backup process runs on the application
servers. This agent stores configuration data that maps the LUNs on the primary
storage system to the backup device and uses it to orchestrate backup (the transfer
of changed blocks and the creation of backup images) and recovery operations.
This backup information (metadata) is stored in a catalog that is local to the
application server.

When a backup is triggered through the agent running on the application server,
the application pauses momentarily, only to mark the point in time for that backup.
The data blocks that have changed since the last backup are sent across the
network to the backup device. The direct movement from primary storage to the
backup device eliminates the LAN impact by isolating all backup traffic to the SAN.

This approach eliminates the backup impact on application servers and provides
faster backup and recovery to meet application protection SLAs.

For data recovery, the backup administrator triggers a recovery operation, and the
primary storage system then reads the backup image from the backup device. The
primary storage system replaces the production LUN with the recovered copy.
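The flow described above can be modeled in a few Python classes. This is a hypothetical sketch under simplifying assumptions (in-memory LUNs, an integer PIT, and invented names such as `Agent`, `PrimaryStorage`, and `BackupDevice`): the agent holds only the LUN-to-device mapping and a local catalog, the primary storage system sends changed blocks directly to the backup device, and recovery replaces the production LUN.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Blocks = Dict[int, bytes]  # block index -> contents


@dataclass
class BackupDevice:
    images: Dict[Tuple[str, int], Blocks] = field(default_factory=dict)  # (LUN, PIT) -> image
    latest: Dict[str, Blocks] = field(default_factory=dict)

    def create_image(self, lun: str, pit: int, delta: Blocks) -> None:
        image = {**self.latest.get(lun, {}), **delta}  # previous image + changed blocks
        self.images[(lun, pit)] = image
        self.latest[lun] = image


@dataclass
class PrimaryStorage:
    luns: Dict[str, Blocks]
    changed: Dict[str, Set[int]] = field(default_factory=dict)
    baselined: Set[str] = field(default_factory=set)

    def write(self, lun: str, block: int, data: bytes) -> None:
        self.luns[lun][block] = data
        self.changed.setdefault(lun, set()).add(block)

    def send_blocks(self, lun: str, device: BackupDevice, pit: int) -> int:
        """Send blocks straight to the device: all on the first backup, then only changes."""
        to_send = self.changed.get(lun, set()) if lun in self.baselined else set(self.luns[lun])
        device.create_image(lun, pit, {b: self.luns[lun][b] for b in sorted(to_send)})
        self.changed[lun] = set()
        self.baselined.add(lun)
        return len(to_send)

    def recover(self, lun: str, device: BackupDevice, pit: int) -> None:
        self.luns[lun] = dict(device.images[(lun, pit)])  # replace the production LUN
        self.changed[lun] = set()
        self.baselined.discard(lun)  # next backup is full, since the LUN was rolled back


@dataclass
class Agent:
    """Runs on the application server; holds the LUN mapping and a local catalog."""
    lun_to_device: Dict[str, BackupDevice]
    catalog: List[Tuple[str, int]] = field(default_factory=list)

    def backup(self, storage: PrimaryStorage, lun: str, pit: int) -> int:
        # The application would pause momentarily here to mark the point in time.
        sent = storage.send_blocks(lun, self.lun_to_device[lun], pit)
        self.catalog.append((lun, pit))
        return sent

    def recover(self, storage: PrimaryStorage, lun: str, pit: int) -> None:
        if (lun, pit) not in self.catalog:
            raise KeyError(f"no backup of {lun} at PIT {pit}")
        storage.recover(lun, self.lun_to_device[lun], pit)


storage = PrimaryStorage(luns={"LUN1": {0: b"a", 1: b"b", 2: b"c"}})
agent = Agent(lun_to_device={"LUN1": BackupDevice()})
sent1 = agent.backup(storage, "LUN1", pit=1)   # first backup sends every block
storage.write("LUN1", 1, b"B")
sent2 = agent.backup(storage, "LUN1", pit=2)   # only the changed block moves
storage.write("LUN1", 0, b"corrupt")
agent.recover(storage, "LUN1", pit=2)
print(sent1, sent2, storage.luns["LUN1"])      # 3 1 {0: b'a', 1: b'B', 2: b'c'}
```

Note that no block data ever passes through the `Agent`; it only triggers operations and records PITs, which is what keeps the backup load off the application server.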

Cloud-Based Backup: Backup as a Service

[Illustration: backup clients (VMs) back up data to the cloud and restore data from
the cloud.]

• Enables consumers to procure backup services on demand through a
self-service portal
• Provides the capability to perform backup and recovery at any time, from
anywhere
• Reduces the backup management overhead
• Shifts costs from CAPEX to OPEX
   ◦ Pay-per-use/subscription-based pricing
• Enables organizations to meet long-term retention requirements
• Backing up to the cloud ensures regular and automated backup of data
• Gives consumers the flexibility to select a backup technology based on their
current requirements

Notes

Data is important for businesses of all sizes. Organizations need to back up data
regularly to avoid losses, stay compliant, and preserve data integrity. IT
organizations today are dealing with an explosion of data, particularly with the
development of third platform technologies. This data explosion makes backup and
quick restore more challenging, and it strains backup windows, IT budgets, and IT
management. The growth and complexity of the data environment, combined with
the proliferation of virtual machines and mobile devices, constantly outpace
existing data backup plans.

Deploying a new backup solution takes weeks of planning, justification,
procurement, and setup. However, technology and data protection requirements
change quickly, and enterprises must also comply with regulatory and litigation
requirements. Cloud-based backup (backup as a service) has emerged to address
these challenges.

Backup as a service enables organizations to procure backup services on demand
in the cloud. The backup service is offered by a service provider to consumers.
Organizations can also build their own cloud infrastructure and provide backup
services on demand to their employees or users. Some organizations prefer a hybrid
cloud option for their backup strategy: they keep a local backup copy in their
private cloud and use a public cloud to keep their remote copy for DR purposes. To
provide backup as a service, organizations and service providers must have the
necessary backup technologies in place to meet the required service levels.

Backup as a service enables individual consumers or organizations to reduce their
backup management overhead. It also enables individual consumers or users to
perform backup and recovery anytime, from anywhere, using a network connection.
Consumers do not need to invest in capital equipment to implement and manage
their backup infrastructure; these infrastructure resources are rented without
obtaining ownership. Backups can be scheduled and infrastructure resources
allocated based on consumer demand, and a metering service monitors and reports
resource consumption.

The remote and branch offices of many organizations have limited or no backup in
place. Mobile workers represent a particular risk because of the increased
possibility of lost or stolen devices. Backing up to the cloud ensures regular and
automated backup of data. Cloud computing gives consumers the flexibility to select
a backup technology based on their requirements, and to move quickly to a
different technology when those requirements change.

Data can be restored from the cloud using two methods: web-based restore and
media-based restore. In web-based restore, the requested data is gathered and sent
to the server running the cloud backup agent, and the agent software restores the
data on that server. This method is used when sufficient bandwidth is available.
If a large amount of data needs to be restored and sufficient bandwidth is not
available, the consumer may request restoration using backup media such as DVDs
or disk drives. In this option, the service provider gathers the data to restore,
stores it on a set of backup media, and ships the media to the consumer.
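The choice between the two restore methods is essentially arithmetic on data size
and available bandwidth. The Python sketch below compares an estimated network
transfer time with a media shipping time. The 80% link efficiency, the 48-hour
shipping time, and the function names are illustrative assumptions, not values
from any service provider.

```python
def transfer_hours(data_gb: float, bandwidth_mbps: float, efficiency: float = 0.8) -> float:
    """Estimated hours to move data_gb over a link of bandwidth_mbps.

    Uses decimal units (1 GB = 8,000 megabits). The efficiency factor is an
    assumed allowance for protocol overhead and contention.
    """
    megabits = data_gb * 8_000
    seconds = megabits / (bandwidth_mbps * efficiency)
    return seconds / 3_600


def choose_restore_method(data_gb: float, bandwidth_mbps: float,
                          shipping_hours: float = 48.0) -> str:
    """Pick web-based restore if the network beats the (assumed) media shipping time."""
    if transfer_hours(data_gb, bandwidth_mbps) <= shipping_hours:
        return "web-based"
    return "media-based"


if __name__ == "__main__":
    print(choose_restore_method(50, 100))      # web-based
    print(choose_restore_method(20_000, 100))  # media-based
```

With these assumptions, restoring 50 GB over a 100 Mbit/s link takes about 1.4
hours, while 20 TB takes roughly 556 hours, which is why large restores favor
shipped media.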


Data Deduplication Lesson

Introduction

This lesson presents the drivers for data deduplication and the factors affecting
the deduplication ratio. It also focuses on source-based and target-based
deduplication.

This lesson covers the following topics:


 Drivers for data deduplication
 Factors affecting deduplication ratio
 Source-based and target-based deduplication


Data Deduplication

Video: Data Deduplication

The video is located at


https://edutube.emc.com/Player.aspx?vno=gHpJBZz2XqTo2FGvK/UM7w


What is Data Deduplication?

Definition: Data Deduplication


The process of detecting and identifying the unique data segments
within a given set of data to eliminate redundancy.

 Deduplication process:
   Chunk the dataset
   Identify duplicate chunks
   Eliminate the redundant chunks
 Deduplication can be performed in both backup and production environments
 The effectiveness of deduplication is expressed as a deduplication ratio

(Figure: Deduplication. Before deduplication, total segments = 39; after
deduplication, unique segments = 3.)

Notes

Deduplication reduces the amount of data to be backed up. Data deduplication
segments a dataset into blocks, identifies redundant data, and writes only the
unique blocks to a backup target.

To identify redundant blocks, the data deduplication system creates a hash value
or digital signature, like a fingerprint, for each data block. It also creates an index of
the signatures for a given repository. The index provides the reference list to
determine whether blocks exist in a repository.

When the data deduplication system sees a block it has processed before, instead
of storing the block again, it inserts a pointer to the original block in the
repository.

Data deduplication can be performed in a backup environment as well as in a
production environment. In a production environment, deduplication is implemented
at primary storage systems to eliminate redundant data in the production volume.

The effectiveness of data deduplication is expressed as a deduplication ratio: the
amount of data before deduplication divided by the amount of data after
deduplication. This ratio is typically written as “ratio:1” or “ratio X” (for
example, 10:1 or 10 X). For example, if 200 GB of data consumes 20 GB of storage
capacity after data deduplication, the space reduction ratio is 10:1.
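The process described above (chunk the data, fingerprint each chunk, look it up in
the index, store only new chunks, and keep pointers for duplicates) can be sketched
in a few lines of Python. This toy repository assumes fixed-length 4 KiB chunks,
SHA-256 as the fingerprint, and an in-memory dictionary as the index; `DedupStore`
is a hypothetical name, and real products differ in chunking, hashing, and index
design.

```python
import hashlib


class DedupStore:
    """Toy deduplicating repository: fixed-length chunks, SHA-256 fingerprints."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.index: dict[str, bytes] = {}  # fingerprint -> the single stored copy
        self.logical_bytes = 0             # bytes received before deduplication

    def write(self, data: bytes) -> list[str]:
        """Store data and return its recipe: a list of pointers (fingerprints)."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.index:  # new chunk: store it once
                self.index[fp] = chunk
            recipe.append(fp)         # new or duplicate, record a pointer
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        """Rebuild the original data by following the pointers."""
        return b"".join(self.index[fp] for fp in recipe)

    @property
    def physical_bytes(self) -> int:
        return sum(len(chunk) for chunk in self.index.values())

    def ratio(self) -> float:
        """Deduplication ratio = data before deduplication / data after deduplication."""
        return self.logical_bytes / self.physical_bytes


if __name__ == "__main__":
    store = DedupStore(chunk_size=4096)
    # Ten distinct 4 KiB chunks (40 KiB), backed up ten times without changes.
    dataset = b"".join(bytes([i]) * 4096 for i in range(10))
    recipes = [store.write(dataset) for _ in range(10)]
    assert store.read(recipes[-1]) == dataset
    print(f"{store.ratio():.0f}:1")  # 10:1
```

Backing up the same ten unique chunks ten times yields 400 KiB of logical data
stored in 40 KiB of chunks, the same 10:1 arithmetic as the 200 GB to 20 GB example.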

Every data deduplication vendor claims that its product offers a certain data
reduction ratio. However, the actual deduplication ratio varies with many factors.


Drivers for Data Deduplication

 Limited budget: Capacity requirements are growing year over year, which
increases storage cost
 Limited backup window: Backup windows are shorter due to the need for 24x7
service availability
 Network bandwidth constraint: Data is distributed across remote locations
(cloud) for DR purposes, which requires huge network bandwidth
 Longer retention period: Regulatory requirements demand keeping data for longer
periods

Notes

With the growth of data and 24x7 service availability requirements, organizations
face challenges in protecting their data. Typically, a large amount of redundant
data is backed up. This increases the backup window and results in unnecessary
consumption of resources, such as backup storage space and network bandwidth.

There are also requirements to preserve data for longer periods, whether driven
by the needs of consumers or by legal and regulatory concerns. Backing up large
amounts of duplicate data to a remote site or the cloud for DR purposes is also
cumbersome and requires a lot of bandwidth.

Data deduplication provides a solution for organizations to overcome these
challenges in a backup environment.


Factors Affecting Deduplication Ratio

Factor | Description
Retention period | The longer the data retention period, the greater the chance that identical data exists in the backup
Frequency of full backup | The more frequently full backups are conducted, the greater the advantage of deduplication
Change rate | The fewer the changes to the content between backups, the greater the efficiency of deduplication
Data type | The more unique the data, the less intrinsic duplication exists
Deduplication method | The highest amount of deduplication across an organization is discovered using variable-length, sub-file deduplication

Notes

Data deduplication performance (the deduplication ratio) depends on the following
factors:

Retention period: This is the period of time that defines how long the backup
copies are retained. The longer the retention, the greater the chance that
identical data exists in the backup set, which increases the deduplication ratio
and storage space savings.

Frequency of full backup: The more full backups are performed, the more of the
same data is backed up repeatedly, which results in a higher deduplication ratio.

Change rate: This is the rate at which the data received from the backup
application changes from backup to backup. Client data with few changes between
backups produces higher deduplication ratios.

Data type: Backups of user data such as text documents, PowerPoint
presentations, spreadsheets, and emails are known to contain redundant data and
are good deduplication candidates. Other data such as audio, video, and scanned
images are highly unique and typically do not yield a good deduplication ratio.

Deduplication method: The deduplication method also determines the effective
deduplication ratio. Variable-length, sub-file deduplication discovers the highest
amount of duplicate data.
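To see how retention, full-backup frequency, and change rate interact, consider a
deliberately simplified model (an assumption for illustration, not a vendor
formula): R full backups are retained, the first is entirely unique, and each later
full backup contributes only its changed fraction c as new data. The ratio is then
R / (1 + (R - 1) * c). Data type and deduplication method are not modeled, and the
function name is hypothetical.

```python
def modeled_dedup_ratio(retained_fulls: int, change_rate: float) -> float:
    """Deduplication ratio for `retained_fulls` full backups under a simple model.

    Assumed model: the first full backup is entirely unique, and each later full
    backup adds only its changed fraction (`change_rate`) as new unique data.
    """
    if retained_fulls < 1 or not 0.0 <= change_rate <= 1.0:
        raise ValueError("need retained_fulls >= 1 and 0 <= change_rate <= 1")
    logical = retained_fulls                            # in units of one full backup
    physical = 1 + (retained_fulls - 1) * change_rate   # unique data actually stored
    return logical / physical


if __name__ == "__main__":
    # For example, weekly full backups kept for about 1, 3, and 12 months.
    for fulls in (4, 12, 52):
        for rate in (0.01, 0.10):
            ratio = modeled_dedup_ratio(fulls, rate)
            print(f"{fulls:>2} fulls, {rate:.0%} change: {ratio:.1f}:1")
```

Under this model, retaining 52 weekly full backups with a 1% change rate gives
roughly 34:1, a 10% change rate drops it to about 8.5:1, and retaining only 4 full
backups at 1% gives less than 4:1. Longer retention and more frequent full backups
both raise R, while a higher change rate lowers the ratio.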


Deduplication Granularity Level

The level at which data is identified as duplicate affects the amount of redundancy
or commonality. The operational levels of deduplication include file-level
deduplication and sub-file deduplication.

File-level Deduplication

 Detects and removes redundant copies of identical files
 Only one copy of the file is stored; the subsequent copies are replaced with a
pointer to the original file
 Does not address the problem of duplicate content inside the files

Sub-file Level Deduplication

 Breaks down files into smaller segments
 Detects redundant data within and across files
 Two methods:
   Fixed-length block
   Variable-length block

Notes

File-level deduplication (also called single instance storage) detects and removes
redundant copies of identical files in a backup environment. Only one copy of the
file is stored; the subsequent copies are replaced with a pointer to the original file.
By removing all of the subsequent copies of a file, a significant amount of space
savings can be achieved. File-level deduplication is simple but does not address
the problem of duplicate content inside the files. A change in any part of a file
causes it to be classified as a new file and saved as a separate copy. For
example, two 10-MB presentations that differ only in the title page are not
considered duplicate files, and each file is stored separately.

Sub-file deduplication breaks the file into smaller blocks and then uses a standard
hash algorithm to detect redundant data within and across files. As a result,
sub-file deduplication eliminates duplicate data across files. There are two forms
of sub-file deduplication: fixed-length and variable-length.

Fixed-length block deduplication divides the files into fixed-length blocks and
uses a hash algorithm to find duplicate data. Although simple in design, it may
miss opportunities to discover redundant data because the block boundaries of
similar data may be different. For example, adding a person’s name to a
document’s title page may shift the whole document and make all blocks appear to
have changed, so the deduplication method fails to detect equivalencies.

In variable-length block deduplication, if a block changes, only the boundary for
that block is adjusted, leaving the remaining blocks unchanged. More data is
identified as common data, and there is less backup data to store because only the
unique data is backed up. Variable-length block deduplication yields a greater
granularity in identifying duplicate data, improving upon the limitations of
file-level and fixed-length block deduplication.
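The boundary-shift problem is easy to reproduce. The Python sketch that follows
chunks a 64 KiB stand-in document with fixed 1 KiB blocks and with a simple
content-defined (variable-length) chunker, then prepends 16 bytes, much like a name
added to a title page. The variable-length chunker places a boundary wherever a
polynomial rolling hash of the last 16 bytes has its low 10 bits equal to zero, so
boundaries depend on content rather than on offsets. The window size, mask, hash
parameters, and function names are illustrative assumptions; production systems
use their own tuned chunking algorithms, usually with minimum and maximum chunk
sizes, which this sketch omits.

```python
import hashlib
import random

WINDOW = 16                       # bytes covered by the rolling hash (assumed)
MASK = (1 << 10) - 1              # cut when the low 10 bits are zero: ~1 KiB chunks
BASE = 257
MOD = (1 << 61) - 1               # large prime modulus for the polynomial hash
POW = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window


def fixed_chunks(data: bytes, size: int = 1024) -> list[bytes]:
    """Split data at fixed offsets."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data: bytes) -> list[bytes]:
    """Content-defined chunking: cut where the hash of the last WINDOW bytes matches."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * POW) % MOD  # remove the byte leaving the window
        h = (h * BASE + byte) % MOD                 # add the byte entering the window
        if i + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                 # trailing partial chunk
    return chunks


def reused_fraction(chunker, original: bytes, edited: bytes) -> float:
    """Share of the edited file's chunks whose fingerprints already exist."""
    seen = {hashlib.sha256(c).digest() for c in chunker(original)}
    fps = [hashlib.sha256(c).digest() for c in chunker(edited)]
    return sum(fp in seen for fp in fps) / len(fps)


if __name__ == "__main__":
    original = random.Random(7).randbytes(64 * 1024)  # stand-in for a document
    edited = b"Title: J. Smith\n" + original          # 16 bytes inserted at the front
    print(f"fixed-length reuse:    {reused_fraction(fixed_chunks, original, edited):.0%}")
    print(f"variable-length reuse: {reused_fraction(variable_chunks, original, edited):.0%}")
```

Fixed-length chunking finds essentially no reusable blocks after the insertion,
because every block boundary moves by 16 bytes. The content-defined chunker
re-synchronizes at its first boundary after the edit, so almost every chunk is
reused. `random.Random.randbytes` requires Python 3.9 or later.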


Source-Based Deduplication Method

 Data is deduplicated at the source (backup client)
 Backup client sends only new, unique segments across the network
 Reduced storage capacity and network bandwidth requirements
 Recommended for remote office/branch office (ROBO) environments for taking
centralized backup
 Cloud service providers use this method when performing backup from the
consumer’s location to their location

(Figure: Deduplication at source. A deduplication agent runs on the application
server (backup client), which hosts VMs on a hypervisor, and only deduplicated
data is sent to the backup device.)

Notes

Source-based data deduplication eliminates redundant data at the source (backup
client) before transmission to the backup device. The deduplication software or
agent on the clients checks each file or block for duplicate content. Source-based
deduplication reduces the amount of data that is transmitted over a network from
the source to the backup device, thus requiring less network bandwidth. There is
also a substantial reduction in the capacity that is required to store the backup
data.

However, a deduplication agent running on the client may impact backup
performance, especially when a large amount of data needs to be backed up. When
image-level backup is implemented, the backup workload is moved to a proxy server,
and the deduplication agent is installed on the proxy server to perform
deduplication without impacting the VMs running applications. Organizations can
implement source-based deduplication when performing backup (backup as a service)
from their location to the provider’s location.
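A minimal sketch of the client-side flow, assuming a simple two-step exchange: the
client fingerprints its segments, asks the backup device which fingerprints are
missing, and uploads only those segments. `BackupDevice`, `missing`, and
`source_dedup_backup` are hypothetical names, fixed 4 KiB segments and SHA-256
fingerprints are assumptions, and the fingerprint exchange itself is not counted in
the bytes sent.

```python
import hashlib


def chunk(data: bytes, size: int = 4096) -> list[bytes]:
    """Split data into fixed-size segments (an assumed segmenting scheme)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BackupDevice:
    """Hypothetical target that stores unique segments keyed by fingerprint."""

    def __init__(self):
        self.segments: dict[str, bytes] = {}

    def missing(self, fingerprints: list[str]) -> set[str]:
        """Answer the client's query: which fingerprints are not stored yet?"""
        return {fp for fp in fingerprints if fp not in self.segments}

    def store(self, segments: dict[str, bytes]) -> None:
        self.segments.update(segments)


def source_dedup_backup(data: bytes, device: BackupDevice) -> int:
    """Deduplicate at the client and return the number of segment bytes sent."""
    # Duplicates inside this backup collapse to one dictionary entry.
    segments = {hashlib.sha256(c).hexdigest(): c for c in chunk(data)}
    needed = device.missing(list(segments))  # only fingerprints cross the wire here
    device.store({fp: segments[fp] for fp in needed})
    return sum(len(segments[fp]) for fp in needed)


if __name__ == "__main__":
    device = BackupDevice()
    day1 = b"".join(bytes([i]) * 4096 for i in range(100))  # 100 distinct segments
    day2 = day1[:-4096] + b"\xff" * 4096                     # last segment changed
    print(source_dedup_backup(day1, device))  # 409600: everything is new
    print(source_dedup_backup(day2, device))  # 4096: only the changed segment
```

The second backup sends one segment instead of 100, which is the bandwidth saving
that makes source-based deduplication attractive for ROBO and backup-as-a-service
scenarios.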


Target-Based Deduplication Method

 Data is deduplicated at the target
   Inline
   Post-process
 Offloads the backup client from the deduplication process
 Requires sufficient network bandwidth
 In some implementations, part of the deduplication load is moved to the backup
server
   Reduces the burden on the target
   Improves the overall backup performance

(Figure: Deduplication at target, showing an application server (backup client)
that hosts VMs on a hypervisor, a backup server, and a deduplication appliance
acting as the backup device.)

Notes

Target-based data deduplication occurs at the backup device, which offloads the
deduplication process and its performance impact from the backup client. In target-
based deduplication, the backup application sends data to the target backup device
where the data is deduplicated, either immediately (inline) or at a scheduled time
(post-process).

With inline data deduplication, the incoming backup stream is divided into small
chunks and then compared to data that has already been deduplicated. The inline
deduplication method requires less storage space than the post-process approach.
However, inline deduplication may slow down the overall data backup process.

Some vendors’ inline deduplication systems take advantage of continued advances in
CPU technology to increase inline deduplication performance by minimizing the disk
accesses required to deduplicate data. Such inline deduplication systems identify
duplicate data segments in memory, which minimizes disk usage.


In post-process deduplication, the backup data is first stored to the disk in its native
backup format and deduplicated after the backup is complete. In this approach, the
deduplication process is separated from the backup process and the deduplication
happens outside the backup window. However, the full backup dataset is
transmitted across the network to the storage target before the redundancies are
eliminated. So, this approach requires adequate storage capacity to accommodate
the full backup dataset.
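The capacity difference between the two approaches can be put into numbers. This
sketch assumes that inline deduplication writes only the new unique data of a
backup, and that a post-process target must land the full backup in native format
and also hold its new unique segments until the landed copy is deleted. The
function name and the example figures (a 10 TB full backup with 2% new data) are
hypothetical.

```python
def extra_capacity_gb(full_backup_gb: float, new_data_fraction: float, mode: str) -> float:
    """Peak extra target capacity needed to ingest one full backup (simplified model)."""
    if not 0.0 <= new_data_fraction <= 1.0:
        raise ValueError("new_data_fraction must be between 0 and 1")
    unique_gb = full_backup_gb * new_data_fraction
    if mode == "inline":
        return unique_gb  # duplicates are dropped before they reach disk
    if mode == "post-process":
        # The full backup lands first; its unique segments are written during the
        # later deduplication pass, before the landed copy is removed.
        return full_backup_gb + unique_gb
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("inline", "post-process"):
        print(mode, extra_capacity_gb(10_000, 0.02, mode))
```

Under these assumptions, inline deduplication needs about 200 GB of additional
space for the backup, whereas post-process needs about 10,200 GB at its peak, even
though both end with the same deduplicated result.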

Organizations can consider implementing target-based deduplication when their
backup application does not have built-in deduplication capabilities. It supports
the current backup environment without any operational changes. Target-based
deduplication reduces the amount of storage that is required, but unlike
source-based deduplication, it does not reduce the amount of data that is sent
across a network during the backup. In some implementations, part of the
deduplication functionality is moved to the backup client or backup server. This
reduces the burden on the target backup device for performing deduplication and
improves the overall backup performance.


Data Archiving Lesson

Introduction

This lesson presents data archiving operations and the difference between backup
and archiving. It also focuses on purpose-built archive storage and cloud-based
archiving.

This lesson covers the following topics:


 Data archiving operations
 Backup vs. Archiving
 Purpose-built archive storage
 Cloud-based archiving


Data Archiving

Video: Data Archiving

The video is located at


https://edutube.emc.com/Player.aspx?vno=3vxUpJGQRpkqYPvf4+D5ew


Data Archiving Overview

Definition: Data Archiving


The process of identifying and moving inactive data out of current
production systems into a low-cost storage tier for long-term retention
and future reference.

 A data archive is a repository where fixed content is stored
 Organizations set their own policies for qualifying data to archive
 Archiving enables organizations to:
   Reduce ongoing primary storage acquisition costs
   Meet regulatory compliance requirements
   Reduce backup challenges, including the backup window, by moving static
data out of the recurring backup stream
   Use this data to generate new revenue strategies


Notes

In the information life cycle, data is created, accessed, and changed. As data
ages, it is less likely to be changed and eventually becomes “fixed,” but it
continues to be accessed by applications and users. This data is called fixed
content. Assets such as X-rays, MRIs, CAD/CAM designs, surveillance video, MP3s,
and financial documents are examples of fixed content. This type of data is
growing at over 90% annually.

Data archiving is the process of moving data (fixed content) that is no longer
actively accessed to a separate low-cost archival storage tier for long-term
retention and future reference. A data archive is the storage repository used to
store this data. Organizations set their own policies for qualifying data to move
into archives, and these policy settings are used to automate the process of
identifying and moving the appropriate data into the archive system.
Organizations implement archiving processes and technologies to reduce primary
storage cost. With archiving, capacity on expensive primary storage can be
reclaimed by moving infrequently accessed data to a lower-cost archive tier.
Archiving fixed content before taking backups helps reduce the backup window and
backup storage acquisition costs.

Government regulations and legal or contractual obligations require organizations
to retain their data for an extended period. The key to determining how long to
retain an organization’s archives is to understand which regulations apply to the
particular industry and which retention rules apply to each regulation.

For instance, all publicly traded companies are subject to the Sarbanes-Oxley
(SOX) Act. This act defines email retention requirements, among other things
related to data storage and security.

Archiving helps organizations meet compliance requirements. Archiving can also
help organizations use growing volumes of information in potentially new and
unanticipated ways.

For example, new product innovation can be fostered if engineers can access
archived project materials such as designs, test results, and requirement
documents. Besides meeting governance and compliance requirements, organizations
retain data for business intelligence and competitive advantage. Both active and
archived information can help data scientists drive innovation or improve current
business processes.


Backup Vs. Archiving

Data archiving is often confused with data backup. Backups are used to restore
data in case it is lost, corrupted, or destroyed. In contrast, data archives protect
older data that is not required for everyday business operations but may
occasionally need to be accessed.

The table compares some of the significant differences between backup and
archiving.

Data Backup | Data Archiving
Secondary copy of data | Primary copy of data
Used for data recovery operations | Available for data retrieval
Primary objective: operational recovery and disaster recovery | Primary objective: compliance adherence and lower cost
Typically short-term (weeks or months) retention | Long-term (months, years, or decades) retention


Data Archiving Operations

 Archiving agent scans primary storage to find files that meet the archiving
policy. The archive server indexes the files.
 Once the files have been indexed, they are moved to archive storage and small
stub files are left on the primary storage.

(Figure: Data archiving operation. Clients, primary storage, the archive server
with its index, and archive storage communicate over the network.)

Notes

The data archiving operation has an archiving agent, archive server/policy engine,
and archive storage. The archiving agent scans the primary storage to find files that
meet the archiving policy. This policy is defined on the archive server (policy
engine).


After the files are identified for archiving, the archive server creates the index for
the files. Once the files have been indexed, they are moved to the archive storage
and small stub files are left on the primary storage. In other words, each archived
file on primary storage is replaced with a stub file. The stub file contains the
address of the archived file. As the size of the stub file is small, it saves space on
primary storage.

From the perspective of a client, the data movement from primary storage to
archive storage is transparent.
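The archiving flow above (policy scan, move to archive storage, stub left behind,
transparent access) can be sketched with the Python standard library. This is a
single-process illustration under several assumptions: the policy is "not modified
for `max_idle_days`" (modification time is used because access-time updates are
commonly disabled or relaxed on production file systems), the stub is a small file
holding a hypothetical `ARCHIVED-TO:` marker plus the archive path, and the
returned dictionary stands in for the archive server's index. None of these names
come from a real archiving product.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path

STUB_MARKER = b"ARCHIVED-TO:"  # hypothetical stub format: marker + archive address


def is_stub(path: Path) -> bool:
    with path.open("rb") as f:
        return f.read(len(STUB_MARKER)) == STUB_MARKER


def archive_inactive_files(primary: Path, archive: Path,
                           max_idle_days: float) -> dict[Path, Path]:
    """Move files idle longer than the policy allows, leave stubs, return an index."""
    cutoff = time.time() - max_idle_days * 86_400
    index = {}
    for path in sorted(p for p in primary.rglob("*") if p.is_file()):
        if is_stub(path) or path.stat().st_mtime >= cutoff:
            continue  # already archived, or still active under this policy
        target = archive / path.relative_to(primary)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))                   # data moves to the archive tier
        path.write_bytes(STUB_MARKER + str(target).encode())  # small stub keeps the address
        index[path] = target
    return index


def read_file(path: Path) -> bytes:
    """Client read path: a stub is followed transparently to the archived copy."""
    data = path.read_bytes()
    if data.startswith(STUB_MARKER):
        return Path(data[len(STUB_MARKER):].decode()).read_bytes()
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        primary, archive = Path(tmp, "primary"), Path(tmp, "archive")
        primary.mkdir()
        old, new = primary / "2015-report.pdf", primary / "current.docx"
        old.write_bytes(b"x" * 10_000)
        new.write_bytes(b"y" * 10_000)
        two_years_ago = time.time() - 730 * 86_400
        os.utime(old, (two_years_ago, two_years_ago))  # simulate an inactive file
        index = archive_inactive_files(primary, archive, max_idle_days=365)
        print([p.name for p in index])          # ['2015-report.pdf']
        print(old.stat().st_size < 10_000)      # True: only a stub remains
        print(read_file(old) == b"x" * 10_000)  # True: the client still sees the data
```

The stub occupies only a few dozen bytes on primary storage, and `read_file`
shows why the movement is transparent to clients: they open the same path and get
the original content back.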


Use Case: Email Archiving

 Emails are a part of business processes.
 They represent correspondence between two or more parties and are immutable
after generation.
 Email archiving is the process of archiving emails from the mail server to
archive storage.
 After an email is archived, it is retained for years, based on the retention
policy.

Legal Dispute

Email archiving helps an organization to address legal disputes.

For example, an organization involved in a legal dispute may need to produce all
emails within a specified time period containing specific keywords that were sent
to or from certain people.
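Producing emails for a dispute is essentially a filtered search over the archive.
A minimal in-memory sketch, assuming each archived message carries its send date,
sender, recipients, subject, and body; `ArchivedEmail` and `discovery_search` are
hypothetical names, and a real email archive would run such queries against its
full-text index rather than a Python list.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArchivedEmail:
    sent: date
    sender: str
    recipients: tuple[str, ...]
    subject: str
    body: str


def discovery_search(archive: list[ArchivedEmail], start: date, end: date,
                     people: set[str], keywords: list[str]) -> list[ArchivedEmail]:
    """Emails sent between start and end (inclusive), to or from any listed person,
    whose subject or body contains any keyword (all matching is case-insensitive)."""
    people = {p.lower() for p in people}
    words = [k.lower() for k in keywords]
    hits = []
    for msg in archive:
        parties = {msg.sender.lower(), *(r.lower() for r in msg.recipients)}
        text = f"{msg.subject}\n{msg.body}".lower()
        if start <= msg.sent <= end and parties & people and any(w in text for w in words):
            hits.append(msg)
    return hits


if __name__ == "__main__":
    archive = [
        ArchivedEmail(date(2018, 3, 2), "ana@example.com", ("raj@example.com",),
                      "Contract draft", "Please review the supply contract."),
        ArchivedEmail(date(2018, 3, 9), "lee@example.com", ("team@example.com",),
                      "Lunch", "Pizza on Friday?"),
        ArchivedEmail(date(2019, 1, 15), "raj@example.com", ("ana@example.com",),
                      "Contract signed", "The contract is final."),
    ]
    hits = discovery_search(archive, date(2018, 1, 1), date(2018, 12, 31),
                            {"raj@example.com"}, ["contract"])
    print([m.subject for m in hits])  # ['Contract draft']
```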

Government Compliance

Email archiving helps to meet government compliance requirements such as
Sarbanes-Oxley and SEC regulations.

For example, an organization may need to produce all emails from all individuals
that are involved in stock sales or transfers. Failure to comply with these
requirements could cause an organization to incur penalties.

Mailbox Space Savings

Email archiving provides more mailbox space by moving old emails to archive
storage.

For example, an organization may configure a quota on each mailbox to limit its
size. A fixed quota for a mailbox forces users to delete emails as they approach the
quota size. However, users often need to access emails that are weeks, months, or
even years old. With email archiving, organizations can free up space in user
mailboxes and still provide user access to older emails.


Purpose-Built Archive Storage – CAS

Content-addressed storage (CAS) is an object-based storage device that is
purpose-built for storing and managing fixed content.

 Each object that is stored in CAS is assigned a globally unique content address
(digital fingerprint of the content).
 Application server accesses the CAS device through the CAS API.

(Figure: Clients connect over the network to an application server, which
accesses the CAS device through the CAS API.)

Notes

CAS stores user data and its attributes as an object. The stored object is assigned
a globally unique address, known as a content address (CA). This address is
derived from the binary representation of the object. Content addressing
eliminates the need for application servers to understand and manage the physical
location of objects on a storage system.

The content address (a digital fingerprint of the content) not only simplifies the
task of managing a huge number of objects, but also ensures content authenticity.
The application server can access the CAS device only through the CAS API.
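A content address can be illustrated by using a cryptographic hash of an object's
bytes as its identifier. In this sketch, SHA-256 is an assumed choice of
fingerprint, and `ContentAddressedStore` with its `put`, `get`, and `verify` methods
is a hypothetical stand-in for a CAS API; actual CAS products define their own
addressing schemes and APIs.

```python
import hashlib


class ContentAddressedStore:
    """Toy CAS: each object is stored under the SHA-256 digest of its content."""

    def __init__(self):
        self._objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        """Store an object and return its content address."""
        address = hashlib.sha256(content).hexdigest()
        self._objects.setdefault(address, content)  # identical content is kept only once
        return address

    def verify(self, address: str) -> bool:
        """Recompute the fingerprint to check that the stored content is authentic."""
        content = self._objects.get(address)
        return content is not None and hashlib.sha256(content).hexdigest() == address

    def get(self, address: str) -> bytes:
        """Retrieve an object by address, refusing content that fails verification."""
        if not self.verify(address):
            raise ValueError(f"missing or altered object: {address}")
        return self._objects[address]


if __name__ == "__main__":
    cas = ContentAddressedStore()
    ca = cas.put(b"MRI scan, patient 0042")
    print(ca == cas.put(b"MRI scan, patient 0042"))  # True: same content, same address
    print(cas.get(ca))
```

Because the address is derived from the content, the application never needs to
know where the object physically resides, and any change to the stored bytes is
detected on read because the recomputed fingerprint no longer matches the address.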


Cloud-Based Archiving

Many organizations prefer hybrid cloud options: archived data that may require
high-speed access is retained internally (private cloud), while lower-priority
archive data is moved to low-cost, public cloud-based archive storage.

 No CAPEX, pay-as-you-go, faster deployment
 Reduced IT management overhead
 Supports massive data growth and retention requirements

(Figure: Cloud-based archiving. In the data center, an archiving server (policy
engine) moves archive data from the email/file server (VMs on a hypervisor) and
primary storage across the network and WAN to the cloud.)

Notes

In a traditional in-house data archiving model, archiving systems and the underlying
infrastructure are deployed and managed within an organization’s data center. Due
to exponential data growth, organizations face increasing cost and complexity in
their archiving environments, and existing infrastructure is often siloed by
architecture and policy. Organizations are looking for new ways to improve the
agility and scalability of their archiving environments.

Cloud computing provides highly scalable and flexible computing that is available
on demand. It enables self-service requests through a fully automated
request-fulfillment process in the background, and it provides capital cost savings
and agility to organizations. With cloud-based archiving, organizations pay only
for what they use and can scale usage as needed. Cloud-based archiving also enables
organizations to access their data from any device and any location.

Typically, a cloud-based archiving service is designed to classify, index, search,
and retrieve data in a security-rich manner. It automates regulatory monitoring and
reporting, and it enables organizations to consistently enforce policies for the
centralized cloud archive repository. Hybrid cloud archiving is one step toward the
cloud from the traditional in-house approach: archived data that may require
high-speed access is retained internally, while lower-priority archive data is
moved to low-cost, public cloud-based archive storage.


Migration Lesson

Introduction

This lesson presents the importance of data migration and various types of data
migration. It also focuses on Disaster Recovery as a Service (DRaaS).

This lesson covers the following topics:


 Types of data migrations
 Disaster Recovery as a Service (DRaaS)


Migration

Video: Data Migration

The video is located at


https://edutube.emc.com/Player.aspx?vno=nb2RfSXSY/hFxiyjyc7vUg


Data Migration

Definition: Data Migration


Involves the transfer of data between hosts (physical or virtual),
storage devices, or formats.

 In today’s competitive business environment, IT organizations need
non-disruptive live migration solutions in place to meet the required SLAs
 Organizations deploy data migration solutions for the following reasons:
   Data center maintenance without downtime
   Disaster avoidance
   Technology refresh
   Data center migration or consolidation
   Workload balancing across data centers (multiple sites)

Notes

Traditionally, migrating data and applications within or between data centers
involved a series of manual tasks and activities. IT would either make physical
backups or use data replication services to transfer applications and data to an
alternate location. Applications had to be stopped and could not be restarted until
testing and verification were complete. In today’s competitive business
environment, IT organizations need non-disruptive live migration solutions in
place to meet the required SLAs.


Storage System-Based Migration

 Moves data between heterogeneous storage systems
 The storage system that performs the migration is called the control storage
system
   Push: Data is pushed from the control system to the remote system
   Pull: Data is pulled to the control system from the remote system

(Figure: Push and pull operations across the SAN between a control device on the
control storage system and a remote device on the remote storage system.)

Notes

Storage system-based migration moves data between heterogeneous storage
systems. This technology is independent of the application and the server operating
system because the migration operations are performed by one of the storage
systems. The storage system that performs the migration operations is called the
control storage system. Data can be moved from or to the devices in the control
storage system, to or from a remote storage system.

Data migration solutions perform push and pull operations for data movement.
These terms are defined from the perspective of the control storage system. In the
push operation, data is moved from the control storage system to the remote
storage system. In the pull operation, data is moved from the remote storage
system to the control storage system.


During the push and pull operations, compute system’s access to the remote
device is not enabled. Since, the control storage system has no control over the
remote storage and cannot track any change on the remote device. Data integrity
cannot be guaranteed if changes are made to the remote device during the push
and pull operations. The push/pull operations can be either hot or cold. These
terms apply to the control devices only.

In a cold operation, the control device is inaccessible to the compute system during
migration. Cold operations guarantee data consistency because both the control
and the remote devices are offline. In a hot operation, the control device is online
for compute system operations. During hot push and pull operations, changes can
be made to the control device. Because the control storage system keeps track of
all these changes, it can still ensure data integrity.
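The push/pull and hot/cold rules above can be condensed into a small decision function. This is a minimal sketch under a simplified model. The names `MigrationPlan` and `plan_migration` are hypothetical and do not correspond to any storage system interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationPlan:
    source: str                  # device the data is read from
    target: str                  # device the data is written to
    control_device_online: bool  # can the compute system use the control device?
    remote_device_online: bool   # always False: remote changes cannot be tracked
    track_changes: bool          # control system tracks writes to its own device


def plan_migration(direction: str, mode: str) -> MigrationPlan:
    """Describe a storage system-based migration from the control system's view.

    direction: "push" (control -> remote) or "pull" (remote -> control)
    mode: "hot" (control device online) or "cold" (control device offline)
    """
    if direction not in ("push", "pull"):
        raise ValueError("direction must be 'push' or 'pull'")
    if mode not in ("hot", "cold"):
        raise ValueError("mode must be 'hot' or 'cold'")
    source, target = ("control", "remote") if direction == "push" else ("remote", "control")
    hot = mode == "hot"
    return MigrationPlan(
        source=source,
        target=target,
        control_device_online=hot,
        remote_device_online=False,
        track_changes=hot,  # hot operations rely on change tracking for integrity
    )
```

For example, `plan_migration("pull", "hot")` describes a pull from the remote device into an online control device with change tracking enabled. In every combination, the remote device stays offline.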

Virtualization Appliance-Based Migration

 The virtualization layer handles the migration of data
 Enables LUNs to remain online and accessible by the compute system while
data is migrating
 Supports data migration between multivendor heterogeneous storage systems
 A service provider can use it to migrate customer data from the customer’s
storage system to cloud-based storage
 Example:
 An administrator wants to perform a data migration from storage system A to
storage system B, as shown in the illustration
 The virtualization layer handles the migration of data, which enables LUNs
to remain online and accessible while data is migrating
 In this case, no physical changes are required because the compute system
still points to the same virtual volume on the virtualization layer
 However, the mapping information that resides on the appliance must be
changed. These changes can be made dynamically and are transparent to
the user

[Figure: Virtualization appliance-based migration. VMs run on a hypervisor in a compute system that accesses a virtual volume on a virtualization appliance in the FC SAN. The appliance maps the virtual volume to LUNs on Storage System A and Storage System B, and the data migration occurs between the two storage systems.]

Notes

Data migration can also be implemented using a virtualization appliance at the
SAN. The virtualization appliance provides a translation layer in the SAN, between
the compute systems and the storage systems. The LUNs created at the storage
systems are assigned to the appliance. The appliance abstracts the identity of
these LUNs and creates a storage pool by aggregating LUNs from the storage systems.

A virtual volume is created from the storage pool and assigned to the compute
system. When an I/O is sent to a virtual volume, it is redirected through the
virtualization layer to the mapped LUNs. The key advantage of using a
virtualization appliance is its support for data migration between multivendor
heterogeneous storage systems.
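The redirection of virtual volume I/O and the remapping during migration can be illustrated with a toy model. This is a minimal sketch with hypothetical class and method names. It makes two simplifying assumptions. First, each virtual volume maps to one LUN rather than to a pool. Second, no I/O arrives during the copy. A real appliance must also handle writes issued while the migration is running.

```python
class VirtualizationAppliance:
    """Toy model of a SAN virtualization layer (hypothetical names throughout)."""

    def __init__(self):
        self.luns = {}     # (storage_system, lun_id) -> list of blocks
        self.mapping = {}  # virtual volume name -> (storage_system, lun_id)

    def add_lun(self, system, lun_id, num_blocks):
        """Assign a backend LUN from a storage system to the appliance."""
        self.luns[(system, lun_id)] = [b""] * num_blocks

    def create_virtual_volume(self, name, system, lun_id):
        """Expose a virtual volume that is backed by one of the assigned LUNs."""
        self.mapping[name] = (system, lun_id)

    def write(self, volume, block, data):
        # I/O sent to the virtual volume is redirected to the mapped LUN.
        self.luns[self.mapping[volume]][block] = data

    def read(self, volume, block):
        return self.luns[self.mapping[volume]][block]

    def migrate(self, volume, new_system, new_lun_id):
        """Copy the volume's blocks to another LUN, then switch the mapping.

        The compute system keeps using the same virtual volume name, so
        nothing changes on the compute side.
        """
        source = self.luns[self.mapping[volume]]
        target = self.luns[(new_system, new_lun_id)]
        if len(target) < len(source):
            raise ValueError("target LUN is smaller than the source LUN")
        target[: len(source)] = source
        self.mapping[volume] = (new_system, new_lun_id)
```

After `migrate("vv1", "B", 7)`, reads and writes to `"vv1"` reach the LUN on storage system B. Only the appliance's mapping table changed.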

In a cloud environment, the service provider can also implement virtualization-based
data migration to migrate customer data from the customer’s storage system to the
shared storage used by the service provider. This approach enables the customer
to migrate without causing downtime to applications and users during the
migration process. The provider performs this data migration itself, without the
need for a third-party data migration specialist.

Hypervisor-Based Migration: VM Migration

Services running on VMs are moved from one physical compute system to another
without any downtime:
 Enables scheduled maintenance without any downtime
 Facilitates VM load balancing

[Figure: VM live migration with a hypervisor cluster. VMs and their services migrate from the hypervisor on Compute System 1 to the hypervisor on Compute System 2. Both compute systems connect through a network to a shared storage system.]

Notes

Organizations using a virtualized infrastructure have many reasons to move
running VMs from one physical compute system to another. The compute systems
can be located within a data center or across data centers. The migration can be
used for routine maintenance and for distributing VMs across sites to balance
system load.

The migration can also be used for disaster recovery or for consolidating VMs onto
fewer physical compute systems. The ideal virtual infrastructure platform should
enable organizations to move running VMs as quickly as possible and with
minimal impact on the users. VM live migration makes this possible.

In a VM live migration, the entire active state of a VM is moved from one hypervisor
to another. The state information includes memory contents and all other
information that identifies the VM. This method involves copying the contents of VM
memory from the source hypervisor to the target and then transferring control of
the VM’s disk files to the target hypervisor. Next, the VM is suspended on the
source hypervisor and resumed on the target hypervisor.
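The text does not say how memory pages that the running VM keeps modifying are handled. A common technique is iterative pre-copy, which works in three phases:

1. Copy all memory while the VM keeps running.
2. Repeatedly re-copy the pages that the VM dirtied in the meantime.
3. Suspend the VM only for a short final copy.

The sketch below assumes that technique and a simulated workload. The function `live_migrate` and its parameters are hypothetical and do not describe any specific hypervisor's implementation.

```python
def live_migrate(source_memory, dirtied_per_round, max_rounds=5, stop_threshold=2):
    """Simulate iterative pre-copy migration of VM memory between hypervisors.

    source_memory: dict of page number -> contents on the source hypervisor.
    dirtied_per_round: list of dicts; entry i holds the pages the still-running
        VM writes while copy round i is in progress (a simulated workload).
    Returns (target_memory, number of pages copied while the VM was suspended).
    """
    memory = dict(source_memory)  # the VM's live memory on the source
    target_memory = {}
    to_copy = set(memory)         # the first round copies every page
    for round_no in range(max_rounds):
        for page in to_copy:
            target_memory[page] = memory[page]
        # The VM keeps running on the source, so some pages change again.
        dirtied = dirtied_per_round[round_no] if round_no < len(dirtied_per_round) else {}
        memory.update(dirtied)
        to_copy = set(dirtied)
        if len(to_copy) <= stop_threshold:
            break                 # the remaining dirty set is small enough
    # Suspend the VM on the source and copy the last dirty pages.
    for page in to_copy:
        target_memory[page] = memory[page]
    # At this point the VM would resume on the target hypervisor.
    return target_memory, len(to_copy)
```

The design choice is to trade extra copy rounds for a short suspension: only the final, small dirty set is copied while the VM is paused.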

Performing VM live migration requires a high-speed network connection. It is
important to ensure that the VM network identity and network connections are
preserved after the migration. VM live migration with a stretched cluster provides
the ability to move VMs across data centers. This solution suits cloud
environments where the consumers of a given application are spread across the
globe and work in different time zones. Moving VMs and applications to the
location closest to the consumers provides faster, more reliable access and can
significantly enhance productivity.

Hypervisor-Based Migration: VM Storage Migration

 Migrates VM files from one storage system to another without any service
disruption
– Simplify array migration and storage upgrades
– Dynamically optimize storage I/O performance
– Efficiently manage storage capacity

[Figure: VM storage migration. VMs run on the hypervisor of a compute system, and VM files move between storage systems over the network.]

Notes

In a VM storage migration, VM files are moved from one storage system to another
without any downtime. This approach enables the administrator to move VM files
across dissimilar storage systems. VM storage migration starts by copying the
metadata about the VM from the source storage system to the target storage
system. The metadata essentially consists of configuration, swap, and log files.
After the metadata is copied, the VM disk file is moved to the new location.
Because the source might be updated during the migration, the changes on the
source must be tracked to maintain data integrity. After the copy is completed,
the blocks that have changed since the migration started are transferred to the
new location.
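The notes describe a three-step sequence: copy the metadata first, then the disk file, then the blocks that changed during the copy. That sequence can be sketched as follows. This is a simplified model under assumed data structures, and `migrate_vm_storage` is a hypothetical name. A real hypervisor performs these steps on files in datastores, not on Python lists.

```python
def migrate_vm_storage(source, target, vm_writes):
    """Sketch of VM storage migration with changed-block tracking.

    source, target: dicts modelling two storage systems. Each holds "metadata"
        (configuration, swap, and log files) and "disk" (a list of blocks).
    vm_writes: dict mapping a copy step to a list of (block_index, data) writes
        that the running VM issues right after that block is copied (simulated).
    Returns the sorted indexes of blocks re-sent in the final pass.
    """
    # 1. Copy the VM metadata first.
    target["metadata"] = dict(source["metadata"])
    # 2. Copy the disk file block by block while the VM keeps running.
    target["disk"] = [None] * len(source["disk"])
    changed = set()
    for step in range(len(source["disk"])):
        target["disk"][step] = source["disk"][step]
        for index, data in vm_writes.get(step, []):
            source["disk"][index] = data  # the VM updates the source
            changed.add(index)            # the tracker records the block
    # 3. Transfer the blocks that changed since the migration started.
    for index in sorted(changed):
        target["disk"][index] = source["disk"][index]
    return sorted(changed)
```

In the usage below, block 0 is rewritten after it has already been copied, so only the final pass brings the target up to date. Block 3 is rewritten before it is copied, so the bulk copy picks up the new data and the final pass simply re-sends it.

```python
src = {"metadata": {"config": "cfg", "swap": "swp", "log": "log"}, "disk": ["a", "b", "c", "d"]}
dst = {}
migrate_vm_storage(src, dst, {1: [(0, "A")], 2: [(3, "D")]})  # returns [0, 3]
```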

The key benefits of VM storage migration are:

 Simplify array migration and storage upgrades: The traditional process of
moving data to new storage is cumbersome, time-consuming, and disruptive.
VM storage migration makes it easier and faster for organizations to embrace new
storage platforms, adopt flexible leasing models, retire older systems, and
conduct storage upgrades.
 Dynamically optimize storage I/O performance: With storage migration, IT
administrators can move VM disk files to alternative LUNs that are properly
configured to deliver optimal performance. This migration avoids scheduled
downtime, eliminating the time and cost associated with traditional methods.
 Efficiently manage storage capacity: Nondisruptive VM disk file migration to
different classes of storage enables cost-effective management of VM disks as
part of a tiered storage strategy.

Disaster Recovery as a Service (DRaaS)

 Enables organizations to have a DR site in the cloud
 Service provider offers resources to run the consumer’s IT services in the cloud
during a disaster
 Pay-as-you-go pricing model
 Resources at the service provider location may be dedicated to the consumer,
or they can be shared
 During normal production operations, IT services run at the consumer’s
production data center
 If there is a disaster, the business operations fail over to the provider’s
infrastructure

Notes

Organizations need to rely on business continuity processes to mitigate the impact
of service disruptions due to disaster. Traditional disaster recovery methods often
require buying and maintaining a complete set of IT resources at secondary data
centers. These IT resources must match the business-critical systems at the
primary data center, including sufficient storage to house a complete copy of all
business data at the secondary site. This can make traditional DR a complex and
expensive solution for organizations.

Disaster Recovery-as-a-Service (DRaaS) has emerged as a viable DR solution for
organizations. DRaaS enables organizations to have a DR site in the cloud. The
cloud service provider assumes the responsibility for providing IT resources to
enable organizations to continue running their IT services if there is a disaster.
Resources at the service provider’s location may either be dedicated to the
consumer or be shared.

From the organization’s (consumer’s) perspective, having a DR site in the cloud
reduces the need for data center space and IT infrastructure. This approach leads
to significant cost reductions and eliminates the need for upfront capital
expenditure. DRaaS is gaining popularity among organizations because of its
pay-as-you-go pricing model and its use of automated virtual platforms, which can
lower costs and minimize the recovery time after a failure. During normal
production operations, IT services run at the organization’s production data
center. Data is replicated from the organization’s production environment to the
cloud over the network.

During normal operating conditions, a DRaaS implementation typically needs only
a small share of resources, enough to synchronize the application data and VM
configurations from the consumer’s site to the cloud. The full set of resources
required to run the application in the cloud is consumed only if a disaster occurs.
If there is a business disruption or disaster, the business operations fail over to
the provider’s infrastructure.

Concepts in Practice Lesson

Introduction

This section highlights technologies that are relevant to the topics covered in this
module.

This lesson covers the following topics:


 Dell EMC NetWorker
 Dell EMC Avamar
 Dell EMC Data Domain
 Dell EMC Integrated Data Protection Appliance
 Dell EMC SRDF
 Dell EMC TimeFinder SnapVX

Concepts in Practice

Dell EMC NetWorker

 Software that centralizes, automates, and accelerates data backup and
recovery
 Delivers enterprise-class performance and security to meet even the most
demanding service level requirements
 Supports source-based and target-based deduplication capabilities by
integrating with Dell EMC Avamar and Dell EMC Data Domain respectively

Dell EMC Avamar

 Disk-based backup and recovery solution that provides inherent source-based
deduplication
 Uses variable-length deduplication, which significantly reduces backup time by
only storing unique daily changes
 Provides various options for backup, including guest OS-level backup and
image-level backup
 Data is encrypted and deduplicated to secure it and to minimize network
bandwidth consumption

Dell EMC Data Domain

 A target-based data deduplication solution
 Data Domain Boost software increases backup performance by distributing
parts of the deduplication process to the backup server
 Provides secure multitenancy
 Supports backup and archive in a single system
 Supports low-cost disaster recovery to the cloud

Dell EMC Integrated Data Protection Appliance

 Pre-integrated protection storage and software for comprehensive, modern
protection and faster time to value
 Extends data protection seamlessly to private and public clouds
 Flash-enabled for faster performance and instant recoverability
 Protects modern applications and is optimized for VMware virtual
environments

Dell EMC SRDF

 Remote replication solution that provides DR and data mobility solutions for
PowerMax (VMAX) storage systems
 Provides the ability to maintain multiple, host-independent, remotely mirrored
copies of data

SRDF family includes:

 SRDF/S and SRDF/A
 SRDF/DM
 SRDF/AR
 Concurrent and Cascaded SRDF

Dell EMC TimeFinder SnapVX

 Creates a PIT copy of a source LUN
 Uses redirect-on-first-write technology
 Provides a new option to secure snaps against accidental or internal deletion
 Provides instant restore: when a LUN-level restore is initiated, the
restored view is available immediately

Dell EMC NetWorker

Backup and recovery software that centralizes, automates, and accelerates data
backup and recovery operations. The following are key features of NetWorker:

 Supports heterogeneous platforms such as Windows, UNIX, Linux, and also
virtual environments
 Supports different backup targets: tapes, disks, Data Domain purpose-built
backup appliances, and virtual tapes
 Supports multiplexing (or multi-streaming) of data
 Delivers enterprise-class performance and security to meet even the most
demanding service level requirements
 Provides both source-based and target-based deduplication capabilities by
integrating with Dell EMC Avamar and Dell EMC Data Domain respectively
 The cloud-backup option in NetWorker enables backing up data to public cloud
configurations
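The feature list mentions multiplexing without describing it. In general terms, backup multiplexing interleaves data from several concurrent backup streams onto one device so that the device can keep streaming. Each block is tagged so that a single stream can be reassembled at restore time. The sketch below assumes simple round-robin interleaving. It is not NetWorker's actual on-media format, and the function names are hypothetical.

```python
from itertools import zip_longest

_MISSING = object()  # marks exhausted streams during interleaving


def multiplex(streams):
    """Interleave blocks from several backup streams onto one backup device.

    streams: dict of stream_id -> list of data blocks.
    Returns (stream_id, sequence_number, block) records in device write order.
    """
    ids = list(streams)
    records = []
    rows = zip_longest(*(streams[s] for s in ids), fillvalue=_MISSING)
    for seq, row in enumerate(rows):
        for stream_id, block in zip(ids, row):
            if block is not _MISSING:
                records.append((stream_id, seq, block))
    return records


def demultiplex(records, stream_id):
    """Reassemble a single stream for restore from the tagged records."""
    return [block for sid, _, block in records if sid == stream_id]
```

The trade-off is visible in `demultiplex`. Restoring one client requires reading past the blocks that belong to every other client interleaved on the same device.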

Dell EMC Avamar

A disk-based backup and recovery solution that provides inherent source-based
data deduplication. With its unique global data deduplication feature, Avamar
differs from traditional backup and recovery solutions by identifying and storing only
unique sub-file data. Avamar employs variable-length deduplication, which
significantly reduces backup time by only storing unique daily changes while
maintaining daily full backups for immediate, single-step restore.
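This guide does not describe Avamar's actual chunking algorithm. The sketch below illustrates the general technique of variable-length (content-defined) deduplication. A rolling hash picks chunk boundaries from the data itself, and only chunks with previously unseen fingerprints are stored. The gear-style hash, the boundary mask, and the minimum and maximum chunk sizes are assumptions chosen for illustration.

```python
import hashlib
import random

_rng = random.Random(42)
GEAR = [_rng.getrandbits(32) for _ in range(256)]  # fixed pseudo-random table
BOUNDARY_MASK = 0xFC000000  # top 6 bits: roughly one boundary per 64 bytes


def chunk(data, min_size=16, max_size=512):
    """Split data into variable-length chunks with a gear rolling hash.

    Boundaries are placed where the masked hash bits are all zero, so they
    depend on the content itself rather than on fixed byte offsets.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & BOUNDARY_MASK) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def backup(data, store):
    """Store only chunks whose fingerprint is new; return (recipe, new_chunks)."""
    recipe, new_chunks = [], 0
    for piece in chunk(data):
        fingerprint = hashlib.sha256(piece).hexdigest()
        if fingerprint not in store:
            store[fingerprint] = piece
            new_chunks += 1
        recipe.append(fingerprint)
    return recipe, new_chunks


def restore(recipe, store):
    """Rebuild a full backup image from its list of chunk fingerprints."""
    return b"".join(store[fp] for fp in recipe)
```

Because boundaries follow the content, inserting bytes at the start of a file typically changes only the first chunk or two. With fixed-length chunking, every later chunk would shift and look new. This is why a second "full" backup after a small change stores little new data while each recipe still restores a complete image.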

Dell EMC Avamar provides a variety of options for backup, including guest
OS-level backup and image-level backup. The three major components of an
Avamar system are the Avamar server, the Avamar backup clients, and the Avamar
administrator. The Avamar server provides the essential processes and services
required for client access and remote system administration. The Avamar client
software runs on each compute system that is being backed up. The Avamar
administrator is a user management console application that is used to remotely
administer an Avamar system.

Dell EMC Data Domain

Dell EMC Data Domain deduplication storage systems provide disk backup,
archiving, and disaster recovery with high-speed, inline deduplication. A Data
Domain deduplication storage system is a target-based data deduplication
solution. Using high-speed, inline deduplication technology, the Data Domain
system provides a storage footprint that is, on average, significantly smaller than
that of the original data set.

Dell EMC Data Domain Boost software significantly increases backup
performance by distributing parts of the deduplication process to the backup
server. With Data Domain Boost, only unique, compressed data segments are sent
to a Data Domain system. For archiving and compliance solutions, Data Domain
systems allow customers to cost-effectively archive non-changing data while
keeping it online for fast, reliable access and recovery.
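Distributing part of the deduplication work to the backup server can be sketched under a simplified exchange with three steps:

1. The backup server fingerprints each segment.
2. It asks the target which fingerprints it lacks.
3. It sends only those segments, compressed.

`DedupTarget` and `boost_style_backup` are hypothetical names. This sketch is not the DD Boost API or protocol.

```python
import hashlib
import zlib


class DedupTarget:
    """Hypothetical deduplication target that stores compressed segments."""

    def __init__(self):
        self.segments = {}  # fingerprint -> compressed segment

    def filter_unknown(self, fingerprints):
        """Tell the client which fingerprints are not stored yet."""
        return [fp for fp in fingerprints if fp not in self.segments]

    def store(self, fp, compressed):
        self.segments[fp] = compressed


def boost_style_backup(segments, target):
    """Client side: fingerprint locally, send only unique segments, compressed.

    Returns the number of segment-data bytes sent over the network.
    """
    fps = [hashlib.sha256(s).hexdigest() for s in segments]
    unknown = set(target.filter_unknown(fps))
    sent = 0
    for fp, seg in zip(fps, segments):
        if fp in unknown:
            payload = zlib.compress(seg)
            target.store(fp, payload)
            sent += len(payload)
            unknown.discard(fp)  # a duplicate within this backup is sent once
    return sent
```

A repeated backup of the same segments sends no segment data at all. Only the fingerprints cross the network, which is where the bandwidth and performance gain comes from.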

Dell EMC Data Domain Extended Retention is a solution for long-term retention
of backup data. It uses an internal tiering approach to enable cost-effective,
long-term retention of data on disk by implementing deduplication technology.
Data Domain provides secure multitenancy, which enables data
protection-as-a-service for large enterprises and service providers that want to
offer services based on Data Domain in a private or public cloud. With secure
multitenancy, a Data Domain system logically isolates tenant data, ensuring that
each tenant’s data is visible and accessible only to that tenant.

Dell EMC Data Domain Replicator software transfers only the deduplicated and
compressed unique changes across any IP network, requiring a fraction of the
bandwidth, time, and cost compared to traditional replication methods. Data
Domain Cloud DR (DD CDR) allows enterprises to copy backed-up VMs from their
on-premises Data Domain and Avamar environments to the public cloud.

Dell EMC Integrated Data Protection Appliance

A pre-integrated, turnkey solution that is simple to deploy and scale, provides
comprehensive protection for a diverse application ecosystem, and comes with
native cloud tiering for long-term retention. IDPA combines protection storage,
protection software, search, and analytics to reduce the complexity of managing
multiple data silos, point solutions, and vendor relationships.

IDPA provides support for modern applications such as MongoDB and MySQL, and is
optimized for VMware. It is also built on an industry-proven data invulnerability
architecture, delivering encryption, fault detection, and healing.

Dell EMC SRDF

SRDF (Symmetrix Remote Data Facility) is a family of software that is the
industry standard for remote replication in mission-critical environments. Built
for the industry-leading high-end PowerMax (VMAX) hardware architecture, the
SRDF family of solutions is trusted globally for disaster recovery and business
continuity.

The SRDF family offers unmatched deployment flexibility and massive scalability to
deliver a wide range of distance replication capabilities.

SRDF consists of the following options:
 SRDF/S (synchronous option for zero data loss exposure)
 SRDF/A (asynchronous option for extended distances)
 SRDF/Star (multi-site replication option)
 SRDF/CG (consistency groups for federated data sets across arrays)
 SRDF/Metro (for active/active data center protection)
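The difference between the synchronous and asynchronous options comes down to when the host write is acknowledged. The toy model below assumes a single queue of pending writes. It does not model how SRDF/A actually batches, orders, and transmits writes, and the class name is hypothetical.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model contrasting synchronous and asynchronous remote replication."""

    def __init__(self, mode):
        if mode not in ("sync", "async"):
            raise ValueError("mode must be 'sync' or 'async'")
        self.mode = mode
        self.local = {}
        self.remote = {}
        self.pending = deque()  # writes not yet sent to the remote site (async)

    def write(self, block, data):
        self.local[block] = data
        if self.mode == "sync":
            # Acknowledge only after the remote copy has been updated.
            self.remote[block] = data
        else:
            # Acknowledge immediately and ship the write later.
            self.pending.append((block, data))
        return "ack"

    def drain(self):
        """Transmit queued writes to the remote site (async mode)."""
        while self.pending:
            block, data = self.pending.popleft()
            self.remote[block] = data

    def exposed_writes(self):
        """Writes that would be lost if the primary site failed right now."""
        return len(self.pending)
```

In `"sync"` mode, `exposed_writes()` is always zero. This matches the zero data loss exposure of the synchronous option, at the cost of waiting for the remote site on every write. In `"async"` mode, writes are acknowledged immediately. The queue length then shows the data at risk until `drain()` runs, which is what allows extended distances.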

Dell EMC TimeFinder SnapVX

Enables zero-impact snapshots, simple user-defined names, faster and secure
snapshot creation/expiration, cascading, compatibility with SRDF, and support for
legacy VMAX replication modes. SnapVX reduces replication storage costs by up
to 10x and is optimized for cloud scale with its highly efficient snaps. Customers
can take up to 256 snapshots and establish up to 1024 target volumes per source
device, providing read/write access as pointer (snap) or full (clone) copies.

SnapVX also provides a new option to secure snaps against accidental or internal
deletion. It provides instant restore: when a LUN-level restore is initiated, the
restored view is available immediately. Snapshots provide point-in-time data
copies for backups, testing, decision support, and data recovery.
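Redirect on first write, listed earlier as the SnapVX technology, can be illustrated with a pointer map. This is a toy model with hypothetical names and is not the SnapVX implementation. Only the 256-snapshot limit comes from the text above.

```python
class RowVolume:
    """Toy redirect-on-first-write volume with pointer-based snapshots."""

    MAX_SNAPSHOTS = 256  # per-source snapshot limit quoted for SnapVX

    def __init__(self, num_blocks):
        self.physical = [b""] * num_blocks       # backing storage locations
        self.pointers = list(range(num_blocks))  # logical block -> location
        self.snapshots = {}                      # name -> frozen pointer map

    def snapshot(self, name):
        """A snapshot copies only the pointer map, not the data."""
        if len(self.snapshots) >= self.MAX_SNAPSHOTS:
            raise RuntimeError("snapshot limit reached for this source")
        self.snapshots[name] = tuple(self.pointers)

    def write(self, block, data):
        location = self.pointers[block]
        if any(snap[block] == location for snap in self.snapshots.values()):
            # First write to a block shared with a snapshot: redirect it.
            self.physical.append(data)
            self.pointers[block] = len(self.physical) - 1
        else:
            self.physical[location] = data  # block not shared: overwrite

    def read(self, block, snapshot=None):
        pointers = self.pointers if snapshot is None else self.snapshots[snapshot]
        return self.physical[pointers[block]]

    def restore(self, name):
        """Instant restore: repoint the source at the snapshot's locations."""
        self.pointers = list(self.snapshots[name])
```

Restore only swaps pointers and copies no data, which is why the restored view can be available immediately.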

Concepts in Practice

Dell EMC RecoverPoint

 Enable continuous data protection for any PIT recovery to optimize RPO and
RTO
 Ensure recovery consistency for interdependent applications
 Provide synchronous or asynchronous replication policies
 Reduce WAN bandwidth consumption and utilize available bandwidth optimally
 Offer multisite support

Dell EMC PowerVault

 Simplifies data backup and archive by easily integrating the LTO family of tape
drives into your data center
 Its lower power consumption makes it an ideal part of a cloud physical
infrastructure build-out
 Linear Tape File System (LTFS) support removes software incompatibilities,
creating portability between different vendors and operating systems

Dell EMC SourceOne

 Archiving software that helps organizations to archive aging emails, files, and
Microsoft SharePoint content to the appropriate storage tiers

SourceOne family of products includes:

 Dell EMC SourceOne Email Management
 Dell EMC SourceOne for Microsoft SharePoint
 Dell EMC SourceOne for File Systems
 Dell EMC SourceOne Email Supervisor

VMware vCloud Air Disaster Recovery

Recovery-as-a-service offering which:

 Provides simple, affordable protection in the cloud for your vSphere
environment
 Offers enhanced recovery times for business and mission-critical applications
running on vSphere
 Offers scalable disaster recovery protection capacity in the cloud to address the
changing business requirements

VMware vMotion

 Performs live migration of a running VM from one physical server to another,
without any downtime
 VM retains its network identity and connections, ensuring a seamless migration
process
 Enables maintenance to be performed without disrupting business operations

VMware Storage vMotion

 Enables live migration of VM disk files within and across storage systems
without any downtime
 Performs zero-downtime storage migrations with complete transaction integrity
 Migrates the disk files of VMs running any supported operating system on any
supported server hardware

Dell EMC RecoverPoint

Provides continuous data protection for comprehensive operational and disaster
recovery. It supports major third-party arrays through VPLEX.

RecoverPoint delivers benefits including the ability to:
 Enable continuous data protection for any PIT recovery to optimize RPO and
RTO
 Ensure recovery consistency for interdependent applications
 Provide synchronous or asynchronous replication policies
 Reduce WAN bandwidth consumption and utilize available bandwidth optimally
 Offer multisite support
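Any point-in-time recovery is possible because continuous data protection journals every write rather than taking periodic copies. The minimal sketch below assumes a single journal of timestamped writes plus a baseline image. The class is hypothetical and does not reflect RecoverPoint's internal journal format.

```python
import bisect


class CdpJournal:
    """Toy continuous data protection journal (hypothetical illustration).

    Every write is recorded with its timestamp, so the replica image can be
    rebuilt as of any point in time, not only at scheduled copies.
    """

    def __init__(self, baseline):
        self.baseline = dict(baseline)  # image when protection started
        self.times = []                 # write timestamps, non-decreasing
        self.writes = []                # (block, data) matching self.times

    def record(self, timestamp, block, data):
        if self.times and timestamp < self.times[-1]:
            raise ValueError("writes must be journaled in time order")
        self.times.append(timestamp)
        self.writes.append((block, data))

    def image_at(self, point_in_time):
        """Replay journaled writes up to and including point_in_time."""
        image = dict(self.baseline)
        end = bisect.bisect_right(self.times, point_in_time)
        for block, data in self.writes[:end]:
            image[block] = data
        return image
```

If a corrupting write is journaled at time 30, `image_at(25)` returns the image as it was just before the corruption. This illustrates how journaling every write allows recovery to any point in time.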

Dell EMC PowerVault

Simplifies data backup and archive by easily integrating the LTO family of tape
drives into your data center. Supporting TBs of native capacity on a single
cartridge, LTO drives provide decades of shelf life for industries and tasks that
need reliable, long-term, large-capacity data retention, such as:

 Healthcare imaging
 Media and entertainment
 Video surveillance
 Geophysical (oil and gas) data
 Computational analysis, such as genome mapping and event simulations

Its lower power consumption makes it an ideal part of a cloud physical
infrastructure build-out. Linear Tape File System (LTFS) support removes software
incompatibilities, creating portability between different vendors and operating
systems to extend the life of your infrastructure investments.

Dell EMC SourceOne

A family of archiving software. It helps organizations to reduce the burden of aging
emails, files, and Microsoft SharePoint content by archiving them to the appropriate
storage tier. SourceOne helps in meeting the compliance requirements by
managing emails, files, and SharePoint content as business records and enforcing
retention/disposition policies.

The SourceOne family of products includes:
 Dell EMC SourceOne Email Management for archiving email messages and
other items
 Dell EMC SourceOne for Microsoft SharePoint for archiving SharePoint
content
 Dell EMC SourceOne for File Systems for archiving files from file servers
 Dell EMC SourceOne Email Supervisor for monitoring corporate email policy
compliance

VMware vCloud Air Disaster Recovery

A DRaaS offering owned and operated by VMware, built on vSphere Replication
and vCloud Air, a hybrid cloud platform for infrastructure-as-a-service (IaaS).
Disaster Recovery uses vSphere Replication to provide robust, asynchronous
replication capabilities at the hypervisor layer. This approach to replication
simplifies configuring virtual machines in vSphere for disaster recovery, without
depending on the underlying infrastructure hardware or data center mirroring.
Per-virtual-machine replication and restore granularity further provide the ability to
meet dynamic recovery objectives without overshooting the actual business
requirements for disaster recovery as they change.

VMware vMotion

Performs live migration of a running virtual machine from one physical server to
another, without downtime. The virtual machine retains its network identity and
connections, ensuring a seamless migration process. Transferring the virtual
machine's active memory and precise execution state over a high-speed network
allows the virtual machine to move from one host to another. On a gigabit Ethernet
network, the entire process completes quickly.

vMotion provides the following benefits:
 Perform hardware maintenance without scheduling downtime or disrupting
business operations
 Move virtual machines away from failing or underperforming servers
 Allow vSphere DRS to balance VMs across hosts

VMware Storage vMotion

Enables live migration of virtual machine disk files within and across storage
systems without service disruptions. Storage vMotion performs zero-downtime
storage migrations with complete transaction integrity. It migrates the disk files of
virtual machines running any supported operating system on any supported server
hardware. It performs live migration of virtual machine disk files across any Fibre
Channel, iSCSI, FCoE, and NFS storage system supported by VMware vSphere. It
allows administrators to redistribute VMs or virtual disks to different storage
systems or volumes to balance capacity or improve performance.

Assessment

1. Which is the period during which a production volume is available to perform
backup?

A. RTO

B. RPO

C. Backup Window

D. Backup Media

2. Which provides the ability to create fully populated point-in-time copies of LUNs
within a storage system or create a copy of an existing VM?

A. Clone

B. Snapshot

C. Pointer-based virtual replica

D. LUN masking

3. Which factor impacts the deduplication ratio in a backup environment?

A. Retention period

B. Type of backup media

C. Type of backup server

D. Value of data

Summary

Storage Infrastructure Security

Introduction

This module focuses on information security goals and key terminologies, the
three storage security domains, and the key threats across these domains. It
also covers the various security controls that enable an organization to mitigate
these threats. Finally, it covers the governance, risk, and compliance (GRC)
aspect in a data center environment.

Upon completing this module, you will be able to:
 Explain information security goals and terminologies
 List storage security domains and threats in storage infrastructure
 Describe governance, risk, and compliance

Introduction to Information Security Lesson

Introduction

This lesson covers the goals of information security, security concepts and their
relations, and the defense-in-depth strategy. The lesson also focuses on the
governance, risk, and compliance (GRC) aspect in a data center environment.

This lesson covers the following topics:
 Goals of information security
 Security concepts
 Defense-in-depth strategy
 Governance, risk, and compliance

Introduction to Information Security

Definition: Information Security


It includes a set of practices that protect information and information
systems from unauthorized access, use, destruction, deletion,
modification, and disruption.
Source: US Federal law (Title 38 Part IV, Chapter 57, Subchapter III
USC 5727)

 Information is an organization’s most valuable asset
 Organizations are transforming their infrastructure with modern technologies
 Cloud is one of the core elements of modern technologies
 Trust is one of the key concerns for consumers using modern technologies
o Trust = Visibility + Control
 Securing the infrastructure is important because it is the platform for most
technology environments

Notes

Information is an organization’s most valuable asset. This information, including
intellectual property, personal identities, and financial transactions, is routinely
processed and stored in storage systems, which are accessed through the
network. As a result, storage is now more exposed to various security threats that
can potentially damage business-critical data and disrupt critical services.
Organizations deploy various tools within their infrastructure to protect this asset.
These tools must be deployed on various infrastructure assets, such as compute
(processes information), storage (stores information), and network (carries
information), to protect the information.

As organizations adopt modern technologies, of which cloud is a core element,
one of their key concerns is ‘trust’. Trust depends on the degree of control and
visibility available to the information’s owner. Therefore, securing the storage
infrastructure has become an integral component of the storage management
process in modern technology environments. It is an intensive but necessary task,
essential to manage and protect vital information.

Information security includes a set of practices that protect information and
information systems from unauthorized disclosure, access, use, destruction,
deletion, modification, and disruption.

Information security involves implementing various kinds of safeguards or controls
to lessen the risk of a vulnerability in the information system being exploited.
Such an exploitation could otherwise have a significant impact on the
organization’s business. From this perspective, security is an ongoing, not static,
process and requires continuous re-validation and modification. Securing the
storage infrastructure begins with understanding the goals of information security.
Information security is vital for every business organization.

Goals of Information Security

Confidentiality (C): Ensures the secrecy of information

Integrity (I): Ensures no unauthorized changes to the information

Availability (A): Ensures that the resources are always available to authorized
users

Accountability (A): Users or the applications are responsible for the actions

The goals of information security are:

• CIA
  – Confidentiality
  – Integrity
  – Availability
• Accountability

Notes

The goal of information security is to provide Confidentiality, Integrity, and Availability, commonly referred to as the security triad, or CIA:

• Confidentiality provides the required secrecy of information to ensure that only authorized users have access to data.
• Integrity ensures that unauthorized changes to information are not allowed. The objective of ensuring integrity is to detect and protect against unauthorized alteration or deletion of information.
• Availability ensures that authorized users have reliable and timely access to compute, storage, network, application, and data resources.

Ensuring confidentiality, integrity, and availability is the primary objective of any IT security implementation. These goals are supported by using authentication, authorization, and auditing processes.

Accountability is another important principle of information security. It means that users or applications are held responsible for the actions or events that they execute on the systems. Accountability can be achieved by auditing logs.

Authentication, Authorization, and Auditing

Authentication: Process to ensure ‘users’ or ‘assets’ are who they claim to be
Authorization: Process to determine the privileges that a user has, for example, read/write or read-only
Auditing: Logging of all transactions to assess the effectiveness of the security controls

Notes

Authentication, authorization, and auditing, also referred to as AAA, play an important role in protecting customers’ data in a multitenant cloud environment:

• Authentication is a process to ensure that ‘users’ or ‘assets’ are who they claim to be by verifying their identity credentials. The user has to prove their identity to the provider to access the stored data. A user may be authenticated using a single-factor or multifactor method. Single-factor authentication involves the use of only one factor, such as a password. Multifactor authentication uses more than one factor to authenticate a user.
• Authorization is a process of determining the privileges that a user, device, or application has to access a particular service or resource. For example, a user with administrator privileges is authorized to access more services or resources than a user with non-administrator privileges: the administrator can have ‘read/write’ access while a normal user has ‘read-only’ access. Authorization should be performed only if authentication is successful. The most common authentication and authorization controls used in a data center environment are Windows Access Control Lists (ACLs), UNIX permissions, Kerberos, and Challenge-Handshake Authentication Protocol (CHAP). It is essential to verify the effectiveness of the deployed security controls with the help of auditing.
• Auditing refers to the logging of all transactions for the purpose of assessing the effectiveness of security controls. It helps to validate the behavior of the infrastructure components and to perform forensics, debugging, and monitoring activities.

For example, in cloud computing, a customer accesses the cloud service catalog using their credentials. Once the customer is authenticated, a different view of the catalog is provided, along with different options, based on the privileges assigned. An administrator can have a different view of the catalog than a normal user. The number of times a customer has logged in to the catalog is audited for monitoring purposes.

Security Concepts and Relationships

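[Figure: Security concepts and relationships]
• Owner values the Asset and imposes Countermeasures to reduce Risk to the Asset
• Threat Agent wishes to abuse or may damage the Asset, and gives rise to Threat
• Threat exploits Vulnerabilities, leading to Risk to the Asset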

Notes

The figure shows the relationships among various security concepts in a data center environment. An organization (the owner of the asset) wants to safeguard the asset from threat agents (attackers) who seek to abuse it. Risk arises when there is a likelihood that a threat agent (an attacker) will exploit a vulnerability. Therefore, organizations deploy various countermeasures to minimize risk by reducing the vulnerabilities.

Risk assessment is the first step to determine the extent of potential threats and
risks in an infrastructure. The process assesses risk and helps to identify
appropriate controls to mitigate or eliminate risks. Organizations must apply their
basic information security and risk-management policies and standards to their
infrastructure.

Some of the key security areas that an organization must focus on while building
the infrastructure are: authentication, identity and access management, data loss
prevention and data breach notification, governance, risk, and compliance (GRC),
privacy, network monitoring and analysis, security information and event logging,
incident management, and security management.

Security Concepts

The following are important security concepts:

Security Assets

• Information, hardware, and software
• Security considerations:
  – Must provide easy access to authorized users
  – Must be difficult for potential attackers to compromise
  – Cost of securing the assets should be a fraction of the value of the assets

Security Threats

• Potential attacks that can be carried out
• Attacks can be classified as:
  – Passive attacks: attempts to gain unauthorized access into the system
  – Active attacks: data modification, Denial of Service (DoS), and repudiation attacks

Security Vulnerabilities

• A weakness that an attacker exploits to carry out attacks
• Security considerations:
  – Attack surface
  – Attack vectors
  – Work factor
• Managing vulnerabilities:
  – Minimize the attack surface
  – Maximize the work factor
  – Install security controls

Security Controls

• Reduce the impact of vulnerabilities
• Controls can be:
  – Technical: antivirus, firewalls, and IDPS
  – Non-technical: administrative policies and physical controls
• Controls are categorized as:
  – Preventive
  – Detective
  – Corrective

Security Assets Notes

Information is one of the most important assets for any organization. Other assets include hardware, software, and other infrastructure components required to access the information. To protect these assets, organizations deploy security controls. These security controls have two objectives:

• The first objective is to ensure that the resources are easily accessible to authorized users.
• The second objective is to make it difficult for potential attackers to access and compromise the system.

The effectiveness of a security control can be measured by two key criteria. First, the cost of implementing the security system should be a fraction of the value of the protected data. Second, compromising and accessing the assets should be very costly for a potential attacker in terms of money, effort, and time.

Security Threats Notes

Threats are the potential attacks that can be carried out on an IT infrastructure.
These attacks can be classified as active or passive. Passive attacks are attempts
to gain unauthorized access into the system. Passive attacks pose threats to
confidentiality of information. Active attacks include data modification, denial of
service (DoS), and repudiation attacks. Active attacks pose threats to data integrity,
availability, and accountability.

Security Vulnerabilities Notes

A vulnerability is a weakness in an information system that an attacker exploits to carry out an attack. The components that provide a path enabling access to information are vulnerable to potential attacks. It is important to implement adequate security controls at all the access points on these components.

Attack surface, attack vector, and work factor are the three factors to consider
when assessing the extent to which an environment is vulnerable to security
threats. Attack surface refers to the various entry points that an attacker can use to
launch an attack, which includes people, process, and technology. For example,
each component of a storage infrastructure is a source of potential vulnerability. An
attack vector is a step or a series of steps necessary to complete an attack. For
example, an attacker might exploit a bug in the management interface to execute a
snoop attack. Work factor refers to the amount of time and effort required to exploit
an attack vector.
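Work factor can be made concrete with a small calculation. Assuming, purely for illustration, an attack vector of guessing a password by exhaustive search, the worst-case work is the size of the keyspace divided by the attacker's guess rate. The 62-character alphabet (a-z, A-Z, 0-9) and the rate of 10^10 guesses per second below are assumptions, not figures from this guide.

```python
def keyspace(alphabet_size: int, length: int) -> int:
    # Number of distinct secrets an exhaustive search must try in the worst case.
    return alphabet_size ** length


def worst_case_seconds(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    # Work factor expressed as time: keyspace divided by the attacker's guess rate.
    return keyspace(alphabet_size, length) / guesses_per_second


RATE = 1e10  # assumed attacker capability: 10 billion guesses per second
for length in (8, 12):
    hours = worst_case_seconds(62, length, RATE) / 3600
    print(f"{length} characters: {hours:,.1f} hours worst case")
```

Under these assumptions, an 8-character password is exhausted in about six hours, while 12 characters multiply the work by 62^4, roughly 14.8 million. Raising the work factor in this way is one of the vulnerability-management goals listed earlier.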

Having assessed the vulnerability of the environment, organizations can deploy specific control measures. Any control measure should account for three aspects: people, process, and technology, and the relationships among them.

Security Controls Notes

The security controls are directed at reducing vulnerability by minimizing the attack surface and maximizing the work factor. These controls can be technical or non-technical. Controls are categorized as preventive, detective, and corrective:

• Preventive: Avoid problems before they occur
• Detective: Detect a problem that has occurred
• Corrective: Correct the problem that has occurred

Organizations should deploy a defense-in-depth strategy when implementing the controls.

Defense-in-Depth

Definition: Defense-in-Depth
A strategy in which multiple layers of defense are deployed
throughout the infrastructure to help mitigate the risk of security
threats in case one layer of the defense is compromised.

[Figure: Layers of defense-in-depth, from innermost to outermost]
• Storage Security (Encryption, Zoning, etc.)
• Compute Security (Hardening, Malware Protection Software, etc.)
• Network Security (Firewall, DMZ, etc.)
• Remote Access Control (VPN, Authentication, etc.)
• Perimeter Security (Physical Security)

• Also known as a “layered approach” to security
• Provides organizations additional time to detect and respond to an attack
• Reduces the scope of a security breach

Notes

An organization should deploy multiple layers of defense throughout the infrastructure to mitigate the risk of security threats in case one layer of the defense is compromised. This strategy is referred to as defense-in-depth. It may also be thought of as a “layered approach to security” because there are multiple security measures at different levels. Defense-in-depth increases the barrier to exploitation, because an attacker must breach each layer of defenses to be successful, and thereby provides additional time to detect and respond to an attack.

This potentially reduces the scope of a security breach. However, the overall cost of deploying defense-in-depth is often higher than that of single-layered security controls. An example of defense-in-depth is a virtual firewall installed on a hypervisor when a network-based firewall is already deployed within the same environment. This provides an additional layer of security, reducing the chance of the hypervisor’s security being compromised if the network-level firewall is compromised.
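The layered idea can be sketched as a chain of independent checks, where a request is admitted only if every layer allows it. The three layers, their rules (a VPN flag, a firewall port allow-list, and a storage zone name), and the zone name are hypothetical stand-ins for the layers in the figure; physical perimeter security is not modeled. The last lines simulate a compromised network layer to show that the inner storage layer still blocks the request.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    vpn_authenticated: bool  # input to the remote access control layer
    dest_port: int           # input to the network (firewall) layer
    target_zone: str         # input to the storage layer, e.g. a zone name


# Hypothetical rules, one per layer; each is evaluated independently.
def remote_access(r: Request) -> bool:
    return r.vpn_authenticated


def network_firewall(r: Request) -> bool:
    return r.dest_port in {443, 3260}  # e.g. allow only HTTPS and iSCSI


def storage_zoning(r: Request) -> bool:
    return r.target_zone == "zone_app1"


Layer = tuple[str, Callable[[Request], bool]]
LAYERS: list[Layer] = [
    ("remote access", remote_access),
    ("network", network_firewall),
    ("storage", storage_zoning),
]


def admit(r: Request, layers: list[Layer] = LAYERS) -> tuple[bool, Optional[str]]:
    """Admit a request only if every layer allows it; report the first layer that blocks."""
    for name, check in layers:
        if not check(r):
            return False, name
    return True, None


# Simulate a compromised network layer that lets everything through.
compromised = [(name, (lambda r: True) if name == "network" else check)
               for name, check in LAYERS]
print(admit(Request(True, 22, "zone_other"), compromised))  # (False, 'storage')
```

Each layer only needs to be correct on its own; an attacker who defeats one check still faces the remaining ones, which is the property the text attributes to defense-in-depth.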

Governance, Risk, and Compliance

Definition: GRC
A term encompassing processes that help an organization to ensure
that its acts are ethically correct and in accordance with its risk
appetite (the risk level an organization chooses to accept), internal
policies, and external regulations.

• Governance, risk management, and compliance work together to enforce policies and minimize risks
  – Governance: Authority for making policies
  – Risk Management: Restricting access to certain users
  – Compliance: Assures policies are being enforced

Notes

GRC should be integrated, holistic, and organization-wide. All operations of an organization should be managed and supported through GRC. Governance, risk management, and compliance management work together to enforce policies and minimize potential risks. To better understand how these three components work together, consider an example of how GRC is implemented in an IT organization. Governance is the authority for making policies, such as defining access rights to users based on their roles and privileges. Risk management involves identifying resources that should not be accessed by certain users in order to preserve confidentiality, integrity, and availability. In this example, compliance management assures that the policies are being enforced by implementing controls such as firewalls and identity management systems.

GRC is an important component of data center infrastructure. Therefore, when using infrastructure built on modern technologies, organizations must ensure that all aspects of GRC are deployed, including cloud-related aspects such as secure multi-tenancy, the jurisdictions where data should be stored, data privacy, and data ownership.

Storage Security Domains and Threats Lesson

Introduction

This lesson covers the storage security domains and the key security threats
across domains.

This lesson covers the following topics:

• Storage security domains
• Key security threats across domains

Storage Security Domains and Threats

Storage Security Domains

Information made available on a network is exposed to security threats from various sources. Therefore, specific controls must be implemented to secure the information stored on an organization’s storage infrastructure.

The illustration depicts the three security domains of a storage environment.

[Figure: Security domains of a storage environment (Application Access; Management Access; Backup, Replication, and Archive) around the Storage Network, Data Storage, and Secondary Storage]

Storage Security Domains Notes

• To deploy controls, it is important to have a clear understanding of the access paths leading to storage resources. If each component within the infrastructure is considered a potential access point, the attack surface of all these access points must be analyzed to identify the associated vulnerabilities.
• To identify the threats that apply to a storage infrastructure, access paths to data storage can be categorized into three security domains: application access, management access, and backup, replication, and archive.
• To secure the storage environment, identify the attack surface and existing threats within each of the security domains and classify the threats based on the security goals: availability, confidentiality, and integrity.

Storage Security Domains Illustration Notes

In the illustration:

• The first security domain involves application access to the stored data through the storage network. The application access domain may include only those applications that access the data through the file system or a database interface.
• The second security domain includes management access to storage and interconnecting devices and to the data residing on those devices. Management access, whether for monitoring, provisioning, or managing storage resources, is associated with every device within the storage environment. Most management software supports some form of CLI, system management console, or web-based interface. Implementing appropriate controls for securing management applications is important because the damage that can be caused by misusing these applications can be far more extensive.
• The third domain consists of backup, replication, and archive access. This domain is primarily accessed by storage administrators who configure and manage the environment. Along with the access points in this domain, the backup and replication media also need to be secured.

Key Security Threats Across Domains

• Some of the key security threats across domains are:
  – Denial of service (DoS)
  – Distributed denial of service (DDoS)
  – Loss of data
  – Malicious insiders
  – Account hacking
  – Shared technology vulnerabilities

Denial of Service (DoS)

• Prevents legitimate users from accessing resources or services
  – Example: Exhausting network bandwidth or CPU cycles
  – Could be targeted against compute systems, networks, and storage resources
• DDoS is a variant of DoS attack
  – Several systems launch a coordinated DoS attack on target(s)
  – DDoS master program is installed on a compute system
  – Master program communicates with agents at a designated time
  – Agents initiate the attack on receiving the command
• Control measure
  – Impose restrictions and limits on resource consumption

Notes

A DoS attack prevents legitimate users from accessing resources or services. DoS attacks can be targeted against compute systems, networks, or storage resources in a storage environment. The intent of DoS is always to exhaust key resources, such as network bandwidth or CPU cycles, thus impacting production use. For example, an attacker may send massive quantities of data over the network to the storage system with the intention of consuming bandwidth. This prevents legitimate users from using the bandwidth, and they may not be able to access the storage system over the network. Such an attack may also be carried out by exploiting weaknesses of a communication protocol. For example, an attacker may cause DoS to a legitimate user by resetting TCP sessions. Apart from a DoS attack, an attacker may also carry out a Distributed DoS attack.

A Distributed DoS (DDoS) attack is a variant of DoS attack in which several systems launch a coordinated, simultaneous DoS attack on their target(s). It results in denial of service to the users of the targeted system(s). In a DDoS attack, the attacker can multiply the effectiveness of the DoS attack by harnessing the resources of multiple collaborating systems, which serve as attack platforms. Typically, a DDoS master program is installed on one compute system. Then, at a designated time, the master program communicates with a number of "agent" programs installed on compute systems. When the agents receive the command, they initiate the attack.

The principal control that can minimize the impact of DoS and DDoS attacks is to impose restrictions and limits on network resource consumption. For example, when the amount of data being sent from a given IP address is identified as exceeding the configured limits, the traffic from that IP address may be blocked. This provides a first line of defense. Further, restrictions and limits may be imposed on the resources consumed by each compute system, providing an additional line of defense.
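The per-IP limit described above can be sketched as a fixed-window byte counter. This is a minimal sketch under assumptions: the class name, the byte limit, and the window length are hypothetical, a block is permanent once triggered, and the clock is injectable so the behavior can be tested. Real network devices typically enforce such limits themselves and usually lift a block after a timeout.

```python
import time
from typing import Callable, Dict, Set, Tuple


class PerIPLimiter:
    """Fixed-window byte budget per source IP; an IP that exceeds it is blocked."""

    def __init__(self, limit_bytes: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit_bytes = limit_bytes
        self.window_s = window_s
        self.clock = clock
        self.usage: Dict[str, Tuple[float, int]] = {}  # ip -> (window start, bytes seen)
        self.blocked: Set[str] = set()

    def allow(self, ip: str, nbytes: int) -> bool:
        if ip in self.blocked:
            return False  # first line of defense: drop all traffic from a blocked IP
        now = self.clock()
        start, used = self.usage.get(ip, (now, 0))
        if now - start >= self.window_s:
            start, used = now, 0  # a new window begins, so reset the counter
        used += nbytes
        self.usage[ip] = (start, used)
        if used > self.limit_bytes:
            self.blocked.add(ip)  # configured limit exceeded: block this source
            return False
        return True
```

A second limiter instance keyed by compute system instead of IP address would correspond to the additional line of defense the text mentions.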

Loss of Data

• Occurs for various reasons other than malicious attacks
• Causes of data loss include:
  – Accidental deletion by an administrator
  – Destruction resulting from natural disasters
• If the organization is a service provider, it should publish:
  – Protection controls deployed for data protection
  – Appropriate terms/conditions and penalties related to data loss
• Control measure
  – Data backup and replication

Notes

Data loss can occur in a storage environment for various reasons other than malicious attacks. Some of the causes of data loss may include accidental deletion by an administrator or destruction resulting from natural disasters. Deploying appropriate measures such as data backup or replication can reduce the impact of such events. Organizations need to develop strategies that can avoid, or at least minimize, data loss due to such events. Examples of such strategies include the choice of backup media, frequency of backup, synchronous or asynchronous replication, and number of copies.

Further, if the organization is a cloud service provider, it must publish the protection controls deployed to protect the data stored in the cloud. The provider must also include appropriate terms and conditions related to data loss, and the associated penalties, as part of the service contract. The service contract should also include the various BC/DR options, such as backup and replication, offered to consumers.

Malicious Insiders

Definition: Malicious Insiders
An organization’s current or former employee, contractor, or other
business partner who has or had authorized access to an
organization's compute systems, network, or storage.
Source: Computer Emergency Response Team (CERT)

• Intentional misuse of access to negatively impact CIA
• Control measures:
  – Strict access control policies
  – Security audit and data encryption
  – Disable employee accounts immediately after separation
  – Segregation of duties (role-based access control)
  – Background investigation of candidates before hiring

Notes

Today, most organizations are aware of the security threats posed by outsiders.
Countermeasures such as firewalls, malware protection software, and intrusion
detection systems can minimize the risk of attacks from outsiders. However, these
measures do not reduce the risk of attacks from malicious insiders.

According to the Computer Emergency Response Team (CERT), a malicious insider could be an organization’s current or former employee, contractor, or other business partner who has or had authorized access to an organization’s compute systems, network, or storage. These malicious insiders may intentionally misuse that access in ways that negatively impact the confidentiality, integrity, or availability of the organization’s information or resources.

For example, consider a former employee of an organization who had access to the organization’s storage resources. This malicious insider may be aware of security weaknesses in that storage environment. This is a serious threat because the malicious insider may exploit those weaknesses. Control measures that can minimize the risk due to malicious insiders include strict access control policies, disabling employee accounts immediately after separation from the company, security audit, encryption, and segregation of duties (role-based access control, which is discussed later in this module). A background investigation of a candidate before hiring is another key measure that can reduce the risk due to malicious insiders.

Account Hacking

• Occurs when an attacker gains access to an administrator’s or user’s accounts
• Control measures: multi-factor authentication, IPSec, IDPS, and firewall

Types of attack and their descriptions:

• Phishing
  – Social engineering attack used to deceive users
  – Carried out by spoofing email containing a link to a fake website
  – User credentials entered on the fake site are captured
• Installing keystroke-logging malware
  – Attacker installs malware in the administrator’s or user’s compute system
  – Malware captures user credentials and sends them to the attacker
• Man-in-the-middle
  – Attacker eavesdrops on the network to capture credentials

Notes

Account hijacking refers to a scenario in which an attacker gains access to an administrator’s or user’s account(s) using methods such as phishing or installing keystroke-logging malware on the administrator’s or user’s compute systems.

Phishing is an example of a social engineering attack that is used to deceive users. Phishing attacks are typically carried out by spoofing email: an email with a fake but genuine-appearing address, which provides a link to a website that masquerades as a legitimate website. After opening the website, users are asked to enter details such as their login credentials. These details are then captured by the attacker to take over the user’s account. For example, an employee of an organization may receive an email that is designed to appear as if the IT department of that organization has sent it. This email may ask users to click the link provided in the email and update their details. After clicking the link, the user is directed to a malicious website where their details are captured.

Another way to gain access to a user’s credentials is by installing keystroke-logging malware. In this attack, the attacker installs malware in the storage administrator’s compute system, which captures user credentials and sends them to the attacker. After capturing the credentials, an attacker can use them to gain access to the storage environment. The attacker may then eavesdrop on the administrator’s activities and may also change the configuration of the storage environment to negatively impact the environment.

A “man-in-the-middle” attack is another way to hack a user’s credentials. In this attack, the attacker eavesdrops (overhears the conversation) on the network channel between two sites when replication is occurring over the network. Use of multi-factor authentication and IPSec (a suite of algorithms, protocols, and procedures used for securing IP communications by authenticating and/or encrypting each packet in a data stream) can prevent this type of attack.

Intrusion detection and prevention systems and firewalls are additional controls that
may reduce the risk of such attacks.
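Multi-factor authentication is named above as a control against account hacking. This guide does not prescribe a particular second factor; as an assumption for illustration, the sketch below uses a time-based one-time password (TOTP, RFC 6238, built on HOTP from RFC 4226), the kind of six-digit code an authenticator app displays. Only the Python standard library is used, and the function names are hypothetical.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    # RFC 4226: HMAC-SHA1 over the 8-byte big-endian counter, then dynamic truncation.
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    # RFC 6238: the counter is the number of `step`-second intervals since the Unix epoch.
    t = time.time() if at is None else at
    return hotp(secret, int(t // step), digits)


def verify_second_factor(secret: bytes, submitted: str, at: Optional[float] = None,
                         step: int = 30, drift: int = 1) -> bool:
    # Accept codes up to `drift` steps before or after now to tolerate clock skew.
    t = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret, t + k * step, step), submitted)
        for k in range(-drift, drift + 1)
    )
```

A password captured by phishing or a keystroke logger is then not enough on its own, because the code changes every 30 seconds. A code relayed in real time by a man-in-the-middle can still be used within its validity window, so MFA is best combined with the other controls listed, such as IPSec.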

Shared Technology Vulnerabilities

• An attacker may exploit the vulnerabilities of tools used to enable multi-tenant environments
• Examples of threats:
  – Failure of controls that provide separation of memory and storage
  – Hyperjacking attack involves installing a rogue hypervisor that takes control of the compute system
• Control measure:
  – Examining program memory and processor registers for anomalies

Notes

Technologies that are used to build today’s storage infrastructure provide a multi-
tenant environment enabling the sharing of resources. Multi-tenancy is achieved by
using controls that provide separation of resources such as memory and storage
for each application. Failure of these controls may expose the confidential data of
one business unit to users of other business units, raising security risks.

Compromising a hypervisor is a serious event because it exposes the entire environment to potential attacks. Hyperjacking is an example of this type of attack, in which the attacker installs a rogue hypervisor that takes control of the compute system. The attacker can now use this hypervisor to run unauthorized virtual machines in the environment and carry out further attacks. Detecting this attack is difficult and involves examining components such as program memory and the processor core registers for anomalies.

Security Controls Lesson

Introduction

This lesson covers physical security and focuses on key security controls.

This lesson covers the following topics:

• Physical security
• Key security controls

Security Controls

Introduction to Security Controls

Any security control should account for three aspects: people, process, and
technology, and the relationships among them.

Security controls can be classified as:

• Administrative
  – Include security and personnel policies or standard procedures to direct the safe execution of various operations
• Technical
  – Usually implemented through tools or devices deployed on the IT infrastructure

Technical security controls must be deployed at the:
• Compute level
• Network level
• Storage level

Key Security Controls

Important security controls include:

• Physical security
• Identity and access management
• Role-based access control
• Firewall
• Intrusion detection and prevention system
• Virtual private network
• Malware protection software
• Data encryption
• Data shredding

Notes

At the compute system level, security controls are deployed to secure hypervisors and hypervisor management systems, virtual machines, guest operating systems, and applications. Security at the network level commonly includes firewalls, demilitarized zones, intrusion detection and prevention systems, virtual private networks, and VLANs. At the storage level, security controls include data shredding and data encryption. Apart from these security controls, the storage infrastructure also requires identity and access management, role-based access control, and physical security arrangements.

Physical Security

Physical security is the foundation of any overall IT security strategy. Strict enforcement of policies, processes, and procedures by an organization is a critical element of successful physical security.

The physical security measures that are deployed to secure the organization’s storage infrastructure are:

• Disabling all unused devices and ports
• 24/7/365 onsite security
• Biometric or security badge-based authentication to grant access to the facilities
• Surveillance cameras to monitor activity throughout the facility
• Sensors and alarms to detect motion and fire

Identity and Access Management

Definition: Identity and Access Management (IAM)


A process of managing users identifiers, and their authentication and
authorization to access storage infrastructure resources.

 IAM controls access to resources by placing restrictions based on user identities
 An organization may collaborate with one or more cloud service providers to
access various cloud-based storage services
 Such collaboration requires deploying multiple authentication systems to enable
the organization to authenticate employees and provide access to cloud-based
storage services
Organizations may deploy the following authorization and authentication controls:

Control | Description | Examples
Authorization | Restricts accessibility and sharing of files and folders | Windows ACLs, UNIX permissions, and OAuth
Authentication | Enables authentication between the client and the server | Multi-factor authentication, Kerberos, CHAP, and OpenID
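
The following Python sketch makes the authorization row of the table concrete. It shows how UNIX-style permission bits grant or deny access. It is a minimal sketch under simplifying assumptions:
 The function `can_access` and its parameters are hypothetical.
 Exactly one permission class (owner, group, or other) applies.
 Root privileges, ACLs, and supplementary-group lookups are not modeled.

```python
import stat


def can_access(mode: int, file_uid: int, file_gid: int,
               uid: int, gids: set, want: str) -> bool:
    """Return True if a user may perform `want` ('r', 'w', or 'x') on a file.

    Simplified UNIX semantics: exactly one permission class (owner, group,
    or other) applies, chosen in that order. Root and ACLs are not modeled.
    """
    bits = {
        "owner": {"r": stat.S_IRUSR, "w": stat.S_IWUSR, "x": stat.S_IXUSR},
        "group": {"r": stat.S_IRGRP, "w": stat.S_IWGRP, "x": stat.S_IXGRP},
        "other": {"r": stat.S_IROTH, "w": stat.S_IWOTH, "x": stat.S_IXOTH},
    }
    if uid == file_uid:
        cls = "owner"
    elif file_gid in gids:
        cls = "group"
    else:
        cls = "other"
    return bool(mode & bits[cls][want])


# A file with mode 0o640 (rw-r-----) owned by uid 1000 and group 100.
mode = 0o640
print(can_access(mode, 1000, 100, uid=1000, gids={100}, want="w"))  # True
print(can_access(mode, 1000, 100, uid=2000, gids={100}, want="r"))  # True
print(can_access(mode, 1000, 100, uid=2000, gids={100}, want="w"))  # False
print(can_access(mode, 1000, 100, uid=3000, gids={200}, want="r"))  # False
```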

Notes

The key traditional authentication and authorization controls that are deployed in a
storage environment are Windows ACLs, UNIX permissions, Kerberos, and
Challenge-Handshake Authentication Protocol (CHAP). Alternatively, the
organization can use Federated Identity Management (FIM) for authentication. A
federation is an association of organizations (referred to as trusted parties) that
come together to exchange information about their users and resources to enable
collaboration.

Federation includes the process of managing the trust relationships among the
trusted parties beyond internal networks or administrative boundaries. FIM enables
the organizations (especially cloud service providers) to offer services without
implementing their own authentication system. The organization can choose an
identity provider to authenticate their users. This involves exchanging identity
attributes between the organizations and the identity provider in a secure way. The
identity and access management controls used by organizations include OpenID
and OAuth.

OAuth

Definition: OAuth
An open authorization control that enables a client to access protected
resources from a resource server on behalf of a resource owner.

[Figure: OAuth flow among the Client, Resource Owner, Authorization Server, and Resource Server: 1. Authorization Request, 2. Authorization Grant, 3. Authorization Grant, 4. Access Token, 5. Access Token, 6. Service Request]

 Can be used to secure the application access domain
 There are four entities involved in the authorization control:
 Resource owner
 Resource server
 Client
 Authorization Server
 Example: Giving LinkedIn permission to access your Facebook contacts

Notes

The illustration shows the steps involved in the OAuth process as described in Request
for Comments (RFC) 6749, published by the Internet Engineering Task Force (IETF):
1. The client requests authorization from the resource owner. The authorization
request can be made directly to the resource owner, or indirectly through the
authorization server.
2. The client receives an authorization grant, which is a credential representing the
resource owner's authorization to access its protected resources. It is used by
the client to obtain an access token. Access tokens are credentials that are
used to access protected resources. An access token is a string representing
an authorization issued to the client. The string is usually opaque to the client.
Tokens represent specific scopes and durations of access, granted by the
resource owner, and enforced by the resource server and authorization server.
3. The client requests an access token by authenticating with the authorization
server and presenting the authorization grant.
4. The authorization server authenticates the client and validates the authorization
grant, and if valid, issues an access token.
5. The client requests the protected resource from the resource server and
authenticates by presenting the access token.
6. The resource server validates the access token, and if valid, serves the request.
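
The six steps above can be sketched as an in-memory simulation of the OAuth authorization code grant. This is a hedged illustration only:
 The class names, the client ID `example-client`, and the scope string `contacts:read` are hypothetical.
 There is no HTTP, redirect URI, or TLS handling, and the sketch is not a conforming RFC 6749 implementation.

It shows three key ideas:
 The grant can be redeemed only once.
 The client must authenticate to exchange the grant.
 The access token is an opaque string that the resource server validates before serving a request.

```python
import secrets
import time


class AuthorizationServer:
    """Issues authorization grants and access tokens (steps 2-4)."""

    def __init__(self):
        self.clients = {}  # client_id -> client_secret
        self.codes = {}    # authorization code -> (client_id, scope)
        self.tokens = {}   # access token -> (scope, expiry timestamp)

    def register_client(self, client_id):
        client_secret = secrets.token_urlsafe(16)
        self.clients[client_id] = client_secret
        return client_secret

    def issue_code(self, client_id, scope):
        # Step 2: after the resource owner approves, a grant (code) is issued.
        code = secrets.token_urlsafe(16)
        self.codes[code] = (client_id, scope)
        return code

    def exchange(self, client_id, client_secret, code, ttl=3600):
        # Steps 3-4: authenticate the client, validate the grant, issue a token.
        known = self.clients.get(client_id, "")
        if not known or not secrets.compare_digest(known, client_secret):
            raise PermissionError("client authentication failed")
        grant_client, scope = self.codes.pop(code, (None, None))  # single use
        if grant_client != client_id:
            raise PermissionError("invalid authorization grant")
        token = secrets.token_urlsafe(24)  # opaque to the client
        self.tokens[token] = (scope, time.time() + ttl)
        return token

    def token_scope(self, token):
        # Returns the token's scope, or None if it is unknown or expired.
        scope, expiry = self.tokens.get(token, (None, 0.0))
        return scope if time.time() < expiry else None


class ResourceServer:
    """Serves protected resources only for valid access tokens (steps 5-6)."""

    def __init__(self, auth_server, resources):
        self.auth_server = auth_server
        self.resources = resources  # scope -> protected resource

    def get(self, token, scope):
        if self.auth_server.token_scope(token) != scope:
            raise PermissionError("invalid or expired access token")
        return self.resources[scope]


def resource_owner_consents(client_id, scope):
    # Step 1 stand-in: the resource owner reviews and approves the request.
    return scope == "contacts:read"


auth = AuthorizationServer()
client_secret = auth.register_client("example-client")
api = ResourceServer(auth, {"contacts:read": ["alice", "bob"]})

if resource_owner_consents("example-client", "contacts:read"):    # step 1
    code = auth.issue_code("example-client", "contacts:read")     # step 2
    token = auth.exchange("example-client", client_secret, code)  # steps 3-4
    print(api.get(token, "contacts:read"))                        # steps 5-6
```

Calling `auth.exchange` a second time with the same `code` raises `PermissionError`, because the sketch removes each grant once it is redeemed.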

OpenID

Definition: OpenID
An open standard for authentication in which an organization uses
authentication services from an OpenID provider.

[Figure: OpenID authentication among the User (through a Browser), the Service Provider (Relying Party), which does not require its own authentication control, and the OpenID Provider (Identity Provider), which maintains users’ credentials and enables relying parties to authenticate users. The user first creates an ID with one of the OpenID providers.]
Step 1: Login request using OpenID
Step 2: Authentication request is redirected to the OpenID provider
Step 3: Consent to profile sharing
Step 4: Authentication response is redirected to the organization providing services

Notes

The organization is known as the relying party and the OpenID provider is known
as the identity provider. An OpenID provider maintains users’ credentials on its
authentication system and enables relying parties to authenticate users requesting
the use of the relying party’s services. This eliminates the need for the relying party
to deploy its own authentication system.

In the OpenID control, a user creates an ID with one of the OpenID providers. This
OpenID can then be used to sign on to any organization (relying party) that accepts
OpenID authentication. This control can be used in the modern environment to
secure the application access domain.

The illustration shows the OpenID concept by considering a user who requires
services from the relying party. To use the services provided by the relying party,
the user needs an identity (user ID and password). The relying party does not
provide its own authentication control; however, it supports OpenID from one or
more OpenID providers. The user can create an ID with the identity provider and
then use this ID with the relying party. After receiving the login request, the
relying party authenticates it with the help of the identity provider and then grants
access to the services.
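
The delegation described above can be sketched in Python. This is a simplified illustration, not the OpenID wire protocol. The class names are hypothetical. As an assumption, the identity provider signs a small JSON assertion with an HMAC key it shares with the relying party. This is loosely modeled on the shared-secret associations of OpenID 2.0; OpenID Connect deployments typically use signed ID tokens instead. The relying party stores no passwords: it only checks the signature and confirms that the assertion was issued for it.

```python
import hashlib
import hmac
import json
import secrets


class IdentityProvider:
    """OpenID provider: maintains user credentials and vouches for users."""

    def __init__(self):
        self._credentials = {}                     # user_id -> (salt, hash)
        self.shared_key = secrets.token_bytes(32)  # shared with relying parties

    @staticmethod
    def _hash(password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def create_id(self, user_id, password):
        salt = secrets.token_bytes(16)
        self._credentials[user_id] = (salt, self._hash(password, salt))

    def authenticate(self, user_id, password, relying_party, consent):
        # Steps 2-3: the user signs in at the provider and consents to sharing.
        record = self._credentials.get(user_id)
        if record is None or not consent:
            return None
        salt, stored = record
        if not hmac.compare_digest(self._hash(password, salt), stored):
            return None
        assertion = json.dumps({"sub": user_id, "aud": relying_party}).encode()
        signature = hmac.new(self.shared_key, assertion, hashlib.sha256).hexdigest()
        return assertion, signature  # Step 4: response sent to the relying party


class RelyingParty:
    """Service provider that keeps no passwords of its own."""

    def __init__(self, name, shared_key):
        self.name = name
        self.shared_key = shared_key

    def verify(self, response):
        # Returns the authenticated user ID, or None if the response is invalid.
        if response is None:
            return None
        assertion, signature = response
        expected = hmac.new(self.shared_key, assertion, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, signature):
            return None
        claims = json.loads(assertion)
        return claims["sub"] if claims.get("aud") == self.name else None


idp = IdentityProvider()
idp.create_id("alice", "correct horse battery staple")
shop = RelyingParty("shop.example", idp.shared_key)

# Step 1 happens at the relying party; here the login is handed to the provider.
response = idp.authenticate("alice", "correct horse battery staple",
                            "shop.example", consent=True)
print(shop.verify(response))  # alice
```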

Multifactor Authentication

[Figure: User login combining something you know (username PATSGR and first factor BH347N12) with something you have (token code 459820), entered together as the password BH347N12459820]

 Multiple factors for authentication:
 First factor: what the user knows
o For example, a password
 Second factor: what the user has
o For example, a token
 Third factor: who the user is
o For example, biometric identity
 Access is granted only when all the factors are validated

Notes

Multifactor authentication uses more than one factor to authenticate a user. A
commonly implemented two-factor authentication process requires the user to
supply both something they know (such as a password) and something they have
(such as a device). The second factor can be a password generated by a physical
device (known as a token) in the user’s possession. The password generated by
the token is valid only for a predefined time; after that time is over, the token
generates another password. To further enhance the authentication process, more
factors, such as biometric identity, may also be considered. A multifactor
authentication technique may be deployed using any combination of these factors.
A user’s access to the environment is granted only when all the required factors are
validated.
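
A time-limited token code like the one described above can be computed with the Time-based One-Time Password algorithm (TOTP, RFC 6238), one widely used open scheme. The sketch below is an illustration under stated assumptions:
 Commercial tokens such as RSA SecurID use their own algorithms, so this is not necessarily what any particular product computes.
 The helper `verify_login` and its PIN-plus-token parameters are hypothetical.

The helper checks one factor the user knows (a PIN) and one the user has (a code derived from a secret stored in the token). It grants access only when both match.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, at: float, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238) using HMAC-SHA1."""
    counter = int(at // step)                # time steps since the epoch
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                  # dynamic truncation (RFC 4226)
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def verify_login(stored_pin: str, token_secret: bytes,
                 pin: str, token_code: str, now: float) -> bool:
    knows = hmac.compare_digest(stored_pin, pin)                    # factor 1
    has = hmac.compare_digest(totp(token_secret, now), token_code)  # factor 2
    return knows and has  # access only when every factor is validated


# RFC 6238 test vector: SHA-1 secret, T = 59 seconds, 8 digits.
print(totp(b"12345678901234567890", 59, digits=8))  # 94287082

now = 1_700_000_000.0
seed = b"hypothetical-token-seed"
print(verify_login("4321", seed, "4321", totp(seed, now), now))  # True
print(verify_login("4321", seed, "0000", totp(seed, now), now))  # False
```

Production verifiers usually also accept a code from an adjacent time step to tolerate clock drift between the token and the server; this sketch accepts only the current step.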

Challenge Handshake Authentication Protocol

CHAP is a basic authentication control that has been widely adopted by network
devices and compute systems. It provides a method for initiators and targets to
authenticate each other by using a secret code or password.

The figure illustrates the handshake steps that occur between an initiator (a
compute system running a hypervisor and VMs) and a target (an iSCSI storage
system):

1. The initiator initiates a login to the target.
2. The target sends a CHAP challenge to the initiator.
3. The initiator takes the shared secret and calculates a value using a one-way hash function.
4. The initiator returns the hash value to the target.
5. The target computes the expected hash value from the shared secret and compares it with the value received from the initiator.
6. If the values match, authentication is acknowledged.

Notes

CHAP secrets are random secrets of 12 to 128 characters. The secret is never
exchanged directly over the communication channel. Rather, a one-way hash
function converts it into a hash value, which is then exchanged.

A hash function, using the MD5 algorithm, transforms data in such a way that the
result is unique and cannot be changed back to its original form. If the initiator
requires reverse CHAP authentication, the initiator authenticates the target by
using the same procedure. The CHAP secret must be configured on the initiator
and the target. A CHAP entry, which is composed of the name of a node and the
secret associated with the node, is maintained by the target and the initiator.

The same steps are executed in a two-way CHAP authentication scenario. After
these steps are completed, the initiator authenticates the target. If both
authentication steps succeed, data access is enabled. CHAP is often used
because it is a simple protocol to implement and can be implemented across
various disparate systems.
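
The six handshake steps can be sketched in Python using the response computation defined for CHAP in RFC 1994: an MD5 hash over a one-octet identifier, the shared secret, and the challenge. The class names, the node name `iqn.example:host1`, and the secret are hypothetical, and the sketch omits the iSCSI login-phase message formats. It shows the key property described above: only the hash crosses the wire, never the secret. For two-way CHAP, the same exchange would run again with the roles reversed.

```python
import hashlib
import hmac
import secrets


def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    # RFC 1994: MD5(identifier octet || shared secret || challenge).
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


class Initiator:
    def __init__(self, name: str, secret: bytes):
        self.name, self.secret = name, secret

    def respond(self, identifier: int, challenge: bytes):
        # Steps 3-4: hash the secret with the challenge and return only the hash.
        return self.name, chap_response(identifier, self.secret, challenge)


class Target:
    def __init__(self, chap_entries: dict):
        self.chap_entries = chap_entries  # CHAP entries: node name -> secret
        self.pending = None

    def challenge(self):
        # Step 2: a fresh random identifier and challenge for each login.
        self.pending = (secrets.randbelow(256), secrets.token_bytes(16))
        return self.pending

    def verify(self, name: str, response: bytes) -> bool:
        # Steps 5-6: compute the expected hash and compare it with the response.
        if self.pending is None or name not in self.chap_entries:
            return False
        identifier, challenge = self.pending
        self.pending = None  # each challenge is answered at most once
        expected = chap_response(identifier, self.chap_entries[name], challenge)
        return hmac.compare_digest(expected, response)


shared = b"example-chap-secret"  # hypothetical secret (12-128 characters)
target = Target({"iqn.example:host1": shared})
host = Initiator("iqn.example:host1", shared)

identifier, challenge = target.challenge()            # steps 1-2
name, response = host.respond(identifier, challenge)  # steps 3-4
print(target.verify(name, response))                  # True
```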

Role-based Access Control

 An approach to restrict access to authorized users based on their respective roles
 Only the privileges required to perform the tasks associated with a role are
assigned to that role
 Separation of duties ensures that no single individual can both specify an action
and carry it out

Notes

Role-based access control (RBAC) is an approach to restricting access to
authorized users based on their respective roles. A role may represent a job
function, for example, a storage administrator. A role is assigned only the minimum
privileges required to perform the tasks associated with that role.

It is advisable to consider administrative controls, such as separation of duties,
when defining data center security procedures. Clear separation of duties ensures
that no single individual can both specify an action and carry it out. For example,
the person who authorizes the creation of administrative accounts should not be
the person who uses those accounts.
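
RBAC and separation of duties can be sketched in a few lines of Python. The role names, permission strings, and the conflicting pair below are hypothetical. The sketch grants each role only the permissions its tasks need. It also refuses any role assignment that would let one user both authorize administrative accounts and use them.

```python
from dataclasses import dataclass, field

# Hypothetical roles, each limited to the permissions its tasks require.
ROLE_PERMISSIONS = {
    "storage_admin": {"provision_lun", "expand_pool", "use_admin_account"},
    "security_officer": {"authorize_admin_account", "view_audit_log"},
    "auditor": {"view_audit_log"},
}

# Separation of duties: no single user may hold both permissions of a pair.
CONFLICTING_PERMISSIONS = [("authorize_admin_account", "use_admin_account")]


@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)

    def permissions(self) -> set:
        perms = set()
        for role in self.roles:
            perms |= ROLE_PERMISSIONS[role]
        return perms


def assign_role(user: User, role: str) -> None:
    proposed = user.permissions() | ROLE_PERMISSIONS[role]
    for a, b in CONFLICTING_PERMISSIONS:
        if a in proposed and b in proposed:
            raise ValueError(f"{role} would give {user.name} both {a} and {b}")
    user.roles.add(role)


def is_authorized(user: User, permission: str) -> bool:
    return permission in user.permissions()


alice = User("alice")
assign_role(alice, "storage_admin")
print(is_authorized(alice, "expand_pool"))     # True
print(is_authorized(alice, "view_audit_log"))  # False
try:
    assign_role(alice, "security_officer")     # rejected: conflicting duties
except ValueError as err:
    print(err)
```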

Firewall and Demilitarized Zone

Definition: Firewall
A security control designed to monitor the incoming and the outgoing
network traffic and compare them to a set of filtering rules.

 Firewall security rules may use various filtering parameters such as source
address, destination address, port numbers, and protocols. The effectiveness of
a firewall depends on how robustly and extensively the security rules are
defined.
 Firewalls can be deployed at:
 Network level
 Compute level
 Hypervisor level
 Uses various parameters for traffic filtering

Definition: Demilitarized Zone
A control to secure internal assets while enabling Internet-based
access to selected resources.

Notes

A network-level firewall is typically used as the first line of defense for restricting certain
types of traffic coming into and going out of a network. This type of firewall is
typically deployed at the entry point of an organization’s network.

At the compute system level, a firewall application is installed as a second line of
defense in a defense-in-depth strategy. This type of firewall provides protection
only to the compute system on which it is installed.

In a virtualized environment, there is the added complexity of many virtual machines
running on a smaller number of compute systems. When virtual machines on the
same hypervisor communicate with each other over a virtual switch, a network-level
firewall cannot filter this traffic. In such situations, a virtual firewall can be used
to filter virtual machine traffic.

To reduce vulnerability and protect internal resources and applications, the
compute systems or virtual machines that require Internet access are placed in
a demilitarized zone.

In a demilitarized zone environment, servers that need Internet access are placed
between two sets of firewalls. The servers in the demilitarized zone may or may not
be allowed to communicate with internal resources. Application-specific ports such
as those designated for HTTP or FTP traffic are allowed through the firewall to the
demilitarized zone servers. However, no Internet-based traffic is allowed to go
through the second set of firewalls and gain access to the internal network.
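
The two-firewall arrangement above can be modeled with a small rule evaluator. This is a hedged sketch:
 The rule fields (source, destination, port, protocol) follow the filtering parameters listed earlier.
 The `Rule` class, the first-match-wins order, and the default-deny fallback are assumptions that reflect a common firewall convention.
 The addresses come from reserved documentation ranges.

In this example, the outer firewall admits only HTTP, HTTPS, and FTP traffic to the DMZ. The inner firewall admits nothing from the Internet and allows one DMZ server to reach one internal database port.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                 # "allow" or "deny"
    src: str                    # source network in CIDR notation
    dst: str                    # destination network in CIDR notation
    port: Optional[int] = None  # None matches any destination port
    protocol: str = "tcp"


def evaluate(rules, src, dst, port, protocol="tcp"):
    """Return the action of the first matching rule, or "deny" if none match."""
    for rule in rules:
        if (ip_address(src) in ip_network(rule.src)
                and ip_address(dst) in ip_network(rule.dst)
                and rule.protocol == protocol
                and rule.port in (None, port)):
            return rule.action
    return "deny"  # default deny


DMZ = "203.0.113.0/24"

# First set of firewalls: Internet -> DMZ, application-specific ports only.
outer_rules = [
    Rule("allow", "0.0.0.0/0", DMZ, 80),   # HTTP
    Rule("allow", "0.0.0.0/0", DMZ, 443),  # HTTPS
    Rule("allow", "0.0.0.0/0", DMZ, 21),   # FTP control
]

# Second set of firewalls: nothing from the Internet; one DMZ host may reach
# one internal database port (this allowance is optional in a real design).
inner_rules = [
    Rule("allow", "203.0.113.10/32", "10.0.5.20/32", 5432),
]

print(evaluate(outer_rules, "198.51.100.7", "203.0.113.10", 443))  # allow
print(evaluate(outer_rules, "198.51.100.7", "203.0.113.10", 22))   # deny
print(evaluate(inner_rules, "198.51.100.7", "10.0.5.20", 5432))    # deny
print(evaluate(inner_rules, "203.0.113.10", "10.0.5.20", 5432))    # allow
```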

Intrusion Detection and Prevention System

Definition: Intrusion Detection and Prevention System (IDPS)
A security tool that automates the process of detecting and preventing
events that can compromise the confidentiality, integrity, or availability
of IT resources.

 Signature-based detection technique:
 Scans for signatures to detect an intrusion
 Effective only for known threats
 Anomaly-based detection technique:
 Scans and analyzes events to detect if they are statistically different from
normal events
 Has the ability to detect various events

Notes

Intrusion detection is the process of detecting events that can compromise the
confidentiality, integrity, or availability of IT resources.

An intrusion detection system (IDS) is a security tool that automates the detection
process. An IDS generates alerts when anomalous activity is detected. An
intrusion prevention system (IPS) is a tool that can stop such events after they
have been detected by the IDS. These two controls usually work together and are
generally referred to as an intrusion detection and prevention system (IDPS).
The key techniques used by an IDPS to identify intrusions in the environment are
signature-based and anomaly-based detection.

In the anomaly-based detection technique, the IDPS scans and analyzes events to
determine whether they are statistically different from events normally occurring in
the system. This technique can detect various events, such as multiple login
failures, excessive process failures, excessive network bandwidth consumed by an
activity, or an unusual number of emails sent by a user, which could signify that
an attack is taking place.

The IDPS can be deployed at the compute system, network, or hypervisor levels.
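
The two detection techniques can be contrasted in a short Python sketch. The signature patterns, the three-standard-deviation threshold, and the hourly login-failure baseline are hypothetical values chosen for illustration. Real IDPS products use far richer signature languages and statistical models. Signature matching catches only patterns already in the catalog. The anomaly check flags any value that departs sharply from the learned baseline, such as a burst of failed logins.

```python
import re
from statistics import mean, stdev

# Signature-based detection: a catalog of known-bad patterns (hypothetical).
SIGNATURES = {
    "sql_injection": re.compile(r"(?i)'\s*or\s+'1'\s*=\s*'1"),
    "path_traversal": re.compile(r"\.\./\.\./"),
}


def signature_alerts(event: str) -> list:
    """Return the names of all known signatures found in an event."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(event)]


def anomaly_alert(history, current, threshold=3.0) -> bool:
    """Flag `current` if it lies more than `threshold` standard deviations
    above the mean of the historical baseline."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current > mu
    return (current - mu) / sigma > threshold


print(signature_alerts("GET /login?user=admin' OR '1'='1"))  # ['sql_injection']

baseline = [2, 3, 1, 4, 2, 3, 2, 3]  # failed logins per hour for one user
print(anomaly_alert(baseline, 4))    # False
print(anomaly_alert(baseline, 40))   # True
```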

Virtual Private Network

 Extends a user’s private network across a public network
 Enables the internal network’s security and management policies to be applied
over the VPN connection
 Two methods to establish a VPN connection:
 Remote access VPN connection
o Remote client initiates a remote VPN connection request
o VPN server authenticates and grants access to the organization’s network
 Site-to-site VPN connection
o Remote site initiates a site-to-site VPN connection
o VPN server authenticates and grants access to the organization’s network

Notes

In the storage environment, a virtual private network (VPN) can be used to provide
a user with a secure connection to the storage resources. A VPN is also used to
provide a secure site-to-site connection between a primary site and a DR site when
performing remote replication. A VPN can also provide a secure site-to-site
connection between an organization’s data center and the cloud.

A virtual private network extends an organization’s private network across a public
network such as the Internet. A VPN establishes a point-to-point connection between
two networks over which encrypted data is transferred. A VPN enables organizations to
apply the same security and management policies to the data transferred over the
VPN connection as are applied to the data transferred over the organization’s internal
network. When establishing a VPN connection, a user is authenticated before the
security and management policies are applied.

There are two methods in which a VPN connection can be established:
 Remote access VPN connection
 Site-to-site VPN connection

In a remote access VPN connection, a remote client (typically client software
installed on the user’s compute system) initiates a remote VPN connection request.
A VPN server authenticates the user and provides access to the network. This
method can be used by administrators to establish a secure connection to the data
center and carry out management operations.

In a site-to-site VPN connection, the remote site initiates a site-to-site VPN
connection. The VPN server authenticates the site and provides access to the
internal network. One typical usage scenario for this method is deploying remote
replication or connecting to the cloud.

Malware Protection Software

 Detects, prevents, and removes malware programs
 Common malware detection techniques:
 Signature-based detection
 Heuristics detection
 Protects OS against attacks that modify sensitive areas
 Disallows unauthorized modification of sensitive areas

Notes

Malware protection software is typically installed on a compute system or on a
mobile device to provide protection for the operating system and applications. The
malware protection software detects, prevents, and removes malware and
malicious programs such as viruses, worms, Trojan horses, key loggers, and
spyware. Malware protection software uses various techniques to detect malware.

One of the most common techniques is signature-based detection. In
this technique, the malware protection software scans files to identify a
malware signature. A signature is a specific bit pattern in a file. These signatures
are cataloged by malware protection software vendors and are made available to
users as updates. The malware protection software must be configured to regularly
update these signatures to provide protection against new malware programs.
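
Signature-based scanning can be illustrated with a minimal Python sketch that searches a file’s bytes for cataloged patterns. The signature names and byte patterns are invented placeholders, not real malware signatures, and the helper names are hypothetical. Production scanners stream large files, use efficient multi-pattern matching, and load vendor-signed signature updates. The `update_signatures` helper only hints at why regular updates matter.

```python
from pathlib import Path

# Hypothetical signature catalog: name -> byte pattern (placeholders only).
SIGNATURES = {
    "Example.Worm.A": bytes.fromhex("deadbeef4d5a9000"),
    "Example.Trojan.B": b"\x90\x90\x90\xcc\xcc",
}


def scan_bytes(data: bytes, signatures=SIGNATURES) -> list:
    """Return the names of all signatures whose pattern occurs in data."""
    return [name for name, pattern in signatures.items() if pattern in data]


def scan_file(path, signatures=SIGNATURES) -> list:
    # Reads the whole file into memory; acceptable for a sketch only.
    return scan_bytes(Path(path).read_bytes(), signatures)


def update_signatures(catalog: dict, new_entries: dict) -> None:
    # Merge a (hypothetical) vendor update so newly cataloged malware is found.
    catalog.update(new_entries)


sample = b"header" + b"\x90\x90\x90\xcc\xcc" + b"trailer"
print(scan_bytes(sample))  # ['Example.Trojan.B']
```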

Another technique, called heuristics, can be used to detect malware by examining
suspicious characteristics of files. For example, malware protection software may
scan a file to determine the presence of rare instructions or code. Malware
protection software may also identify malware by examining the behavior of
programs. For example, malware protection software may observe program
execution to identify inappropriate behavior such as keystroke capture.

Malware protection software can also be used to protect the operating system against
attacks. A common type of attack on operating systems modifies their sensitive
areas, such as registry keys or configuration files, with the intention of causing
applications to function incorrectly or to fail. This can be prevented by
disallowing unauthorized modification of sensitive areas, either by adjusting
operating system configuration settings or through malware protection software.
In this case, when a modification is attempted, the operating system or the
malware protection software challenges the administrator for authorization.

Data Encryption

Definition: Data Encryption
A cryptographic technique in which data is encoded and made
indecipherable to eavesdroppers or hackers.

 Enables securing data in-flight and at-rest
 Provides protection from threats, such as data tampering, media theft, and
sniffing attacks
 Data encryption control can be deployed at the compute, network, and storage levels
 Data should be encrypted as close to its origin as possible

Notes

Data encryption is one of the most important controls for securing data in-flight and
at-rest. Data in-flight refers to data that is being transferred over a network, and
data at-rest refers to data that is stored on a storage medium. Data encryption
provides protection from threats such as tampering with data, which violates data
integrity; media theft, which compromises data availability and confidentiality; and
sniffing attacks, which compromise confidentiality.

Data should be encrypted as close to its origin as possible. If it is not possible to
perform encryption on the compute system, an encryption appliance can be used
to encrypt data at the point of entry into the storage network. Encryption
devices can be implemented on the fabric to encrypt data between the compute
system and the storage media. These controls can protect both the data at-rest on
the destination device and the data in-flight. Encryption can also be deployed at the
storage level, which can encrypt data at-rest.

Another way to encrypt network traffic is to use cryptographic protocols such as
Transport Layer Security (TLS), the successor to Secure Sockets Layer (SSL). These
protocols run above the transport layer and provide an encrypted connection
for client-server communication. They are designed to prevent eavesdropping on
and tampering with data on the connection over which it is transmitted.
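
From an application’s point of view, TLS protection of data in-flight can be set up with Python’s standard `ssl` module. The sketch below is illustrative:
 `make_client_context` and `fetch_head` are hypothetical helper names.
 Requiring TLS 1.2 as the minimum version is an assumed policy choice rather than something the text prescribes.

Encryption protects the data from eavesdropping. Certificate and host-name verification, which the default context enables, prevents an attacker in the middle from impersonating the server.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    # Default context: loads system CA certificates, requires a valid server
    # certificate, and checks that it matches the requested host name.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse SSL 3.0, TLS 1.0, TLS 1.1
    return ctx


def fetch_head(host: str, port: int = 443) -> bytes:
    """Send a minimal HTTP HEAD request over TLS and return the first bytes."""
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=10) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls.sendall(request.encode("ascii"))
            return tls.recv(1024)


ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```

Calling `fetch_head("example.com")` requires network access, so the sketch only builds and inspects the context.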

Data Shredding

Definition: Data Shredding
A process of deleting data or residual representation (sometimes
called remanence) of data and making it unrecoverable.

 Techniques for shredding data stored on tapes:
 Overwriting tapes with invalid data
 Degaussing media
 Destroying media
 Techniques for shredding data stored on disks and flash drives:
 Shredding algorithms
 Shred all copies of data, including backups and replicas

Notes

Typically, deleting data does not make it unrecoverable, and an attacker may use
specialized tools to recover it. The threat of unauthorized data recovery is greater
when an organization discards failed storage media such as disk drives, solid-state
drives, or tapes. After the organization discards the media, an attacker may gain
access to them and recover the data by using specialized tools.

Organizations can deploy data shredding controls in their storage infrastructure to
protect against loss of confidentiality of their data. Data may be stored on disks or on
tapes. Techniques to shred data stored on tape include overwriting it with invalid
data, degaussing the media (a process of decreasing or eliminating the magnetic
field), and physically destroying the media. Data stored on disk or flash drives can
be shredded by using algorithms that overwrite the disks several times with invalid
data.
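
The overwrite approach can be sketched at file level in Python. This is a hedged illustration, not a certified sanitization tool:
 `shred_file` is a hypothetical helper.
 The three random passes are an arbitrary choice.
 It overwrites one file rather than a whole disk.

On journaling or copy-on-write file systems, and on flash media where wear leveling remaps writes, old blocks may survive a file-level overwrite. For this reason, device-level sanitization or physical destruction is often preferred for SSDs.

```python
import os
import secrets
from pathlib import Path


def shred_file(path, passes: int = 3, chunk: int = 1 << 20) -> None:
    """Overwrite a file in place with random data several times, then delete it."""
    p = Path(path)
    size = p.stat().st_size
    with open(p, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk, remaining)
                f.write(secrets.token_bytes(n))  # invalid (random) data
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # push this pass to the device before the next
    p.unlink()


demo = Path("shred_demo.bin")
demo.write_bytes(b"confidential " * 1000)
shred_file(demo)
print(demo.exists())  # False
```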

Organizations may create multiple copies (backups and replicas) of their data and
store them at multiple locations as part of their business continuity and disaster
recovery strategy. Therefore, organizations must deploy data shredding controls at
all locations to ensure that all the copies are shredded.

Concepts in Practice Lesson

Introduction

This section highlights technologies that are relevant to the topics covered in this
module.

This lesson covers the following topics:
 RSA SecurID
 RSA Security Analytics
 RSA Adaptive Authentication
 RSA Archer Suite
 Dell Change Auditor
 Dell InTrust
 VMware AirWatch
 VMware AppDefense

Concepts in Practice

RSA SecurID

Two-factor authentication provides an added layer of security to ensure that only
valid users have access to systems and data. RSA SecurID is based on something
a user knows (a password or PIN) and something a user has (an authenticator
device). It provides a much more reliable level of user authentication than reusable
passwords. It generates a new, one-time token code at predefined intervals,
making it difficult for anyone other than the genuine user to input the correct token
code at any given time. To access their resources, users combine their secret
Personal Identification Number (PIN) with the token code that is displayed on their
SecurID authenticator device at that given time. The result is a unique, one-time
password used to assure a user’s identity.

RSA Security Analytics

Helps security analysts detect and investigate threats often missed by other
security tools. Security Analytics provides converged network security monitoring
and centralized security information and event management (SIEM). Security
Analytics combines big data security collection, management, and analytics; full
network and log-based visibility; and automated threat intelligence – enabling
security analysts to better detect, investigate, and understand threats they often
could not easily see or understand before. It provides a single platform for
capturing and analyzing large amounts of network, log, and other data. It also
accelerates security investigations by enabling analysts to pivot through terabytes
of metadata, log data, and recreated network sessions. It archives and analyzes
long-term security data through a distributed computing architecture and provides
built-in compliance reports covering a multitude of regulatory regimes.

RSA Adaptive Authentication

A comprehensive authentication and fraud detection platform. Adaptive
Authentication is designed to measure the risk associated with a user’s login and
post-login activities by evaluating a variety of risk indicators. Using a risk- and
rules-based approach, the system then requires additional identity assurance, such as
out-of-band authentication, for scenarios that are high risk and violate a policy.
This methodology provides transparent authentication for organizations that want to
protect users accessing websites and online portals, mobile applications and
browsers, Automated Teller Machines (ATMs), Secure Sockets Layer (SSL) virtual
private network (VPN) applications, web access management (WAM) applications,
and application delivery solutions.

RSA Archer Suite

Allows an organization to build an efficient, collaborative enterprise governance,
risk, and compliance program across IT, finance, operations, and legal domains.
With RSA Archer Suite, an organization can manage risks, demonstrate
compliance, automate business processes, and gain visibility into corporate risk
and security controls. RSA delivers several core enterprise governance, risk, and
compliance solutions, with the integrated risk management feature of the RSA Archer
Platform. Business users can quickly implement risk management processes,
leading to improved risk management maturity, more informed decision-making,
and enhanced business performance. The suite also gives users the freedom to
tailor the solutions and integrate with multiple data sources through code-free
configuration.

The RSA Archer platform is an advanced security management system that provides a
single point of visibility and coordination for physical, virtual, and cloud assets. Its
three layers (controls enforcement, controls management, and security
management) work together to provide a single view of information, infrastructure,
and identities across physical and virtual environments.

Dell Change Auditor

Helps customers audit, alert on, protect, and report user activity and configuration
and application changes against Active Directory and Windows applications. The
software provides role-based access, enabling auditors to access only the
information they need to quickly perform their job. Change Auditor provides visibility
into enterprise-wide activities from one central console, enabling customers to see
how data is being handled.

Dell InTrust

An IT data analytics solution that gives organizations the power to search
and analyze vast amounts of data in one place. It provides real-time insights into
user activity across security, compliance, and operational teams. It helps
administrators troubleshoot issues by conducting security investigations
regardless of how and where the data is stored. It helps compliance officers
produce reports validating compliance across multiple systems. Its web
interface quickly provides information on who accessed the data, how it was
obtained, and how the data was used. This helps administrators and security
teams discover suspicious event trends.

VMware Airwatch

Enables organizations to address the challenges associated with mobility by
providing a simplified, efficient way to view and manage all devices from a central
administration console. This solution enables organizations to enroll devices in an
enterprise environment, configure and update device settings over-the-air, and secure
mobile devices. AirWatch can manage devices including Android™, Apple® iOS,
BlackBerry®, Mac® OS, Symbian®, and Windows® devices from a single
administration console. AirWatch also provides visibility into the devices
connecting to the enterprise network, content, and resources.

Benefits offered by VMware AirWatch are:
 Manage different types of devices from a single console
 Allow employees to easily enroll their devices
 Enable secure access to corporate resources
 Integrate with existing enterprise infrastructure
 Support employee, corporate-owned and shared devices
 Gain visibility across mobile device deployment

VMware AppDefense

AppDefense has an authoritative understanding of how data center endpoints are meant to
behave and provides endpoint security to protect applications running in virtualized
environments. AppDefense understands an application’s intended state and behavior,
and it monitors for changes to that intended state that indicate a probable threat.

AppDefense ensures security in a data center environment by:
 Supporting integration with third parties: Platforms such as RSA NetWitness
Suite leverage it for deeper application context within an enterprise’s virtual
data center, response automation/orchestration, and visibility into application
attacks.
 Securing modern applications: AppDefense secures modern applications by
protecting the network and data center endpoints and by encrypting enterprise
data at rest.
 Providing automatic response: AppDefense uses vSphere and VMware NSX Data
Center to automate the correct response. It can automatically block process
communication, snapshot an endpoint for forensic analysis, and suspend or shut
down the endpoint.

Assessment

1. How can you manage vulnerabilities in a modern data center?

A. Installing security controls

B. Maximizing the attack surface

C. Minimizing work factor

D. Avoiding regular patch updates

2. What is the technique used for data shredding?

A. Degaussing media

B. Masking

C. Backup

D. Hardening

Summary

Storage Infrastructure Management

Introduction

This module focuses on the key functions and processes of storage infrastructure
management.

Upon completing this module, you will be able to:

• Describe storage infrastructure management and its functions
• Describe key storage infrastructure management processes

Introduction to Storage Infrastructure Management Lesson

Introduction

This lesson covers the key characteristics of platform-centric storage infrastructure
management and the key functions of storage infrastructure management.

This lesson covers the following topics:

• List key characteristics of platform-centric storage infrastructure management
• Identify key functions of storage infrastructure management

Introduction to Storage Infrastructure Management

What is Storage Infrastructure Management?

Definition: Storage Infrastructure Management
All the storage infrastructure-related functions that are necessary for
the management of the infrastructure components and services, and
for the maintenance of data throughout its lifecycle.

• Aligns storage operations and services to an organization’s strategic business
goals and service level requirements
• Ensures that the storage infrastructure is operated optimally by using as few
resources as needed
• Ensures better utilization of existing infrastructure components

Notes

The key storage infrastructure components are compute systems, storage systems,
and storage area networks (SANs). These components can be physical or virtual
and are used to provide services to the users. Storage infrastructure
management includes all the storage infrastructure-related functions that are
necessary for the management of the infrastructure components and services, and
for the maintenance of data throughout its lifecycle. These functions help IT
organizations align their storage operations and services with their strategic
business goals and service level requirements. They ensure that the storage
infrastructure is operated optimally by using as few resources as needed. They
also ensure better utilization of existing components, thereby limiting the need for
excessive ongoing investment in infrastructure.

As organizations drive their IT infrastructure to support modern data center
applications, storage infrastructure management is also being transformed to meet
the application requirements. Management functions are optimized to help an
organization become a provider of social networking, mobility, big data, or cloud
services. This module describes storage infrastructure management from a
service provider’s perspective.

Key Characteristics of Platform-centric Management

Modern data center management functions differ in many ways from traditional
management and have the following distinctive characteristics:

• Service-focused approach
• Software-defined infrastructure-aware
• End-to-end visibility
• Orchestrated operations

Notes

Traditionally, storage infrastructure management is component-specific. The
management tools only enable monitoring and management of specific
component(s). This may cause management complexity and system
interoperability issues in a large environment that includes many multi-vendor
components residing in worldwide locations. In addition, traditional management
operations such as provisioning LUNs and zoning are mostly manual. Provisioning
tasks often take days to weeks to complete because of a rigid resource
acquisition process and long approval cycles.

Further, traditional management processes and tools may not support a
service-oriented infrastructure, especially if the requirement is to provide cloud
services. They usually lack the ability to execute management operations in an agile
manner, respond to adverse events quickly, coordinate the functions of distributed
infrastructure components, and consistently meet service levels. This
component-specific, highly manual, time-consuming, and overly complex management
is simply not appropriate for modern data center infrastructure.

Service-focused Approach

Storage infrastructure management is linked to service requirements and the service
level agreement (SLA).

Management functions linked to service requirements and the SLA:

• Determine the optimal amount of storage space needed in a storage pool to meet
the capacity requirements of services
• Create a disaster recovery plan to meet the recovery time objective (RTO) of
services
• Ensure that the management processes, management software, and staffing
are appropriate to provide services
• Return services to the users within the agreed time period in the event of a service
failure
• Validate changes to the storage infrastructure for creating or modifying a
service

Notes

Storage infrastructure management in a modern data center has a service-based
focus. It is linked to the service requirements and the service level agreement
(SLA).

Service requirements cover the services to be created or upgraded, service features,
service levels, and the infrastructure components that constitute a service. An SLA is a
formalized contract document that describes service level targets, service support
guarantees, service location, and the responsibilities of the service provider and the
user. These parameters of a service determine how the storage infrastructure is
managed.

Software-Defined Infrastructure-aware

In a platform-centric environment, management is aware of the software-defined
infrastructure rather than tied to individual physical components:

• Software-defined infrastructure management is valued over hardware-specific
management
• Management functions move to an external software controller
• Many common, repeatable, hardware-specific management tasks are
automated
• Management focuses on strategic, value-driven activities
• Management operations become independent of the underlying hardware

Notes

Management functions are increasingly being decoupled from the physical
infrastructure and moved to an external software controller. As a result of this shift,
the infrastructure components are managed through the software controller. The
controller usually has a native management tool for configuring components and
creating services. Administrators may also use independent management tools for
managing the storage infrastructure. Management tools commonly interact with the
controller through application programming interfaces (APIs).

Management through a software controller has changed the way a traditional
storage infrastructure is operated. The software controller automates and abstracts
many common, repeatable, and physical component-specific tasks, thereby
reducing operational complexity. This allows administrators to focus on
strategic, value-driven activities such as aligning services with business goals,
improving resource utilization, and ensuring SLA compliance.

Further, the software controller helps centralize management operations.
For example, an administrator may set configuration settings related to automated
storage tiering, thin provisioning, backup, or replication from the management
console. These settings are then automatically and uniformly applied across
all the managed components, which may be distributed across many locations. These
components may also be proprietary or commodity hardware manufactured by
different vendors, but the software controller keeps the management operations
independent of the underlying hardware.
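The centralized, hardware-independent management described above can be sketched in a few lines of Python. This is a minimal illustration, not the API of any real SDS controller: `StoragePolicy`, `ComponentAdapter`, `VendorArray`, `CommodityNode`, and `SoftwareController` are hypothetical names, and the translated setting keys are made up. The administrator sets the policy once, and the controller pushes it to every component through an adapter that hides the vendor-specific details.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StoragePolicy:
    """Hypothetical settings an administrator sets once in the management console."""
    thin_provisioning: bool
    auto_tiering: bool
    replication_target: Optional[str] = None


class ComponentAdapter(ABC):
    """Hypothetical adapter that hides vendor-specific configuration calls."""

    def __init__(self, name: str, site: str) -> None:
        self.name = name
        self.site = site
        self.applied: Optional[StoragePolicy] = None

    @abstractmethod
    def translate(self, policy: StoragePolicy) -> dict:
        """Convert the generic policy into this vendor's own settings."""

    def apply(self, policy: StoragePolicy) -> dict:
        settings = self.translate(policy)
        # A real adapter would send `settings` to the component here.
        self.applied = policy
        return settings


class VendorArray(ComponentAdapter):
    def translate(self, policy: StoragePolicy) -> dict:
        return {"thin": policy.thin_provisioning, "fast_tiering": policy.auto_tiering}


class CommodityNode(ComponentAdapter):
    def translate(self, policy: StoragePolicy) -> dict:
        return {"provisioning": "thin" if policy.thin_provisioning else "thick",
                "tiering": "auto" if policy.auto_tiering else "off"}


class SoftwareController:
    def __init__(self, components: List[ComponentAdapter]) -> None:
        self.components = components

    def set_policy(self, policy: StoragePolicy) -> Dict[str, dict]:
        """Apply one policy uniformly to every managed component, wherever it is."""
        return {f"{c.name}@{c.site}": c.apply(policy) for c in self.components}


controller = SoftwareController([VendorArray("array-1", "Boston"),
                                 CommodityNode("node-7", "Singapore")])
policy = StoragePolicy(thin_provisioning=True, auto_tiering=True)
pushed = controller.set_policy(policy)
```

Because each vendor difference lives in an adapter's `translate` method, adding another vendor's hardware would not change how the administrator sets the policy.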

End-to-end Visibility

• Management in modern data center environments provides end-to-end visibility
into the storage infrastructure components and deployed services
– Provides information on the configuration, connectivity, capacity,
performance, and interrelationships of all components centrally
– Helps in consolidating reports, correlating issues, and tracking movement of
data and services across the infrastructure
• End-to-end visibility of a storage infrastructure is provided by specialized
monitoring tools

Notes

End-to-end visibility of the storage infrastructure enables comprehensive and
centralized management. Administrators can view the configuration,
connectivity, capacity, performance, and interrelationships of all infrastructure
components centrally. Further, it helps in consolidating reports of capacity
utilization, correlating issues in multiple components, and tracking the movement of
data and services across the infrastructure.

Depending on the size of the storage infrastructure and the number of services
involved, administrators may have to monitor information about hundreds or
thousands of components located in multiple data centers. In addition, the
configuration, connectivity, and interrelationships of components change as the
storage infrastructure grows, applications scale, and services are updated.
Organizations typically deploy specialized monitoring tools that provide end-to-end
visibility of a storage infrastructure on a digital dashboard. These tools can also
report relevant information in a rapidly changing and varying workload environment.

Orchestrated Operations

Definition: Orchestration
Automated arrangement, coordination, and management of various
system or component functions in a storage infrastructure.

• Management operations are orchestrated as much as possible to provide
business agility
– Reduces the time to provide and manage a service
– Reduces the risk of manual errors and administration cost
• An orchestrator programmatically integrates and sequences inter-related
component functions into workflows
– Triggers an appropriate workflow upon receiving a request

Notes

Orchestration refers to the automated arrangement, coordination, and management
of various system or component functions in a storage infrastructure. Orchestration,
unlike an automated activity, is not associated with a specific infrastructure
component. Instead, it may span multiple components in different locations,
depending on the size of the storage infrastructure. To remain sustainable in a modern
data center environment, storage infrastructure management must rely on
orchestration.

Management operations should be orchestrated as much as possible to provide
business agility. Orchestration reduces the time to configure, update, and integrate
a group of infrastructure components that are required to provide and manage a
service. By automating the coordination of component functions, it also reduces the
risk of manual errors and the administration cost.

Purpose-built software, called an orchestrator, is commonly used for orchestrating
component functions in a storage infrastructure. The orchestrator provides a library
of predefined workflows for executing various management operations. A workflow
is a series of inter-related component functions that are programmatically
integrated and sequenced to accomplish a desired outcome. The orchestrator also
provides an interface for administrators or architects to define and customize
workflows. It triggers an appropriate workflow upon receiving a service provisioning
or management request. Thereafter, it interacts with the components as per the
workflow to coordinate and sequence the execution of functions by these
components.
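A minimal sketch of an orchestrator's workflow library, assuming hypothetical names (`Orchestrator`, `define`, `handle`, `component_function`). Each step here only records its name; in practice a step would call a component's API or a software controller. It shows the two roles described above: administrators define or customize workflows, and an incoming request triggers the matching workflow, whose steps run in sequence.

```python
from typing import Callable, Dict, List

# A step is one component function; it reads and updates a shared context dict.
Step = Callable[[dict], None]


class Orchestrator:
    """Minimal sketch of an orchestrator holding a library of named workflows."""

    def __init__(self) -> None:
        self.library: Dict[str, List[Step]] = {}

    def define(self, request_type: str, steps: List[Step]) -> None:
        """Define or customize the workflow for a type of request."""
        self.library[request_type] = list(steps)

    def handle(self, request_type: str, context: dict) -> dict:
        """Trigger the matching workflow and run its steps in sequence."""
        if request_type not in self.library:
            raise KeyError(f"no workflow defined for {request_type!r}")
        for step in self.library[request_type]:
            step(context)
        return context


def component_function(name: str) -> Step:
    """Build a placeholder step that only records that it ran (hypothetical)."""
    def step(context: dict) -> None:
        context.setdefault("executed", []).append(name)
    return step


orchestrator = Orchestrator()
orchestrator.define("grant_host_access", [
    component_function("create zone on FC switch"),
    component_function("mask LUN to host on storage system"),
    component_function("rescan bus on compute system"),
])
result = orchestrator.handle("grant_host_access", {"host": "H1"})
```

Keeping workflows as data (lists of steps) rather than hard-coded logic is what lets administrators customize them without changing the orchestrator itself.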

Orchestration Example

The example illustrates an orchestrated operation that creates a block volume for a
compute system.

[Figure: Orchestrated workflow for creating a block volume. The administrator starts a
Create Volume operation in the management portal, and the orchestrator runs the workflow
through the SDS controller against the FC switch, the storage system, and the compute
system (hypervisor). The workflow updates the portal (operation in progress), gets
capacity and configuration details, and checks whether storage is available in the pool.
If it is not, the portal shows "wait for approval" and more storage is provisioned to the
storage pool; if storage is still unavailable, the portal reports that the operation
failed. Otherwise, the workflow's component functions include zoning on the SAN switch,
LUN creation and masking in the storage system, and a bus rescan and logical volume
creation on the compute system, after which the portal reports that the operation
completed.]

Notes

In this example, an administrator logs on to the management portal and initiates the
volume creation operation from the portal. The operation request is routed to the
orchestrator, which triggers a workflow, as shown in the figure, to fulfill this request.
The workflow programmatically integrates and sequences the required compute,
storage, and network component functions to create the block volume.

The orchestrator interacts with the software-defined storage (SDS) controller so that
the controller carries out the operation according to the workflow. The SDS
controller interacts with the infrastructure components to execute component
functions such as zoning, LUN creation, and bus rescan. Through the
workflow, the management portal receives the response on the outcome of the
operation.
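The workflow in this example can be sketched as follows. The `Portal` and `SDSController` classes are hypothetical in-memory stand-ins, the step names paraphrase the figure, and the order of the component functions in `VOLUME_STEPS` is an assumption because the extracted figure does not fix it. The sketch keeps the figure's decision point: if the pool lacks capacity, the workflow waits for approval, provisions more storage, and reports failure if the pool is still too small.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Portal:
    """Stand-in for the management portal; records status updates."""
    messages: List[str] = field(default_factory=list)

    def update(self, status: str) -> None:
        self.messages.append(status)


@dataclass
class SDSController:
    """Stand-in for the SDS controller; records the component functions it runs."""
    pool_free_gb: int
    calls: List[str] = field(default_factory=list)

    def run(self, function: str) -> None:
        # A real controller would call the switch, array, or host here.
        self.calls.append(function)


# Assumed order of component functions; the figure does not fix it exactly.
VOLUME_STEPS = (
    "create LUN in storage system",
    "mask LUN in storage system",
    "perform zoning on SAN switch",
    "perform bus rescan on compute system",
    "create logical volume on compute system",
)


def create_block_volume(size_gb: int, sds: SDSController, portal: Portal,
                        approve: Callable[[], bool], expansion_gb: int = 0) -> bool:
    portal.update("Operation in progress")
    if sds.pool_free_gb < size_gb:               # storage available in pool?
        portal.update("Wait for approval")
        if approve():
            sds.run("provision more storage to storage pool")
            sds.pool_free_gb += expansion_gb
        if sds.pool_free_gb < size_gb:           # still not enough: give up
            portal.update("Operation failed")
            return False
    for function in VOLUME_STEPS:
        sds.run(function)
    sds.pool_free_gb -= size_gb
    portal.update("Operation completed")
    return True


portal = Portal()
sds = SDSController(pool_free_gb=50)
ok = create_block_volume(100, sds, portal, approve=lambda: True, expansion_gb=200)
```

Here the 50 GB pool is too small for a 100 GB volume, so the workflow waits for approval, adds 200 GB, runs the five component functions, and leaves 150 GB free.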

Storage Infrastructure Management Functions

Storage infrastructure management performs two key functions: infrastructure
discovery and operations management.

Definition: Discovery
A management function that creates an inventory of infrastructure
components and provides information about the components
including their configuration, connectivity, functions, performance,
capacity, availability, utilization, and physical-to-virtual dependencies.

Infrastructure Discovery

• Discovery provides visibility into each infrastructure component
– Discovered information helps in monitoring and management
• A discovery tool interacts with components and collects information from them
• Discovery is typically scheduled to occur periodically
– May also be initiated by an administrator or triggered by an orchestrator

Operations Management

• Involves ongoing management activities to maintain the storage infrastructure and
deployed services
• Key processes that support operations management activities are:
– Monitoring
– Configuration management
– Change management
– Capacity management
– Performance management
– Availability management
– Incident management
– Problem management
– Security management

Notes

Infrastructure discovery provides the visibility needed to monitor and manage the
infrastructure components. Discovery is performed using a specialized tool that
commonly interacts with infrastructure components through their native
APIs. Through this interaction, it collects information from the
infrastructure components.

A discovery tool may be integrated with the software-defined infrastructure
controller, bundled with management software, or deployed as independent software
that passes discovered information to management software. Discovery is typically
scheduled by setting an interval for its periodic occurrence. Discovery may also be
initiated by an administrator or be triggered by an orchestrator when a change
occurs in the storage infrastructure.
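A minimal sketch of a discovery tool, assuming a hypothetical `describe()` call that stands in for each component's native API. It builds an inventory keyed by component ID, runs on a fixed interval when called through `run_if_due`, and can also be invoked directly through `discover`, as an administrator or orchestrator would do after a change.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Component:
    """Stand-in for an infrastructure component reachable through its native API."""
    component_id: str
    kind: str
    attributes: Dict[str, object] = field(default_factory=dict)

    def describe(self) -> dict:
        # A real discovery tool would query the component's API here.
        return {"kind": self.kind, **self.attributes}


class DiscoveryTool:
    def __init__(self, components: List[Component], interval_s: float) -> None:
        self.components = components
        self.interval_s = interval_s
        self.last_run: Optional[float] = None
        self.inventory: Dict[str, dict] = {}

    def discover(self, now: float) -> Dict[str, dict]:
        """Rebuild the inventory; call directly for on-demand or triggered runs."""
        self.inventory = {c.component_id: c.describe() for c in self.components}
        self.last_run = now
        return self.inventory

    def run_if_due(self, now: float) -> bool:
        """Scheduled discovery: run only when the configured interval has elapsed."""
        if self.last_run is None or now - self.last_run >= self.interval_s:
            self.discover(now)
            return True
        return False


tool = DiscoveryTool(
    [Component("sw1", "fc_switch", {"ports_used": 40, "ports_total": 48}),
     Component("array-1", "storage_system", {"pool_free_gb": 2048, "connected_to": ["sw1"]}),
     Component("h1", "compute_system", {"hypervisor": True, "connected_to": ["sw1"]})],
    interval_s=3600,
)
```

The `connected_to` attributes are one simple way to capture the connectivity and dependency information that the discovery definition mentions.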

Operations management involves several management processes. The slide lists
the key processes that support operations management activities, and the subsequent
lessons describe these processes. Ideally, operations management should be
automated to ensure operational agility. Management tools are usually capable
of automating many management operations. These automated operations are
described along with the management processes. Further, the automated
operations of management tools can also be logically integrated and sequenced
through orchestration.

Operations Management

Introduction

This lesson covers monitoring, alerting, and reporting in a storage environment.
It also covers configuration management, change management, capacity
management, performance management, availability management, incident
management, problem management, and security management.

This lesson covers the following topics:

• Explain monitoring, alerting, and reporting
• Describe configuration management and change management
• Explain capacity management and performance management
• Discuss availability management
• Explore incident management and problem management
• Emphasize the importance of security management

Operations Management

Introduction to Monitoring

• Monitoring provides visibility into the storage infrastructure and forms the basis
for performing management operations
• It helps to:
– Track the performance and availability status of components and services
– Measure the utilization and consumption of resources by services
– Track events impacting the availability and performance of components and
services
– Generate reports and trigger alerts
– Track environment parameters (HVAC)

Notes

Monitoring forms the basis for performing management operations. Monitoring
provides the performance and availability status of various infrastructure
components and services. It also helps to measure the utilization and consumption
of various storage infrastructure resources by the services. This measurement
facilitates the metering of services, capacity planning, forecasting, and optimal use
of these resources. Events detected by monitoring, such as a change in the
performance or availability state of a component or a service, may be
used to trigger automated routines or recovery procedures.

Such procedures can reduce downtime due to known infrastructure errors and the
level of manual intervention needed to recover from them. Further, monitoring
helps in generating reports on service usage and trends. It also helps to trigger
alerts when thresholds are reached, security policies are violated, or service
performance deviates from the SLA. Alerting and reporting are detailed later in this
module. Additionally, monitoring data center environment parameters such
as heating, ventilating, and air-conditioning (HVAC) helps in detecting any deviation
from normal operating conditions.

Monitoring Parameters

Storage infrastructure is primarily monitored for:

• Configuration
• Availability
• Capacity
• Performance
• Security

Monitoring Configuration

Tracks the configuration changes in a storage infrastructure and their compliance
with the configuration policies

[Figure: Compute systems running hypervisors connect through an FC switch to storage
systems. Zone esx161_vnx_152_1 contains WWN 10:00:00:90:FA:18:0D:CF and
WWN 50:06:01:6F:08:60:1E:BD.]

The table lists configuration changes in the storage infrastructure shown in the
image.

| Changed At | Description | Device | Compliance Breach |
|---|---|---|---|
| 2019/01/07 @ 13:34:23 | The member 10000090FA180DCF has been added to the zone esx161_vnx_152_1 | 100000051E023364 | No |
| 2019/01/07 @ 13:34:23 | The member 5006016F08601EBD has been added to the zone esx161_vnx_152_1 | 100000051E023364 | No |
| 2019/01/07 @ 13:34:23 | A new zone esx161_vnx_152_1 has been added to the fabric 100000051E023364 | 100000051E023364 | No |

Notes

Monitoring configuration involves tracking configuration changes and the deployment of
storage infrastructure components and services. It also detects configuration
errors, non-compliance with configuration policies, and unauthorized configuration
changes.

Configuration changes are captured and reported by a monitoring tool in real time.
In the environment shown in the illustration, a new zone was created to enable a
compute system to access LUNs from one of the storage systems. The changes
were made on the FC switch (the device listed in the table).
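The compliance column in the table can be computed by checking each change against configuration policies. The sketch below assumes two hypothetical policies, a zone-naming convention and a WWN format rule; an organization's actual policies would differ. It reproduces the three changes from the table, none of which breaches these assumed policies.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical site policies; real policies come from the organization.
ZONE_NAME = re.compile(r"^[a-z0-9]+_[a-z0-9]+_\d+_\d+$")   # e.g. host_array_id_n
WWN = re.compile(r"^[0-9A-F]{16}$")                          # 16 hex digits, no colons


@dataclass
class ConfigChange:
    changed_at: str
    device: str
    zone: str
    member: Optional[str] = None   # None when the change creates the zone itself


def bad_zone_name(change: ConfigChange) -> bool:
    return ZONE_NAME.match(change.zone) is None


def bad_member_wwn(change: ConfigChange) -> bool:
    return change.member is not None and WWN.match(change.member) is None


POLICIES: List[Callable[[ConfigChange], bool]] = [bad_zone_name, bad_member_wwn]


def compliance_breach(change: ConfigChange) -> bool:
    """A change breaches compliance if any policy check flags it."""
    return any(check(change) for check in POLICIES)


changes = [
    ConfigChange("2019/01/07 @ 13:34:23", "100000051E023364", "esx161_vnx_152_1", "10000090FA180DCF"),
    ConfigChange("2019/01/07 @ 13:34:23", "100000051E023364", "esx161_vnx_152_1", "5006016F08601EBD"),
    ConfigChange("2019/01/07 @ 13:34:23", "100000051E023364", "esx161_vnx_152_1"),
]
report = [(c.changed_at, c.zone, c.member, "Yes" if compliance_breach(c) else "No")
          for c in changes]
```

A zone named `TempZone`, for example, would fail the assumed naming policy and be reported as a breach.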

Monitoring Availability

Identifies the failure of any component or process that may lead to service
unavailability or degraded performance.

The figure illustrates an example of monitoring the availability of storage
infrastructure components.

[Figure: Three compute systems running hypervisors connect to storage systems through
FC switches SW1 and SW2. SW1 has failed, leaving no redundancy.]

Notes

Availability refers to the ability of a component or a service to perform its desired
function during its specified time of operation. Monitoring the availability of hardware
components (for example, a port, an HBA, or a storage controller) or software
components (for example, a database instance or orchestration software)
involves checking their availability status by reviewing the alerts generated by the
system. For example, a port failure might result in a chain of availability alerts.

A storage infrastructure commonly uses redundant components to avoid a single
point of failure. Failure of a component might cause an outage that affects service
availability, or it might cause performance degradation even though availability is
not compromised. Continuously monitoring the expected availability of each
component, and reporting any deviation, helps the administrator identify failing
services and plan corrective action to maintain SLA requirements.

The figure illustrates an example of monitoring the availability of storage
infrastructure components:

• A storage infrastructure includes three compute systems (H1, H2, and H3) that
are running hypervisors.
• All the compute systems are configured with two FC HBAs, each connected to
the production storage system through two FC switches, SW1 and SW2. All the
compute systems share two storage ports on the storage system.
• Multipathing software is installed on the hypervisors running on all three
compute systems. If one of the switches, SW1, fails, the multipathing
software initiates a path failover, and all the compute systems continue to
access data through the other switch, SW2.
• With no redundant switch remaining, a second switch failure could make the
storage system unavailable. Monitoring for availability detects the switch
failure and helps the administrator take corrective action before
another failure occurs. In most cases, the administrator receives symptom alerts
for a failing component and can initiate actions before the component fails.
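The redundancy reasoning in this example can be expressed as a small check. This sketch assumes the monitoring tool already knows which switches each compute system's paths traverse (here, hypothetically, every host uses both SW1 and SW2) and which switches have failed; it flags hosts that have lost redundancy before a second failure makes storage unavailable.

```python
from typing import Dict, List, Set


def path_status(paths: Dict[str, List[str]], failed: Set[str]) -> Dict[str, str]:
    """Classify each host by how many of its switch paths are still alive."""
    status = {}
    for host, switches in paths.items():
        alive = [sw for sw in switches if sw not in failed]
        if not alive:
            status[host] = "unavailable"
        elif len(alive) == 1:
            status[host] = "no redundancy"   # one more failure causes an outage
        else:
            status[host] = "redundant"
    return status


def availability_alerts(status: Dict[str, str]) -> List[str]:
    """Produce one alert per host that is not fully redundant."""
    return [f"{host}: {state}" for host, state in sorted(status.items())
            if state != "redundant"]


paths = {"H1": ["SW1", "SW2"], "H2": ["SW1", "SW2"], "H3": ["SW1", "SW2"]}
after_sw1_failure = path_status(paths, failed={"SW1"})
```

After SW1 fails, every host still reaches storage through SW2, but all three are reported as having no redundancy, which is the early warning the text describes.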

Monitoring Capacity

Tracks the amount of storage infrastructure resources used and free.

The figure provides an example that illustrates the importance of monitoring NAS
file system capacity.

[Figure: Over time, the used capacity of a NAS file system grows. A notification is
issued when the file system is 66% full and again when it is 80% full; the file system is
then expanded with additional LUNs, increasing its free capacity.]

Notes

Capacity refers to the total amount of storage infrastructure resources available.
Inadequate capacity leads to degraded performance or even service unavailability.
Monitoring capacity involves examining the amount of storage infrastructure
resources used and still usable, such as the free space available on a file system or a
storage pool, the number of ports available on a switch, or the utilization of the
storage space allocated to a service.

Monitoring capacity helps an administrator ensure uninterrupted data availability
and scalability by averting outages before they occur. For example, if 90 percent of
the ports are utilized in a particular SAN fabric, this could indicate that a new switch
might be required if more compute and storage systems need to be attached to the
same fabric. Monitoring usually leverages analytical tools to perform capacity trend
analysis. These trends help to understand future resource requirements and
provide an estimate of when additional resources must be deployed.
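Capacity trend analysis can be as simple as fitting a straight line to historical usage. The sketch below uses ordinary least squares, which is one possible technique rather than what any particular analytical tool does, and the usage history is illustrative. It estimates how many days remain before the fitted trend reaches the capacity.

```python
from typing import List, Optional, Tuple


def days_until_full(samples: List[Tuple[float, float]], capacity_gb: float) -> Optional[float]:
    """Fit used_gb = intercept + slope * day by ordinary least squares and return
    the days remaining after the latest sample until the fit reaches capacity."""
    n = len(samples)
    if n < 2:
        return None
    mean_x = sum(day for day, _ in samples) / n
    mean_y = sum(used for _, used in samples) / n
    sxx = sum((day - mean_x) ** 2 for day, _ in samples)
    if sxx == 0:
        return None                      # all samples taken on the same day
    sxy = sum((day - mean_x) * (used - mean_y) for day, used in samples)
    slope = sxy / sxx                    # growth in GB per day
    if slope <= 0:
        return None                      # usage flat or shrinking: no exhaustion date
    intercept = mean_y - slope * mean_x
    day_full = (capacity_gb - intercept) / slope
    last_day = max(day for day, _ in samples)
    return max(0.0, day_full - last_day)


history = [(0, 400), (10, 500), (20, 600), (30, 700)]   # (day, used GB), illustrative
remaining = days_until_full(history, capacity_gb=1000)
```

With usage growing by 10 GB per day from 700 GB on day 30, the fitted line reaches the 1,000 GB capacity on day 60, so `remaining` is 30 days; comparing that figure with the lead time for procuring storage tells the administrator when to act.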

The figure provides an example that illustrates the importance of monitoring NAS
file system capacity:
• If the file system is full and no space is available for applications to perform
write I/O, it may result in an application or service outage
• Monitoring tools can be configured to issue a notification when thresholds are
reached on the file system capacity; for example:
– When the file system reaches 66 percent of its capacity, a warning message
is issued, and when it reaches 80 percent, a critical message is issued
– This enables the administrator to take action to provision additional LUNs
to the NAS and extend the NAS file system before it runs out of capacity
• Proactively monitoring the file system can prevent service outages caused by
a lack of file system space
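The threshold notifications in this example can be sketched as follows, using the 66 percent warning and 80 percent critical thresholds from the text. The function name and message wording are illustrative assumptions, not the output of a specific monitoring tool.

```python
from typing import Optional

WARNING_PCT = 66.0    # warning threshold from the example above
CRITICAL_PCT = 80.0   # critical threshold from the example above


def capacity_notification(used_gb: float, total_gb: float) -> Optional[str]:
    """Return a notification when a threshold is crossed, else None."""
    pct = 100.0 * used_gb / total_gb
    if pct >= CRITICAL_PCT:
        return f"CRITICAL: file system is {pct:.0f}% full; add LUNs and extend it"
    if pct >= WARNING_PCT:
        return f"WARNING: file system is {pct:.0f}% full"
    return None
```

Checking the critical threshold first ensures that a file system above 80 percent raises only the more severe message.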

Monitoring Performance

Evaluates how efficiently the infrastructure components and services are
performing.

The figure provides an example that illustrates the importance of monitoring
performance on iSCSI storage systems.

[Figure: Compute systems H1, H2, and H3, each running a hypervisor, connect to the
storage systems through switches SW1 and SW2, and a new compute system is added. A
graph plots the storage port utilization (%), up to 100%, for H1 + H2 + H3.]

Notes

Performance monitoring evaluates how efficiently different storage infrastructure
components and services are performing and helps to identify bottlenecks.
Performance monitoring measures and analyzes behavior in terms of response
time, throughput, and I/O wait time. It identifies whether the behavior of
infrastructure components and services meets the acceptable and agreed
performance level, which exposes performance bottlenecks. It also deals with
the utilization of resources, which affects the way resources behave and respond.

For example, if a VM continuously runs at 80 percent processor utilization, the VM
may be running out of processing power, which can lead to degraded performance
and slower response time. Similarly, if the cache and controllers of a storage system
are consistently overutilized, performance may degrade.

The figure provides an example that illustrates the importance of monitoring
performance on iSCSI storage systems. In this example:
• Compute systems H1, H2, and H3 (with two iSCSI HBAs each) are connected
to the storage system through Ethernet switches SW1 and SW2
• The three compute systems share the same storage ports on the storage
system to access LUNs
• A new compute system running an application with a high workload must be
deployed to share the same storage port as H1, H2, and H3
• Monitoring storage port utilization ensures that the new compute system does
not adversely affect the performance of the other compute systems

Utilization of the shared storage port is shown by the solid and dotted lines in the
graph. If the port utilization before deploying the new compute system is close to
100 percent, deploying the new compute system is not recommended
because it might impact the performance of the other compute systems. However,
if the port utilization before deploying the new compute system is closer to
the dotted line, there is room to add a new compute system.
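The port-sharing decision above can be sketched as a headroom check. The 80 percent limit standing in for the dotted line, and the estimated load of the new compute system, are hypothetical values. A second helper flags sustained high utilization, as in the VM processor example, rather than reacting to a single spike.

```python
from typing import List

PORT_LIMIT_PCT = 80.0   # hypothetical stand-in for the dotted line in the graph


def can_share_port(existing_util_pct: List[float], new_host_pct: float,
                   limit_pct: float = PORT_LIMIT_PCT) -> bool:
    """Add the new system's estimated load to the peak utilization of the
    shared port (H1 + H2 + H3) and compare the total with the limit."""
    peak = max(existing_util_pct)
    return peak + new_host_pct <= limit_pct


def sustained_above(samples: List[float], threshold_pct: float, min_samples: int) -> bool:
    """True if utilization stays at or above the threshold for min_samples in a row."""
    run = 0
    for value in samples:
        run = run + 1 if value >= threshold_pct else 0
        if run >= min_samples:
            return True
    return False
```

Using the peak rather than the average is a deliberately conservative choice: the new workload must fit even during the busiest observed interval.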

Monitoring Security

Tracks unauthorized access and configuration changes to the storage
infrastructure and services.

This figure illustrates the importance of monitoring security in a storage system.

[Figure: Compute systems of Workgroup 1 (WG1) and Workgroup 2 (WG2), each running a
hypervisor, connect through switches SW1 and SW2 to a shared storage system. A
replication command from WG1 targets WG2 devices, which are inaccessible to WG1, and
the system reports: "Warning: Attempted replication of WG2 devices by WG1 user - Access
denied".]

Notes

Monitoring a storage infrastructure for security includes tracking unauthorized
access, whether accidental or malicious, and unauthorized configuration changes.
For example, monitoring tracks and reports the initial zoning configuration
performed and all subsequent changes. Another example of monitoring security
is tracking login failures and unauthorized access to switches for performing
administrative changes.

IT organizations typically comply with various information security policies that may
be specific to government regulations, organizational rules, or deployed services.
Monitoring detects all operations and data movement that deviate from predefined
security policies. Monitoring also detects the unavailability of information and services
to authorized users due to a security breach. Further, the physical security of a storage
infrastructure can also be continuously monitored using badge readers, biometric
scans, or video cameras.

The figure illustrates the importance of monitoring security in a storage system. In this example:
• The storage system is shared between two workgroups, WG1 and WG2
• The data of WG1 should not be accessible by WG2, and vice versa
• A user from WG1 might try to make a local replica of the data that belongs to WG2
• If this action is not monitored or recorded, it is difficult to track such a violation of security policies
• Conversely, if this action is monitored, a warning message can be sent to prompt corrective action, or at least to enable discovery as part of regular auditing operations
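The workgroup scenario can be sketched as an access check that records every attempt in an audit log and raises a warning when an attempt is denied. This is a minimal illustration only: the device-to-workgroup map, the `AuditLog` class, `request_operation`, and the warning format are hypothetical and do not represent any monitoring product's API.

```python
from dataclasses import dataclass, field

# Hypothetical ownership map: storage device -> workgroup that owns it.
DEVICE_OWNER = {"dev-01": "WG1", "dev-02": "WG1", "dev-11": "WG2", "dev-12": "WG2"}


@dataclass
class AuditLog:
    """Records every operation attempt so that violations can be found during audits."""
    entries: list = field(default_factory=list)
    warnings: list = field(default_factory=list)

    def record(self, user, group, operation, device, allowed):
        self.entries.append((user, group, operation, device, allowed))
        if not allowed:
            # A denied attempt also raises a warning that can prompt corrective action.
            owner = DEVICE_OWNER.get(device, "unknown")
            self.warnings.append(
                f"Warning: Attempted {operation} of {owner} device {device} "
                f"by {group} user {user} - Access denied"
            )


def request_operation(log, user, group, operation, device):
    """Permit the operation only on devices owned by the user's workgroup; log it either way."""
    allowed = DEVICE_OWNER.get(device) == group
    log.record(user, group, operation, device, allowed)
    return allowed


log = AuditLog()
request_operation(log, "alice", "WG1", "replication", "dev-01")  # own device: permitted
request_operation(log, "alice", "WG1", "replication", "dev-11")  # WG2 device: denied
print(log.warnings)
```

Logging permitted attempts as well as denied ones is what makes later audit discovery possible even if nobody reacts to the warning in time.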

Alerting

Alerts are system-to-user notifications
• Provide information about events or impending threats or issues
• Keep administrators informed on the status of components, processes, and services
• Trigger when specific situations or conditions are reached
  – Conditions may be defined through a monitoring tool

Type of Alert | Description | Examples
Information | Provides useful information; does not require administrator intervention | Creation of a zone or LUN; creation of a new storage pool
Warning | Requires administrative attention | Storage pool is becoming full; soft media errors
Fatal | Requires immediate attention | Storage pool is full; multiple disk failures in a RAID set

Notes

An alert is a system-to-user notification that provides information about events or impending threats or issues. Alerting of events is an integral part of monitoring. Alerting keeps administrators informed about the status of various components and processes, for example, conditions such as the failure of power, storage drives, memory, switches, or an availability zone, which can impact the availability of services and require immediate administrative attention. Other conditions, such as a file system reaching a capacity threshold, an operation breaching a configuration policy, or a soft media error on storage drives, are considered warning signs and may also require administrative attention.

Monitoring tools enable administrators to define various alert conditions and assign different severity levels to these conditions based on their impact. Whenever a condition with a particular severity level occurs, an alert is sent to the administrator, an orchestrated operation is triggered, or an incident ticket is opened to initiate corrective action. Alert classifications can range from information alerts to fatal alerts. Information alerts provide useful information but do not require any intervention by the administrator.

The creation of a zone or LUN is an example of an information alert. Warning alerts require administrative attention so that the alerted condition is contained and does not affect service availability. For example, if an alert indicates that a storage pool is approaching a predefined threshold value, the administrator can decide whether additional storage drives need to be added to the pool. Fatal alerts require immediate attention because the condition might affect overall performance or availability. For example, if multiple disks fail in a RAID set, the administrator must ensure that the RAID set is restored quickly.

Because every IT environment is unique, most monitoring systems require initial setup and configuration, including defining which types of alerts should be classified as informational, warning, and fatal. Whenever possible, an organization should limit the number of truly critical alerts so that important events are not lost amid informational messages. Continuous monitoring, with automated alerting, enables administrators to respond to failures quickly and proactively. Alerting provides information that helps administrators prioritize their response to events.
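Defining alert conditions and severity levels can be sketched as a small rule table that maps monitored metrics to information, warning, or fatal alerts and picks a response for each level. The metric names, thresholds, and actions below are hypothetical examples modeled on the alert types in this section, not settings from any specific monitoring tool.

```python
from enum import IntEnum


class Severity(IntEnum):
    INFORMATION = 1
    WARNING = 2
    FATAL = 3


# Hypothetical alert rules: (metric, condition on the metric value, severity).
RULES = [
    ("pool_used_pct", lambda v: v >= 100, Severity.FATAL),
    ("pool_used_pct", lambda v: v >= 80, Severity.WARNING),
    ("failed_disks_in_raid_set", lambda v: v >= 2, Severity.FATAL),
    ("soft_media_errors", lambda v: v > 0, Severity.WARNING),
    ("new_lun_created", lambda v: v > 0, Severity.INFORMATION),
]

# Hypothetical response chosen for each severity level.
ACTIONS = {
    Severity.INFORMATION: "log only",
    Severity.WARNING: "notify administrator",
    Severity.FATAL: "open incident ticket and page on-call administrator",
}


def evaluate(metrics):
    """Return (metric, severity, action) for the most severe rule each metric triggers."""
    alerts = {}
    for metric, condition, severity in RULES:
        value = metrics.get(metric)
        if value is None or not condition(value):
            continue
        # Keep only the highest severity per metric so one condition yields one alert.
        if metric not in alerts or severity > alerts[metric]:
            alerts[metric] = severity
    return [(metric, sev, ACTIONS[sev]) for metric, sev in alerts.items()]


print(evaluate({"pool_used_pct": 85, "soft_media_errors": 3, "new_lun_created": 1}))
```

Collapsing overlapping rules to a single alert per metric is one simple way to keep informational noise from burying the truly critical alerts.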

Reporting

• Involves gathering information from various components or processes and generating reports
• Reports are displayed like a digital dashboard
• Provides real-time tabular or graphical views of monitored information
• Commonly used reports are:
  – Capacity planning report
  – Configuration and asset management reports
  – Chargeback report
  – Performance report
  – Security breach report

Notes

Like alerting, reporting is also associated with monitoring. Reporting on a storage infrastructure involves tracking and gathering information from the various components and processes that are monitored. The gathered information is compiled to generate reports for trend analysis, capacity planning, chargeback, performance, and security breaches. Capacity planning reports contain current and historical information about the utilization of storage, file systems, database tablespaces, ports, and so on.

Configuration and asset management reports include details about device allocation, local or remote replicas, and fabric configuration. These reports also list all the equipment, with details such as purchase date, lease status, and maintenance records. Chargeback reports contain information about the allocation or utilization of storage infrastructure resources by various users or user groups. Performance reports provide current and historical information about the performance of various storage infrastructure components and services, as well as their compliance with agreed service levels. Security breach reports provide details on security violations, the duration of each breach, and its impact.

Reports are commonly displayed as a digital dashboard, which provides real-time tabular or graphical views of the gathered information. Dashboard reporting helps administrators make instantaneous and informed decisions on resource procurement, plans for modifications in the existing infrastructure, policy enforcement, and improvements in management processes.

Example – Chargeback Report

Chargeback is the ability to measure storage resource consumption per business unit or user group and charge each of them accordingly.
• To perform chargeback, storage usage data is collected by a billing system that generates a chargeback report for each business unit or user group
• The billing system is responsible for accurately measuring the number of units of storage used and reporting the cost or charge for the consumed units

The figure shows the assignment of storage resource as services to two business
units, Payroll_1 and Engineering_1, and presents a sample chargeback report.

[Figure: Payroll_1 and Engineering_1 compute systems use LUNs on the production storage system. Payroll_1 has two 50 GB production LUNs and Engineering_1 has two 100 GB production LUNs (RAID 1). Each production LUN has a local replica of the same size on the production storage system (RAID 0) and a remote replica of the same size on the remote storage system (RAID 5).]

Notes

In this example, each business unit uses a set of compute systems running a hypervisor. The business units use the VMs hosted on these compute systems. LUNs are assigned to the hypervisor from the production storage system. Storage system-based replication technology is used to create both local and remote replicas. A billing system creates a chargeback report documenting the exact amount of storage resources used by each business unit. If the unit for billing is GB of raw storage, the exact amount of raw space (usable capacity plus the capacity consumed by protection) configured for each business unit must be reported.

Consider that the Payroll_1 unit has consumed two production LUNs, each 50 GB in size. Therefore, the storage allocated to the hypervisor is 100 GB (50 + 50). The allocated storage for local replication is 100 GB, and for remote replication it is also 100 GB. From the allocated storage, the raw storage configured for the hypervisor is determined based on the RAID protection used for the various storage pools. If the Payroll_1 production LUNs are RAID 1-protected, the raw space used by the production volumes is 200 GB (100 GB mirrored).

Assume that the local replicas are on unprotected (RAID 0) LUNs and the remote replicas are protected with a RAID 5 configuration. Then 100 GB of raw space is used by the local replica and 125 GB by the remote replica (assuming a 4+1 RAID 5 set, in which parity consumes one-fifth of the raw capacity). Therefore, the total raw capacity used by the Payroll_1 unit is 425 GB (200 + 100 + 125). Assuming a cost of $5 per GB of raw storage, the total cost of storage provisioned for the Payroll_1 unit is $2,125. The Engineering_1 unit also uses two LUNs, but each is 100 GB in size, so it consumes 400 GB for production, 200 GB for the local replica, and 250 GB for the remote replica, a total of 850 GB of raw storage. With the same RAID protection and per-unit cost, the chargeback for the Engineering_1 unit is $4,250.
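The raw-capacity arithmetic behind the chargeback can be sketched as a short script. It uses the example's assumptions: RAID 1 mirroring for production LUNs (2x raw), unprotected RAID 0 local replicas (1x), RAID 5 remote replicas in an assumed 4+1 layout (1.25x), and $5 per GB of raw storage. The names `RAW_FACTOR`, `raw_capacity_gb`, and `chargeback` are illustrative, not part of any billing system.

```python
# Raw-to-usable multipliers for the protection schemes in the example.
# RAID 5 is assumed to be a 4+1 set, so raw = usable * 5 / 4.
RAW_FACTOR = {"RAID 0": 1.0, "RAID 1": 2.0, "RAID 5": 5 / 4}

# Every production LUN has a local and a remote replica of the same size.
COPIES = [("production", "RAID 1"), ("local replica", "RAID 0"), ("remote replica", "RAID 5")]

COST_PER_RAW_GB = 5  # dollars, as assumed in the example


def raw_capacity_gb(lun_sizes_gb):
    """Total raw GB consumed by the production LUNs plus their local and remote replicas."""
    usable = sum(lun_sizes_gb)
    return sum(usable * RAW_FACTOR[raid] for _, raid in COPIES)


def chargeback(lun_sizes_gb):
    """Return (raw GB, charge in dollars) for one business unit."""
    raw = raw_capacity_gb(lun_sizes_gb)
    return raw, raw * COST_PER_RAW_GB


for unit, luns in {"Payroll_1": [50, 50], "Engineering_1": [100, 100]}.items():
    raw, cost = chargeback(luns)
    print(f"{unit}: {raw:.0f} GB raw, ${cost:,.0f}")
```

Under these assumptions the script reports 425 GB and $2,125 for Payroll_1, and 850 GB and $4,250 for Engineering_1. Changing a RAID factor in one place reprices every business unit consistently.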

Operations Management Processes

• Some of the main processes of operations management include:
  – Configuration management
  – Change management
  – Capacity management
  – Performance management
  – Availability management
  – Incident management
  – Problem management
  – Security management

Configuration Management

Goal: Configuration Management


Maintains information about “configuration items (CIs)” that are
required to deliver services.

Key functions:
• Discovers and maintains information on CIs in a configuration management system (CMS)
• Updates the CMS when new CIs are deployed or CI attributes change

Examples of CI information:
• Attributes of CIs, such as the CI’s name, manufacturer name, serial number, license status, version, location, and inventory status
• Used and available capacity of CIs
• Issues linked to CIs
• Inter-relationships among CIs, such as service-to-user, storage pool-to-service, storage system-to-storage pool, and storage system-to-SAN switch

Notes

Configuration management is responsible for maintaining information about configuration items (CIs). CIs are components such as services, process documents, infrastructure components including hardware and software, people, and SLAs that need to be managed in order to deliver services. The information about CIs includes their attributes, used and available capacity, history of issues, and inter-relationships. Examples of CI attributes are the CI’s name, manufacturer name, serial number, license status, version, description of modification, location, and inventory status (for example, on order, available, allocated, or retired). The inter-relationships among CIs in a storage infrastructure commonly include service-to-user, storage pool-to-service, storage volume-to-storage pool, storage system-to-storage pool, storage system-to-SAN switch, and data center-to-geographic location.

All information about CIs is usually collected and stored by discovery tools in a single database, or in multiple autonomous databases mapped into a federated database, called a configuration management system (CMS). Discovery tools also update the CMS when new CIs are deployed or when attributes of CIs change. The CMS provides a consolidated view of CI attributes and relationships, which other management processes use for their operations. For example, the CMS helps the security management process examine the deployment of a security patch on VMs, the problem management process resolve a connectivity issue, or the capacity management process identify the CIs affected by the expansion of a storage pool.
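The relationship data held in a CMS can be sketched as a directed graph of CIs, which lets a process such as capacity management list every CI affected by a change to a storage pool. The CI names and relationships below are hypothetical, and a real CMS stores far richer attributes than this adjacency list.

```python
from collections import defaultdict, deque

# Hypothetical CI relationships: (supporting CI, dependent CI).
RELATIONSHIPS = [
    ("san-switch-1", "storage-system-1"),
    ("storage-system-1", "pool-A"),
    ("pool-A", "volume-7"),
    ("pool-A", "service-backup"),
    ("volume-7", "service-email"),
    ("service-email", "users-finance"),
    ("pool-B", "service-web"),
]


def build_graph(relationships):
    """Adjacency list mapping each CI to the CIs that depend on it."""
    graph = defaultdict(list)
    for supporter, dependent in relationships:
        graph[supporter].append(dependent)
    return graph


def affected_cis(graph, changed_ci):
    """Breadth-first walk returning every CI that directly or indirectly depends on changed_ci."""
    seen, queue = set(), deque([changed_ci])
    while queue:
        for dependent in graph[queue.popleft()]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


graph = build_graph(RELATIONSHIPS)
print(sorted(affected_cis(graph, "pool-A")))
```

The `seen` set keeps the walk finite even if the recorded relationships happen to contain a cycle.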

Change Management

Goal: Change Management


Standardizes change-related procedures in a storage infrastructure
for prompt handling of all changes with minimal impact on service
quality.

Key function:
• Assesses the potential risks of all changes to the CIs and decides whether to approve or reject the requested changes
  – Low-risk, routine, and compliant changes may be approved automatically through an orchestrated approval process
  – All other changes are reviewed by the change management team

Notes

As business requirements change, making changes to CIs becomes an almost daily task. Relevant changes could range from the introduction of a new service, to modification of an existing service’s attributes, to retirement of a service; from replacing a SAN switch, to expanding a storage pool, to a software upgrade, and even to a change in process or procedural documentation. Change management standardizes change-related procedures in a storage infrastructure to respond to changing business requirements in an agile way. It oversees all changes to the CIs to minimize the adverse impact of those changes on the business and the users of services.

Change management typically uses an orchestrated approval process that helps in making decisions on changes in an agile manner. Through an orchestration workflow, change management receives and processes requests for changes. Changes that are low risk, routine, and compliant with predefined change policies go through the change management process only once to determine that they can be exempted from change management review thereafter. After that, these requests are typically treated as service requests and approved automatically. All other changes are presented to the change management team for review. The change management team assesses the potential risks of the changes, prioritizes them, and decides on the requested changes.
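The one-time review and later automatic approval can be sketched as follows: a change type that passes a full review and is low risk, routine, and policy-compliant is added to a preapproved set, and later requests of that type bypass the team. `ChangeRequest`, `ChangeManager`, and the risk labels are hypothetical names chosen for illustration, not an orchestration product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRequest:
    change_type: str          # e.g., "expand-storage-pool"
    risk: str                 # "low", "medium", or "high"
    routine: bool
    policy_compliant: bool


class ChangeManager:
    """Routes change requests: preapproved types are automatic, all others go to the team."""

    def __init__(self):
        self.preapproved_types = set()   # types exempted from further review
        self.review_queue = []           # requests awaiting the change management team

    def route(self, request):
        if request.change_type in self.preapproved_types and request.policy_compliant:
            return "auto-approved"       # handled like a service request
        self.review_queue.append(request)
        return "pending review"

    def record_review(self, request, approved):
        """Store the team's decision; low-risk, routine, compliant types become preapproved."""
        if request in self.review_queue:
            self.review_queue.remove(request)
        eligible = request.risk == "low" and request.routine and request.policy_compliant
        if approved and eligible:
            self.preapproved_types.add(request.change_type)
        return approved


cm = ChangeManager()
expand = ChangeRequest("expand-storage-pool", "low", True, True)
print(cm.route(expand))                  # first request of this type is reviewed once
cm.record_review(expand, approved=True)
print(cm.route(expand))                  # later requests are approved automatically
```

Rechecking `policy_compliant` on every request keeps a preapproved type from slipping through when a specific request falls outside policy.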

Capacity Management

Goal: Capacity Management


Ensures that a storage infrastructure is able to meet the required
capacity demands for services in a cost effective and timely manner.

Key functions:
• Determines the optimal amount of storage needed to meet SLAs
• Maximizes capacity utilization without impacting service levels
• Establishes capacity consumption trends and plans for additional capacity

Examples of capacity management activities:
• Adding new nodes to a scale-out NAS cluster or an object-based storage system
• Enforcing capacity quotas for users
• Expanding a storage pool and setting a threshold for maximum utilization
• Forecasting usage of file systems, LUNs, and storage pools
• Removing unused resources from a service and reassigning them to another service

Notes

Capacity management ensures adequate availability of storage infrastructure resources to provide services and meet SLA requirements. It determines the optimal amount of storage required to meet the needs of a service regardless of dynamic resource consumption and seasonal spikes in storage demand. It also maximizes the utilization of available capacity and minimizes spare and stranded capacity without compromising the service levels.

Capacity management tools are usually capable of gathering historical information on storage usage over a specified period of time, establishing trends in capacity consumption, and performing predictive analysis of future demand. This analysis serves as input to capacity planning activities and enables the procurement and provisioning of additional capacity in the most cost-effective and least disruptive manner.

Adding new nodes to a scale-out NAS cluster or an object-based storage system is an example of capacity management. Adding nodes increases the overall processing power, memory, or storage capacity. Enforcing capacity quotas for users is another example of capacity management: provisioning a fixed amount of space for their files prevents users from exceeding the allocated capacity. Other examples include creating and expanding a storage pool; setting a threshold for the maximum utilization and the amount of oversubscription allowed for each storage pool; forecasting the usage of file systems, LUNs, and storage pools; and removing unused resources from a service and reassigning them to another, resource-constrained service.
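Pool thresholds and oversubscription limits can be sketched as a check that runs before a thin LUN is provisioned: consumed capacity must stay below the pool's utilization threshold, and subscribed capacity, as a ratio of physical capacity, must stay within the oversubscription limit. The `StoragePool` fields, the limit values, and `can_provision` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StoragePool:
    physical_tb: float            # usable physical capacity of the pool
    consumed_tb: float            # capacity actually written
    subscribed_tb: float          # total size of thin LUNs provisioned from the pool
    max_utilization_pct: float    # utilization threshold set by capacity management
    max_oversubscription: float   # e.g., 2.0 allows subscribing twice the physical capacity


def can_provision(pool, new_lun_tb):
    """Return (allowed, reason) for provisioning a thin LUN of new_lun_tb from the pool."""
    utilization_pct = 100 * pool.consumed_tb / pool.physical_tb
    if utilization_pct >= pool.max_utilization_pct:
        return False, f"utilization {utilization_pct:.0f}% at or above threshold"
    ratio = (pool.subscribed_tb + new_lun_tb) / pool.physical_tb
    if ratio > pool.max_oversubscription:
        return False, f"oversubscription {ratio:.2f}x exceeds limit"
    return True, "ok"


pool = StoragePool(physical_tb=100, consumed_tb=60, subscribed_tb=180,
                   max_utilization_pct=80, max_oversubscription=2.0)
print(can_provision(pool, 10))   # subscription would become 190 TB (1.90x)
print(can_provision(pool, 30))   # subscription would become 210 TB (2.10x)
```

Returning a reason with each refusal lets the request be redirected, for example to a pool expansion, instead of failing silently.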

The capacity management team uses several methods to maximize capacity utilization. Common methods include overcommitment of processing power and memory, data deduplication and compression, automated storage tiering, and use of a converged network such as an FCoE SAN.

Capacity Management Example

This example illustrates the expansion of a NAS file system using an orchestrated
workflow. The file system is expanded to meet the capacity requirement of a
compute cluster that accesses the file system.

[Figure: Swim-lane workflow across Administrator (management portal), Orchestration, Change Management, SDS Controller, and Configuration Management. The administrator requests expansion of a file system to a specific size. The orchestrator checks whether approval is required; if so, it sends a request for approval, which change management reviews and approves or rejects. If the request is approved, the SDS controller adds the required capacity to the file system, configuration management discovers the change and updates the CMS, and the portal is updated (operation completed). If the request is rejected, the portal is updated (operation rejected).]

Notes

In the example, an administrator initiates a file system expansion operation from the management portal. The operation request is transferred to the orchestrator, which triggers a change approval and execution workflow. The orchestrator determines whether the request for change needs to be reviewed by the change management team. If the request is preapproved, it is exempted from change management review. If not, the orchestrated workflow ensures that the change management team reviews and approves or rejects the request.

If the file system expansion request is approved, the orchestrator interacts with the SDS controller to invoke the expansion. The SDS controller then interacts with the storage infrastructure components to add the required capacity to the file system. The orchestrated workflow also invokes the discovery operation, which updates the CMS with the new file system size. Finally, the orchestrator updates the management portal when the expansion operation completes or is rejected.
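The orchestrated workflow can be sketched as one function that sequences the steps in the order described: approval check, optional change review, expansion through the SDS controller, CMS update through discovery, and a portal update. Each component is a hypothetical stand-in passed in as a callable; none of them represents a real orchestrator, SDS controller, or CMS API.

```python
from typing import Callable


def expand_file_system(
    fs_name: str,
    new_size_gb: int,
    requires_approval: Callable[[str], bool],
    change_review: Callable[[str, int], bool],
    sds_expand: Callable[[str, int], None],
    discover_and_update_cms: Callable[[str], None],
    update_portal: Callable[[str], None],
) -> bool:
    """Orchestrate a file system expansion; return True if the expansion was performed."""
    if requires_approval(fs_name) and not change_review(fs_name, new_size_gb):
        update_portal(f"{fs_name}: operation rejected")
        return False
    sds_expand(fs_name, new_size_gb)        # SDS controller adds the required capacity
    discover_and_update_cms(fs_name)        # discovery records the new size in the CMS
    update_portal(f"{fs_name}: operation completed")
    return True


# Usage with in-memory stand-ins for each component.
cms, portal = {}, []
sizes = {"fs01": 500}
done = expand_file_system(
    "fs01", 800,
    requires_approval=lambda fs: True,
    change_review=lambda fs, size: size <= 1000,    # stand-in review rule
    sds_expand=lambda fs, size: sizes.__setitem__(fs, size),
    discover_and_update_cms=lambda fs: cms.__setitem__(fs, sizes[fs]),
    update_portal=portal.append,
)
print(done, cms, portal)
```

Passing the components in as callables keeps the workflow logic testable without any real infrastructure behind it.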

Performance Management

Goal: Performance Management


Monitors, measures, analyzes, and improves the performance of
storage infrastructure and services.

Key functions:
• Measures and analyzes the response time and throughput of components
• Identifies components that are performing below the expected level
• Makes configuration changes to optimize performance and address issues

Examples of performance management activities:
• Tuning database design, resource allocation to VMs, and multipathing
• Adding new ISLs and aggregating links to eliminate bottlenecks
• Separating sequential and random I/Os to different spindles
• Changing storage tiering policy and cache configuration

Notes

Performance management ensures the optimal operational efficiency of all infrastructure components so that storage services can meet or exceed the required performance level. Performance-related data, such as the response time and throughput of components, is collected, analyzed, and reported by specialized management tools. The performance analysis shows whether a component meets the expected performance levels. These tools also proactively alert administrators about potential performance issues and may prescribe a course of action to improve the situation.
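The analysis step, comparing measured response time and throughput against expected levels, can be sketched as below. The component names, samples, and expected levels are hypothetical, and using the 95th percentile of response time together with mean throughput is an assumed choice, not something the text prescribes.

```python
from statistics import mean, quantiles


def underperforming(samples, expected):
    """Return {component: [issues]} for components below their expected performance level.

    samples:  {name: {"response_ms": [...], "throughput_mb_s": [...]}}
    expected: {name: {"max_p95_ms": float, "min_mb_s": float}}
    """
    flagged = {}
    for name, data in samples.items():
        # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
        p95 = quantiles(data["response_ms"], n=20, method="inclusive")[-1]
        avg_mb_s = mean(data["throughput_mb_s"])
        issues = []
        if p95 > expected[name]["max_p95_ms"]:
            issues.append(f"p95 response time {p95:.1f} ms")
        if avg_mb_s < expected[name]["min_mb_s"]:
            issues.append(f"mean throughput {avg_mb_s:.0f} MB/s")
        if issues:
            flagged[name] = issues
    return flagged


samples = {
    "storage-port-1": {"response_ms": [2.0] * 10, "throughput_mb_s": [400.0] * 10},
    "storage-port-2": {"response_ms": [5.0] * 9 + [40.0], "throughput_mb_s": [150.0] * 10},
}
expected = {name: {"max_p95_ms": 10.0, "min_mb_s": 200.0} for name in samples}
print(underperforming(samples, expected))
```

A percentile rather than an average keeps occasional slow I/Os visible, since a single 40 ms outlier barely moves the mean of ten samples.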

The performance management team carries out several activities to address performance-related issues and improve the performance of storage infrastructure components. For example, to optimize performance levels, activities on the compute system include fine-tuning the volume configuration, database design or application layout, resource allocation to VMs, workload balancing, and multipathing configuration. Performance management tasks on a SAN include implementing new ISLs and aggregating links in a multiswitch fabric to eliminate performance bottlenecks. Storage system-related tasks include separating sequential and random I/Os to different spindles, selecting an appropriate RAID type for a storage pool, and changing the storage tiering policy and cache configuration.

Availability Management

Goal: Availability Management


Ensures that the availability requirements of all the components and
services are consistently met.

Key functions:
• Establishes guidelines to meet stated availability levels at a justifiable cost
• Identifies availability-related issues and areas for improvement
• Proposes changes to existing BC solutions or architects new BC solutions

Examples of availability management activities:
• Deploying redundant, fault-tolerant, and hot-swappable components
• Deploying compute clusters, fault-resilient applications, and multipathing software
• Designing multiple availability zones for automated service failover
• Planning and architecting data backup and replication solutions

Notes

Availability management is responsible for establishing proper guidelines based on the defined availability levels of services. The guidelines include the procedures and technical features required to meet or exceed both current and future service availability needs at a justifiable cost. Availability management also identifies all availability-related issues in a storage infrastructure and areas where availability must be improved. The availability management team proactively monitors whether the availability of existing services and components is maintained within acceptable and agreed levels. Monitoring tools also help administrators identify the gap between the required availability and the achieved availability. With this information, administrators can quickly identify errors or faults in the infrastructure components that may cause future service unavailability.
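The gap between required and achieved availability can be computed from the agreed service time and recorded downtime, using the common definition availability = (agreed service time - downtime) / agreed service time. The 30-day period, the 99.95 percent target, and the outage durations below are hypothetical.

```python
def achieved_availability(agreed_minutes, outage_minutes):
    """Fraction of the agreed service time during which the service was available."""
    downtime = sum(outage_minutes)
    return (agreed_minutes - downtime) / agreed_minutes


def availability_gap(required_pct, agreed_minutes, outage_minutes):
    """Return (achieved %, shortfall in percentage points; 0.0 when the target is met)."""
    achieved_pct = 100 * achieved_availability(agreed_minutes, outage_minutes)
    return achieved_pct, max(0.0, required_pct - achieved_pct)


# Hypothetical 30-day month of 24x7 service with two unplanned outages (minutes).
month_minutes = 30 * 24 * 60
achieved, gap = availability_gap(99.95, month_minutes, [12, 30])
print(f"achieved {achieved:.3f}%, shortfall {gap:.3f} percentage points")
```

In this example a 99.95 percent target allows only about 21.6 minutes of downtime per month, so the two outages totaling 42 minutes leave a measurable shortfall.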

Based on the service availability requirements and the areas found for improvement, the availability management team may propose new business continuity (BC) solutions or changes to the existing BC solutions. For example, when a set of compute systems is deployed to support a service or any critical business function, it requires high availability. The availability management team proposes redundancy at all levels, including the component, data, and even site levels. This is generally accomplished by deploying two or more HBAs per system, multipathing software, and compute clustering.

The compute systems must be connected to the storage systems using at least two
independent fabrics and switches that have built-in redundancy and hot-swappable
components. The VMs running on these compute systems must be protected from
hardware failure/unavailability through VM failover mechanisms. Deployed
applications should have built-in fault resiliency features. The storage systems
should also have built-in redundancy for various components and should support
local and remote replication. RAID-protected LUNs should be provisioned to the
compute systems using at least two front-end ports. In addition, multiple availability
zones may be created to support fault tolerance at the site level.

Incident Management

Goal: Incident Management


Returns services to users as quickly as possible when unplanned
events, called ‘incidents’, interrupt services or degrade service quality.

Key functions:
• Detects and records all incidents in a storage infrastructure
• Investigates incidents and provides solutions to resolve the incidents
• Documents incident history

The table provides a sample list of incidents that are captured by an incident
management tool.

Severity | Event Summary | Device | Priority | Status | Last Updated | Owner | Escalation
Fatal | Pool A usage is 95% | Storage system 1 | None | New | 2019/01/07 12:38:34 | - | No
Fatal | Database 1 is down | DB server 1 | High | WIP | 2019/01/07 10:11:03 | L. John | Support Group 2
Warning | Port 3 utilization is 85% | Switch A | Medium | WIP | 2019/01/07 09:48:14 | P. Kim | Support Group 1

Notes

An incident is an unplanned event, such as an HBA failure or an application error, that may cause an interruption to services or degrade service quality. Incident management is responsible for detecting and recording all incidents in a storage infrastructure. It investigates the incidents and provides appropriate solutions to resolve them. It also documents the incident history with details of the incident symptoms; the affected services, components, and users; the time to resolve the incident; the severity of the incident; a description of the error; and the incident resolution data. The incident history is used as an input for problem management (described next).

Incidents are commonly detected and logged by incident management tools. These tools also help administrators track, escalate, and respond to incidents from initiation to closure. Users may also register incidents through a self-service portal, email, or a service desk. The service desk may consist of a call center to handle a large volume of telephone calls and a help desk as the first line of service support. If the service desk cannot provide a solution for an incident, the incident is escalated to other incident management support groups or to problem management.

The incident management support groups investigate the incidents escalated by the incident management tools or the service desk. They provide solutions to restore the services within the agreed timeframe specified in the SLA. If the support groups are unable to determine and correct the root cause of an incident, the error-correction activity is transferred to problem management. In this case, the incident management team provides a temporary solution (workaround) for the incident, for example, migrating a storage service to a different storage pool in the same data center or in a different data center. During the incident resolution process, the affected users are kept apprised of the incident status.
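The escalation path described above, from the service desk to incident management support groups and then to problem management when no root cause is found, can be sketched as a small state machine on an incident record. The tier names, the `Incident` fields, and the workaround text are hypothetical and are not taken from any incident management tool.

```python
from dataclasses import dataclass, field

# Hypothetical escalation tiers, from the first line of support to problem management.
TIERS = ["service desk", "incident support group", "problem management"]


@dataclass
class Incident:
    summary: str
    severity: str
    tier: int = 0
    status: str = "New"
    workaround: str = ""
    history: list = field(default_factory=list)

    def _log(self, note):
        self.history.append(f"[{TIERS[self.tier]}] {note}")

    def escalate(self, reason, workaround=""):
        """Hand the incident to the next tier; an optional workaround restores service meanwhile."""
        if self.tier == len(TIERS) - 1:
            raise ValueError("incident is already with problem management")
        self.tier += 1
        self.status = "WIP"
        if workaround:
            self.workaround = workaround
        self._log(f"escalated: {reason}")

    def resolve(self, resolution):
        self.status = "Resolved"
        self._log(f"resolved: {resolution}")


inc = Incident("Database 1 is down", "Fatal")
inc.escalate("service desk could not restore the service")
inc.escalate("root cause not found", workaround="migrate the service to another storage pool")
print(TIERS[inc.tier], "|", inc.workaround)
print(inc.history)
```

The `history` list plays the role of the documented incident history that problem management later reviews.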

Problem Management

Goal: Problem Management


Prevents incidents that share common symptoms or root causes from recurring and minimizes the adverse impact of incidents that cannot be prevented.

Key functions:
• Reviews incident history to detect problems in a storage infrastructure
• Identifies the underlying root cause that creates a problem
  – Integrated incident and problem management tools may mark specific incidents as problems and perform root cause analysis
• Provides the most appropriate solution or preventive remediation for problems
• Analyzes and solves errors proactively before they become incidents or problems

Notes

A problem is recognized when multiple incidents exhibit one or more common symptoms. Problems may also be identified from a single significant incident that indicates an error whose cause is unknown but whose impact is high. Problem management reviews all incidents and their history to detect problems in a storage infrastructure. It identifies the underlying root cause that creates a problem and provides the most appropriate solution and/or preventive remediation for the problem. If a complete resolution is not available, problem management provides solutions to reduce or eliminate the impact of the problem. In addition, problem management proactively analyzes errors and alerts in the storage infrastructure to identify impending service failures or quality degradation. It solves errors before they become incidents or problems.

Although incident and problem management are separate management processes, they require automated interaction and use integrated incident and problem management tools. These tools may help an administrator track and mark specific incidents as a problem and transfer the matter to problem management for further investigation. Alternatively, these tools may automatically identify the incidents that are most likely to require root cause analysis. Further, these tools may be able to perform root cause analysis based on various alerts: they search for alerts that are indicative of problems and correlate these alerts to find the root cause. This helps to resolve problems more quickly.
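Flagging candidate problems from the incident history can be sketched as grouping incidents by a symptom key and flagging any symptom that recurs, or any single high-impact incident whose cause is unknown, matching the two ways problems are recognized above. The record layout, the symptom labels, and the default threshold of three occurrences are hypothetical.

```python
from collections import defaultdict


def find_problems(incidents, min_occurrences=3):
    """Return {symptom: [incident ids]} for symptoms that look like problems.

    incidents: list of dicts with keys "id", "symptom", "impact" ("high" or "low"),
    and "cause_known" (bool).
    """
    by_symptom = defaultdict(list)
    for inc in incidents:
        by_symptom[inc["symptom"]].append(inc)
    problems = {}
    for symptom, group in by_symptom.items():
        recurring = len(group) >= min_occurrences
        # A single significant incident with an unknown cause also counts as a problem.
        severe_unknown = any(i["impact"] == "high" and not i["cause_known"] for i in group)
        if recurring or severe_unknown:
            problems[symptom] = [i["id"] for i in group]
    return problems


history = [
    {"id": 101, "symptom": "path failover on HBA", "impact": "low", "cause_known": False},
    {"id": 102, "symptom": "path failover on HBA", "impact": "low", "cause_known": False},
    {"id": 103, "symptom": "path failover on HBA", "impact": "low", "cause_known": False},
    {"id": 104, "symptom": "pool latency spike", "impact": "high", "cause_known": False},
    {"id": 105, "symptom": "login failure", "impact": "low", "cause_known": True},
]
print(find_problems(history))
```

Grouping on an exact symptom string is the simplest form of correlation; real tools may match on richer attributes such as the affected CI or alert type.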

Security Management

Goal: Security Management


Prevents occurrence of incidents/activities adversely affecting
confidentiality, integrity, and availability of information and meets
regulatory/compliance requirements for protecting information at
reasonable/acceptable costs.

Key functions:
• Develops information security policies
• Deploys the required security architecture, processes, mechanisms, and tools

Examples of security management activities:
• Managing user accounts and access policies that authorize users to use a service
• Deploying controls at multiple levels (defense in depth) to access data and services
• Scanning applications and databases to identify vulnerabilities
• Configuring zoning, LUN masking, and data encryption services

Notes

Security management ensures the confidentiality, integrity, and availability of information in a storage infrastructure. It prevents the occurrence of security-related incidents or activities that adversely affect the infrastructure components, management processes, information, and services. It also meets regulatory or compliance requirements (both internal and external) for protecting information at reasonable and acceptable costs. External compliance requirements include adherence to legal frameworks such as the U.K. Data Protection Act 1998, the U.K. Freedom of Information Act 2000, the U.S. Health Insurance Portability and Accountability Act of 1996, and the EU General Data Protection Regulation. Internal regulations are imposed based on an organization’s information security policies, such as the access control policy, the bring-your-own-device (BYOD) policy, and the policy on the usage of cloud storage.

Security management is responsible for developing information security policies that govern the organization’s approach to information security management. It establishes the security architecture, processes, mechanisms, tools, user responsibilities, and standards needed to meet the information security policies in a cost-effective manner. It also ensures that the required security processes and mechanisms are properly implemented.

The security management team performs various activities to prevent unauthorized
access and security breaches in a storage infrastructure. For example, it manages
the user accounts and access policies that authorize users to use a service. Further,
access to data and services is controlled at multiple levels (defense in depth),
which reduces the risk of a security breach if a protection mechanism at one level
is compromised. Applications and databases are also scanned periodically to identify
vulnerabilities and provide protection against threats. Security management
activities in a SAN include configuring zoning to prevent an unauthorized HBA from
accessing specific storage system ports, and providing mechanisms to transport
encrypted data. Similarly, security management on a storage system includes LUN
masking, which restricts a compute system to accessing only a defined set of LUNs.
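Zoning and LUN masking act as two independent layers of the defense-in-depth approach described above: the fabric decides which HBA ports can reach which storage system ports, and the storage system decides which LUNs each initiator can see. The following Python sketch models both checks under simplifying assumptions. The WWNs, zone set, and mask table are hypothetical, and real switches and storage systems enforce these rules in their own firmware and management interfaces rather than through code like this.

```python
from dataclasses import dataclass

# Hypothetical WWNs; real WWNs are 64-bit identifiers written as 8 hex bytes.
HOST_A_HBA = "10:00:00:00:c9:aa:aa:01"
HOST_B_HBA = "10:00:00:00:c9:bb:bb:01"
ARRAY_PORT_1 = "50:06:01:60:00:00:00:01"


@dataclass(frozen=True)
class Request:
    initiator_wwn: str  # HBA port issuing the I/O
    target_wwn: str     # storage system front-end port
    lun: int            # LUN the compute system wants to access


# Layer 1 (fabric): zoning. Each zone is a set of member WWNs allowed to communicate.
ZONES = [{HOST_A_HBA, ARRAY_PORT_1}]

# Layer 2 (storage system): LUN masking. Initiator WWN -> LUNs it may access.
LUN_MASKS = {HOST_A_HBA: {0, 1}}


def zoning_allows(req: Request) -> bool:
    return any(req.initiator_wwn in zone and req.target_wwn in zone for zone in ZONES)


def masking_allows(req: Request) -> bool:
    return req.lun in LUN_MASKS.get(req.initiator_wwn, set())


def access_allowed(req: Request) -> bool:
    # Defense in depth: every layer must independently permit the request.
    return zoning_allows(req) and masking_allows(req)


print(access_allowed(Request(HOST_A_HBA, ARRAY_PORT_1, 0)))  # True
print(access_allowed(Request(HOST_A_HBA, ARRAY_PORT_1, 2)))  # False: LUN 2 is masked
print(access_allowed(Request(HOST_B_HBA, ARRAY_PORT_1, 0)))  # False: HBA not zoned
```

Because both layers must agree, a zoning mistake that added host B's HBA to the zone would still not expose LUNs 0 and 1, since the LUN mask has no entry for that initiator.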



Concepts in Practice Lesson


Introduction

This lesson covers the following topics:
- Dell EMC SRM
- Dell EMC Service Assurance Suite
- Dell EMC CloudIQ
- vRealize Operations
- vRealize Orchestrator

Concepts in Practice

Dell EMC SRM

Management software for automated monitoring and reporting of both traditional
and software-defined storage infrastructures. It provides visibility into the
relationships and topology from applications hosted on virtual or physical machines
down to the LUNs. It also enables administrators to analyze performance trends,
capacity utilization, and configuration compliance. With this insight, it helps
administrators optimize storage capacity through the alignment of application
workloads to the right storage tier, capacity planning, and chargeback reporting.
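Chargeback reporting attributes storage cost to the consumers of capacity. A minimal Python sketch, assuming a simple model in which each business unit is charged per GB of allocated capacity at a flat per-tier rate. The tier names, rates, and allocations are hypothetical illustration data, and SRM's actual chargeback policies may be configured differently. Rates are kept in integer cents to avoid floating-point rounding in the totals.

```python
# Hypothetical per-tier rates in cents per allocated GB per month.
TIER_RATES_CENTS = {"flash": 30, "hybrid": 12, "archive": 2}

# Hypothetical allocations: (business unit, storage tier, allocated GB).
allocations = [
    ("finance", "flash", 500),
    ("finance", "archive", 2000),
    ("marketing", "hybrid", 1000),
]


def chargeback(allocs: list[tuple[str, str, int]], rates: dict[str, int]) -> dict[str, int]:
    """Total monthly charge per business unit, in cents."""
    totals: dict[str, int] = {}
    for unit, tier, gb in allocs:
        totals[unit] = totals.get(unit, 0) + gb * rates[tier]
    return totals


for unit, cents in chargeback(allocations, TIER_RATES_CENTS).items():
    print(f"{unit}: ${cents / 100:.2f}")
# finance: $190.00
# marketing: $120.00
```

Pricing by tier is what ties chargeback to workload placement: moving finance's archive data to flash would raise its charge, which gives business units an incentive to request the tier their workload actually needs.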

Dell EMC Service Assurance Suite

Offers a combination of management tools, including Smarts and M&R (formerly
known as Watch4net), to perform IT operations in a software-defined data center. It
discovers infrastructure components and records detailed information about each
one, including its configuration and its relationships with other components. It
detects and correlates events related to the availability, performance, and
configuration status of infrastructure components that may occur due to problems.
It also identifies the root causes of problems and risk conditions. By quickly finding
root causes and risks, it helps administrators proactively resolve issues before they
impact service levels.
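Root-cause analysis relies on knowing how components depend on one another. The Python sketch below shows one simplified correlation rule, assuming a hypothetical topology map: a component in alarm is reported as a probable root cause only if none of the components it directly depends on are also in alarm, so symptoms that propagate up the stack are suppressed. This illustrates the general idea only; it is not the correlation technique used by Smarts or M&R.

```python
# Hypothetical topology: each component maps to the components it depends on.
DEPENDS_ON = {
    "app-db": ["vm-1"],
    "vm-1": ["host-1"],
    "host-1": ["switch-1"],
    "switch-1": [],
    "pool-2": [],
}


def probable_root_causes(alarms: set[str], depends_on: dict[str, list[str]]) -> set[str]:
    """Return alarming components whose direct dependencies are all healthy."""
    return {
        component
        for component in alarms
        if not any(dep in alarms for dep in depends_on.get(component, []))
    }


# Events raised across the stack by one failing switch, plus an unrelated pool alarm.
alarms = {"app-db", "vm-1", "host-1", "switch-1", "pool-2"}
print(sorted(probable_root_causes(alarms, DEPENDS_ON)))  # ['pool-2', 'switch-1']
```

Five raw events collapse to two actionable items, which is the practical benefit of correlation: administrators investigate the switch and the pool instead of chasing the database and VM symptoms.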

Dell EMC CloudIQ

A no-cost, cloud-native application that uses machine learning to proactively
monitor and measure the overall health of storage systems through intelligent,
comprehensive, and predictive analytics. CloudIQ can be described as a fitness
tracker for the storage environment: it provides a single, simple display to monitor
and predict the health of that environment. CloudIQ makes it simple to track storage
health, report on historical trends, plan for future growth, and proactively discover
and remediate issues from any browser or mobile device.
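Planning for future growth from historical trends can be illustrated with a simple capacity forecast. CloudIQ's predictive analytics use machine learning; the Python sketch below instead fits an ordinary least-squares line to hypothetical daily used-capacity samples and projects the number of days until the pool is full. The function names and data are illustrative assumptions.

```python
from typing import Optional


def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) < 2:
        raise ValueError("need at least two samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(used_tb: list[float], capacity_tb: float) -> Optional[float]:
    """Days after the last daily sample until the fitted trend reaches capacity."""
    days = list(range(len(used_tb)))  # sample i was taken on day i
    slope, intercept = linear_fit(days, used_tb)
    if slope <= 0:
        return None  # usage is flat or shrinking; no projected fill date
    day_full = (capacity_tb - intercept) / slope
    return max(0.0, day_full - days[-1])


daily_used_tb = [50, 52, 54, 56, 58]  # hypothetical samples
print(days_until_full(daily_used_tb, capacity_tb=100))  # 21.0
```

With usage growing 2 TB per day from 50 TB, a 100 TB pool is projected to fill 21 days after the last sample. A straight line is easy to explain but ignores seasonality and sudden jumps, which is one reason production tools use richer models.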

vRealize Operations

A management tool that automates some of the key management operations in a
storage infrastructure. It identifies potential performance, capacity, and
configuration issues and helps remediate them before they become problems. It
optimizes capacity usage and performs capacity trend analysis. It also collects
configuration data, verifies configuration compliance with predefined policies, and
recommends or triggers the actions needed to remediate policy breaches. This
enables organizations to enforce and maintain conformance with configuration
standards, regulatory requirements, and security hardening guidelines. Further, it
provides end-to-end visibility across storage infrastructure components, including
application-to-component mapping, in a single console.
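Verifying configuration compliance amounts to comparing collected configuration data against predefined policy rules. A minimal Python sketch, assuming a hypothetical policy expressed as one predicate per setting; the setting names and thresholds are illustrative and do not correspond to vRealize Operations compliance content. Missing settings are treated as violations so that gaps in the collected data are not silently reported as compliant.

```python
from typing import Any, Callable

# Hypothetical policy: one predicate per configuration setting.
POLICY: dict[str, Callable[[Any], bool]] = {
    "encryption_at_rest": lambda v: v is True,
    "telnet_enabled": lambda v: v is False,
    "min_password_length": lambda v: isinstance(v, int) and v >= 12,
}


def check_compliance(config: dict[str, Any]) -> list[str]:
    """Return human-readable policy violations (empty list if compliant)."""
    violations = []
    for setting, rule in POLICY.items():
        if setting not in config:
            violations.append(f"{setting}: missing from collected data")
        elif not rule(config[setting]):
            violations.append(f"{setting}: non-compliant value {config[setting]!r}")
    return violations


collected = {"encryption_at_rest": True, "telnet_enabled": True, "min_password_length": 8}
for violation in check_compliance(collected):
    print(violation)
# telnet_enabled: non-compliant value True
# min_password_length: non-compliant value 8
```

Each violation string could then be mapped to a recommended or automated remediation action, which corresponds to the "recommends or triggers" step described above.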

vRealize Orchestrator

Orchestration software that helps automate and coordinate service delivery and
operational functions in a storage infrastructure. It comes with a built-in library of
predefined workflows, as well as a drag-and-drop feature for linking actions together
to create customized workflows. These workflows can be launched from the VMware
vSphere client, from various components of VMware vCloud Suite, or through various
triggering mechanisms. vRealize Orchestrator can execute hundreds or thousands of
workflows concurrently.
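A workflow that links actions together can be modeled as an ordered list of steps that each read and update a shared context. The Python sketch below is only an analogy under assumed names (`create_volume`, `map_to_host`, `notify`, and `run_workflow` are hypothetical); vRealize Orchestrator defines workflows in its own designer and scripting environment, not in Python. The last lines run several independent workflow instances concurrently with a thread pool, giving each instance its own context so that they share no state.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Context = dict
Action = Callable[[Context], Context]


def create_volume(ctx: Context) -> Context:
    ctx["volume"] = f"vol-{ctx['app']}-{ctx['size_gb']}g"
    return ctx


def map_to_host(ctx: Context) -> Context:
    ctx["mapping"] = (ctx["volume"], ctx["host"])
    return ctx


def notify(ctx: Context) -> Context:
    ctx.setdefault("log", []).append(f"provisioned {ctx['volume']} to {ctx['host']}")
    return ctx


def run_workflow(actions: list[Action], ctx: Context) -> Context:
    """Run each linked action in order, passing the context along."""
    for action in actions:
        ctx = action(ctx)
    return ctx


provision_storage: list[Action] = [create_volume, map_to_host, notify]

# Launch several independent instances concurrently; each gets its own context.
requests = [{"app": f"app{i}", "size_gb": 10, "host": "esx01"} for i in range(3)]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda c: run_workflow(provision_storage, c), requests))

for r in results:
    print(r["log"][-1])
# provisioned vol-app0-10g to esx01
# provisioned vol-app1-10g to esx01
# provisioned vol-app2-10g to esx01
```

Treating each action as a small, reusable unit is what makes drag-and-drop composition possible: the same `create_volume` step can appear in many customized workflows.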


Assessment

Summary

Course Conclusion

Information Storage and Management (ISM) v4

Summary

This course covered the modern technologies that drive digital transformation,
including cloud, big data, IoT, and machine learning, as well as the modern data
center infrastructure and its elements. It also detailed intelligent storage systems
and their types: file, block, and object. It described various storage networking
technologies and their deployment, as well as software-defined storage and
networking. It also covered business continuity and data protection solutions such
as replication, backup, and archiving. Finally, the course detailed storage
infrastructure security and management processes.
