
Digital Data

A collection of facts that is transmitted and stored in electronic form, and processed through software.

Types of Digital Data

Information
Processed data that is presented in a specific context to enable useful interpretation and decision-making.
Information is stored on storage devices on non-volatile media

Data Center
A facility that houses IT equipment including compute, storage, and network components, and other supporting
infrastructure for providing centralized data-processing capabilities.

Key Data Center Management Processes

First Platform
Based on mainframes
Applications and databases hosted centrally
Users connect to mainframes through terminals
Second Platform
Based on client-server model
Distributed application architecture
Servers receive and process requests for resources from clients
Users connect through a client program or a web interface
Third Platform
Based on cloud, big data, mobile, and social technologies

Cloud Computing
A model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing
resources (e.g., servers, storage, networks, applications, and services) that can be rapidly provisioned and released with
minimal management effort or service provider interaction.
A cloud is a collection of network-accessible hardware and software resources

Cloud Infrastructure

On-demand self-service
Broad Network Access
Resource Pooling
Rapid Elasticity
Measured Service

Cloud Service Models

Infrastructure as a Service

The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing
resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and
applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating
systems, storage, and deployed applications; and possibly limited control of select networking components, (e.g., host
firewalls).

Platform as a Service

The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired
applications created using programming languages, libraries, services, and tools supported by the provider. The consumer
does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage,
but has control over the deployed applications and possibly configuration settings for the application-hosting environment.

Software as a Service

The capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. The
applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g.,
web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure
including network, servers, operating systems, storage, or even individual application capabilities, with the possible
exception of limited user-specific application configuration settings.

Cloud Deployment Models


Public, Private, Community and Hybrid Clouds

Big Data
Information assets whose high volume, high velocity, and high variety require the use of new technical architectures and
analytical methods to gain insights and derive business value.

Characteristics of Big Data


Volume, Velocity, Variety, Variability, Veracity, Value

Components of a Big Data Analytics Solution

MapReduce _ Parallel computation across many servers


Query _ efficient way to process, store, and retrieve data
Storage _ Distributed architecture
Storage systems consist of multiple nodes collectively called a cluster
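To make the MapReduce component concrete, here is a minimal single-process sketch of the classic word-count job. It only illustrates the map, shuffle (group by key), and reduce steps; the function names are hypothetical, and a real framework would run each map task on a different server of the cluster rather than in one loop.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    # Map: emit a (word, 1) pair for every word in one input split.
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    # Shuffle: group all intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine every value for one key into a single result.
    return key, sum(values)


def map_reduce(splits):
    # In a cluster, each split would be mapped in parallel on a different node.
    intermediate = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    splits = ["big data needs big storage", "storage for big data"]
    print(map_reduce(splits))  # {'big': 3, 'data': 2, 'needs': 1, 'storage': 2, 'for': 1}
```

Because map tasks are independent of one another, adding servers lets more splits be processed at the same time, which is where the parallelism comes from.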

Benefits of virtualization: Resource consolidation and multitenant environment

Layers

Physical Infrastructure _ Foundation layer of the data center infrastructure


Virtual Infrastructure _ Virtualization abstracts physical resources and creates virtual resources
Software-defined Infrastructure _ Deployed either on virtual layer or on physical layer
Orchestration _ Provides workflows for executing automated tasks
Services _ Delivers IT resources as services to users
Business Continuity _ Ensures the availability of services in line with SLAs
Security _ Supports all the layers to provide secure services
Management _ Enables storage infrastructure configuration and capacity provisioning

Computer System
A computing platform (hardware and system software) that runs applications

Compute Virtualization
The technique of abstracting the physical compute hardware from the operating system and applications
enabling multiple operating systems to run concurrently on a single or clustered physical compute system(s).

Hypervisor
Software that provides a virtualization layer for abstracting compute system hardware, and enables the creation of
multiple virtual machines.

Two key components:


Hypervisor kernel
Provides functionality similar to an OS kernel
Presents resource requests to physical hardware
Virtual machine manager (VMM)
Each VM is assigned a VMM
Abstracts physical hardware and presents to VM
Two types of hypervisor: bare-metal and hosted

VM Files

Application Virtualization
The technique of decoupling an application from the underlying computing platform (OS and hardware) to enable the
application to be used on a computer system without installation.

Techniques
Application encapsulation
Application is converted into a standalone, self-contained executable package
Application packages may run directly from local drive, USB, or optical disc
Application presentation
Application is hosted and executes remotely, and the application's UI data is transmitted to the client
Locally-installed agent on the client manages the exchange of UI information with the user's remote application session
Application streaming
Application-specific data is transmitted in portions to clients for local execution
Requires locally-installed agent, client software, or web browser plugin

Desktop Virtualization
Technology that decouples the OS, applications, and user state from a physical compute system to create a virtual
desktop environment that can be accessed from any client device.

Storage Virtualization
Abstracts physical storage resources to create virtual storage resources:
Virtual volumes
Virtual disk files
Virtual storage systems

Network Virtualization
Abstracts physical network resources to create virtual network resources:
Virtual switch
Virtual LAN
Virtual SAN

Software-Defined Data Center (SDDC)


An architectural approach to IT infrastructure that extends virtualization concepts such as abstraction, pooling, and
automation to all of the data center's resources and services to achieve IT as a service.

Software-Defined Controller
Discovers underlying resources and provides an aggregated view of resources
Abstracts the underlying hardware resources and pools them
Enables the rapid provisioning of resources based on pre-defined policies

Benefits of Software-Defined Architecture


Agility, cost efficiency, improved control, centralized management, flexibility

Week 2

Intelligent Storage System


A feature-rich RAID array that provides highly optimized I/O processing capabilities.

Components of Intelligent Storage System


Two key components of an ISS:
Controller
Block-based, file-based, object-based, or unified
Storage
All HDDs, all SSDs, or a combination of both

Disk service time = seek time + rotational latency + data transfer time

Seek Time
Time taken to position the read/write head

Rotational Latency
The time taken by the platter to rotate and position the data under the R/W head
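The service-time formula can be checked with a small worked calculation. This sketch assumes the usual conventions that average rotational latency is half a revolution and that data transfer time is the block size divided by the drive's transfer rate (treating 1 MB as 1024 KB); the drive figures in the example are hypothetical.

```python
def disk_service_time_ms(seek_ms, rpm, block_kb, transfer_mb_per_s):
    """Disk service time = seek time + rotational latency + data transfer time (ms)."""
    # Average rotational latency: half a revolution, i.e. 0.5 * (60,000 ms / rpm).
    rotational_latency_ms = 0.5 * 60_000 / rpm
    # Data transfer time: block size / transfer rate, assuming 1 MB = 1024 KB.
    transfer_ms = block_kb / (transfer_mb_per_s * 1024) * 1000
    return seek_ms + rotational_latency_ms + transfer_ms


# Hypothetical drive: 5 ms average seek, 15,000 rpm, 32 KB I/O, 40 MB/s transfer rate.
print(round(disk_service_time_ms(5, 15_000, 32, 40), 3))  # 7.781
```

At 15,000 rpm one revolution takes 4 ms, so the average rotational latency is 2 ms; seek time dominates the total for small random I/Os like this one.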

RAID
A technique that combines multiple disk drives into a logical unit (RAID set) and provides protection, performance, or
both.

RAID Levels
Commonly used RAID levels are:
RAID 0 Striped set with no fault tolerance
RAID 1 Disk mirroring
RAID 1+0 Nested RAID (striping and mirroring)
RAID 3 Striped set with parallel access and dedicated parity disk
RAID 5 Striped set with independent disk access and distributed parity
RAID 6 Striped set with independent disk access and dual distributed parity
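One practical way to compare these levels is by usable capacity. The sketch below encodes the commonly cited rules (RAID 0 uses all drives, mirrored levels keep half, RAID 3 and RAID 5 give up one drive's worth of capacity to parity, RAID 6 gives up two); it assumes two-way mirroring for RAID 1 and RAID 1+0, and the minimum drive counts shown are the commonly stated ones.

```python
# Level -> (minimum drives, number of drives' worth of user data for n drives)
RAID_RULES = {
    "0": (2, lambda n: n),         # striping only, no redundancy
    "1": (2, lambda n: n // 2),    # two-way mirrored pairs (assumption)
    "1+0": (4, lambda n: n // 2),  # striped set of mirrored pairs
    "3": (3, lambda n: n - 1),     # one dedicated parity drive
    "5": (3, lambda n: n - 1),     # one drive's worth of distributed parity
    "6": (4, lambda n: n - 2),     # two drives' worth of distributed parity
}


def usable_capacity_gb(level, drives, drive_size_gb):
    min_drives, data_drives = RAID_RULES[level]
    if drives < min_drives:
        raise ValueError(f"RAID {level} needs at least {min_drives} drives")
    if level in ("1", "1+0") and drives % 2:
        raise ValueError("mirrored levels need an even number of drives")
    return data_drives(drives) * drive_size_gb


print(usable_capacity_gb("5", 5, 1000))  # 4000
print(usable_capacity_gb("6", 6, 1000))  # 4000
```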

RAID Techniques
*Striping_ Disk striping is the process of dividing a body of data into blocks and spreading the data blocks across multiple
storage devices, such as hard disks or solid-state drives (SSDs).

*Mirroring_ Disk mirroring, also known as RAID 1, is the replication of data to two or more disks. Disk mirroring is a good
choice for applications that require high performance and high availability, such as transactional applications, email and
operating systems.

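*Parity_ In computers, parity (from the Latin paritas, meaning equal or equivalent) is a technique that checks whether
data has been lost or written over when it is moved from one place in storage to another or when it is transmitted
between computers. In RAID, parity is computed across the strips of each stripe (typically with XOR) so that the data on a
failed drive can be rebuilt from the surviving drives.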
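Striping and parity work together in RAID 3 and RAID 5. The sketch below splits data into strips, spreads each stripe across a set of data disks, appends an XOR parity strip, and then rebuilds a lost strip from the survivors. For simplicity it keeps parity in the last position of every stripe (as with RAID 3's dedicated parity disk); RAID 5 would rotate the parity position from stripe to stripe. Strip size and disk count are illustrative assumptions.

```python
from functools import reduce


def xor_blocks(blocks):
    # Byte-wise XOR of equal-length blocks.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data, data_disks, strip_size):
    """Return a list of stripes; each stripe is [strip_0, ..., strip_n-1, parity]."""
    stripe_bytes = data_disks * strip_size
    padded = data + b"\x00" * (-len(data) % stripe_bytes)  # pad the last stripe
    stripes = []
    for start in range(0, len(padded), stripe_bytes):
        strips = [padded[start + i * strip_size:start + (i + 1) * strip_size]
                  for i in range(data_disks)]
        stripes.append(strips + [xor_blocks(strips)])
    return stripes


def rebuild(stripe, failed_index):
    # Any single missing strip equals the XOR of all surviving strips (data and parity).
    survivors = [s for i, s in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


stripes = stripe_with_parity(b"ABCDEFGHIJKL", data_disks=3, strip_size=2)
print(stripes[0][:3])         # [b'AB', b'CD', b'EF']
print(rebuild(stripes[0], 1))  # b'CD'
```

A single parity strip can recover only one failed drive per stripe; RAID 6 adds a second, independently computed parity so that two concurrent failures can be tolerated.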

Types of Intelligent Storage Systems


Block-based storage systems
File-based storage systems
Object-based storage systems
Unified storage systems

Scale-up Vs. Scale-out Architecture

Scale-up can solve a capacity problem without adding infrastructure elements such as network connectivity. However, it
does require additional space, power, and cooling. Scaling up does not add controller capabilities to handle additional host
activities. That means it doesn't add costs for extra control functions either.

Scale-out storage usually requires additional storage systems (called nodes) to add capacity and performance. Or, in the case of
monolithic storage systems, it scales by adding more functional elements (usually controller cards). One difference
between scaling out and just putting more storage systems on the floor is that scale-out storage continues to be
represented as a single system.

What is a Block-based Storage System?


In a block-level storage device, raw storage volumes are created, and then the server-based operating system
connects to these volumes and uses them as individual hard drives. ... File-level storage devices are often used
to share files with users.

Cache Management: Algorithms
Least recently used (LRU) Discards data that have not been accessed for a long time
Most recently used (MRU) Discards data that have been most recently accessed
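The difference between the two eviction policies is easiest to see side by side. This is a minimal sketch using Python's OrderedDict to track recency; the keys stand in for cache pages, and a real storage cache would also have to flush dirty pages to disk before discarding them, which is omitted here.

```python
from collections import OrderedDict


class Cache:
    """Fixed-size cache that evicts by LRU or MRU policy (illustrative sketch)."""

    def __init__(self, capacity, policy="LRU"):
        if policy not in ("LRU", "MRU"):
            raise ValueError("policy must be 'LRU' or 'MRU'")
        self.capacity = capacity
        self.policy = policy
        self.pages = OrderedDict()  # ordered from least to most recently used

    def get(self, key):
        if key not in self.pages:
            return None  # cache miss
        self.pages.move_to_end(key)  # mark as most recently used
        return self.pages[key]

    def put(self, key, value):
        if key in self.pages:
            self.pages.move_to_end(key)
        elif len(self.pages) >= self.capacity:
            # LRU discards the oldest entry; MRU discards the newest one.
            self.pages.popitem(last=(self.policy == "MRU"))
        self.pages[key] = value


for policy in ("LRU", "MRU"):
    cache = Cache(2, policy)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")          # "a" is now the most recently used page
    cache.put("c", 3)       # forces an eviction
    print(policy, list(cache.pages))
# LRU ['a', 'c']
# MRU ['b', 'c']
```

MRU can pay off when recently read data is unlikely to be needed again soon, for example during a large sequential read.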

Cache Management: Watermarking


Three modes of flushing to manage cache utilization are:
Idle flushing
High watermark flushing
Forced flushing
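As these modes are commonly described, idle flushing runs continuously at a modest rate while cache utilization sits between the low and high watermarks, high watermark flushing is activated when utilization reaches the high watermark, and forced flushing occurs when the cache is full (for example, during a large I/O burst); below the low watermark the system stops flushing. The sketch below encodes that decision. The 40% and 80% thresholds are hypothetical defaults, since actual values are configured per storage system.

```python
def flush_mode(utilization_pct, low_wm=40, high_wm=80):
    """Pick a cache flushing mode from the current cache utilization (percent)."""
    if utilization_pct >= 100:
        return "forced"          # cache full: flush aggressively, at the cost of I/O response time
    if utilization_pct >= high_wm:
        return "high watermark"  # dedicate extra resources to flushing
    if utilization_pct >= low_wm:
        return "idle"            # flush continuously at a modest rate
    return "none"                # below the low watermark: stop flushing


for level in (20, 50, 85, 100):
    print(level, flush_mode(level))
```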

Cache Data Protection


Cache mirroring
Provides protection to data against cache failure
Each write to the cache is held in two different memory locations on two independent memory cards
Cache vaulting
Provides protection to data against power failure
In the event of power failure, uncommitted data is dumped to a dedicated set of drives called vault drives

Storage Provisioning
The process of assigning storage resources to compute systems based on capacity, availability, and performance
requirements.

Can be performed in two ways:
Traditional storage provisioning
Virtual storage provisioning

MetaLUN
A method to expand LUNs that require additional capacity or performance.

MetaLUNs can either be concatenated or striped
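The two metaLUN types differ in how a logical block address (LBA) on the metaLUN maps to a component LUN. In a concatenated metaLUN, blocks fill the base LUN first and then each added LUN in turn, which adds capacity only; in a striped metaLUN, data is spread across all component LUNs, which adds both capacity and performance. This sketch assumes equal-sized components for striping and a hypothetical stripe element size measured in blocks.

```python
def concatenated_lookup(lba, component_sizes):
    """Map a metaLUN LBA to (component index, LBA within that component)."""
    for index, size in enumerate(component_sizes):
        if lba < size:
            return index, lba
        lba -= size  # skip past this component's blocks
    raise ValueError("LBA beyond metaLUN capacity")


def striped_lookup(lba, component_count, stripe_element_blocks):
    """Map a metaLUN LBA when stripe elements rotate across equal-sized components."""
    element, offset = divmod(lba, stripe_element_blocks)
    lun_index = element % component_count   # which component holds this element
    row = element // component_count        # how many full stripes precede it
    return lun_index, row * stripe_element_blocks + offset


print(concatenated_lookup(150, [100, 200]))  # (1, 50)
print(striped_lookup(10, 2, 4))              # (0, 6)
```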

Traditional Provisioning Vs. Virtual Provisioning


Virtual storage provisioning, also known as thin provisioning, enables creating and presenting a LUN with more capacity
than is physically allocated to it on the storage system. A LUN created using virtual provisioning is called a thin LUN to
distinguish it from a traditional LUN.
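The key behavior of a thin LUN is that physical space is drawn from a shared pool only when data is first written. The classes below are a hypothetical sketch of that idea: the 1 GB extent size is an assumption, and real systems allocate in vendor-specific extent sizes and raise alerts well before a pool runs out.

```python
class StoragePool:
    """Shared physical capacity, counted in fixed-size extents."""

    def __init__(self, physical_extents):
        self.free_extents = physical_extents

    def allocate(self):
        if self.free_extents == 0:
            raise RuntimeError("storage pool exhausted")
        self.free_extents -= 1


class ThinLUN:
    def __init__(self, pool, reported_gb, extent_gb=1):
        self.pool = pool
        self.reported_gb = reported_gb  # capacity presented to the compute system
        self.extent_gb = extent_gb
        self.allocated = set()          # indices of extents backed by physical space

    def write(self, offset_gb):
        if not 0 <= offset_gb < self.reported_gb:
            raise ValueError("write beyond reported LUN capacity")
        extent = int(offset_gb // self.extent_gb)
        if extent not in self.allocated:
            self.pool.allocate()        # physical space is consumed only on first write
            self.allocated.add(extent)

    @property
    def consumed_gb(self):
        return len(self.allocated) * self.extent_gb


pool = StoragePool(physical_extents=50)          # 50 GB of physical capacity
lun_a, lun_b = ThinLUN(pool, 100), ThinLUN(pool, 100)  # 200 GB presented in total
for offset in (0, 0.5, 10):
    lun_a.write(offset)
print(lun_a.consumed_gb, lun_b.consumed_gb, pool.free_extents)  # 2 0 48
```

Presenting more capacity than the pool physically holds (oversubscription) is what makes thin provisioning efficient, and also why pool utilization must be monitored.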

LUN Masking
A process that provides data access control by defining which LUNs a compute system can access.
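Conceptually, LUN masking is a lookup table on the storage system from a host's initiator identity to the set of LUNs it may see. In a Fibre Channel SAN the initiator is typically identified by its WWN; the initiator names and LUN numbers below are hypothetical placeholders.

```python
# Hypothetical masking table: initiator identifier -> LUNs that initiator may access.
MASKING_VIEW = {
    "host-a-hba0": {0, 1},
    "host-b-hba0": {2},
}


def can_access(initiator, lun):
    # Initiators missing from the table see no LUNs at all.
    return lun in MASKING_VIEW.get(initiator, set())


def visible_luns(initiator):
    return sorted(MASKING_VIEW.get(initiator, set()))


print(visible_luns("host-a-hba0"))    # [0, 1]
print(can_access("host-b-hba0", 1))   # False
```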

Storage Tiering
A technique of establishing a hierarchy of storage types and identifying the candidate data to relocate to the appropriate
storage type to meet service level requirements at a minimal cost.

Tiering in block-based storage systems


LUN and sub-LUN tiering
Cache tiering
Server flash-caching

LUN and Sub-LUN Tiering
LUN tiering Moves entire LUN from one tier to another
Does not give effective cost and performance benefits

Sub-LUN tiering A LUN is broken down into smaller segments and tiered at that level
Provides effective cost and performance benefits
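Sub-LUN tiering is more effective because only the busy parts of a LUN need to occupy expensive SSD capacity. The sketch below ranks a LUN's segments by an activity measure and places the busiest ones on the SSD tier. Using per-segment I/O counts over a monitoring window, and a fixed number of SSD slots, are assumptions for illustration; real systems use vendor-specific statistics and relocation schedules.

```python
def plan_sub_lun_tiering(segment_io_counts, ssd_segments):
    """Return (segments for the SSD tier, segments for the HDD tier)."""
    ranked = sorted(range(len(segment_io_counts)),
                    key=lambda s: segment_io_counts[s], reverse=True)
    hot = set(ranked[:ssd_segments])   # busiest segments go to the fast tier
    cold = set(ranked[ssd_segments:])  # the rest stay on (or move to) HDDs
    return hot, cold


# Hypothetical LUN with four segments; only two fit on the SSD tier.
hot, cold = plan_sub_lun_tiering([5, 900, 3, 400], ssd_segments=2)
print(sorted(hot), sorted(cold))  # [1, 3] [0, 2]
```

With whole-LUN tiering, the same workload would force all four segments onto SSDs (or none), even though two of them are nearly idle.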

Cache Tiering
Enables creation of a large capacity secondary cache using SSDs
Enables tiering between DRAM cache and SSDs (secondary cache)
Most reads are served directly from high performance tiered cache

Benefits
Enhances performance during peak workload
Non-disruptive and transparent to applications

Server Flash-caching Technology


Uses intelligent caching software and PCIe flash card on compute system
Dramatically improves application performance
Provides performance acceleration for read intensive workloads
Avoids network latencies associated with I/O access to the storage system

What is NAS?
An IP-based, dedicated, high-performance file sharing and storage device.

Enables NAS clients to share files over IP network


Uses specialized operating system that is optimized for file I/O
Enables both UNIX and Windows users to share data

General Purpose Servers vs. NAS Devices

Components of NAS System


Controller/NAS head consists of:
CPU, memory, network adaptor, and so on
Specialized operating systems installed
Storage
Supports different types of storage devices
Scalability of the components depends on the NAS architecture:
Scale-up NAS
Scale-out NAS

Scale-up NAS
Provides capability to scale capacity and performance of a single NAS system
NAS systems have a fixed capacity ceiling
Performance may degrade after reaching the capacity limit

Scale-out NAS
Pools multiple nodes in a cluster to work as a single NAS device
Scales performance and/or capacity non-disruptively
Creates a single file system that runs on all nodes in the cluster
Clients, connected to any node, can access the entire file system
File system grows dynamically as nodes are added

NAS File Access Methods


Common Internet File System (CIFS) _ Enables clients to access files that are on a server over TCP/IP
Network File System (NFS) _ Enables clients to access files that are on a server
Uses Remote Procedure Call (RPC) mechanism to provide access to remote file system
Currently, three versions of NFS are in use:
NFS v2 is stateless and uses UDP as transport layer protocol
NFS v3 is stateless and uses UDP or optionally TCP as transport layer protocol
NFS v4 is stateful and uses TCP as transport layer protocol

Hadoop Distributed File System (HDFS)

HDFS is a distributed file system that provides high-performance access to data across Hadoop clusters. Like other Hadoop-
related technologies, HDFS has become a key tool for managing pools of big data and supporting big data analytics
applications.

What is File-level Virtualization?


Eliminates dependency between data accessed at the file-level and the location where the files are physically stored
Enables users to use a logical path, rather than a physical path to access files
Uses global namespace that maps logical path of file resources to their physical path
Provides non-disruptive file mobility across file servers or NAS devices
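The global namespace is essentially a mapping from logical paths to physical locations, and non-disruptive mobility follows from updating only that mapping. The sketch below is hypothetical: the server names and paths are placeholders, and the actual data copy between file servers is assumed to have already happened before the mapping is switched.

```python
class GlobalNamespace:
    """Maps logical file paths to (file server, physical path)."""

    def __init__(self):
        self._map = {}

    def add(self, logical_path, server, physical_path):
        self._map[logical_path] = (server, physical_path)

    def resolve(self, logical_path):
        # Clients always ask for the logical path; the namespace supplies the location.
        return self._map[logical_path]

    def migrate(self, logical_path, new_server, new_physical_path):
        # After the data is copied, only the mapping changes; clients keep the same logical path.
        self._map[logical_path] = (new_server, new_physical_path)


ns = GlobalNamespace()
ns.add("/corp/finance/q1.xlsx", "nas1", "/vol1/fin/q1.xlsx")
ns.migrate("/corp/finance/q1.xlsx", "nas2", "/vol7/q1.xlsx")
print(ns.resolve("/corp/finance/q1.xlsx"))  # ('nas2', '/vol7/q1.xlsx')
```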

File-level Storage Tiering
Moves files from higher tier to lower tier
Storage tiers are defined based on cost, performance, and availability parameters
Uses policy engine to determine the files that are required to move to the lower tier
Predominant use of file tiering is archival
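A policy engine for archival tiering can be as simple as selecting files on the top tier that have not been accessed for some period. In this sketch the 90-day threshold, the tier names, and the FileInfo record are hypothetical; real policy engines can also match on file type, size, or ownership.

```python
import time
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    last_access: float  # seconds since the epoch
    tier: str = "tier1"


def select_for_archival(files, max_idle_days=90, now=None):
    """Return paths of tier-1 files idle longer than max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86_400
    return [f.path for f in files if f.tier == "tier1" and f.last_access < cutoff]


DAY = 86_400
files = [FileInfo("a.log", last_access=0), FileInfo("b.doc", last_access=50 * DAY)]
print(select_for_archival(files, now=100 * DAY))  # ['a.log']
```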

Week 3

Object-based Storage Device


Stores data in the form of objects on a flat address space based on their content and attributes rather than their name and
location.

Object contains user data, related metadata, and user-defined attributes


Objects are uniquely identified using object ID
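A flat address space means every object sits in one namespace and is reached only by its object ID, with no directory hierarchy. One common approach, assumed in this sketch, is to derive the ID by hashing the object's content (here SHA-256 via Python's hashlib); the class and field names are hypothetical.

```python
import hashlib


class FlatObjectStore:
    """Single flat namespace: object ID -> object (data, metadata, user-defined attributes)."""

    def __init__(self):
        self._objects = {}

    def put(self, data, attributes=None):
        object_id = hashlib.sha256(data).hexdigest()  # ID derived from the content itself
        self._objects[object_id] = {
            "data": data,
            "metadata": {"size": len(data)},          # system-generated metadata
            "attributes": dict(attributes or {}),     # user-defined attributes
        }
        return object_id

    def get(self, object_id):
        return self._objects[object_id]


store = FlatObjectStore()
oid = store.put(b"x-ray", {"patient_id": "123"})
print(store.get(oid)["attributes"])  # {'patient_id': '123'}
```

Because the ID depends only on content, storing identical data again yields the same ID, which some object stores use for deduplication and integrity checking.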

Hierarchical File System vs. Flat Address Space


Hierarchical file system organizes data in the form of files/directories
Limits the number of files that can be stored
OSD uses flat address space that enables storing large number of objects
Enables the OSD to meet the scale-out storage requirement of third platform

OSD system typically comprises three key components:

OSD nodes (controllers)


Internal network
Storage

Key Features of OSD

Software-based Object Storage
Object storage software is installed on any compatible hardware
Provides the flexibility to reuse the existing infrastructure (compute and storage) and to use commodity hardware
Object storage software can also be installed on virtual machines

Hardware-based Object Storage

Both software and purpose-built hardware are provided by vendor


Typically pre-configured and pre-tested
Provides better performance
