
Demystifying vSAN Management for the Traditional Storage Administrator

Fuji Setio
Solution Engineer – HCI Specialist
VMware Indonesia



Full-Stack HCI Transforms IT Operations

Compute
• Extend virtualization to storage using existing tools
• Consistent app performance from native architecture

Networking
• Enhance cluster security with micro-segmentation
• Automate infrastructure with policy-based management

Management
• Cross-cluster rebalancing for consistent performance
• vSAN overview, capacity, and troubleshooting dashboards reduce time-to-resolution

Hybrid Cloud
• Seamlessly extend workloads to the cloud
• Rapidly adopt public cloud with similar processes and tooling


Innovation Drives HCI Industry Leadership

History of Innovation
• 2019: 20,000+ customers; IDC #1 in HCI software revenue in Q4 2018
• 2017: Pivotal Container Services
• 2017: VMware Cloud on AWS
• 2016: VMware Cloud Foundation & VxRack
• 2016: VxRail
• 2014: vSAN
• 2014: vRealize Suite
• 2013: NSX
• 2011: SDDC vision

Source: IDC's Q4 2018 Worldwide Quarterly Converged Systems Tracker
HCI and Traditional Storage
Similar, but best to avoid flying one like the other


Agenda

Comparing vSAN to Traditional Storage
The logical and physical differences

Applying Proper Design Principles to vSAN
How traditional storage principles apply to vSAN design

Understand New Operational Procedures
Adjusting operational procedures for vSAN

vSAN: What's Next
To Infinity… and Beyond!


Comparing vSAN to Traditional Storage
The logical and physical differences


Distributed Storage Systems Compared to Monolithic Storage
Differences begin to show as demand and scale increase

Monolithic storage systems (traditional)
• Logical and physical architecture challenges
• I/O loses intelligence when it leaves the host
• Odd things can happen at scale

vSAN (HCI)
• Treats storage as a cluster resource
• More reliant on the network fabric


Enterprise Storage in a Native vSphere Architecture
VMware vSAN – integrated into hypervisor

• Pools HDDs/SSDs into a single cluster-wide shared datastore
• Presents a single datastore for the entire cluster
• Uses an object store instead of a clustered file system
• Availability and performance requirements are set using storage policies


vSAN Performance on Hardware
The cost/performance pyramid of storage device types

Device tiers, from highest to lowest cost/performance:
• 3D XPoint cache / NVMe capacity
• All NVMe
• NVMe cache / SAS capacity
• All SAS
• NVMe cache / SATA capacity
• SAS cache / SATA capacity
• All SATA

• All-flash: more predictable and responsive than hybrid
• NVMe: fastest, simple (no external controller), and low CPU overhead
• SATA protocol is 1-to-1 and locks the bus; avoid if possible
• Storage controllers can be a bottleneck; limited support for SAS expanders (due to performance)
Modern Object Based Storage for vSphere
vSAN objects and components

Objects and object types
• The vSAN datastore is an object store
• Object types include the VM Home namespace, VM swap, VMDKs, and snapshot deltas

Components
• Each object is made up of one or more components
• Data (components) is distributed across the cluster based on the VM storage policy
• A storage policy can be assigned to many VMs, a single VM, or a single VMDK

Example: a 700 GB VMDK with a RAID-1, FTT=1 policy is stored as two replica components plus a witness component (see the placement sketch below).
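To make the object/component relationship concrete, here is a minimal Python sketch. It is not vSAN's actual placement algorithm; the function and host names are purely illustrative.

```python
# Illustrative sketch (not vSAN's placement logic): lay out a RAID-1, FTT=1
# object as two replica components plus a witness, each on a different host.
def place_mirrored_object(hosts, vmdk_name):
    """Return a component -> host mapping for a RAID-1, FTT=1 object."""
    if len(hosts) < 3:
        raise ValueError("RAID-1 with FTT=1 needs at least 3 hosts (2 replicas + witness)")
    components = [f"{vmdk_name}-replica-1", f"{vmdk_name}-replica-2", f"{vmdk_name}-witness"]
    return dict(zip(components, hosts))   # one component per distinct host

if __name__ == "__main__":
    layout = place_mirrored_object(["esx01", "esx02", "esx03", "esx04"], "700GB-vmdk")
    for component, host in layout.items():
        print(f"{component:>22} -> {host}")
```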


Performance Related Policies
Number of disk stripes per object

# Stripes per object = 2 (example)

• Defines the minimum number of capacity devices across which each replica is distributed (RAID-0 within the replica)
• May result in better performance during destaging and when fetching uncached reads
• May limit the ability of objects to stay compliant with their storage policies
• To be used with care

In the example, each RAID-1 replica is striped across two capacity devices (stripe 1a/1b and stripe 2a/2b) with a witness, spread across hosts ESX01–ESX04; a toy illustration follows below.
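A toy illustration of what a stripe width of 2 implies for a single replica. The function is hypothetical and only mirrors the layout idea, not vSAN's placement logic.

```python
# Minimal sketch: a replica is split (RAID-0 style) across `stripe_width`
# capacity devices. Device names are made up for illustration.
def stripe_replica(replica_size_gb, stripe_width, devices):
    if stripe_width > len(devices):
        raise ValueError("not enough capacity devices to satisfy the stripe width")
    chunk = replica_size_gb / stripe_width
    return {devices[i]: chunk for i in range(stripe_width)}

print(stripe_replica(700, 2, ["esx01-disk1", "esx01-disk2"]))
# {'esx01-disk1': 350.0, 'esx01-disk2': 350.0}
```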


The Different Types of Storage Traffic in vSAN
Understanding the difference between front end traffic and back end traffic

Front end storage traffic
• Guest VM storage I/O traffic

Back end storage traffic
• Replica traffic
• Data resynchronizations from:
  – Object policy changes
  – Host or disk group evacuations
  – Object or component rebalancing
  – Object or component repairs


Applying Proper Design Principles to vSAN
Applying traditional storage principles to vSAN design


Good Infrastructure Practices Also Apply to vSAN
Successful vSAN deployments depend on properly designed environments

• Existing environments may NOT be optimally configured
• An environment can be functional but fragile when it does not adhere to recommended practices
• The network IS the storage fabric, and it is critical with HCI
• Resiliency or space-efficiency settings can impact performance

Recommended practices include:
• Redundant switch fabric
• Redundant host uplinks
• New IP address range for vSAN IPs
• vSAN segregation on switches
• Good vSwitch design
• Etc.


Influencing Factors of vSAN Performance in an Infrastructure
Some factors are easier to address than others

• Physical architecture will impact vSAN performance
• vSAN/ESXi configuration will impact performance
• The distributed architecture places connectivity at a premium
• Workloads may place unique stresses on a system that will expose bottlenecks

Layers involved: applications (multi-tier, business-critical, next-gen, legacy), software (vSphere/vSAN hypervisor), and hardware (compute, storage, network)


vSAN – A Different Way of Scaling
Scale UP and OUT for maximum agility

Add capacity the way you want
• Scale UP by adding drives
• Scale OUT by adding hosts

• There are pros and cons to both approaches
• Some functionality is dependent on cluster host count


Easily Define Desired Outcome for your Applications
SPBM – Creating a storage policy

• Policies define levels of protection and performance for VMs or VMDKs
• Managed by vCenter
• Can be used in one or more clusters
• Expose storage capabilities specific to the storage provider (e.g. vSAN)


Easily Assign Desired Outcome to One or More VMs
SPBM – Assign a storage policy and check compliance

Apply a policy, then view the result:
• Policies are applied to a VM or VMDK, not an entire array
  – Unlike traditional storage
  – Prescriptive
• Change existing policies, or apply new policies, on the fly
• Easily view when a VM or VMDK is compliant with a new policy


Options with Data Protection Levels and Schemes
Basic concepts around the “failure tolerance method” and the level of “failures to tolerate”

Failure Tolerance Method (FTM) defines the data layout approach:
• Mirroring (RAID-1)
• Erasure Coding (RAID-5/6)

Level of Failures to Tolerate (FTT) defines the level of resilience:
• FTT=1 – accessible with one failure
• FTT=2 – accessible with two failures
• FTT=3 – accessible with three failures

Examples: FTM = Mirroring (RAID-1) with FTT=1 keeps two full copies of the data; FTM = Erasure Coding (RAID-5/6) with FTT=1 stripes data and parity across hosts (RAID-5).


Object Based Storage – A Better Way for Data Protection
Setting failures to tolerate (FTT) to 1 with RAID-1 mirroring

RAID-1 mirroring
• An alternative FTM to RAID-5/6 erasure coding
• Data is mirrored to another host
• A witness is needed to determine quorum
• Requires fewer hosts, but not as space efficient as RAID-5/6
• Additional hosts are needed to support greater than FTT=1 or maintenance operations


Object Based Storage – A Better Way for Data Protection
Setting failures to tolerate (FTT) to 1 with RAID-5 erasure coding

RAID-5 erasure coding
• An alternative to RAID-1 mirroring
• Data with parity is striped across hosts
• For erasure coding, FTT=1 implies RAID-5 and FTT=2 implies RAID-6
• Guaranteed space reduction
  – 30% savings with RAID-5
  – 50% savings with RAID-6
• Additional hosts are needed to support greater than FTT=1 or maintenance operations
Object Based Storage – A Better Way for Data Protection
Setting failures to tolerate (FTT) to 2 with RAID-6 erasure coding

RAID-6 erasure coding
• Two hosts can fail without data loss (implied FTT=2)
• 6 hosts minimum
• 7+ hosts desired to recover the resiliency level upon failure
• 50% reduction in overhead compared to mirroring (worked through in the sketch below)
  – A 20 GB disk consumes 60 GB with RAID-1, FTT=2 (3x)
  – A 20 GB disk consumes 30 GB with RAID-6, FTT=2 (1.5x)
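The capacity math on these protection slides can be sanity-checked with a small Python sketch. The multipliers follow the slides (RAID-1 stores FTT+1 full copies, RAID-5 is ~1.33x, RAID-6 is 1.5x); witness overhead is ignored and the function name is mine.

```python
# Back-of-the-envelope raw capacity needed for a given FTM/FTT combination.
def raw_capacity_needed(used_gb, ftm, ftt):
    if ftm == "mirroring":                     # RAID-1
        return used_gb * (ftt + 1)
    if ftm == "erasure-coding":
        if ftt == 1:                           # RAID-5: 3 data + 1 parity
            return used_gb * 4 / 3
        if ftt == 2:                           # RAID-6: 4 data + 2 parity
            return used_gb * 1.5
    raise ValueError("unsupported FTM/FTT combination")

print(raw_capacity_needed(20, "mirroring", 2))       # 60 GB, as on the slide
print(raw_capacity_needed(20, "erasure-coding", 2))  # 30.0 GB, as on the slide
```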


vSAN Cluster Host Count Matters
Levels of resilience depend on quantity of hosts within a vSAN cluster

• Data services and functionality are dependent on cluster host count
• Ensure the cluster is sized for N+1 or N+2 in compute and storage capacity
• Settings are applied per VM or per VMDK, thanks to SPBM
• Host count should be a part of your cluster sizing decision process

Policies whose host-count requirements differ: RAID-1 FTT=1, RAID-1 FTT=2, RAID-1 FTT=3, RAID-5 (FTT=1), RAID-6 (FTT=2) – see the sizing aid below.
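As a rough sizing aid, the commonly documented minimum host counts per policy can be captured in a lookup (2×FTT+1 for mirroring, 4 for RAID-5, 6 for RAID-6). Treat these numbers as assumptions to verify against current VMware documentation; the N+1 spare-host idea follows the bullet above.

```python
# Commonly cited minimum host counts per policy; verify against VMware docs.
MIN_HOSTS = {
    ("RAID-1", 1): 3,
    ("RAID-1", 2): 5,
    ("RAID-1", 3): 7,
    ("RAID-5", 1): 4,
    ("RAID-6", 2): 6,
}

def hosts_required(raid, ftt, spare_hosts=1):
    """Policy minimum plus N+1 (or N+2) spare hosts for rebuild headroom."""
    return MIN_HOSTS[(raid, ftt)] + spare_hosts

print(hosts_required("RAID-6", 2))   # 7: matches the "7+ hosts desired" guidance
```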


Guidance – Infrastructure Hardware
Performance of holistic solution comes from discrete hardware

Understand trade-offs
• What is most important? (Performance, resiliency, cost)
• Understand the options with newer technologies (NVMe, 3D XPoint, 25GbE)

Look at the infrastructure holistically
• Device types, host count, network speed, etc.

Test on HCL-approved hardware only!


Understand New Operational Procedures
Adjusting operational procedures for vSAN


HCI Solutions Are Not Created Equal

Other HCI Solutions
• Resource-intensive storage controller VMs on every host
• More hops, context switching, queues, and locks
• Host CPU and I/O amplification
• Fewer VMs per host, less consistent performance

Experience the Unique Benefits of vSphere Native HCI

VMware vSAN
• vSAN embedded into vSphere
• Simplified, efficient I/O path
• Minimal host CPU and I/O amplification
• More VMs per host, with consistent performance
• Awareness of the hypervisor
• Single-stack management

Higher consolidation, consistency, and security = fewer IT headaches
Free (Slack) Space Is Critical to a vSAN Environment
vSAN relies on free capacity for ongoing redistribution of object components

Two reasons for free space:
• Slack space for policy changes and other component movement (~30%)
• Hot spare capacity for N+1 or N+2 (beyond slack space needs)

Actions that consume space:
• Changing a policy
• Host/disk group evacuations
• Repairs/rebuilds
• Rebalancing
• On-disk format changes

A rough sizing sketch follows below.
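A back-of-the-envelope usable-capacity estimate, assuming ~30% slack (per the slide) and one host's worth of raw capacity held back as N+1 hot-spare headroom. The ratios and function name are illustrative only; actual guidance depends on vSAN version and cluster design.

```python
# Usable capacity before FTT overhead, after slack and hot-spare reservations.
def usable_capacity_tb(hosts, raw_tb_per_host, slack_ratio=0.30, spare_hosts=1):
    raw = hosts * raw_tb_per_host
    hot_spare = spare_hosts * raw_tb_per_host      # N+1 rebuild headroom
    return (raw - hot_spare) * (1 - slack_ratio)   # then keep ~30% slack

# 8 hosts x 10 TB raw: reserve 10 TB for N+1, then 30% slack on the remainder.
print(f"{usable_capacity_tb(8, 10):.1f} TB usable before FTT overhead")  # 49.0 TB
```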


Example of a Storage Policy Change
Illustrating how a policy change can create resync traffic and use temporary space

Temporary space consumption can increase from:
• Changing the Failure Tolerance Method (FTM)
• Changing the Level of Failures to Tolerate (FTT)

Large quantities of VM policy changes can generate:
• Heavy resync traffic
• Temporary space usage

Example sequence (illustrated in the sketch below):
1. Existing object is RAID-1 with FTT=1
2. Assign a RAID-5 policy
3. Building of the new RAID-5 object begins
4. Old components are deleted after the RAID-5 object is built
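A rough sketch of why headroom matters during such a change: the new RAID-5 components are built while the old RAID-1 components still exist, so both footprints briefly coexist. Multipliers follow the earlier protection slides; the numbers and function name are illustrative.

```python
# Worst-case space while a RAID-1 (FTT=1) object is rebuilt as RAID-5 (FTT=1).
def peak_space_during_change(vmdk_gb, old_multiplier=2.0, new_multiplier=4 / 3):
    old_footprint = vmdk_gb * old_multiplier      # RAID-1, FTT=1: 2 full copies
    new_footprint = vmdk_gb * new_multiplier      # RAID-5, FTT=1: ~1.33x
    return old_footprint + new_footprint          # both exist until resync completes

print(f"peak: {peak_space_during_change(100):.0f} GB for a 100 GB VMDK")  # ~333 GB
```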


Impact of VM Storage Policy Changes
Introduces resync traffic and consumes temporary space

• Illustrates how a policy change from RAID-1 to RAID-5 temporarily used more capacity (effective capacity used spikes during the resync event)
• Any resynchronization may temporarily impact dedup & compression ratios
• Changing policies may permanently impact dedup & compression ratios
• Full host evacuations must consume capacity elsewhere


The Benefit of Using Multiple Storage Policies
Take advantage of the power of SPBM

Example policies (expressed as code below):
• Name: General Purpose – FTM: RAID-1, FTT=1
• Name: SQL Servers – FTM: RAID-1, FTT=2
• Name: Dev/Test – FTM: RAID-5, FTT=1, IOPS limit: 1,000
• Name: Default Storage Policy – FTM: RAID-1, FTT=1

• Create multiple policies and apply each to a subset of VMs
• Use whatever naming suits the needs of the environment
• Need to change the policy for a large number of VMs? Consider creating a 2nd policy and moving VMs in batches
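The example policies above, expressed as plain Python data with a toy VM-to-policy mapping. The names, fields, and helper are illustrative only; real policies are created and assigned through vCenter/SPBM.

```python
# Example policies from the slide, as plain data.
POLICIES = {
    "General Purpose":        {"ftm": "RAID-1", "ftt": 1},
    "SQL Servers":            {"ftm": "RAID-1", "ftt": 2},
    "Dev/Test":               {"ftm": "RAID-5", "ftt": 1, "iops_limit": 1000},
    "Default Storage Policy": {"ftm": "RAID-1", "ftt": 1},
}

def policy_for(vm_name):
    """Toy mapping of VM naming conventions to a policy name."""
    if vm_name.lower().startswith("sql"):
        return "SQL Servers"
    if vm_name.lower().startswith(("dev", "test")):
        return "Dev/Test"
    return "General Purpose"

print(policy_for("sql-prod-01"))   # SQL Servers
```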


Space Efficiency Versus Performance
Tradeoffs – performance, protection, and space efficiency

• Standard write amplification applies to vSAN, just as with other systems
• Amplification occurs across hosts
• A granular setting: object-based RAID in vSAN can be set at the VMDK level
• RAID-5/6 requires more computational effort for I/O activities than mirroring

Backend operations per client write/update (tabulated in the sketch below):
• RAID-1, FTT=1: 2 writes
• RAID-1, FTT=2: 3 writes
• RAID-5, FTT=1: 2 reads, 2 writes
• RAID-6, FTT=2: 3 reads, 3 writes
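The amplification figures from this slide as a small lookup table, useful for estimating backend operations from a guest workload. This is an illustrative sketch, not a vSAN API.

```python
# Backend operations triggered per guest write, per the slide.
AMPLIFICATION = {
    ("RAID-1", 1): {"reads": 0, "writes": 2},
    ("RAID-1", 2): {"reads": 0, "writes": 3},
    ("RAID-5", 1): {"reads": 2, "writes": 2},
    ("RAID-6", 2): {"reads": 3, "writes": 3},
}

def backend_ops(raid, ftt, guest_writes):
    amp = AMPLIFICATION[(raid, ftt)]
    return {k: v * guest_writes for k, v in amp.items()}

print(backend_ops("RAID-6", 2, 1000))   # {'reads': 3000, 'writes': 3000}
```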


Deduplication and Compression – how it works
A single feature to offer “opportunistic” space efficiency

Deduplication
• Per disk group
• Occurs when destaging from the cache tier to the capacity tier
• Uses 4 KB fixed blocks

Compression
• Occurs after dedup, prior to data being destaged
• The compressed block is stored if it compresses to <= 2 KB
• Otherwise the full 4 KB block is stored

A conceptual sketch of this destage path follows below.
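A conceptual sketch of the destage decision described above: dedupe on 4 KB fixed blocks within a disk group, compress, and keep the compressed form only if it fits in 2 KB. Real vSAN internals differ; this only mirrors the decision logic on the slide, with made-up names.

```python
import zlib

BLOCK = 4096
seen_fingerprints = set()   # stands in for the per-disk-group dedup map

def destage_block(block: bytes) -> int:
    """Return bytes written to the capacity tier for one 4 KB block."""
    assert len(block) == BLOCK
    fingerprint = hash(block)                 # real systems use a stronger hash
    if fingerprint in seen_fingerprints:
        return 0                              # duplicate: reference only
    seen_fingerprints.add(fingerprint)
    compressed = zlib.compress(block)
    return len(compressed) if len(compressed) <= 2048 else BLOCK

print(destage_block(b"\x00" * BLOCK))   # highly compressible -> small write
print(destage_block(b"\x00" * BLOCK))   # duplicate -> 0
```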


Guidance - Applications
Measure the right way

Metric sources:
• vCenter metrics (short term)
• vRealize Operations (long term)
• Live Optics
• vSAN Support Insight

• Use a common monitoring plane for metrics collection (e.g. vCenter)
• Other tools using VMware APIs provide consistency
• Monitor observed latencies
• Do not compare metrics from different sources
• Sample rates can change the data rendered


Maintenance Mode Operations
Know your options

Three enter maintenance mode (EMM) options:
• Full data migration
• Ensure accessibility
• No data migration

• Follow EMM warnings if present
• If rebooting, view the DCUI for host restart status
  – Host restarts in a vSAN cluster take longer
  – The DCUI is the best way to view status


Keep Cluster Updated with New VUM Integration
Performance, features, and resiliency improved with every release

• ESXi, drivers, and firmware updates are now centralized through vSphere Update Manager (VUM)
• Supports custom ISOs for OEM-specific builds
• Supports vCenter without internet connectivity
• Per-host updating with additional safeguards
• Always update vCenter first

Workflow: the HCL database and release catalog (vmware.com) feed VUM system and custom baselines with drive controller driver and firmware recommendations; 1) update I/O controller firmware and drivers, 2) update software based on the HCL


Prescriptive HCL-Aware vSphere/vSAN Updates
vSphere Update Manager (VUM) integration

• System-managed, read-only baseline
• HCL-aware upgrade recommendation


Enhanced Performance Monitoring
Granular level of performance metrics through the stack

VM/App

Host level

Disk / disk group

Cluster



Managing vSAN with vRealize Operations



Takeaways
Demystifying vSAN for the Traditional Storage Administrator

• Host count impacts performance and the types of data services available
• Make small adjustments to operational procedures, and communicate!
• Treat every node and the network connecting them as your storage system
• When in doubt, go out to storagehub.vmware.com


vSAN 6.7 Pricing and Packaging Summary

New features in vSAN 6.7 and existing features in vSAN 6.6, across the vSAN Standard (STD), Advanced (ADV), and Enterprise (ENT) editions

Features
• Storage Policy-Based Management
• Read/Write SSD Caching
• Distributed RAID
• vSphere Distributed Switch
• vSAN Snapshots & Clones
• Rack Awareness
• Replication (5 min RPO)
• All-Flash Support
• Block Access (iSCSI)
• QoS – IOPS Limits
• Deduplication & Compression (All-Flash only)
• Erasure Coding (All-Flash only)
• vRealize Operations within vCenter
• Stretched Cluster with Local Failure Protection
• Data-at-rest Encryption (FIPS 140-2)

vSAN: What's Next?


The Software Stack is Today’s Most Critical Decision

Edge – Core – Cloud

The Digital Foundation
• Hyper-Converged Infrastructure
• Operations & Automation
• Network Virtualization & Security


Fast Release Cycle Drives HCI Leadership

First with New Hardware Support


• Support Intel Optane NVMe SSDs
• Support Intel Xeon Scalable Processors
• Support for Dell EMC 14G and HPE Synergy

First with New Developer Tools


• Integrated container support (vSAN 6.5)
• Endpoint device management (IoT Pulse)
• Non-disruptive Workload Assessment Tool

First with New Features


• Software encryption solution
• Stretched cluster with local site protection
• Only DISA STIG approval for Federal

EBS-Backed vSAN: vSAN as the Cloud Storage Control Plane

• VMware Cloud on AWS, powered by VMware Cloud Foundation
• vSAN's enterprise-grade data services + EBS cloud elasticity and global reach
• Runs on Amazon EC2 R5.metal instances
• Lowers cost for storage-intensive workloads


vSAN Native Data Protection Beta

Policy-driven, snapshot-based local and remote protection

Overview
• Space-efficient native snapshots
• Unified local and remote protection
• Policy-based management
• Native recovery workflows

Benefits
• Native data protection reduces the capex and opex burden of deploying additional products
• Lower RTOs compared to traditional image-based backup

Targets: primary data center, archival storage (NAS, object, cloud), secondary storage or cloud, and a disaster recovery site via replication policies


vSAN Native File Services Beta

• Extends vSAN from block to file services from the same cluster (NFS, SMB, and block from one datastore)
• Easier capacity planning: no boundaries between data types
• Consistent vSAN data services enabled for file: snapshots, dedup, encryption, …
• One pane of glass for applications that require multi-protocol storage


Storage and Data Management

Self-driving operations across traditional and cloud-native apps, spanning block, file, and object, from edge to on-prem/private cloud to public cloud

Storage features: storage efficiency, encryption, data policies
Data management: data protection, data security, copy data management, data insights, data governance


Use cases for vSAN



Use Case 1 – Technology Refresh from Legacy Architecture


Use Case 2 – Co-Existence Strategy with Legacy and HCI


Use Case 3 – Lower Hardware TCO Refresh



Year 1 – Base Setup

Year 3 – Hardware Refresh
• Tech refresh to newer hardware or to a different hardware brand without any major migration

Year 3 – Re-purpose Production for DR
• Re-purpose from Production
Use Case 4 – Storage Provisioning Efficiency



Storage Provisioning Efficiency
Existing Legacy
1. Gather storage requirements
2. Create a new LUN from the available storage RAID group
3. Provision FC zoning
4. Assign the LUN to the physical host
5. Recognize the new LUN on the physical host
6. Create a new datastore
7. Create a new VM using the new datastore
8. Power on the VM

VMware vSAN
1. Gather storage requirements
2. Create a new VM using a vSAN storage policy
3. Power on the VM
Traditional Storage Provisioning
• Performance and protection are designed at the physical HDD level
  – RAID groups
  – Storage pools, etc.
• Capacity is carved into LUNs and presented as datastores (e.g. Datastore A on a RAID 1 construct, Datastore B on a RAID 5 construct)
• Many VMs inherit the same profile
• Changing performance/protection profiles meant migration
Storage Provisioning Efficiency
• Legacy: 3 different consoles to manage in order to provision storage
• vSAN: the same vCenter console is used to provision storage
vSAN Storage Provisioning: How Does It Work?
• No physical RAID groups – almost like a JBOD
• All disks are collectively combined into a single vSAN datastore
• VMs are deployed onto the datastore
• Storage policies are created based on availability or performance requirements
• Policies can be individually tagged at the granularity of a VM/VMDK
• Ease of scale just by adding to the pool

A minimal sketch of this model follows below.
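A minimal sketch of this provisioning model, using made-up helper names (build_datastore, deploy_vm) purely for illustration; actual provisioning goes through the vCenter UI or SPBM APIs.

```python
# Toy model: pool every host's local disks into one logical datastore, then
# place VMs on it with a per-VM (or per-VMDK) policy tag.
def build_datastore(hosts_disks):
    """Combine each host's local disks into a single logical vSAN datastore."""
    return [disk for disks in hosts_disks.values() for disk in disks]

def deploy_vm(name, policy, datastore):
    """A VM lands on the shared datastore with its own storage policy."""
    return {"vm": name, "policy": policy,
            "placed_on": f"vsanDatastore ({len(datastore)} disks)"}

cluster_disks = {"esx01": ["d1", "d2"], "esx02": ["d1", "d2"], "esx03": ["d1", "d2"]}
vsan_datastore = build_datastore(cluster_disks)
print(deploy_vm("sql-prod-01", "SQL Servers", vsan_datastore))
```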
Thank You
fsetio@vmware.com

©2018 VMware, Inc.
