
Spectrum Scale

Expert Talks

Episode 3:
Spectrum Scale Strategy
Ted Hoover
Program Director Spectrum Scale Development

Wayne Sawdon
CTO for Spectrum Scale and ESS

Join our conversation:


www.spectrumscaleug.org/join
Show notes:
www.spectrumscaleug.org/experttalks
Survey on Spectrum Scale and ESS
PTF Frequency

Please share your feedback:

https://www.surveygizmo.com/s3/5727746/47520248d614

Spectrum Scale Expert Talks / Episode 3 / Spectrum Scale Strategy © 2020 IBM Corporation
The first 20+ years

A history of High Performance Storage, from supercomputers in the national labs to today’s data-intensive workflows and analytics across commercial markets

GPFS has evolved …

• In 1993: Started as the “Tiger Shark” research project at IBM Research Almaden, a high performance file system for accessing and processing multimedia data
• In 1998: Grew up as General Parallel File System (GPFS) to power the world’s largest supercomputers
• In 2014: Rebranded as IBM Spectrum Scale

Although the workload has remained the same (high performance analytics on vast quantities of unstructured data), the product features are more focused on commercial markets. This focus continues to evolve toward today’s cloud and container markets.

[Diagram: GPFS serving POSIX and NFS access on top of NVMe, SSD, disk, and tape]

IBM Spectrum Scale
• GPFS is known for scale-out high performance on the world’s largest supercomputers…
• BUT: If you still just think GPFS, you miss:
– Support for workflows which, for example, inject data via object, analyze results via Hadoop/Spark and view results via POSIX
– Storing and accessing large and small objects (S3 and Swift) with low latency
– Storing and starting OpenStack VMs without copying them from object storage to local file system
– Common namespace between Spectrum Scale clusters on-prem and in the cloud
– Namespace includes Data Management to automatically destage cold data to on premise or off premise tape or object storage
– GUI, REST API, Grafana Bridge
– HA, DR, Real time Audit & Security
– And much, much more

[Diagram: IBM Spectrum Scale serving POSIX, HDFS, NFS, SMB, and Swift/S3 access on top of NVMe, SSD, disk, and tape]
57 of the Global 100 run IBM Spectrum Scale
9 of the top 10 automobile manufacturers
9 of the top 10 investment banks
18 of the top 25 banks
8 of the top 10 global retailers
4 of the top 5 insurance companies

High performance analytics on vast quantities of unstructured data

Strategic Trends

Connected Clouds

DevOps

Inescapable AI

AI Data Management Challenges

Security

Performance

Companies average almost 5 private and public clouds

Reasons to migrate from public cloud:
• Security
• Performance
• Cost
• Control

Hybrid multicloud is the platform (IDC Survey):
• 80% of companies moved their applications or data from public clouds in 2018
• 85% of companies operate in a hybrid multicloud environment today
• 98% of companies will be hybrid multicloud in three years

IDC; IBM IBV C-Suite Study; Rightscale
Source: IDC’s Cloud and AI Adoption Survey, January 2018
Two simultaneous evolutions are taking shape in the data center today:
1. Hybrid multicloud usage
2. Taking advantage of more data for competitive advantage

IBM’s acquisition of Red Hat in July 2019 completely changed the cloud landscape, making IBM the world’s #1 hybrid multicloud provider.
The shift to containers

Evolving Storage Market

Traditional Storage
• Deliver underlying infrastructure needs to support enterprise requirements.
• Centralized administration for organization.
Examples: DS8900, FlashSystem, IBM Spectrum Scale

Container-Ready Storage
• Leverage existing investments in traditional storage to support container deployments.
• Allows use of snapshots, clones, and replication but doesn’t take advantage of container framework and related benefits.
• Not optimized for Kubernetes so can be a bottleneck to achieving increased agility and elasticity.
Examples: DS8900, FlashSystem, IBM Spectrum Scale

Container-Native Storage
• Storage deployed inside containers with enterprise level data management services to support mission critical applications deployed in containers.
• Direct attach and external storage support varying performance and capacity needs.
• Kubernetes control plane allows self service capabilities driving higher levels of efficiencies.
• Future: Cloud Native Spectrum Scale
Cloud Native Storage
Goal: Deliver High Performance File Services to Containerized Application Workloads

Support Workloads that Require High Performance File Services
• Analytics & Cognitive
• High Performance Computing
• AI Data Pipeline

Support the Workload Ecosystem in the Cloud
• Containerized Applications, Storage
• Ephemeral and Persistent Storage Volumes

Flexible Deployment
• Dynamic Provisioning, Configuration, Upgrade

Support for Multiple Clouds
• Public, Private, Hybrid

Support Hybrid Use Cases
• Cloud Burst: Single Name Space
• Multi Cloud Data Sharing
• Archive
• Data Accelerator (High Performance Tiering)

Solution Integration (Partners)
Evolution of IBM Spectrum Scale Containers
[Diagram: four stages: Scale for Containers v1, Scale for Containers v2, Scale in a Container, and Scale in a Container w/ CloudPaks. Labels recoverable from the stacks: Spectrum Scale (bare metal deployment), SEC 2.0, CSI 1.0, CSI 2.x, OpenShift Interoperability, Kubernetes, OpenShift, Common Services, OS Support (RHEL / RH CoreOS), Infrastructure, IBM & Partners, IBM & Expanded Partner Ecosystem]
Evolution of IBM Spectrum Scale on Cloud
[Diagram: Partner and Scale Offerings, Current versus Future. Current: Spectrum Scale (bare metal deployment), Partner Solutions, Spectrum Scale on AWS (AMI), and IBM Cloud, each on OS Support and Infrastructure from IBM & Partners. Future: Scale in a Container On Cloud and Scale in a Container w/ CloudPaks (Multi-Cloud), with CSI, Common Services, OpenShift, Kubernetes, and RHEL on Infrastructure from IBM & Expanded Partner Ecosystem]
Evolution of Hybrid Cloud with IBM Spectrum Scale
[Diagram: Current versus Future. Current: Spectrum Scale (bare metal deployment) sharing a Single Name Space w/AFM with Spectrum Scale on AWS (AMI) and with Spectrum Scale on IBM Cloud, each on OS Support and Infrastructure from IBM & Partners. Future: Scale in a Container w/ CloudPaks (Multi-Cloud) and Scale in a Container On Cloud, with CSI, Common Services, OpenShift, Kubernetes, and RH CoreOS on Infrastructure from IBM & Expanded Partner Ecosystem]
Why DevOps?

Flexible Provisioning and Deployment

Consistency across On-Prem, Multi Cloud, Hardware Solutions

Needs to be Highly Customizable


• Microservices
• Integrated with Workload
• Open Source

Spectrum Scale Deployment: Strategy
Microservices based reusable Ansible infrastructure
(Provides installation, configuration and upgrade capabilities for all Spectrum Scale form factors)

[Diagram: customer Ansible infrastructure, ReST API, CLI (Install Toolkit), and Ansible Tower (AWX) as entry points; SDI, ESS, Cloud, and Containers as form factors, with hardware Ansible playbooks, Cloud Provisioning (Terraform), and Container Deployment through Ansible orchestrators; each path drives a Scale Ansible Playbook]

Microservices based reusable Ansible roles:
• Install, Configure, and Upgrade roles for each of: Protocols, AFM, GUI, ECE, Callhome, File Audit
• Spectrum Scale Core Install
• Spectrum Scale Core Configuration
• Spectrum Scale Core Upgrade
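As a rough illustration of how per-component Install / Configure / Upgrade roles could be composed into one run, here is a minimal Python sketch. The component and phase lists come from the slide; the role-name format, the `plan_roles` helper, and the rule that the core role runs first in every phase are assumptions made for illustration, not the naming or ordering used by any actual Spectrum Scale playbook.

```python
# Components and phases as listed on the slide. The "<component>_<phase>"
# role-name format is a hypothetical convention chosen for this sketch.
COMPONENTS = ["core", "protocols", "afm", "gui", "ece", "callhome", "file_audit"]
PHASES = ["install", "configure", "upgrade"]


def plan_roles(selected, phases=("install", "configure")):
    """Return role names in execution order.

    Assumption: within each phase the core role runs first, followed by
    the selected optional components in the order shown on the slide.
    """
    unknown = set(selected) - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    bad_phases = [p for p in phases if p not in PHASES]
    if bad_phases:
        raise ValueError(f"unknown phases: {bad_phases}")
    wanted = set(selected) | {"core"}  # core is always included
    ordered = [c for c in COMPONENTS if c in wanted]
    return [f"{c}_{p}" for p in phases for c in ordered]


print(plan_roles({"gui", "afm"}))
```

The point of the sketch is the reuse: every form factor would call the same small set of roles and differ only in which components it selects.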
Spectrum Scale Deployment: Strategy
Infrastructure specific resource provisioning
Unified Installation and Configuration through reusable Ansible playbooks

Cloud
• Targets: Public Cloud (IBM Cloud, AWS, GCP, Azure etc), Private Cloud (Openstack, VMware etc)
• Provisioning: Unified Cloud Resource Provisioning Infrastructure (Terraform)

Containers
• Targets: On Cloud, On-Premise
• Provisioning: Unified Provisioning through OpenShift, Kubernetes (Ansible/Go)

Bare Metal
• Targets: HW Bundling (IBM ESS range of products), On-Premise (Openstack, VMware etc)
• Provisioning: HW setup (Ansible/XCAT), HW setup (Manual/Automated)

All paths produce Provisioned Infrastructure Information (Standardized format), which feeds Spectrum Scale Package Installation and Configuration (Unified infrastructure through Ansible).
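The "standardized format" step can be pictured as a small normalization layer: each provisioning path reports nodes in its own shape, and a converter maps them to one common description that the installation step consumes. The Python sketch below is only an illustration under stated assumptions: the `Node` fields, both input shapes, and the JSON layout are hypothetical and are not the format used by the real tooling.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Node:
    hostname: str
    ip: str
    roles: list


def from_cloud_output(output):
    """Hypothetical cloud provisioner result:
    {"instances": [{"name": ..., "private_ip": ..., "tags": [...]}]}"""
    return [
        Node(i["name"], i["private_ip"], sorted(i.get("tags", [])))
        for i in output["instances"]
    ]


def from_hardware_rows(rows):
    """Hypothetical bare-metal listing: (hostname, ip, "role1,role2") tuples."""
    return [
        Node(host, ip, sorted(r for r in roles.split(",") if r))
        for host, ip, roles in rows
    ]


def standardized(nodes):
    """Serialize nodes into one stable, provisioner-independent JSON document."""
    ordered = sorted(nodes, key=lambda n: n.hostname)
    return json.dumps({"nodes": [asdict(n) for n in ordered]},
                      indent=2, sort_keys=True)


cloud = from_cloud_output({"instances": [
    {"name": "node2", "private_ip": "10.0.0.2", "tags": ["nsd"]},
    {"name": "node1", "private_ip": "10.0.0.1", "tags": ["quorum", "gui"]},
]})
metal = from_hardware_rows([
    ("node1", "10.0.0.1", "gui,quorum"),
    ("node2", "10.0.0.2", "nsd"),
])
# Both paths describe the same cluster, so the standardized documents match.
print(standardized(cloud) == standardized(metal))
```

Because the downstream step sees only the standardized document, the same installation and configuration logic can serve cloud, container, and bare-metal deployments.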
Data Management Challenges in AI and Analytics

• Data ingest and preparation cycles are too time consuming
• Multi-source data aggregation
• Silos of infrastructure for various analytics use cases
• Multiple copies of same data without a single source of truth
• Analytics on stale data
• Need to securely manage and protect data provenance for repeatability
• Need for global accessibility and collaboration

IBM Storage and SDI
The Goal: Move Data from Ingest to Insights

EDGE INGEST CLASSIFY / TRANSFORM ANALYZE / TRAIN INSIGHTS

© Copyright IBM Corporation 2018


AI Data Pipeline (IBM Storage and SDI)

Pipeline stages, from Data In to Insights Out: EDGE, INGEST, CLASSIFY / TRANSFORM, ANALYZE / TRAIN, INSIGHTS

Storage zones along the pipeline:
• Transient Storage (SDS/Cloud): throughput-oriented, software defined temporary landing zone
• Global Ingest (Cloud): throughput-oriented, globally accessible capacity tier
• Classification & Metadata Tagging: high volume, index & auto-tagging zone
• Hadoop / Spark Data Lakes (Hybrid/HDD): throughput-oriented, performance & capacity tier
• ML / DL Prep, Training, Inference (SSD/NVMe): high throughput, low latency, random I/O performance tier
• Fast Ingest / Real-time Analytics (SSD): high throughput performance tier
• ETL (SSD/Hybrid): high throughput, random I/O, performance & capacity tier
• Archive (HDD, Cloud, Tape): high scalability, large/sequential I/O capacity tier
• Trained Model used for Inference

Requirements and the IBM offerings shown alongside them:
1. Single name space across storage platforms: Spectrum Scale & ESS
2. Global collaboration / Hybrid Multi-Cloud: Cloud Object Storage
3. Indexing, Auto tagging / metadata management: Spectrum Discover
4. Integrated analytics platform: IBM Cloud Paks
Data Accelerator for AI and Analytics

The Problem
We see:
• Customers across all verticals are creating large PB to EB data stores.
• Vast majority of data is relatively cold, but still required for periodic trend analysis.
• But AI / Analytics require high performance, low latency storage to keep expensive CPU / GPU / TPU / FPGA busy.

[Diagram: Hadoop / Spark and ML / DL (Prep ⇨ Training ⇨ Inference) over storage ranging from HOT (SSD/NVMe) to COLD (HDD, Cloud, Tape)]

Moving Data as Close as Possible to Compute
➢ 2 storage tiers with different storage characteristics
▪ Data Lake – minimize storage cost
▪ High Performance Tier (HPT) – maximize storage performance
➢ HPT – high-performance data analytics on shared data, scale-out cluster FS, common namespace, no data transformations, data on-demand or prefetch, periodically revalidates cache
➢ Spectrum Discover – curates data lake, metadata search engine, loads HPT, starts analytics, overall governance
➢ Spectrum Conductor – Intelligent workload manager
➢ Data Lake – COS, Cloud or any high capacity data store
➢ Performance – single node & scale out
➢ End to end security and monitoring
➢ Can be deployed on-prem to on-prem, on-prem to cloud or cloud to cloud.
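The HPT behaviour listed above (data on demand or prefetched, cache periodically revalidated) can be sketched as a toy read-through cache in Python. This is a conceptual illustration only, not how Spectrum Scale implements caching: the `HighPerformanceTier` class, the dictionary standing in for the data lake, the per-object version number, and the revalidation interval are all assumptions made for the example.

```python
import time


class HighPerformanceTier:
    """Toy read-through cache in front of a slower data lake (illustrative only)."""

    def __init__(self, lake, revalidate_after=60.0, clock=time.monotonic):
        # `lake` maps path -> (version, data); it stands in for an object
        # store or other capacity tier. All names here are hypothetical.
        self.lake = lake
        self.revalidate_after = revalidate_after
        self.clock = clock
        self.cache = {}      # path -> (version, data, validated_at)
        self.lake_reads = 0  # counts full data transfers from the lake

    def _fetch(self, path):
        version, data = self.lake[path]
        self.lake_reads += 1
        self.cache[path] = (version, data, self.clock())
        return data

    def prefetch(self, paths):
        """Load a selected dataset ahead of the job that will read it."""
        for path in paths:
            if path not in self.cache:
                self._fetch(path)

    def read(self, path):
        entry = self.cache.get(path)
        if entry is None:
            return self._fetch(path)  # on-demand miss
        version, data, validated_at = entry
        if self.clock() - validated_at >= self.revalidate_after:
            # Periodic revalidation: compare versions (a cheap metadata
            # check) and transfer data again only if the lake copy changed.
            if self.lake[path][0] != version:
                return self._fetch(path)
            self.cache[path] = (version, data, self.clock())
        return data


now = [0.0]
lake = {"train/a": (1, b"x"), "train/b": (1, b"y")}
hpt = HighPerformanceTier(lake, revalidate_after=10, clock=lambda: now[0])
hpt.prefetch(["train/a"])
first = hpt.read("train/a")   # served from cache, no extra lake read
lake["train/a"] = (2, b"z")   # the lake copy changes
stale = hpt.read("train/a")   # still the cached copy inside the interval
now[0] = 11.0
fresh = hpt.read("train/a")   # interval elapsed: revalidated and refetched
print(first, stale, fresh, hpt.lake_reads)
```

The trade-off the sketch makes visible is the revalidation interval: a longer interval keeps compute fed from the fast tier with fewer lake accesses, at the cost of reading older data for longer.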
Spectrum Scale as Data Acceleration for AI and Analytics (DAAA)
- Accelerate model training output by prefetching the selected dataset in real time into your ML/DL environment from the Hadoop/Spark data lake.
- Accelerate real-time analytics / inference output by prefetching the selected dataset in real time into a near-edge environment from the remote centralized data lake.

Accelerated Insight
• INGEST / STORE: Data ingest to capacity tier
• ORGANIZE: Select the right data set for caching
• ANALYZE / TRAIN / INFER: Cache selected dataset into Spectrum Scale namespace

[Diagram: Data Scientist; Capacity Tier / Data Lake (NAS Filers) feeding the Data Accelerator File Cache (NVMe Storage, High Performance Tier), which serves AI Servers with CPUs & GPUs]

Complete solution across your data’s life cycle
Spectrum Scale Strategic Areas: Security Feature Outlook

Strategic Areas: Hybrid Cloud & Containers, AI & Analytics, HPC
Security & Privacy by Design
Comprehensive Data Security

Industry Compliance
• GDPR
• HIPAA
• FFIEC
• PCI-DSS
• LGPD & CCPA
• ISO 27040-2016
• NIST/FIPS

Features
• Filesystem Encryption
• Secure Delete
• Immutability
• File Audit Log
• Kerberos (NFS, SMB)
• POSIX & NFSV4 ACL
• AD/LDAP support
• RBAC Admin (GUI)
• Admin mode central

Advance Features
• Multi Factor Auth
• Fileset level FAL
• Live Antivirus
• Security posture in single pane of glass
• Trusted Boot
• Restricted root admin
• IPv6 (IPSEC)
• SELinux

Ecosystem/Solutions
• Secure AI
• Cyber Resilience
• Cloud Pak for Security
• SIEM Integration
• QRadar
• SPLUNK
• IBM Secret Server
• IBM Spectrum Discover
IBM Spectrum Scale and IBM QRadar: Threat Detection and Data Protection

Motivation
• Attacks against businesses have almost doubled in five years, and incidents that would once have been considered extraordinary are becoming more and more commonplace.
• If Data is the ‘Crown Jewel’ then Storage (Spectrum Scale) is the ‘Jewel Safe’. Let’s make it safer.
• IBM QRadar is a leading SIEM+ which analyzes event data in real time for early detection of targeted attacks and data breaches.

Benefits to Customers
Integrating IBM Spectrum Scale with IBM QRadar allows:
• Customers to proactively safeguard their data residing on Spectrum Scale or be alerted on potential threats (internal / external) in real time.
• Auto trigger data protection and backup on threat detection integrating with Cyber Resiliency solution.

Solution Architecture
[Diagram: solution architecture]
New! Blueprint & Redpapers: http://www.redbooks.ibm.com/redpieces/abstracts/redp5591.html
Solution Brief Released (Q1 2020)
Next Generation Performance
Scale’s Erasure Code Edition (ECE) was announced in May 2019. Deploy the same ESS erasure encoding across storage rich servers for low cost reliable storage from commodity hardware.

Leverage ECE with Persistent Memory to create Extreme Performance to local store.

Research project to exploit persistent memory:
✓ Cooperative consistent client cache
✓ Topology Aware Replication with data affinity
✓ Read Ahead / Write Behind job scheduling
✓ Eventual Durability

Invest to accelerate time to value

[Image: Intel Optane DC Persistent Memory]


Next Generation Performance (part 2)
Besides Persistent Memory, Spectrum Scale is continuing to invest in high throughput, low latency storage for AI and Analytics, HPC, Cognitive and Mission Critical workloads.

PCIe Gen3 (1x) -> PCIe Gen4 (2x) -> PCIe Gen5 (4x)

Network IB/RoCE/TCP: 100 Gb/s -> 200 Gb/s -> 400 Gb/s

NVMe-oF creates “Composable Storage Infrastructure”

Smart NICs: TCP offload, Encryption, Compression, Erasure Encoding, QoS, VLAN, dynamic flow control, etc.

Hardware performance will increase by a factor of 10 in the next few years. Spectrum Scale and ESS are making the investment required to continue their performance leadership.
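To make the PCIe and network figures above concrete, the following Python sketch estimates the raw bandwidth of a x16 slot per PCIe generation and checks which of the listed network speeds it could feed. The per-lane rates (8, 16, and 32 GT/s) and the 128b/130b line encoding are the published PCIe values for Gen3 through Gen5; the choice of a x16 slot and the decision to ignore protocol overhead are simplifying assumptions, so real throughput would be somewhat lower.

```python
# Raw per-lane signalling rate in GT/s for each PCIe generation.
GT_PER_LANE = {"Gen3": 8.0, "Gen4": 16.0, "Gen5": 32.0}
ENCODING = 128 / 130          # 128b/130b line encoding used by Gen3 to Gen5
NETWORK_GBPS = (100, 200, 400)  # port speeds listed on the slide


def slot_gbps(gen, lanes=16):
    """Approximate usable bandwidth of a slot in Gb/s (protocol overhead ignored)."""
    return GT_PER_LANE[gen] * ENCODING * lanes


def fastest_port(gen, lanes=16):
    """Fastest listed network port a single slot of this generation can saturate."""
    fits = [speed for speed in NETWORK_GBPS if speed <= slot_gbps(gen, lanes)]
    return max(fits) if fits else None


for gen in GT_PER_LANE:
    print(gen, round(slot_gbps(gen)), "Gb/s per x16 slot, feeds up to",
          fastest_port(gen), "Gb/s")
```

Under these assumptions each PCIe generation roughly pairs with one network generation, which is why the two progressions on the slide move together.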
Thank you!
Please help us to improve Spectrum Scale with your feedback
• If you get a survey in email or a popup from the GUI, please respond
• We read every single reply

Spectrum Scale User Group

The Spectrum Scale (GPFS) User Group is free to join and open to all using, interested in using or integrating IBM Spectrum Scale.

The format of the group is as a web community with events held during the year, hosted by our members or by IBM.

See our web page for upcoming events and presentations of past events. Join our conversation via mail and Slack.

www.spectrumscaleug.org
