IBM Deduplication
[Diagram: IBM deduplication overview. Labels: FastBack Dedup; WAN; LAN; Productive Server; Backup Server; SAN; ProtecTIER TS7650; TSM 6.1 Dedup; SVC; XIV; DS8000; DS 3/4/5*; Storage Manager 6; Tape; VTL; Disk Buffer]
Physical Capacity
LAN-free Client
Backup Server
TSM disk pools can be de-duplicated
More space for critical data
Smaller disk pool
SAN
Disk
Represented Capacity Disk-buffer Physical Capacity
Disk
VMDK
VMDK
Datastore A
Reduce Storage Costs with Virtualization
VMs only consume storage for their unique data
N series de-duplication provides the same benefits as VMware's shared memory functionality
2010 IBM Corporation
LAN
LAN-free Client
Possible reduction of required disk capacity: 1:5 to 1:25
Strong dependency on:
  Backup process used
  Type of data
Bandwidth vs. cost: compare with using multiple physical drives
Backup Server
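The reduction range quoted above can be turned into a rough sizing estimate. The following is a minimal sketch, assuming a hypothetical represented capacity of 100 TB (not a figure from the slides) and reading 1:5 and 1:25 as the ratio of physical to represented capacity.

```python
def physical_capacity_tb(represented_tb: float, reduction_factor: float) -> float:
    """Disk needed after deduplication when data shrinks by 1:reduction_factor."""
    if reduction_factor <= 0:
        raise ValueError("reduction_factor must be positive")
    return represented_tb / reduction_factor


REPRESENTED_TB = 100.0  # hypothetical example value, not taken from the slides

for factor in (5, 25):
    physical = physical_capacity_tb(REPRESENTED_TB, factor)
    print(f"1:{factor} -> {physical:.0f} TB physical for {REPRESENTED_TB:.0f} TB represented")
```

The actual factor depends on the backup process and the type of data, so the two values only bracket the estimate.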
Tape
SAN
Disk-buffer
Virtualization
Physical Capacity
Disk Tape
Larger or dedicated storage management staff
Prefer an integrated software solution with no specific hardware dependencies
TSM manages majority of data on tape
Data backed up nightly: 6 TB or less
Spare resources are available to dedicate to dedup processing
Moderately sized TSM server installation
De-duplication Topologies
1. Post Processing & Inline
2. Hash-Based Approach
3. IBM HyperFactor
4. Case Studies
#2 Post Processing
As data is received by the target device it is temporarily stored on disk storage
Hash-Based Approach
2. Generate a hash per chunk and save it in a table: Ah, Bh, Ch, Dh, Eh
3. Slice the next data into chunks (A, B, C, D, E) and compare their hashes with the table
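The steps above can be sketched in a few lines. This is an illustrative sketch only, not any vendor's implementation: the fixed 8 KB chunk size, the choice of SHA-1, and the names `HashDedupStore`, `write`, and `read` are assumptions made for the example.

```python
import hashlib
from typing import Dict, Iterator, List

CHUNK_SIZE = 8 * 1024  # assumed fixed chunk size for the example


def slice_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Slice the data stream into fixed-size chunks."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


class HashDedupStore:
    """Hypothetical in-memory store that keeps one copy of each unique chunk."""

    def __init__(self) -> None:
        self.table: Dict[str, bytes] = {}  # hash -> stored chunk

    def write(self, data: bytes) -> List[str]:
        """Store data; return the list of chunk hashes needed to rebuild it."""
        recipe = []
        for chunk in slice_into_chunks(data):
            digest = hashlib.sha1(chunk).hexdigest()
            if digest not in self.table:  # new chunk: save it
                self.table[digest] = chunk
            recipe.append(digest)         # known chunk: reference only
        return recipe

    def read(self, recipe: List[str]) -> bytes:
        """Rebuild the original data from its chunk hashes."""
        return b"".join(self.table[digest] for digest in recipe)


store = HashDedupStore()
first = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE
second = b"A" * CHUNK_SIZE + b"C" * CHUNK_SIZE  # shares chunk A with `first`
recipe_first = store.write(first)
recipe_second = store.write(second)
print(len(store.table), "unique chunks stored for 4 chunks written")
```

Note that this sketch trusts the hash alone to decide that two chunks are identical, which is exactly where the hash collision risk comes from.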
Hashing
MD5: 128 bits
SHA-0: 160 bits
SHA-1: 160 bits
.... SHA-384, SHA-512
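The digest lengths can be checked with Python's standard `hashlib` module. A small sketch; the helper name `digest_bits` is made up for the example, and SHA-0 is omitted because it is not among the algorithms `hashlib` guarantees to provide.

```python
import hashlib


def digest_bits(algorithm: str) -> int:
    """Return the digest length of a hashlib algorithm in bits."""
    return hashlib.new(algorithm).digest_size * 8


for name in ("md5", "sha1", "sha384", "sha512"):
    print(f"{name}: {digest_bits(name)} bits")
```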
Hash Collision
Hash collision (n.): a term in computer programming for the situation in which two distinct inputs to a hash function produce identical outputs. The possibility of a hash collision (two chunks of different data assigned the same hash) is not zero. A 10 TB repository holds 1.25 billion 8 KB blocks; even when the probability is low, the likelihood of a collision grows as that many blocks are managed over a long time.
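The block count and the order of magnitude of the risk can be estimated with the standard birthday-bound approximation. This is a sketch under stated assumptions: decimal units (10 TB = 10 x 10^12 bytes, 8 KB = 8,000 bytes), hash outputs treated as uniformly random, and function names that are hypothetical.

```python
import math


def block_count(repository_bytes: int, block_bytes: int) -> int:
    """Number of whole blocks in a repository."""
    return repository_bytes // block_bytes


def collision_probability(n_blocks: int, hash_bits: int) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = n_blocks * (n_blocks - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the exponent is far below float epsilon.
    return -math.expm1(-exponent)


blocks = block_count(10 * 10**12, 8 * 10**3)
print(f"blocks: {blocks:,}")
for name, bits in (("MD5", 128), ("SHA-1", 160)):
    print(f"{name}: p ~= {collision_probability(blocks, bits):.3e}")
```

The estimate covers a single snapshot of the repository; it grows as more blocks pass through the system over time, which is the point the slide makes.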
Hash Collision
Hash Index Size
[Diagram labels: Clients; Backup Servers; 35TB (x4)]
Clients
Actual Results:
Full 100 TB backup: 36 hours, 809 MB/s
Incremental 20 TB backup: 7 hours, 832 MB/s
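The quoted rates follow from the volumes and durations if binary units are assumed (1 TB = 1024 x 1024 MB). That unit choice is an inference from the numbers, not something the slide states. A quick check:

```python
def throughput_mb_per_s(terabytes: float, hours: float) -> float:
    """Average throughput, assuming binary units: 1 TB = 1024 * 1024 MB."""
    megabytes = terabytes * 1024 * 1024
    return megabytes / (hours * 3600)


full = throughput_mb_per_s(100, 36)
incremental = throughput_mb_per_s(20, 7)
print(f"full: {full:.0f} MB/s, incremental: {incremental:.0f} MB/s")
```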
Clients
TS7680 (z/OS)
HyperFactor Approach
1. Locate data in the backup stream that is similar to content stored in the repository
2. After locating similar content, retrieve the existing content from the repository and run a byte-level check between the existing and incoming data
[Diagram labels: New Data Stream; Element A; Element B; Element C]
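The second step, the byte-level comparison, can be illustrated with Python's standard `difflib`. This is not IBM's HyperFactor algorithm, which the slides do not describe in detail. It is a generic sketch of byte-level differencing against an already located similar element, and the names `byte_level_delta` and `apply_delta` are hypothetical.

```python
from difflib import SequenceMatcher


def byte_level_delta(existing: bytes, incoming: bytes) -> list:
    """Describe `incoming` as copies from `existing` plus new literal bytes."""
    matcher = SequenceMatcher(None, existing, incoming, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))          # reuse stored bytes
        elif j2 > j1:                                # 'replace' or 'insert'
            delta.append(("new", incoming[j1:j2]))  # only these bytes are stored
        # 'delete' needs no entry: those bytes are simply not copied
    return delta


def apply_delta(existing: bytes, delta: list) -> bytes:
    """Rebuild the incoming data from the existing element and the delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += existing[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


existing = bytes(range(256))                               # element already in the repository
incoming = existing[:100] + b"CHANGED" + existing[100:]    # similar new data
delta = byte_level_delta(existing, incoming)
new_bytes = sum(len(op[1]) for op in delta if op[0] == "new")
print(f"{new_bytes} new bytes stored for {len(incoming)} incoming bytes")
```

Because the data is compared byte by byte rather than by hash, two different elements cannot be mistaken for each other.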
HyperFactor Approach
HyperFactor has two indexes
HyperFactor index, used for backup
  Fixed size of 4 GB, stored in memory
  Contains most similar data elements
  Used to filter out similar elements from the data stream
Restore index, used for restore
  Dynamic index, growing
  Includes references to de-duplicated objects
  Stored on the disk system
HyperFactor
Repository
FC Switch
TS7650G
Existing Data
Compute delta
Network
Repository
CFS Metadata files STU data files
11-Jun-10
Storage Fabric
Disk Arrays
HOST
CPF
ProtecTIER Server ProtecTIER Server
Unavailable Unavailable
Active Available
Repository
RAID-10: Metadata (HyperFactor Index, Virtual Volume files, Library Configuration Data, Storage Management Data)
RAID-5: User Data (User Data from Backup Application)
TS7650 Appliance
Highest Performance, Largest Capacity
Better Performance, Larger Capacity, Scalable
Good Performance, Highly Scalable, Low cost
Active-Active Cluster Single Node Up to 500 MB/sec 1 PB TB useable Up to 1000 MB/sec 1 PB TB useable
36 TB useable
18 TB useable
TS7650G Gateway
3Q08
Data Deduplication
TS7680
Disk Cache
TS3500
Comprehensive solution builds on IBM z/OS, tape, tape virtualization and ProtecTIER deduplication
Primary Site
ProtecTIER Gateway
Represented capacity
Secondary Site
Backup Server
ProtecTIER Gateway
Physical capacity
IP based NR links
Backup Server
ProtecTIER Gateway
Physical capacity
Virtual cartridges can be cloned to tape by the main-site backup server
Tape library
Central / DR Site
Solution
10 TS7650G ProtecTIER Deduplication Gateways
Benefits
Executes backups to disk with a retention of 180 days, providing faster backups and even quicker restores
Saved more than 100 square meters of floor space by eliminating tape libraries through this implementation
Off-site backups are no longer needed: data is electronically copied and replicated safely and efficiently
Enables the customer to re-use existing disk infrastructure
IBM's TS7650G ProtecTIER integrated seamlessly into an existing backup environment using TSM, removed the complexity of failed backups and restores, and will help them contain the growth rate of their data sets.
Protect More. Store Less.
SEB Bank
Business challenge
SEB wants to go tapeless within 2-3 years, with incremental data totaling 4.2 PB. SEB had already moved to a disk-based backup and recovery solution using a hash-based VTL, but that solution could not provide the performance, scalability, and capacity needed to meet SEB's backup and recovery requirements. As new datasets were added and the environment continued to grow, performance and capacity suffered. With the existing VTL appliances, the only option was to keep adding appliances to try to solve the problem. SEB decided not to invest any more time and money in that approach and opted for IBM's TS7650G deduplication solution in a clustered configuration, a more robust, dependable solution that could guarantee performance and scalability.
Solution
IBM's ProtecTIER TS7650G (6 in a clustered configuration)
IBM DS8000 disk arrays (2)
Benefits
Provides industry-leading performance, scalability and availability with true global deduplication technology
IBM provided SEB a solution of 6 TS7650 Gateways versus the hash-based vendor's 34 appliances to handle the same amount of data
Enables SEB to manage their environment holistically and to meet their goal of going tapeless in the next 2-3 years
With industry-leading performance, scalability and capacity, ProtecTIER continues to exceed expectations in meeting the requirements of customers of all sizes.
Grytet SAN
TS7650G
Rissne Node 1
TS7650G
Grytet Node 1
Rissne Node 2
Grytet Node 2
60 TB Repository
60 TB Repository
DS8700
DS8700
Hilti
  Dedupe rate: 16:1
  Data: 30% databases (SQL, Oracle and SAP), 70% files
  Retention time: 21 days for files, 3 months for databases
  Backup server: NetBackup
  Backup/restore requirements: 400 MB/s
Ekom21
  Dedupe rate: 8:1
  Data: OS, files (Incremental Forever), emails, MySQL, Oracle, Informix
  Daily full backups/incrementals
  Backup software: TSM 6.1
  Backup/restore requirements: 300 MB/s
  Cartridge-level IP replication
Shipped 60 Appliances
Open Systems, AS/400, z/OS Support
Disk-Based and IP-Based Replication Support
Questions?
Merci! Danke!