
EMC Data Domain:

Data Protection and Deduplication

© Copyright 2010 EMC Corporation. All rights reserved. 1


Why backup?

 Goals
– Backups are done for restores
 Operational
 Disaster Recovery
– Disaster recovery requires offsite backup
– Operational recovery requires onsite backup
– Need both onsite and offsite copies on disk
– Need quick restores
 Don’t have time for moving physical assets
– Protection of personal data & intellectual property



Why So Much Interest in Data Deduplication?

 Backup & Archive processes have been overwhelmed by information growth

 Primary storage efficiency has become a necessity to cope with massive growth

 ROI drives the compelling appeal of Dedupe
– Reduced Storage Capacities
– Lower Infrastructure Costs
– Improved SLAs
– Efficient Replication for Business Continuance/DR

One of the top 10 Technology Considerations
– Deduplication: 59% "Very important"

Deploying Deduplication
– In use: 24%
– Evaluating / in near- to long-term plan: 55%
– Not in plan: 21%

Source: TheInfoPro Wave 11 Storage Study, 2008

Why Do Enterprises Still Use Tape?

• Low upfront cost
• Tape can store the massive amount of redundant data created by backups
• Transportable for offsite DR

[Diagram: primary storage on DISK; backup storage on TAPE, 5x-10x the size of primary]



EMC Data Domain:
Leadership and Innovation

• Deduplication storage systems
– More than 12,000 systems installed
– More than 4,300 customers
– More than 2,600 PB under Data Domain protection worldwide

• A history of industry firsts

[Timeline, 2003 to 2010; the year of each milestone is not recoverable from the extracted text]
– First Deduplication NAS
– First Deduplication Virtual Tape Library
– First Deduplication Volume Replication
– First Deduplication Directory Replication
– First Deduplication Nearline Storage
– Largest Deduplication Array
– Fastest Backup Controller
– First Deduplication Encryption
– Cascaded Replication
– First Distributed Processing



Data Domain – works with what you have

Backup Archive

Database

VMware



De-duplication principles

Unique segments (4KB-12KB) – varies “on-the-fly”
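
The slide gives only the segment size range and says that it varies on the fly. As a minimal sketch of what variable-size segmenting can look like, the Python below cuts a byte stream at content-dependent boundaries and stores each unique segment once. The toy hash, the `MASK` value, the use of SHA-1 fingerprints, and the names `split_segments` and `dedupe` are all assumptions for illustration, not the product's actual algorithm.

```python
import hashlib
import random

MIN_SEG = 4 * 1024     # lower bound taken from the slide
MAX_SEG = 12 * 1024    # upper bound taken from the slide
MASK = (1 << 13) - 1   # assumed: a boundary about every 8 KB past the minimum


def split_segments(data: bytes):
    """Yield variable-size segments whose boundaries depend on the content."""
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Toy shift-and-add hash; it only "remembers" the most recent bytes.
        h = ((h << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        if length < MIN_SEG:
            continue
        if (h & MASK) == 0 or length >= MAX_SEG:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]  # the final segment may be shorter than MIN_SEG


def dedupe(data: bytes, store: dict) -> list:
    """Store each unique segment once; return the fingerprints that rebuild data."""
    recipe = []
    for seg in split_segments(data):
        fp = hashlib.sha1(seg).hexdigest()
        store.setdefault(fp, seg)  # only new fingerprints consume space
        recipe.append(fp)
    return recipe


if __name__ == "__main__":
    rng = random.Random(0)
    backup = bytes(rng.getrandbits(8) for _ in range(100_000))
    store = {}
    first = dedupe(backup, store)
    stored_after_first = len(store)
    second = dedupe(backup, store)  # the same data backed up again
    print(len(first), "segments;", len(store) - stored_after_first, "new on second pass")
```

Backing up the same data a second time adds no new segments to the store, which is the effect the slide's "unique segments" label points at.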


Data Deduplication: Technology Overview
Store more backups in a smaller footprint

Backup                   Segments                  Logical   Estimated Data Reduction   Physical
Friday Full              A B C D A E F G           1 TB      2–4x                       250 GB
Monday Incremental       A B H                     100 GB    7–10x                      10 GB
Tuesday Incremental      C B I                     100 GB    7–10x                      10 GB
Wednesday Incremental    E G J                     100 GB    7–10x                      10 GB
Thursday Incremental     A C K                     100 GB    7–10x                      10 GB
Second Friday Full       B C D E F L G H           1 TB      50–60x                     18 GB
TOTAL                    A B C D E F G H I J K L   2.4 TB    7.8x                       308 GB
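
The totals in this table can be checked with a short calculation. The sketch below replays the slide's segment letters to show which segments are new in each backup, then sums the logical and physical sizes. It assumes 1 TB = 1000 GB; the letters are the slide's illustration and are not meant to map one-to-one onto the byte sizes.

```python
# Segment letters per backup, copied from the slide's diagram.
backups = [
    ("Friday Full", "ABCDAEFG"),
    ("Monday Incremental", "ABH"),
    ("Tuesday Incremental", "CBI"),
    ("Wednesday Incremental", "EGJ"),
    ("Thursday Incremental", "ACK"),
    ("Second Friday Full", "BCDEFLGH"),
]

# (logical GB, physical GB) per backup, from the table; 1 TB taken as 1000 GB.
sizes_gb = {
    "Friday Full": (1000, 250),
    "Monday Incremental": (100, 10),
    "Tuesday Incremental": (100, 10),
    "Wednesday Incremental": (100, 10),
    "Thursday Incremental": (100, 10),
    "Second Friday Full": (1000, 18),
}


def new_segments(backup_list):
    """Return the segments each backup adds, plus the set of all unique segments."""
    seen = set()
    fresh = {}
    for name, segs in backup_list:
        fresh[name] = []
        for s in segs:
            if s not in seen:
                seen.add(s)
                fresh[name].append(s)
    return fresh, seen


fresh, seen = new_segments(backups)
logical_gb = sum(logical for logical, _ in sizes_gb.values())
physical_gb = sum(physical for _, physical in sizes_gb.values())
ratio = logical_gb / physical_gb

if __name__ == "__main__":
    for name, segs in fresh.items():
        print(f"{name}: new segments {segs}")
    print(f"{logical_gb} GB logical / {physical_gb} GB physical = {ratio:.1f}x")
```

Each incremental contributes a single new segment, and the second full contributes only L, which is why its estimated reduction is so much higher than that of the first full.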



Deduplication Dramatically Reduces Storage Capacity Requirements

10–30 times less data stored versus fulls + incrementals with typical retention policies

[Chart: Data Stored (0 to 30) versus Weeks in Use (1 to 20), comparing deduplication storage with traditional storage]



Data Domain Scale

Data Domain SISL™ Scalable Architecture: CPU-Centric

[Chart: Throughput (GB/sec.) versus Addressable Capacity in TB, Post-RAID (Physical). Throughput axis ticks: 0.04, 1.5, 3, 5. Capacity axis ticks: 1.25, 70, >PB. Labeled points: DD200 (2004) at 0.04 GB/sec.; DD880, 7/09, "Industry’s Fastest Backup Storage Controller"; 2011 (est.)]

6-Year Improvement
• Throughput: ~90x
• Capacity: ~225x

Inline vs Post-Process Deduplication: Provisioning & Admin

Post Process: Deduplication After Storing
• Flow: Store, Dedupe, Restore, Replicate?, Un-dedupe?
• At least 3x disk accesses to shared store
• Process contention increases with the number of processes
– Copy to tape: Too slow to stream tape
– Recovery: SLA predictability
– Replication: Poor time-to-DR
– Deduplication itself if interleaved with backup or restore
• More admin needed to fight these issues

Inline: Deduplication Before Storing
• Flow: Dedupe, Restore, Replicate
• Other activities unimpeded
– Predictable
– Simpler



Data Integrity: Data Invulnerability Architecture
Trust but verify: “hope” is not a strategy

Data verification
– Generate checksum
– Deduplication, write to disk
– Verify data checksum

Verify (by layer)
– File System: verify the file system metadata integrity; verify user data integrity
– Global Compression
– Local Compression
– RAID: verify stripe integrity

Self-healing file system
– Cleaning
– Expired data
– Defrag
– Verify

Other
– RAID 6
– NVRAM
– Snapshots
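
The generate-then-verify idea on this slide can be illustrated with a very small sketch: compute a checksum when data arrives, write the data, then read it back and compare. The class `VerifyingStore`, the in-memory dictionary standing in for disk, and the choice of CRC-32 are all assumptions made for illustration; the slide does not say which checksum or which code paths the product uses.

```python
import zlib


class VerifyingStore:
    """Toy store that checks every write by reading it back (hypothetical)."""

    def __init__(self):
        self._disk = {}  # stands in for the on-disk layout

    def write(self, key: str, data: bytes) -> None:
        checksum = zlib.crc32(data)           # generate checksum on ingest
        self._disk[key] = (bytearray(data), checksum)
        self.verify(key)                      # verify right after the write

    def verify(self, key: str) -> bytes:
        stored, checksum = self._disk[key]
        if zlib.crc32(bytes(stored)) != checksum:
            raise IOError(f"checksum mismatch for {key!r}")
        return bytes(stored)


if __name__ == "__main__":
    store = VerifyingStore()
    store.write("backup-1", b"payload" * 100)
    store._disk["backup-1"][0][0] ^= 0xFF     # simulate a corrupted byte
    try:
        store.verify("backup-1")
    except IOError as err:
        print("detected:", err)
```

The same check can be rerun later, for example after cleaning or defragmentation, which matches the repeated "Verify" steps in the diagram.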



Network-Efficient Replication for True Disaster Recovery
Lowers WAN costs; improves service level agreements

Flexible replication
 One-to-many
 Many-to-one
 Bi-directional
 System-to-system
 Cascaded

[Diagram: Data Domain systems at remote sites (source) replicate backup data and archive data over the WAN, sending 1–5% of the data, to a Data Domain DDX Array with DD880s at the data center hub (destination). Directory labels in the diagram: DB, DIR A, Home]

• 95–99% cross-site bandwidth reduction
• Supports hundreds of remote sites

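
The bandwidth figures above follow from sending only segments the destination does not already hold. A minimal sketch of that idea, assuming SHA-1 fingerprints and a dictionary as the destination store (both hypothetical, as is the `replicate` function), and ignoring the small cost of exchanging fingerprints:

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    return hashlib.sha1(segment).hexdigest()


def replicate(segments, dest_store: dict):
    """Send only segments missing at the destination.

    Returns (bytes_sent, bytes_total).
    """
    sent = 0
    total = 0
    for seg in segments:
        total += len(seg)
        fp = fingerprint(seg)
        if fp not in dest_store:
            dest_store[fp] = seg  # this segment crosses the WAN
            sent += len(seg)
    return sent, total


if __name__ == "__main__":
    segments = [bytes([i]) * 1024 for i in range(100)]
    # Assume the hub already holds 97 of the 100 segments from earlier backups.
    hub = {fingerprint(s): s for s in segments[:97]}
    sent, total = replicate(segments, hub)
    print(f"sent {sent} of {total} bytes ({100 * (1 - sent / total):.0f}% reduction)")
```

With 97 of 100 equal-size segments already at the hub, 3% of the data is sent, which falls inside the slide's 1–5% range.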
Industry’s Most Scalable Inline Deduplication Systems

Product families
• DD140 Remote Office Appliance
• DD600 Appliance Series
• DD880
• Global Deduplication Array (new)
• DDX Array Series: Up to 16 Controllers

Software options: DD Boost, DD Virtual Tape Library, DD Replicator, Retention Lock, and DD Encryption

Model                        Speed (Other)   Speed (DD Boost)   Logical capacity   Raw capacity     Usable capacity
DD140                        450 GB/hr       490 GB/hr          17–43 TB           1.5 TB           0.86 TB
DD610                        675 GB/hr       1.3 TB/hr          75–195 TB          Up to 6 TB       Up to 3.98 TB
DD630                        1.1 TB/hr       2.1 TB/hr          165–420 TB         Up to 12 TB      Up to 8.4 TB
DD660                        2.0 TB/hr       2.7 TB/hr          0.520–1.31 PB      Up to 36 TB      Up to 26.1 TB
DD690                        2.7 TB/hr       3.9 TB/hr          0.710–1.7 PB       Up to 48 TB      Up to 35.3 TB
DD880                        5.4 TB/hr       8.8 TB/hr          2.8–7.1 PB         Up to 192 TB     Up to 142.5 TB
Global Deduplication Array   -               12.8 TB/hr         5.7–14.2 PB        Up to 384 TB     Up to 285 TB
DDX Array                    86.4 TB/hr      140 TB/hr          45.6–114 PB        Up to 3.07 PB    Up to 2.28 PB
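
The DDX Array figures look like the DD880 figures multiplied by the 16-controller maximum. The check below is only a consistency test on the slide's numbers, under the assumption that the array scales linearly across 16 DD880 controllers and that 1 PB = 1000 TB; the slide does not state how the DDX values were derived.

```python
CONTROLLERS = 16  # "Up to 16 Controllers" from the slide

# DD880 figures from the table.
dd880 = {
    "speed_other_tb_hr": 5.4,
    "speed_boost_tb_hr": 8.8,
    "raw_tb": 192,
    "usable_tb": 142.5,
}

# Linear scaling is an assumption, not a documented sizing rule.
ddx = {name: round(value * CONTROLLERS, 1) for name, value in dd880.items()}

if __name__ == "__main__":
    for name, value in ddx.items():
        print(f"{name}: {value}")
```

The scaled values come out at 86.4 TB/hr, 140.8 TB/hr, 3072 TB raw, and 2280 TB usable, which line up with the slide's 86.4 TB/hr, 140 TB/hr, 3.07 PB, and 2.28 PB once rounded.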



Why Data Domain?

• Less disk to resource, less to manage


– CPU-centric deduplication
– Inline
– Green

• Simple, mature, and flexible


– Simple, mature appliance
– Nearline tier: any fabric, any software, backup or nearline applications

• Resilience and disaster recovery


– Storage of last resort
– Cross-site global compression: data center or remote office
