7 Technology Circle, Suite 100
Columbia, SC 29203
Phone: 866.359.5411
E-Mail: sales@unitrends.com
URL: www.unitrends.com
Deduplication, Incremental Forever, and the Olsen Twins
Oh, and look at the blog, too!
 
Introduction
What do deduplication, incremental forever, and the Olsen twins have to do with each other? It's all about duplicate data. Mary-Kate and Ashley Olsen are fraternal twins. Identical twins share 100% of their DNA. Fraternal twins share about 50% of their DNA - the same as any other sibling.

If we created a DNA database for a set of identical twins, we would only need to store that DNA information once, since we know the DNA is identical. However, if we created a DNA database for the Olsen twins, we would either need to create two completely unique sets of data, or we would need techniques for understanding and categorizing which data is unique and which data is identical. Those techniques for understanding and categorizing are what data deduplication is.

File-level deduplication, block-level deduplication, byte-level deduplication, and incremental forever are all techniques that eliminate duplicate data. Theoretically, the data reduction achieved is identical for each when 100% of the data is duplicated. Practically, there are significant differences in the time and computational resources each of these techniques requires. It's when only some data is duplicated, as in the case of the DNA of fraternal twins, that the data reduction varies. The time and computational resources required by each of these techniques also vary in this case.

In this paper, we'll compare and contrast the advantages and disadvantages of each of these techniques and explain why incremental forever, when combined with byte-level deduplication, is the superior methodology for reducing redundant data in the most efficient manner possible. Further, we'll discuss the advantages and disadvantages of both physical and virtual backup appliances versus dedicated deduplication devices.
Deduplication
The purpose o deduplication is to reduce the amount o data storage necessary or a givendata set. Deduplication is a wide-ranging topic and is covered at length in our adaptivededuplication whitepaper. In this chapter, we’re going to explore the basic types o storage-oriented deduplication. However, we’re going to start o by discussing a key concept instorage oriented deduplication - content awareness.
Content Awareness
Content-aware deduplication simply means that the deduplication algorithms understand the content - sometimes called the "semantics" - of the data that is being deduplicated. Why is this important? The reason has to do with efficiency - not the efficiency of data reduction, but the resources (e.g., processor, memory, and I/O) required for a given level of data reduction.
 
The figure below is going to be referenced repeatedly in this chapter; we use it to illustrate many of the key concepts that are discussed. Panels 1 and 2 simply depict two backups, with a minor change (depicted via the yellow block) shown in panel 2.

Content awareness simply refers to the fact that the deduplication algorithm can see what's inside the backup - as depicted in every panel below except for panel 3. In panel 3, the backup appears as the proverbial "black box" - the deduplication algorithm can't see inside of it. We're going to come back to this figure repeatedly in the sections that follow to help explain the differences among the various types of deduplication.
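To make the distinction concrete, here is a minimal Python sketch. It is purely illustrative and not drawn from any Unitrends implementation; a tar archive stands in for a generic backup image, and the helper names are hypothetical. A non-content-aware process can only hash the backup as one opaque blob (panel 3), while a content-aware process understands the format well enough to enumerate and hash the files inside it:

```python
import hashlib
import tarfile

def opaque_view(backup_path):
    """Panel 3 style: the backup is a black box, so all we can do is
    hash the whole thing; any change anywhere makes it look entirely new."""
    with open(backup_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def content_aware_view(backup_path):
    """Content-aware style: we understand the backup's format (here a tar
    archive stands in for a backup image), so we can see and hash each
    file inside it individually."""
    digests = {}
    with tarfile.open(backup_path, "r") as archive:
        for member in archive.getmembers():
            if member.isfile():
                data = archive.extractfile(member).read()
                digests[member.name] = hashlib.sha256(data).hexdigest()
    return digests
```

A single changed file inside the archive changes the opaque hash completely, while the content-aware view can pinpoint exactly which entries changed - which is the efficiency advantage described above.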
File-Level Deduplication
File-level deduplication is a content-aware deduplication technique and is depicted in panel 5. File-level deduplication operates by comparing files and storing only the first unique file. File-level deduplication was the first "content-aware" type of deduplication because its algorithms must, by definition, be aware of the data to the extent that they recognize what a "file" is.
[Figure: Panels 1-6, referenced throughout this chapter]
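As a purely illustrative sketch (the function and variable names are hypothetical, not part of any product), file-level deduplication can be reduced to a few lines of Python: hash each file's contents and store only the first copy of each unique hash, recording duplicates as references to the already-stored copy.

```python
import hashlib
from pathlib import Path

def file_level_dedup(paths):
    """Store each unique file exactly once, keyed by a hash of its contents.
    Duplicate files are recorded only as references to the stored copy."""
    store = {}        # content hash -> stored bytes (stand-in for backup storage)
    references = {}   # file path -> content hash
    for path in paths:
        data = Path(path).read_bytes()
        digest = hashlib.sha256(data).hexdigest()
        if digest not in store:
            store[digest] = data          # first unique file: keep the data
        references[str(path)] = digest    # duplicate: keep only a reference
    return store, references
```

Note that under this scheme a single changed byte anywhere in a file produces a new hash, so the entire file is stored again; that behavior is what separates file-level deduplication from the finer-grained block-level and byte-level techniques compared in this paper.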
