You are on page 1of 18

Backup Exec 2010:

Deduplication Option
White Paper: Backup Exec 2010: Deduplication Option

Backup Exec 2010: Deduplication Option

Contents

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

What is Deduplication?. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

Client Deduplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

Media Server Deduplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

Configuring Backup Exec’s Deduplication Storage Folder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

System Requirements for Deduplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Appliance Deduplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Which Type of Deduplication to Use? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

Integration with NetBackup PureDisk 6.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Migrating Data to Tape for Long-Term Storage. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

Conclusion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Backup Exec 2010: Deduplication Option

Introduction
Customers of all sizes and needs are seeking new ways to tackle their data protection challenges. While the challenges of
data growth are not new, the pace of growth has become more rapid, the location of data more dispersed, and linkages
between data sets more complex. Data deduplication offers companies the opportunity to dramatically reduce the amount
of storage required for backups and to more efficiently centralize backup data from multiple sites for assured disaster
recovery. Symantec Backup Exec 2010 now includes integrated deduplication capabilities that can help with the task of
managing these data protection challenges. Deduplication capabilities are available when customers purchase the
Deduplication Option for Backup Exec 2010.

The Deduplication Option enables several capabilities that can greatly benefit Administrators looking to control storage
growth. There are three different methods of deduplication that are available with the Deduplication Option. The first is
Client Deduplication, where data is deduplicated at the source and sent to the media server in deduplicated form. The
second is Media Server Deduplication, where data is deduplicated in-line, or as it arrives, at the media server. The third
Deduplication, where a 3rd-party deduplication device (for example, an ExaGrid EX series or a Quantum DXi
is Appliance Deduplication
7500 ) handles all aspects of deduplication. These devices, with manufacturer-supplied Symantec OpenStorage (OST)
plugins can realize enhanced performance and manageability when used in conjunction with Backup Exec. The next
sections will go into detail about each of these deduplication types.

What is Deduplication?
What is deduplication? At the core, deduplication is a process that breaks down files and data into “segments” and uses a
tracking database to ensure the Media Server only stores a single copy of that segment across all client backup data
stored to that media server. For subsequent backups of any client, the tracking database knows what segments have been
protected and only transfers and stores the segments that are new or unique – file segments that are not currently stored
by that Media Server. For example, if five different client systems are backing up data to a Media Server and a file
segment is found that exists on all five of those client systems, only a single copy of the segment will actually be stored by
the Media Server. This tracking database ensures that these segments are kept until any existing disk-based backup no
longer references them. Because only a fraction of the original data is eligible to be stored by the media server, this leads
to significant reduction in disk space needed for backups.

For recovery, the requested file is reconstructed based on the information contained in the tracking database, and then
sent on to the destination.

The deduplication domain is across all backups done with Backup Exec’s deduplication technology. The benefit of this
methodology is that all of the deduplication segment information mentioned above is shared with all other backups
configured to use deduplication for a specific media server. For example, if two Windows 2008 R2 servers are protected
using either Client or Media Server deduplication, only deduplication segments that are unique to either of those servers
will be stored. This helps significantly reduce backup disk utilization across all local and remote servers protected with
deduplication.

1
Backup Exec 2010: Deduplication Option

Regardless of the methodology used for deduplication – Client, Media Server, or Appliance – the end result is the same:
storage is optimized by only storing unique parts of a particular file or data stream, and using some form of database to
associate segments to each other and to the machines where they were backed up from.

With the Backup Exec 2010 Deduplication Option, Administrators have the ability to choose when and where
deduplication takes place. Administrators can mix and match deduplication types to fit their unique needs; for example, a
single media server licensed with the Deduplication Option can simultaneously use Client Deduplication for some jobs,
Media Server deduplication for others, and use Appliance Deduplication for yet another set of jobs.

• Client Deduplication is a software-driven process, where deduplication takes place at the SOURCE of data and
is sent over the network in deduplicated form
• Media Server Deduplication is a software-driven process, where deduplication takes place JUST BEFORE THE
DATA IS STORED TO DISK (also known as Inline Deduplication)
• Appliance Deduplication is a hardware-driven process takes place ON THE DEDUPLICATION APPLIANCE (Can
be In-Line or Post-Process Deduplication, ex. ExaGrid or Quantum)

Each approach has its benefits and will be detailed in turn in the following sections.

Figure 1: Deduplication Types in Backup Exec 2010’s Deduplication Option

Client Deduplication
With Backup Exec 2010, exciting new possibilities for Remote Office protection have been introduced. The concept of
Client Deduplication – where the remote system is responsible for deduplication calculations and where backup data is
sent over the network in its deduplicated form – can make the process of protecting remote offices a much more
streamlined experience. Remote offices can be challenging to protect effectively; WAN environments can be unreliable
and can only utilize a fraction of the bandwidth available to a LAN backup. Backups over the WAN a challenge to set up,

2
Backup Exec 2010: Deduplication Option

much less complete. Other customers have Media Servers which are not as powerful as the application servers they are
protecting – often, the VMware server or the Exchange server in the environment is the most powerful machine available
in terms of processor speed or disk throughput. Where appropriate, why not leverage some of this remote computing
power to achieve faster backups? Both of these situations are problems where Client Deduplication can offer a
comprehensive solution to the data protection challenges brought on by the environment.

Generally, remote offices backup strategies have two basic architectures. First, there are Remote Offices which do not
have local storage, and where backup data is sent directly over the LAN or WAN to the data center for protection. Second,
there are Remote Offices that employ local storage and then “forward” that locally stored backup data to the data center
for protection. Both of these configurations can use Backup Exec 2010’s Deduplication Option to streamline and improve
backup and recovery for Remote Offices.

Client deduplication is the act of deduplicating data at the backup source – that is, on the machine that is being backed
up. Data from the client system is refined into smaller deduplication segments, and only the unique data (i.e. data the
media server doesn’t yet contain) is sent in deduplicated form to the Media Server’s Deduplication Storage Folder. A
Deduplication Storage Folder is special type of Backup to Disk folder where all deduplication segments are stored,
regardless of whether Client or Media Server deduplication was used for a specific backup. With this method, the majority
of the processing necessary for deduplication is done on the remote system rather than as the data arrives at the media
server. Client deduplication is the default deduplication method Symantec recommends for several reasons:

• Client deduplication enables greater scalability: Client deduplication spreads processor usage out across all
clients running backups, enabling the Media Server to process more concurrent backups.
• Client deduplication minimizes network data transfers: Data is deduplicated at the client and sent across
the network in deduplicated form. In this way, only the unique data is sent to the media server, rather than the
entire backup stream. Most environments – be it a LAN or a WAN environment – can benefit from less data
being sent across the network. This is especially useful for WAN or Remote Office protection. Often, the data
sent over the wire is only 1/10th or less the original data size, reducing network traffic and contention for
network resources.

Over time, Client Deduplication can provide overall deduplication rates of 9:1 or higher for file system backups. This
equates to a 90% (or more) saving in terms of disk spaced consumed by backup data. Depending on the type of data
protected (e.g. Office docs vs. Exchange databases, for example) the overall deduplication ratio may change somewhat,
but a good baseline is that 9:1 ratio.

Each Backup Exec Agent for Windows Systems has the built-in capability to do client deduplication, and many of the
Application Agents (Backup Exec Agent for Exchange, Backup Exec Agent for SQL Server, etc.) can also utilize client
deduplication for backup. Note that all deduplication operations require the Deduplication Option to be licensed on the
Media Server. See Table 1 for details.

3
Backup Exec 2010: Deduplication Option

Backup Exec Agent Client Deduplication

Windows File System Backups yes

VMWare, Hyper-V yes *

Exchange, SharePoint, SQL Server, Active


yes *
Directory Agents

Lotus Notes, Oracle on Windows, etc yes

Remote Windows Systems yes

Remote Non-Windows Agents No

Table 1 – Client Deduplication

*Note – for Client deduplication with Backup Exec Agent for VMWare (AVVI) and Hyper-V, the Backup Exec Remote Agent
must be installed in the Guest Operating System. In this circumstance, the Remote Agent is responsible for performing
the backup, and not the Agent for VMWare or the Agent for Hyper-V. Symantec recommends that for optimal
deduplication, a Remote Agent also be installed in Hyper-V guests.

Media Server Deduplication


Are there Linux or Solaris servers in your environment? Do you have VMWare ESX or vSphere servers with high average
processor utilization? If so, Media Server deduplication can be a useful and effective deduplication method for these
environments. This method of deduplication is entirely performed on the Backup Exec Media Server and does not impact
source systems any more than a typical backup.

Media Server deduplication performs the deduplication processes against data when it arrives at the Media Server – that
is, just before the data is laid down on disk. Data is sent over the network in its whole, un-deduplicated form, and then
decomposed into deduplication segments in-line by the Media Server. Only the unique data (i.e. data the media server
doesn’t yet contain) is written to the Deduplicated Storage Folder. A deduplicated storage folder is special type of backup
to disk folder where all deduplication segments are stored, regardless of whether Client or Media Server deduplication was
used for a specific backup. The Media Server does the bulk of the processing necessary for deduplication. Media Server
deduplication is optimal for situations where:

• The remote system’s processor is fully utilized: If the remote system has no processor cycles to spare for
deduplication calculations, Media Server deduplication can take the load and still perform deduplication.
• The remote system is NetWare, Linux, or Solaris: The Remote Agents for non-Windows platforms do not
have the ability to do client deduplication. These systems can only take advantage of Media Server
deduplication to greatly reduce the amount of storage necessary for backups.

4
Backup Exec 2010: Deduplication Option

Media Server deduplication is NOT recommended for the following environments:

• Remote Office Protection over a WAN: With Media Server Deduplication, the Media Server receives the entire
data set before deduplication takes place. This is not a WAN-friendly method of deduplication. Generally,
Remote Office protection without local storage should use Client Deduplication.

Over time, Media Server Deduplication can provide the same level of Deduplication as Client Deduplication, and achieve
deduplication of 9:1 or more for file system backups. This equates to a 90% (or more) saving in terms of disk spaced
consumed by backup data. Depending on the type of data protected (e.g. Office docs vs. Exchange databases, for
example) the overall deduplication ratio may change slightly, but a good baseline is that 9:1 ratio.

Any Backup Exec Media Server that has the Deduplication Option licensed can utilize Media Server deduplication. Most
Agents and backup types supported by Backup Exec can take advantage of the space savings inherent with Media Server
Deduplication.

Backup Exec Agent Media Server Deduplication

Windows File System Backups yes

VMWare, Hyper-V* yes*

Exchange, SharePoint, SQL Server,


yes
Active Directory Agents

Lotus Notes, Oracle on Windows, etc yes

Remote Windows Systems yes

Remote Non-Windows Agents yes

Table 2 – Media Server Deduplication

*Hyper-V backups will experience higher levels of deduplication when a Remote Agent is installed in the virtual Guest
operating system. In this case, the backup is done via the Remote Agent installed in the Virtual Guest. The Agent for
Hyper-V is not involved in Client deduplication. More effective Media-Server based Hyper-V and VHD deduplication will be
delivered in a future Backup Exec release or service pack.

Configuring Backup Exec’s Deduplication Storage Folder


Backup Exec 2010’s Deduplication Option allows customers to create a single Deduplication Storage Folder per media
server. A Deduplication Storage Folder is where all deduplication segments are stored, regardless of whether Client or
Media Server deduplication was used for a specific backup. A Backup Exec Media Server hosting a deduplicated storage
folder can hold up to 16 TB of deduplicated data. Note that the Deduplicated Storage Folder is not utilized for backups

5
Backup Exec 2010: Deduplication Option

and restores when Appliance Deduplication is used. Appliance Deduplication stores all backup data on the specific
appliance.

The Deduplicated Storage Folder contains two distinct items. The first is a file storage location, where the physical
deduplication segments are stored. The second is a PostgreSQL database, where the deduplication segments are tracked
and maintained. By default, the file storage location and the Postgres SQL database are installed to the same location.
While many customers will be able to use these defaults, some customers will need to change them for performance
reasons.

Symantec recommends that customers who use Media Server Deduplication should use separate physical volumes for the
File Storage Location and the Database location. Because the media server is responsible for all aspects of deduplication
with Media Server deduplication, this allows higher scalability for Media Server deduplication jobs by spreading disk read
and writes across multiple physical disks. This same recommendation stands for highly scaled Backup Exec
implementations with 10’s or 100’s of concurrent Client Deduplication backups as well.

Deduplication Database Sizing


Generally, the Deduplication Database is a fraction of the total File Storage Location size. In Symantec’s testing, the
Deduplication Database increases linearly with total stored deduplicated data. Plan for roughly 6-8 GB of database size
per 1 TB of stored deduplicated data; e.g. 8 TB of deduplicated data would equate to a 50 GB deduplication database.

Due to periodic weekly database maintenance routines, the Backup Exec Media Server requires double the database size
available on disk. This is because automated database maintenance routines involve making a backup copy of the
database. In the example above, where the deduplication database is 50 GB, the volume holding the deduplication
database needs to be at least 100 GB in size to account for maintenance activities alongside normal operation. For data
that gets manually removed, space reclamation is automatically queued for processing twice a day.

Processor Utilization with Client and Media Server Deduplication


Depending on the type of deduplication used, processor utilization will vary. In general, the deduplication process is not
gated or throttled in any way, and is geared towards accomplishing deduplicated backups and restores of deduplicated
data as quickly as possible.

Client Deduplication performs the bulk of deduplication calculation on the client (or source) system. The client
deduplication process will consume as much of one (1) core of one processor as it can on that client system. While the
actual amount of processor utilization will depend on the amount of data to be deduplicated and the speed of the
processor, expect to see at least 75% processor utilization for that processor core for the duration of the Client
Deduplication backup.

Media Server deduplication performs the bulk of the deduplication calculations on the Media Server system. Similar to
client deduplication, the Media Server deduplication process will consume as much of one (1) core of one processor as it
can on that Media Server system. While the actual amount of processor utilization will depend on the amount of data to
be deduplicated and the speed of the processor, expect to see at least 75% utilization for that processor core for the
duration of any Media Server deduplication backup job. For both Client and Media Server deduplication, initial backup

6
Backup Exec 2010: Deduplication Option

jobs will be the slowest. As time goes on, backup speeds increase as more database fingerprints are created within the
database.

For Remote Agents that cannot use Client Deduplication (Remote Agent for Linux and Unix [RALUS], Remote Agent for Mac
Server [RAMS], Remote Agent for NetWare [RANW], etc.) there is no change to system requirements as outlined in the
Backup Exec 2010 Administration Guide. This also holds true for Windows Remote Agents that do not choose to use Client
Deduplication.

Memory Utilization with Client and Media Server Deduplication


With both Client and Media Server deduplication, the majority of memory consumption takes place on the Media Server.
This is primarily a performance optimization geared towards fast and accurate calculation of deduplication fingerprints.
On the Media Server, both client and Media Server deduplication require 1 GB (gigabyte) of physical memory for every 1
TB (terabyte) of deduplicated data stored by the Media Server. For example, if 8 TB of deduplicated data is stored, the
Media Server would require at least 8 GB of physical memory.

Memory requirements for clients using Client deduplication are not very stringent. Symantec requires 1 GB of physical
memory on each individual client that uses Client Deduplication.

Disaster Recovery with Backup Exec Deduplication Option (Client and Media Server
Deduplication)
Backup Exec offers two convenient disaster recovery options for customers who use the Deduplication Option and Client
or Media Server deduplication:

1. Built-in writer for backup to tape or disk


2. Disaster Recovery backup via Deduplicated Backup Set Copies to a second Backup Exec Media Server

The built-in writer is available in the Backup Exec product at no additional charge. The writer is used to quiesce the
Backup Exec Deduplication Storage Folder and the associated PostgreSQL database so a consistent backup can be taken
to tape or another disk location. This process copies the entire Deduplication Storage Folder and associated databases to
tape in its entirety and not only useful for disaster recovery of the entire Backup Exec Media Server system. At no other
point does deduplicated data get stored on tape. Refer to the Backup Exec Administration guide for additional
configuration details. Symantec recommends periodic protection of the Deduplicated Storage Folder. While many
customers will be able to use the built-in writer for backups to tape, larger customers should examine the Deduplicated
Backup Set Copy process detailed below.

The ability to perform Deduplicated Backup Set Copies is also built into the product and is available at no additional cost.
The data is replicated in a deduplicated format and requires a second Deduplication Option license, along with the Central
Administration Server Option (CASO), to be purchased for the second media server. Backup Exec will be fully aware of the
secondary copy which allows for easy recovery of data in case of disasters. The Deduplicated Backup Set Copy process
allows complete independence from tape if the customer chooses to do so.

7
Backup Exec 2010: Deduplication Option

System Requirements for Deduplication

Media Server Requirements


In order to license and run the Deduplication Option on the Backup Exec Media Server, the following minimum system
requirements must be met.

• 64-bit version of Windows: The Deduplication Option requires that the media server runs a 64-bit version of
Windows.
◦ Windows 2003 x64 (all supported versions including R2), Windows 2008 x64 (all supported versions),
Windows 2008 R2 x64 (all supported versions).
• Physical Memory: The Deduplication Option requires 1 GB of physical system memory per Terabyte of
Deduplicated storage for Client or Media Server deduplication operations, in addition to Backup Exec’s
minimum memory requirements
◦ For example, if 5 TB of deduplicated data is stored on a media server, the media server must have at
least 5.5 GB of physical memory.
◦ Customers who use Appliance Deduplication *only* are exempt from this physical memory
requirement. These customers need to have at least 1 GB of physical memory installed on the Media
Server.
• Processor: The Deduplication Option requires at least one dual-core processor for the media server. Two dual-
core or one or more quad-core processors are recommended.

Remote Agent Requirements


For Windows Remote Agents configured to use Client Deduplication the following minimum system requirements must be
met:

• Processor: For Client Deduplication, a Windows Remote Agent must have at least one dual-core processor
• Physical Memory: For Client Deduplication, a Windows Remote Agent must have at least 1 GB of physical
system memory.

Note that Remote Agents can be either 64-bit or 32-bit versions of the platforms that Backup Exec supports.

The Media Server has other detailed requirements listed in the Backup Exec 2010 Administration Guide. Be sure to refer
to the Administration Guide before configuring Deduplicated Storage Folders.

Appliance Deduplication
Many Backup Exec customer environments have an existing investment in deduplication appliances for onsite backup,
offsite storage (disaster recovery), and remote office protection. If you already have an investment in these devices, or
plan to do so because of those devices’ scalability, speed, or feature sets, then Appliance Deduplication is an excellent fit
for your environment.

8
Backup Exec 2010: Deduplication Option

Several valued Symantec partners produce deduplication appliances. These devices are purpose-built hardware and
software devices that excel at doing a specific set of tasks in the most efficient and effective manner possible.
Deduplication is one such task that these appliances do very well. Many of these appliances also include a built-in
replication capability, where data can be backed up onsite then replicated in an efficient manner to another appliance in
an offsite location. This replication is another area where appliances can excel. Symantec has taken an embrace and
extend strategy by partnering with hardware deduplication appliance vendors to add value. Symantec’s Appliance
Deduplication feature allows Backup Exec to be aware of, and control, both the Deduplication and Replication functions of
the 3rd-party hardware appliance.

Appliance Deduplication uses Symantec’s OpenStorage (OST) technology, in conjunction with both a 3rd-party
deduplication appliance and a manufacturer-developed OST Plugin, to achieve the following:

• Faster backups: Generally, the OST interface, due to its stream-based nature, is faster than typical NFS and
CIFS-based backup targets. Additionally, deduplication appliances are tuned specifically for fast deduplication
calculations and may be faster than software-based solutions. Some manufacturers have seen 30% to 50%
improvement in backup speed when OST-based transfers are used instead of native CIFS/NFS transfers.
• Optimized Duplication: Many 3rd party deduplication appliances include a replication feature, where backed-
up data is efficiently moved from one device to another downstream device. Optimized Duplication is a feature
that allows the Backup Exec Media Server to track backups regardless of where the deduplication appliance may
replicate them, so any Backup Exec media servers that share catalogs through the Central Admin Server Option
(CASO) can be aware of data that lives in several places. Without Optimized Duplication support, Backup Exec
would not be aware of replicated copies of data without a manual inventory and cataloging operation. Backup
Exec’s Granular Recovery Technology (GRT) can also be used in conjunction with Optimized Duplication. This
patent-pending technology allows administrators to restore granular items in Exchange, SharePoint or Active
Directory with a single pass backup. However, the Optimized Deduplication (downstream) copy of the backup is
not in traditional GRT format. The backup set itself is GRT-enabled. In order to perform a GRT recovery from an
Optimized Deduplication copy of a GRT backup, the Optimized Deduplication copy must be staged to a disk
location before recovery of individual items can take place. Each copy of data, whether onsite or offsite, can
have different retention periods.

Refer to Figure 2 below for a graphical view of the Optimized Duplication process.

9
Backup Exec 2010: Deduplication Option

Figure 2: Optimized Duplication with Backup Exec and OST

The primary benefits for Optimized Duplication in conjunction with Appliance Deduplication are straightforward.
Operational efficiencies are improved because the hardware device does the actual deduplication calculations, off-loading
these processes from the Media Server. Deduplication performance can also be increased, as specific deduplication
hardware devices usually perform deduplication calculations faster than a similar task done in. Lastly, and perhaps most
importantly, the Backup Exec Media Server has cataloged and is aware of backup sets that have been replicated to another
deduplication appliance. This allows customers to set separate retention periods for the various backup sets allowing for
different retention periods for different sites Symantec’s OST technology and Optimized Duplication has received positive
reviews in the Storage industry; W. Curtis Preston of the technology blog Backupcentral.com has a good write-up
regarding OST here: http://www.backupcentral.com/content/view/198/47/.

Appliance Deduplication requires that the Backup Exec Media Server be paired with one or more supported OST-based
deduplication appliances. At Backup Exec 2010’s launch, both ExaGrid and Quantum have (see the Backup Exec 2010
Hardware Compatibility List at http://support.veritas.com/docs/329256) devices that support basic backup and recovery
through OST. ExaGrid EX devices support Optimized Duplication. Symantec Backup Exec is committed to expanding the
breadth and depth of OST partners certified to work with Backup Exec, so additional OST devices are being certified and
supported as they complete Backup Exec’s internal qualification processes.

Which Type of Deduplication to Use?


After a thorough walk-through of the various Deduplication methods available in Backup Exec’s Deduplication Option,
Administrators should have a good idea of how each type might be able to work in their environments. The table below

10
Backup Exec 2010: Deduplication Option

breaks out possible use cases and environments and Symantec’s recommendations for deduplication methodologies.
Symantec recommends that Deduplication be used as follows:

Use Case Recommended Deduplication Method

Remote Office Backups without Local Storage Client Deduplication

Client Deduplication (with deduplicated backup set copies


Remote Office Backups with Local Storage
to Data Center) or Appliance Deduplication

vSphere/ESX or Hyper-V backups with Agents in each VM Client Deduplication

Off-Host vSphere/ESX backups Media Server Deduplication

Applications on Windows (Exchange, File Server, SQL


Client Deduplication
Servers, Active Directory, Notes, Oracle etc)

Remote Linux, Solaris, or Mac Servers Media Server Deduplication or Appliance Deduplication

Applications on Linux (SAP, Oracle, etc) Media Server Deduplication or Appliance Deduplication

Existing or Planned investment in Deduplication Appliances Appliance Deduplication

Enchanced Scalability Appliance Deduplication or Netbackup PureDisk 6.6

Table 3: Deduplication Methodology by Use Case

Integration with NetBackup PureDisk 6.6


Through the implementation of OpenStorage (OST) technology in Backup Exec, additional integration with NetBackup
PureDisk 6.6 can be obtained. Backup Exec can now use PureDisk Storage Pools as data repositories. In fact, Backup Exec
can use PureDisk as both a destination for Appliance Deduplication, and Backup Exec Remote Agents can backup data
directly to the PureDisk storage pool through Client Deduplication. Backup Exec achieves this through integration and use
of PureDisk technology.

NetBackup PureDisk 6.6 is a robust and scalable solution for customers that need to store more than 16TB of
Deduplicated data in a media server. Currently a single NetBackup PureDisk 6.6 server can support 32TB of Deduplicated
data, and additional PureDisk servers can be joined into a pool of PureDisk storage, to exponentially grow capacity to
hundreds of TB’s of Deduplicated data, while also improving throughput, and adding redundancy to your backup
environment.

The ability to do Client Deduplication with a combination of Backup Exec and PureDisk is possible because the Backup
Exec Agents contain much of the same logic that is contained by a stand-alone PureDisk client. This has some side effects
that customers will need to be aware of – namely, that the PureDisk client and a Backup Exec Remote Agent cannot exist
on the same system.

11
Backup Exec 2010: Deduplication Option

This architecture opens up several interesting possibilities with both PureDisk and Backup Exec in the same environment.
Since PureDisk can accept backup data directly from BE Remote Agents or from data backed up through the Media Server,
customers who need the performance and scalability offered by PureDisk can combine the ease-of-use of Backup Exec
with the robust PureDisk back end. In fact, a mix of NetBackup clients, PureDisk clients, and Backup Exec Remote Agents
can all backup data to a single PureDisk data store.

Additionally, through OST integration, Backup Exec can control and manage PureDisk’s replication of its backup data
between PureDisk content routers and Storage Pools. This function is nearly identical to the Optimized Duplication
feature described above. This way, multiple Backup Exec servers with the CASO can be aware of backup data contained in
PureDisk Storage Pools regardless of where customers need to replicate data around their environment. See Figure 3
above for a description of Optimized Duplication and the benefits of using Optimized Duplication technology.

Migrating Data to Tape for Long-Term Storage


An environment with a disk-to-disk-to-tape architecture is fairly common among customers who are interested in
deduplication. It’s important to note that all of the forms of deduplication mentioned here are disk-based; deduplicated
data is never stored on tape in its deduplicated form. However, the process of migrating deduplicated data to tape is a
simple process. Customers can set up “Set Copy” jobs that copy data from the deduplicated storage folder, PureDisk
Storage Pool, or Deduplication Appliance to another device, like a tape device, for long-term storage and retention.

These Set Copy jobs are separate jobs, but aside from the typical job configuration information needed to create a job,
Backup Exec handles the details of copying deduplicated data to a tape location. It’s really as simple as creating a Set
Copy job with a tape device as the destination, and run it.

For data that was backed up using Client or Media Server deduplication, the Media Server will be responsible for
recreating whole files from deduplicated data before transferring to tape, so there will be some impact to processor and
memory usage during the Set Copy operation. While resource consumption varies based on data set, at most the Set Copy
process will use 100% of one processor core while performing the Set Copy.

For data that was protected to a PureDisk Storage Pool or to a Deduplication Appliance, that device will be responsible for
rehydrating the data prior to it being sent to tape.

Conclusion
Backup Exec 2010 introduces some excellent new capabilities around storage management through Deduplication. The
Deduplication Option provides customers with the ability to reduce backup storage by 90% or more, improve backup
windows, and facilitate better remote office protection. The Backup Exec Deduplication Option gives administrators the
flexibility to choose what type of deduplication to use – whether it is software-based deduplication with Client or Media
Server deduplication, or hardware-based deduplication with devices from OST partners like ExaGrid and Quantum.
Symantec has taken an embrace and extend strategy by partnering with hardware deduplication appliance vendors to add
value. This is accomplished by integrating management of appliance deduplication and replication operations, and
Backup Exec 2010’s Deduplication Option can leverage these capabilities for advanced multi-site and multi-appliance
recovery operations. The Backup Exec Deduplication Option allows customers complete freedom to use whatever type of

12
Backup Exec 2010: Deduplication Option

deduplication they desire, whether it’s deduplication through a 3rd-party appliance, or a mix of client and media server
deduplication.

13
About Symantec
Symantec is a global leader in providing security,
storage and systems management solutions to help
consumers and organizations secure and manage
their information-driven world. Our software and
services protect against more risks at more points,
more completely and efficiently, enabling
confidence wherever information is used or stored.

Symantec helps organizations secure and manage


For specific country offices Symantec World Headquarters their information-driven world with high availability,
and disaster recovery solutions.
and contact numbers, please 350 Ellis St. Copyright © 2010 Symantec Corporation. All rights
reserved. Symantec and the Symantec logo are
visit our website. Mountain View, CA 94043 USA trademarks or registered trademarks of Symantec
Corporation or its affiliates in the U.S. and other
+1 (650) 527 8000 countries. Other names may be trademarks of their
respective owners.
2/2010 20999105
1 (800) 721 3934
www.symantec.com

You might also like