

HOW TO CREATE A NIMBLE BACKUP
ARCHITECTURE FOR YOUR DATA CENTER
OR, TEACHING YOUR TAPE SYSTEMS TO DANCE
WITHOUT STOPPING OFF FOR A SHOE SHINE
COPYRIGHT INFORMATION
Copyright © 2003 by Network Frontiers, LLC
All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by
any means, electronic or mechanical, including photocopying, recording, or any
information storage and retrieval system, without permission in writing from the
author.
All brand names and product names mentioned in this book are trademarks or
registered trademarks of their respective companies.
Schaser-Vartan Books
US Offices: 5620 West Dayflower Path, Lecanto, FL 34461
Feedback: info@backupbook.com
Web: http://www.backupbook.com
SAN # 255-2582
ISBN 0-9729039-0-9
Library of Congress Catalog Card Number: 2001274299
For more information on Quantum’s tape libraries or DX30 Enhanced Backup
Solutions, visit them at:
http://www.quantum.com/storagesolutions
or call 1 (866) 827-1500
or 1 (410) 421-8999, ext 14 to speak to a Quantum Govt. specialist.
CHAPTER ADDENDUM:
HOW TO CREATE A NIMBLE BACKUP
ARCHITECTURE FOR YOUR DATA CENTER
(OR, TEACHING YOUR TAPE SYSTEMS TO DANCE
WITHOUT STOPPING OFF FOR A SHOE SHINE)
Can you create a nimble backup architecture for your data center? Sure, you can.
If you design your architecture right, not only will it be able to dance—you’ll also
overcome those data under-run problems that necessitate shoe shining!
During my tenure as the CIO of True North Communications (one of the largest
ad agencies in the world, and now Interpublic Group), I was called into the office
of Bruce Mason, our CEO at the time. “I’ve got a present for you; open it up,”
said Bruce, pointing to a small box on his coffee table. It was a pen. A nice pen,
but a pen? “It’ll make you look more professional in the MEC (the company’s
internal board) meetings in front of the other CEOs,” he smiled. For a second I
was dumbfounded, and then I started to laugh—“You mean because I’ve been
bringing in pads of paper and a bunch of good ol’ number-two pencils?” I asked.
“Yep, you’re all grown up now, so you can put them away.” I thanked him, took
the pen, and laughed all the way to my office.
I had been going into our CEO management meetings laden with pencils because
I was drafting an outline for our new data center’s backup and disaster recovery
plan. Each time the CEOs in the meetings had decided that the company should
go in a different direction (about two or three times per meeting, with many meet-
ings over the course of a couple of months), I had to erase part of my plan and
overwrite it. I wasn’t going to put anything in ink until I knew what the final plan
was. As it turned out, even when everything was “decided,” and our data center
was being rebuilt and finalized, the company’s data needs changed in rapid order
and we had to shift our priorities again (and again and again).
In the past, not a lot of us have considered data centers and backup architectures
to be particularly nimble and easily changed. But with the storage networking
technology that you’ll read about here and some smart choices on your part, you
can create a hardware plan that can be modified and reconfigured as quickly as
your board’s most capricious whims.
The first step toward finalizing your data center’s backup architecture is making a
wise media choice. The amount of data you need to back up divided by your
backup window length (usually eight hours) will clue you in to the throughput
performance you’ll need in your tape format. It will also determine the number of
tape cartridges necessary for each backup. Throughput and tape count are two key
design items in your plan.
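If it helps to see that arithmetic spelled out, here is a minimal sizing sketch in Python. The data volume, window, drive rate, and cartridge capacity below are example placeholders, not vendor specifications; plug in your own numbers.

    import math

    def size_backup(data_gb, window_hours, drive_mbps, cartridge_gb):
        """Back-of-the-envelope sizing: required aggregate rate, drive count,
        and cartridge count for one full backup (native, uncompressed)."""
        required_gb_per_hour = data_gb / window_hours
        per_drive_gb_per_hour = drive_mbps * 3600 / 1024   # MB/s -> GB/hr
        drives = math.ceil(required_gb_per_hour / per_drive_gb_per_hour)
        cartridges = math.ceil(data_gb / cartridge_gb)
        return required_gb_per_hour, drives, cartridges

    # Example: 10 TB to protect, an eight-hour window, a 30 MB/s drive, 200 GB cartridges
    rate, drives, carts = size_backup(10 * 1024, 8, 30, 200)
    print(f"need ~{rate:.0f} GB/hr overall: {drives} drives, {carts} cartridges")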
Once you know your throughput and cartridge count, you can make decisions about
the numbers of drives and the library cartridge capacity you’ll need to employ. You’ll
need to decide whether you want to build a modular, stackable system that can be
expanded and upgraded occasionally, or whether you should consolidate your
backup efforts and jump straight to a cabinet library that you can partition as nec-
essary.
Your final step is to round out your design by choosing a few backup options. Will you
choose to back up your NAS filers directly to your tape library over the GbE net-
work using NDMP? What about leveraging an application’s XCOPY (SCSI
eXtended Copy) over Fibre Channel features? How will you overcome the prob-
lems of shoe shining and interleaving? Each of these questions is asked and
answered once you’ve made your first two choices, because those choices will set
the boundaries of what you can and can’t do within your backup plan and backup
architecture.
FIRST STEP: CHOOSE YOUR SHOES CAREFULLY
If you’re going to standardize on anything in your data center’s backup plan, it
should be the tape format for your backup media. Choosing a tape format is like
choosing a pair of shoes for your dance—Birkenstocks are great for swaying to the
Grateful Dead, but wouldn’t do well in a tango. Do some research before making
your decision. A thorough understanding of the market and the technologies is
vital for mapping a strategy to handle your company’s tape storage. Your choice of
one technology over another should be based on up-front price, ongoing costs
(TCO stuff), speed (both backup and restore), reliability, and capacity.
There are two basic types of tape recording formats: linear and helical scan. Let’s
face it—in the market, linear tape formats have won (we’ll show you some charts
in a page or two to prove our point), so let’s just stick with linear tapes for this
discussion. Linear tape technology uses a recording method in which data tracks
are written in a linear pattern on the tape. The first set of tracks is recorded in par-
allel over the entire length of the tape. The recording heads are then repositioned
and the tape direction is reversed to record the next set of tracks, again across the
entire length of the tape, repeating this process in a serpentine fashion (Remember
The In-Laws? The original one? Peter Falk and Alan Arkin are beyond compare!)
until all data has been recorded. The major linear formats for data recording are
DLTtape (normal, super, and DLT1), LTO (ultrium), and Travan (a great entry-
level tape system).
With that said, here’s the scoop: This is a data center we’re talking about, not your
home-brew backup plan. You should either be looking at the DLTtape family or
the LTO family of tape systems. If you want to read about the others, pick up a
copy of our most recent book, The Backup Book,[1] to get the skinny on all of them,
because the purview of the book is much broader than what you’re reading here.
Here, we’re talking about high-speed data centers, and therefore, half-inch tape
formats and tape automation libraries.
[1] The Backup Book: Disaster Recovery from Desktop to Data Center, by Dorian J. Cougias, E. L. Heiberger, and Karsten Koop. S-V Books, 2003.
Why LTO or DLTtape families?
First, when planning for your data center, always try to plan for the winner in the
market share race. The last thing you want is to invest in a dead-end technology.
While the helical scan markets comprise a very solid 36 percent of the tape market
today, the linear DLTtape and LTO formats command a whopping 63 percent of
the market.
Figure Addendum-1. Worldwide market share in 2002: LTO/DLT/SDLT 63%; 4mm helical scan 24%; 8mm helical scan 12%; 3480/90/3590 and 9840/9940 1%. (Source: Gartner Dataquest 2003)
And in a market the size of $2.1 billion in factory revenue, 63 percent represents
a huge number. With market share clearly pointing to the DLTtape and LTO fam-
ilies, how do you choose between them? Let’s take a look at a couple of info-tables
to bring to light some of the choices you can make. Below are the basic differences
between the leapfrogging LTO and SDLT tape formats. DLT was the de facto
standard for high-speed backups. LTO emerged and then SDLT leapfrogged over
it. Then SDLT was followed by LTO-2, and now SDLT 320 and 600.
Here’s the 411 on the basics of both DLTtape and LTO formats.
                                   LTO-1          SDLT 320            LTO-2          SDLT 600
Media Type                         Ultrium 1      SDLT-1              Ultrium 2      SDLT-2
Servo Method                       Pre-Format*    Magnetic            Pre-Format*    Magnetic
                                                  Backside Optical                   Backside Optical
Cartridge Capacity (GB)            100            160                 200            300
Transfer Rate (MB/s)               15-16          16                  30-35          34
Max. GB/hr. (Native)               52-56          56                  105-123        119
Back-Write Compatibility           N/A            SDLT220             LTO-1          N/A
Back-Read Compatibility            N/A            DLT4k, 7k, 8k,      LTO-1          SDLT220, 320,
                                                  DLT-1 / VS80                       VS160
Recording Tracks                   384            448                 512            720
Recording Channels                 8              8                   8              16
Power Consumption (write/stream)   28 watts       27 watts            28 watts       27 watts
MTBF (@ 100% Tape Motion)          250,000 hrs    250,000 hrs         250,000 hrs    250,000 hrs
Head Life (Tape Motion Hrs)        30,000 hrs     30,000 hrs          60,000 hrs     30,000 hrs
Media Durability (end-to-end)      20,000         20,000              20,000         20,000
Archival Storage Life              30 years       30 years            30 years       30 years
Media Cost (Street @ volume)       $69            $99                 $115           $149
Media Cost / GB                    $0.69          $0.62               $0.58          $0.50
* You can’t bulk erase this format.

Table Addendum-1. Basic differences between LTO and SDLT
What you should notice in the table above is that the number of recording tracks
continues to increase with each iteration of the format’s progression. This is one
of the main reasons behind the capacity increases while the tape cartridge’s physi-
cal size remains the same. While the media costs rise, the cost per gigabyte actually
decreases, because the density ratios allow you to put more gigabytes in the
same form factor. And as the density ratios increase, so do the throughput speeds,
which brings us to our second table, wherein you’ll totally ignore everything we
showed you above because you’re only wondering one thing: How fast do they go?
                                        LTO-1     SDLT 320     LTO-2*     SDLT 600
Native Throughput (MB/sec.)             15        16           30         33
Hourly Native Throughput (GB/hr.)       53        56           105        116
4 Drive Native Throughput (GB/hr.)      211       225          422        464
8 Drive Native Throughput (GB/hr.)      422       450          844        928
12 Drive Native Throughput (GB/hr.)     633       675          1,266      1,392
16 Drive Native Throughput (GB/hr.)     844       900          1,688      1,856
20 Drive Native Throughput (GB/hr.)     1,055     1,125        2,109      2,320
* Assumes HP LTO-2 (30 MB/s). IBM rated slightly higher (35 MB/s).

Table Addendum-2. Basic throughput ratios
The first important thing about tape speeds is the native throughput potential. In
other words, how many gigabytes per hour of data can you push onto your tape
drives? The reality of the question depends upon your format, the number of drive
mechanisms in your autoloader or library, and the compressibility of your data
(SDLT and LTO both have built-in compression). Because there’s no way in the
universe we could figure out a constant for compression ratios, we’ll show the raw
numbers in native, non-compressed formats in Table Addendum-2 above,
breaking them down into common drive numbers found in most autoloaders and
tape libraries.
Once you’re done examining Table Addendum-2, the table you’ll care
about for planning purposes is Table Addendum-3, because it shows how much
data you can move through in a night’s normal eight-hour backup window. Most
organizations are forced to run backups when the servers being backed up aren’t
in use, or at least are being used less than normal. This usually means that the data
center’s backup routines begin when normal folks go to bed. And since most data
centers today measure the amount of data being backed up in terabytes instead of
gigabytes, we show the number of terabytes that can be backed up during that
eight-hour window.
                                          LTO-1     SDLT 320     LTO-2*     SDLT 600
4 Drive / 8 Hour Native Throughput (TB)   1.65      1.76         3.30       3.63
8 Drive / 8 Hour Native Throughput (TB)   3.30      3.52         6.59       7.25
12 Drive / 8 Hour Native Throughput (TB)  4.94      5.27         9.89       10.88
16 Drive / 8 Hour Native Throughput (TB)  6.59      7.03         13.18      14.50
20 Drive / 8 Hour Native Throughput (TB)  8.24      8.79         16.48      18.13
* Assumes HP LTO-2 (30 MB/s). IBM rated slightly higher (35 MB/s).

Table Addendum-3. Throughput in TB during eight-hour backup window

Last but not least, let’s go over the number of cartridges that you’ll need to hold
your data (see Table Addendum-4 below). The amount of data that you have
and the number of cartridges it takes to store that data comprise one of the most
important factors to consider when you make your autoloader or library purchase
decision. If you’ve passed the 100-cartridge mark after factoring in future growth,
you’re a candidate for multiple stacked libraries or a cabinet library system.

Let’s wrap this up with this thought. For simplicity’s sake, let’s say that you have
10 TB of data that you have to move through your system and onto backup tapes
during your nightly eight-hour backup window. This means that if you choose…
• LTO-1, you’ll need 25 drives and about 103 cartridges.
• SDLT 320, you’ll need 23 drives and 64 cartridges.
• LTO-2, you’ll need 13 drives and 53 cartridges.
• SDLT 600, you’ll need 12 drives and 35 cartridges.
As you can see, the faster and denser the tapes become, the fewer drives and fewer
cartridges you’ll need for your backup operations.
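To see where those drive and cartridge counts come from, here is the same sizing sketch applied to all four formats, using the native per-drive rates and cartridge capacities from the tables above. The exact results can differ from the list above by a drive or a cartridge depending on how you round.

    import math

    # Native per-drive rates (MB/s) and cartridge capacities (GB) from the tables above
    formats = {
        "LTO-1":    {"mbps": 15, "cartridge_gb": 100},
        "SDLT 320": {"mbps": 16, "cartridge_gb": 160},
        "LTO-2":    {"mbps": 30, "cartridge_gb": 200},
        "SDLT 600": {"mbps": 33, "cartridge_gb": 300},
    }

    data_gb = 10 * 1024          # 10 TB to protect
    window_hours = 8             # nightly backup window

    for name, spec in formats.items():
        gb_per_hour = spec["mbps"] * 3600 / 1024               # one drive, native
        drives = math.ceil(data_gb / window_hours / gb_per_hour)
        cartridges = math.ceil(data_gb / spec["cartridge_gb"])
        print(f"{name:9s} ~{drives} drives, ~{cartridges} cartridges")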
Which dance steps are for you?
Once you know the basics about your media needs, it’s time to take your first few
dance steps to see how nimble your planning can become. If data center backup
operations were simply a matter of moving all of your data onto backup tapes in
one fell swoop, you’d have this licked and probably wouldn’t be reading this paper
right now. Lucky for me as a writer, the world isn’t that simple (it gives me a raison
d’être, Lebenszweck, scopo da vita—a life purpose). Not only must you design
your backup architecture to move from high to massive quantities of data, you
must design it to be versatile, reliable, and scalable—and those qualities are rooted
in the hardware choices you make at this stage of the game.
Two types of tape systems are employed in today’s data centers:
• Modular libraries that hold 20–100 tapes and have one or more drives. These
modules can be stacked and arranged like building blocks; and
• Cabinet libraries that hold multiple drives (5–20) and as many tapes as you
might expect in something named “library,” and are extensible in ways that
modular libraries are not.
                        LTO-1     SDLT 320     LTO-2     SDLT 600
10 Terabyte (Native)    103       64           53        35
20 Terabyte (Native)    205       128          103       69
30 Terabyte (Native)    308       192          154       103
40 Terabyte (Native)    410       256          205       137
50 Terabyte (Native)    512       320          256       171

Table Addendum-4. The cartridge count per Terabyte
Modular libraries
Below is Quantum’s M2500 modular library (the largest in their series of modular
libraries). Each modular library has capacity for multiple tapes as well as for mul-
tiple drive mechanisms (some lower-end autoloaders have only a single drive
mechanism). The series begins with the M1500 (two drives and up to 25 tapes),
progresses to the M1800 (four drives and 50 tapes) and ends with this one.
Figure Addendum-2. Quantum M2500 tape library
The M2500 modular library shown above can hold up to six drives and up to 84
DLTtapes or 100 LTO tapes, giving it a native capacity of up to 20 TB. And of
course, since it’s modular, you can mix and match any of these systems to form a
much larger one. We’ll cover that mix-and-match concept in a few pages.
Cabinet libraries
Cabinet libraries, as their name implies, offer massive capacity solutions for tape
backups. These are the enterprise systems for storage, and are quite powerful tools.
Quantum’s PX720 shown below is one such tape library.
Figure Addendum-3. Quantum PX720 tape library
Cabinet libraries are employed when you need the greatest density of storage per
cubic foot of storage space. These libraries belong in the data centers of the world
and are thus optimized for data center usage. The library shown above can hold
up to 20 SDLT or LTO tape drive mechanisms and 732 tapes, effectively giving it
200 TB of (native) storage capacity. And, with the ability to connect at least five
of these together, that’s one heck of a lot of data.
Building scalability, flexibility, and reliability into your backup architecture
When picking out a tape system for your organization, keep in mind three factors:
1. Flexibility
2. Modularity and scalability
3. Density
There’s no such thing as an organization that creates less and less data each month.
And you know that as soon as you make up your mind about what you want to
put into the system, the organization will find new ways to make more data; there-
fore, your storage needs will grow. The only thing that’s not going to grow is the
square footage in your data center. So let’s start with density first. Look for the
highest volume-per-rack-unit you can find. In other words, based on the
number of rack units your tape system uses, how many GB or TB of data can you
store? If you’re looking into a cabinet system, does the cabinet use only the back
wall, or does it also use the left and right door space (who needs to look into a win-
dow in a tape cabinet, anyway)?
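As a quick back-of-the-envelope comparison, that metric is just native capacity divided by the rack units consumed. The candidate names and figures below are made-up placeholders, not measurements of any particular product; substitute the real numbers from your vendor quotes.

    # Hypothetical candidates: native capacity (TB) vs. rack units (U) consumed.
    candidates = {
        "Stack of modular libraries": {"native_tb": 20, "rack_units": 18},
        "Cabinet library":            {"native_tb": 200, "rack_units": 42},
    }

    for name, c in candidates.items():
        print(f"{name}: {c['native_tb'] / c['rack_units']:.1f} TB per rack unit")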
Second, flexibility must involve both types of scalability—internal growth and
link-ability (which also gives you additional reliability). Internal growth scalability
means that the unit is large enough that you can add either drives or additional
tapes to the system without having to purchase an entirely different system. The
M2500, for example, allows the user to configure the system with one through six
internal drive mechanisms, and can hold from one through 84 SDLT tapes and
up to 100 LTO tapes. That’s what we mean by internal growth.
Third, with the stack-ability of a tape library system like the M2500, you can be
assured that whatever money you spend on a library is a protected investment, not
a wasted expense. Adding more modules to the first library (thus making it appear
as a single unit to the backup software) means that the tape system can continue
growing within the same rack unit while additional reliability is added to the sys-
tem overall.
In terms of reliability, with more than a single drive mechanism in any of the tape
libraries, the backup planner is pretty much guaranteed that at least one of them
will be working at all times. If any of the drive mechanisms fail, the other mecha-
nisms will take over for it until it can be repaired. However, if the entire unit fails,
or the robot in the unit fails, that’s another story—and the reason you want to look
at link-ability for scalability and reliability plans. On the following page, we show
the M2500 on the left, and two M1800s on the right. Each of the M1800s holds
42 SDLT tapes (half that of the M2500) and up to four tape drive mechanisms.
By linking two M1800s together, the backup planner now has a total of eight drive
mechanisms and the same number of tape slots (84) as the M2500.
Figure Addendum-4. One M2500 or two stacked and linked M1800s
Since Quantum’s M-Series allows additional units to be linked and therefore act
as a single system, this enables expansion beyond the unit’s physical limitation. It
also offers greater reliability. Because all of the M-Series can be linked together, the
backup planner can start with a single M1800 and then add more M1800s or
additional M2500s until all 41 units in the rack are full. That could be three
M2500s, one M2500 and three M1800s, or five M1800s—you get the picture.
Flexibility sometimes means redesigning on-the-fly
One of the things to ask about your modular tape library system is how “flexible”
is “flexible.” When stacking systems, some libraries make you take the system
down completely and then hard-wire everything together, essentially rebuilding
and reconfiguring the system completely. Make sure that you employ a modular
system that allows you to re-stack and rebuild while the unaffected units are online
and running. This avoids downtime and gives your design true flexibility.
Flexibility should extend to cartridge loading
Quantum uses the magazine load approach with their M-Series tape libraries. This
greatly enhances speed of movement between systems if one unit fails. Within a
minute or two, tapes can be moved from the failed module to the good module
and given read and write access.
If hot dogs come in packages of 10, why do hot dog buns come in packages of
eight? Where do you put the extra two? Think about where you’re putting another
displaced entity: your cleaning cartridge. It really bothers the heck out of me when
a library forces me to put the cleaning cartridge in one of the magazine slots.
Someone always forgets to take it out of the slot before the magazine goes offsite.
Therefore, look at how your modular library handles cleaning cassettes as one of
the keys to flexibility.
The other key to magazine flexibility is the ability to rotate the magazine and use it on both
sides of the library. I really hate libraries that have distinct left and right maga-
zines—who came up with that brilliant idea?
And finally, flexibility in the cartridges and modularity also means this: If all the
drives in one of the M-Series modules are busy, within seven seconds or so (give or
take a few nanoseconds), the cartridge can be passed up or down the stack to a
drive that’s waiting for something to do.
And flexibility should extend to your backup network’s architecture
Within today’s data center network architectures, there are three traffic routes for
moving data from the source to the backup server. The first method is to move the
data over the same production network that the normal traffic flows through. The
second method is to move the data through a specially designed GbE sub-network
for backup purposes only. And the third method is to move the data over a Fibre
Channel Storage Area Network (SAN).
Of course, when adding backup servers over the normal network, you’ll need to
make sure that the backup servers are attached at the highest level of the network’s
backbone. You wouldn’t want the servers off on a spur, forcing traffic to go
through several hubs, switches, or even routers. Each time your backups begin
going through routers and multiple switches, you begin adding time to the backup
process because you’re adding latency. In Figure Addendum-5 below, we show
one of our networks that has a phalanx of servers and four
different backup servers with their direct attached DLTtape libraries.
Figure Addendum-5. Backup servers on the work LAN: the production servers (PBX/fax, GL/AR/AP, HR/scheduling/payroll, EIS/reporting, DNS/web, AD/DNS/security/proxy, print server/RIP, LDAP/e-mail, and IM/SCM/POS), the NAS boxes, four backup servers, and their direct-attached DLT Libraries A and B
If you add four backup servers running simultaneously each night, make sure
when you add them directly to the network, that you add them on a switched
backbone, or at least have a switched network segment that routes the traffic
among the four individual backup servers and the computers they’re simulta-
neously backing up. Think of it: If the backup servers were all on the same hub,
they’d have to divide the network’s total bandwidth by 4. That would defeat the
purpose.
Creating a GbE sub-network for backups
The second method for running backups is to create a dedicated sub-network just
for backups and management traffic. While this might be pretty hard to do for
backing up servers strewn throughout corporate facilities, it’s relatively easy in a
data-center environment because of proximity and the open wiring architecture of
data centers. In Figure Addendum-6 below, we segregated the backup serv-
ers onto their own switch and ran a second set of cables from each of the servers
being backed up over to the backup sub-network (the dashed lines).
Figure Addendum-6. Backup servers on a sub-network for backup only
The additional costs are the gigabit Ethernet switch and each of the gigabit Ether-
net cards that you must install into the servers being backed up.
The benefits are many. The backups won’t have to run over the standard produc-
tion network, and therefore you won’t have to worry about normal traffic interfer-
ing with the backup traffic. You can install the fastest network cards in each of the
servers being backed up, and therefore move data faster than you could over the
normal network.
Creating a SAN for your backups
Want a backup system superior to a sub-network? If you use a Storage Area Net-
work (SAN) in your data center’s server farm, you could run the backups over the
Fibre Channel SAN instead of a gigabit Ethernet. This provides the maximum
amount of throughput, but is also the most complex method. As discussed previ-
ously, when multiple computers back up across the network to a single tape
library, they transfer the data across that network to a single backup server that has
connectivity to the tape device. On a SAN, each server can have equal access to
the same tape device. Because each machine on the SAN directly communicates
with the tape device, transfer speeds are equivalent to having a device locally
attached to the server’s PCI bus.
In the diagram below, we show a small SAN with redundant fibre switches con-
necting each of the servers on the left with the storage farms on the right. Because
the Quantum tape libraries and others like it can be connected to a SAN, and
because most enterprise-level backup software can run backups over a SAN, this
becomes a real option. We’ve kept the four backup servers in place because we
need four restore servers running simultaneously in case of building loss, and we’re
faced with restoring everything in the data center as quickly as possible. Remem-
ber, each backup server can restore only one device at a time from tape.
Figure Addendum-7. Storage Area Network for backups
Now, before you get all excited about the types of designs you’ve seen and what
modular libraries can do for you, let me cool your heels for a second. These designs
and modular libraries are great for very specific backup architectures. If you’re pro-
tecting your department’s server farm, pick and choose your tape format, library
style, and backup architecture infrastructure—you’ll be well on your way to bliss.
But if you’re designing a backup plan for an organization-wide data center, a
backup plan that has several distinct goals, you might want to hold off on your
decision to build modular and instead consider cabinet libraries that are both large
and agile.
Being big doesn’t mean you can’t be agile
Modular tape libraries like Quantum’s M-Series are fantastic, packing a rack with
tape density while ensuring flexibility in design. However, there comes a point
when you need more tape drives in the library than a modular system can handle.
And with most organizations today heeding the call to consolidate any-
where possible, organizational data centers are being inundated with more and
more equipment that had previously been strewn hither and yon. Add to that the
regulatory requirements for maintaining X data sets over Y years (such as critical
information for litigation support), and what you have is a data center with needs
for short-term archival solutions, long-term backup solutions, direct attached
storage, Network Attached Storage (NAS), and Storage Area Networks all rolled
into one consolidated data center.
In short, this means that you’ll need vast quantities of backup storage. The folks
at Gartner’s Dataquest (July 2002) agree with me, noting the growth in sales for
tape systems with less than 21 cartridges to be a measly 6 percent. Growth of sales
for tape systems with 21–100 cartridges is expected to be 20 percent. And the gi-
normous size range of 101+ cartridge systems has an expected growth rate of
around 10 percent.
In an enterprise data center, it’s not that hard to have 30–40 TB of data that needs
protecting. If you remember your math of a few pages ago, that means that you’re
working with a minimum of 200 tapes and with a backup window of eight hours,
about 50 drives. The PX720 that we talked about earlier has several drive config-
urations, with the ability to hold up to 20 drives and 732 tapes.
If you’re moving 70 to 80 TB of data through your consolidated enterprise data
center, that means that you’re dealing with roughly three cabinets full of drives and
tape cartridges. One of the benefits of Quantum’s PX-Series products is that you
can scale up to five of the library chassis together, giving you up to 3,600 total car-
tridges and 100 total drives.
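Running those larger numbers through the same back-of-the-envelope sizing arithmetic used earlier (native rates, no compression, figures from the tables above) shows why the counts head into cabinet-library territory; this is only a sketch, not a configuration recommendation.

    import math

    def drives_and_carts(data_tb, window_hours, drive_mbps, cartridge_gb):
        """Same rough sizing as before, using native (uncompressed) figures."""
        data_gb = data_tb * 1024
        per_drive_gb_hr = drive_mbps * 3600 / 1024
        drives = math.ceil(data_gb / window_hours / per_drive_gb_hr)
        cartridges = math.ceil(data_gb / cartridge_gb)
        return drives, cartridges

    for data_tb in (40, 80):
        for name, mbps, cart_gb in (("LTO-2", 30, 200), ("SDLT 600", 33, 300)):
            d, c = drives_and_carts(data_tb, 8, mbps, cart_gb)
            print(f"{data_tb} TB on {name}: ~{d} drives, ~{c} cartridges")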
Figure Addendum-8. Three PX720s that you might find in your data center
While these systems may look daunting, the deftness and agility of their internal
architecture is quite amazing.
The Quantum PX-Series storage architecture
The PX-Series offers three things you need when building a nimble data-center
backup architecture: flexibility, reliability, and storage density.
• Their flexibility is delivered in the form of SCSI, 1 Gb and 2 Gb Fibre Chan-
nel, and IP storage interface connectivity and the range of drive options you
can load into them.
• Reliability is achieved through their hot-swap drives and redundant power
supplies and fans.
• And they offer the highest TB per square footage density in the market today.
The PX-Series GbE library leverages GbE for high-speed data movement and the
Network Data Management Protocol (NDMP) for control. Multiple Network
Attached Storage (NAS) filers can simultaneously share an NDMP-aware tape
library, significantly decreasing the backup window and consolidating backup
storage to the same backup device.
And speaking of canisters, each PX-Series library includes cableless, user-replace-
able, hot-swap drives that don’t interrupt any library operations, including data
throughput on a single common SCSI bus. Additional available hot-swap compo-
nents include fans and power supplies. Both power supplies and fans are standard
with N, N+1 or 2N redundancy.[2] Should a power supply or fan fail, the extra
component shares the load, eliminating the downtime that the failure would have
otherwise caused. The downed component can then be replaced at a convenient
time without powering down the library; even while it’s performing backups and
restores. Separate power inputs allow the library to be configured on separate
power circuits so if power goes out on one line (circuit-breaker trip or failure,
power cord accidentally disconnected, etc.), the library is unaffected.
Figure Addendum-9. One of the canisters in a PX720 Series
The Prism Fibre Channel router is an industry first: an integrated Fibre Channel
card with two 2 Gb Fibre Channel and four Ultra-2 SCSI ports for high perfor-
mance. One of the benefits of moving to a Fibre Channel backup network is that
the PX-Series systems support XCOPY for serverless backup operations through
the use of a Fibre Channel router. The router is actually embedded within the
library per se.[2] For those applications that are XCOPY aware, serverless backup
will move the data directly from the disks used by the software to the tape system
over the FC pipe, thus bypassing the backup server. By eliminating the backup
server, XCOPY increases the availability of the server’s resources and reduces the
backup window immensely.
[2] 2N is the doubling of a component, in case you couldn’t figure that out.
[3] For more information about XCOPY, or the PX-Series in general, please reference http://www.theanswerisx.com.
For VERITAS NetBackup environments, Port Failover offers another layer of
redundancy for a PX-Series library running within a 2 Gb Fibre Channel Storage
Area Network (SAN). Port Failover enables the third Fibre Channel port to be
reserved as a special fail-over port. Therefore, if one of the other two Fibre Chan-
nel ports fails, it will automatically fail over to the reserved port, preventing a dis-
ruption of the backup operation. In this scenario, the backup will continue
unabated and an ALERT e-mail will be sent via the Prism Management Card to
notify the administrator of the event.
Which brings me to the last stop on the PX-Series architecture tour: Quantum’s
embedded Prism Management, which runs off the cabinet controller. This con-
troller drives the Web-based management interface for the library system as well
as the ALERT notification engine and the SNMP integration for the library. One
of the Prism system’s key features is its ability to remotely manage the partitioning
of the library’s drives—a must-have feature for consolidating heterogeneous envi-
ronments with mixed drive types, operating systems, and/or ISV applications. Because
Quantum’s Web-based library partitioning is integrated within the library’s inter-
nal architecture (and therefore doesn’t require an external server for setup), it’s
both cross-platform and cross-application compatible. As shown in the diagram
below, each library can be partitioned into three segments, alleviating your need
for new libraries as your needs change. In the last part of this paper, we’ll show you
why partitioning (and re-partitioning when necessary) is an essential element of a
data center’s backup operations.
Web-based partitioning, individual canisters for ease of management, Fibre Channel and SCSI connectivity within the same system, or a system with direct GbE connections to each drive—that’s what I call nimble. There is no more dexterous a Tape Library system than the PX-Series from Quantum. It seems that the only problems they can’t solve are those related to the throughput of some of your slower legacy systems. But while Quantum can’t speed up those old systems, they do have a unique way of working around the problems these slower units cause.
Partitioning the libraries’ drives into three segments or “virtual” libraries
The last hurdle you’ll need to overcome: tape shoe-shining
Let’s get back to where we started this discussion—your tapes. Today’s tape drives’
read/write heads are positioned precisely against the tape, which then “streams”
past the heads. As the device writes to the tape, the read head is positioned so it
verifies each “frame” of data after writing it. Because the tape continually streams
past the tape drive heads, a constant source of information must flow to the tape
drive to keep the tape streaming. When the drive runs out of information to write
to the tape (1), it must stop the tape (2), which takes a foot or two of actual tape;
rewind the tape to a point behind where it left off (3); stop again (4); then start
the tape stream again (5) so that it’s up to speed by the time it reaches the last
point at which it wrote data. The process looks like the back-and-forth motion
of a shoe shine.
Figure Addendum-10. Tape shoe-shine process
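To get a feel for how badly an under-fed drive suffers, here is a very rough model of the effect. The drive speed, repositioning time, and source rates below are illustrative guesses, not measured figures for any particular drive. Note that in steady state the job still completes at the source rate; the cost of shoe shining is the wasted drive time, head wear, and tape motion, which is exactly what the interleaving described next tries to avoid.

    def shoe_shine_stats(drive_mbps, source_mbps, reposition_secs=3.0):
        """Crude steady-state model of an under-fed streaming tape drive.
        While the drive stops, rewinds, and restarts (reposition_secs), incoming
        data piles up in the host buffer at the source rate; the drive then streams
        that backlog at full speed until the buffer empties, and the cycle repeats."""
        if source_mbps >= drive_mbps:
            return 1.0, 0.0                      # the drive never starves
        backlog_mb = source_mbps * reposition_secs
        stream_secs = backlog_mb / (drive_mbps - source_mbps)
        cycle_secs = reposition_secs + stream_secs
        writing_fraction = stream_secs / cycle_secs
        repositions_per_hour = 3600.0 / cycle_secs
        return writing_fraction, repositions_per_hour

    # A 30 MB/s drive fed by progressively slower sources (all numbers illustrative)
    for src in (2, 5, 10, 20):
        frac, reps = shoe_shine_stats(30, src)
        print(f"source {src:2d} MB/s -> writing {frac:5.1%} of the time, "
              f"~{reps:.0f} repositions/hour")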
To overcome this problem of data under-runs, the backup software vendors
invented a process called interleaving, also known as multiplexing, but we’ll call
it interleaving here. There are two basic methods: file interleaving and block inter-
leaving. File interleaving writes file 1 from source 1 to the tape drive, buffering
incoming files from other sources while this occurs. Once the file has been written
to the tape, it then writes the next sequential incoming file. The diagram below
right shows three backup sources interleaving their files onto a single tape. The
only difference between file interleaving and
block interleaving? Block interleaving writes
data to the tape in 32 K chunks (or in another
“fixed block” state). Block interleaving is better
than file interleaving in that if the tape encoun-
ters a very large file, it doesn’t have to handle it
all at once before accepting data from the next
source. Instead, it simply splits the large file into
multiple blocks, therefore accepting data more
evenly from each of the incoming sources.
(Diagram: data streams from the NAS, IM, and POS servers—NAS 1, NAS 2, IM 1, IM 2, POS 1, and POS 2—interleaved onto a single tape.)
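Here is a toy sketch of block interleaving, assuming the fixed 32 K block size described above. The source names and data sizes are invented for the example, and a real backup engine obviously does far more bookkeeping than this; the point is simply that each source’s blocks end up scattered along the tape.

    from collections import deque

    BLOCK_KB = 32   # fixed block size, as described above

    def block_interleave(sources):
        """Round-robin fixed-size blocks from several source streams onto one 'tape'.
        `sources` maps a source name to its total data size in KB; the result is the
        ordered list of (source, block offset in KB) as it would land on tape."""
        queues = {name: deque(range(0, size, BLOCK_KB)) for name, size in sources.items()}
        tape = []
        while any(queues.values()):
            for name, blocks in queues.items():   # one block from each source still sending
                if blocks:
                    tape.append((name, blocks.popleft()))
        return tape

    layout = block_interleave({"NAS 1": 96, "IM 1": 64, "POS 1": 160})
    print(layout)   # sources alternate, so one source's data is spread along the tape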
Interleaving is great for speeding up backups because it ensures that the tape drive
is always running at its maximum speed threshold. However, it’s not so great for
restorations because it spreads the same amount of data from a single source over
a greater length of tape. During restoration, the tape must find the first chunk,
read it, jump to the next chunk, read it, and so on. It’s a nice way to optimize the
backups (and cartridge capacity utilization), but it’s a complete disaster for restores
because it injects a vast amount of latency that must be taken into consideration.
The biggest mistake in sizing a backup system is to look at backup requirements
alone. The sole reason for doing backups is the ability to restore; therefore, the
restore requirements are extremely important. You must ask yourself how much
time is allowed to bring back the data during a restore. In a worst-case scenario
test we ran, we used a high-performance HP/IBM LTO-2 tape drive (30 MBps
native transfer speed) and 16 multiplexed clients, which means that each client stream
has to supply only about 2 MBps. Provided the network infrastructure and
backup server can handle the load, the LTO-2 will zoom along, streaming, and all
clients will happily supply this data rate. Say it takes eight hours to complete all
jobs.
None of the backup packages on the market today can run multiplexed restores.
They all schedule restores in a queue. Each job is matched against a tape, which is
loaded into the next available tape drive, read, and demultiplexed. Only the
stream that is relevant to the job is used; all other streams are discarded.
When restoring all the data to all systems, each client gets its data back at a measly
1/16th of the backup rate. One in 16 blocks of data is part of “his” stream. Effec-
tive restore rate: about 2 MBps. Each tape must be read 16 times (once for each
stream), so now it takes 16*8 = 128 hours (over five days!) to fully restore all sys-
tems. That’s very bad.
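The arithmetic behind that number is simple enough to sanity-check; the figures below are the ones from the scenario above.

    drive_mbps = 30        # LTO-2 native rate from the scenario above
    streams = 16           # multiplexed client streams per tape
    backup_hours = 8       # time to finish all the backup jobs

    per_stream_mbps = drive_mbps / streams      # what each client supplied: ~2 MB/s
    # On restore, one pass over a tape yields only one client's blocks, so each
    # tape has to be read once per interleaved stream.
    restore_hours = backup_hours * streams
    print(f"{per_stream_mbps:.1f} MB/s per stream during backup; "
          f"{restore_hours} hours ({restore_hours / 24:.1f} days) to restore everything")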
How do you avoid this problem? Your choices are to find a way to turn off interleaving
(some applications won’t let you do that, though), which leaves you stuck with
shoe shining; to go back and redesign your entire data flow path (we cover that in The
Backup Book), which can be daunting; or to solve the problem with a staged backup
process.
In a staged backup process, you simply run the first set of backups to disk. Disks
don’t shoe shine. Disks don’t save data in interleaved formats even though they can
accept multiplexed backups. Disks are a great intermediary for data-center back-
ups because they’re a very reliable, simple technology that’s been around for quite
some time. However, there’s one small problem with backing up to disk: Most
backup software of the data center caliber is set up for backing up to tape. Rear-
ranging the software and changing the “media-handling metaphors” can be daunt-
ing at best. Leave it to the Quantum engineers to come up with an answer both
technologically sound and brilliantly simple: virtual tapes. Quantum has created
a system that’s actually made up of a series of RAID-protected hard drives but that,
to backup software, looks and acts like a tape library with multiple tape drives.
Virtual Tape Libraries
The Quantum DX-Series of backup devices are Virtual Tape Library (VTL) prod-
ucts that emulate a tape library, but are in fact a RAID system of very fast ATA
hard drives. The DX family provides high-speed disk-to-disk backups, alleviating
the problem of tape shoe-shining and therefore allowing the administrator to con-
figure up to six simultaneous multiplexed streams. And since the multiplexed
streams are to disk, even though the DX systems read the backup file linearly, the
multiplexed restores are 10 times faster than tape restores. In other words, it’s as
fast as it could be, and you never have to worry about restoration slowdowns
caused by multiplexing (while gaining the shorter backup window advantages of
multiplexing).
Figure Addendum-11. DX30 Virtual Tape Library
What the backup software sees is what it would expect to see from a six-drive tape
library. In Figure Addendum-12 below, we show a screen from VERITAS
NetBackup running a backup to a 30-tape robot. That 30-tape robot is actually
the DX30. To VERITAS NetBackup (the VERITAS products, Legato’s NetWorker,
Computer Associates, and others currently support this drive set), it’s a
tape library with 30 tapes in it.
Figure Addendum-12. VERITAS NetBackup running a backup to a DX30
Think of these DX systems as a two-part component set: a controller and the disk
array(s). Utilizing software that resides on the controller, the DX-Series configure
the disk arrays to look like a set of tapes and tape drives in a tape library. This
allows zero configuration on the part of the backup software, as it now thinks that
it’s writing data to the DX30 in tape format and utilizing the standard media man-
agement schema (which is sized to fit the storage capacity of the specific DX sys-
tem in use). The DX30 controller (1U in size) and disk array (3Us in size, with 24
ATA drives in the array) are a combined unit and have all the features you’ll find
in a mid-tier or enterprise system: Fibre Channel interface, SNMP, SES, RAID 5
or 10 arrays for the drives, dual cooling and power, etc.
The usable capacity of the first drive array in the DX30 is 3.2 TB (roughly equal
to about 16 LTO-1 or 10 SDLT 320 tapes, given compression ratios). Two addi-
tional arrays (of 4.3 TB each, because they can hold more data) can be added to
the system to bring the total expanded storage capacity up to 12.4 TB.
That’s about where the comparison of a RAID 5 drive capacity of a disk array and
the DX system virtual tape capacity stops—but remember, kids, this is a printed
update to a book, and printed materials are frozen in time the moment the press
slaps ink onto paper—by the time you read this, the capacity will probably have
grown faster than the weeds in my back yard.
For one thing, using ATA-based drives usually isn’t a good thing for fast backups.
When reading and writing the small chunks of data that incremental backups cre-
ate, ATA drives are much slower than the speedy SCSI-based drives in high-end
RAID arrays. However, ATA drives are lightning fast when they’re used for
sequential, large-block I/O writes. Because the DX family emulates a tape write
rather than a disk write, the write’s “personality” is changed to the large-block format
that works great for the underlying ATA drives. A regular RAID array just can’t do that—
precisely why the Quantum controller technology in the DX-Series family is so
important to the process.
When a backup software application (or an operating system, for that matter)
writes to a RAID stripe, it has no idea of the optimal method to write the data; so
the stripe size can be mismatched with the size of the data block being written.
This mismatching causes more CPU overhead, using more space than is necessary.
In the DX-Series, the controller running the tape emulation software is also tuned
to match the disk array, so it can more precisely match the stripe sizes of the array
and use fewer CPU processes during the operation. That’s one smart controller!
Where does the DX30 virtual tape library belong in backup network architecture?
Because the DX family virtual tape systems write their data to hard drives, you still
need to pair them with actual tape drives so that you can rotate your tapes offsite.
Therefore, within your architecture, they belong as a staging point for data. Their
high throughput ratios make them excellent primary targets for windowed nightly
incremental backups as well as weekly backups of huge quantities of data because
they can move that data from the source to the virtual tape system much faster
than backing up to real tapes. Whether you’re dealing with a backup window you
have to live within, or moving massive amounts of weekly backups, speed is your
best friend, and the DX-Series provides plenty of that. Once the backup window
is closed, you can move your data off to real tape.
In an enterprise data-center environment, pairing a DX30 virtual tape system with
a Fibre Channel/SCSI-based PX720 tape library makes perfect sense. Let’s say that
you’re running a data center that has evolved over the last couple of years. Some
of the equipment in there is legacy equipment, running at 500–700 MHz speeds
with drives that are a generation or so old. Along with that equipment, you have
newer, faster servers and faster hard drives. You’ve also added a GbE sub-network
so that you don’t have to run your backups over the production LAN or SAN. And
to top it off, you also have the ERP and Supply Chain database systems running
on a Fibre Channel SAN. All of this is in the same data center, under one roof,
one budget, and feeding into one PX720 library system. How do you take advan-
tage of the library while avoiding the inherent bottlenecks and problems that shoe
shining or interleaving will cause? Simple—teach the system to dance the foxtrot,
tango, and waltz at the same time through partitioning the drives and adding a
virtual tape library where necessary.
Figure Addendum-13 below shows an oversimplified diagram of just such
a data center, utilizing the PX720’s multiple data paths and virtual library capabil-
ities. In the diagram, we have three different data movers that are all sending their
payloads directly to the PX720.
1. Our first data mover is the XCOPY (SCSI eXtended Copy) process going di-
rectly into the PX720. To the left of the diagram is a SAN with clusters of
servers accessing and sharing data on a series of RAID and JBOD storage ar-
rays. The first partition within the PX-Series “virtual” library is set for the di-
rect XCOPY operation that the SAN and the on-board FC Router can process
without the need for a backup server.
2. Our second data mover is a backup server that is directly attached to the
PX720 through a SCSI connection (to the right of the SAN is Backup Server
1). In our scenario, this backup server will handle most of the load for the
backups and would probably have most of the drives within the PX720 as-
signed to its partition.
3. Our third data mover is at the top right of the diagram. The PX720, Backup
Server 2, and a DX30 virtual tape system are all interconnected on their own
Fibre Channel SAN. In reality, you wouldn’t need a dedicated Fibre Channel
fabric just for connecting a DX30—we separated it in this diagram merely to
show that there can be three distinct data movement paths to any of Quan-
tum’s Libraries. For our scenario’s purposes, the third backup server and
DX30 will own the final partition within the PX-Series library system. It will
be used to back up the slower devices, eliminating both the shoe-shining (data
under-run) and interleaving problems that slower legacy systems create. In
this type of scenario, after the backup window has closed, data will be moved
to the tape library over the FC connection from the backup server using the
backup software’s inherent tape copy or net-vaulting operations.
Figure Addendum-13. Integrating the DX30 into the enterprise data-center backup environment
In an architecture like this with a virtual tape library in the mix, the library can
also hold a duplicate of a tape backup so that the duplicate is poised and ready for
a very speedy restore in the event of an emergency.
The Main Thing…
The Main Thing to understand here, the point that I hope you get after
reading this addendum, is that you have a lot of choices. Those choices begin with
a tape format selection, but don’t end there. You can build a backup plan that
isn’t limited to a single set of dance steps. You can use the same equipment for dif-
ferent functions and different purposes running at different speeds—as long as
your system, like those shown here from Quantum, offers you the flexibility, scal-
ability, and manageability you need to create a consolidated data-center backup
plan that covers most of the problems you’ll face. You really can teach these Quan-
tum systems to dance like Fred Astaire. And with technologies like the virtual tape
system, you won’t even need to stop for a shoe shine.
For more information
Hopefully, throughout this addendum you’ve been spurred on to think about your
own data-center backup architecture—what you’re doing versus what you could
be doing better, what you could be tweaking, or adding. This isn’t all there is on
the subject, especially when it comes to great products like the DX30 and tape
libraries. I suggest that you use what’s written here as a starting point and then go
directly to the source: Quantum. We’ve worked closely with them on this adden-
dum, and they’re more than happy to talk to you and provide what you need.
For more information on Quantum’s tape libraries, visit them at:
http://www.quantum.com/storagesolutions
or call 1 (866) 827-1500
or 1 (410) 421-8999, ext 14 to speak to a Quantum Govt. specialist.