1) Have a dedicated master server – if there are many jobs, the last thing you want is your
master also being busy doing backups and vaults. It’s the half-witted brains of the operation,
don’t stress it.

2) Go way beyond the tuning recommendations in the manual – if you know what you’re doing.
For instance, I have some voodoo tunings for Solaris (up to 9) that make a huge difference.
Prepare for comments from Veritas (Symantec, whatever) support… “no sir it’s not like in the
book sir, we can’t guarantee it will work sir…” whatever, I’ve gotten such ridiculously bad
advice from their support I still cringe (and sometimes pee a little) every time I get a
flashback, not to mention the endless dreams and the screaming that wake me up at night.

3) Separate HBA ports for disk and tape. No exceptions. I don’t care what vendors say.

4) Separate TAN (Tape Area Network), if you can swing it.

5) Separate backup LAN. And/or Ethernet port bonding/trunking/teaming (whatever nomenclature
appears in your systems). 4 gig ports per media server. 10G if you have the dough. 4 10G
ports teamed.

6) Experiment with TOE cards, such as the Alacritech ones. You will get closer to full gig,
though they’re expensive. Bonding is way cheaper and effective if you have many clients.

7) Try to use port bonding that works at the switch level, too – 802.3ad is the standard,
EtherChannel is Cisco’s version. The software on the server and the setting on the switch have
to jibe. Half-assed intermediate approaches are just that.

8) Don’t use weak switches at the core. I’m tired of seeing people with Cisco 4506 switches
(6509 wannabe) and 8:1 oversubscribed 48-port cards. YOU WILL HAVE PROBLEMS!!!!
Do your homework, find out whether or not the switch is oversubscribed, find out the total
backplane throughput, figure out the blade throughput, don’t plug everything in the same port
octet if you’re going to be oversubscribed – i.e. a 4-port team going to the octet that shares
1Gbit in a 4506 will not give you 4Gbits, it will give you, at best, a thoroughly blocked
150Mbits per port, tops, with problems. Did you know that if one of the 8 ports starts out
before the rest and continues pumping, the rest will NOT make the first port reduce its speed
but will instead trickle along at 10Mbits sometimes? Even after the initial transfer that was
fast is finished and there’s nothing else going on? As Rutger Hauer said in Blade Runner, “I
have… seen things you people wouldn’t believe”. Figure THAT one out when you’re having
throughput problems.
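
To put rough numbers on #8, here is a minimal Python sketch of the best-case ceiling for a
teamed link when ports in the same group share one uplink to the backplane. The 8-port
grouping and the shared 1Gbit come from the 4506 example above; the function name, the
assumption that groups are contiguous blocks of ports numbered from 1, and the clean min()
model are illustrative only. Check your own line card's documentation, and remember that real
hardware does worse than this ceiling.

```python
from collections import Counter


def team_ceiling_gbps(ports, group_size=8, group_uplink_gbps=1.0, port_speed_gbps=1.0):
    """Best-case aggregate throughput (Gbit/s) for a set of teamed switch ports.

    Assumption (illustrative, not from any vendor spec): ports are numbered
    from 1 and every contiguous block of `group_size` ports shares a single
    uplink of `group_uplink_gbps` to the backplane.
    """
    # Count how many team members land in each shared port group.
    members_per_group = Counter((port - 1) // group_size for port in ports)
    # A group can never deliver more than its shared uplink.
    return sum(
        min(count * port_speed_gbps, group_uplink_gbps)
        for count in members_per_group.values()
    )


same_octet = team_ceiling_gbps([1, 2, 3, 4])    # whole team in one octet
spread_out = team_ceiling_gbps([1, 9, 17, 25])  # one team member per octet
print(same_octet, spread_out)
```

Under these assumptions the team crammed into one octet tops out at 1Gbit in total, while the
same four ports spread across four octets can in theory reach 4Gbits.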

9) Use jumbo frames if you can. Bigger is better in this case. Do your homework, there are
caveats.

10) Use the right block size for your tape devices. Windows users, beware. Patches are necessary.
SP1 broke block sizes over 64K on 2003 Server.

11) Don’t go nuts with SSO! Among the myriad things Veritas doesn’t tell you unless you know
the right people is that at around 250 instances of devices you will have weird device
problems (25 tape drives shared among 10 media servers would make 250 instances). The
safe number is closer to 150. Ignore this at your peril. If you use VTL just make more virtual
drives.
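
The arithmetic in #11 is easy to script before you start zoning drives. This is a small sketch,
assuming every shared drive is visible to every media server, so instances are simply drives
times servers as in the 25 x 10 example. The function names are made up for illustration, and
the 150 default is the "safe" figure quoted above, not a documented limit.

```python
def sso_device_instances(shared_drives, media_servers):
    """Device instances when every media server sees every shared drive."""
    return shared_drives * media_servers


def max_shared_drives(media_servers, instance_limit=150):
    """Largest number of fully shared drives that stays within the limit."""
    if media_servers <= 0:
        raise ValueError("media_servers must be positive")
    return instance_limit // media_servers


print(sso_device_instances(25, 10))  # the example from the text
print(max_shared_drives(10))         # drives you can share at the safer figure
```

With 10 media servers, 25 shared drives lands you on 250 instances, while staying near 150
means sharing no more than 15 drives across all of them.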

12) Use snapshots as much as possible.

13) If you have more than a couple of media servers, consider a VTL.

14) If you have DBAs that insist on flushing the redo logs to tape every few seconds, get a
heavy-gauge jumpstart cable and a power supply that can put out, say, 20KV, a coat hanger,
and, wearing nothing but a stained leather apron, go to work on them until they regain their
senses (or not). Good times.

15) If the DBAs can’t be persuaded even after their various body parts have been charred by high
voltage, try to send the smaller backups to disk. Do NOT send frequent backups to tape. If a
job is going to take less than 10min send it to disk.
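
The 10-minute rule in #15 can be turned into a trivial pre-check. This is only a sketch: the
function names are hypothetical, and the throughput you feed in is whatever you have actually
measured for that client, not a number NBU hands you.

```python
def estimated_minutes(size_mb, throughput_mb_per_s):
    """Rough job duration from data size and measured throughput."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput_mb_per_s must be positive")
    return size_mb / throughput_mb_per_s / 60.0


def pick_destination(size_mb, throughput_mb_per_s, threshold_minutes=10.0):
    """Send jobs expected to finish in under the threshold to disk, the rest to tape."""
    if estimated_minutes(size_mb, throughput_mb_per_s) < threshold_minutes:
        return "disk"
    return "tape"


# 80 MB/s is a made-up example rate; substitute your own measurements.
print(pick_destination(30_000, 80))   # about 6 minutes of data
print(pick_destination(500_000, 80))  # well over an hour of data
```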

16) As a corollary to #15, only use tape for large jobs that will actually stream your tape drives.

17) Know what your boxes can push. Most servers, even very large ones, will be hard-pressed to
push 2 LTO3 drives, let alone LTO4. FYI, I’ve gotten LTO3 to go as fast as 130MB/s,
sustained. Do the math. Beat the score! I cheated, BTW.
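
Since #17 says to do the math, here is one way to do it. A minimal sketch, assuming decimal
megabytes (1 MB/s = 8 Mbit/s) and no protocol overhead unless you pass a `usable_fraction`
below 1.0; both function names and that parameter are illustrative.

```python
import math


def required_gbit_per_s(drives, mb_per_s_per_drive):
    """Network rate needed to keep `drives` tape drives streaming."""
    return drives * mb_per_s_per_drive * 8 / 1000


def gige_ports_needed(drives, mb_per_s_per_drive, usable_fraction=1.0):
    """Whole 1Gbit ports needed, optionally derated for protocol overhead."""
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return math.ceil(required_gbit_per_s(drives, mb_per_s_per_drive) / usable_fraction)


print(required_gbit_per_s(2, 130))  # two LTO3 drives at the rate quoted above
print(gige_ports_needed(2, 130))
```

At 130MB/s a single LTO3 drive already wants slightly more than one gig port can carry, and
two of them need three ports before any overhead is counted.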

18) Know what expansion slots to use – not all are equal, even if they look the same.

19) Don’t push too much backup traffic over switch ISLs. Preferably don’t push any.

20) Be super-careful with command-line manipulation of the NBU DB. Perfectly legitimate
commands will not function as you might think due to silly heuristics (or lack thereof). Stay
tuned, there
will be a large post outing NBU in the future. The amount of dirt I have is beyond staggering.
Maybe I shouldn’t have said that, I might have to look out for contract killers or Veritas
people offering payola, not sure which is preferable. I’m 5 feet tall, with a goatee, skinny and
blond, by the way. You can’t miss me. I also have a pronounced limp.

21) Beware of multiplexing. Too much and restores take forever. Too little and you can’t stream
your devices. Disk is your friend. Anything beyond 4-way multiplexing on tape is not.

22) Do not send tapes offsite only once a week. You are asking for pervy uncle Murphy to pay
you a visit, and he is a known repeat sex offender. He won’t discriminate, either.

23) If you use tapes, have 2 copies of everything.

24) Replicate to remote sites if at all possible. Tape should be a last resort.

25) Use VMware if at all possible. Along with #12 and #24, this helps quick recovery.

26) Do at least 2-3 different backups of the NBU catalog. In really busy systems it’s impossible
to do it after each session – there’s just no quiet time. Just have a copy on disk and 2 on tape
(you can do the ones on tape inline, which creates 2 at the same time; it works), then send the
ones on tape to 2 different offsite locations. Have NBU email you the tape(s) barcodes it used
for the catalog if you’re doing a non-standard catalog backup. Send an extra email to an
externally available address. You’re not paranoid if they’re really out to get you!

27) Can you even read from disk as fast as you can write to your backup medium? Benchmark.
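
For #27, a crude sequential-read test takes only a few lines of Python. This is a sketch, not
a proper benchmark tool: the helper names are made up, the 1 MiB read size is an arbitrary
choice, and the OS page cache will flatter the numbers for any file that was just written or
recently read. Point it at a real, cold data file much larger than RAM if you want a figure
you can trust.

```python
import os
import tempfile
import time


def make_sample_file(size_mb):
    """Write `size_mb` MiB of random data to a temp file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        for _ in range(size_mb):
            f.write(os.urandom(1024 * 1024))
    return path


def read_throughput_mb_s(path, block_size=1024 * 1024):
    """Read the whole file sequentially and return MiB/s."""
    total = 0
    start = time.perf_counter()
    # buffering=0 skips Python's own buffer; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")
    return total / (1024 * 1024) / elapsed
```

Compare the result against what your tape drives or disk targets can absorb; if the source
reads slower than the target writes, no amount of tuning downstream will stream the drives.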

28) What’s your current network throughput if you max out all the media servers? Benchmark.

29) Don’t use your production systems as media servers. You are inviting uncle Murphy again
and he’s feeling randy.

30) Use storage unit groups. Why on earth would you not?

31) Cluster the master.

32) Do NOT put media traffic through firewalls, it’s too much. ACLs on switches can work just
fine.

33) Do NOT put a dedicated media server for a subset of your boxes that are secured from the
main network. If they lose access to that media server, backups fail. At any rate you’ll have to
allow a few ports for the master to communicate with the media server, might as well let
media server traffic through. If it seems that #32 and #33 are somewhat self-contradictory,
give yourself a cigar.

34) Simplify your life. Elaborate and numerous policies are more ways to invite uncle Murphy.
