NetBackup Training
KEEPING PEOPLE AND INFORMATION CONNECTED.®
Module 1:
Brief Overview, Client/Policy Configuration,
Troubleshooting
For Internal SunGard Use Only
Agenda
Introduction
Purpose & Assumptions
History
Terminology and Concepts
Architecture
Standards
How Backups Work
Managing NetBackup
Client Implementation and Configuration
Policies
Troubleshooting
Reporting
Monitoring Overall Environment
Shutdown/Restart NetBackup
Tips and Tricks
Education/Further Reading
Q&A
Purpose and Assumptions
Purpose
– Increase knowledge of NetBackup product
Assumptions
– Presentation assumes NetBackup 6.5.3
– Basic familiarity with NetBackup
– Know how to access environments
– Windows and/or Unix admin experience
– Please write down your questions for the Q&A session at the end
History
Corporate
1987 - proprietary software solution written by engineers at Control Data for Chrysler Corp.
1993 - renamed to BackupPlus (‘bp’ prefix)
Late 1993 - OpenVision acquisition (/usr/openv/ install path); product rebranded as “NetBackup”
1997 - Veritas acquired OpenVision
2005 - Symantec acquired Veritas
Version
1993 – BackupPlus 1.0 (Control Data)
1994 – NetBackup 1.6 (OpenVision)
1996 – NetBackup 2.0
1997 – NetBackup 3.0 (Veritas)
2000 – NetBackup 3.4
2002 – NetBackup 4.5
2003 – NetBackup 5.0
2005 – NetBackup 6.0 (Symantec)
2007 – NetBackup 6.5
Terminology and Concepts
Master Server – brains of the operation, houses catalog
Media Server – where storage units exist, pushes data
Client – device providing data to be backed up
Enterprise Media Manager (EMM) – manages device and media information;
typically installed on Master
Catalog – database of backup images and other information
Metadata – information about the files backed up (name, path, size, date, image location, etc.)
Duration - time it takes to perform the backup
Exit Code – final status of job
– 0 = Successful with NO files missed
– 1 = Successful with files missed
– 2+ = Backup Failed
Start Window – time when a backup can START
Frequency – how often the backup should execute
Retention – length of time backups are valid
Policy – grouping of like clients sharing similar attributes
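The three exit code outcomes listed above can be written as a small lookup. This is a minimal Python sketch for illustration only; the function name `classify_exit_code` and the outcome labels are assumptions, not part of NetBackup.

```python
def classify_exit_code(code: int) -> str:
    """Map a job exit code to the three outcomes listed on this slide."""
    if code < 0:
        raise ValueError("exit codes are non-negative")
    if code == 0:
        return "successful"            # no files missed
    if code == 1:
        return "partially successful"  # completed, but some files were missed
    return "failed"                    # 2 and above
```

For example, `classify_exit_code(41)` returns `"failed"`, while `classify_exit_code(1)` flags a job worth reviewing for skipped files.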
Terminology and Concepts continued
Schedule – subset of Policy, defines Start Window, retention,
storage unit, etc.
Storage Unit – location defined to store backups, can be disk/tape,
exist only on a Media Server
Backup Image – one backup job comprised of all files backed up;
job must complete
Disk Storage – primary landing zone for jobs; destage to tape later;
removes older images as needed; can be configured many ways,
current standard is Basic Disk; optional
Multiplexing – interleaving of multiple jobs on tape to prevent ‘shoe-shining’ (the drive repeatedly stopping and repositioning when data arrives too slowly)
Long Term Data Retention – utilizes media and marginally increases catalog size; non-issue
Name Resolution – NetBackup is dependent on proper forward and reverse lookups
Scaling – horizontally by adding more media servers
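Because NetBackup depends on forward and reverse lookups agreeing, a quick consistency check can be scripted. The sketch below is a hedged Python illustration using the standard `socket` module; the function name `lookups_consistent` and the short-name comparison rule are assumptions made for this example, not NetBackup behavior.

```python
import socket
from typing import Callable, List, Tuple

def lookups_consistent(
    hostname: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], Tuple[str, List[str], List[str]]] = socket.gethostbyaddr,
) -> bool:
    """Return True if hostname -> IP -> name leads back to the same host.

    Names are compared case-insensitively on their short (unqualified) form,
    which is an assumption of this sketch.
    """
    try:
        ip = forward(hostname)
        name, aliases, _ = reverse(ip)
    except OSError:  # covers socket.gaierror and socket.herror
        return False
    wanted = hostname.lower().split(".")[0]
    found = {n.lower().split(".")[0] for n in [name, *aliases]}
    return wanted in found
```

Run it from the client against every backup server name, and from each backup server against the client name; the resolver arguments exist only so the check can be exercised without a live DNS.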
Terminology and Concepts continued
Files to Backup – Part of policy config
– Exclude List – files to skip
– Include List – files to include after processing excludes
– Additional config on client; granular to policy or schedule level; no stacking
Backup Type
– Full – all files captured
– Differential Incremental – all changes since last backup
– Cumulative Incremental – all changes since last full
– User – allows user to run backups from client side; most used for child jobs of DB Agents (“Default-Application-Backup”)
Database Agents
– Exchange, Notes, Oracle, SQL, SAP, etc.
Options
– NDMP, Off-site Management (Vault), Tape/Disk Sharing, Bare Metal Restore, Snapshot, VMware, etc.
Licensing - gold key with many options; SunGard pays for ‘Protected Data’
Recovery – restore catalog or import all images manually
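The "no stacking" rule for exclude lists means only one list applies to a given job. A minimal Python sketch of that selection follows, assuming the most specific file name wins; the policy-only form `exclude_list.policyName` and the function name `pick_exclude_list` are assumptions of this example, and the policy and schedule names in the usage note are made up.

```python
from typing import Iterable, Optional

def pick_exclude_list(available: Iterable[str], policy: str, schedule: str,
                      base: str = "exclude_list") -> Optional[str]:
    """Pick the single list that applies; lists do not stack."""
    names = set(available)
    candidates = [
        f"{base}.{policy}.{schedule}",  # schedule-level list
        f"{base}.{policy}",             # policy-level list (assumed form)
        base,                           # general list
    ]
    for candidate in candidates:
        if candidate in names:
            return candidate
    return None
```

With files `exclude_list` and `exclude_list.WinProd.Daily` present, a job for policy `WinProd`, schedule `Daily` uses only the second file; entries in the general list are not merged in.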
NetBackup Tiered Architecture
Master Server (Top Tier)
Scheduler
Stores Catalog (Metadata,
Images), Volume Information
Vaulting Management
Media Server(s) (Mid Tier)
Data Mover
Sends Metadata to Master
Can be located on Master
Clients (Lower Tier)
Configured via GUI/Registry
(Win) or config files (*nix)
Example NetBackup Architecture Diagram
[Diagram components: Master Server; Metadata Network; Media Servers 1 through N; Backup Network; Client Hosts; FC Switches (Fabric A, Fabric B); Disk Storage; Enterprise Class Tape Library]
Standards
Infrastructure
– Server/OS Types
Unix – Solaris 10
T2000
Naming Standards Defined
Network Configuration Standards (Metadata, backup, mgmt)
– Robot Types
Quantum Scalar i2000
STK SL8500
Small Robots for legacy restores
– LTO3/4 Tape Drives / Media Types
– Volume Serial Numbers (VolSers/bar codes)
– SAN connectivity
– Disk Array Standards
– DSSU Configuration
Application/Configuration
– Documented on LiveLink
How Backups Work (simplified)
Scheduler on Master tells Media Server to back up its client
Media server is granted storage unit resource (disk or tape)
Media server connects to client software and tells it to start backing up
Client creates list of files to back up
– Full – everything
– Differential – changes since last backup
– Cumulative – changes since last full
Copies of files are sent to buffer
Buffer contents sent to Media Server
Media server writes buffer contents to storage unit
Media server sends metadata to Master server to update
catalog
Backup completes
Storage unit resource released
Backup image is completed and closed
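The file-list step above differs only in the cutoff used for each backup type. The Python sketch below illustrates that idea under a stated assumption: it compares file modification times against the last full and last backup times, which is a simplification chosen for this example (the real client may use other change indicators). The function name `select_files` is hypothetical.

```python
from typing import Dict, List

def select_files(mtimes: Dict[str, float], backup_type: str,
                 last_full: float, last_backup: float) -> List[str]:
    """Return the paths a backup of the given type would pick up."""
    if backup_type == "full":
        cutoff = float("-inf")   # everything
    elif backup_type == "differential":
        cutoff = last_backup     # changes since last backup of any type
    elif backup_type == "cumulative":
        cutoff = last_full       # changes since last full
    else:
        raise ValueError(f"unknown backup type: {backup_type}")
    return sorted(path for path, mtime in mtimes.items() if mtime > cutoff)
```

The practical consequence: a restore from differentials needs the last full plus every differential since, while a restore from cumulatives needs only the last full plus the latest cumulative.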
Managing NetBackup (Demonstration)
NBU Administration Console – 99.9% of daily administration occurs here
Activity Monitor – Overall job status
– Jobs tab
Job details
State - Queued, Active, Partial, Failed
Type – Backup, Restore, Catalog, Duplicate, Vault
Status – Exit Code of job
– 0 = All files backed up, no problems
– 1 = Some files skipped (open/locked)
– >1 = Failure
Additional info
Suspend/kill jobs
Sorting/Filtering - Be aware of any filters you have set
Exporting
– Daemons tab
– Processes tab
– Help
Managing NetBackup (cont’d)
Storage
– Storage Units – defined target for backups (similar to storage pool in TSM)
– Disk or Tape
– Storage Unit Groups
Media
– Volume Pools – logical grouping of tapes
Various defined pools
Scratch
SG_SHARED_xxx
Policy defines Volume Pool
– Volume Groups – locational grouping of tapes
Robot groups
Onsite group
Offsite groups
Vault moves media between volume pools
– Robots – media currently in robot
– Standalone – tapes no longer associated with robot/volume group
– Inventory Robot
– Ejecting media
– States – Active, Full, Frozen, Suspended, Imported
Managing NetBackup (cont’d)
Device Monitor
– Up/Down/Reset drive
Devices
– Drives
– Robots
SCSI robots have a single Control Host
ACS robots can be controlled by any server
– Media Servers
– Topology
Managing NetBackup (cont’d)
Backup Archive Restore
– Used for restoring files
Host Properties
– Master Server
– Media Servers
– Clients
Include/Exclude Lists
Server authorization
Catalog
– Offline backup (legacy method)
– Import images
– Verify Images
– Duplicate images
Reports
Vault – option that processes and tracks volumes sent offsite
Client Implementation and Configuration
All systems
– Install client binaries
Agents included for Windows, not for Unix
– Verify network communication
Client configuration
– Unix
Configuration files
– bp.conf
SERVER = backup01-dal (Master must be listed first!)
SERVER = backup02-dal
SERVER = backup03-dal
SERVER = backup0N-dal
CLIENT_NAME = jumpstart01-dal
– exclude_list and include_list
» exclude_list.policyName.scheduleName
» include_list.policyName.scheduleName
» Exclude/Include lists do not stack
– Windows
Backup, Archive, Restore GUI or Registry
Some configuration available from Admin Console>Host Properties>Clients
Changing open file backup for Windows
– Demonstration of Windows client configuration
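The bp.conf rule above (Master listed first among the SERVER entries) is easy to verify with a script. This is a minimal Python sketch that reads only the `SERVER` and `CLIENT_NAME` entries shown on this slide; the function names `parse_bp_conf` and `master_is_first` are hypothetical, and any other bp.conf entries are simply ignored.

```python
from typing import List, Optional, Tuple

def parse_bp_conf(text: str) -> Tuple[List[str], Optional[str]]:
    """Return (SERVER entries in file order, CLIENT_NAME or None)."""
    servers: List[str] = []
    client_name: Optional[str] = None
    for raw in text.splitlines():
        if "=" not in raw:
            continue  # skip blank or unrecognized lines
        key, value = (part.strip() for part in raw.split("=", 1))
        if key == "SERVER":
            servers.append(value)
        elif key == "CLIENT_NAME":
            client_name = value
    return servers, client_name

def master_is_first(text: str, master: str) -> bool:
    """True if the first SERVER entry is the expected Master."""
    servers, _ = parse_bp_conf(text)
    return bool(servers) and servers[0] == master
```

On a Unix client the text would come from the bp.conf file under the install path; the check catches the common mistake of a media server being listed ahead of the Master.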
Policies (Demonstration)
Policies - A backup policy allows the admin to configure how and when backups are to be performed for a group of clients. This group of clients shares similar backup requirements (type, backup window, retention, etc.)
Attributes
– Policy Type
– Destination
Classification
Storage Unit
Volume Pool
– Check Points
– Limit Jobs per Policy
– Job Priority
– Media Owner
– Active/Inactive
– Follow NFS
– Cross mount points
– Compression
– Encryption
– Collect DR Info
– Allow Multiple Data Streams
– Keyword Phrase
– Snapshot Client
Policies (cont’d)
Schedules
– Attributes Tab
Name
Type of Backup
Full, Incremental, Differential, Cumulative, User
Synthetic
Schedule Type
– Calendar Based
– Frequency Based
– Destination
Multiple Copies
Override Policy Storage
Override Policy Vol Pool
Override Media Owner
– Retention
– Media Multiplexing
– Start Window Tab
Defines when backup can START
– Exclude Dates Tab
Defines when backup cannot run
– Calendar Schedule
Only available when calendar sched type chosen
Retries allowed after runday
Specific Days or Recurring Days
– Summary of All Policies
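The Start Window only governs when a backup can START, so the check is a simple time-of-day comparison. The Python sketch below illustrates it, including a window that spans midnight; the function name `in_start_window` and the half-open interval (open time included, close time excluded) are assumptions of this example.

```python
from datetime import time

def in_start_window(now: time, opens: time, closes: time) -> bool:
    """True if a job may START at `now` for a window from opens to closes."""
    if opens <= closes:
        return opens <= now < closes
    # Window spans midnight, e.g. 22:00 to 04:00.
    return now >= opens or now < closes
```

A job that starts at 03:59 in a 22:00 to 04:00 window is allowed to start and may keep running after 04:00; a job still queued at 04:00 is not.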
Policies (cont’d)
Clients
– Know hardware/OS type
Backup Selections – what to backup
– ALL_LOCAL_DRIVES
– System_State:\ or Shadow Copy Components:\
– NEW_STREAM for multistreaming
Manual backups
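The NEW_STREAM directive in Backup Selections divides the list into separate streams. A minimal Python sketch of that grouping follows; it is a simplification that treats every NEW_STREAM line as a boundary, and the function name `split_streams` is hypothetical.

```python
from typing import Iterable, List

def split_streams(selections: Iterable[str]) -> List[List[str]]:
    """Group backup selections into streams at each NEW_STREAM line."""
    streams: List[List[str]] = []
    current: List[str] = []
    for item in selections:
        if item.strip() == "NEW_STREAM":
            if current:
                streams.append(current)
            current = []
        else:
            current.append(item)
    if current:
        streams.append(current)
    return streams
```

A list of `NEW_STREAM`, `C:\`, `NEW_STREAM`, `D:\`, `E:\` yields two streams: one for `C:\` and one for `D:\` plus `E:\`. Multiple streams only help if "Allow Multiple Data Streams" is enabled on the policy and the jobs-per-policy and per-client limits permit them to run together.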
Troubleshooting
MSS Document
When in doubt, ASK!
Windows client Troubleshooting
Windows Clients
Over 3000 servers across all environments
77% of all servers
85% of all failures
Error Codes
Media related (8x)
Network Communication related (4x)
Configuration/Hardware related (5x)
Most Common Codes:
– 41, 196, 5x, 219, 13, 14, 2x
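The families above can be used to narrow the first troubleshooting step. This Python sketch encodes only the three two-digit families named on this slide and returns "other" for everything else; the function name `error_family` is hypothetical, and the grouping is a rule of thumb from this module, not an official classification.

```python
# Tens digit of a two-digit status code -> family named on this slide.
FAMILIES = {
    4: "network communication",
    5: "configuration/hardware",
    8: "media",
}

def error_family(code: int) -> str:
    """Return the family for a status code, or 'other' if not covered."""
    if 10 <= code <= 99:
        return FAMILIES.get(code // 10, "other")
    return "other"
```

For example, `error_family(41)` returns `"network communication"`, which points to the network and name-resolution checks first; codes such as 196 and 219 fall outside these families and need to be looked up individually.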
Check the Simple Stuff
Is Server On and Cabled
– Decommissioned
– Maintenance
Hosts Files or DNS correct
– Host
– All backup servers
– All backup interfaces on backup servers
Network
– Functional
– Routing
Library/Media Problem
Server Hardware
Windows Event Log
Correlation
Telnet
– To Master/Media from Client
– To Client from Master/Media
– telnet <hostname> bpcd (or 13782)
– telnet <hostname> vnetd (or 13724)
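Where telnet is not installed, the same connectivity test can be done with a short script. The Python sketch below uses the standard `socket.create_connection` call and the bpcd and vnetd port numbers from this slide; the function name `port_open` and the default timeout are assumptions of this example.

```python
import socket

# Port numbers as given on this slide.
NETBACKUP_PORTS = {"bpcd": 13782, "vnetd": 13724}

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False
```

As with telnet, run it in both directions: from the client to the Master/Media servers and from each Master/Media server to the client, for example `port_open("backup01-dal", NETBACKUP_PORTS["bpcd"])`. A `False` result does not say why the connection failed, only that it did.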
Check the Simple Stuff (cont’d)
BPCLNTCMD
– Command Options
-sv – returns version of Master
5.1
-pn – communicates back to Master
expecting response from server backup01-dal
backup03-dal backup03-dal 10.229.133.233 56618
-self – returns info about local system
gethostname() returned: backup03-dal
host backup03-dal: backup03-dal at 10.229.133.233 (0xae585e9)
checkhname: aliases:
-hn <hostname> - returns info resolved from hostname
host backup01-dal: backup01-dal at 10.229.133.229 (0xae585e5)
checkhname: aliases:
-ip <IP address> - returns info resolved from IP
checkhaddr: host : backup01-dal: backup01-dal at 10.229.133.229 (0xae585e5)
checkhaddr: aliases:
-server <Master> - see -hn option
In Depth Client Troubleshooting
Turn up logging on client
– Host properties or client BAR GUI
– Must have <install>\netbackup\logs\* dirs created
Client Logs and Directories:
– bpbkar\<date>.log – Backup/Archive process (BPBKAR32)
– bpcd\<date>.log – Client Daemon (BPCDW32)
– tar\<date>.log – Restores (TAR32)
In Depth Client Troubleshooting (cont’d)
Run test backup/restore
Examine logs after failure
Logs structured as such:
00:00:03.125 [3652] <2> bpcd exit_bpcd: exit status 0 ------>exiting
09:55:33.941 [6092] <16> bpfsmap: ERR - open_snapdisk: NBU snapshot failed
Search for <#> entries:
– <2>, <4>, <8>, <16>, <32>: <2>=informational and <32>=Critical Failure
Search error message on Google and Symantec
Test recommended solution
Lather, rinse, repeat
Last resort/time sensitive – open case with Symantec
– (800) 342-0652
– Customer Number 3680-5196-9875
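The log structure shown above (time, [process id], <severity>, message) lends itself to mechanical filtering. The Python sketch below parses lines in that layout and keeps entries at or above a chosen severity; the regular expression is inferred from the sample lines in this module, and the names `LogEntry`, `parse_line`, and `find_problems` are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# time, optional AM/PM marker, [pid] or [pid.tid], <severity>, message
LINE_RE = re.compile(
    r"^(?P<time>\d{1,2}:\d{2}:\d{2}\.\d{3}(?: [AP]M)?):?\s+"
    r"\[(?P<pid>\d+)(?:\.\d+)?\]\s+"
    r"<(?P<severity>\d+)>\s+"
    r"(?P<message>.*)$"
)

class LogEntry(NamedTuple):
    time: str
    pid: int
    severity: int
    message: str

def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None for lines not in the expected layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return LogEntry(m["time"], int(m["pid"]), int(m["severity"]), m["message"])

def find_problems(lines: Iterable[str], min_severity: int = 4) -> List[LogEntry]:
    """Keep entries whose severity is at least min_severity (<2> is informational)."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e.severity >= min_severity]
```

Feeding it the lines of a bpbkar or bpcd log and printing the result gives the <4> through <32> entries in order, which is usually the text worth searching for.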
Example Log
Error 41
5:20:55.454 PM: [1656.2600] <16> dtcp_write: TCP - failure: send socket (904) (TCP 10053: Software caused connection abort)
5:20:55.454 PM: [1656.2600] <16> dtcp_write: TCP - failure: attempted to send 6 bytes
5:20:55.486 PM: [1656.2600] <16> dtcp_write: TCP - failure: send socket (904) (TCP 10053: Software caused connection abort)
– TCP 10053 means the connection is being reset internally to the host. Recommendation is to reload the NIC driver or replace the NIC.
– Error 41 can also produce TCP 10054 errors in the logs, which indicate the connection was closed externally. These can be caused by loss of network connectivity, crashes, or reboots.
– Error 41 has also been the result of corrupted VSS. Check the Event Log for any related error messages and consult with Systems Engineers, if necessary.
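The distinction between TCP 10053 and TCP 10054 in an Error 41 log can be pulled out with a short script. The Python sketch below looks for the "(TCP nnnnn:" pattern seen in the example log and attaches the guidance from this slide; the function name `tcp_hints` and the wording of the hints are assumptions of this example.

```python
import re
from typing import Dict

# Guidance paraphrased from this slide.
TCP_HINTS = {
    10053: "reset internally to the host: reload the NIC driver or replace the NIC",
    10054: "closed externally: check for lost network connectivity, crashes, or reboots",
}

# Matches "(TCP 10053:" even if the log text has wrapped after "TCP".
TCP_RE = re.compile(r"\(TCP\s+(\d+):")

def tcp_hints(log_text: str) -> Dict[int, str]:
    """Return the distinct TCP error numbers found, with a hint for each."""
    codes = sorted({int(code) for code in TCP_RE.findall(log_text)})
    return {code: TCP_HINTS.get(code, "not covered in this module") for code in codes}
```

An empty result for an Error 41 log suggests looking at the third cause above (VSS) and the Windows Event Log instead.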
Windows Client Troubleshooting Checklist
Narrow your effort based on error code
Check the simple stuff:
Is server cabled, decomm’ed, under maint.
Verify hosts file(s) or DNS on all involved servers
Network functional?
Verify routing
Library or Media problem?
Server hardware problem?
Check Windows event log
Correlate any issues
Run BPCLNTCMD on all involved servers using each option:
– -sv
– -pn
– -self
– -hn <hostname>
– -ip <ip address>
– -server <name of Master>
Maximize logging values for client
Verify log dirs created in <install>\netbackup\logs\*
– bpbkar
– bpcd
– tar
Start backup/restore
Review logs searching for errors (look for <4> <8> <16> <32>)
Search error message on Google and Symantec sites
Test solution
Repeat until resolved
Open case with Symantec
– (800) 342-0652
– Cust. #: 3680-5196-9875
Reporting
NetBackup Reports
Aptare
– In depth historical reporting and trending
– Supports several backup products, incl. TSM
– Command Center Dashboard
– Job Reports
– The Dot Report – “Don’t agitate the Dots”
– Billing – yes we can be a profit center IF we are
successful
– Media Reports
Keeping Tabs on the Infrastructure
Use Aptare
Check for down drives/stuck tapes regularly
Verify Drive Configuration
Scratch
Destaging
Balance Jobs
Tape Injects/Ejects
How To Shutdown/Restart NetBackup
Shutdown
– Suspend/Cancel jobs
– Stop Aptare
– ‘netbackup stop’
– ‘bpps -a’ to see what’s running
– ‘kill -9 <pid>’ to kill hung processes
– Optionally rename startup script
– Use ‘init 6’ to restart server if processes will not die
– Ensure drives are empty
‘robtest’
ACSLS server
Startup
– ‘netbackup start’
– Resume/Restart all jobs
– Start Aptare
– Verify environment functions
Management Tips and Tricks
Use Activity Monitor, Restore, Policies, Device Monitor, Clients Properties most often
Policies – Use Summary of All Policies
Sorting/Filtering
Sort by State – long running jobs?
Export to Excel – Selected rows or all rows
Column Fields – Move, Hide, Show
Built-in NetBackup Reports
Help
Use multiple windows
Break up long running jobs
– Multiple streams per policy
– Multiple policies
– Watch jobs per policy and client settings
Don’t forget about Aptare!
It isn’t always clear: look at it, correlate it, think about it
Education and Further Reading
Google
Symantec
– Detailed PDFs on EC troubleshooting
– Manuals/Troubleshooting Guide
– Technotes
NetBackup Mailing List/Forums
– List: http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
– Forums
Backup Central (mirrors the mail lists):
http://www.backupcentral.com/phpBB2/
Symantec: https://forums.symantec.com/syment/board?board.id=21
Tek-Tips:
http://www.tek-tips.com/threadminder.cfm?pid=776
Questions and Answers
Altered Lyrics to the tune of the Beatles “Yesterday”
Yesterday,
All those backups seemed a waste of pay.
Now my database has gone away.
Oh I believe in yesterday.
Suddenly,
There's not half the files there used to be.
And there's a milestone hanging over me.
The system crashed, so suddenly.
I pushed something wrong,
What it was, I could not say.
Now all my data's gone,
And I long for yesterday-ay-ay-ay.
Yesterday, the need for back-ups seemed so far away.
I knew my data was all here to stay,
Now I believe in yesterday.
Thanks for attending!