XI Recovery
Consistent Backup
Business data distributed over several systems
Redundant data
Different database vendors
Point-in-time Recovery
No common checkpoint
Data in file systems
[Figure labels: Data loss, Downtime]
Controlled Shutdown/Startup
Automated interaction between systems
Multiple input sources per system
Inconsistencies
Requirements
Distributed processes must ensure data consistency
System failures must not have an impact on data consistency
Consistent system copies of the whole environment must be possible
Not intended for restore after failure of a single system
Key concepts: RAID protection, implementation of HA solutions, additional offline copies of the DB,
different devices for logs and data, DB log mirroring, logs saved twice on disk and twice on tape, regular
application of logs to ensure correct recovery, regular restore and recovery tests, and avoiding restores on
the production system (it is better to restore on another system and fix problems afterwards).
Depending on the backup method used for a complete environment backup, a complete backup can often
be used to restore and recover just a single system.
A consistent copy of the complete environment may be needed to set up a test landscape with production
data. It could serve as a fallback in case of an unsuccessful upgrade or for setting save points during data
migration (so you could go back to a well-defined state if one part of the migration fails).
It is not possible to create a 100% consistent copy of the environment by just doing point-in-time restores
of all systems. Because there is no common, synchronized time for all systems, there are always some
modifications that were already done by one system and not yet done by some other system, leading to an
inconsistent state. To get a consistent environment copy, all methods presented in this lesson use some
mechanism to ‘freeze’ any modifications that can result in inconsistencies between systems.
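The effect described above can be made concrete with a minimal Python sketch. It is a toy model only, not part of any SAP or database tool: the `Event` record, the `restore_to` and `inconsistencies` helpers, the order IDs, and the 0.5-second transfer delay are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    commit_time: float  # time at which the system committed the change
    order_id: str


def restore_to(log, point_in_time):
    """Return the order ids a system contains after a restore to point_in_time."""
    return {e.order_id for e in log if e.commit_time <= point_in_time}


def inconsistencies(sender_log, receiver_log, point_in_time):
    """Orders known to only one of the two systems after restoring both."""
    sent = restore_to(sender_log, point_in_time)
    received = restore_to(receiver_log, point_in_time)
    return sent ^ received  # symmetric difference


# Hypothetical scenario: the sender commits an order, the receiver commits
# the same order 0.5 s later (assumed transfer delay).
sender_log = [Event(10.0, "order-1"), Event(20.0, "order-2"), Event(30.0, "order-3")]
receiver_log = [Event(10.5, "order-1"), Event(20.5, "order-2"), Event(30.5, "order-3")]

# Restore target falls between the two commits of order-2:
in_flight = inconsistencies(sender_log, receiver_log, 20.2)

# Restore target falls in a window with no change in transit, which is
# what a 'freeze' of modifications is meant to guarantee:
frozen = inconsistencies(sender_log, receiver_log, 25.0)
```

In this model a restore to 20.2 leaves `order-2` on the sender only, while a restore to 25.0 leaves no difference. In a running landscape there is no way to know that an arbitrary point in time is free of in-flight changes, which is why the methods in this lesson enforce such a window explicitly.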
Other Important Data Security Measures
RAID protection
Redundant hardware paths
Implementation of HA solutions
Additional offline copies of database
Different devices for logs and data
DB log mirroring (on different controllers)
Logs saved twice on disk and twice on tape
Apply logs regularly to ensure correct recovery
Test restore and recovery regularly
Execute database consistency checks regularly
Avoid restore on production system
Explanations for backup and restore strategies for individual components can be found on the next two
slides.
Backup and Recovery for Individual System Components
Java Instances
Overwrite the installed Java instance (standard path: /usr/sap/<SID>/JC<inst-nr>)
Databases
After the database has been imported, you also have to import
the available log backups.
[Figure: Systems 1, 2, and 3 each run an online backup, followed by delta data and a log backup. The point at which the systems are synchronized marks the consistent backup.]
Offline time can be reduced by taking an online database backup followed by an offline log backup.
Steps:
- Take an online backup of all systems.
- Once the online backup of all systems has finished, shut down all but one system (application shutdown) and start
a log backup (or initiate a logswitch) for these systems.
- Start the log backup (or initiate a logswitch) on the online system after the last of the other systems has been shut down.
- All systems can be restarted once the log backup / logswitch on the online system has been started.
- Details of the procedure may differ for different database systems.
To get a consistent state, all systems except one must be down at one point in time. A consistent state is reached,
with systems synchronized, after the last of the n - 1 systems is shut down. Then the log backup of the one system
that stays online can be started. The log backup of the shutdown systems can be started earlier because there cannot
be any changes.
A restore can now recover all systems using the logs. Because all but one of the systems were down, the restore will
provide a consistent state (the time when the log backup / logswitch on system 3 was done).
Advantage: Short downtime. The downtime can be even shorter than for stopping all applications because the other
systems may already be restarted after the last system started its log backup / logswitch.
Disadvantages: same as for stopping all applications
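The ordering of the steps above can be sketched in Python. This is a minimal simulation under stated assumptions, not a real backup script: the `System` class, its methods, and `consistent_backup` are hypothetical names, and the actual commands differ for each database system.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    running: bool = True

    def online_backup(self):
        if not self.running:
            raise RuntimeError(f"{self.name}: online backup needs a running system")

    def shutdown(self):
        self.running = False

    def log_backup(self):
        pass  # placeholder for a log backup or logswitch

    def restart(self):
        self.running = True


def consistent_backup(systems, stays_online):
    """Run the steps in the described order and return the timeline of actions."""
    timeline = []

    def run(system, action):
        getattr(system, action)()
        timeline.append((system.name, action))

    # Step 1: online backup of all systems.
    for system in systems:
        run(system, "online_backup")

    # Step 2: shut down all but one system; their log backups can start
    # immediately because no further changes can occur on them.
    others = [s for s in systems if s is not stays_online]
    for system in others:
        run(system, "shutdown")
        run(system, "log_backup")

    # Step 3: only after the last of the other systems is down may the
    # log backup on the remaining online system start.
    if any(s.running for s in others):
        raise RuntimeError("all other systems must be down first")
    run(stays_online, "log_backup")

    # Step 4: the other systems can be restarted.
    for system in others:
        run(system, "restart")
    return timeline


systems = [System("System 1"), System("System 2"), System("System 3")]
timeline = consistent_backup(systems, stays_online=systems[2])
```

The key constraint is the position of the log backup on the online system (here System 3) in the timeline: it must come after every shutdown of the other systems, because that moment defines the consistent state that a later restore recovers to.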
Failover/Second Instance Distributed Systems
[Figure: the SAP, SAP BW, SAP CRM, and XI systems form a consistency group; a remote copy replicates the group to a second instance.]
The concept of consistency groups can also be extended to include remote copies to another site.
Incomplete Recovery
Possible reasons:
Logfiles corrupt
Tapes destroyed
DB logically corrupt
Consequences:
Data loss
Inconsistencies between systems
All database systems guarantee that it is always possible to recover to the current point in time after a
system crash. This means that all committed transactions can be recovered after a crash. Because of the
techniques used to exchange data between systems, this also ensures that the systems are in a consistent
state after a system is recovered.
Data between systems can only be inconsistent if it proved impossible to recover one of the systems to the
point of the crash, and therefore a recovery to an earlier point in time was performed.
Do all you can to avoid a point-in-time restore.
Data loss for a few tables (such as unintentionally dropping a table), import of incorrect transports, or
logical corruption are usually insufficient reasons for doing a point-in-time restore. Instead, the system
should be kept running and the errors fixed by importing tables from a test system or an earlier backup,
correcting wrong or inconsistent data, applying correcting transports, and so on. This approach becomes
more important the longer the system has continued running before the error was detected. The data loss
caused by a point-in-time restore is usually more serious for business operation than the effort needed to
fix the problems.
Thus, the only reasons for an incomplete recovery are of a physical nature.
- Examples: the log files are corrupt (this may also be due to software errors) and cannot be applied,
the log tapes are destroyed, or the database is so logically corrupt that it cannot be fixed at all.
Alternatives to a Point-in-Time Recovery
Other handling errors
Repair inconsistent data
If another solution seems possible, avoid point-in-time recovery
Instance Management
Manages your database instance, or instances if you have Oracle Real
Application Clusters (RAC)
Space Management
Manages space in your database
Segment Management
Manages segments (i.e. tables and indexes) in your database