1.0 Introduction
1.1 Objectives
1.2 Risk Analysis
    1.2.1 Initial Planning
    1.2.2 Role of a Risk Manager
    1.2.3 The Need for Backup and Recovery
    1.2.4 Preparing Procedures
    1.2.5 Requirement of Critical Jobs
    1.2.6 Evaluating Alternate Response
    1.2.7 Compiling the Package
1.3 Disaster Recovery Planning
    1.3.1 Disaster Recovery Planning Tasks
    1.3.2 Disaster Recovery Plan Components



1.0 Introduction
Information is now treated on a par with other vital resources by most organisations. Inadvertent or malicious loss, misuse or destruction of data can have consequences as disastrous as the loss of men, material or money. Traditionally, the armed forces have been very sensitive to leakage of plans or of information on dispositions. Financial institutions, too, have built checks and balances to guard against fraud and misappropriation. The need to safeguard corporate information has now become more acute, because data is widely dispersed within the organisation and sophisticated means are available for tapping into databases. An ostrich-like attitude towards the security of data can only result in disaster; it is better to be aware of security measures and to implement them.

1.1 Objectives
At the end of this unit you should be able to:
• explain and appreciate the need for risk analysis;
• define initial planning and the role of a risk manager;
• understand the need for a backup and recovery process;
• explain disaster recovery planning;
• define the various phases of disaster recovery planning.

1.2 Risk Analysis
The purpose of risk analysis is to determine the probability of problems occurring, the cost of each possible disaster, the areas of vulnerability and the preventive measures to adopt as part of a contingency plan. What is required, therefore, is risk management. Risk management has been described as that element of managerial action which is concerned with the identification, measurement and control of uncertain events. It is used to make decisions regarding the costs (monetary as well as other) of protecting against possible events endangering the organisation. The following sections look at several aspects of risk management.
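The weighing of probability against cost described above can be sketched in a few lines of Python. This is an illustrative sketch only: the hazard names, probabilities and costs are invented, and the expected-loss formula (probability multiplied by cost) is a common convention assumed here, not a method prescribed by this unit.

```python
# Hypothetical hazards: (annual probability, cost if the event happens).
# All figures are invented for illustration, in arbitrary monetary units.
hazards = {
    "hardware failure": (0.20, 50_000),
    "fire": (0.01, 2_000_000),
    "operator error": (0.50, 10_000),
}

def expected_loss(probability, cost):
    """Expected annual loss = probability of the event x cost of the event."""
    return probability * cost

def rank_by_expected_loss(hazards):
    """Return (name, expected loss) pairs, largest exposure first."""
    scored = [(name, expected_loss(p, c)) for name, (p, c) in hazards.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

def worth_protecting(probability, cost, annual_safeguard_cost):
    """Under this simple model, a safeguard is justified when it costs
    less per year than the loss it is expected to prevent."""
    return annual_safeguard_cost < expected_loss(probability, cost)

for name, loss in rank_by_expected_loss(hazards):
    print(f"{name}: expected annual loss {loss:,.0f}")
```

A ranking of this kind only supports the decision; non-monetary costs, which the text also mentions, would have to be judged separately.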

1.2.1 Initial Planning
While carrying out the initial planning, considerable thought should be given to the following:
• Estimated cost and availability of funds to perform an analysis.
• Value of the physical installation.
• Worth of data to the organisation and to others.
• Existing safeguards.
• Impact of data processing on the organisation's mission or goals.

From this summary, management could then determine those risks that could be tolerated by the organisation and those which require some control. Those requiring control then could be assessed clinically for risk avoidance.

1.2.2 Role of a Risk Manager
The creation of a position of risk manager is strongly recommended, because the system is not likely to succeed unless one knowledgeable individual is responsible for decision making and supervision, for overall control of the technical and analytical activities in the process, and for its continuity. In a small organisation, the position could be held as a collateral duty by a top-level management official. In a large and complex entity, however, a separate position, sufficiently high in the organisation, should be established for a risk manager, with authority for data processing security across organisational lines. Some requisites for a top-level risk management position are:
• Knowledge of the short- and long-range goals of the organisation;
• Awareness of users' security needs and priorities, for the establishment and maintenance of an appropriate level of security;
• Awareness of new technology in security;
• Authority to make, or assist in making, policy decisions on security programs and procedures;
• Authority, with management approval, to implement security measures deemed feasible from a risk analysis;
• Ability to follow through, periodically, on security policies and practices in action: checking actual performance and results, and taking corrective (if necessary, punitive) action.

It is advisable to take up this work along with the Data Base Administration of the organisation. At the start of the contingency planning project, a team of 3-4 managers from various functional areas is formed. The approach normally followed is to base the contingency plans on rational economic analysis and to avoid the problems of the organisation's internal politics. The objectives of the project team generally include the following:
• Conservation of assets upon exposure to a major hazard, whether fire, storm, sabotage or other hazard;
• Assurance that the corporation will survive even if the computer facilities are disabled or destroyed;
• Specific action plans that a 'prudent man' should take while in charge of the organisation's most vital asset: data.

Generally this activity is a pioneering effort, so preparation of a detailed project plan is recommended. A typical estimate of the total effort for developing the contingency plan is 275 man-days. The break-up of effort by activity is given in Table 1.

Table 1 : Project Outline

Sl. No.  Task                                                Applied effort (man-days)
1.       Plan the project                                     11
2.       Establish current status of backup and recovery      08
3.       Prepare procedures, lists and forms                  09
4.       Establish loss due to delay*                        136
5.       Specify critical applications                        26
6.       Evaluate alternate responses                         18
7.       Document the recommended plans                       17
8.       Creation of emergency procedures note-book           22
9.       Document the information required to reconstruct     18
10.      Complete project 'package'                           10
         Total                                               275

*Establishing the losses resulting from delays in processing is the most difficult part of contingency planning.
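The arithmetic behind Table 1 can be checked with a short Python sketch. The effort figures are those of the table; the dictionary layout and the helper name share are illustrative choices, not part of the original plan.

```python
# Applied effort in man-days, as listed in Table 1.
effort = {
    "Plan the project": 11,
    "Establish current status of backup and recovery": 8,
    "Prepare procedures, lists and forms": 9,
    "Establish loss due to delay": 136,
    "Specify critical applications": 26,
    "Evaluate alternate responses": 18,
    "Document the recommended plans": 17,
    "Creation of emergency procedures note-book": 22,
    "Document the information required to reconstruct": 18,
    "Complete project 'package'": 10,
}

total = sum(effort.values())

def share(task):
    """Percentage of the total effort consumed by one task."""
    return 100 * effort[task] / total

# The task with the largest applied effort.
largest = max(effort, key=effort.get)

print(f"Total effort: {total} man-days")
print(f"Largest task: {largest} ({share(largest):.1f}% of total)")
```

The check confirms the total of 275 man-days and shows that establishing the loss due to delay accounts for roughly half of the whole project, which is consistent with the footnote calling it the most difficult part.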

1.2.3 The Need for Backup and Recovery

The hazards that could disable computer operations are generally categorised as follows:
• Hardware and software failures;
• Environmental failures involving electric power, air conditioning, building integrity etc.;
• Accidents like fire, smoke, water and storms;
• Vandalism, sabotage, rioting;
• Operational errors, probably the most frequent cause of inability to operate, often with the most severe consequences;
• Non-availability of personnel, whether due to strike, disease, communications breakdown or disruption of transportation.

For any of the first five categories, the effect would be partial or total inoperability, or perhaps the destruction of facilities, data, programs and files. The duration of the effect could range from a temporary interruption to a permanent loss. Hence, there is a definite need for a proper system of backup and recovery. The sixth category, the unavailability of personnel, would result in temporary interruption.

1.2.4 Preparing Procedures
The form in Table 2 is used as a tool for uniformly recording and evaluating the data showing the potential losses to the organisation if a hazard makes it impossible for the computers to produce outputs on time.

Table 2 : "Criticality Evaluation"
Columns : Application/Program; then the loss if the delay is 12 hrs., 24 hrs., 2 days, 4 days, 7 days, 2 weeks, 1 month.
Rows : System A; Subsystem A3; Program A3N; Program A3M; Subsystem A4; System
Loss entries : 1m, 0m, 50m, 2m, 15m, 3m, 5m, 175m, 70m, 5m, 25m, 200m, where m is the monetary unit.

Note : The object of the contingency plan is to discover which applications/programs are most critical in terms of losses incurred. In many cases it is discovered that the cost to the organisation of being unable to produce the outputs on time is of such magnitude that both the organisation and the users agree that under no circumstances would such losses be tolerated. Hence a dual evaluation is undertaken for those application systems with extremely high loss potential. First, as usual, the loss to the organisation if it is unable to produce the output on time is calculated. Then, for comparison, the steps that would be required under the worst conditions, regardless of cost, to prevent this major loss from ever happening are worked out.

The detailed analysis is, in nearly all cases, done by the user group itself, with assistance and guidance from the project team members. Getting the user group involved in the analysis is found to be of high value, because it forces the users to think through what they will have to do in case of an emergency. It also compels them to make an economic analysis of the value of their work in a corporate sense, rather than from the usual parochial point of view.
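The criticality evaluation form lends itself to a simple data structure: one row of loss estimates per application, one column per delay period. The sketch below shows how such a form could be recorded and ranked. The application names and loss figures are hypothetical and are not the entries of Table 2; the function names are likewise illustrative.

```python
# Delay periods used as column headings in the criticality evaluation form.
DELAYS = ["12 hrs", "24 hrs", "2 days", "4 days", "7 days", "2 weeks", "1 month"]

# Hypothetical loss estimates per application, one figure per delay period,
# in units of m (the monetary unit). These are NOT the figures of Table 2.
losses = {
    "Program X1": [0, 1, 2, 5, 10, 25, 60],
    "Program X2": [1, 5, 20, 50, 90, 150, 300],
    "Program X3": [0, 0, 0, 1, 2, 4, 8],
}

def loss_at(application, delay):
    """Estimated loss for one application if its output is late by `delay`."""
    return losses[application][DELAYS.index(delay)]

def rank_by_criticality(delay):
    """Applications ordered from highest to lowest loss at the given delay."""
    return sorted(losses, key=lambda app: loss_at(app, delay), reverse=True)

def critical_applications(delay, tolerable_loss):
    """Applications whose loss at `delay` exceeds what can be tolerated."""
    return [app for app in rank_by_criticality(delay)
            if loss_at(app, delay) > tolerable_loss]

print(rank_by_criticality("2 days"))
print(critical_applications("2 days", tolerable_loss=1))
```

Applications returned by critical_applications would be the candidates for the dual evaluation described in the note to Table 2.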

1.2.5 Requirement of Critical Jobs
Upon identifying the critical jobs, the requirements of these jobs are established as follows:
• Timing : Schedules, acceptable delays, minimum/maximum for normal processing.
• Equipment : Core, tape (density, track), discs, printers, special features.
• Data : Files, generation, data group, catalogue and procedures.
• Software : Special programs, protection, passwords.
• Preprocessing : User interface, inputs, data preparation, error handling, prerequisite runs.
• Personnel : User contacts, data preparation, data controller, distribution, supervision, support.
• Post Processing : Distribution lists, control.
• Others : Documentation (block diagrams, record layouts, source program listing), procedures (operating instructions, check points and restarts), security, sabotage, forms, supplies.

1.2.6 Evaluating Alternate Response
With critical systems satisfactorily identified, what should be the responses to an 'accident' or 'catastrophe'? Let us first summarise the essential elements of any form of response to an unwanted event which could lead to delay in data processing operations:
• Evaluate the situation and estimate the consequences, including a recognition of the time period in which the accident occurred. If it occurs on a weekend, some specific steps must be taken. At what point of the processing cycle was the operation brought to a sudden halt?
• Communicate with all of the affected parties. This is probably the most neglected response element. One should not hide the fact that a significant emergency has occurred. Mechanisms (including responsibilities and authorisations) must be set up in advance to communicate with the users, suppliers, personnel and all others in any way involved.
• Initiate the selected response actions as quickly as possible. Operations in the backup mode should be activated on the basis of the contingency plans developed, and by those made responsible in the plan. Necessary check points and controls, including extra security safeguards, should not be overlooked. It should be remembered that everything will be under abnormal conditions; for instance, transportation problems may become severe.
• Start actions to restore normalcy. During the emergency, data processing is limited in scope. Even once routine operation resumes, normalcy has not yet been reached. Equipment time, and overtime for most of the personnel, will be needed to restore master files and bring them up to current status. Those files and systems that were temporarily processed in a contingency mode will require much updating. Additional checking of files and supplementary audits must be undertaken to assure that normalcy is indeed restored.

Let us now examine some of the alternative responses:
• Accept the delay. Just do nothing and wait, if one can afford to. This is the simplest response.
• Attempt to remove or minimize the cost of delay. Change the schedules of operation immediately and process only what is critical, using as a basis the economic analysis of critical jobs. By reducing the scope of operation one will concentrate on only the true essentials.
• Go off-site, whether locally or remotely. This may require running extra hours for the main processing, and again subsequently to help catch up with the backlog of systems to be updated. For any processing off-site, appropriate concern must be shown for configuration and software compatibilities. Cash advances or credit should be handy to provide air tickets for personnel to fly out suddenly. Communications, work-flow, controls and security will become important items requiring attention.
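Since the choice among these alternatives rests on the economic analysis of critical jobs, it can be illustrated as a simple cost comparison. The sketch below is an assumption-laden simplification: the function name, its parameters and the example figures are hypothetical, and a real decision would also weigh compatibility, security and staffing, as noted above.

```python
def cheapest_response(loss_if_all_delayed, loss_if_only_noncritical_delayed,
                      offsite_cost, offsite_residual_loss=0):
    """Compare the three responses by estimated total cost.

    Returns (response, cost) pairs sorted cheapest first. All inputs are
    estimates expressed in the same monetary unit.
    """
    options = {
        # Do nothing: every job is late, so the full loss is incurred.
        "accept the delay": loss_if_all_delayed,
        # Run only critical jobs: only the non-critical losses remain.
        "reduce scope to critical jobs": loss_if_only_noncritical_delayed,
        # Move processing elsewhere: pay for the move plus any leftover loss.
        "go off-site": offsite_cost + offsite_residual_loss,
    }
    return sorted(options.items(), key=lambda item: item[1])

# Hypothetical estimates for a two-day outage.
for response, cost in cheapest_response(200, 30, 45, 5):
    print(f"{response}: estimated cost {cost}")
```

With these invented figures, reducing the scope of operation is cheapest; a longer outage would raise the first two estimates and could tip the balance towards going off-site.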

The emergency procedures note-book, like the whole contingency plan, is designed to limit losses. It should be available to console operators, shift supervisors and operations managers. It should include sections dealing with fire, water, flood, bomb threats, smoke, dirt, storms, electric problems, air-conditioning failure, building hazards, communication facility problems, hardware malfunctions, evacuation of the building and entry procedures, and other emergency situations. (The section on other situations could deal with radar interference, magnets, backup tapes, situations involving off-site data storage vaults, lack of supplies and forms, vandalism, theft and fraud.) Much of the information incorporated in these sections already exists within the organisation in various shapes and forms, and in various degrees of completeness. By consolidating all the information and assembling the best from each source, it would be possible to produce a useful reference. In an emergency, things usually go from bad to worse. Taking hasty steps, by-passing normal precautionary measures and making faulty responses aggravate the situation. The emergency procedures note-book will certainly help to avoid this. Assigning specific contingency responsibilities in advance of a major emergency will do much towards eliminating chaos and confusion. In the plan, and in the emergency procedures note-book, each response action should be spelt out in detail, after being thought out under calm conditions.

1.2.7 Compiling the Package
In order to achieve ultimate restoration of the data processing operation, one needs to be able to replace damaged or destroyed facilities. This calls for an up-to-date package of records containing complete specifications and purchasing information for all resources necessary to the operation. It should include data for hardware, communication equipment, system software, operating procedures, run instructions and various logs. Also to be included are the data needed for the reconstruction of files, and for the updating, testing and debugging of programs. One should be certain that environmental services such as air conditioning and electric power, as well as paper stock, tapes, discs, printer ribbons, forms and general supplies, are all taken care of.
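Keeping the package up to date is easier if its completeness can be checked mechanically. The following sketch assumes, purely for illustration, that the package is held as a dictionary mapping each record category named above to a list of records; the category labels and the function name missing_sections are hypothetical.

```python
# Record categories named in the text above; the dictionary layout used
# to hold the package is an assumption made for this sketch.
REQUIRED_SECTIONS = [
    "hardware",
    "communication equipment",
    "system software",
    "operating procedures",
    "run instructions",
    "logs",
    "file reconstruction data",
    "program maintenance data",
    "environmental services",
    "supplies",
]

def missing_sections(package):
    """Return the required sections that are absent or empty in `package`."""
    return [name for name in REQUIRED_SECTIONS if not package.get(name)]

# Hypothetical, partly filled package: each section maps to a list of records.
package = {
    "hardware": ["CPU specification", "printer purchase order"],
    "system software": ["operating system release notes"],
    "logs": [],  # present but empty, so still reported as missing
}

print(missing_sections(package))
```

Running such a check whenever the installation changes gives an early warning that the package no longer covers everything needed for reconstruction.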


Contingency planning is not easy, and it can take a great deal of time for sophisticated installations. But planning for emergencies is well within the state of the art. The methodology listed here could be of help to those who wish to take advantage of it.

1.3 Disaster Recovery Planning
What follows is relevant to any disaster, not only to a cataclysmic event in an Information System. It is meant to convey the idea rather than to serve as a standard specimen: only the frame has been sketched in, to be interpreted as applicable to a particular organisation or situation. This sub-unit deals with two aspects: the tasks in planning, and the components of a Plan.

1.3.1 Disaster Recovery Planning Tasks
Disaster Recovery Planning tasks can be visualised as six major phases, listed below and detailed later:
• Definition Phase;
• Functional Requirements Phase;
• Design Phase;
• Implementation Phase;
• Testing and Activation Phase;
• Maintenance Phase.

Phase I (Definition) : In this phase the parameters of all that is to be included are assessed and put in perspective. It would consist of things like:
• The objectives;
• Terms of reference;
• Planning perspective.

Phase II (Functional Requirements) : This is possibly the most critical phase. It includes such sections and activities as:
• Making an inventory of the resources to be included, e.g. hardware, software, telecommunications components and data life cycles; also actions or movements like data conversion and movement, physical as well as electrical; also procedures and Standard Operating Practices;
• Critical appraisal of the applications and the installation against the recovery objectives;
• Deciding what is to be covered in the Plan;
• Establishing priorities based on the criticality of time frames, threats and the organisation's performance.

Phase III (Design) : This is particularly significant in a plan being prepared for the first time (note the reference to equipment alternatives) and would include such things as:
• Identify design alternatives;
• Specify the alternatives in detail, e.g. hardware, software, telecommunication, staffing, procedures etc.;
• Identify potential vendors, if purchase is necessary;
• Analyse the risks in the various alternatives, as well as the costs involved in improving to the desired levels;
• Select the acceptable design, including financial approvals.

Phase IV (Implementation) : This would put into action the desired and designed Plan and would be made up of:
• Acquisition of land, building, utilities, hardware, telecommunications lines etc.;
• Negotiating and signing contracts with vendors and consultants;
• Writing of Manuals of Procedure;
• Training of personnel;
• Site preparation;
• Development of a test plan;
• Development of a maintenance plan.

Phase V (Testing and Activation) : This is equivalent to a system test run in computer jargon. It consists of three segments:
Segment 1 : Paralleling;
Segment 2 : Live Testing;
Segment 3 : Maintenance Testing.

Segment 1 : In this segment, all activities external to the Complex in the Disaster Recovery Plan are tested or triggered, such as:
(a) scheduling all 'on call' personnel and practising the drill;
(b) triggering arrangements external to the complex;
(c) practising 'back-up' arrangements by invoking them and working a job;
(d) validating the adequacy of back-up by comparing with a live job selected at random;
(e) correcting errors in the plan, if any;
(f) repeating (d) and (e) till snags, complications and so on are removed and simple, streamlined Standard Operating Procedures emerge.

Segment 2 : "Set the dogs free" and simulate a breakdown. The following actions are included:
(g) attempt to run using the plan only;
(h) correct defects, if detected; retrain if necessary and review Standard Operating Procedures;
(i) repeat (c) to (h) till the drill is free of all bugs, and is simple, reliable, economical and effective.

Segment 3 : This is in the best traditions of Management Science and is invoked on two occasions, as a routine and whenever there is a change:
(j) repeat (g) to (i) annually;
(k) repeat (c) to (f) whenever there is a revision to the Plan.

Phase VI (Maintenance) : This is not strictly a part of the planning, as no tasks are performed. It is a development of philosophies during the implementation phase, applied as an on-going activity. The stress will be on the software and, in this connection, two books must be maintained: software change authorisation and software packages. Other items needing constant maintenance are:
• Names, titles, media etc.;
• Back-up library (data systems, application software etc.);
• Documentation and Standard Operating Procedures.

1.3.2 Disaster Recovery Plan Components
In this section, the sub-divisions of the plan manual are purely recommendatory and are guides only. Twelve facets are identified:

Section I : Statement of Purpose
• Objectives;
• Scope, constraints;
• Priorities.

Section II : Would describe all hardware in use
• CPU/s;
• Peripherals etc.

Section III : Could be devoted to a description of the telecommunications component of the complex and would include
• Message switching;
• Multiplexors, concentrators and the like;
• Diagnostic devices;
• Modems;
• Terminals and the like;
• Protocols;
• Lines, channels and circuits.

Section IV : Off-line devices like data conversion, data entry, plotters etc.

Section V : All "firm-ware" should be described.

Section VI : Should describe all software like:
• Operating Systems;
• Compilers;
• Utilities;
• Data Base Management and Communications Management;
• Full details of Applications - Source, Object, etc.

Section VII : Every form used in the complex should be covered
• Flat packs;
• Checks;
• Turn-around Documents;
• Input forms;
• Coding sheets;
• Forms used to invoke Back-up etc.

Section VIII : This will elaborate on procedures and on areas with potential for jeopardy to the System, and include:
• Operation at Back-up installation;
• Critical procedures for Manual Operations;
• Controls on Data, software etc.;
• Training.

Section IX : Could elaborate on the policy on space, e.g.
• Hardware deployment;
• Storage;
• Terminals;
• Off-line devices;
• Clerical areas;
• Forms and Stationery;
• Input/Output controls;
• Repair and maintenance;
• Security.

Section X : All aspects of the Utilities: water, electric power, air conditioning etc.

Section XI : Personnel aspects and assignment of duties in the various stages of the System
• Recovery Management;
• Site preparations;
• Selections;
• Construction;
• Hardware installation;
• Telecom installation;
• Stores;
• Support Services like typing, reprography etc.;
• Administration.

Applications Management
• The Manager and his responsibility;
• System maintenance;
• System development and review;
• System reconstruction;
• Data Base reconstruction;
• Transaction processing principles, supervision (like transaction authorisation, input preparation including conversion/entry, output control, error correction etc.);
• Staffing and training.

Data Centre Recovery
• Installation Management;
• Shift Organisation and supervision;
• Console Operation;
• Scheduling;
• Terminal Access;
• Media Library;
• System Programming;
• Input/Output Control etc.

Plan Maintenance
• Overall responsibility;
• Applications responsibility;
• Installation responsibility;
• Plan testing and review.

This unit has introduced you to the techniques of risk analysis and disaster planning. With examples, it has explained the various components of risk analysis and their usefulness. It has also briefly discussed disaster recovery planning and its requirements.
