SG24-4503-00

International Technical Support Organization

System/390 MVS Parallel Sysplex Continuous Availability SE Guide

December 1995
Take Note! Before using this information and the product it supports, be sure to read the general information under Special Notices on page xvii.
Abstract
This document discusses how the parallel sysplex can help an installation get closer to a goal of continuous availability. It is intended for customer systems and operations personnel responsible for implementing parallel sysplex, and for the IBM Systems Engineers who assist them. It will also be useful to technical managers who want to assess the benefits they can expect from parallel sysplex in this area. The book describes how to configure both the hardware and the software to eliminate planned outages and minimize the impact of unplanned outages. It describes how you can make hardware and software changes to the sysplex without disrupting the running applications. It also discusses how to handle unplanned hardware or software failures, and how to recover from error situations with minimal impact to the applications. A knowledge of parallel sysplex is assumed. (296 pages)
Contents
Abstract
Special Notices
Preface
  How This Document Is Organized
  Related Publications
  International Technical Support Organization Publications
  ITSO Redbooks on the World Wide Web (WWW)
  Acknowledgments

Chapter 1. Hardware Configuration
  1.1 What Is Continuous Availability?
    1.1.1 Parallel Sysplex and Continuous Availability
    1.1.2 Why N+1?
  1.2 Processors
  1.3 Coupling Facilities
    1.3.1 Separate Machines
    1.3.2 How Many?
    1.3.3 CF Links
    1.3.4 Coupling Facility Structures
    1.3.5 Coupling Facility Volatility/Nonvolatility
  1.4 Sysplex Timers
    1.4.1 Duplicating
    1.4.2 Distance
    1.4.3 Setting the Time in MVS
    1.4.4 Protection
  1.5 I/O Configuration
    1.5.1 ESCON Logical Paths
  1.6 CTCs
    1.6.1 3088 and ESCON CTC
    1.6.2 Alternate CTC Configuration
    1.6.3 Sharing CTC Paths
    1.6.4 IOCP Coding
    1.6.5 3088 Maintenance
  1.7 XCF Signalling Paths
  1.8 Data Placement
  1.9 DASD Configuration
    1.9.1 RAMAC and RAMAC 2 Array Subsystems
    1.9.2 3990 Model 6
    1.9.3 3990 Model 3
    1.9.4 DASD Path Recommendations
    1.9.5 3990 Model 6 ESCON Logical Path Report
  1.10 ESCON Directors
    1.10.1 ESCON Manager
    1.10.2 ESCON Director Switch Matrix
  1.11 Fiber
    1.11.1 9729
  1.12 Consoles
    1.12.1 Hardware Management Console (HMC)
    1.12.2 How Many HMCs?
    1.12.3 Using HMC As an MVS Console
    1.12.4 MVS Consoles
    1.12.5 Master Console Considerations
    1.12.6 Console Configuration Considerations
  1.13 Tape
    1.13.1 3490
  1.14 Communications
    1.14.1 VTAM CTCs
    1.14.2 3745s
    1.14.3 CF Structure
  1.15 Environmental
    1.15.1 Uninterruptible Power Supply (UPS)
    1.15.2 9672/9674 Protection against Power Disturbances

Chapter 2. System Software Configuration
  2.1 Introduction
  2.2 N, N+1 in a Software Environment
  2.3 Shared SYSRES
    2.3.1 Shared SYSRES Design
    2.3.2 Indirect Catalog Function
  2.4 Master Catalog
  2.5 Dynamic I/O Reconfiguration
    2.5.1 Exceptions
  2.6 I/O Definition File
  2.7 Couple Data Sets
  2.8 JES2 Checkpoint
    2.8.1 JES2 Checkpoint Reconfiguration
  2.9 RACF Database
  2.10 PARMLIB Considerations
    2.10.1 Developing Naming Conventions
    2.10.2 MVS/ESA SP V5.2 Enhancements
    2.10.3 MVS Consoles
  2.11 System Logger
    2.11.1 Logstream and Structure Allocation
    2.11.2 DASD Log Data Sets
    2.11.3 Duplexing Coupling Facility Log Data
    2.11.4 DASD Staging Data Sets
  2.12 System Managed Storage Considerations
    2.12.1 SMSplex
    2.12.2 DFSMShsm Considerations
    2.12.3 Continuous Availability Considerations
    2.12.4 RESERVE Activity
  2.13 Shared Tape Support
    2.13.1 Planning
    2.13.2 Implementing Automatic Tape Switching
  2.14 Exploiting Dynamic Functions
    2.14.1 Dynamic Exits
    2.14.2 Dynamic Subsystem Interface (SSI)
    2.14.3 Dynamic Reconfiguration of XES
  2.15 Automating Sysplex Failure Management
    2.15.1 Planning for SFM
    2.15.2 The SFM Isolate Function
    2.15.3 SFM Parameters
    2.15.4 SFM Activation
    2.15.5 Stopping SFM
    2.15.6 SFM Utilization
  2.16 Planning the Time Detection Intervals
    2.16.2 Synchronous WTO(R)
  2.17 ARM: MVS Automatic Restart Manager
    2.17.1 ARM Characteristics
    2.17.2 ARM Processing Requirements
    2.17.3 Program Changes
    2.17.4 ARM and Subsystems
  2.18 JES3
    2.18.1 Planning
    2.18.2 JES3 Sysplex Considerations
    2.18.3 JES3 Parallel Sysplex Requirements
    2.18.4 JES3 Configurations
    2.18.5 Additional JES3 Planning Information

Chapter 3. Subsystem Software Configuration
  3.1 CICS V4 Transaction Subsystem
    3.1.1 CICS Topology
    3.1.2 CICS Affinities
    3.1.3 File-Owning Regions
    3.1.4 Resource Definition Online (RDO)
    3.1.5 CSD Considerations
    3.1.6 Subsystem Storage Protection
    3.1.7 Transaction Isolation
  3.2 CICSPlex SM V1
    3.2.1 CICSPlex SM Configuration
  3.3 IMS Transaction Subsystem
    3.3.1 IMS Topology
    3.3.2 IMS RESLIB
    3.3.3 IMSIDs
    3.3.4 Terminal Definitions
    3.3.5 Data Set Sharing
    3.3.6 IRLM Definitions
    3.3.7 Coupling Facility Structures
    3.3.8 Dynamic Update of IMS Type 2 SVC
    3.3.9 Cloning Inhibitors
  3.4 DB2 Subsystem
    3.4.1 DB2 Environment
    3.4.2 DB2 Structures
    3.4.3 Changing Structure Sizes
    3.4.4 DB2 Data Availability
    3.4.5 IEFSSNXX Considerations
    3.4.6 DB2 Subsystem Parameters
  3.5 VSAM RLS
    3.5.1 Control Data Sets
    3.5.2 Defining the Database
    3.5.3 Defining the SMSVSAM Structures
    3.5.4 CICS Use of System Logger
  3.6 TSO in a Parallel Sysplex
  3.7 System Automation Tools
    3.7.1 NetView
    3.7.2 AOC/MVS
    3.7.3 OPC/ESA

Chapter 4. Systems Management in a Parallel Sysplex
  4.1 Importance of Systems Management in Parallel Sysplex
  4.2 Change Management
  4.3 Problem Management
  4.4 Operations Management
  4.5 The Other System Management Disciplines
  4.6 Summary

Chapter 5. Coupling Facility Changes
  5.1 Structure Attributes and Allocation
  5.2 Structure and Connection Disposition
    5.2.1 Structure Disposition
    5.2.2 Connection State and Disposition
  5.3 Structure Dependence on Dumps
  5.4 To Move a Structure
    5.4.1 The Structure Rebuild Process
  5.5 Altering the Size of a Structure
  5.6 Changing the Active CFRM Policy
  5.7 Reformatting the CFRM Couple Data Set
  5.8 Adding a Coupling Facility
    5.8.1 To Define the Coupling Facility LPAR and Connections
    5.8.2 To Prepare the New CFRM Policy
    5.8.3 Setting Up the Structure Exploiters
  5.9 Servicing the Coupling Facility
    5.9.1 Concurrent Hardware Upgrades
    5.9.2 Concurrent LIC Upgrades
  5.10 Removing a Coupling Facility
  5.11 Coupling Facility Shutdown Procedure
    5.11.1 Coupling Facility Exploiter Considerations
    5.11.2 Shutting Down the Only Coupling Facility
  5.12 Putting a Coupling Facility Back Online

Chapter 6. Hardware Changes
  6.1 Processors
    6.1.1 Adding a Processor
    6.1.2 Removing a Processor
    6.1.3 Changing a Processor
  6.2 Logical Partitions (LPARs)
    6.2.1 Adding an LPAR
    6.2.2 Removing an LPAR
    6.2.3 Changing an LPAR
  6.3 I/O Devices
  6.4 ESCON Directors
  6.5 Changing the Time
    6.5.1 Using the Sysplex Timer
    6.5.2 Time Changes and IMS
    6.5.3 Time Changes and SMF
    6.5.4 Changing Time in the 9672 HMC and SE

Chapter 7. Software Changes
  7.1 Adding a New MVS Image
    7.1.1 Adding a New JES3 Main
  7.2 Adding a New SYSRES
    7.2.1 Example JCL
  7.3 Implementing System Software Changes
  7.4 Adding Subsystems
    7.4.1 CICS
    7.4.2 IMS Subsystem
    7.4.3 DB2
    7.4.4 TSO
  7.5 Starting the Subsystems
    7.5.1 CICS
    7.5.2 DB2
    7.5.3 IMS
  7.6 Changing Subsystems
  7.7 Moving the Workload
    7.7.1 CICS
    7.7.2 IMS
    7.7.3 DB2
    7.7.4 TSO
    7.7.5 Batch
    7.7.6 DFSMS
  7.8 Closing Down the Subsystems
    7.8.1 CICS
    7.8.2 IMS
    7.8.3 DB2
    7.8.4 System Automation Shutdown
  7.9 Removing an MVS Image

Chapter 8. Database Availability
  8.1 VSAM
    8.1.1 Batch
    8.1.2 Backup
    8.1.3 Reorg
  8.2 IMS/DB
    8.2.1 Batch
    8.2.2 Backup
    8.2.3 Reorg
  8.3 DB2
    8.3.1 Batch
    8.3.2 Backup
    8.3.3 Reorg

Chapter 9. Parallel Sysplex Recovery
  9.1 System Recovery
    9.1.1 Sysplex Failure Management (SFM)
    9.1.2 Automatic Restart Management (ARM)
    9.1.3 What Needs to Be Done?
  9.2 Coupling Facility Failure Recovery
  9.3 Assessment of the Failure Condition
    9.3.1 To Recognize a Structure Failure
    9.3.2 To Recognize a Connectivity Failure
    9.3.3 To Recognize When a Coupling Facility Becomes Volatile
    9.3.4 Recovery from a Connectivity Failure
    9.3.5 Recovery from a Structure Failure
  9.4 DB2 V4 Recovery from a Coupling Facility Failure
    9.4.1 DB2 V4 Built-In Recovery from Connectivity Failure
    9.4.2 DB2 V4 Built-In Recovery from a Structure Failure
    9.4.3 Coupling Facility Becoming Volatile
    9.4.4 Manual Structure Rebuild
    9.4.5 To Manually Deallocate and Reallocate a Group Buffer Pool
    9.4.6 To Manually Deallocate a DB2 Lock Structure
    9.4.7 To Manually Deallocate a DB2 SCA Structure
  9.5 XCF Recovery from a Coupling Facility Failure
    9.5.1 XCF Built-In Recovery from Connectivity or Structure Failure
    9.5.2 Coupling Facility Becoming Volatile
    9.5.3 Manual Invocation of Structure Rebuild
    9.5.4 Manual Deallocation of the XCF Signalling Structures
    9.5.5 Partitioning the Sysplex
  9.6 RACF Recovery from a Coupling Facility Failure
    9.6.1 RACF Built-In Recovery from Connectivity or Structure Failure
    9.6.2 Coupling Facility Becoming Volatile
    9.6.3 Manual Invocation of Structure Rebuild
    9.6.4 Manual Deallocation of RACF Structures
  9.7 VTAM Recovery from a Coupling Facility Failure
    9.7.1 VTAM Built-In Recovery from Connectivity Failure
    9.7.2 VTAM Built-In Recovery from a Structure Failure
    9.7.3 The Coupling Facility Becomes Volatile
    9.7.4 Manual Invocation of Structure Rebuild
    9.7.5 Manual Deallocation of the VTAM GRN Structure
  9.8 IMS/DB Recovery from a Coupling Facility Failure
    9.8.1 IMS/DB Built-In Recovery from a Connectivity Failure
    9.8.2 IMS/DB Built-In Recovery from a Structure Failure
    9.8.3 Coupling Facility Becoming Volatile
    9.8.4 Manual Invocation of Structure Rebuild
    9.8.5 Manual Deallocation of an IRLM Lock Structure
    9.8.6 Manual Deallocation of a OSAM/VSAM Cache Structure
  9.9 JES2 Recovery from a Coupling Facility Failure
    9.9.1 Connectivity Failure to a Checkpoint Structure
    9.9.2 Structure Failure in a Checkpoint Structure
    9.9.3 The Coupling Facility Becomes Volatile
    9.9.4 To Manually Move a JES2 Checkpoint
  9.10 System Logger Recovery from a Coupling Facility Failure
    9.10.1 System Logger Built-In Recovery from a Connectivity Failure
    9.10.2 System Logger Built-In Recovery from a Structure Failure
    9.10.3 Coupling Facility Becoming Volatile
    9.10.4 Manual Invocation of Structure Rebuild
    9.10.5 Manual Deallocation of Logstreams Structure
  9.11 Automatic Tape Switching Recovery from a Coupling Facility Failure
    9.11.1 Automatic Tape Switching Recovery from a Connectivity Failure
    9.11.2 Automatic Tape Switching Built-In Recovery from a Structure Failure
    9.11.3 Coupling Facility Becoming Volatile
    9.11.4 Manual Invocation of Structure Rebuild
    9.11.5 Consequences of Failing to Rebuild the IEFAUTOS Structure
    9.11.6 Manual Deallocation of IEFAUTOS Structure
  9.12 VSAM RLS Recovery from a Coupling Facility Failure
    9.12.1 SMSVSAM Built-In Recovery from a Connectivity Failure
    9.12.2 SMSVSAM Built-In Recovery from a Structure Failure
    9.12.3 Coupling Facility Becoming Volatile
    9.12.4 Manual Invocation of Structure Rebuild
    9.12.5 Manual Deallocation of SMSVSAM Structures
  9.13 Couple Data Set Failure
    9.13.1 Sysplex (XCF) Couple Data Set Failure
    9.13.2 Coupling Facility Resource Manager (CFRM) Couple Data Set Failure
    9.13.3 Sysplex Failure Management (SFM) Couple Data Set Failure
    9.13.4 Workload Manager (WLM) Couple Data Set Failure
    9.13.5 Automatic Restart Manager (ARM) Couple Data Set Failure
    9.13.6 System Logger (LOGR) Couple Data Set Failure
  9.14 Sysplex Timer Failures
  9.15 Restarting IMS
    9.15.1 IMS/IRLM Failures Within a System
    9.15.2 CEC or MVS Failure
    9.15.3 Automating Recovery
  9.16 Restarting DB2
  9.17 Restarting CICS
    9.17.1 CICS TOR Failure
    9.17.2 CICS AOR Failure
  9.18 Recovering Logs
    9.18.1 Recovering an Application Failure
    9.18.2 Recovering an MVS Failure
    9.18.3 Recovering from a Sysplex Failure
    9.18.4 Recovering from System Logger Address Space Failure
    9.18.5 Recovering OPERLOG Failure
  9.19 Restarting an OPC/ESA Controller
  9.20 Recovering Batch Jobs under OPC/ESA Control
    9.20.1 Status of Jobs on Failing CPU
    9.20.2 Recovery of Jobs on a Failing CPU

Chapter 10. Disaster Recovery Considerations
  10.1 Disasters and Distance
  10.2 Disaster Recovery Sites
    10.2.1 3990 Remote Copy
    10.2.2 IMS Remote Site Recovery
    10.2.3 CICS Recovery with CICSPlex SM
    10.2.4 DB2 Disaster Recovery

Appendix A. Sample Parallel Sysplex MVS Image Members
  A.1 Example Parallel Sysplex Configuration
  A.2 IPLPARM Members
    A.2.1 LOADAA
  A.3 PARMLIB Members
    A.3.1 IEASYMAA
    A.3.2 IEASYS00 and IEASYSAA
    A.3.3 COUPLE00
    A.3.4 JES2 Startup Procedure in SYS1.PROCLIB
    A.3.5 J2G
    A.3.6 J2L42
  A.4 VTAMLST Members
    A.4.1 ATCSTR42
    A.4.2 ATCCON42
    A.4.3 APCIC42
    A.4.4 APNJE42
    A.4.5 CDRM42
    A.4.6 MPC03
    A.4.7 TRL03
    A.4.8 APAPPCAA
  A.5 Allocating Data Sets
    A.5.1 ALLOC JCL

Appendix B. Structures, How to ...
  B.1 To Gather Information on a Coupling Facility
  B.2 To Gather Information on Structure and Connections
  B.3 To Deallocate a Structure with a Disposition of DELETE
  B.4 To Deallocate a Structure with a Disposition of KEEP
  B.5 To Suppress a Connection in Active State
  B.6 To Suppress a Connection in Failed-persistent State
  B.7 To Monitor a Structure Rebuild
  B.8 To Stop a Structure Rebuild
  B.9 To Recover from a Hang in Structure Rebuild

Appendix C. Examples of CFRM Policy Transitioning
  C.1 Changing the Structure Definition
  C.2 Changing the Coupling Facility Definition

Appendix D. Examples of Sysplex Partitioning
  D.1 Partitioning on Operator Request
  D.2 System in Missing Status Update Condition

Appendix E. Spin Loop Recovery

Appendix F. Dynamic I/O Reconfiguration Procedures
  F.1 Procedure to Make the System Dynamic I/O Capable
  F.2 Procedure for Dynamic Changes
  F.3 Hardware System Area Considerations
  F.4 Hardware System Area Expansion Factors

Glossary
Figures
 1. Sample Parallel Sysplex Continuous Availability Configuration
 2. ESCON Logical Paths Configuration
 3. CTC Configuration
 4. Recommended XCF Signalling Path Configuration
 5. Recommended DASD Path Configuration
 6. ICKDSF R16 ESCON Logical Path Report
 7. Console Environment
 8. Recommended Console Configuration
 9. 9910 Local UPS and 9672 Rx2 and Rx3
10. Indirect Catalog Function with SYSRESA
11. Indirect Catalog Function with SYSRESB
12. Alternate Consoles
13. Example of Failure Dependent Connection
14. Example of Failure Dependent/Independence Connections
15. Basic Relationship between Sysplex Name and System Group
16. SMSplex Consisting of System Group and Individual System Name
17. Isolating a Failing MVS
18. INTERVAL and ISOLATETIME Relationship
19. SFM Policy with the ISOLATETIME Parameter
20. SFM LPARs Actions Timings
21. Sample JCL to Delete a SFM Policy
22. Figure to Show Timing Relationships
23. JES3 *I S Display Showing Non-Existent Systems
24. JES3-Managed and Auto-Switchable Tape
25. NJE Node Definitions Portion of JES3 Init Stream
26. Sample JES3 Proc for Use by Multiple Globals
27. Cloned CICSplex
28. CICSPlex SM
29. Sample IMS 5.1 Configuration
30. Sample DB2 Data Sharing Configuration
31. Sample VSAM RLS Data Sharing Configuration
32. START Command When Adding a New JES3 Global
33. Volume Initialization
34. Copy SYSRESA
35. SMP/E ZONEEDIT
36. Add IPL Text
37. Example parallel sysplex Environment
38. Introducing a New Software Level into the parallel sysplex
39. Redistributing Workload on TORs
40. Redistributing Workload on AORs
41. DB2 Data Sharing Availability
42. Sample Checkpoint Definition
43. 3990-6 Peer-to-Peer Remote Copy Configuration
44. 3990-6 Extended Remote Copy Configuration
45. IMS Remote Site Recovery Configuration
46. DB2 Data Sharing Disaster Recovery Configuration
47. Example Parallel Sysplex Configuration
48. LOADAA Member
49. IEASYMAA
50. IEASYS00
51. IEASYSAA
52. COUPLE00
53. JES2 Member in SYS1.PROCLIB
54. J2G
55. J2L42
56. ATCSTR42
57. ATCCON42
58. APCIC42
59. APNJE42
60. CDRM42
61. MPC03
62. TRL03
63. APAPPCAA
64. Allocating System Specific Data Sets
65. Coupling Facility Display
66. Structures and Connections Display
67. Monitoring Structure Rebuild through Exploiter's Messages
68. Monitoring Structure Rebuild by Displaying Structure Status
69. CFRM Policy Sample
70. JCL to Install a New CFRM Policy
71. Original CFRM Policy
72. New CFRM Policy
73. VARY OFF a System without SFM Policy Active
74. VARY OFF a System with an SFM Policy Active
75. System in Missing Status Update Condition and No Active SFM Policy
76. System in Missing Status Update with an Active SFM Policy and CONNFAIL(YES)
77. Resolution of a Spin Loop Condition
78. HCD Panel
79. CONFIG Frame Fragment
80. HCD Panel
81. Dynamic I/O Customization
Tables
 1. Couple Data Set Placement Recommendations
 2. JES2 Checkpoint Placement Recommendations
 3. References Containing Information on the Use of System Symbols
 4. Summary of SFM Keywords and Parameters
 5. IMS Data Sets in Sysplex
 6. Automation Recommendations
 7. Support of REBUILD by IBM Exploiters
 8. Support of ALTER by IBM Exploiters
 9. DB2 Changes
10. Subsystem Recovery Summary Part 1
11. Subsystem Recovery Summary Part 2
12. Summary of Couple Data Sets
Special Notices
This publication is intended to help customer systems and operations personnel and IBM systems engineers to plan, implement and use a parallel sysplex in order to get closer to a goal of continuous availability. It is not intended to be a guide to implementing or using parallel sysplex as such; it covers only topics related to continuous availability.

The information in this publication is not intended as the specification of any programming interfaces that are provided by MVS Version 5 or any other product mentioned in this redbook. See the PUBLICATIONS section of the IBM Programming Announcement for MVS Version 5, or other products, for more information about what publications are considered to be product documentation.

References in this publication to IBM products, programs or services do not imply that IBM intends to make these available in all countries in which IBM operates. Any reference to an IBM product, program, or service is not intended to state or imply that only IBM's product, program, or service may be used. Any functionally equivalent program that does not infringe any of IBM's intellectual property rights may be used instead of the IBM product, program or service.

Information in this book was developed in conjunction with use of the equipment specified, and is limited in application to those specific hardware and software products and levels.

IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to the IBM Director of Licensing, IBM Corporation, 500 Columbus Avenue, Thornwood, NY 10594 USA.
The information contained in this document has not been submitted to any formal IBM test and is distributed AS IS. The information about non-IBM (VENDOR) products in this manual has been supplied by the vendor and IBM assumes no responsibility for its accuracy or completeness. The use of this information or the implementation of any of these techniques is a customer responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.

Reference to PTF numbers that have not been released through the normal distribution process does not imply general availability. The purpose of including these reference numbers is to alert IBM customers to specific information relative to the implementation of the PTF when it becomes available to each customer according to the normal IBM PTF distribution process.

The following terms are trademarks of the International Business Machines Corporation in the United States and/or other countries:
ACF/VTAM, Advanced Peer-to-Peer Networking, AIX, APPN, CICS, CICS/ESA, CICS/MVS, CUA, DATABASE 2, DB2, DFSMS, DFSMS/MVS, DFSMSdfp, DFSMSdss, DFSMShsm, DFSORT, Enterprise Systems Connection Architecture, ES/3090, ES/9000, ESA/370, ESA/390, ESCON, ESCON XDF, GDDM, Hardware Configuration Definition, IBM, IMS, IMS/ESA, IPDS, LPDA, Magstar, MVS/DFP, MVS/ESA, MVS/SP, MVS/XA, NetView, PR/SM, Processor Resource/Systems Manager, PS/2, RACF, RAMAC, RETAIN, RMF, S/370, S/390, SAA, SQL/DS, Sysplex Timer, System/360, System/370, System/390, Systems Application Architecture, SystemView, Virtual Machine/Enterprise Systems Architecture, Virtual Machine/Extended Architecture, VM/ESA, VM/XA, VSE/ESA, VTAM
The following terms are trademarks of other companies:

C-bus is a trademark of Corollary, Inc.
PC Direct is a trademark of Ziff Communications Company and is used by IBM Corporation under license.
UNIX is a registered trademark in the United States and other countries licensed exclusively through X/Open Company Limited.
Windows is a trademark of Microsoft Corporation.
Preface
This document discusses how the parallel sysplex can help an installation get closer to a goal of Continuous Availability. This document is intended for customer systems and operations personnel responsible for implementing parallel sysplex, and the IBM Systems Engineers who assist them. It will also be useful to technical managers who want to assess the benefits they can expect from parallel sysplex in this area.
Part 1, Configuring for Continuous Availability

This part describes how to configure both the hardware and software in order to eliminate planned outages and minimize the impact of unplanned outages.

Chapter 1, Hardware Configuration
  This chapter discusses how to design a hardware configuration for continuous availability.

Chapter 2, System Software Configuration
  This chapter describes how to configure the system to support continuous availability and minimize the effort needed to maintain and run it.

Chapter 3, Subsystem Software Configuration
  This chapter deals with configuring the various subsystems to provide an environment that will support the goal of continuous availability.

Part 2, Making Planned Changes

This part describes how you can make changes to the sysplex without disrupting the running of the applications.

Chapter 4, Systems Management in a Parallel Sysplex
  This chapter discusses the importance of maintaining good systems management disciplines in a parallel sysplex environment.

Chapter 5, Coupling Facility Changes
  This chapter deals with changes that can be made to the coupling environment, for installation, planned or unplanned maintenance.

Chapter 6, Hardware Changes
  This chapter discusses how to add, change or remove hardware elements of the sysplex in a non-disruptive way.

Chapter 7, Software Changes
  This chapter discusses how to make changes such as adding, modifying or removing system images and subsystems.

Chapter 8, Database Availability
  This chapter discusses subsystem (CICS, IMS, DB2) configuration options to minimize the impact of making database changes.

Part 3, Handling Unplanned Outages

This part describes how to handle unplanned outages and recover from error situations with minimal impact to the applications.

Chapter 9, Parallel Sysplex Recovery
  This chapter discusses how to recover from unplanned hardware and software failures.

Chapter 10, Disaster Recovery Considerations
  This chapter contains a discussion of disaster recovery considerations specific to the parallel sysplex environment.
Related Publications
The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this document. The publications listed are sorted in alphabetical order.
CICS/ESA Release Guide, GC33-0655
CICS VSAM Recovery Guide, SH19-6709
CICS/ESA Dynamic Transaction Routing in a CICSPlex, SC33-1012
CICS/ESA Version 4 Intercommunication Guide, SC33-1181
CICS/ESA Version 4 Recovery and Restart Guide, SC33-1182
CICS/ESA Version 4 CICS-IMS Database Control Guide, SC33-1184
Concurrent Copy Overview, GG24-3936
DB2 Version 4 Data Sharing: Planning and Administration, SC26-3269
DB2 Version 4 Release Guide, SC26-3394
DCAF V1.2.1 Installation and Using Guide, SH19-6838
DFSMS/MVS V1 R3 DFSMSdfp Storage Administration Reference, SC26-4920
ES/9000 and ES/3090 PR/SM Planning Guide, GA22-7123
ES/9000 9021 711-based Models Functional Characteristics, GA22-7144
ES/9000 9121 511-based Models Functional Characteristics, GA24-4358
Hardware Management Console Application Programming Interfaces, SC28-8141
Hardware Management Console Guide, GC38-0453
IBM CICS Transaction Affinities Utility User's Guide, SC33-1159
IBM CICSPlex Systems Manager for MVS/ESA Concepts and Planning, GC33-0786
IBM Token-Ring Network Introduction and Planning Guide, GA27-3677
IBM 3990 Storage Control Reference for Model 6, GA32-0274
IBM 9037 Sysplex Timer and System/390 Time Management, GG66-3264
Implementing Concurrent Copy, GG24-3990
IMS/ESA Version 5 Administration Guide: Data Base, SC26-8012
IMS/ESA Version 5 Administration Guide: System, SC26-8013
IMS/ESA Version 5 Administration Guide: Transaction Manager, SC26-8014
IMS/ESA V5 Operations Guide, SC26-8029
IMS/ESA Version 5 Sample Operating Procedures, SC26-8032
JES2 Multi-Access Spool in a Sysplex Environment, GG66-3263
Large System Performance Reference Document, SC28-1187
LPAR Dynamic Storage Reconfiguration, GG66-3262
MVS/ESA Hardware Configuration Definition: Planning, GC28-1445
MVS/ESA RMF User's Guide, GC33-6483
MVS/ESA RMF V5 Getting Started on Performance Management, LY33-9176
MVS/ESA SML: Implementing System-Managed Storage, SC26-3123
MVS/ESA SP V5 Hardware Configuration Definition: User's Guide, SC33-6468
MVS/ESA SP V5 Assembler Services Guide, GC28-1466
MVS/ESA SP V5 Authorized Assembler Services Guide, GC28-1467
MVS/ESA SP V5 Authorized Assembler Services Reference, Volume 2, GC28-1476
MVS/ESA SP V5 Conversion Notebook, GC28-1436
MVS/ESA SP V5 Initialization and Tuning Guide, SC28-1451
MVS/ESA SP V5 Initialization and Tuning Reference, SC28-1452
MVS/ESA SP V5 Installation Exits, SC28-1459
MVS/ESA SP V5 JCL Reference, GC28-1479
MVS/ESA SP V5 JES2 Initialization and Tuning Reference, SC28-1454
MVS/ESA SP V5 JES2 Commands, GC28-1443
MVS/ESA SP V5 JES3 Commands, GC28-1444
MVS/ESA SP V5 Planning: Global Resource Serialization, GC28-1450
MVS/ESA SP V5 Planning: Security, GC28-1439
MVS/ESA SP V5 Planning: Operations, GC28-1441
MVS/ESA SP V5 Planning: Workload Management, GC28-1493
MVS/ESA SP V5 Programming: Assembler Services References, GC28-1474
MVS/ESA SP V5 Programming: Sysplex Services Guide, GC28-1495
MVS/ESA SP V5 Programming: Sysplex Services Reference, GC28-1496
MVS/ESA SP V5 Setting Up a Sysplex, GC28-1449
MVS/ESA SP V5 System Commands, GC28-1442
MVS/ESA SP V5 Sysplex Migration Guide, SG24-4581
MVS/ESA SP V5 System Management Facilities (SMF), GC28-1457
S/390 MVS Sysplex Application Migration, GC28-1211
S/390 MVS Sysplex Hardware and Software Migration, GC28-1210
S/390 MVS Sysplex Overview: An Introduction to Data Sharing and Parallelism, GC28-1208
S/390 MVS Sysplex Systems Management, GC28-1209
S/390 9672/9674 Managing Your Processors, GC38-0452
S/390 9672/9674 System Overview, GA22-7148
SMP/E R8 Reference, SC28-1107
Sysplex Timer Planning, GA23-0365
TSO/E V2 User's Guide, SC28-1880
TSO/E V2 CLISTs, SC28-1876
TSO/E V2 Customization, SC28-1872
VTAM for MVS/ESA Version 4 Release 3 Migration Guide, GC31-6547

Automating CICS/ESA Operations with CICSPlex SM and NetView, GG24-4424
Batch Performance, SG24-2557
CICS Workload Management Using CICSPlex SM and the MVS/ESA Workload Manager, GG24-4286
CICS/ESA and IMS/ESA: DBCTL Migration for CICS Users, GG24-3484
DFSMS/MVS Version 1 Release 3.0 Presentation Guide, GG24-4391
DFSORT Release 13 Benchmark Guide, GG24-4476
Disaster Recovery Library: Planning Guide, GG24-4210
MVS/ESA Software Management Cookbook, GG24-3481
MVS/ESA SP-JES2 Version 5 Implementation Guide, SG24-4583
MVS/ESA SP-JES3 Version 5 Implementation Guide, SG24-4582
MVS/ESA Version 5 Sysplex Migration Guide, SG24-4581
MVS/ESA Sysplex Migration Guide, GG24-3925
Planning for CICS Continuous Availability in an MVS/ESA Environment, SG24-4593
RACF Version 2 Release 1 Installation and Implementation Guide, GG2
RACF Version 2 Release 2 Technical Presentation Guide, GG24-2539
Sysplex Automation and Consoles, GG24-3854
S/390 Microprocessor Models R2 and R3 Overview, SG24-4575
S/390 MVS Parallel Sysplex Continuous Availability Presentation Guide, SG24-4502
S/390 MVS Parallel Sysplex Performance, GG24-4356
S/390 MVS/ESA Version 5 WLM Performance Studies, SG24-4352
Storage Performance Tools and Techniques for MVS/ESA, GG24-4045
A complete list of International Technical Support Organization publications, known as redbooks, with a brief description of each, may be found in:
http://www.redbooks.ibm.com/redbooks
IBM employees may access LIST3820s of redbooks as well. The internal Redbooks home page may be found at the following URL:
http://w3.itsc.pok.ibm.com/redbooks/redbooks.html
Acknowledgments
This publication is the result of a residency conducted at the International Technical Support Organization, Poughkeepsie Center. The advisor for this project was:
A parallel sysplex is designed to:

- Provide a single system image to the end-user of the application
- Support multiple copies of the applications, and provide services for dynamic balancing of the workload over the multiple copies
- Provide locking facilities to allow data to be shared among the multiple copies of the applications with integrity
- Provide services to facilitate communication between the multiple copies
From the perspective of continuous availability, the two most important functions provided by a parallel sysplex are:
Data Sharing
  Which allows multiple instances of an application running on multiple systems to work on the same databases simultaneously.

Workload Balancing
  Which means that the workload can be distributed evenly across these multiple application instances. This is made possible by the fact that they can share data.
These radically new possibilities provided by parallel sysplex change the way we approach continuous availability. Today, a specific system provides the infrastructure for a major customer application. The loss or degradation of that system can severely impact the customer's business. In the parallel sysplex environment, where multiple cooperating systems provide the infrastructure, the loss or degradation of one of the many identical systems has little impact. This means that we can now design a system that is fault-tolerant from both a hardware and software perspective, giving us the possibility of the following:
Very High Availability
  With redundancy in both hardware and software we can eliminate points-of-failure, and workload balancing can ensure that the work being done on a lost component will be distributed across the remaining ones.
Nondisruptive Change Hardware changes can be made by removing the system that needs to be changed from the sysplex while the applications continue to run on the remaining systems, making the change, and then returning the system to the sysplex. Software changes can be achieved in a similar way, provided that the changed version of the software in question can co-exist with the current ones in the sysplex. This coexistence (at level N and N+1) is a design objective of the IBM systems and subsystems that support parallel sysplex.
This shift in philosophy changes the way we think about designing the configuration in a parallel sysplex. In order to take advantage of (or exploit) the parallel sysplex, there must be more than one of each hardware component, and the software must be designed for cloning. If the application requires N images in order to provide the processing capacity, then the system designer should provide N+1 images in the sysplex.
1.1.2 Why N + 1 ?
When designing systems for high availability we must always consider the possibility that a component can fail. If we build the system with redundant components such that, even if any component does fail, the system will continue to function, then we have a fault-tolerant system. We can also say that we have no single point of failure. Obviously this component redundancy has a cost. The simplest, but most expensive solution, is to duplicate everything. This is often not an economically viable alternative. Fortunately there are others.
If we assume that the individual components of the system are inherently reliable, that is, that the probability of failure is very low for each component, then the probability of more than one failing at any one time is extremely low, and can be ignored. So, if we need a number of components (N) to do a particular job, all we need to do is allocate one extra to allow for the possibility of failure, and these N+1 components give us the redundancy we need. The larger the number of components (N) sharing the work, the less the relative cost of this redundancy.

In other words, if we are flying in a two-engined plane and want to be safe in the case of an engine failure, then one engine must be able to fly the plane. This means one of the two engines (50%) is redundant. If it is a four-engined plane then we want to be able to continue with three engines, so the fourth one (25%) is redundant.

In the same way we have been building hardware redundancy into computer systems for some time: the number of channels to I/O units, power supplies in the processor, and so on. Now with parallel sysplex we can take this concept one step further, and introduce N+1 redundancy in the number of machines or system images in the system. This allows us to configure for the failure of entire machines or system images and still keep the system on the air.
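To put a number on this reasoning, here is a small illustrative calculation, not from the original text, assuming each of the N+1 components fails independently with probability p over a given interval. Capacity is lost only when two or more fail together:

   P(outage) = 1 - (1-p)^(N+1) - (N+1)p(1-p)^N  ~  ((N+1)N/2) p^2   for small p

With N = 4 and p = 0.01, the chance of two or more of the five images failing together is about 10 x 0.0001 = 0.001, yet only one fifth (20%) of the capacity is spare; a fully duplicated two-image system with the same p must keep 50% of its capacity redundant for the same protection.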
Figure 1. Sample Parallel Sysplex Continuous Availability Configuration. All components and all the links are duplicated to eliminate single points of failure.
1.2 Processors
The first prerequisite is that we have multiple processors following the N+1 philosophy outlined above.
There are, however, costs associated with having more, smaller machines, which should be weighed:
- The performance overhead on the sysplex (between 0.5% and 1% for each extra machine).
- The extra human effort in managing more machines (which will depend on how well the systems management procedures and tools can handle multiple machines).
- The extra work involved in maintaining more system images (which will depend on how well the clones are replicated and on how well the naming and other installation standards support this).
- How useful small machines are in handling the workload. If there are components in the workload that require larger machines to perform satisfactorily, then this will tend to reduce the number of ways we can split the sysplex.
1.3.3 CF Links
The recommended number of CF links to each machine in the sysplex is at least two, for availability reasons. You may need more for performance; see S/390 MVS Parallel Sysplex Performance, GG24-4356. Note that each of these receiver links (at the CF end) is separate. Sender links (at the MVS end) can be shared between partitions in a fashion similar to EMIF, so even if you have several partitions you will only need two links per machine for each CF you need to connect to. If you have an MP machine which you plan to partition for any reason, then this means two links per CF on each side of the machine. In the coupling facility, one Intersystem Channel Adapter (fc #0014) is required for every two coupling links (#0007 or #0008). The Intersystem Channel Adapter is not hot pluggable, but the coupling links are. If you do not have a redundant 9674 to switch the coupling load to, you may want to consider installing additional Intersystem Channel Adapters to allow for additional coupling links to be installed without an outage in the future. For details on hot plugging, refer to S/390 9672/9674 System Overview, GA22-7148.
While designing the coupling facility environment, you should consider which structures must be relocated to an alternate coupling facility. Some subsystems can continue to operate without their coupling facility structure, although there may be a loss of performance. For example, the JES2 checkpoint can be relocated to DASD, and the RACF structure can simply be deallocated while coupling facility maintenance is being performed. For the remaining structures, you must ensure that enough capacity (storage, CPU cycles, link connections, structure IDs, etc.) exists on an alternate coupling facility to allow structures to be rebuilt there. When you set up your coupling facility configuration you should provide definitions that enable the structures to be moved or rebuilt; structures being moved to the alternate coupling facility must have the alternate coupling facility name in the PREFLIST statement. The following is an example of how to define a structure that can be rebuilt:
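A minimal sketch of such a definition, coded for the IXCMIAPU administrative data utility (the policy, structure, and coupling facility names and the size are illustrative assumptions, and the CF statements that define CF01 and CF02 are omitted):

   //CFRMPOL  EXEC PGM=IXCMIAPU
   //SYSPRINT DD SYSOUT=*
   //SYSIN    DD *
     DATA TYPE(CFRM)
     DEFINE POLICY NAME(CFRMPOL1)
       STRUCTURE NAME(IXCSTR1)
         SIZE(10000)
         PREFLIST(CF01,CF02)
   /*

Because both coupling facilities appear in PREFLIST, the structure can be rebuilt in CF02 when CF01 is lost or taken down for maintenance.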
A coupling facility can be protected against power loss in one of three ways:
- With a UPS
- With an optional battery backup feature
- With a UPS plus a battery backup feature
For more details on this see 1.15.2, 9672/9674 Protection against Power Disturbances on page 27. The volatility or nonvolatility of the coupling facility is reflected by the volatility attribute, and can be monitored by the system and subsystems to decide on recovery actions in the case of power failure. There are some subsystems that are very sensitive to the status of this coupling facility attribute, like the system logger, and they can behave in different ways depending on the volatility status. To set the volatility attribute you should use the coupling facility control code command:
Mode Powersave
This is the default setup and automatically determines the volatility status of the coupling facility based on the presence of the battery backup feature. If
the battery backup is installed and working, the CFCC sets its status to nonvolatile. The battery backup feature will preserve coupling facility storage contents across a certain time interval (default is 10 seconds).
Mode Non-Volatile
This command should be used to inform the CFCC to set non-volatile status for its storage because a UPS is installed.
Mode Volatile
This command informs the CFCC to put its storage in volatile status irrespective of whether there is a battery or not.
There are considerations in coupling facility planning depending on the sensitivity of subsystem users to coupling facility volatile/nonvolatile status:
JES2 JES2 can use a coupling facility structure for primary checkpoint data set, and its alternate checkpoint data set can either be in a coupling facility or on DASD. Depending on the volatility of the coupling facility, JES2 will or will not allow you to have both primary and secondary checkpoint data sets on the coupling facility.
Logger The system logger can be sensitive to the volatile/nonvolatile status of the coupling facility where the LOGSTREAM structures are allocated. Particularly, depending on the coupling facility status, the system logger is able to protect its data against a double failure (MVS failure together with the coupling facility). When you define a LOGSTREAM you can specify the following parameters:
STG_DUPLEX(NO/YES)
Specifies whether the coupling facility logstream data should be duplexed on DASD staging data sets. You can use this specification together with the DUPLEXMODE parameter to be configuration independent.
DUPLEXMODE(COND/UNCOND)
Specifies the conditions under which the coupling facility log data will be duplexed in DASD staging data sets. COND means that duplexing will be done only if the logstream contains a single point of failure and is therefore vulnerable to permanent log data loss: - Logstream is allocated to a volatile coupling facility residing on the same machine as the MVS system. - Duplexing will not be done if the coupling facility for the logstream is nonvolatile and resides on a different machine than the MVS system.
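A hedged sketch of a logstream definition that codes these parameters for the IXCMIAPU utility (the logstream and structure names are invented for illustration):

     DATA TYPE(LOGR)
     DEFINE LOGSTREAM NAME(PLEX1.APPL1.LOG)
       STRUCTNAME(APPL1_LOGSTR)
       STG_DUPLEX(YES)
       DUPLEXMODE(COND)

With this combination, staging data sets are used only while the connection is a single point of failure, so the same definition can be used unchanged across different configurations.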
DB2 DB2 requests that MVS allocate its structures in a nonvolatile coupling facility; however, it does not prevent allocation in a volatile coupling facility. DB2 does issue a warning message if allocation occurs in a volatile coupling facility. A change in volatility after allocation does not have an effect on your existing structures. The advantage of a nonvolatile coupling facility is that if you lose power to a coupling facility that is configured to be nonvolatile, the coupling facility
enters power save mode, saving the data contained in the structures. When power is returned, there is no need to do a group restart, and there is no need to recover the data from the group buffer pools. For DB2 systems requiring high availability, nonvolatile coupling facilities are recommended.
SMSVSAM Lock The coupling facility IGWLOCK00 lock structure is recommended to be allocated in a nonvolatile coupling facility. This lock structure is used to enforce the protocol restrictions for VSAM RLS data sets and maintain the record level locks. The support requires a single CF lock structure.
IRLM Lock The lock structures for IMS or DB2 locks are recommended to be allocated in a nonvolatile coupling facility. Recovery after a power failure is faster if the locks are still available.
IMS Cache Directory The cache directory structure for VSAM or OSAM databases can be allocated in a nonvolatile or volatile coupling facility.
VTAM The VTAM Generic Resources structure ISTGENERIC can be allocated in either a nonvolatile or a volatile coupling facility. VTAM has no special processing for handling a coupling facility volatility change.
1.4.1 Duplicating
When the Expanded Availability Feature is installed, two 9037 devices linked to one another provide a synchronized, redundant configuration. This ensures that the failure of one 9037, or of a fiber optic cable, will not cause loss of time synchronization. It is recommended that each 9037 have its own AC power source, so that if one source fails, both devices are not affected. Note that these two timers must be within 2.2 meters of one another. The sysplex timer attaches to the processor via the processor's Sysplex Timer Attachment Feature. Dual ports on the attachment feature permit redundant connections, so that there is no single point of failure.
1.4.2 Distance
The processors are connected to the timer by multimode fiber, and can be up to three kilometers from the timer, depending on the fiber. Distances between the sysplex timer and CECs beyond 3,000 meters are supported by RPQ 8K1919, which allows the use of single-mode fiber optic (laser) links between the processor and the 9037. To support single-mode fiber on the 9037, a special LED/laser converter has been designed, the 9036 Model 003. The 9036-003 is designed for use only with a 9037, and is available only as RPQ 8K1919. Two 9036-003 extenders (two RPQs) are required between the 9037 and each sysplex timer attachment port on the processor. The single-mode link between the two 9036-003 extenders can be up to 20 km.
1.4.4 Protection
To prevent accidental system disruption, installations should use the password protection provided by the 9037. In addition, authorized users should make it a practice to always leave the console set to Authorization Level 1 instead of Level 2. Authorization Level 2 is required to be set prior to any disruptive functions. Be aware that when a 9037 console user enters the Set the Time menu and performs the function, the 9037s will perform a power-on reset. This is extremely disruptive to the processors in a multisystem sysplex: all the MVS systems will enter a X'0A2' wait state.
The general guidelines for configuring I/O devices for high availability include:
- Always try to configure with a minimum of two ESCON directors.
- Use of 3990 Dual Copy is recommended for critical DASD data sets, such as the MVS Master Catalog, which are placed behind 3990 subsystems.
- Where possible, for critical DASD data sets, use RAMAC, which has many availability features, such as RAID-5 operation, predictive failure analysis and redundant power supplies.
- Try to spread the channel paths to a device, using nonadjacent channel numbers, to nonadjacent ESCON director ports, to different controllers or control units.
- Duplicate single path devices, such as screens to be used as consoles.
In the subsequent sections, the specific configuration requirements for the critical device types are discussed.
1.6 CTCs
CTCs provide an inter-system communication vehicle for functions such as XCF and VTAM. While it is possible for inter-system communications to take place through mechanisms other than CTC devices, such as a coupling facility for XCF signalling paths, or a 3745 for VTAM, CTCs should be considered at least for backup purposes in a parallel sysplex environment.
The next few sections discuss considerations for configuring CTC devices.
- Ensure the different CHPIDs supporting an application's primary and alternate CTC devices are selected from different channel groups on the host processor.
- If the SCTC CHPIDs are configured through ESCON directors, ensure that the CHPIDs are attached to different ESCON directors.
Note that XCF has built-in flexibility in its ability to use CTCs. XCF will immediately start using any online unallocated CTC device as a signalling path when prompted through the SETXCF START,PI/PO (PATHIN/PATHOUT) operator command. It is not necessary to pre-define device numbers for XCF signalling path use.
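For example, with CTC devices 4E0 and 4E1 online (the device numbers are illustrative), the operator can start one signalling path in each direction:

   SETXCF START,PATHOUT,DEVICE=4E0
   SETXCF START,PATHIN,DEVICE=4E1

The same command form starts structure-based signalling, for example SETXCF START,PATHOUT,STRNAME=IXCPLEX1, where the structure name is again a hypothetical one.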
For high availability you should plan redundant elements. The best solution is having more than one CTC in each direction and two XCF structures allocated in two different coupling facilities. Defining an XCF structure is easier than handling a CTC configuration. An XCF structure offers a better recovery since it can be rebuilt in case of failure. CTC connections are faster than a coupling facility in message switching.
In planning XCF signalling through a coupling facility structure, be careful to avoid a "last structure" condition. In this case, XCF will take longer to complete a rebuild process, because all signalling required by the rebuild process itself must go through the couple data set.
RAMAC The RAMAC Array DASD and the RAMAC Array Subsystem are high availability, fault-tolerant storage subsystems which use a number of techniques to ensure full availability of data even when a hardware failure occurs. These include dynamic sparing, multi-level error correction (RAID-5 protection as well as drive and CKD error correction) and Dual Copy.
3990 PPRC For more information refer to 10.2.1, 3990 Remote Copy on page 215.
3990 XRC
For more information refer to 10.2.1, 3990 Remote Copy on page 215.
IMS RSR For more information refer to 10.2.2, IMS Remote Site Recovery on page 216.
An important point to remember is that while we can guard against physical loss of data by one of the mirroring techniques described above, this does not protect against logical corruption of the data by, for example, a bad program. This is a problem we have always had, and the solution remains the same. We must take backup copies of the database at regular intervals, and log all changes to it. We need procedures to be able to go back to any one of these backup copies, and then apply subsequent updates from the logs. Refer to Chapter 2, System Software Configuration on page 29 for specific information on data set placement guidelines for critical system data sets, and Chapter 3, Subsystem Software Configuration on page 95 for recommendations for critical subsystem data set placement.
Recommendations for configuring paths to DASD attached to 3990s and RAMAC subsystems are provided, along with a discussion of availability features such as the 3990 Model 3 and Model 6 Dual Copy and RAMAC RAID-5. While a number of different DASD control units are discussed below, the main considerations for a high availability parallel sysplex configuration are the following:
- Availability features, such as Dual Copy
- Connectivity, that is, the number of ESCON logical paths supported by the control unit
- Configure at least two paths to DASD.
- Configure multiple paths with the least number of common elements.
- Configure each path through a different:
  - RAMAC or 3990 Storage Cluster (with power separation)
  - Storage Path
  - ESCON director
Do not define DASD paths in the IOCP that do not physically exist. During CHPID recovery, MVS stops I/O operations to all devices potentially affected by the CHPID problem. This stoppage includes devices with paths defined over the CHPID, even if those paths do not physically exist on that CHPID. To avoid I/O response time delays that occur while a channel path is being recovered, define only those paths to DASD devices that physically exist.
- Ensure duplicate devices are correctly coded in the LPAR IOCDS.
- Avoid using external 3044 links on DASD paths. Some 3044 links are used to connect devices that are physically located in a different site. Avoid configuring such external 3044 links on CHPIDs that are also used for DASD paths. 3044 links that extend outside the building may be easily damaged.
- Order 3990 and RAMAC paths in the HCD/IOCP definition.
- Do not configure non-DASD devices on the same CHPID as a 3990 or RAMAC subsystem, and do not configure non-DASD devices on parallel CHPIDs attached to a 3990 or RAMAC. During recovery for the 3990 Reset Event Notification, a Reset Channel Path (RCHP) instruction is used. If, for example, you had a 37XX TP controller on the same channel as the 3990, the RCHP instruction would cause the loss of TP sessions.
You should in any event spread I/O paths to the same control unit over several ESCON directors to minimize the effect of a failure of any one of them and, in the case of Models 001 or 002, to allow for the possibility of making changes in the directors.
- Provides a single point of control for managing ESCON director switching changes
- Prevents accidental misconfiguration of paths
- Provides the operator with otherwise unavailable diagnostic information about a potentially complex I/O configuration in the parallel sysplex environment
+----------------+---------+---------+---------+------------------------------+
| LOGICAL PATH   |         | FULL    |         | HOST PATH GROUP ID           |
|                | SYSTEM  | ESCON   | SP      |------------+------+----------|
|---------+------| ADAPTER | LINK    | FENCES  | CPU        | CPU  | CPU TIME |
| NUMBER  | TYPE | ID      | ADDRESS | 0 1 2 3 | SERIAL #   | TYPE | STAMP    |
+---------+------+---------+---------+---------+------------+------+----------+
| 1       | E    | 00      | E502    |         | 0000021330 | 9672 | ABCF560E |
+---------+------+---------+---------+---------+------------+------+----------+
| 2       | E    | 00      | E703    |         | 0000030250 | 9672 | ABCE0103 |
+---------+------+---------+---------+---------+------------+------+----------+
| 3       | E    | 01      | CB03    |         | 0000030250 | 9672 | ABCE0103 |
+---------+------+---------+---------+---------+------------+------+----------+
| 4       | E    | 01      | F104    |         | 0000041330 | 9672 | ABCF534A |
+---------+------+---------+---------+---------+------------+------+----------+
| 5       | E    | 00      | E702    |         | 0000020250 | 9672 | ABCF5635 |
+---------+------+---------+---------+---------+------------+------+----------+
| 6       | E    | 01      | F102    |         | 0000021330 | 9672 | ABCF560E |
+---------+------+---------+---------+---------+------------+------+----------+
| 7       | E    | 01      | CB02    |         | 0000020250 | 9672 | ABCF5635 |
+---------+------+---------+---------+---------+------------+------+----------+
| 8       | E    | 00      | E302    |         | 0000020256 | 9672 | ABCF5650 |
+---------+------+---------+---------+---------+------------+------+----------+
  (intervening rows not shown)
+---------+------+---------+---------+---------+------------+------+----------+
| 111     | E    | 14      | E901    |         | 0000011330 | 9672 | ABCF53BE |
+---------+------+---------+---------+---------+------------+------+----------+
| 112-128 | N/E  | 14-17   |         |         |            |      |          |
+---------+------+---------+---------+---------+------------+------+----------+
LOGICAL PATH : E = ESCON   N/E = NOT ESTABLISHED
1.11 Fiber
When planning fiber connections between machine rooms, and particularly between separate buildings, remember that fiber cables are thin and can easily be broken. So, if possible, run two cables by different routes, with enough fiber in each for the total needs.
1.11.1 9729
Over long distances and through common carrier fiber this can be expensive, so consider whether a pair of 9729-001s could be an economic alternative. The 9729-001 Optical Wavelength Division Multiplexor (sometimes called Muxmaster) enables multiple bit streams, each possibly using a different communications protocol, bit rate, and frame format, to be multiplexed onto a single optical fiber for transmission between geographically separate locations. The 9729-001 can multiplex 10 full duplex bit streams, each at up to 622 Mb/s, over a single optical fiber, up to 50 km distance. The 9729-001 uses wavelength division multiplexing (WDM) to transmit several independent bit streams over this single fiber link. The distance between the two locations can be up to 50 km (at a 200 Mb/s bit rate per channel) and goes down proportionally as the bit rate is increased. Thus the 9729 enables economical transmission of many simultaneous bit streams bidirectionally over a single fiber.
1.12 Consoles
Software and hardware consoles need to be configured in a parallel sysplex with regard to the possibility of a failure. Many of the considerations here are no different from any other environment.
console configuration, it is necessary that you understand the new roles played by master and alternate consoles. In the MVS world there are the following different types of consoles, as shown in Figure 7 on page 23:
- MCS consoles
- Extended MCS consoles
- Integrated (System) Consoles
- Subsystem consoles
Only MCS and EMCS Consoles are affected by changes in a parallel sysplex configuration and require some planning consideration.
has no way of knowing that the subsystem consoles had been defined previously and will add them again; in this way, it is quite easy to reach the maximum of 99 consoles. A sysplex-wide IPL can be required once the limit is exceeded.
1.13 Tape
Configuring tape devices for high availability is important when critical applications have a dependency on those devices. However, in a parallel sysplex, while it is possible that tapes may exist in the configuration, there should be no dependence on those devices from an availability point of view. That is, a high availability CICS subsystem should not be relying on tapes for logging, for example. If tapes are part of the parallel sysplex configuration, say for the purposes of batch work, or backup, then their potential impact on the critical subsystems should be considered in terms of their recovery characteristics during failures.
1.13.1 3490
There are several models of the 3490 control unit that provide ESCON channel attachment, and hence the connectivity required for a parallel sysplex environment. Ensure that each MVS image has two paths configured to each 3490 device, and that each path has as few common physical components as possible.
1.14 Communications
The data center's communication with its users must also be ensured. The same N+1 considerations apply to communications equipment, lines, fiber trunks and so on, even out to the telecom provider. The network configuration in a parallel sysplex environment can take many different forms. We can think of the network in the following terms:
- Physical configuration components: 37x5 vs CTC vs 3172
- Logical configuration: subarea vs APPN
- Users of VTAM and their requirements
In this chapter, the discussion is concerned with the availability aspects of the physical configuration. The logical network and its users will be discussed in subsequent chapters.
1.14.2 3745s
As discussed in 1.9.4, DASD Path Recommendations on page 17, do not configure TP devices on the same channels as 3990 or RAMAC subsystems. Also, do not configure 3745s on the same channels as 3490s or CTCs.
1.14.3 CF Structure
VTAM uses a coupling facility structure to maintain information about generic resources. The structure name (ISTGENERIC) is a VTAM-defined hardcoded name which must be used.
1.15 Environmental
An essential part of keeping the data center running is the availability of power, cooling and other basic functions.
- The machine room is fully protected, for a time duration that exceeds the capability of the BBU or 9910. There is of course no point in having them installed.
- The machine room is only partially protected, or the coupling facility is at a nonprotected remote location, and the BBUs/9910 are a cost-effective alternative to providing protection.
- The machine room protection is limited in time, and the power save state may provide the additional protection to rapidly recover from an extended power failure.
2.1 Introduction
Over time, installations have moved from large single images to multiple stand-alone systems where the workload is partitioned. This ensures the entire installation is not affected by a single system outage, but one of the workloads probably does not run. This has required system programmers to manage several systems, SYSRES volumes, master catalogs and parmlib members, all of which will be different.

A parallel sysplex improves on this situation by allowing the system programmer to manage several copies of a single system image. Sharing SYSRES, master catalog and parmlib members is possible as each system can be a clone of the others. The fact that each individual system has equal access to data enables one system to be lost and the workload balanced over the remaining systems. The ability to accommodate planned and unplanned outages and maintain availability is greatly improved in a parallel sysplex.
The analysis of this is presented in MVS/ESA Software Management Cookbook, GG24-3481. Recommendations for the design of a shared SYSRES are discussed in 2.3.1, Shared SYSRES Design. Once implemented, a shared SYSRES of course becomes a single critical resource within the installation. As such, it is highly recommended that shared SYSRESs be backed up using dual copy to ensure a live backup at all times.
SMP/E target library data sets. These will be explicitly cataloged in the master catalog and would consist of:
- SMPCSI, SMP/E target consolidated software inventory
- SMPLTS, SMP/E load module temporary store
- SMPMTS, SMP/E macro temporary store
- SMPSCDS, SMP/E save control data set
- SMPSTS, SMP/E source temporary store
- SMPLOG, SMP/E log data set
- SMPLOGA, SMP/E second log data set
System software data sets which would be cataloged using the indirect catalog function. See 2.3.2, Indirect Catalog Function for an explanation of the indirect catalog function.
Some system data sets cannot be shared between images in a parallel sysplex and therefore cannot be included on the shared SYSRES. These will need to be allocated specifically and placed on volumes other than the SYSRES. These data sets are:
- LOGREC data sets
- STGINDEX data sets
- PAGE data sets
- SMF data sets
However, by utilizing the substitution variables available in MVS, these data sets need only be defined once. Taking LOGREC as an example, using the symbolic &SYSNAME as part of the data set name for the LOGREC parameter enables the IEASYSxx member to be shared across the sysplex and reduces the number of required IEASYSxx specifications.
Libraries that are cataloged using the indirect catalog function must reside on the system residence volume, or a data-set-not-found condition arises. The indirect catalog function works for any library, whatever high-level qualifier or name it may have. In the shared SYSRES environment this function is exploited by the following:
- The indirect catalog function is used to reference the system libraries located on the SYSRES.
- The SMP/E target environment data sets are cataloged in the master catalog with specific volume names.
Such a system design would enable you to use different levels of target libraries independent of the IPL device you choose, and the ability to utilize the same master catalog.
Figure 10 shows an example of how the facility works when referencing SYS1.LINKLIB cataloged using the indirect catalog function VOLUME=******. SYS1.LINKLIB is located on the IPL volume. The active operational level of SYS1.LINKLIB is volume SYSRESA. Note that for the operational libraries, SYS1.PAGE01 and SYS1.HASPCKPT, the catalog data set entry has a specific volume pointer.
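As a sketch of the mechanism, a system library is indirectly cataloged by coding six asterisks as the volume serial in an IDCAMS DEFINE NONVSAM; the device type shown is an assumption:

   DEFINE NONVSAM -
     (NAME(SYS1.LINKLIB) -
      DEVICETYPES(3390) -
      VOLUMES(******))

At IPL, references to the data set resolve to the volume the system was IPLed from, which is what makes alternating SYSRES volumes under one master catalog possible.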
Figure 11 shows the indirect catalog function in the operational environment for SYS1.LINKLIB, where the active operational level is volume SYSRESB, the IPL volume. Note that the catalog pointers to system libraries SYS1.PAGE01 and SYS1.HASPCKPT are unaffected by the switch in IPL volumes. Therefore, it is possible to continue to use existing system libraries after a system upgrade. The indirect catalog function is a very common approach to enable alternating system residence volumes using the same master catalog.
Despite the fact that a shared master catalog is effectively a single point of failure, the increased complexity and management overhead of multiple master catalogs probably outweighs the risk of a shared catalog failure. For this reason the recommendation is still a shared master catalog. It should be noted that the number of I/Os to the master catalog increases significantly when it is shared across a parallel sysplex. For performance reasons, therefore, we recommend that you use DASD caching for the shared master catalog volume.
- Increases system availability by allowing you to change the I/O configuration while MVS is running, thus eliminating the POR and IPL for selecting a new or changed I/O configuration definition.
- Allows you to make I/O configuration changes when your installation needs them rather than wait for a scheduled outage to make the changes.
- Minimizes the need to logically define hardware devices that do not physically exist in a configuration.
Hardware Configuration Definition (HCD) is the only way to provide a configuration file that is dynamic reconfiguration capable. The output of HCD is a file called an I/O definition file (IODF). Both hardware and software configurations are contained in the IODF. Not all devices support dynamic reconfiguration. Each device type is represented to the software by a unit information module (UIM), which is included in the product that contains the device support code. The UIM specifies whether or not the device type supports dynamic I/O configuration. If the device type does not support dynamic I/O configuration, the device definition can be added to the hardware I/O configuration definition while MVS is running, but the device cannot be added to the software I/O configuration definition. Thus, the device is not available for use until the next IPL of the configuration containing the device. If the device type supports dynamic I/O configuration, it is up to your installation to decide whether to define the device as dynamic in the software definition.
The specification for dynamic is through HCD, where each device has to be defined as DYNAMIC Yes or DYNAMIC No. You must use HCD processing to create an IOCDS from the IODF and then perform a power-on reset (POR), which places the information about the hardware configuration in the hardware system area (HSA). The same IODF must be used at IPL time to define the software configuration. The IODF is pointed to by the LOADxx member; we recommend you use ** as the IODF identification in LOADxx, as this will use the IODF that matches the IOCDS active in the hardware. During the IPL process, the system reads the IODF and constructs UCBs, the EDT, and all device and I/O configuration related control blocks.

To be able to perform a software and hardware dynamic change, the hardware and software definitions must match. When the same IODF is used to define the hardware and software definitions, they will automatically match. So, to avoid losing the dynamic reconfiguration capability, it is strongly recommended that you keep software and hardware configuration IODF files in sync with one another. With the HCD ACTIVATE function or through the MVS ACTIVATE operator command, you can make changes to the current configuration without having to IPL the software or POR the hardware.

Note: Dynamic changes are allowed from a hardware perspective only when they happen within the current LPAR setup. To add a new logical partition, a power-on reset is still required.

Refer to Appendix F, Dynamic I/O Reconfiguration Procedures on page 267 for a complete discussion of how to make your processor I/O dynamic capable and how to size the HSA storage.
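As a hedged sketch of the operational flow (the IODF suffix 23 is invented, and exact LOADxx column alignment is not shown here):

   IODF ** SYS1            (LOADxx: use the IODF that matches the active IOCDS)

   ACTIVATE IODF=23,TEST   (validate the change first)
   ACTIVATE IODF=23        (make the new configuration active)

The TEST form requests validation only; the second command puts the new hardware and software configuration into effect without a POR or an IPL.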
2.5.1 Exceptions
There are a few exceptions that limit the capability of dynamic reconfiguration. Make sure that your installation is not affected by one of these exceptions:
2.5.1.2 Consoles
Graphic devices can be dynamically reconfigured if they are not allocated. Be careful when reconfiguring graphic devices that are defined as MVS consoles, because you can create a mismatch between the I/O configuration and the group of devices dedicated as MVS consoles.
- Sysplex couple data sets (also known as XCF couple data sets)
- Coupling Facility Resource Manager (CFRM) couple data sets
- Sysplex Failure Management (SFM) couple data sets
- Workload Manager (WLM) couple data sets
- Automatic Restart Manager (ARM) couple data sets
- System Logger (LOGR) couple data sets
Not all of these must be shared by every system in the sysplex. If they are not shared by some systems, then those systems will not be able to participate in whatever function that CDS is used for. The sysplex CDS must be shared by all systems in the parallel sysplex. When planning for the couple data sets, the following considerations should be taken into account. They apply not only to the sysplex (or XCF) couple data sets, but also to the couple data sets for CFRM, SFM, WLM, ARM and LOGR policy data.
An alternate couple data set. An alternate couple data set should be defined. To avoid a single point of failure in the sysplex, IBM recommends that for all couple data sets, you create an alternate couple data set on a different device, control unit, and channel from the primary.
A spare couple data set. When the alternate couple data set replaces the primary, the original primary data set is deallocated, and there is no longer an alternate couple data set. Because it is recommended to have an alternate couple data set always available to be switched, consider formatting three data sets before IPL. For example:

   SYS1.XCF.CDS01   Specified as primary couple data set
   SYS1.XCF.CDS02   Specified as alternate couple data set
   SYS1.XCF.CDS03   Spare
Then, if the alternate (CDS02) becomes the primary, you can issue the SETXCF COUPLE,ACOUPLE command to make the spare data set (CDS03) the alternate. Details of the command are found in MVS/ESA SP V5 System Commands. A couple data set can be switched by the operator through use of the SETXCF command, and by the system because of error conditions. The SETXCF command can be used to switch from the primary couple data set to the alternate couple data set. When the alternate couple data set becomes the primary, MVS uses the new primary couple data set for all systems and stops using the old primary couple data set.
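Continuing the example above (the volume serial is an invented detail), once CDS02 has become the primary, the spare is brought in as the new alternate with:

   SETXCF COUPLE,TYPE=SYSPLEX,ACOUPLE=(SYS1.XCF.CDS03,CDSPK3)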
The sysplex couple data set format utility determines the size of the data set based on the parameters coded on the DEFINEDS statement. To simplify adding systems to the sysplex, ensure the MAXSYSTEM parameter specifies a number large enough to allow for growth in system images. This will enable the introduction of new system images without the need to create a new sysplex couple data set and switch to it using the SETXCF command. A multiple extent couple data set is not supported. For the sysplex couple data set, the format utility determines the size of the data set based on the number of groups, members, and systems specified, and allocates space on the specified volume for the data set. There must be enough contiguous space available on the volume for the couple data set. For the couple data sets that support administrative data, for example CFRM and SFM, the format utility determines the size of the data sets based on the number of parameters within the policy type that is specified.
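A minimal sketch of a format job for the IXCL1DSU utility (the data set name, volume serial, and counts are illustrative assumptions; note MAXSYSTEM coded above the current number of images to leave room for growth):

   //FORMAT   EXEC PGM=IXCL1DSU
   //SYSPRINT DD SYSOUT=*
   //SYSIN    DD *
     DEFINEDS SYSPLEX(PLEX1)
       DSN(SYS1.XCF.CDS01) VOLSER(CDSPK1)
       MAXSYSTEM(8)
       DATA TYPE(SYSPLEX)
       ITEM NAME(GROUP) NUMBER(50)
       ITEM NAME(MEMBER) NUMBER(100)
   /*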
A couple data set cannot span volumes. XCF does not support multi-volume data sets.
A couple data set is used by only one sysplex. The name of the sysplex for which a data set is intended must be specified when the couple data set is formatted. The data set can be used only by systems running in the sysplex whose name matches that in the couple data set. Each sysplex must have a unique name, and each system in the sysplex must have a unique name. Each couple data set for a sysplex, therefore, must be formatted using the sysplex name for which the couple data set is intended.
The couple data set must not exist prior to formatting. The format utility cannot use an existing data set. This prevents the accidental reformatting of an active couple data set. You must delete an existing couple data set before reformatting it.
Couple data set placement. Couple data sets should be placed on volumes that do not already have high I/O activity. It is essential that XCF be able to get to the volume whenever it has to. For the same reason, you should not place the couple data set on a volume that:
- Is subject to reserves
- Has page data sets
- Has an SVC dump data set allocated
If SFM is active for status update missing conditions, and such a condition occurs because of the I/O being disrupted by any of the above, then systems will be partitioned from the sysplex. If the volume that the couple data set resides on is one for which DFDSS does a full volume backup, you will have to take this into consideration and possibly plan to switch the primary to the alternate during the backup to avoid a status update missing condition due to the reserve against the volume by DFDSS. See 2.12.4, RESERVE Activity on page 53 for discussion of a possible solution to this issue. When selecting a volume for an alternate couple data set, use the same considerations as described for the primary. When XCF writes to the couple data set, it first writes to the primary, waits for a successful completion, and then writes to the alternate. Not until the write to the alternate is successful is the operation complete.
Performance and availability considerations. The placement of couple data sets can improve performance, as well as availability. For maximum performance and availability, each couple data set would be on its own volume. However, this is an expensive approach. The following example provides an approach that is workable. Do not place the primary sysplex couple data set on the same volume as the primary CFRM couple data set. This is because they are I/O intensive. Table 1 shows our recommendation for couple data set placement that ensures the system can continue in a DASD failure situation. The placement of the other primary and alternate data sets is less critical and could be as shown or spread across the four volumes, dependent on installation preference.
Place couple data sets on volumes that are attached to cached control units with the DASD fast write (DFW) feature. This recommendation
applies to all couple data sets in any size sysplex. Those couple data sets most affected by this are the sysplex couple data set and the CFRM couple data set. The recommendation becomes more critical the more systems you have in the sysplex. Place couple data sets on volumes that are not subject to reserve/release contention or significant I/O contention from sources not related to couple data sets. This is true even if the I/O contention is sporadic.
MIH considerations for couple data set. The interval for missing interrupts is specified on the DASD parameter of the MIH statement in the IECIOSxx parmlib member. The default time is 15 seconds. If there is little or no I/O contention on the DASD where the couple data sets reside, consider specifying a lower interval (such as seven seconds) to be used by MIH in scanning for missing interrupts. A lower value alerts MVS to a problem with a couple data set earlier.
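For example, in IECIOSxx (the seven-second value is the illustrative figure from the text above; verify it against your own contention profile):

   MIH DASD=00:07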
Security considerations for couple data sets. Consider RACF-protecting the couple data sets with the appropriate level of security. If you are using RACF, you want to ensure that XCF has authorization to access RACF-protected sysplex resources. The XCF STC must have an associated RACF user ID defined in the RACF started task procedure table. The started procedure name is XCFAS.
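A hedged sketch of one way to set this up, assuming a RACF level that supports the STARTED class (on earlier levels the association is made with an entry in the ICHRIN03 started procedures table instead); the group name is an invented example:

   ADDUSER XCFAS DFLTGRP(STCGROUP) NOPASSWORD
   RDEFINE STARTED XCFAS.* STDATA(USER(XCFAS) TRUSTED(YES))
   SETROPTS RACLIST(STARTED) REFRESH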
Table 2. JES2 Checkpoint Placement Recommendations. The checkpoint definitions used here are the same as are used in the JES2 initialization deck. For more information, please refer to JES2 Version 5 Initialization and Tuning Reference.

   Checkpoint Definition   Checkpoint Placement
   CKPT1                   coupling facility
   CKPT2                   DASD
   NEWCKPT1                coupling facility
   NEWCKPT2                DASD

Note: NEWCKPT1 should not be in the same coupling facility as CKPT1 for availability reasons.
Note: It is recommended that if you are running with the JES2 primary checkpoint in a coupling facility, even if that coupling facility is nonvolatile, you should run with a duplex checkpoint on DASD, as specified on the CKPT2 keyword of the checkpoint definition. This may require a modification to the checkpoint definition in the JES2 initialization parameters. More information on setting up a MAS in a parallel sysplex environment is found in JES2 Multi-Access Spool in a Sysplex Environment and MVS/ESA SP-JES2 Version 5 Implementation Guide.

JES2 does not rebuild structures in the manner of other coupling facility users. Failure of a coupling facility with a JES2 checkpoint structure will invoke the JES2 reconfiguration dialog. At this time, you should have already planned the recovery route. If your recovery plan is to move the primary checkpoint to another coupling facility, then you should have predefined the structure in the active CFRM policy. For a performance comparison between JES2 checkpoints on DASD and a coupling facility, refer to S/390 MVS Parallel Sysplex Performance.
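A sketch of JES2 initialization statements that matches the placements in Table 2 (the structure, data set, and volume names are invented, and the structures named on STRNAME must be defined in the CFRM policy with preference lists that put them in different coupling facilities):

   CKPTDEF  CKPT1=(STRNAME=JES2CKPT1,INUSE=YES),
            CKPT2=(DSN=SYS1.JES2.CKPT2,VOLSER=CKPTP2,INUSE=YES),
            NEWCKPT1=(STRNAME=JES2CKPT2),
            NEWCKPT2=(DSN=SYS1.JES2.NEWCKPT2,VOLSER=CKPTP3),
            MODE=DUPLEX,DUPLEX=ON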
- Unable to log on to, or log off from, TSO
- Unable to submit jobs
Naming conventions for applications: For detailed information and recommendations on application subsystem naming conventions, please refer to System/390 MVS Sysplex Application Migration .
- MVS system name
- SMF system identifier (SID)
- JES2 member name
3. Keep MVS system names short (for example, three to four characters). Short system names are easy for operators to use and reduce the chance of operator error. Consider the following examples of system names for a sysplex:
4. Develop consistent and usable naming conventions for the following system data sets that systems in the sysplex cannot share:
- LOGREC data sets
- STGINDEX data sets
- PAGE/SWAP data sets
- SMF data sets
Allow the names to be defined in one place, namely IEASYSxx. MVS/ESA SP 5.1 allows these non-shareable data sets to be defined with substitution variables so that the system can substitute the system name for each MVS image. As a result, you only need to define these data sets once in IEASYSxx of SYS1.PARMLIB. The following is an example of how the MVS system name SY01 is substituted when a variable is used for the SYS1.LOGREC data set:
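(Reconstructed here as a sketch.) In IEASYSxx:

   LOGREC=SYS1.&SYSNAME..LOGREC

The first period ends the symbol name and the second is kept as the qualifier separator, so on system SY01 the name resolves to SYS1.SY01.LOGREC, and on SY02 to SYS1.SY02.LOGREC; one IEASYSxx member serves every image.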
- Provides a single place to change installation definitions for all systems in a multisystem environment. For example, you can specify a single SYS1.PARMLIB data set that all systems share.
- Reduces the number of installation definitions by allowing systems to share definitions that require unique values. For example, you can specify a single data set definition in which different systems can specify unique data set names.
- Allows one to ensure that systems specify unique values for commands or jobs that can flow through several systems. For example, you can use single commands to start multiple instances of started tasks with unique names.
- Helps maintain meaningful and consistent naming conventions for system resources.
When system symbols are specified in a definition that is shared by two or more systems, each system substitutes its own unique defined values for those system symbols. There are two types of system symbols:
- Static system symbols have substitution texts that remain fixed for the life of an IPL.
- Dynamic system symbols have substitution texts that can change during an IPL.
MVS/ESA SP 5.1 introduced support for system symbols in a limited number of parmlib members and system commands. MVS/ESA SP 5.2 enhances that support by allowing system symbols in the following:
- Dynamic allocations
- JES2 initialization statements and commands
- JES3 commands
- JCL for started tasks and TSO/E logon procedures
- Most MVS parmlib members
- Most MVS system commands
If your installation wants to substitute text for system symbols in other interfaces, such as application or vendor programs, it can call a service to perform symbolic substitution. MVS/ESA SP 5.1 introduced support for the &SYSNAME and &SYSPLEX static system symbols, which represent the system name and the sysplex name, respectively. MVS/ESA SP 5.2 enhances that support by adding the following:
- &SYSCLONE, a one- or two-character abbreviation for the system name
- Up to 100 system symbols that your installation defines
You can also define the &SYSPLEX system symbol earlier in system initialization than in MVS/ESA SP 5.1. The early processing of &SYSPLEX allows you to use its defined substitution text in other parmlib members. See MVS/ESA SP V5 Initialization and Tuning Reference for information about how to set up support for system symbols. Then, for information about how to use system symbols, see the following books:
Table 3. References Containing Information on the Use of System Symbols

   Use in:                          Reference
   Application programs             Using the system symbol substitution service in MVS/ESA SP V5 Assembler Services Guide
   Dynamic allocations              Providing input to the DYNALLOC macro in MVS/ESA SP V5 Auth Assembler Services Guide
   JCL for started tasks            Using system symbols in JCL in MVS/ESA SP V5 JCL Reference
   JES2 commands                    Using system symbols in JES2 commands in MVS/ESA SP V5 JES2 Commands
   JES2 initialization statements   Using system symbols in JES2 initialization statements in MVS/ESA SP V5 JES2 Initialization and Tuning Reference
   JES3 commands                    Using system symbols in JES3 commands in MVS/ESA SP V5 JES3 Commands
   Parmlib members                  Using system symbols in parmlib members in MVS/ESA SP V5 Initialization and Tuning Reference
   SYS1.VTAMLST data set            Using MVS system symbols in VTAM definitions in ACF/VTAM V3R4 VTAMLST Enhancements: Cloning VTAM Applications
   System commands                  Managing messages and commands in MVS/ESA SP V5 System Commands
   TSO/E REXX and CLIST variables   Accessing system symbols through REXX and CLIST variables in TSO/E V2 User's Guide and TSO/E V2 CLISTs
   TSO/E logon procedures           Setting up logon processing in TSO/E V2 Customization
Examples of how symbolics can be used within the various parmlib members are shown in Appendix A, Sample Parallel Sysplex MVS Image Members on page 221.
ALTGRP allows defining a group of consoles from which the system can select an alternate for a console during a console switch. Extended MCS
consoles can be included in the ALTGRP console group and be used as alternates for MCS consoles or other extended MCS consoles. ALTGRP is specified on the CONSOLE statement for MCS consoles, or in the RACF OPERPARM segment for extended MCS consoles. Figure 12 shows an example of using the ALTGRP keyword.
NOCCGRP allows you to define a group of consoles from which the system can select a master console when a no-consoles condition occurs. NOCCGRP is specified on the INIT statement.

SYNCHDEST allows you to define a group of consoles that the system can use to display synchronous messages. Synchronous messages, previously known as DCCF messages, are WTO or WTOR messages that are typically issued during initialization or recovery situations, or by programs that want messages to bypass normal message queuing. In a sysplex, a console can display a synchronous message only if it is physically attached to the system that issues the message. See 2.10.3.3, Synchronous WTO(R) Messages on page 45 for a fuller explanation.

HCPYGRP allows you to define a group of console devices from which the system can select a backup device for the hardcopy log. HCPYGRP is specified on the HARDCOPY statement.
For a detailed description of console parameters, please refer to MVS/ESA Planning: Operations, GC28-1441.
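As a sketch of how the pieces fit together (the group and console names are invented): console groups are defined in a CNGRPxx parmlib member and then referenced from CONSOLxx:

   In CNGRPxx:
     GROUP NAME(ALTCONS) MEMBERS(CON2,CON3)

   In CONSOLxx:
     CONSOLE DEVNUM(0841) NAME(CON1) ALTGRP(ALTCONS) AUTH(MASTER)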
message. When you plan your sysplex recovery, you should attach the MCS console that is to display synchronous messages to its own control unit, without any other attached console. If it shares a control unit, there is a higher probability of failure on the console; the message will then be attempted on the next suitable console in the SYNCHDEST group, or on the system console. For a detailed description of console definition, please refer to MVS/ESA Initialization and Tuning Guide, GC28-1451.
It is recommended that DASD log data sets be managed by System Managed Storage (SMS). You can manage log stream data sets by either:
- Modifying automatic class selection (ACS) routines.
- Defining the SMS data class, storage class and management class explicitly in the log stream definition using the IXCMIAPU utility or the IXGINVNT service.
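For the second approach, a sketch extending the logstream definition shown earlier, on logger levels that support these keywords (the class names are invented; the LS_* keywords name the SMS classes used for the DASD log data sets):

     DATA TYPE(LOGR)
     DEFINE LOGSTREAM NAME(PLEX1.APPL1.LOG)
       STRUCTNAME(APPL1_LOGSTR)
       LS_DATACLAS(LOGRDC)
       LS_STORCLAS(LOGRSC)
       LS_MGMTCLAS(LOGRMC)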
MVS V5.2 imposes a limit of 168 log data sets per logstream. Based on the amount of log data, sizing the log data sets is an important task in order to keep available all the log data required for a certain application. The index for the 168 log data sets is kept in the LOGR couple data set, and the change management activities must be done manually. If the 168 limit is reached, the system logger will stop. Procedures must be in place to ensure that appropriate action is taken before this limit is reached.

Notes:
1. For information on how to use the IXCMIAPU utility, see MVS/ESA SP V5 Setting Up a Sysplex, GC28-1449.
2. For information on how to use the IXGINVNT service, see MVS/ESA SP V5 Authorized Assembler Services Reference, Volume 2, ENF-ITT, GC28-1476.
- Maintain a copy of the coupling facility resident log records in the MVS system logger storage buffers.
- Maintain a copy of the coupling facility resident log records in a staging data set on a logstream basis. There is a staging data set per logstream per system in the sysplex.
Depending on the duplexing option, the local buffers or the staging data set contains data written by the system logger but not yet written to a logstream DASD data set. Another concept needs to be considered when configuring the system logger environment: the failure dependent or independent attribute of each coupling facility connection. Depending on the location of the coupling facility as well as its volatility or non-volatility status, a connection to a logstream can be identified to be failure dependent or failure independent. This may affect the system logger configuration and behavior. The rules for determining the attribute of a logstream connection are as follows:
If the system logger and the coupling facility to which it is connected on behalf of a given logstream are both executing on the same CPC, then the connection is failure dependent, regardless of the volatility status of the coupling facility. Figure 13 on page 48 is an example of a failure dependent connection between a system logger running on an MVS system and a coupling facility running in an LPAR on the same CEC.
If MVS and the coupling facility are separated, then: if the coupling facility is nonvolatile, the connection is failure independent. An example is shown in Figure 14 on page 49, where different connections to the same logstream may each have different failure independence/dependence characteristics.
The attribute of failure dependent/independent can vary depending on potential failure or operator commands issued to the coupling facility. The system logger is sensitive and can switch back and forth between the two states. For critical applications using the system logger, the recommendation would be to put system logger structures in a failure independent coupling facility and to duplex logstream data to DASD.
the backed up log data in case of system or coupling facility failure. If peer systems do not have connectivity to staging data sets, system logger may not be able to recover all data in case of failure. Staging data sets will be used only during recovery procedure initiated automatically by the system logger through a structure rebuild process in a new coupling facility structure. Staging data sets are performance sensitive and the DASD fast write option is strongly recommended. There are important considerations for staging data set sizing; log data offload activity will be initiated as soon as either the coupling facility structure or the staging data set becomes full. The high threshold parameter applies to both the coupling facility structure and the staging data set. To minimize the offload activity, ensure that the staging data set is as big as the coupling facility structure. For more detailed information on setting up and exploiting system logger, please refer to MVS/ESA SP V5 Sysplex Migration Guide , SG24-4581.
2.12.1 SMSplex
An SMSplex is a system (an MVS image) or collection of systems that share a common SMS configuration. The systems in an SMSplex share a common ACDS and COMMDS pair. DFSMS/MVS 1.1.0 supports a maximum of eight systems in an SMSplex. DFSMS/MVS 1.2.0 introduces the concept of SMS system group names, which allows the specification of a system group as a member of an SMSplex. This enables more than eight systems to be defined in one SMSplex. A system group consists of all systems that are part of the same parallel sysplex and are running SMS with the same configuration, minus any systems in the parallel sysplex that are specifically defined in the SMS configuration. The following figures show examples by way of explanation.
Figure 15. Basic Relationship between Sysplex Name and System Group. (The figure shows sysplex SYSPL01 containing systems S1 through S6; COUPLExx specifies COUPLE SYSPLEX(SYSPL01), and the SCDS/ACDS base configuration specifies system group name SYSPL01.)
The SMS system group name must be the same as the parallel sysplex name defined in the COUPLExx member in PARMLIB, and the individual system names must match system names in the IEASYSxx member in PARMLIB. When a system group name is defined in the SMS configuration, all systems in the named parallel sysplex are represented by the same name in the SMSplex, as shown in Figure 15.
Figure 16. SMSplex Consisting of System Group and Individual System Name. (The figure shows the same sysplex SYSPL01 with systems S1 through S6; the SCDS/ACDS base configuration specifies system group name SYSPL01 and, in addition, system name S1.)
The SMSplex does not have to mirror a parallel sysplex; you can choose to configure individual systems and Parallel Sysplexes into an SMSplex configuration. Figure 16 shows an SMSplex where system S1 has been
separately defined as an individual system name. SMS considers S1 and SYSPL01 as two members of the SMSplex. Systems S2 through S6 are represented by SYSPL01 and must be addressed simultaneously with regard to SMS functions. It is recommended, however, that the SMSplex match the parallel sysplex for better manageability of your data. Note: JES3 does not support the SMS system group names. When the DFSMS/MVS 1.2.0 configuration is defined using parallel sysplex names, JES3 does not provide data set integrity and scheduling services for the SMS-managed data sets. If the SMS configuration is defined using a combination of system names and system group names, JES3 SMS data set services are available on each system whose name matches the system names defined in the SMS configuration. JES3 SMS data set services are available on seven CPCs if there are more than eight MVS systems in the SMSplex.
A storage group may be processed by any system if the system name in the storage group is blank.
A storage group may be processed by a subset of systems if the system name in the storage group is a system group name.
A storage group may be processed by a specific system if the system name in the storage group is a system name.
In addition to the sharing of the ACDS and COMMDS across the parallel sysplex, the following DFSMShsm data sets need to be shared:
The DFSMShsm migration control data set (MCDS)
The DFSMShsm backup control data set (BCDS)
The DFSMShsm offline control data set (OCDS)
The DFSMShsm journal
If there are more than 16 systems in the complex, you need to define the ACDS and COMMDS on volumes attached through a 3990 Model 6 storage controller (the 3990 Model 3 does not have enough paths to make it possible to share attached volumes with more than 16 systems). Similar considerations apply to the shared DFSMShsm data sets: MCDS, BCDS, OCDS, and the journal. Prior to DFSMS/MVS 1.2, logical connectivity for all system-managed volumes and storage groups was controlled at the individual system level. Allocations, deletions, and accesses could only be performed on systems that had the logical (SMS and MVS) and physical (hardware) connectivity; otherwise job failures would occur. This also applied to DFSMShsm operations. In addition, the required catalogs needed to be accessible. With DFSMS/MVS 1.2, when you define a volume or storage group to have connectivity to a system group, the volume or storage group must be accessible to all systems that are part of the system group; otherwise, job failures will occur. When a common set of classes, groups, ACS routines, and a base configuration are applied across an MVS/ESA multisystem environment, the environment is a simple one. However, if SMS is not active on one of the systems, that system is not able to do the following:
Create data sets on system-managed volumes
Delete system-managed data sets
Extend system-managed data sets to new volumes
Use JCL keywords supported by SMS
The COMMDS does not record DASD space usage changes for a system that has not activated SMS. For more information regarding defining system group names and implementing DFSMS across a parallel sysplex, refer to MVS/ESA SML: Implementing System-Managed Storage, SC26-3123.
Installations should check that converting reserve activity to global ENQ does not introduce performance problems before implementing the solution.
Review MVS/ESA SP V5 Planning: Global Resource Serialization , GC28-1450, before implementing reserve conversion for other possible implications in your environment.
2.13.1 Planning
Planning for autoswitchable devices is discussed in MVS/ESA Hardware Configuration Definition: Planning, GC28-1445. The issue discussed in that manual is how many devices to define as autoswitchable and how many to dedicate to particular systems. The device selection process is slightly longer for autoswitchable devices than for dedicated devices, due to the sysplex-wide scope of the allocation. If the workload that requires tape drives is predictable on certain systems, allocating some devices as dedicated and others as shared may provide benefits. If the usage is likely to be spread or unpredictable, however, management of the devices may be simplified by defining all devices as autoswitchable.
Refer to 2.18.2, JES3 Sysplex Considerations on page 89 for information on shared tape support in a JES3 environment.
The dynamic exits facility allows you to do the following:
Add exit routines to an exit that has been defined to the dynamic exits facility
Modify or delete exit routines for an exit
Change the attributes of an exit at or after IPL
Undefine an implicitly defined exit
The following operator commands allow you to control the use of dynamic exits and exit routines:
SET PROG=xx specifies the particular PROGxx parmlib member for the system to use.
SETPROG EXIT adds exit routines to an exit, changes the state of an exit routine, deletes an exit routine from an exit, undefines an implicitly defined exit, and changes the attributes of an exit.
DISPLAY PROG,EXIT displays exits that have been defined or have had exit routines associated with them.
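For example, a locally written exit routine could be associated with a dynamic exit and then displayed as follows (the exit and module names here are illustrative assumptions):

SETPROG EXIT,ADD,EXITNAME=SYS.IEFUJI,MODNAME=MYUJI
DISPLAY PROG,EXIT

The same association can be made persistent across IPLs by coding the equivalent EXIT ADD statement in a PROGxx parmlib member.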
The CSVDYNEX macro allows you to define exits, associate exit routines with those exits, and control the use of exits and exit routines within a program. For more information regarding the use of dynamic exits, refer to MVS/ESA SP V5 Installation Exits, SC28-1459.
Dynamic SSI support allows you to do the following:
Define and add a subsystem
Activate a subsystem
Deactivate a subsystem
Swap subsystem functions
Store and retrieve subsystem-specific information
Define subsystem options, which includes deciding the following:
- If a subsystem can respond to dynamic SSI commands
- Under which subsystem a subsystem should be started
All of the features of the dynamic SSI support last only for the life of the IPL. If you IPL after using any of the set of authorized system services, you must issue the service again. Dynamic SSI provides the following benefits:
Supports continuous operations, by allowing you to add a new subsystem or upgrade an existing subsystem without an IPL.
Reduces the service costs associated with modifying SSI control blocks, by removing the need for subsystems to modify SSI control blocks themselves and allowing a set of system services to make the necessary changes.
The services that the dynamic SSI support provides are made available in one of the following ways:
Processing the keyword format of the IEFSSNxx parmlib member during IPL
Issuing the IEFSSI macro
Issuing the SETSSI system command
IEFSSVT macro, which:
Creates an SSVT (REQUEST=CREATE)
Enables additional function codes (REQUEST=ENABLE)
Disables supported function codes (REQUEST=DISABLE)
Replaces the function routine associated with a supported function code (REQUEST=CHANGE)
IEFSSI macro, which:
Defines and adds a subsystem (REQUEST=ADD)
Activates a subsystem (REQUEST=ACTIVATE)
Deactivates a subsystem (REQUEST=DEACTIVATE)
Exchanges subsystem functions (REQUEST=SWAP)
Defines subsystem options (REQUEST=OPTIONS)
Gets (retrieves) subsystem information (REQUEST=GET)
Puts (stores) subsystem information (REQUEST=PUT)
Queries subsystem information (REQUEST=QUERY)
SETSSI command, which:
Defines and adds a subsystem (SETSSI ADD)
Activates a subsystem (SETSSI ACTIVATE)
Deactivates a subsystem (SETSSI DEACTIVATE)
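For example, a new subsystem could be defined and activated without an IPL as follows (the subsystem name and initialization routine are illustrative assumptions):

SETSSI ADD,SUBNAME=TSUB,INITRTN=TSUBINIT
SETSSI ACTIVATE,SUBNAME=TSUB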
The dynamic SSI support also introduces the IEFJFRQ installation exit, which provides a way for vendor products and installation applications to examine and modify subsystem function requests. For more information regarding the macros associated with dynamic SSI, refer to MVS/ESA SP V5 Authorized Assembler Services Reference, Volume 2, GC28-1476. For information regarding the use of the SETSSI command, refer to MVS/ESA SP V5 System Commands, GC28-1442.
MVS version 5 provides two automated failure management functions:
The Sysplex Failure Management (SFM) function, which requires all systems in the sysplex to have connectivity to the SFM couple data set, and is in operation when an SFM policy is active.
The Automatic Restart Manager (ARM), which also requires all systems in the sysplex to have connectivity to an ARM couple data set, and is in operation when an ARM policy is active.
These two functions are executed in the XCF address space. The purpose of the following sections is to give advice and recommendations on the setup and use of these functions. It is expected that the reader also has at hand Setting Up a Sysplex and the PR/SM Planning Guide.
Failure management deals with two classes of failure:
Connectivity failure. Either XCF connectivity failure between XCF members of the sysplex, or connectivity failure between structure exploiters and the structures themselves.
System failure. In this sense, system failure means the inability of an MVS image to update its status in the sysplex couple data set for a time interval greater than the INTERVAL value coded in the COUPLExx member used by the first system IPLed into the sysplex. The failing system is then in a missing status update condition. This condition may result from a true system failure, but may also be a temporary situation resulting from other events, such as the following:
An SVC dump is being taken on the system and is taking longer than the INTERVAL value.
A spin loop is occurring.
The system is in a restartable wait state.
The system is going through reconfiguration.
Some system has a RESERVE on the volume with the couple data set.
The operator stopped the system.
The system is communicating with the operator by means of a branch entry synchronous WTOR macro.
To take these possible situations into account, and to avoid entering failure management because of a temporarily held status update, the parameters specifying detection intervals have to be tuned. These are the following:
INTERVAL and OPNOTIFY in SYS1.PARMLIB(COUPLExx).
RESETTIME, DEACTTIME, and ISOLATETIME in the SFM policy.
Indirectly, SPINTIME, by its default value or the value specified in SYS1.PARMLIB(EXSPATxx), since it dictates the time that can be spent by a processor in a spin loop.
The time detection intervals are discussed in detail in 2.16, Planning the Time Detection Intervals on page 73. Some factors that will influence the use of SFM and the contents of the policy are the following:
Are there logical partitions running MVS in the sysplex, and do we want specific actions when a logical partition fails, such as:
Partition reset
Do we want to automate the initiation of structure rebuild when connectivity is lost to the structure, with options such as:
Rebuild as soon as one exploiter has lost connectivity to the structure.
Initiate rebuild only when important connectors have lost connectivity, or initiate rebuild when a certain number of connectors have lost connectivity.
Do we want to automate the partitioning of the sysplex up to the point where the system being partitioned is automatically isolated from the rest of the sysplex, that is, its hardware is prevented from starting new I/O operations and its reserved devices are released.
Although automation is always desirable, and is most probably mandatory in the sysplex context, there may be cases where automated failure management has to be shut down, for example:
When investigating a problem, where failure management is simply not wanted.
When training operators, where manual takeover is part of the exercise.
It is believed that in a normal production environment, most installations will use the MVS version 5 automated Sysplex Failure Management.
The coupling facility provides a function called isolate, or fencing, which consists of sending a signal over the CFC link to a designated target system. Upon reception of the isolate signal, the target system will:
1. Drain all ongoing I/O operations (this includes operations being performed over the CFC links as well).
2. Freeze its channel subsystem so that no new I/O operation can be initiated.
3. Perform an I/O system reset over its channel interfaces so that reserved devices are released.
4. Finally, go into non-restartable wait state X'0A2'.
The isolate function is intended to fence from the rest of the sysplex a system that is either in a missing status update condition or the target of a VARY XCF,sysname,OFFLINE operator command. However:
Isolation can be executed only through a coupling facility link; therefore, systems that do not share connectivity to the same coupling facility cannot request, or be the subject of, the isolate function.
Isolation is initiated only if there is an SFM policy active, and in some cases requires that the proper keywords be set up in the policy, as explained in 2.15.3, SFM Parameters on page 63.
The isolation is performed at the target system strictly by hardware. The isolate signal sent over the link by the coupling facility is directly interpreted by the target channel subsystem hardware, and subsequent actions at the target system are initiated without software involvement.
Isolation is not performed by the target system hardware if the target system is already in a system reset state.
Isolation is actually performed at the whole CPC level if the target system is in basic mode, or at the logical partition level only (that is, the logical partition in which the target MVS is executing) if the target system is running in PR/SM mode. The only way to exit from an isolated state is to perform a system reset of the CPC or of the logical partition. IPLing the target system will therefore result in exiting from the isolated state.
Note that if the isolate function cannot be performed (because of lack of connectivity with the coupling facility, for instance), the alternative is to go to the target system hardware console and manually invoke the hardware system reset function, in order to stop all activity on the failing system and release its reserved devices.
An isolation request is sent to the target system if:
Both the requesting system and the target system have connectivity to the same coupling facility.
And there is currently an SFM policy active (whatever keywords are set in the policy).
Message IXC102A, indicating that one must go to the target system to manually initiate a system reset, is issued if:
The above conditions are not met.
Or the above conditions are met, but the requesting MVS has been informed by the coupling facility that the attempt to isolate was unsuccessful. This may occur because of:
- A severe hardware malfunction at the target system.
- Too high a volume of I/O operations to be drained for the isolation request to complete in time.
- The target system being already reset.
INTERVAL(xx) is coded in the SYS1.PARMLIB COUPLExx member, and pertains to all systems in the sysplex. ISOLATETIME(xx) is coded in the active SFM policy, and pertains to a specific system.
An example of an SFM policy is shown in Figure 19 on page 62. Further details about SFM keywords can be found in Setting Up a Sysplex, GC28-1449.
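A minimal sketch of the IXCMIAPU input for such a policy, consistent with the description that follows, might look like this (the policy name is illustrative):

DATA TYPE(SFM) REPORT(YES)
DEFINE POLICY NAME(POLICY1) CONNFAIL(YES) REPLACE(YES)
   SYSTEM NAME(*)
      ISOLATETIME(0)
   SYSTEM NAME(MVSA)
      ISOLATETIME(10)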
SYSTEM: specifies the definition of a system within the scope of the named SFM policy (POLICY1).
NAME: the target MVS system name. * designates all MVS systems in the configuration except those whose names are explicitly specified with other NAME parameters.
In this example, any MVS in the configuration will be automatically isolated immediately upon detection of its missing status update condition, except for MVSA, which is to be isolated 10 seconds after the detection.
Examples of partitioning sequences are given in Appendix D, Examples of Sysplex Partitioning on page 259. As with the manual invocation of the isolate function through the VARY XCF,sysname,OFF command, the automatic invocation may end with message IXC102A being issued, indicating that the isolate may have failed. See the recommendations in 2.15.2.3, Recommendations.
2.15.2.3 Recommendations
The recommendation is to use the automatic isolate capability provided by the SFM policy. This provides a good level of built-in automation that can be very helpful to the sysplex operator, and is potentially less delay-prone and error-prone when dealing with multiple MVS images. The intent of the ISOLATETIME value is to provide finer control, at the individual system level, over when to actually start isolating once the system is in the missing status update condition. It is recommended to tune the INTERVAL parameter in COUPLExx so that all systems can have ISOLATETIME(0), that is, isolate immediately upon the missing status update condition. This makes parameter management and tuning easier. However, there may be some systems with very specific characteristics that make the INTERVAL parameter too short for them. In these cases, ISOLATETIME can be used to personalize the time interval.
If the requesting MVS gets back to the operator with message IXC102A, implying that the isolate may have failed, it is recommended that you examine the SYS1.LOGREC hardware and software records written during isolation to help determine why the isolation did not complete automatically.
It is recommended that you not automate responses to IXC102A if the sysplex is running with an active SFM policy with ISOLATE. Operator intervention is required to prevent exposure to sysplex integrity problems.
The SFM policy statements and parameters, and the scope to which each pertains, are summarized below:

Keyword                  Pertains to
SYSTEM                   Sysplex
NAME(sysname|*)          a system
WEIGHT(value)            a system
DEACTTIME(value)         a system
RESETTIME(value)         a system
ISOLATETIME(value)       a system
PROMPT                   a system
ACTSYS(sysname)          a system
TARGETSYS(sysname|ALL)   a system
ESTORE(NO|YES)           a system
This section describes the SFM actions that can be designed to operate independently of the type of environment.
Planning for Automatically Partitioning the Sysplex: Automatic partitioning of the sysplex is intended to vary an MVS image off the sysplex, without requiring operator intervention, when that image is either in a missing status update condition or has lost connectivity to other MVS image(s). To automatically initiate partitioning of the sysplex, the following keywords and parameters must be set up in the policy:
CONNFAIL(YES|NO) This keyword must be set to CONNFAIL(YES), which is also the default value, to allow SFM to automatically initiate actions when XCF connectivity fails. Having CONNFAIL(NO) will result in the operator being prompted, without automatic action being initiated.
WEIGHT(value) This parameter gives SFM guidance for automatically partitioning the sysplex. When an XCF connectivity failure is detected between two systems in the sysplex, SFM must choose which one to exclude from the sysplex (assuming that both are known to be still working). By giving a WEIGHT value to each of the MVS images in the sysplex, SFM chooses the final sysplex configuration that yields the highest sum of WEIGHTs after removing one system. This can be seen as a way to preserve the most important MVS images when the sysplex is partitioned, or conversely to partition the less important images off the sysplex to get around the XCF connectivity problem. As an example, assume a sysplex with three participating MVS systems: MVS A, MVS B, and MVS C. MVS A has WEIGHT(10), MVS B has WEIGHT(10), and MVS C has WEIGHT(30). Assuming that there is an XCF connectivity failure between MVS B and MVS C, sysplex operations can be carried on with the images still sharing XCF connectivity. The alternatives are then to continue with MVS A and MVS B (total WEIGHT=20) or MVS A and MVS C (total WEIGHT=30). The latter configuration will be kept; that is, MVS B will be varied off the sysplex. Weights can be attributed, as an example, on the basis of any of the following:
ITRs of the systems in the sysplex
Configuration dependencies, such as a unique feature or I/O connected to only one system in the sysplex
WEIGHT can have a value from 1 to 9999. Specifying no weight is the same as specifying WEIGHT(1); that is, if there are no WEIGHTs in the policy, every system is given the same importance when it comes to partitioning.
Planning for Automatically Rebuilding Structures: This pertains to rebuilding a structure because of a loss of connectivity between the exploiter and the structure, that is, a problem affecting the coupling technology either in one of the sysplex MVS images or in the coupling facility itself.
Rebuilding a structure upon a loss of connectivity is the structure exploiter's decision. Some exploiters decide to rebuild the structure as soon as one single exploiter instance has lost connectivity to the structure; others will listen to the XES recommendation. This recommendation is passed to the structure's exploiters and indicates either that the subsystem should disconnect from the structure or that the structure is being rebuilt. More details on structure rebuild can be found in Chapter 5, Coupling Facility Changes on page 117. XES makes the recommendation on the basis of what has been set up in the active SFM policy:
If CONNFAIL(NO), the MVS recommendation is always to disconnect.
If CONNFAIL(YES), the recommendation will depend on the WEIGHTs given to the MVS images and on the REBUILDPERCENT value given to the affected structure in the active CFRM policy.
As an example, suppose that a structure has been given a REBUILDPERCENT of 50% in the current CFRM policy. Assume that an exploiter of the structure is running in MVSA and another exploiter of the structure is running in MVSB. Also assume that MVSA is given a WEIGHT of 30 in the active SFM policy and MVSB is given a WEIGHT of 90. As these are the only two MVS images in the sysplex, the total sysplex weight is 30 + 90 = 120. The REBUILDPERCENT indicates that MVS is to start rebuilding the structure if the total WEIGHT of the systems with loss of connectivity to the structure is greater than 50% of the total WEIGHT of the sysplex, that is, greater than 60. If MVSA loses connectivity to the structure, XES recommends that the exploiters disconnect from the structure (30/120 = 25%). If MVSB loses connectivity to the structure, XES starts rebuilding the structure (90/120 = 75%). If the specific exploiter code has been designed not to rebuild in that case, it will stop the XES-initiated rebuild. Planning Additional Structure Space for Rebuild: Rebuilding a structure implies temporarily duplicated structures in terms of coupling facility space occupancy. Proper consideration must be given to what additional space has to be planned for the coupling facility.
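The REBUILDPERCENT value is coded on the structure definition in the CFRM policy. A minimal, illustrative IXCMIAPU fragment (the policy, structure, and coupling facility names, and the size, are assumptions) might look like this:

DATA TYPE(CFRM) REPORT(YES)
DEFINE POLICY NAME(CFRM01) REPLACE(YES)
   STRUCTURE NAME(LISTSTR1)
      SIZE(1024)
      REBUILDPERCENT(50)
      PREFLIST(CF01,CF02)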
Default Values for WEIGHT and REBUILDPERCENT: The WEIGHT default is 1, and the REBUILDPERCENT default is 100. Therefore, if CONNFAIL is not specified (the default is CONNFAIL(YES)) and neither WEIGHT nor REBUILDPERCENT is specified for any MVS image or structure, MVS will initiate a structure rebuild only if all currently connected exploiters have lost connectivity to the structure.
Compatibility with XCF PR/SM Policy: MVS version 5 still supports the XCFPOLxx specifications, but they are mutually exclusive with the use of an SFM policy, that is:
If an SFM policy is active, XCFPOLxx specifications are discarded.
If the sysplex includes MVS version 4 image(s), the only failure management policy that can be activated sysplex-wide is the XCFPOLxx member, which pertains only to the failure management of a logical partition.
The consequence is that previous XCFPOLxx members, if any, will have to be rewritten as SFM policies to take full advantage of the automated recovery in MVS version 5.
Timing Relationships of SFM Actions: The actions that can be taken automatically under SFM control against logical partitions are adjusted to occur a certain amount of time after a system is declared to be in a missing status update condition. A summary of the timing relationships for SFM actions is shown in Figure 20 on page 67.
(Figure 20 summarizes these relationships: INTERVAL(ss) and OPNOTIFY(ss) are coded in the SYS1.PARMLIB COUPLExx member, while RESETTIME(ss), DEACTTIME(ss), and ISOLATETIME(ss) are coded in the SFM policy. When OPNOTIFY expires, MVS issues message IXC402D. An MVS image sharing the physical CPC with MVS A, and with proper authority, can either system reset the MVS A LPAR or deactivate the MVS A LPAR; any MVS sharing connectivity to the same coupling facility as MVS A can isolate MVS A. Note: if SFM is active and ISOLATETIME is specified for a system, OPNOTIFY is nullified for that system.)
RESETTIME is intended to provide adjustable, personalized timing on top of the INTERVAL value in COUPLExx. The value given as nostatus_interval can be one of the following:
0 This indicates that the failing partition should be reset as soon as INTERVAL expires.
Any other value from 1 to 86400 seconds This can be chosen because of any known peculiarity of the related system that justifies adding time to INTERVAL. For example, INTERVAL may have been set up for other members of the sysplex running in basic mode, and this member, running in a logical partition, may therefore need additional time to update its status.
DEACTTIME provides the same kind of adjustable timing for deactivation. The value given as nostatus_interval can be one of the following:
0 This indicates that the failing partition should be deactivated as soon as INTERVAL expires.
Any other value from 1 to 86400 seconds This can be chosen because of any known peculiarity of the related system that justifies adding time to INTERVAL. For example, INTERVAL may have been set up for other members of the sysplex running in basic mode, and this member, running in a logical partition, may therefore need additional time to update its status.
ISOLATETIME is intended to provide adjustable, personalized timing on top of the INTERVAL value in COUPLExx. The value given as nostatus_interval can be one of the following:
0 This indicates that the failing partition should be isolated as soon as INTERVAL expires.
Any other value from 1 to 86400 seconds This can be chosen because of any known peculiarity of the related system that justifies adding time to INTERVAL. For example, INTERVAL may have been set up for other members of the sysplex running in basic mode, and this member, running in a logical partition, may therefore need additional time to update its status.
Planning to Automatically Acquire Processor Storage from a Logical Partition: PR/SM allows an MVS image running in a logical partition to dynamically acquire processor storage (central and/or expanded storage) from a logical partition defined on the same physical CPC.
Proper usage of this facility assumes the following:
1. The giving logical partition (TARGETSYS in the policy) is either the normal production logical partition or a logical partition to be sacrificed, which will be deactivated and its storage acquired by a backup partition (ACTSYS in the policy).
2. The processor storage for the receiving logical partition has been defined with a reserved part that overlaps the giving logical partition's processor storage.
3. The receiving logical partition has proper authority in PR/SM to acquire resources from another logical partition. This is the Cross Partition Authority, set in the LPDEF frame on 9021 systems and in the image profile on 9672 systems.
4. The backup partition is to take over the workload of the failing logical partition. However, it is up to the software running in the sysplex to manage transferring the workload from the failing partition to the backup one. All SFM and PR/SM will do is reallocate the physical CPC resources.
The storage being acquired is brought online to the receiving system by an automatically issued CF STOR|ESTOR(E=1),ONLINE command. Proper consideration must be given to the receiving system's RSU value if the acquired central storage is to be dynamically released later on.
Installation and Activation: There can be as many as 50 SFM policies set up in the SFM couple data set, of which only one can be active.
An SFM policy is installed in the SFM couple data set with the IXCMIAPU Administrative Data Utility program. Remember that if you wish to know the contents of a policy, you either have to look into the IXCMIAPU JCL you used to install the policy, or you can run the Administrative Data Utility with the DATA TYPE(SFM) and REPORT(YES) keywords; this lists the couple data set characteristics and the policies currently installed in the SFM couple data set. Activating an SFM policy is an operator-initiated operation using the SETXCF START,POLICY,TYPE=SFM,POLNAME=polname command. Once the command is issued, the new policy is immediately in operation, unless recovery actions are already in progress under control of the previous policy. Proper messages are then issued to let the operator know about the delay in activating the new policy and the reason for it.
SETXCF START,POL,TYPE=SFM,POLNAME=SFM1
IXC602I SFM POLICY SFM1 INDICATES FOR SYSTEM SG1 A STATUS UPDATE MISSING ACTION OF PROMPT AND AN INTERVAL OF 15 SECONDS. THE ACTION IS THE SYSTEM DEFAULT.
IXC609I SFM POLICY SFM1 INDICATES FOR SYSTEM SG1 A SYSTEM WEIGHT OF 75 SPECIFIED BY SPECIFIC POLICY ENTRY
IXC601I SFM POLICY SFM1 HAS BEEN STARTED BY SYSTEM SG1
SFM Activation: An active SFM policy remains active across IPLs. However, when an MVS image IPLs, the SFM function is not available to that image until after NIP completion.
Controlling Which Policy Is Currently Active: The operator can display the current SFM policy name:
D XCF,POL,TYPE=SFM
IXC364I 19.01.34 DISPLAY XCF 407
TYPE: SFM
POLNAME: SFMPOL01
STARTED: 10/11/95 12:49:50
LAST UPDATED: 06/02/95 09:47:22
SYSPLEX FAILURE MANAGEMENT IS ACTIVE
The size of the SFM couple data set is determined by the following, as illustrated in the sketch below:
Maximum planned number of policies to install in the couple data set.
Maximum planned number of systems to be characterized by the SYSTEM keyword in a policy.
Maximum planned number of RECONFIG actions to be described in a policy.
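These maximums are supplied when the couple data set is formatted with the IXCL1DSU format utility. The following fragment is a sketch only; the data set name, volume, and counts are illustrative assumptions:

//FMTSFM   JOB (999,POK),'FORMAT SFM',CLASS=A
//STEP1    EXEC PGM=IXCL1DSU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFINEDS SYSPLEX(SYSPL01)
     DSN(SYS1.SFM.CDS01) VOLSER(XCF001)
     DATA TYPE(SFM)
        ITEM NAME(POLICY) NUMBER(9)
        ITEM NAME(SYSTEM) NUMBER(8)
        ITEM NAME(RECONFIG) NUMBER(4)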
Should the size of the SFM couple data set turn out to be wrong, the following procedure can be used to dynamically bring online a new couple data set of the appropriate size. Note that this procedure works only for increasing the size of the couple data set. To Decrease the Size of a Couple Data Set: Decreasing the size of a couple data set cannot be done non-disruptively; an alternate couple data set smaller than the primary couple data set cannot be brought online concurrently. You must prepare the new couple data set and IPL the sysplex using this new couple data set.
1. Run IXCL1DSU against a spare couple data set with the new couple data set specifications.
2. When the spare couple data set is formatted, use the command SETXCF COUPLE,ACOUPLE=(spare_dsname,spare_volume),TYPE=SFM to make the spare couple data set the new alternate SFM couple data set. Note: As soon as the spare couple data set has been switched in as the alternate, it is loaded with the contents of the primary couple data set's policies.
3. Then switch the new alternate to become the new primary couple data set using the SETXCF COUPLE,TYPE=SFM,PSWITCH command.
4. The previous primary couple data set is no longer in use, and can be enlarged by the same process.
Keeping COUPLExx in Synch: It is recommended that the COUPLExx member be updated after swapping the couple data sets, so that operator intervention to retrieve the last used couple data sets is not required at the next IPL.
When the policy itself is changed, install the new version under a new name and activate it with:
SETXCF START,POL,TYPE=SFM,POLNAME=new_polname
This allows you to explicitly track all changes made to the SFM policy with the policy name (a policy name can be up to eight characters long).
Should the SFM couple data set already contain the maximum allowed number of policies, one of them can be deleted to make room by using the JCL in Figure 21 on page 72.
//DELSFM   JOB (999,POK),'L06R',CLASS=A,REGION=4096K,
//         MSGCLASS=T,TIME=10,MSGLEVEL=(1,1),NOTIFY=&SYSUID
//******************************************************************
//* JCL TO DELETE A SFM POLICY
//*
//******************************************************************
//STEP1    EXEC PGM=IXCMIAPU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DATA TYPE(SFM) REPORT(YES)
  DELETE POLICY NAME(target_pol)
Figure 21. Sample JCL to Delete a SFM Policy
However, if for some reason the new version of the policy must keep the same name, updates can be made dynamically to the active SFM policy by doing the following:
1. Run the Administrative Data Utility IXCMIAPU against the currently active policy (with REPLACE(YES)). This does not disrupt system operations, since the active policy is in fact a duplicate of the policy residing on the couple data set; IXCMIAPU updates the administrative copy of the policy in the couple data set while the active copy is left unchanged.
2. Activate the same policy by entering the following command:
SETXCF START,POL,TYPE=SFM,POLNAME=same_name
This will refresh the active copy with the just modified administrative copy.
An active SFM policy is stopped with the following command:
SETXCF STOP,POLICY,TYPE=SFM
Once the command is issued, the policy is immediately stopped, unless recovery actions are already in progress under its control. Proper messages are then issued to let the operator know about the delay in stopping the policy and the reason for it.
All functions pertaining to logical partition reconfiguration will be accomplished when the monitored system (FAILSYS(sysname) in the policy) fails, but only if the system designated to initiate the recovery action (ACTSYS(sysname)) is still up and running.
The isolate function can be manually invoked only if there is an SFM policy active (see 2.15.2, The SFM Isolate Function on page 59). It is automatically invoked by using the ISOLATETIME(value) keyword in the policy, provided there is at least one active system in the sysplex sharing coupling facility connectivity with the system to be isolated. Using the WEIGHT(value) parameter in the SFM policy along with the REBUILDPERCENT parameter in the CFRM policy only provides a recommendation on whether to rebuild a structure in case of loss of connectivity. MVS delivers the recommendation to the structure's exploiters, which in turn decide whether to follow the recommendation. This is discussed for each of the IBM structure exploiters in 9.2, Coupling Facility Failure Recovery on page 180.
Three values determine the time detection intervals:
The SPINTIME value: the default, or the value set in the EXSPATxx member, if any, and activated by SET EXS=xx.
The INTERVAL value: the default, or the value explicitly set in COUPLExx.
The TOLINT value: the default, or the value explicitly set in GRSCNFxx.
Equally important is to understand the relationship between the time intervals these parameters represent. This is shown in Figure 22 on page 74.
(Figure 22 relates INTERVAL(ss) in COUPLExx and the excessive spin recovery actions (TERM, ACR) to TOLINT(ss) in GRSCNFxx, following the RSA in transit from MVS A to MVS B, and from MVS B to MVS C, around the GRS ring.)
When MVS is running in a shared logical partition (LP) or under VM, some customers may want to detect a dormant MVS image earlier than the default INTERVAL timeout value, in order to expedite the dormant system's removal from the sysplex. The default INTERVAL values are:
After APAR OW11965:
25 seconds when MVS is running on native hardware or in a dedicated logical partition.
85 seconds when MVS is running in a shared logical partition or under VM.
INTERVAL_default = SPINTIME_default * 2 + 5
See 2.16.1.3, SPINTIME in EXSPATxx on page 77 for a discussion of SPINTIME. INTERVAL can be adjusted by specifying INTERVAL(value) in COUPLExx, where value is 3 to 86400 seconds. The recommendations are summarized here:
Decreasing INTERVAL below 25 seconds is not recommended.
If you run with the default SPINTIME and SPINRCVY settings, allow XCF to select a default setting. In this case it is strongly recommended that the fix for OW11965 be installed.
As a rule, try to keep INTERVAL set to:
5 + (2 * SPINTIME)
For example, if SPINTIME = 20 and SPINRCVY = TERM,ACR, set INTERVAL to 45 seconds (a parmlib sketch follows this discussion). This results from MVS taking two spin cycles before escalating to an action specified in SPINRCVY, which could be SPIN again, ABEND, or TERM (that is, ABEND without retry). The extra five seconds allow the recovering MVS enough time to catch up and update its status in the sysplex couple data set. Because of the way XCF works internally, the five-second catch-up time is considered to be more than adequate for MVS to catch up and update the sysplex couple data sets. However, if there is I/O contention on the sysplex couple data sets, additional time may be needed to perform the status update. The rationale for the MVS recommendation to set INTERVAL to five seconds beyond the time it would take to reach the ABEND (or TERM) action during an excessive spin condition is this:
Most spin loops, if resolvable, will be resolved by the ABEND or TERM action. Hence, in most cases, there will be no need for a third occurrence of SPINTIME.
Setting the XCF failure detection interval to 5 + (2 * SPINTIME) is a compromise between giving MVS enough time to recover from an excessive spin condition and removing a failed MVS from the sysplex as quickly as possible.
Allowing MVS sufficient time to recover from conditions where the MVS image appears dormant to other systems in the sysplex.
And the expeditious removal of an MVS system from the sysplex in cases where the MVS image appears dormant to other systems in the sysplex.
If you choose an INTERVAL that is too high (say three minutes), and the MVS image fails while holding a critical resource, the surviving systems may have to wait for the resource to be freed before continuing. It might appear as if the entire sysplex hangs for three minutes or more until the failed MVS is partitioned out of the sysplex. If you choose an INTERVAL that is too low (say 15 seconds), and the MVS image is dormant but alive, a missing status update condition can occur causing the system to be erroneously removed from the sysplex. Erroneous partitioning of a healthy system is more probable when you have set INTERVAL too low and have activated a sysplex Failure Management (SFM) policy using a low ISOLATETIME. If the interval you select represents an unacceptable amount of time to wait for an MVS image to respond, consider increasing the amount of CPU resource given to the MVS image. This will reduce the amount of time needed to resolve a spin loop which thereby reduces the recommended failure detection interval timeout value.
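Pulling the pieces together, the SPINTIME = 20 example above would translate into parmlib settings along the following lines (a sketch only; the sysplex name is an assumption, and OPNOTIFY is deliberately set equal to INTERVAL so that the operator is notified as soon as the condition is detected):

In EXSPATxx:
  SPINTIME=20
  SPINRCVY=TERM,ACR

In COUPLExx:
  COUPLE SYSPLEX(SYSPL01)
         INTERVAL(45)
         OPNOTIFY(45)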
In this example, once the status update missing condition occurs, the operator is notified without delay via message IXC402D. Automating a response to IXC402D is NOT recommended; doing so may expose sysplex integrity problems.
Note that the SET EXS=xx command does not have sysplex scope; it must therefore be issued on every system where the change is to take place.
For MVS running on native hardware or in a dedicated LP, use the default SPINTIME of ten seconds. For MVS running in a shared LP or under VM, the default SPINTIME is 40 seconds and the default XCF INTERVAL is 85 seconds. If the default values are acceptable, skip to SPINRCVY below. If you want to decrease INTERVAL below 85 seconds, SPINTIME must also be adjusted below the default of 40 seconds. The minimum recommended value for SPINTIME can be calculated based on the amount of CPU resource given to the MVS image.
The higher the amount of real CPU resource available to the MVS image, the lower the amount of time needed to resolve spin loops. Hence, SPINTIME may be set to a lower value. For example, if you have a logical partition with engines that are receiving 95% of the REAL CPU resource you can set SPINTIME to, say, 12 seconds instead of using the default of 40 seconds.
Likewise, the lower the amount of real CPU resource available to the MVS image, the higher the amount of time needed to resolve spin loops. Hence, SPINTIME must be set to a higher value. For example, if you have a logical partition with engines that are receiving 10% of the REAL CPU resource, you should set SPINTIME to 40 seconds (the default for shared LPs). Specifying a SPINTIME that is too low may cause premature excessive spin conditions. When an excessive spin condition occurs, MVS selects a first action of SPIN. Along with the first action of SPIN, MVS also writes an ABEND071-10 logrec entry and issues message IEE178I to the hardcopy log. ABEND071-10 is non-disruptive. Message IEE178I can be automated to inform the system programmer when an excessive spin condition occurs. If you see repetitive ABEND071-10 logrec entries and/or IEE178I messages, and the spin loops recover without escalating through the excessive spin recovery actions, you are probably experiencing premature excessive spin conditions. To remedy this condition, increase SPINTIME. To Compute Minimum SPINTIME:
See the BLWSPINR member of SYS1.SAMPLIB for additional information on how to calculate the minimum SPINTIME for MVS images running in shared LPs. BLWSPINR shipped in UW18884 (MVS 5.1) and UW18885 (MVS 5.2).
More details are given on spin recovery parameters and the excessive spin recovery time condition in Appendix E, Spin Loop Recovery on page 263. SPINRCVY OPER Is Not Recommended: A spin recovery action of OPER can be set in EXSPATxx, which means that MVS is to prompt the operator for any further action. In a sysplex, specifying a SPINRCVY action of OPER is not recommended, because an operator may not respond quickly enough to prevent the remaining systems in the sysplex from partitioning the ailing system out of the sysplex.
Care is needed to determine an acceptable timeout value for TOLINT because of the number of variables involved, such as the following:
The excessive spin time and recovery actions for each system in the sysplex/ring
Number of systems in the sysplex/ring
Speed of systems in the sysplex/ring
Inter-system signalling configuration and activity
Paging of GRS common area storage for each system in the sysplex/ring
RESMIL time for each system in the sysplex/ring
Typically, the RSA should proceed quickly around the ring. However, the RSA may be delayed significantly in cases where the following occurs:
An MVS image is recovering from a spin loop
An MVS image is taking an SVC dump
There are delays in inter-system communications
There are real storage shortages
There are auxiliary storage page-in delays
IBM recommends that TOLINT be set to the default value of 180 seconds. This value may be raised or lowered depending on the installation. Keeping TOLINT set to a relatively high timeout value (180 seconds) will prevent premature GRS ring disruptions in cases where there are unexpected but recoverable delays involving systems that participate in the GRS ring. If a system fails and is partitioned out of the sysplex, GRS is notified of this condition and will commence a ring rebuild operation without waiting for the TOLINT timeout value to expire.
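TOLINT itself is specified in the GRSCNFxx parmlib member. A minimal illustrative fragment (the RESMIL value is an assumption) might look like this:

GRSDEF MATCHSYS(*)
       RESMIL(10)
       TOLINT(180)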
At the time of writing, the toleration timeout interval that GRS uses to limit RSA travel time is the lesser of TOLINT (specified in GRSCNFxx) and INTERVAL (specified in COUPLExx). APAR OW11016 was taken to honor TOLINT as specified in GRSCNFxx. Until a fix for OW11016 is applied, you need to inflate the INTERVAL setting to achieve the desired TOLINT timeout value. In MVS 5.1.0 the default storage isolation for GRS was removed. Without storage isolation, excessive paging can occur in the GRS address space, resulting in GRS
ring disruptions. APAR OW12444 was taken to restore storage isolation to GRS's working set in WLM compatibility mode. When MVS is running in a shared logical partition or under VM, you should consider the effect that the MVS image will have on the rest of the sysplex. If the image is CPU or storage constrained, it can have a degrading effect on functions that perform inter-system communications (such as GRS, XCF, and console communications). Systems that reside in a sysplex must have sufficient resources to ensure that sysplex performance is not adversely affected. An example of a constrained MVS image is one whose logical CPUs receive only 5% of the REAL CPU resource.
IECIOSxx: This member contains the Missing Interrupt Handler (MIH) settings as well as the Hot I/O recovery options. Ensure that the Hot I/O recovery options do not include OPER, which tells MVS to request the desired recovery action from the operator through a synchronous WTO(R) when a hot I/O condition occurs.
EXSPATxx: This member contains the spin loop recovery options. Ensure that OPER is not included, which tells MVS to request the desired recovery action from the operator through a synchronous WTO(R) when a spin loop occurs.
ARM also integrates with existing functions within both automation (AOC/MVS) and production control (OPC/ESA) products. However, care needs to be taken when planning and implementing ARM to ensure that multiple products (OPC/ESA and AOC/MVS, for example) are not trying to restart the same elements. If they are, the results may not be what is required.
ARM provides only job and STC recovery. Transaction or database recovery is the responsibility of the restarted applications.
Initial starting of applications (first or subsequent IPLs) is not provided by ARM. Automation or production control products provide this function. Interface points are provided through exits, event notifications (ENFs), and macros.
The system or sysplex must have sufficient spare capacity to guarantee a successful restart.
To be eligible for ARM processing, elements (jobs/STCs) must be registered with ARM. This is achieved through the IXCARM macro.
A registered element that terminates unexpectedly is restarted on the same system. Registered elements that are on a system that fails are restarted on another system. Related elements are restarted on the same system.
The intended exploiters of the ARM function are the jobs and STCs of certain strategic transaction and resource managers, such as the following:
CICS/ESA
CP/SM
DB2
IMS/TM
IMS/DBCTL
ACF/VTAM
These products, at the correct level, already have the capability to exploit ARM. When they detect that ARM has been enabled, they register an element with ARM to request a restart if a failure occurs.
The ARM couple data set is formatted with the following parameters:
POLICY: the maximum number of user-defined ARM policies that can be in the couple data set at any given time.
MAXELEM: the maximum number of elements per policy.
TOTELEM: the maximum number of elements that are anticipated to be registered with ARM across the sysplex at any given time.
Increasing the parameters can be done dynamically using the SETXCF COUPLE command; decreasing the parameters needs a sysplex-wide IPL. Care therefore needs to be taken when allocating the couple data set. A good starting point for defining the ARM couple data set can be found in the referenced publication.
Refer to MVS/ESA SP V5 Setting up a Sysplex , GC28-1449, for more information on defining the ARM couple data set.
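As with the SFM couple data set, the ARM couple data set is formatted with the IXCL1DSU utility. The following DEFINEDS fragment is a sketch only; the data set name, volume, and counts are illustrative assumptions:

DEFINEDS SYSPLEX(SYSPL01)
   DSN(SYS1.ARM.CDS01) VOLSER(XCF002)
   DATA TYPE(ARM)
      ITEM NAME(POLICY) NUMBER(5)
      ITEM NAME(MAXELEM) NUMBER(200)
      ITEM NAME(TOTELEM) NUMBER(100)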
Parameters relating to groups of ARM elements. Elements that have interdependencies on each other are referred to as restart groups. Restart groups are pertinent only to ARM's restarting of elements after a system has left the sysplex. Elements from a departed system that are in the same restart group are restarted on the same system. ARM also allows control of the order in which elements in a restart group are restarted, for example, restarting a failed DB2 region before restarting the associated CICS AORs and TORs.
Parameters relating to individual ARM elements. The policy parameters for individual elements indicate how to restart that element in particular situations, and/or whether to restart it at all. When an element's policy entry has parameters that conflict with each other, the entry that indicates whether to restart the element takes precedence, as shown in the sketch below.
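As an illustration, an ARM policy combining a restart group with element-level parameters might look like the following IXCMIAPU sketch (the policy, group, and CICS element names are assumptions; DSNDB0GDB1G follows the DB2 naming discussed later in this chapter):

DATA TYPE(ARM)
DEFINE POLICY NAME(ARMPOL01) REPLACE(YES)
   RESTART_GROUP(DB2CICS)
      ELEMENT(DSNDB0GDB1G)
         RESTART_ATTEMPTS(3)
      ELEMENT(CICSAOR1)
         RESTART_ATTEMPTS(3)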
Refer to MVS/ESA SP V5 Setting up a Sysplex , GC28-1449, for more information on defining the ARM policy.
Change the program that is executed to request ARM services by invoking the IXCARM macro.
Avoid changing source by utilizing an ARM driver program. This works by calling the ARM driver program instead of the application program, and providing the original program call as a PARM for the driver program.
An example of how to change a program using the IXCARM macro is contained in MVS/ESA SP V5 Sysplex Migration Guide , SG24-4581. The same publication contains a full sample ARM driver program along with detailed examples of its use.
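Pending that publication, the registration flow is essentially three IXCARM requests. The following assembler fragment is a sketch only; the element name is an assumption and error handling is omitted:

         IXCARM REQUEST=REGISTER,ELEMENT=ELEMNAME   Register with ARM
         IXCARM REQUEST=READY                       Ready to accept work
*        ...normal processing...
         IXCARM REQUEST=DEREGISTER                  Normal termination
ELEMNAME DC    CL16'PAYROLL1'                       Assumed element name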
Implement ARM on the MVS images that the CICS workload is to run on.
Ensure that the CICS startup JCL used to restart CICS regions is suitable for ARM. Each CICS restart can use the previous startup JCL and system initialization parameters, or can use a new job and parameters.
Specify appropriate CICS START options.
Specify appropriate MVS workload policies.
Implementing ARM for CICS: Implementing ARM for CICS generally involves the following steps:
Ensure that the MVS images available for automatic restarts have access to the databases, logs, and program libraries required for the workload.
Identify those CICS regions for which you want to use ARM.
Define restart processes for the candidate CICS regions.
Define ARM policies for the candidate CICS regions.
Ensure that the system initialization parameter XRF=NO is specified for CICS startup. You cannot specify XRF=YES if you want to use ARM. If the XRF system initialization parameter is changed to XRF=YES for a CICS region being restarted by ARM, CICS issues message DFHKE0407 to the console and then terminates.
CICS START Options: It is recommended that START=AUTO is specified. This causes a warm start after a normal shutdown, and an emergency restart after failure. (START=AUTO also resolves to a cold start when you start a region for the first time with newly initialized catalogs.)
It is also recommended to always use the same JCL, even if it specifies START=COLD, to ensure that CICS restarts correctly when restarted by ARM after a failure. With ARM support, if the startup system initialization parameter specifies START=COLD and the ARM policy specifies that ARM is to use the same JCL for a restart following a CICS failure, CICS overrides the start parameter when restarted by ARM and enforces START=AUTO. This is reported by message DFHPA1934, and ensures that recoverable data is correctly handled by the resultant emergency restart. If the ARM policy specifies different JCL for an automatic restart, and that JCL specifies START=COLD, CICS obeys this parameter with a risk of loss of data integrity. Therefore, if there is a need to specify different JCL to ARM, START=AUTO should be specified to ensure data integrity.
CICSPlex SM/ESA: CICSPlex SM/ESA extends the ability to manage CICS systems to include a logical set of systems treated as a single entity, the CICSplex. CICSPlex SM/ESA has been extended to include the ability to terminate a damaged or hung CICS system that is being managed by ARM. Once it is terminated, ARM will request a restart of the CICS system.
CICSPlex SM ARM support was provided via APAR PN65642 for V1.1.1 only. CICSPlex SM/ESA ARM support allows both on-demand and automatic ARM restarts to be requested. This support is implemented as follows:
On-demand restart. The CICSRGN set of views has been extended to support an ARM primary and line command. When requested, this results in the cancellation of the CICS region with a request for ARM restart.
Automatic restart. The ACTION definition has been extended to allow specification of an ARM restart option. If specified, this directs RTA to request cancellation of all CICS regions within the scope of the outstanding event.
ARM support is only available if the following criteria are met for a CICS region:
The CICS region must be connected to a CMAS as a local MAS.
The operating system release must be MVS/ESA 5.2.
ARM must be active in the MVS image.
The CICS release must be CICS/ESA 4.1 or greater.
The CICS region must have registered with ARM during initialization.
The current ARM policy must allow the region to be restarted.
If all the above criteria are true, the CICS region is terminated by internally issuing the following MVS command:
CANCEL name,ARMRESTART
The specification of ARMRESTART tells ARM to become involved. The CANCEL command is issued in the CMAS to which the CICS region is connected as a local MAS. There is no interface for attempting an ARM restart for a CICS region connected as a remote MAS.
The IMS environments supported are TM-DB, DCCTL, DBCTL, and XRF. DL/1, DBB batch, and the IMS utilities are not supported.
The IMS control region is the only region restarted by ARM. The DL/1 SAS and DBRC regions are started internally by the IMS control region. IMS dependent regions are not automatically restarted, as these are normally restarted by some form of automation after the IMS control region has restarted.
The element name that IMS registers with ARM is the IMSID. The IMSID should be unique across the sysplex, because ARM attempts to move IMS to a surviving system if the system IMS is executing on fails. If the IMSID is not unique, ARM may move the IMS from the failing system to one that already has an IMS with the same IMSID.
The element type IMS specifies when registering with ARM is SYSIMS.
The default is for IMS to register with ARM and allow ARM to restart IMS in case of failure. The following startup parameter has been added to allow the user to stop IMS from registering with ARM:
ARMRST=Y   allow ARM to restart IMS (default)
ARMRST=N   do not allow ARM to restart IMS
In an XRF environment, when the backup IMS (alternate) has started the tracking phase, the active IMS system is deregistered from ARM and is not automatically restarted. This is necessary to ensure that the active system does not automatically restart after the backup takes over. If the old active
IMS was allowed to restart, the integrity of the IMS log (OLDS) and message queues could be destroyed.
If IMS abends before it completes restart (the XRF tracking phase is considered restart complete for an XRF backup), it deregisters from ARM and is not automatically restarted. If IMS is cancelled, IMS is not automatically restarted by ARM unless the ARMRESTART option was specified on the CANCEL or FORCE command. IMS maintains a user abend table and deregisters from ARM any time one of the abends in this table is experienced. The abend codes currently in this table are:
MODIFY
/CHE ABDUMP
/SWITCH
QUEUES FULL
QUEUE I/O ERROR
CICS TAKEOVER
All of these abends are either a result of operator intervention or require some external changes before IMS can be restarted.
If any call to ARM fails, IMS issues a warning message and continues to execute. The message is:
DFS0403W IMS xxxxxxxxx CALL TO MVS ARM FAILED RETURN CODE=nn, REASON CODE=nnnn

The values for xxxxxxxxx are:
REGISTER    register with ARM
READY       tell ARM that IMS is ready to accept work
ASSOCIATE   tell ARM that this is an XRF alternate
UNKNOWN     unknown request sent to DFSARM00
IMS enables an ENF listening exit for ENF signal 38. This is the signal value that ARM uses to indicate that it has suffered a failure of some kind, at which point IMS deregisters. The ENF signal is issued again when the ARM failure condition has been corrected, and IMS reregisters. If IMS is being restarted by ARM, it ignores the AUTO=NO value in the IMS start parameters. ARM indicates whether this is an XRF alternate or not, so IMS does not need the restart command to know how to start.
Using an Automatic Restart Policy: As with other subsystems, DB2 will be restarted in the event of failure, as specified in either the default ARM policy or an installation-written one. In any policy, the job or STC is referred to as an element. In a data sharing group, the element name is the concatenated DB2 group name and member name (such as DSNDB0GDB1G). Wild cards (such as DSNDB0G*) can be specified if a single policy statement is to be used for all members in the group.
To specify that DB2 is not to be restarted after a failure, RESTART_ATTEMPTS(0) should be included in the policy for that DB2 element.
Command Prefixes: DB2 Version 4 includes support for one- to eight-character command prefixes. A command prefix replaces the existing subsystem recognition character (SRC) for recognizing commands, and is used in message displays as well.
To use multiple-character command prefixes with DB2 V4 the IEFSSNxx subsystem definition statements in SYS1.PARMLIB must be updated. The subsystem definition statement is changed to allow the command prefix to be specified. The format of the SYS1.PARMLIB IEFSSNxx subsystem definition statement to define the command prefix is as follows:
ssname,DSN3INI,'DSN3EP,prefix,scope,group-attach'
where:
ssname: the one- to four-character DB2 subsystem name.
prefix: the one- to eight-character command prefix.
scope: the one-character scope for the command prefix. DB2 Version 4 registers its command prefix with MVS; the scope of the command prefix is controlled by the value chosen:
M registers the prefix at IPL, with system scope (one MVS system).
X registers the prefix at IPL, with sysplex scope.
S registers the prefix with sysplex scope at DB2 startup ("started" scope) instead of at MVS IPL time.
It is recommended that S be chosen. This allows for a single IEFSSNxx parmlib member to be used by all MVS systems in the sysplex. It also simplifies the task of moving a DB2 from one system to another; DB2 can be stopped on one system and started on another. There is no need to re-IPL the system. For more information about the command prefix facility of MVS, see MVS/ESA SP V5 Planning: Operations , GC28-1441.
group-attach: the group attachment name. This is specified on installation panel DSNTIPK.
Here is an example definition of a subsystem with a name of DB1G and a started scope command prefix of -DB1G.
DB1G,DSN3INI,'DSN3EP,-DB1G,S,DB0G'
The existing one-character subsystem recognition character can continue to be used as the command prefix. This means the existing IEFSSNxx definitions can be used whilst migrating to DB2 Version 4. These one-character command prefixes are given a started scope. To change the command prefix parameters, the IEFSSNxx entry must be changed and the host system IPLed. Unless circumstances dictate otherwise, to minimize the requirement for IPLs, system or sysplex-wide, code the scope as S.
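Once registered, the command prefixes in effect can be verified with the DISPLAY OPDATA operator command, which lists each prefix and its owning system (a sketch; output details vary by release):

D OPDATA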
2.18 JES3
This section discusses the JES3 items required to achieve high availability in a JES3 complex.
2.18.1 Planning
To achieve the goal of continuous availability in a parallel sysplex environment, an installation must configure the hardware and software such that no planned or unplanned outage will disrupt all systems in the sysplex at the same time. Planned outages include installing new software releases or hardware upgrades, or changing the configuration. JES3, even at the current Version 5.2.1, requires
a concurrent restart of all systems in the JES3 complex for the following changes:
- Any addition of MAINs to the initialization deck
- Any addition of RJPs to the initialization deck
- Any change in the JES3-managed device configuration
- Any change in the JES3 user exits
- Any upgrade of JES software maintenance or release
Main Definitions: Adding a MAINPROC statement to the JES3 initialization deck requires a JES3 complex-wide warm start. If the JES3 complex maps one-to-one with the sysplex, this means a sysplex-wide disruption, because today the IPL and warm start of the systems in the JES3 complex must be concurrent. Prepare for future additions of mains by coding additional MAINPROC statements. The cost is a small amount of additional storage for the control blocks that support the extra processor definitions, and potentially some operational confusion when displays contain the names of non-existent mains, as shown in Figure 23.
*I S
IAT5619 ALLOCATION QUEUE    = 00000   BREAKDOWN QUEUE = 00000
IAT5619 SYSTEM SELECT QUEUE = 00000   ERROR QUEUE     = 00000
IAT5619 SYSTEM VERIFY QUEUE = 00000   FETCH QUEUE     = 00000
IAT5619 UNAVAILABLE QUEUE   = 00000   RESTART QUEUE   = 00000
IAT5619 WAIT VOLUME QUEUE   = 00000   VERIFY QUEUE    = 00000
IAT5619 ALLOCATION TYPE = AUTO
IAT5619 CURRENT SETUP DEPTH ALL PROCESSORS = 00000
IAT5619 MAIN NAME  STATUS          SDEPTH    DASD
IAT5619 SC50       ONLINE IPLD     020,000   00208,00000
IAT5619 SC49       ONLINE IPLD     020,000   00208,00000
IAT5619 SC43       ONLINE IPLD     020,000   00208,00000
IAT5619 SCNEW1     ONLINE NOTIPLD  020,000   00208,00000
IAT5619 SCNEW2     ONLINE NOTIPLD  020,000   00208,00000

Figure 23. JES3 Setup Inquiry Display Showing Pre-Defined But Not IPLed Mains (SCNEW1 and SCNEW2)
Note that it is possible to change the names of mains in the initialization deck without requiring a warm start.
RJP Workstation Definitions: The same considerations as those discussed above for adding JES3 mains apply to RJP workstation definitions. Any addition of RJPWS definitions to the JES3 initialization deck requires a JES3 complex-wide IPL and warm start. Therefore, it is necessary to plan ahead and define additional RJPs in advance.
JES3-Managed Devices: In the MVS/ESA SP Version 5.2 parallel sysplex environment, there may be good reasons, as discussed in 2.18.2, JES3 Sysplex Considerations on page 89, for removing tape and DASD devices from JES3 control. Once these devices are no longer JES3-managed, they can be dynamically added to and deleted from the configuration without the requirement for a JES3 complex-wide warm start. Note that as of JES3 Version 5.2.1, JES3 no longer supports JES3 operator consoles; that is, the JES3 initialization deck CONSOLE statement is ignored.
MVS/ESA 5.2 Shared Tape: MVS/ESA SP Version 5.2 introduces allocation support for sharing tapes between multiple systems in the sysplex. Until now, JES3 has always managed the sharing of tapes between systems in the JES3 complex. In order to manage tape sharing, the tape devices were defined to JES3 in the JES3 initialization deck as JES3-managed devices; that is, the tapes were identified to JES3 on the DEVICE statement of the initialization deck. In a JES3 parallel sysplex environment, it is necessary to choose between having JES3 manage tapes and the new MVS/ESA SP Version 5.2 shared tape support. A tape device cannot be both auto-switchable and JES3-managed at the same time, as shown in Figure 24 on page 90. Note that once a device has been varied online to a JES3 system, that device remains under JES3 control for the life of the IPL and, as the example in Figure 24 shows, cannot be used as an auto-switchable device for the remainder of the IPL.

The advantages of JES3-managed tape devices include:

- JES3 soft allocation: JES3 performs setup, or soft allocation, of the devices required for a job before it begins execution.

The disadvantages of JES3-managed tape devices include:

- No dynamic I/O reconfiguration support
- Loss of I/O symmetry across all systems in the sysplex
- A JES3 complex-wide warm start required to change the configuration
Consoles: In JES3 Version 5.2.1, JES3 consoles no longer exist, and their definitions should be removed from the JES3 initialization stream.
-D U,TAPE,,000,64
 IEE457I 11.05.28 UNIT STATUS 086
 UNIT TYPE STATUS     VOLSER  VOLSTATE
  . . .
 0B38 349S OFFLINE-AS
 0B39 349S OFFLINE-AS
 0B3A 349S OFFLINE-AS

*V B3A ONLINE
 IEE302I 0B3A ONLINE
 IEF259I UNIT 0B3A IS NO LONGER DEFINED AS AUTOSWITCH
 IAT5510 0B3A VARIED ONLINE ON GLOBAL

-D U,,,B3A,1
 IEE457I 11.22.08 UNIT STATUS 102
 UNIT TYPE STATUS     VOLSER  VOLSTATE
 0B3A 349S O -M               /REMOV

*V B3A OFF SC50
 IAT8180 0B3A VARIED OFFLINE TO JES3 ON SC50
 IEF281I 0B3A NOW OFFLINE

-D U,,,B3A,1
 IEE457I 11.56.08 UNIT STATUS 253
 UNIT TYPE STATUS     VOLSER  VOLSTATE
 0B3A 349S F-NRD              /REMOV

-V B3A,AS,ON

Figure 24. A Tape Device Cannot Be Both JES3-Managed and Auto-Switchable
XCF Group Name: The fencing of each JES3 complex within the sysplex is defined by the XCF group name. The XCF group name may be explicitly coded on the OPTIONS statement in the JES3 initialization deck or, as is recommended, allowed to default to the node name of the NJE home node, that is, the node where HOME=YES is coded. A portion of a sample JES3 initialization deck is shown in Figure 25. The XCF group name for this JES3 complex is WTSCPLX9.
*--------------------------------
*   NJE NODE DEFINITIONS
*--------------------------------
NJECONS,CLASS=S12,SIZE=128
NJERMT,NAME=WTSCPLX9,HOME=YES
NJERMT,NAME=WTSCPLX1,TYPE=SNA
NJERMT,NAME=WTSCMXA,TYPE=SNA
NJERMT,NAME=C5JES3,PATH=WTSCMXA
NJERMT,NAME=C2JES2,PATH=WTSCMXA
NJERMT,NAME=C5JES2,PATH=WTSCMXA
NJERMT,NAME=C2JES3,PATH=WTSCMXA
NJERMT,NAME=WTSCPOK,PATH=WTSCMXA
NJERMT,NAME=WTSCPLX1,PATH=WTSCMXA
*--------------------------------

Figure 25. NJE Node Definitions Portion of JES3 Init Stream
Command Prefix: The JES3 command prefix requires careful consideration in this environment. If there is a single JES3 complex within the sysplex (JES3PLEX=SYSPLEX), then * will be the default for PLEXSYN. If there is more than one JES3 complex within the sysplex, then it is necessary to change all JES3 initialization decks to specify a PLEXSYN value other than *, and to specify a value of * for the SYN parameter. Both the PLEXSYN and SYN parameters are specified on the CONSTD statement in the JES3 initialization deck.
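As a sketch, a JES3 complex that must coexist with another complex in the same sysplex might code the following on its CONSTD statement; the % prefix character is purely illustrative, and any suitable non-* character chosen by the installation would serve:

CONSTD,PLEXSYN=%,SYN=*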
JES3 Proc: The parallel sysplex philosophy and cloning support eliminate the need to keep individual copies of critical data sets and definition libraries for each system in the sysplex. With this in mind, it is recommended to maintain a single SYS1.PROCLIB for all systems in the sysplex, and to take advantage of the cloning support to tailor the JES3 proc for the different globals within the sysplex. Figure 26 provides an example of how the JES3 proc may be coded to accommodate its use by multiple JES3 globals within the sysplex.
//JES3     PROC JES=JES3,ID=01
//JES3     EXEC PGM=IATINTK,DPRTY=(15,15),TIME=1440,REGION=0M
//STEPLIB  DD DSN=SYS1.JES3LIB,DISP=SHR
//* ----------------------------------------------------*
//*
//*   JES3 PROCEDURE: JES3
//*
//*   RELEASE:  ALL RELEASES
//*   CONFIG:
//*   STEPLIB:  SYS1.JES3LIB
//*   CHKPT:    SYS1.JES3CKPT
//*             SYS1.JES3CKP2
//*   DISK RDR: SYS1.JES3DR
//*   JCT:      SYS1.JES3JCT
//*   SPOOL:    SYS1.JESPACE
//*   JES3OUT:  ON-LINE PRINTER
//*   DUMPS:    ON-LINE PRINTER
//*   PROCLIB:  NONE DEFINED (DYNALLOC REQUIRED)
//*   JSMSSTAB: SYS1.JES3MSS
//*   INISH:    SYS1.JES3IN
//*
//* ----------------------------------------------------*
//CHKPNT   DD DSN=SYS1.&JES.CKPT,DISP=SHR
//CHKPNT2  DD DSN=SYS1.&JES.CKP2,DISP=SHR
//JES3DRDS DD DSN=SYS1.&JES.DR,DISP=SHR
//JES3JCT  DD DSN=SYS1.&JES.JCT,DISP=SHR
//SPOOL1   DD DSN=SYS1.&JES.SPL1,DISP=SHR
//JES3OUT  DD DSN=SYS1.&JES.OUT,DISP=SHR
//JES3SNAP DD DUMMY
//SYSMDUMP DD DSN=SYS1.&JES.DUMP,DISP=SHR
//JESABEND DD DUMMY
//JES3IN   DD DSN=SYS1.PARMLIB(JES3IN&ID),DISP=SHR
//*
Figure 26. Sample JES3 Proc for Use by Multiple Globals
- CICS/ESA V4
- CICSPlex SM V1
- IMS/ESA V5
- DB2 V4
- DFSMS 1.3 (VSAM RLS)
- TSO/E 2.4
- NetView V3.1
- AOC/MVS
- OPC/ESA
- VTAM 4.2
These products generally fall into one of the three following categories:
Just as with the hardware, redundancy of software subsystem components is one of the keys to providing continuous availability. In a parallel sysplex, you must have the capability to detect and route work around unavailable resources; this is the job of a transaction manager such as CICS. The database managers, such as DB2, provide concurrent access to the data. The system management products are needed to help simplify the complexity of the coupled environment. All of these products must be configured in such a way that they do not represent a single point of failure. Also, any instance of a product must be able to be added to or removed from the system without impact to end-user availability.
CICS/ESA Version 4 allows MRO operation between systems using XCF. With CICS/ESA Version 3, a dynamic transaction routing exit was added allowing routing decisions to be made based on programmable criteria. CICSPlex SM provides sophisticated routing algorithms to perform workload balancing and failure avoidance.
VTAM generic resource capability along with CICS 4.1 or higher allows the terminal user to log on to any of the terminal owning regions (TORs). This enables VTAM to perform dynamic balancing of the sessions across the available terminal-owning regions, thus removing a potential single point of failure. The terminal-owning regions can in turn perform dynamic workload balancing using the CICS dynamic transaction routing facility, which leads to improved availability for individual transactions. Dynamic transaction routing is controlled through the CICSPlex SM product.
There are two ways to handle such a transaction affinity:

- Customize CICSPlex SM to recognize the affinity and route to the appropriate AOR.
- Modify the application to remove the affinity, and so remove the single point of failure from the AOR for those transactions.
- Support for VSAM files and data tables
- CSD management
- Dynamic addition of MRO connections
- Autoinstall for programs, mapsets, and partitionsets
- Autoinstall for LU6.2 parallel sessions
STGPROT=NO indicates that storage protection is not active and storage is acquired in key 8, as for previous releases.
STGPROT=YES indicates that storage protection is active and that storage will be acquired in storage protect key 9.
3.2 CICSPlex SM V1
IBM CICSPlex System Manager (CICSPlex SM) is a system-management tool that provides the following functions:
- A real-time single-system image
- A single point of control
- Automated workload management
- Automated exception reporting for CICS resources
- Collection of statistical data for CICS resources
A single-system image means that the CICSPlex SM operator can manage multiple CICS systems, distributed across the parallel sysplex, as if they were one system. A single command is sufficient to make changes throughout the CICSplex. CICSPlex SM can balance the enterprise workload dynamically across the available AORs, thereby enabling you to manage a variable workload without operator intervention. CICSPlex SM routes transactions away from busy regions and from those that are failing or likely to fail, giving improved throughput and availability to the end user. Furthermore, planned service is made much easier: a CICS region can be taken down for maintenance without having to worry about that region's work, because CICSPlex SM can simply route the work dynamically to another region.
RTA resource monitoring evaluates the status of any CICS resource. External notifications are issued when the resource moves outside of the declared status range. For example, RTA resource monitoring can warn you that dynamic storage area (DSA) free space is falling, that a file is disabled, that a journal is closed, that the number of users of a transaction is growing, and so on. Once the exception is detected, RTA can issue an SNA generic alert to NetView, thereby allowing NetView to take corrective action. External messages, which are directed to the console by default, can be intercepted for automation by other products. For example, external messages may be intercepted and processed by AOC CICS automation.
Figure 28. CICSPlex SM. The CAS in each system communicates with the CMASs to collect information and present a single view of the CICSplex.
Figure 29. Sample IMS 5.1 Configuration. Each MVS image contains an IMS subsystem, IRLM, shared Recon data and a shared database.
When migrating to an IMS data sharing environment, the following items need to be considered:
- Set up your IMS RESLIBs so you can clone your IMS subsystems across the parallel sysplex.
- Ensure that the IMSID is unique for each IMS subsystem in the sysplex, so that the IMS subsystem can be moved to any MVS image if necessary.
- Ensure that all terminal names, LU names, and ETO user IDs in the network are unique.
- Divide the network to balance your workload and to minimize the network outage if you lose one of your IMS subsystems.
- Convert batch jobs to BMP programs, to minimize the number of connections to the coupling facility.
3.3.3 IMSIDs
Having unique IMSIDs lets you move your IMS subsystems to another MVS image when necessary. The IMSID also has to be different from any non-IMS subsystem identifier defined to the MVS under which IMS is running. The IMSID specified in the IMSCTRL macro (part of stage 1) can be overridden at execution by specifying a keyword in the DFSPBxxx member or a parameter on the EXEC statement.
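For illustration only, the override could be placed in the DFSPBxxx member referenced at startup; the member suffix and IMSID value below are hypothetical:

Member DFSPB00A:
  IMSID=IMSA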
To participate in the same data sharing group, each IRLM must:

- Have a unique IRLMID
- Specify the same data sharing group name in the group parameter
- Specify the same lock structure using the LOCKTABL parameter
Although you can specify these definitions on the IRLM startup procedure, the recommended method is to define them using the CFNAMES control statement.
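A minimal sketch of such a control statement follows; the structure names are hypothetical, and the CFIRLM, CFOSAM, and CFVSAM keywords name the IRLM lock structure and the OSAM and VSAM buffer invalidation structures, respectively:

CFNAMES,CFIRLM=IRLMLK01,CFOSAM=OSAMC01,CFVSAM=VSAMC01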
Note: The SVC utility does not remove the need to add the IMS SVC to the MVS nucleus. Every MVS IPL regresses the system to the old IMS SVC, and the utility must then be used to reinstall the new IMS SVC.
- MSDBs
- DEDBs with SDEPs
- DEDBs using the Virtual Storage Option (VSO)
A possible solution for these databases is to place them in only one system and have the transactions routed using Multiple System Coupling (MSC) to that IMS. Another solution would be to convert these databases.
Figure 30. Sample DB2 Data Sharing Configuration. Each MVS image contains a DB2 subsystem and an IRLM; the database is shared.
Once a DB2 data sharing group has been established, you can stop and start individual members in the data sharing group while the other members continue to process. You also can configure a new DB2 subsystem into the group without affecting the existing members.
- Run all systems performing RLS as a sysplex.
- Define and activate sharing control data sets (SHCDS).
- Define CF cache and lock structures to MVS, using the coupling facility resource manager (CFRM) policy, and to the SMS base configuration.
- Associate CF cache set names with storage class definitions, and write routines to associate data sets with storage class definitions that map to cache structures.
- Change the attributes for a data set to specify whether the data set is recoverable or nonrecoverable. Specify LOG(NONE) if the data set is nonrecoverable; specify LOG(UNDO) or LOG(ALL) if the data set is recoverable.
Figure 31 on page 107 shows the major components involved in VSAM RLS.
Figure 31. Sample VSAM RLS Data Sharing Configuration. Each SMSVSAM Address space has access to the coupling facility which contains the lock and cache structures.
The sharing control data sets record the following information:

- The name of the CF lock structure in use
- The system status for each system or failed system instance
- The time that the system failed
- A list of subsystems and their status
- A list of open data sets using the CF
The VSAM sharing control data sets are logically-partitioned, linear data sets. They can be defined with secondary extents, but all the extents for each data set must be on the same volume. You should define at least three sharing control data sets, for use as follows:
- VSAM requires two active data sets for use in duplexing mode.
- VSAM requires the third data set as a spare in case of failure of one of the active data sets.
- Place the SHCDSs on volumes with global connectivity. VSAM RLS processing is available only on those systems that currently have access to an active SHCDS.
- Ensure that the space allocation for active and spare SHCDSs is the same.

See the DFSMS/MVS Version 1 Release 3 DFSMSdfp Storage Administration Reference for more information about sharing control data sets and for sample JCL for defining them.
- LOG(NONE): no recovery required; the sphere is not recoverable.
- LOG(UNDO): backout required; the sphere is recoverable.
- LOG(ALL): backout and forward recovery required; the sphere is recoverable.
It is strongly recommended that LOG(ALL) be specified when defining the VSAM data sets, so that CICS and VSAM can provide a high-availability environment with forward recovery and backout capabilities. With LOG(ALL), you must specify in the ICF catalog the name of the MVS log structure that CICS is to use as the forward recovery log. This log stream name must match the name of a log stream defined to the MVS system logger.
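As a sketch (the data set and log stream names are hypothetical), the recoverability attributes can be set with IDCAMS, which in DFSMS/MVS 1.3 supports the LOG and LOGSTREAMID parameters for this purpose:

//ALTRLS   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  ALTER PROD.CUSTOMER.KSDS -
        LOG(ALL) -
        LOGSTREAMID(CICSUSER.FWDRECOV.LOG01)
/*

The log stream named in LOGSTREAMID must already be defined to the MVS system logger.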
1. CFRM policy update. You define the coupling facility cache structures through the XES coupling definition process.
2. SMS configuration and ACS routine updates. From an SMS point of view, the following actions are required:
- Update the base configuration. The base configuration section now includes the list of cache set names. This new list has an entry for each unique cache set name specified in the storage class. Each cache set name has an associated list of up to eight coupling facility cache structure names.
- Set up the storage class construct. The CACHE SET parameter has been added to the storage class construct. The cache set name is used to map the storage class to a cache set that you specify in the base configuration.
- Update the storage class ACS routine. You will be required to update the current storage class routine to match the new configuration.
Additional Information: For further details, refer to DFSMS/MVS 1.3 Implementation Guide, GG24-4391.
- Share structures between MVS images. This provides immediate logstream recovery for the logs used by a failing image; otherwise, recovery is delayed until the next time a system connects to the failed logstream.
- Use a standard naming convention for the log structures that equates the structure to the type of logstream. For example:

  LOG_DFHLOG_001     CICS system logs
  LOG_DFHSHUNT_001   CICS secondary system logs
3.7.1 NetView
In multisystem environments today, the recommendation is to have one of the systems act as a focal point system to provide a single point of control, where the operator is notified of exception conditions only. This is still the case in a parallel sysplex. Another recommendation is to automate as close to the source as possible. This means having both NetView and AOC/MVS installed on all systems in the parallel sysplex, to enable automation to take place on the system where the condition or message occurs. For continuous availability, a backup focal point NetView should also be planned for in your configuration. This allows the current focal point system to be taken out of the parallel sysplex for planned outages. In addition, unplanned outages on the focal point will not render the parallel sysplex inoperable. It is recommended that the NetView focal point and the focal point backup exist on the two VTAM network nodes in the parallel sysplex. Refer to 3.8, VTAM on page 112 for information about the VTAM configuration.
3.7.2 AOC/MVS
One of the principal tools for system automation is IBM SystemView Automated Operations Control/MVS (AOC/MVS). AOC/MVS extends NetView to provide automated operations facilities that improve system control. With Release 4 of AOC/MVS, support is provided for the Automatic Restart Management (ARM) function: failing applications can be automatically restarted. With improved status coordination between focal point and target systems, the status of monitored resources can be accurately reflected on the
AOC/MVS graphical interface. Operators can take prompt recovery actions for outages.
3.7.3 OPC/ESA
OPC/ESA already provides automation for planning, controlling, and managing the batch workload across multiple MVS systems, so there is little change when these systems form a parallel sysplex.
3.8 VTAM
VTAM can be configured in a multitude of ways, both inside and outside the sysplex. It is not our intention to describe all of the possible VTAM configurations; only the items that apply to availability will be discussed.
3.8.1 Configuration
The generic resources function is provided by a sysplex using Advanced Peer-to-Peer Networking (APPN). You need VTAM 4.2 for generic resources support, and at least one VTAM in the sysplex must be an APPN network node, with the other VTAMs being APPN end nodes. Each VTAM must be connected to the coupling facility and be part of the same sysplex. A high availability environment requires more than one APPN network node in the sysplex.
The kind of software setup you need in order to accomplish this is also discussed.
Changes can be introduced in an orderly manner, with all personnel aware of, and ready to respond to, the change. The introduction of the change can be tied to the capacity planning and performance management disciplines to ensure that sufficient resources exist to support the change. The change management process forces more detailed planning. This reduces problems and, rather than time being spent reacting to problems, allows either more time to implement additional planned changes or the consolidation of changes to reduce the number of planned outages.
The change management process must ensure that all changes are tracked adequately. With the philosophy in a parallel sysplex being that change is introduced in one place and then propagated through the sysplex over a period of time, you must be able to determine, at any point in time, what changes have been implemented on what elements within the parallel sysplex.
- The only tasks left for operators are those that cannot be automated.
- Operators are alerted only to exception conditions requiring them to take some action.
- Operators are aware of the status of the sysplex.
This table contains a list of strongly recommended references:

- 2.17.4.2, ARM and IMS on page 84
- 1.12.6.2, Hardware Requirements on page 24
- 9.13.1, Sysplex (XCF) Couple Data Set Failure on page 206
4.1.5 Summary
The bottom line is that if the increased availability potential of a parallel sysplex is to be realized, then the installation's system management disciplines need to be:
- In place
- Of a very high standard
- Adhered to rigorously
- Reviewed regularly and updated accordingly
From the active CFRM policy:

- Structure name
- Structure initial size
- Structure maximum size
From the exploiter's internal code:

- Structure type (cache, list, or lock)
- Requested volatility state
- Structure disposition
- Permission to rebuild
- Permission to alter size
- Apportioning specifications, such as:
  - Directory-to-data-element ratio for a cache structure
  - List-entry-to-list-element ratio for a list structure
  - Numbers of lock entries and users for a lock structure
- The level of CFCC required (CFLEVEL 0 or CFLEVEL 1 for the time being)
Further details can be found in Programming: Sysplex Services Guide, GC28-1495. The allocation of the structure is performed as per the current CFRM policy preference and exclusion lists for the structure. The selection process points to the first available coupling facility in the preference list that satisfies the following:
- It has connectivity to the system on which the request is made.
- It has a CFLEVEL equal to or greater than the requested CFLEVEL.
- It meets the volatility requirement.
- It meets the failure independence requirement.
- It does not contain structures in the exclusion list.
- It has the requested space available.
If no coupling facility meets the above criteria, XES goes through the allocation selection again, relaxing some criteria:
- It ignores the exclusion list.
- It ignores the failure independence requirement.
- It ignores the volatility requirement.
- It returns a structure in the coupling facility with the most storage that meets or exceeds the CFLEVEL requirement.
Notes:

1. A structure with active or failed-persistent connections cannot be deallocated. The connections must be put in the undefined state first. See 5.2.2, Connection State and Disposition on page 119.
2. A structure with a related dump still in the coupling facility dump space cannot be deallocated. The dump has to be deleted first. See 5.3, Structure Dependence on Dumps on page 120.
3. Because of the risk of data loss, care must be taken when using the SETXCF FORCE command. All consequences must be well understood before issuing the command.

IBM Exploiters Using Structures with a Disposition of DELETE:
- XCF signalling structures
- RACF database structures
- Automatic tape switching structure
- System logger logstream structures
- DB2 GBP structure
- IMS/DB OSAM and VSAM caches
- VTAM generic resource name structure
Undefined means that the connection is not established. Active means that the connection is currently being used. Failed-persistent means that the connection has abnormally terminated but is logically remembered, although it is not physically active.
At connection time, another parameter in the IXLCONN macro indicates the disposition of the connection. A connection can have a disposition of KEEP or DELETE. A connection with a disposition of KEEP is placed in what is called a failed-persistent state if it terminates abnormally, that is, without a proper completion of the exploiter task. When in the failed persistent state, a connection will become active again as soon as the connectivity to the structure is recovered. The failed-persistent state can be thought of as a place holder for the connection to be recovered. Note that in some special cases a connection with a disposition of KEEP may be left in undefined state even after an abnormal termination. A connection with a disposition of DELETE is placed in an undefined state if it terminates abnormally. When the connectivity to the structure is recovered, the exploiter has to reestablish a new connection.
To check for structure and connection state and disposition, refer to Appendix B, Structures, How to ... on page 241.
The access to the structure by exploiters is delayed until the capture of the dump data is complete. Although the dump space defined in the coupling facility is intended to capture dump data without holding the structure's exploiters for too long a time, a dump serialization time limit is specified as a parameter of the IXLCONN macro. When this time limit is reached, the dump function is terminated and the structure is released. This limit can be further enforced or overridden by DUMP command parameters. A structure dump residing in the coupling facility dump space prevents structure deallocation until it is transferred onto the dump data set or deleted using the SETXCF FORCE command. The SETXCF FORCE command allows the following:

- Deletion of a structure dump
- Release of the dump serialization for a structure
2. Another way to move a structure is to use the rebuild function against the structure, either by an operator-initiated rebuild (using the SETXCF START,REBUILD command) or by a dynamic rebuild request issued by the structure exploiter during its recovery process (using the IXLREBLD macro). The structure rebuild function can be explicitly required to rebuild the structure in another coupling facility. For further details, refer to 5.4.1, The Structure Rebuild Process.

Important Notice: Not all structures can be rebuilt. A structure can be originally allocated with rebuild disallowed; in such a case all requests to rebuild the structure will be denied, and structure movement will have to be performed by deallocation and reallocation.
The rebuild process can be started by:

- An operator-initiated request to rebuild
- A connected exploiter request because of:
  - Loss of connectivity to the structure
  - Structure failure
  - A specific exploiter reason (as per the specific exploiter's conventions)
The rebuild process can also be stopped (and therefore does not complete successfully) because of one of the following reasons:
- An operator-initiated request to stop the process
- A connected exploiter request because of:
  - Loss of connectivity to the original structure while rebuilding
  - Original structure failure while rebuilding
  - A specific exploiter reason
When the rebuild process completes successfully, the original instance of the structure is deallocated and the processing resumes using the new instance of
the structure. When the rebuild process does not complete successfully, the new instance of the structure is deallocated, and the original instance remains as it was when entering the rebuild.
To rebuild a structure in the first matching coupling facility in the structure's active preference list (this will probably rebuild the structure in the same coupling facility, unless the preference list has been changed since structure allocation):
SETXCF START,REBUILD,STRNAME=strname,LOCATION=NORMAL
To rebuild a structure as per the active preference list but excluding the current coupling facility:
SETXCF START,REBUILD,STRNAME=strname,LOCATION=OTHER
Rebuild can also be invoked for the whole contents of a coupling facility:

SETXCF START,REBUILD,CFNAME=cfname

A rebuild in progress can be stopped for a single structure or for the whole coupling facility:

SETXCF STOP,REBUILD,STRNAME=strname

or

SETXCF STOP,REBUILD,CFNAME=cfname
Structure Rebuild Affects Performance: The utilization of the structure is suspended for the complete duration of the rebuild process. This may temporarily affect system throughput if dealing with a heavily used structure.
The rebuild process allows changes to the following:

- The structure size, either by the rebuilder's decision or because of a modification to the active CFRM policy
- Attributes set up by the requestor code, such as:
  - Request for a nonvolatile coupling facility
  - Directory-to-element ratio for a cache structure
  - Entry-to-element ratio for a list structure
  - Lock entries for a lock structure
Details on structure allocation and rebuild can be found in Programming: Sysplex Services Reference, GC28-1496. Information on how IBM exploiters support rebuild can be found in Table 7.
Table 7. Support of REBUILD by IBM Exploiters

Exploiting function        Structure name                       Rebuild Supported
XCF Signalling             IXC.....                             Yes
System Logger              user defined name                    Yes
JES2 CKPT                  user defined name                    No
Shared Tape                IEFAUTOS                             Yes
RACF                       IRRXCF00                             Yes, but
IMS OSAM Cache             user defined name
IMS VSAM Cache             user defined name
IRLM lock table (IMS)      user defined name                    Yes
IRLM lock table (DB2)      groupname_LOCK1                      Yes
VTAM generic resource      ISTGENERIC (VTAM 4.2);               Yes
                           can be user defined (VTAM 4.3)
DB2 SCA                    groupname_SCA                        Yes
DB2 GBP                    groupname_GBP                        No
SMSVSAM                    user defined name
The size of an allocated structure can be altered by the operator with the SETXCF START,ALTER command:

SETXCF START,ALTER,STRNAME=strname
The structure cannot be expanded beyond the SIZE parameter value specified in the active CFRM policy. To check for the maximum size allowed use the following:
D XCF,STRUCTURE,STRNAME=strname
A structure can also be dynamically altered by a program using the IXLALTER service. The IXLALTER service allows modification to the structure size and the entry to element ratio attribute, and is intended to provide dynamic structure reapportionment capability.
XES accepts the ALTER request if all of the following are true:
- The SCP is MVS 5.2 or higher.
- The structure to be altered is in a coupling facility with CFLEVEL=1 or higher.
- All currently active or failed-persistent connectors to the structure allowed structure alter when they connected.
- The structure is not already in the rebuild process.

Structure ALTER for a Persistent Structure:
A structure with a disposition of KEEP and no active or failed-persistent connectors can be altered.
A structure alteration can be stopped either by a connecting program or by the operator with the following command:
SETXCF STOP,ALTER,STRNAME=strname
Structure rebuild and structure alter can be thought of as complementary functions. The structure rebuild function allows the changing of many of the structure attributes but requires planning in that coupling facility space must be available for later rebuild use. Structure alter does not require additional space to be reserved beyond the maximum SIZE specified in the active CFRM policy and does not disrupt the processing of connectors to the structure while it is being altered. Information on how IBM exploiters support alter can be found in Table 8.
Table 8. Support of ALTER by IBM Exploiters

Exploiting function        Structure name                       Alter Supported
XCF Signalling             IXC.....                             Yes
System Logger              user defined name                    Yes
JES2 CKPT                  user defined name                    Yes
Shared Tape                IEFAUTOS                             No
RACF                       IRRXCF00                             No
IMS OSAM Cache             user defined name                    No
IMS VSAM Cache             user defined name                    No
IRLM lock table (IMS)      user defined name                    Yes
IRLM lock table (DB2)      groupname_LOCK1                      Yes
VTAM generic resource      ISTGENERIC (VTAM 4.2)                No
                           can be user defined (VTAM 4.3)       No
DB2 SCA                    groupname_SCA                        Yes
DB2 GBP                    groupname_GBP                        Yes
SMSVSAM                    user defined name                    Yes
If no active policy is currently available, the activation takes effect immediately. If a policy is already active, the transition to the new policy parameters may not occur immediately:

- When adding a new coupling facility, the preference list for each structure definition in the active policy has to be updated with the new coupling facility logical name. This updating takes some time, and it may prevent the operator from immediately using the new coupling facility logical name in commands.
- When changing the dump space size in the new policy, MVS attempts to change the dump space size immediately in the coupling facility and, if not successful, continues to attempt the change. Use the DISPLAY CF command to determine the dump space size in the policy and the dump space actually defined in the coupling facility.
- When deleting a coupling facility or structure, or when modifying a structure in the new policy, the following occurs:
  - The change takes effect immediately if the coupling facility resources are not allocated for the particular structure.
  - The change remains pending if coupling facility resources are allocated for the structure. The D XCF command provides specific information about a structure and any pending policy changes. The structure resources need to be deallocated either by operator command, such as SETXCF FORCE, or, if structure rebuild is allowed, SETXCF START,REBUILD can be used to rebuild a new instance of the structure as per the new parameters in the CFRM policy.
The addition of a coupling facility or structure takes effect immediately, assuming that the CFRM couple data set has free space available to record these new resources. If not, other resources have to be freed, or the CFRM couple data set must be reformatted to accommodate the new additional resources. Refer to 5.7, Reformatting the CFRM Couple Data Set on page 126 for the procedure to reformat a couple data set nondisruptively.

Examples of CFRM Policy Transitioning: See Appendix C, Examples of CFRM Policy Transitioning on page 249.
The size of the CFRM couple data set is determined by the following:

- The maximum planned number of policies to install in the couple data set
- The maximum number of structures, and the maximum number of connectors for any given structure
- The maximum number of coupling facilities in the installation
Should the size of the CFRM couple data set prove to have been planned incorrectly, the following procedure can be used to dynamically bring online a new couple data set with the appropriate size. Note that this procedure works only when increasing the size of the couple data set.

To Decrease the Size of the Couple Data Set: Decreasing the size of a couple data set cannot be done nondisruptively; an alternate couple data set smaller than the primary couple data set cannot be brought online concurrently. You must prepare the new couple data set and the new COUPLExx member, then IPL the sysplex using this new couple data set.
1. Run IXCL1DSU against a spare couple data set with the new couple data set specifications. 2. When the spare couple data set is formatted, use the command SETXCF COUPLE,ACOUPLE=(spare_dsname,spare_volume),TYPE=CFRM to make the spare couple data set a new alternate CFRM couple data set. Note: As soon as the spare couple data set has been switched into alternate, the new alternate couple data set will be loaded with the primary couple data set policies contents. 3. Then switch the new alternate into the new primary couple data set:
SETXCF COUPLE,TYPE=CFRM,PSWITCH
4. The previous primary couple data set is no longer in use, and can be enlarged by the same process before becoming, in turn, the new alternate couple data set.

Keep COUPLExx in Sync: It is recommended that the COUPLExx member be updated after swapping the couple data sets, so that operator intervention to retrieve the last used couple data sets is not required at the next IPL.
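As an illustration of step 1 of this procedure, a format job might look like the following minimal sketch. The data set name, volume, and ITEM counts are hypothetical; choose counts that reflect your own planning values:

//FMTCFRM  EXEC PGM=IXCL1DSU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFINEDS SYSPLEX(PLEX1)
    DSN(SYS1.XCF.CFRM02) VOLSER(XCFVOL)
    DATA TYPE(CFRM)
      ITEM NAME(POLICY) NUMBER(6)
      ITEM NAME(CF) NUMBER(4)
      ITEM NAME(STR) NUMBER(64)
      ITEM NAME(CONNECT) NUMBER(16)
/*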
A new coupling facility may be brought into the configuration:

- As a new permanent device in a production or test configuration; in most cases this is expected to be an additional 9674 coupling facility.
- As a temporary alternate coupling facility while the primary coupling facility is being serviced. In this case, it is conceivable that the alternate coupling facility be a logical partition in one of the sysplex CPCs. Note that the latter requires CFR CHPIDs dedicated to the coupling facility logical partition. This configuration is not a recommended production configuration, since structure recovery can be seriously impacted if the CPC where the coupling facility and some MVS images cohabit were to fail.
Define the coupling facility via HCD: HCD must be used to define the logical partition for the coupling facility and to define connectivity between the coupling facility senders and receivers.

- Keep track of the partition number you specify for the logical partition, to be able to match it with the partition number in the CFRM policy.
- Specify the SIDE parameter for a CPC only in a physically partitioned configuration. The SIDE parameter is needed in order to define the coupling facility to one or the other physical side of the CPC. Use caution when splitting or merging physical sides of a processor that contains a coupling facility; the action might change the SIDE information that identifies the coupling facility.
Define the coupling facility logical partition:

- If the coupling facility is in an ES/9000 LPAR, the IOCDS must be reloaded from HCD to get the new LPAR information and to display the new partition on LPDEF at the next POR. Fill in the LPDEF and LPCTL definitions for the coupling facility LPAR.
- If the coupling facility is in a 9672 LPAR, configure the HMC environment with a Reset and Image profile. Then download the HCD IOCDS to the coupling facility service element through an MVS running on a CPC on the same SE LAN.
The coupling facility CPC must go through a power-on reset (POR) so that the new coupling facility logical partition is known. This is achieved by running CONFIG POR on a 9021/9121 CPC, and by activating the CPC via the proper reset profile on a 9672/9674 CPC. Then activate the coupling facility partition.
Use the following command to get the information required to set up a new CFRM policy.
D CF,CFNAME=xxxx
Define a new CFRM policy with the administrative data utility IXCMIAPU (a sample job follows this list):

- Define a new policy with the new coupling facility information.
- Associate the structures with the coupling facility.
- Define the amount of dump space in the coupling facility for dumping coupling facility structure data.
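A minimal sketch of such a policy definition follows. The policy name, coupling facility name, serial number, and structure name are hypothetical; take the CF identification values (TYPE, MFG, PLANT, SEQUENCE, PARTITION, CPCID) from the D CF display for the new coupling facility:

//DEFCFRM  EXEC PGM=IXCMIAPU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DATA TYPE(CFRM) REPORT(YES)
  DEFINE POLICY NAME(CFRMPOL2) REPLACE(YES)
    CF NAME(CF02)
       TYPE(009674)
       MFG(IBM)
       PLANT(02)
       SEQUENCE(000000040105)
       PARTITION(1)
       CPCID(00)
       DUMPSPACE(2048)
    STRUCTURE NAME(IXCSIG01)
       SIZE(10000)
       PREFLIST(CF02,CF01)
/*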
Activate the new policy, in order to start using the coupling facility, with the following command from any active system in the sysplex:

SETXCF START,POLICY,TYPE=CFRM,POLNAME=polname
Verify that each MVS image that requires connectivity is connected to the coupling facility. To obtain information about the system connectivity for the coupling facility, issue the following command and specify the name of the coupling facility:
D XCF,CF,CFNAME=name
From now on, any exploiter can connect to the structures defined in the coupling facility.
1. In the IRLM startup procedure, specify:

SCOPE=GLOBAL
LOCKTABL=name-of-the-structure
2. Specify the lock structure on the CFNAMES,CFIRLM= control statement in one of the following procedures:
- The VSPEC member (DFSVSMxx) in the IMS procedure
- The DFSVSAMP DD statement in the DLIBATCH or DBBATCH procedures
3. Specify the IRLM parameters during system definition to connect IMS to IRLM:
- IRLM=YES in the IMSCTRL macro
- IRLM=Y in the IMS, DBBATCH, or DLIBATCH procedure

Note: This specification overrides the specification in the IMSCTRL macro.
4. Ensure that the correct CFRM policy has been started; then start the IRLMs which are to use this structure and the DBMS they are connected to.
SHARELVL(3)
3. For the command to start IMS, specify the following application access:
ACCESS=UP
4. Specify the OSAM and VSAM buffer invalidate structures on the CFNAMES control statement in one of the following procedures:
- The VSPEC member (DFSVSMxx) in the IMS procedure
- The DFSVSAMP DD statement in the DLIBATCH or DBBATCH procedures
5. Ensure that the VSAM share option is (3 3).
6. Ensure that the correct CFRM policy has been started; then start the DBMSs that are to use this structure.
IXCxxxxx
The remaining characters (xxxxx) can be alphanumeric or national characters (&, #, or @).
3. For each MVS in the sysplex, specify the names of the structures in the STRNAME keyword for the appropriate PATHIN and PATHOUT statements in COUPLExx of SYS1.PARMLIB (a sample fragment follows).
4. Ensure that the correct CFRM policy is active; then the new PATHINs and PATHOUTs can be dynamically started by issuing the following from each MVS image:
SETXCF START,PATHIN,STRNAME=str_name
SETXCF START,PATHOUT,STRNAME=str_name
Or, if required for other reasons, the sysplex can be re-IPLed.
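A minimal COUPLExx fragment for step 3 might look like the following; the structure name is hypothetical, but, as noted above, it must begin with IXC:

PATHIN  STRNAME(IXCSIG01)
PATHOUT STRNAME(IXCSIG01)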
ISTGENERIC
If you are using VTAM 4.3, the structure name can be user-defined; in that case it must be specified to VTAM by the start option STRGR (refer to the VTAM 4.3 Network Implementation Guide).
2. Customize the generic resources exit routine if needed.
3. If you are using RACF or another security management product, authorize all the CICS TORs to register the generic resources name. To authorize CICS TORs to access a VTAM generic resource, you must:
   - Define a VTAMAPPL profile with the generic resources name as the VTAMAPPL name.
   - Authorize each CICS TOR with READ access to the VTAMAPPL profile.
4. After activating the new CFRM policy, re-initialize VTAM on each system.
5. Specify the generic resource name as a system initialization parameter in the system initialization table (SIT) or as an override for each CICS TOR that is a member of the generic resources set. To activate it, you must restart CICS on the system.
Sysplex-wide RACF commands operate whether you use RACF data sharing mode or not. Do not define more than one RACF data sharing group to the sysplex.
IRRXCF00_ayyy
where a is P for primary or B for backup, and yyy is the RACF database sequence number. You require a structure for the RACF primary database and one for the RACF backup database.
4. Use the data set range table (ICHRRNG) to determine on which data sets the RACF profiles are to reside.
5. For the first RACF database that initializes RACF data sharing, set the sysplex communication bit and the default mode bit in ICHRDSNT to indicate data sharing.
6. IPL all systems in the sysplex to activate RACF data sharing mode, or use the RACF command RVARY DATASHARE on each system that is enabled for RACF sysplex data sharing, and ensure that the CFRM policy is active.
7. To control RACF data sharing dynamically after IPL, use the following RACF command:
RVARY DATASHARE|NODATASHARE
- A set of DB2 target libraries
- A single DB2 catalog and directory
- All DB2 databases
- Log data sets and BSDS data sets that are to be shared
- User integrated catalog facility catalogs for shared databases
- All coupling facilities
3. Define the following structures for use with the DB2 data sharing:
- Cache structures for the total number of DB2 group buffer pools. (The group buffer pool consists of data pages and directory entries.)
- A lock structure for the total number of DB2 group members.
- A list structure for the DB2 SCA.
- For the LOGREC log stream, the name is SYSPLEX.LOGREC.ALLRECS.
- For the operations log stream, the name is SYSPLEX.OPERLOG.
2. Use the IXCL1DSU utility to format the LOGR couple data set.
3. Specify the name of the LOGR couple data set in COUPLExx.
4. Use the IXCMIAPU utility to define the log streams and structures to the coupling facility (a sample job follows this list).
5. Plan to use SMS-managed DASD for staging log stream data.
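As an illustration of step 4, a minimal sketch for the OPERLOG log stream follows. The structure name and buffer sizes are hypothetical and must be matched by a structure definition in the CFRM policy:

//DEFLOGR  EXEC PGM=IXCMIAPU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DATA TYPE(LOGR) REPORT(YES)
  DEFINE STRUCTURE NAME(OPERLOG)
         LOGSNUM(1)
         AVGBUFSIZE(512)
         MAXBUFSIZE(4096)
  DEFINE LOGSTREAM NAME(SYSPLEX.OPERLOG)
         STRUCTNAME(OPERLOG)
         HLQ(IXGLOGR)
/*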
Determine the size and number of the coupling facility cache structures depending on the following requirements:

- Number of available coupling facilities
- Amount of space available in each coupling facility
- Amount of data that will be accessed through each coupling facility
- Continuous availability requirements for coupling facility reconfiguration
- Performance requirements for various applications
- Determine the size of the coupling facility lock structure IGWLOCK00.
- Update the SMS configuration and the ACS routines to reflect the SMSVSAM environment and to map the storage classes to the cache sets specified in the base configuration.
- Define the sharing control data sets (SHCDS) to maintain data integrity in a shared environment. Consider converting the RESERVEs for the sharing control data sets.
- Define the cache and lock structures using a CFRM policy.
- Update the IGDSMSxx member.
- Update the CICS procedure for using VSAM RLS.
For the 9672 or 9674 processor family: The physical upgrade can be concurrent if the already-installed CFC adapter cards (FC 0014) have enough free slots to receive the link cards (FC 0007 or 0008). If this is not the case, physically adding adapter cards cannot be done concurrently.
For the 9021 processor family: The physical upgrade can be concurrent with coupling facility operations if sufficient CFC slots are available across both CPC sides, so that no additional adapter cards need to be plugged. The adapter cards cannot be plugged concurrently.
Proper consideration must be given to the increase in HSA size that could be incurred because of additional hardware such as CFRs.
D XCF,STR,STRNAME=ALL,STATUS=(FPCONN)
D XCF,STR,STRNAME=ALL,STATUS=(NOCONN)
If a failed-persistent connector or a structure with no connector exists, you should determine whether this is a normal state for the connector or structure to be in, and you should know what to do to resolve these conditions before going on with the coupling facility shutdown procedure.
D CF,CFNAME=cfname
IXL150I 08.23.52 DISPLAY CF 160
COUPLING FACILITY 009672.IBM.02.000000040104
                  PARTITION: 1  CPCID: 00
                  CONTROL UNIT ID: FFFE
NAMED CF01
COUPLING FACILITY SPACE UTILIZATION
 ALLOCATED SPACE             DUMP SPACE UTILIZATION
  STRUCTURES:     18944 K     STRUCTURE DUMP TABLES:       0 K
  DUMP SPACE:      2048 K          TABLE COUNT:            0
 FREE SPACE:     201216 K     FREE DUMP SPACE:          2048 K
 TOTAL SPACE:    222208 K     TOTAL DUMP SPACE:         2048 K
                              MAX REQUESTED DUMP SPACE:    0 K
 VOLATILE:  NO                STORAGE INCREMENT SIZE:    256 K
 CFLEVEL:   1
COUPLING FACILITY SPACE CONFIGURATION
                      IN USE       FREE      TOTAL
 CONTROL SPACE:      20992 K   201216 K   222208 K
 NON-CONTROL SPACE:      0 K        0 K        0 K
SENDER PATH    PHYSICAL   LOGICAL
     70        ONLINE     ONLINE
     72        ONLINE     ONLINE
SUBCHANNEL: 1696 1697 1698 1699
In our example, CHPIDs 70 and 72 are the sender ISC links used by this MVS image to connect to the coupling facility named CF01. Once the new policy has taken effect, the D CF,CFNAME=cfname command cannot be used to display information about the target coupling facility, because the coupling facility will no longer be in the active CFRM policy. However, the D CF command can still be used to see the physical connections to the target coupling facility. Since D CF displays information on all the coupling facilities, it is necessary to know the NODE, PARTITION, and CPCID to identify the target coupling facility.
3. Activate the new policy with the following command:
SETXCF START,POLICY,TYPE=CFRM,POLNAME=newpolicyname
4. Get the names of all the allocated structures in the target CF by issuing the following MVS command:
D XCF,CF,CFNAME=cfname
If no structures are allocated, you will receive message IXC362I with one of the following statements:
NO COUPLING FACILITIES MATCH THE SPECIFIED CRITERIA

or

NO STRUCTURES ARE IN USE BY THIS SYSPLEX IN THIS COUPLING FACILITY
5. Depending on the subsystems connected to the structures, you will be required to follow different procedures. Not all subsystems support structure rebuilding; for instance, DB2, RACF, and JES2 require particular actions. For these subsystems, follow the recommended procedure as described in 5.11.1, Coupling Facility Exploiter Considerations on page 138.
6. Move the remaining structures out of the target CF by attempting a rebuild of the structures. You can initiate a rebuild either for all the structures at once or for each structure individually, using one of the following commands:
SETXCF START,REBUILD,CFNAME=cfname,LOC=OTHER

SETXCF START,REBUILD,STRNAME=strname,LOC=OTHER
If a structure does not support rebuild, an IXC message will inform you. You can expect structure rebuild to take several minutes to complete.
7. Once the previous command has completed, check that no structures are still allocated in the target coupling facility with the following command:
D XCF,STR,STRNAME=strname
There are some situations that prevent the deallocation of structures as part of the structure rebuild process:
The function that owns the structure does not support rebuild. In this case, it may be necessary to bring down the application to deallocate the structure.
The structure has no connectors. A structure cannot be rebuilt without an active connector. If the function associated with the structure supports structure rebuild, initialize the function to obtain a connector to the structure and attempt a rebuild by issuing the following:
SETXCF START,REBUILD,STRNAME=strname,LOCATION=OTHER
The structure has failed-persistent connector(s). Generally, a structure with failed-persistent connector(s) should have recovery actions invoked for the connectors prior to the deallocation of the structure. Whether or not the recovery of failed-persistent connectors is mandatory depends on the program that owns the connection/structure. The existence of failed-persistent connectors may prevent a program from rebuilding or deallocating the structure. This may require re-initialization of the function associated with the failed-persistent connector to recover the connection. When all of the failed-persistent connections have been recovered, the rebuild (or other method of deallocation) should be retried.
The following commands can be used to clean up resources related to structures in the target coupling facility. Use them carefully, because forcing the deletion of a structure may cause a loss of data. Do not force deletion of a structure or of a failed-persistent connector unless you understand its use in the sysplex and the impact of the force operation.
For structures with no connectors to the structure, force the structure by issuing the following command:
SETXCF FORCE,STR,STRNAME=strname
If there are only failed-persistent connectors, force the failed-persistent connectors and then force the structure.
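A sketch of that sequence, assuming the CONNECTION form of SETXCF FORCE available on your MVS level (verify the exact operands against your system command reference):

SETXCF FORCE,CON,STRNAME=strname,CONNAME=ALL
SETXCF FORCE,STR,STRNAME=strname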
On each system, configure all CHPIDs to the target coupling facility offline with the following command:
CONFIG CHP(xx,yy),OFFLINE
MVS will refuse to vary offline the last path to a coupling facility that contains one or more structures in use by an active XES connection. Ensure that all the structures and connectors have been removed from the target coupling facility. If for some reason the CHPID would have to be varied offline anyway, this can be achieved by executing the following:
CONFIG CHP(xx),OFFLINE,FORCE
Configuring ISC links offline is optional; it can be considered a clean way to quiesce the coupling facility. If you do not take the CHPIDs offline, error messages and recording of link failures can occur. The messages and logouts can be ignored.
Issue the following command on all systems connected to the target coupling facility:
D CF
Verify that all sender paths were taken offline.
Power off the target coupling facility. When the maintenance procedure is complete, bring the coupling facility back into service by restoring the original coupling facility policy and repeating the above actions in reverse: add the target coupling facility instead of deleting it, and move structures into it instead of out of it.
JES2 Checkpoint: JES2 does not allow structure rebuild. To move the JES2 checkpoint either to another CF or to DASD, you have to invoke the reconfiguration dialog. To do this, you must have a NEWCKPTx defined in the JES2 CKPTDEF statement, either in the JES2 parms or defined dynamically via the $TCKPTDEF command. Then switch to this alternate checkpoint via the $TCKPTDEF,RECONFIG=YES command and follow the reconfiguration dialog prompts. When done, issue a display command, $DCKPTDEF, to verify that the old structure isn't being used by JES2. The old checkpoint structure will remain allocated; force it off via the following command:
SETXCF FORCE,STR,STRNAME=strname
Remember to update your installation's JES2 init parms to point to this new checkpoint, so that it is found on your next JES2 warm start. Remember that data can be transferred to and from a coupling facility much faster than to and from DASD, so you should plan to return the checkpoint data set to a coupling facility as soon as possible. When the checkpoint data set is located on DASD, JES2 uses a hardware reserve to ensure data integrity among all members; this affects I/O performance on the checkpoint pack.
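As a sketch of the dialog entry, with hypothetical data set and volume names (CKPTDEF operands are described in the JES2 initialization and tuning documentation):

$TCKPTDEF,NEWCKPT1=(DSNAME=SYS1.JES2.NEWCKPT1,VOLSER=JES2PK)
$TCKPTDEF,RECONFIG=YES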
System Logger: The current exploiters of the system logger are OPERLOG and LOGREC.
The system logger fully supports the rebuild function, and rebuild is the recommended procedure for moving its structures to another coupling facility. In case of failure during a rebuild, you will get some IXG messages (for instance, IXG106I, IXG107I, and so on), and some actions may be required to recover operations. OPERLOG will automatically switch hardcopy logging to SYSLOG if the SYSLOG data set has been initialized by JES. If it is unable to switch to SYSLOG, an attempt will be made to send hardcopy to a printer console; if this fails, the hardcopy will be lost. LOGREC will buffer logrec entries up to a point, then discard new entries until the logger is operational.
Shared Tape: During an off-peak maintenance window, IEFAUTOS is a structure that you can potentially live without while coupling facility maintenance is being performed.
VTAM: VTAM Generic Resource structure can be moved using the REBUILD function. When the VTAM generic resource structure becomes unavailable, no new sessions to generic resources will be allowed. This includes sessions to both the generic and real name. Existing sessions should not be affected. During the period required for the rebuild, sessions are not rejected. They are queued and processed after the rebuild completes.
If VTAM is down, a failed-persistent connector will remain associated with the structure. Forcing the failed-persistent connector will result in the deallocation of the structure and the loss of persistent affinities. Failed-persistent connectors to the structure must not be forced if LU 6.1 or LU 6.2 sync level 2 is being used by the applications or subsystems.
IRLM: Deallocating the IRLM lock structure should be done by REBUILDing it to the alternate CF.
IRLM always disconnects from the lock structure abnormally (IXLDISC with REASON=FAILURE). This leaves the IRLM lock structure connections in a failed-persistent state. It is safe to force failed-persistent connectors of a lock structure when there are no retained locks on any of the DBMSs identified to any IRLM in the data sharing group that is using that lock structure. The alternate procedure for deallocating the IRLM lock structure is as follows:
1. Use the IRLM status command to see whether any of the DBMSs that were previously in the data sharing group have retained locks:
F irlm_name,STATUS,ALLD
The IRLM has to be connected to the data sharing group in order to get information on all DBMSs that are in the group; IRLM connects to the group as soon as a DBMS (IMS or DB2) identifies to it. If a DBMS has retained locks, restart that DBMS so it can recover and clean up the retained locks.
2. Provide a NORMAL shutdown of all DBMSs identified to IRLM. IRLM will then disconnect from the data sharing group. If a subsystem does not shut down normally, retained locks may exist, and the subsystem must be restarted until a normal shutdown occurs.
3. Stop IRLM.
4. Force the IRLM failed-persistent connectors.
5. Force the IRLM lock structure.
RACF Database Cache: RACF uses the coupling facility as a large sysplex-wide store-through cache for the RACF database to reduce contention and I/O to DASD.
RACF uses a new serialization protocol to replace reserve/release when the coupling facility is in use. The new protocol protects the data, but without the disadvantages of reserve/release. So, in case of a coupling facility shutdown, RACF can also operate in non-data-sharing mode in a parallel sysplex. However, there could be a
performance impact to the installation if the I/O activity rate against the RACF database is high. To deallocate the structure, take RACF out of data sharing mode via the RVARY NODATASHARE command. Once RACF is out of data sharing mode, the structure will be deallocated. You can stay in this mode until the coupling facility maintenance is completed. Then, when the CF is back online, re-enable RACF data sharing mode via RVARY DATASHARE.
XCF Signalling: If the target coupling facility has an XCF signalling structure, there must be a full set of redundant signalling paths, even if the XCF signalling structure is going to be rebuilt in the alternate coupling facility. If, for example, you do not have an alternate signalling structure or CTCs for the sysplex, you will lose system connectivity during the rebuild. The loss of connectivity will lengthen the time it takes to rebuild the signalling structure and may result in XCF timeouts, especially if multiple structures are being rebuilt at the same time.
If redundant signalling paths cannot be made available, XCF's failure detection interval, COUPLExx(INTERVAL), and GRS's toleration interval, GRSCNFxx(TOLINT), should be increased to prevent timeouts. This is particularly important if you have an active SFM policy with ISOLATETIME rather than PROMPT specified. Furthermore, the XCF signalling structure should be rebuilt separately from other structures. When sufficient redundant XCF signalling capacity exists to allow for temporarily shutting down signalling through the target coupling facility, you can decide not to rebuild the target coupling facility's XCF signalling structure in the alternate coupling facility. In this case, you can simply stop XCF signalling by issuing the following commands on each system connected to the target coupling facility:
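As a sketch, assuming the signalling structure is named IXCPLEX_PATH1 (the name used in the message examples later in this document):

SETXCF STOP,PATHIN,STRNAME=IXCPLEX_PATH1
SETXCF STOP,PATHOUT,STRNAME=IXCPLEX_PATH1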
DB2: If maintenance is required on the coupling facility, different methods are required depending on the structure type to be moved out of the coupling facility. To deallocate the SCA structure, the REBUILD function is supported, and it is the recommended way to move the SCA into an alternate coupling facility. The easiest way to deallocate a GBP structure is to stop all the DB2s in the data sharing group. Once the DB2s are stopped, the GBP will automatically
be deallocated. There is no REBUILD support for this type of structure. Further details on GBP deallocation can be found in 9.4.5, To Manually Deallocate and Reallocate a Group Buffer Pool on page 190. During the REBUILD function, consider the following items:
- You should plan enough spare capacity on the alternate coupling facility to absorb the work from the target coupling facility. During the planning session, consider how to spread the coupling facility structures across the coupling facilities in case of recovery to obtain optimal performance. For instance, mixing the lock structure and a highly accessed GBP in the same coupling facility can significantly degrade performance.
- Rebuilding a structure with very high access frequencies can be very disruptive to the workloads. For instance, while the rebuild of the IRLM lock structure is in progress, all IRLM requests are queued. So, you may receive more timeouts, and in the extreme case, IRLM wait queues may build up to an unmanageable level and IRLM may grind to a halt.
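As a sketch, assuming the data sharing group name is DSNDB0G (DB2 names the SCA structure groupname_SCA), the recommended SCA rebuild could be started with:

SETXCF START,REBUILD,STRNAME=DSNDB0G_SCA,LOCATION=OTHER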
SMSVSAM: If maintenance is required on the coupling facility, SMSVSAM supports the REBUILD function against both structure types (cache and lock), and it is the recommended way to move them into an alternate coupling facility. During the REBUILD function, consider the following items:
- You should plan enough spare capacity on the alternate coupling facility to absorb the work from the target coupling facility. During the planning session, consider how to spread the coupling facility structures across the coupling facilities in case of recovery to obtain optimal performance.
- Rebuilding a structure with very high access frequencies can be very disruptive to the workloads.
XCF signalling: Ensure you have sufficient CTCs so that sysplex communication can continue without the coupling facility signalling structures.
RACF: Take RACF out of data sharing mode before deactivating the coupling facility.
Once you have the JES2 checkpoint out of the single coupling facility, and have ensured you have enough CTCs to keep the sysplex alive without the coupling facility, you should initiate an orderly shutdown of any subsystems still using that coupling facility (for instance IMS, IRLM, DB2, CICS). Force off any structures still allocated in the coupling facility. If you are using ICMF (Integrated Coupling Migration Facility), you should use the same procedure to rebuild structures into an alternate coupling facility. However, do not configure the coupling facility CHPIDs offline in this case. When all structures in the target coupling facility have been deallocated, the MVS images that exist on the same hardware image as the ICMF should be shut down. The hardware image can then be powered off.
The sysplex is granted ownership of the coupling facility. Structure exploiters are made aware of the availability of the coupling facility:
- By the EVENT exit, for exploiters that currently have active connections to structures.
- By the ENF service (event code 35), for exploiters not currently having an active connection to any structure.
This only makes exploiters aware of a change in resource availability. As far as IBM exploiters are concerned, no spontaneous structure movements are initiated because of this event. The new coupling facility will eventually be used for a new allocation of a structure or for a structure rebuild, if any.
6.1 Processors
Here is a description of how to add, remove, and maintain a processor in a nondisruptive manner.
For parallel-attached devices, you run the risk of power surges if you connect channel cables to devices while they are running, so you should power down both devices being connected together. 3. Define the new device to the processors. You can run HCD on any system in the sysplex to modify the IODF to include the new I/O device, and then use the sysplex-wide activate function (MVS 5.2 only) to activate it on all the systems in the sysplex.
IMS subsystems include IMS online (DC and DBCTL) systems, IMS DLI/DBB batch jobs, CICS-DL/I systems, the IMS database utilities (image copy, change accum, recovery, unload, reload, prefix update, etc.), and the logging utilities (log recovery and archive).
the old subsystems. In addition, if the IMS subsystems that were running prior to the time change continued execution past midnight (old time), they could reset their internal clocks back an hour and cause negative time breaks in the log. Almost all of the record types in the RECON data sets have time stamps. Some of these time stamps are provided by the IMS modules that invoke DBRC, and some are obtained by the DBRC code (via the TIME macro). DBRC assumes that time never goes backwards and is coded with that basic assumption. Therefore, the user has no option other than to terminate all IMS subsystems prior to setting the MVS clock back, and not to run any new IMS subsystems for an hour.
- Allocate specific LOGREC, PAGE, STGINDEX, and SMF data sets for this MVS image. See Appendix A, Sample Parallel Sysplex MVS Image Members on page 221 for a JCL example.
- Check SYS0.IPLPARM(LOADxx) for the IEASYMxx member in use.
- Check SYS1.PROCLIB(JES2) for the names of the JES2 clone members.
- Modify SYS1.PARMLIB as follows (a sketch follows this list):
  - IEASYMxx: add a SYSDEF for the new system.
  - COUPLE00: add PATHIN/PATHOUT definitions for the XCF signalling paths.
  - J2G: add the name of the new system to the JES2 global member.
  - J2Lxx: add the specific JES2 member for the new system.
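The following fragments sketch what the IEASYMxx and COUPLE00 updates might look like; the hardware name, LPAR name, clone value, and device numbers (the latter matching the SETXCF commands shown below) are illustrative:

IEASYMxx:
SYSDEF HWNAME(CPC1)
       LPARNAME(MVS4)
       SYSCLONE(04)

COUPLE00:
PATHIN  DEVICE(4230,4231,4232,4233)
PATHOUT DEVICE(5230,5231,5232,5233)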
- Modify ISMF if needed.
- Using SDSF, check the active SCDS.
- Check that the group name is the same as the sysplex name.
- Check the ACS routines for any system-specific code.
- Check the HSM startup procedure.
- Activate the new SMS configuration and verify that SMS is OK.
- Start the new XCF pathins and pathouts on all systems:
/RO *ALL,SETXCF START,PI,DEV=(4230,4238)
/RO *ALL,SETXCF START,PO,DEV=(5230,5238)
Create new VTAMLST members for the new system:
- ATCSTRxx
- ATCCONxx
- APNJExx
- APCICxx
- CDRMxx; also modify all the other CDRM members to include the new system
- MPCxx
- TRLxx; also modify the TRL members for network nodes to include the new system
- Vary the VTAM CTCs online in all other network node machines.
- IPL the new system.
- Review the ARM policy.
In order for the addition of a JES3 local to be nondisruptive to the existing JES3 complex, the initialization deck must already have a definition for a new main included. It is not possible to add a new main definition (MAINPROC statement) without a JES3 complex-wide warm-start. However, it is possible to change the name of a previously defined main without a complex-wide warm-start.
Check maintenance levels. It is only possible to run multiple JES3 complexes within a parallel sysplex when the appropriate JES3 and JESXCF maintenance is installed. The required PTFs are UW19140 and UW19148.
Define a new XCF group name. The JES3 group name, defined to XCF through the JES3 initialization deck, is the distinguishing attribute that separates one JES3 complex from another within the sysplex. For more information on specifying the JES3 XCF group name, refer to 2.18.4.2, JES3PLEX < SYSPLEX on page 91.
Define a new command prefix. JES3 makes use of the MVS Command Prefix Facility (CPF). For more information on how to specify the JES3 command prefix, refer to 2.18.4.2, JES3PLEX < SYSPLEX on page 91.
Allocate unshared JES3 data sets. The new JES3 complex requires some unique data sets:
- JES3 checkpoint (provide two for increased availability)
- JES3 JCT
- JES3 spool
- JES3 initialization stream
It is possible to share the following data sets with another JES3 complex within the sysplex:
Check the JES3 proc and the PARMLIB COMMNDxx member. As described in 2.18.4.2, JES3PLEX < SYSPLEX on page 91, it is possible to share the JES3 proc between different JES3 complexes within the same sysplex. This requires a change to the START JES3 command issued out of the PARMLIB command (COMMNDxx) member. For example, the START command for the new system being added might look like:
S JES3,JES=JES9,ID=09,SUB=MSTR
Figure 32. START Command When Adding a New JES3 Global
- VOLSER = SYSRESA
- SMP/E target zone is TGTRESA
- Target SMP/E data set high-level qualifier is SMP.RESA.**
The process to create the new SYSRES, called SYSRESB, would be as follows:
1. Initialize the new volume.
2. Copy all the data sets from SYSRESA to SYSRESB, excluding the VTOC, VVDS, and SMP/E data sets, and do not catalog them.
3. Copy the SMP/E target data sets to SYSRESB, rename them to SMP.RESB.**, and catalog them.
4. Use SMP/E ZONEEDIT against the target zone on SYSRESB to:
- Change the SMP/E target zone name from TGTRESA to TGTRESB
- Change the VOLUME entry in the DDDEFs from SYSRESA to SYSRESB
- Change the DATASET entry in the DDDEFs for the SMP/E data sets from SMP.RESA.** to SMP.RESB.**
//DSFINIT  JOB (999,POK),'INITIALIZE VOLUME',
//         MSGCLASS=X,NOTIFY=&SYSUID
//INIT1    EXEC PGM=ICKDSF
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  INIT UNITADDRESS(FD0) DEVTYP(3390) VOLID(SYSRESB) -
       VTOC(1113,0,90) INDEX(1110,0,45) PURGE NOVERIFY
/*
Figure 33. Volume Initialization. Initialize volume and name it SYSRESB
//DSSCOPY  JOB (999,POK),'COPY RES PACK',
//         MSGCLASS=X,NOTIFY=&SYSUID
//STEP1    EXEC PGM=ADRDSSU,REGION=0M
//SYSPRINT DD SYSOUT=*
//RESIN    DD UNIT=3390,VOL=SER=SYSRESA,DISP=SHR
//RESOUT   DD UNIT=3390,VOL=SER=SYSRESB,DISP=SHR
//SYSIN    DD *
  COPY INDD(RESIN) OUTDD(RESOUT) -
       DS(EXCLUDE(SYS1.VTOCIX.* SYS1.VVDS.* SMP.**)) -
       TOLERATE(ENQF) SHARE ALLEXCP ALLDATA(*) CANCELERROR
  COPY INDD(RESIN) OUTDD(RESOUT) -
       DS(INCLUDE(SMP.RESA.**)) -
       RENAMEU(SMP.RESA.**,SMP.RESB.**) CATALOG -
       TOLERATE(ENQF) SHARE ALLEXCP ALLDATA(*) CANCELERROR
/*
Figure 34. Copy SYSRESA. Job copies data sets excluding VTOC, VVDS and SMP/E, and then copies, renames and catalogs the SMP/E data sets.
//SMPZEDIT JOB (999,POK),'ZONE EDIT',MSGCLASS=X,NOTIFY=&SYSUID,
//         TIME=1440,TYPRUN=HOLD
//SMP      EXEC PGM=GIMSMP,TIME=1440,REGION=0M
//SMPCSI   DD DISP=SHR,DSN=SMP.GLOBAL.CSI
//SYSPRINT DD SYSOUT=*
//SMPRPT   DD SYSOUT=*
//SMPOUT   DD SYSOUT=*
//SMPCNTL  DD *
  SET BDY(GLOBAL) .
  UCLIN .
    DEL GLOBALZONE ZONEINDEX((TGTRESB)) .
  ENDUCL .
  ZONERENAME(TGTRESA) TO(TGTRESB)
    OPTIONS(OPTMVST) RELATED(DLIB001)
    NEWDATASET(SMP.RESB.CSI) .
  SET BDY(TGTRESB) .
  ZONEEDIT DDDEF .
    CHANGE VOLUME(SYSRESA,SYSRESB) .
  ENDZONEEDIT .
  UCLIN .
    REP DDDEF(SMPLTS)  DATASET(SMP.RESB.LTS) .
    REP DDDEF(SMPMTS)  DATASET(SMP.RESB.MTS) .
    REP DDDEF(SMPSCDS) DATASET(SMP.RESB.SCDS) .
    REP DDDEF(SMPSTS)  DATASET(SMP.RESB.STS) .
  ENDUCL .
/*
Figure 35. SMP/E ZONEEDIT. Job renames target zone to TGTRESB, changes all DDDEF volumes to SYSRESB and changes the SMP/E target data set DDDEFs to SMP.RESB.**.
//IPLTEXT  JOB (999,POK),'IPL TEXT',MSGCLASS=T,NOTIFY=&SYSUID,
//         TYPRUN=HOLD
//IPLTEXT  PROC VOL=,UNIT=3390
//DSF      EXEC PGM=ICKDSF,REGION=1M
//SYSPRINT DD SYSOUT=*
//IPLVOL   DD DISP=SHR,VOL=SER=&VOL,UNIT=&UNIT
//IPLTEXT  DD DSN=SYS1.SAMPLIB(IPLRECS),
//         DISP=SHR,UNIT=&UNIT,VOL=SER=&VOL
//         DD DSN=SYS1.SAMPLIB(IEAIPL00),
//         DISP=SHR,UNIT=&UNIT,VOL=SER=&VOL
//         PEND
//STEP1    EXEC IPLTEXT,VOL=SYSRESB,UNIT=3390
//SYSIN    DD *
  REFORMAT DDNAME(IPLVOL) IPLDD(IPLTEXT) NOVERIFY BOOTSTRAP
/*
Figure 36. Add IPL Text. Job creates and places IPL text on SYSRESB.
To introduce system software changes into the sysplex, such as maintenance, new products, or a product upgrade, the following process can be used:
1. Apply the change to SYSRESA. This has no effect on the sysplex, as all images are IPLed from SYSRESB.
2. Clone a new SYSRES volume, SYSRESC, from SYSRESA using the procedure described in 7.2, Adding a New SYSRES on page 151.
3. IPL one image in the sysplex from SYSRESC.
Figure 38. Introducing a New Software Level into the Parallel Sysplex
The result of this is that for a period of time, the images within the sysplex are at the N and N+1 levels. Should the N+1 level, in this example SYSRESC, cause a problem, the N level is still available to fall back to. By employing the ripple IPL technique, any potential problem is initially limited to one image, thereby reducing the impact on the whole sysplex.

It can be seen, therefore, that the minimum number of SYSRESs required for this technique is three: one to act as the medium for change, and two to be the N and N+1 levels. An installation may need more than three SYSRESs. For example, in an eight-image sysplex there may be four images sharing one SYSRES and four sharing another. Either way, there will need to be at least two other SYSRESs available to facilitate the process of introducing change with minimum disruption as described.

This basic philosophy of rippling a change through the parallel sysplex can be employed to propagate subsystem changes as well as system software changes. The manner in which changes are implemented for a specific subsystem may differ from that of system software; this is discussed further in 7.6, Changing Subsystems on page 160.
7.4.1 CICS
Actions required to add a new CICS subsystem may vary depending on the type of CICS region being introduced on the MVS image. Therefore, we will point out which activities are specific to a TOR (Terminal Owning Region) and to an AOR (Application Owning Region). Most of the definitions should already be in place because we are only adding a cloned CICS region. However, before activating a new CICS region, you should execute or verify the following activities:
- Verify that the new CICS data sets are protected using either RACF or an equivalent external security manager.
- Verify the MVS definitions, depending on the functions being used by the new CICS regions.
- Verify the existing subsystem definition in IEFSSNxx.
- Verify the existing entry for CICS in SCHEDxx.
- Verify the existing SMSVSAM server definitions (only for an AOR region).
- If you are going to use MVS workload management with CICS, set up the appropriate MVS definitions and ensure that the CICS performance parameters match the current definitions.
- If you want to use the MVS Automatic Restart Manager (ARM) facility to handle the new CICS, verify the following:
  - Check that ARM is active on the MVS image.
  - Ensure that the MVS images available for ARM have access to the databases, logs, and program libraries required for the workload.
  - Ensure that the CICS startup JCL used to restart CICS regions is suitable for MVS ARM.
  - Ensure that the system initialization parameter XRF=NO is specified for CICS startup.
  - Specify appropriate CICS START options.
  - Define ARM policies for the new CICS region.
- Set up all definitions required to enable the logging function. If you are going to add either a TOR belonging to an existing and active application or a cloned AOR, you will not be required to make all the logging definitions; the new CICS region will join the existing environment and use predefined structures. Therefore, you should only plan the following activities:
  - Verify the existing coupling facility structure for log data to check whether there will be enough storage to support the increased activity rate. If necessary, you can expand the coupling facility structure size, if previously planned, or change it through a new CFRM policy.
  - If log data duplexing is required, plan for the staging data set allocation.
  - Update the LOGR data set: define the new CICS unique log streams, which are implemented as two MVS system log streams (primary and secondary).
  - Activate the new LOGR definition.
  - Verify the archiving procedures.
- Set up all security definitions for the logging function. The CICS region user ID must be authorized to write to (and create, if necessary) the log streams that are used for its system log and general logs. If the setup of your installation allows several CICS regions to share the same CICS region user ID, you can make the profiles more generic by specifying an * for the APPLID qualifier. If this is done, most of the definitions should already exist.
- If you intend to use VTAM with CICS, you must define to VTAM each CICS region that is to use VTAM. You must also ensure that any VTAM terminal definitions are properly specified for connection to CICS (only for a TOR region). To define your CICS regions to VTAM, you must (a sketch follows at the end of this list):
  - Define VTAM application program major nodes (APPL)
  - Issue a VARY ACT command to activate the APPL definition
- Allocate the data sets unique to the new CICS region.
- Verify or customize the DL/1 interface (only for an AOR region).
- Verify or customize the DB2 support (only for an AOR region).
- Verify the MRO and ISC support.
- Define the new CICS region to the CP/SM environment.
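As a sketch of the VTAM definitions for a cloned TOR, using illustrative names and the APCICxx member naming convention used elsewhere in this book:

APCIC04 (VTAMLST member):
APPLCICS VBUILD TYPE=APPL
CICSTOR4 APPL  ACBNAME=CICSTOR4,AUTH=(ACQ,PASS),PARSESS=YES

Activate it with:

V NET,ACT,ID=APCIC04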
7.4.2 IMS

Adding a new IMS subsystem requires the following actions:
- Define the IMS system parameters that are unique to this IMS instance (for example, the IMSID).
- Create the data sets required to support the new IMS.
- Update the IEFSSNxx member of SYS1.PARMLIB to define the new subsystem to MVS. The definition can be activated via the SETSSI command.
- Define the ARM policy for the new IMS.
- Verify that the coupling facility structure sizes are large enough to accommodate the addition of the subsystem.
- Create the IRLM procedure.
- Use ETO to make changes to the terminal definitions.
For further information, refer to:
- IMS/ESA V5 Administration Guide: System
- IMS/ESA V5 Installation Volume 1
- IMS/ESA V5 Installation Volume 2
7.4.3 DB2
In this section we discuss how to add a new member to a DB2 data sharing group. DB2 data sharing is the only way to satisfy the requirements of applications that need very high levels of availability. With data sharing, you can run applications on many DB2 subsystems and access the same shared data. If one system must come down, either for planned maintenance or because of a failure, the work can be rerouted to another DB2 subsystem with no perceived outage to end users. In the same way, you are able to add a new subsystem to support increased workload demand.

DB2 subsystems that share data must belong to a DB2 data sharing group. A data sharing group is a collection of one or more DB2 subsystems accessing shared DB2 data. Each DB2 subsystem belonging to a particular data sharing group is a member of that group. All members of the group use the same shared DB2 catalog and directory.

Changes that occur during scheduled maintenance can be made on one DB2 at a time. If a DB2 or MVS must come down for the change to take place, and the outage is unacceptable to users, you can move those users onto another DB2. Most changes can be made on one DB2 at a time, as shown in Table 9, with no application disruption.
Table 9. DB2 Changes
Type of change      Action required
DB2 code            Bring down and restart each DB2 member independently.
Attachment code     Apply the change and restart the transaction manager or application.
System parameters   For those that cannot be changed dynamically, update using DB2's update process. Stop and restart the DB2 to activate the updated parameter.
Adding a new member to the group should be treated as a new installation. You cannot take an independently existing DB2 subsystem and merge it into the group. The new member begins using the DB2 catalog of the originating member. The following list shows the actions required to add a new DB2 data sharing member: 1. Update IEFSSNxx with the subsystem definition and activate the changes through the following MVS command:
T SSN=xx
2. On panel DSNTIPA1, specify:
when you install a new member name. We suggest that you choose a new name for the prefix.NEW.SDSNSAMP data set on installation panel DSNTIPT.
5. Define the system data sets (BSDS and active log data sets).
6. Initialize the system data sets.
7. Define the DB2 initialization statements.
8. Optionally:
- Record DB2 to SMF.
- Establish security.
- Connect IMS to DB2.
- Connect CICS to DB2.
9. IPL the MVS image if you are using a multi-character command prefix.
10. Start the DB2 subsystem.
11. Define the temporary work files.
7.4.4 TSO
Adding a TSO application requires the following actions:
- Verify the startup procedure in a PROCLIB library.
- Check the contents of the IKJTSOxx member in parmlib.
- Define the APPL to VTAM.

There is no workload balancing for TSO sessions. A future release will support session balancing through the use of VTAM generic resources combined with WLM.
7.5.1 CICS
There is no difference between starting a CICS region in a traditional environment and in a parallel sysplex. You can start your CICS region either as a started task or as a batch job. Even if you are using CICS V5 in data sharing mode, no further user actions are required to open the connection to the SMSVSAM server. The CICS interface with SMSVSAM is through a control ACB, and CICS registers with this ACB to open the connection. CICS registers automatically during the initialization process.
7.5.2 DB2
There is a new process available in DB2 V4 called group restart, which is needed only in the rare event that critical resources in a coupling facility are lost and cannot be rebuilt. When this happens, all members of the group terminate abnormally. Group restart is required to rebuild this lost information from individual member logs. However, unlike data recovery, this information can be applied in any order. Because there is no need to merge log records, many of the restart phases for individual members can be done in parallel. An automated procedure can be used to start all members of the group. If a particular DB2 is not started, then one of the started DB2s performs a group restart on behalf of that stopped DB2.
7.5.3 IMS
A number of operator enhancements were made to IMS 5.1 to assist in the management of databases. Commands with the GLOBAL parameter affect data globally. If, for example, you enter the /START command with the GLOBAL parameter on one subsystem and specify several database names, the IRLM transmits the command to the other sharing subsystems. It deletes the names of any databases that are invalid for the local system before it transmits the command, and all sharing subsystems process the command. In these online data sharing systems, you observe the following messages:
DFS3334I GLOBAL START COMMAND seqno INITIATED BY SUBSYSTEM ssid
         FOR THE FOLLOWING DATABASES
DFS3328I GLOBAL START COMMAND seqno COMPLETE
The variable seqno is a reply sequence number uniquely identifying the command and associating this message with the completion message that follows. The variable ssid is the name of the system originating the command. The message includes the database names that were in your command. If you omit the GLOBAL parameter (or specify LOCAL), the command applies only to the local online system and does not affect access by any other IMS subsystem.
The method for implementing changes to the subsystems differs from that of system software in that there is no residence volume for the subsystems. The simplest way to make changes to the subsystem software is by the use of a STEPLIB statement in the initialization JCL. For example, for the subsystems on the system where you choose to make your changes, you might have a STEPLIB concatenation such as:
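A sketch of such a concatenation, using the library names described in the text that follows:

//STEPLIB  DD DSN=SUBSYS.TEST.RESLIB,DISP=SHR
//         DD DSN=SUBSYS.PROD.RESLIB,DISP=SHR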
Prior to any change, the first data set in the concatenation is empty. The contents of SUBSYS.PROD.RESLIB are copied to SUBSYS.TEST.RESLIB and the changes are applied to this library. The subsystem is closed down and restarted, and will now access modules from the newly updated SUBSYS.TEST.RESLIB. Provided no problems are encountered running from this library over a suitable period of time, the TEST library can be renamed to PROD and the remaining subsystems started from this new level. The original PROD level becomes a backup library. Should problems occur after the initial change, fallback to the PROD level for that subsystem is straightforward.
7.7.1 CICS
Different considerations apply depending on whether the target region to be closed is a TOR or an AOR. Figure 39 on page 162 illustrates a high-availability configuration with multiple front-end CICSs distributing the incoming workload to multiple AORs. This kind of configuration is able to balance the sessions using the VTAM generic resource feature. The generic resource name is normally shared by a number of CICS TORs. A VTAM application such as CICS can be known by a generic resource name in addition to its own VTAM application program name (APPLID). Both names are defined in the VTAM APPL definition statement for the CICS TOR, and VTAM keeps a list of the APPLIDs that are members of the same generic resource name set. For this reason, redistributing the new sessions towards the remaining online TORs in the generic resource name set is done automatically by VTAM/GR. Terminals connected to the outgoing TOR have to re-logon to the generic resource and re-signon to CICS.
In workload balancing scenarios, you would typically keep the AORs as similar as possible. The ideal AOR for a parallel sysplex is one that is capable of running any transaction. If all your AORs are identical, then the dynamic routing program has great flexibility in making routing decisions, and workload balancing is most effective. This will allow you, as shown in Figure 40 on page 163, to shut down an AOR region without any impact to operations. CICSPlex SM will automatically redirect the new transactions to the remaining AORs.
7.7.2 IMS
Moving workload from one IMS instance to another requires a short outage for each terminal connected to the IMS that is being stopped. NetView automation can be used to bring down the terminal sessions and re-establish them on another IMS. Once all sessions have been moved, IMS can be stopped. Another approach is to perform a shutdown of IMS and restart IMS in another MVS image. This method requires spare capacity in the MVS image to accommodate the moved IMS. This process takes longer than moving sessions and is hence more disruptive. Which procedure a customer uses will depend upon how critical the disruption of service is for their business. Regardless of how the work is moved, VTAM routes must exist from each terminal to the new IMS and must not all traverse a single VTAM node. Additional information can be found in the IMS/ESA V5 Operations Guide and the IMS/ESA V5 Sample Operating Procedures.
7.7.3 DB2
DB2 workload enters the system in the traditional ways. In the following sections we explore what actions are required to move workload coming either from transaction managers or from batch/TSO.
7.7.4 TSO
There is no automated way to move TSO sessions from a quiescing TSO application to another TSO. The only way to redrive a logon is for the user to log off and log on again to another system. While you are quiescing the TSO application, you can prevent new users from logging on via the command F TSO,USERMAX=0. Currently, TSO does not support the VTAM/GR facility; TSO users must explicitly access a new TSO application to redrive their logons.
7.7.5 Batch
You can stop new jobs from being run on a system by stopping all initiators. You will then have to wait until all running jobs have completed. Alternatively, you can cancel the jobs and resubmit them on another system. This assumes that they are restartable, and it may involve a lot of work to back out updates made by a job before it abended, so it is an alternative you must choose with care. A better way to transfer batch work in a planned way is to let your job scheduling system handle it for you.
7.7.6 DFSMS
Before removing an SMS element from the sysplex, you should verify that no specific activities or affinities belong to this system. Here is a list of potential activities that need to be redirected to another system:
Storage Group affinity: You should verify that there are no Storage Groups where new allocations are allowed only from the outgoing system. In that case you have to review the Storage Group attributes, as described in the following example, and activate a new SMS configuration so that new allocations can be made in these Storage Groups.
- Full access enabled by SMS
- Not connected
- New allocation disabled by SMS
- Job access disabled by SMS
- New job access disabled by SMS
DFSMShsm activities: In a multiple-processor environment, you define one DFSMShsm processor as the primary DFSMShsm processor. The primary processor automatically performs the primary processor functions of backup and dump. Primary processor functions are functions not related to one data set or volume. The following functions are performed only by the primary processor:
- Backing up control data sets
- Backing up data sets
- Deleting expired dump copies automatically
- Deleting excess dump VTOC copy data sets
The primary DFSMShsm processor is identified in the DFSMShsm startup procedure through the HOST parameter. Therefore, if the MVS image to be removed is the DFSMShsm primary processor, you should move these functions to another MVS in the sysplex. After closing the primary DFSMShsm, close the alternate DFSMShsm and restart it with the primary attribute.
7.8.1 CICS
Stopping a CICS region requires the following actions:
- Check the definitions and the setup with CP/SM to remove the CICS references.
- If you are closing a TOR, terminals connected to this region have to re-logon to the generic resource and re-signon to CICS. If you are closing an AOR, no further actions are required.
- Shut down the CICS region with the normal stop procedure (a sketch follows this list).
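As an illustration, assuming the MVS console is defined to CICS and the region runs as job CICSTOR1 (an illustrative name), a normal shutdown could be requested with:

F CICSTOR1,CEMT PERFORM SHUTDOWN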
7.8.2 IMS
A common sequence for shutting down the entire online system is as follows:
1. Stop data communications
2. Stop dependent regions
3. Stop the control region
4. Stop the IRLM
The following describes these operations. The command used to shut down the control region also forces termination of communication and the dependent regions if they have not already been terminated in an orderly way.
After the shutdown process has begun, you can use the /DISPLAY SHUTDOWN STATUS command to see how many and which communication lines and terminals still contain active messages. You can use the /IDLE command to stop I/O operations on the specified lines to speed up the shutdown process. If IMS fails to shut down after you have followed these shutdown procedures, and if logging resources are available, you must force it to terminate.
/CHECKPOINT FREEZE|DUMPQ|PURGE causes immediate session termination for all logical units as follows:
FREEZE   Immediately after the current input/output message
DUMPQ    After blocks are checkpointed
PURGE    After all queues are empty
/CHECKPOINT FREEZE|DUMPQ|PURGE QUIESCE allows all network nodes to complete normal processing before initiating the shutdown processing.
7.8.3 DB2
As shown in Figure 41 on page 168 there is no problem in accessing data when one subsystem comes down. Users can still access their DB2 data from another subsystem. Transaction managers are informed that DB2 is down and can switch new user work to another DB2 subsystem in the group.
There might be a situation in which you want to remove members from the group permanently or temporarily. For example, assume your group does the job it needs to do 11 months of the year. However, you get a surge of additional work every December that requires you to expand your capacity. It is possible to quiesce some members of the group for those 11 months. Those members are dormant until you restart them. The same principle is used to remove a member of the group permanently. You quiesce the member to be removed, and keep the log data sets until they are no longer needed for recovery (other members might need updates that are recorded on that member's log). In summary, to quiesce a member of the group, you must:
1. Stop the DB2 you are going to quiesce. Our example assumes you want to quiesce member DB3G.
2. Check for unresolved work, using the following commands:
DISPLAY GROUP
DISPLAY UTILITY(*) MEMBER(member-name)
DISPLAY DATABASE(*) RESTRICT
If there is no unresolved work, no further action is required. However, if you want to create an archive log, go to step 4 on page 169.
3. If there is unresolved work, or if you want to do optional logging to create a disaster recovery archive log, start the quiesced member with ACCESS(MAINT).
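For example, assuming member DB3G uses the command prefix -DB3G, the member could be started in maintenance mode with:

-DB3G START DB2 ACCESS(MAINT)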
8.1 VSAM
VSAM files usually belong to a CICS transaction manager. They can be accessed locally by a single CICS region, or shared between multiple CICS AORs, either traditionally through an FOR region or, with a future release of CICS, directly with record level sharing (RLS). With a no-single-point-of-failure configuration, CICS is able to provide 24x7 service. In this environment, availability of the database is the key item to be concerned with to obtain continuous operations. Some data set operations can be done without stopping the online activity, but some still cannot be executed concurrently with online activities. The next section discusses the kinds of application outages that are still necessary.
8.1.1 Batch
Currently, there is no capability to share VSAM databases between online and batch workloads. Before starting the batch processing, all required databases must be deallocated from the online transaction manager. With a future release of CICS this restriction will be lifted: in the RLS environment there will be the capability to share VSAM databases between online and batch processing, with the restriction that batch can access the database only for read operations.
8.1.2 Backup
There are different techniques for backing up VSAM files. In this section we summarize which method and/or product you can use to avoid deallocating the VSAM database from the online workload. In fact, starting with MVS/DFP V3.2, DFHSM V2.5, and DFDSS V2.5, CICS is able to provide database backup while the data sets are still open for online updates.

CICS Backup While Open (BWO) is an online backup facility that allows data sets to be backed up even while they are being updated. BWO utilizes DFSMSdss, through DFHSM, as the data mover and creates a fuzzy copy. When restoring a fuzzy backup, you must also include any logs of changes made since the backup process started. If a file is eligible for BWO, CICS sets the BWO attribute in the catalog entry and writes information in the catalog entry at regular intervals. This information includes the time from which the forward recovery utility must start applying records, and is defined as the recovery point.

However, there are some things to consider when the BWO technique is used. DFSMSdss reads data sets sequentially, so if a control interval (CI) or control area (CA) split occurs, it cannot assure data integrity, and the backup is flagged as invalid. Therefore, if you want to use BWO with a file in which many records are inserted, you should schedule it during a period of low activity. To avoid this problem, DFSMS Concurrent Copy can be used with the BWO function. Once the DFSMS Concurrent Copy begins, any CI or CA splits that occur will not invalidate the copy. BWO and Concurrent Copy provide a point-in-time backup of CICS VSAM files with full data integrity. With DFSMS Version 1.2, the only operational consideration for Concurrent Copy with BWO is the possibility that a CI or CA split is already in progress when the DFHSM backup of the VSAM file begins. In this case, DFDSS will fail the backup and will not retry; you must either schedule a manual backup or wait until the next backup cycle.

CICSVR provides forward recovery of the CICS VSAM file using the backup copy and all CICS journal records logged after the backup was taken. For further reading, please refer to Implementing Concurrent Copy, GG24-3990; Concurrent Copy Overview, GG24-3936; CICS/ESA Release Guide, GC33-0655; and CICS VSAM Recovery Guide, SH19-6709.
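As a rough sketch, a point-in-time dump using the DFSMSdss CONCURRENT keyword directly might look like the following; in practice BWO backups are normally driven through DFHSM, and the job, data set, and output names here are illustrative:

//CCOPY    JOB (999,POK),'CC BACKUP',MSGCLASS=X
//STEP1    EXEC PGM=ADRDSSU
//SYSPRINT DD SYSOUT=*
//OUT      DD DSN=BACKUP.CICS.FILE1,DISP=(NEW,CATLG),UNIT=3490
//SYSIN    DD *
  DUMP DATASET(INCLUDE(CICS.VSAM.FILE1)) -
       OUTDDNAME(OUT) CONCURRENT
/*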
8.1.3 Reorg
In general, most VSAM file reorganizations require that the file be removed from the online system.
8.2 IMS/DB
Applications running in a DL/1 environment are well positioned to offer a 24x7 continuous operation. The major issue is the database reorganization process.
8.2.1 Batch
IMS supports concurrent access from online and batch programs through the use of batch message processing programs (BMPs). Therefore, if your installation requires 24x7 service, you must use BMPs for all batch jobs. BMPs have characteristics of programs in both online and batch environments in that they run online but are started with job control language (JCL), like programs in a batch environment. Input for BMPs can be from an MVS file or from the IMS message queue. BMPs do not necessarily process messages, although they can; BMPs can access the database concurrently with MPPs, even if your installation does not use data sharing. However, with data sharing, true batch programs, as well as BMPs or MPPs, can access the same database concurrently. Although BMPs are generally used to perform batch-type processing online, they can send or receive messages. There are two kinds of BMPs:
A transaction-oriented BMP accesses message queues for its input and output. It can also process input from MVS files and it can create MVS files for its output. A batch-oriented BMP does not access the message queue for input; it is simply a batch program that runs online. It can send its output to any MVS output device.
8.2.2 Backup
The technique used to back up a DL/1 database depends on the type of database. IMS databases are divided into Full Function Databases (FFDBs) and Fast Path Databases (FPDBs). FPDBs support higher transaction rates and offer some enhancements in data management. On the other hand, FPDBs require more virtual storage than FFDBs. FPDBs can be further divided into Data Entry Databases (DEDBs) and Main Storage Databases (MSDBs). Full Function Database: The FFDB is the standard IMS database. The access methods are HSAM, HISAM, HDAM, and HIDAM. The image copy utilities supported depend on the access methods used. Data Entry Database: DEDBs are similar to FFDBs; the main differences are:
- DEDBs support partitioning of the database into multiple areas, each of which exists in a separate area data set.
- DEDBs support maintaining multiple copies of any area data set. If the area data set is defined as having multiple copies, the recovery procedure must rebuild the first copy, and then a separate process must re-establish the multiple copies.
- With DEDBs, the log contains only the after-image of the data. The DASD copy of a DEDB is not written until the transaction reaches a commit point, usually when it finishes.
Main Storage Database: The MSDB is located in main storage. This means that an MSDB can be accessed faster than a DEDB. However, it also means that MSDBs are more limited in size. MSDBs also have very significant functional limitations. For example, you cannot add or delete a root segment in an MSDB without shutting down IMS. MSDBs are not supported by IMS V5. With this release of IMS, a new VSO option for DEDBs causes IMS to place the entire contents of a
DEDB into storage. This gives the performance advantage of main storage occupancy without the functional limitations of MSDBs.

A backup of an IMS database is called an image copy. An image copy can be produced using either an IMS utility or a user utility, and may be performed either online or offline (batch). In this section we want to put emphasis on the online techniques available for IMS databases.

Concurrent Image Copy Option: This is the database image copy utility with the concurrent image copy (CIC) parameter in the EXEC statement. This allows an image copy to be taken while the database continues to be updated. IMS concurrent image copy supports DEDBs and has been enhanced to support FFDBs in IMS/ESA Version 4.1. The resulting image copy is not a point-in-time backup; however, it can be used with the appropriate log to recover the database. This is sometimes called a fuzzy image copy. VSAM KSDSs are not supported by the concurrent image copy option.

Online Database Image Copy Utility (DFSUICP0): This utility is executed as an online utility. It runs as a batch message processing (BMP) program. You can use it only for HISAM, HIDAM, and HDAM databases. All logs active while the image copy is being created are required as input to the recovery. DBRC plays an important role here by maintaining in the RECON the recovery information that has been obtained from log archive activity and image copy executions. DBRC uses system timestamps to determine the various logs required for use in a potential database recovery.
8.2.3 Reorg
The requirement to reorganize a database in most cases requires the database to be removed from the online service, causing an outage to the users of the system. However, DEDBs can be reorganized online as long as the space allocation does not have to be changed. There are also some vendor products that extend the standard IBM utilities to provide online database reorganization for other database organizations.
8.3 DB2
As of now, there is still no full 24x7 availability for DB2 databases. However, DB2, through database partitioning and hardware features, is able to limit outages of the database during backup and/or reorg processing.
8.3.1 Batch
There is no particular restriction on batch activities against a DB2 database. DB2 can support online, batch, CAF, and TSO queries. The only issue could be one of performance.
8.3.2 Backup
In DB2, the term image copy refers only to data copies that are taken with the DB2 image copy utility. Image copies are taken at a table space level. Until DB2 Version 3, no other data copies, such as DFSMSdss or IDCAMS copies, could be used by DB2 for recovery. DB2 keeps track of the image copies by registering them in the DB2 catalog. It automatically selects the correct image copy for any recovery needed. An image copy that is not registered in the DB2 catalog cannot be used for recovery. DB2 utilities provide some functions to back up databases while they are still in use. For example, image copies can be taken concurrently while other applications update the data. The DB2 image copy utility can be invoked with either:
SHRLEVEL CHANGE     Concurrent update is allowed.
SHRLEVEL REFERENCE  Read only; no concurrent update is permitted.
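For illustration, a copy taken while applications continue updating could be requested with a utility control statement like the following; the table space name is from the DB2 sample database, and the output goes to the SYSCOPY DD of the utility job:

COPY TABLESPACE DSN8D41A.DSN8S41E SHRLEVEL CHANGE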
Starting with DB2 V3, you can also use the DFSMS Concurrent Copy feature to speed up DB2 database backup. However, DB2 is not aware of the copy when it is done in this manner, so you have to manage the copies, and recover from them yourself in the event that a recovery is required. There is a new option in the recovery process, called log only, that is designed to work as a follow-on to a recovery from a copy done with DFSMSdss Concurrent Copy.
8.3.3 Reorg
As of now, there is no full capability for online database reorganization. However, there is granularity in DB2 database reorganization: it is not necessary to deallocate the entire database to reorganize it; you have to stop only the partition of the database that needs to be reorganized.
Connectivity failure. This is a solid failure affecting the communication between the host MVS and the coupling facility, such as defective CFS or CFR CHPIDs or defective CFC links, with no more communication links available to the coupling facility. Note: A coupling facility going out of operation also causes a connectivity failure.
Structure failure. This is a functional problem reported by the coupling facility against the structure(s) it is keeping in its processor storage. A structure failure indication implies that the coupling facility is operative enough to report an internal problem that may have affected the structure contents. By its nature, a structure failure has a more pervasive effect on the sysplex than a connectivity failure, except when the connectivity failure is due to the coupling facility becoming not operational.
The coupling facility becomes volatile. A coupling facility switches from the nonvolatile state to the volatile state when:
- An operator at the CFCC enters the command MODE VOLATILE; refer to 1.3.5, Coupling Facility Volatility/Nonvolatility on page 8.
- The coupling facility power control system detects a potential malfunction in the battery backup unit or the local UPS; refer to 1.15.2, 9672/9674 Protection against Power Disturbances on page 27.
Switching from the nonvolatile to the volatile state does not affect coupling facility operation provided that the primary power is still present, but it may matter to connected exploiters that requested the structure to be allocated in a nonvolatile coupling facility.

In most cases the recovery for a coupling facility failure is to move the affected structure(s) to another location that is not in the current failure domain. The new location could be either inside the same coupling facility or in another coupling facility. Note also that the recovery can be either attempted immediately or deferred; if deferred, the sysplex continues with some member(s) possibly affected by the failure.

These failures are reported to the coupling facility exploiters by XES via the exploiter's EVENT exit, along with some additional information intended to help the connected exploiter make a recovery decision. If possible, recovery is automatically initiated. Note that the philosophy here is only to provide the structure's exploiter with information and facilities to help it drive recovery. However, it is up to the exploiter code to decide whether or not recovery should be performed, and to what extent.

MVS provides services to the coupling facility exploiters to help them automate recovery from a coupling facility failure; these are based on the Sysplex Failure Management (SFM) service. The recommendation is to use SFM
whenever applicable. If SFM is not applicable, or should the automated recovery fail, operator intervention can be considered. SFM is explained in further detail in 2.15, Automating Sysplex Failure Management on page 57. The following section indicates the ways of recovering from a coupling facility/coupling technology failure. Information on how to move a structure can be found in 5.4, To Move a Structure on page 120. Information on how IBM subsystems specifically recover from a coupling facility failure can be found in these sections:
- DB2: 9.4, DB2 V4 Recovery from a Coupling Facility Failure on page 189
- XCF: 9.5, XCF Recovery from a Coupling Facility Failure on page 192
- RACF: 9.6, RACF Recovery from a Coupling Facility Failure on page 194
- VTAM: 9.7, VTAM Recovery from a Coupling Facility Failure on page 196
- IMS/DB: 9.8, IMS/DB Recovery from a Coupling Facility Failure on page 197
- JES2: 9.9, JES2 Recovery from a Coupling Facility Failure on page 199
- System logger: 9.10, System Logger Recovery from a Coupling Facility Failure on page 203
- Tape switching: 9.11, Automatic Tape Switching Recovery from a Coupling Facility Failure on page 204
- VSAM RLS: 9.12, VSAM RLS Recovery from a Coupling Facility Failure on page 205
We also describe what a system operator can do to assess the problem and what the recommended course of action is.
Table 10 (Page 1 of 2). Subsystem Recovery Summary Part 1. The table summarizes recovery actions for the subsystems for different failure types.

XCF signalling:
- Loss of CF connectivity: XCF initiates a structure rebuild when any sysplex member loses connectivity to the XCF signalling structure. If any member cannot recover connectivity to the new (and only) structure, it is partitioned out of the sysplex.
- CF volatility change: No action.

RACF database:
- Structure disposition: DELETE; connection disposition: DELETE.
- Loss of CF connectivity (no active SFM policy): The system that loses connectivity to the RACF structure switches to read-only mode; the rest of the sysplex continues in data sharing mode. If all RACF instances lose connectivity, the structure is automatically deallocated and reallocated as per the CFRM policy preference list. See 9.6.1.2 on page 194.
- Loss of CF connectivity (active SFM policy): Processing as for no active SFM policy.
- CF volatility change: No action. See 9.6.2 on page 195.
- CF structure failure: RACF initiates a structure rebuild; if that is not possible, it switches to non data sharing mode. See 9.6.1.3 on page 195.
- Operator-initiated rebuild: Supported, but with restrictions. See 9.6.3 on page 195.

JES2 checkpoint:
- Structure disposition: KEEP; the structure remains allocated even if the checkpoint is forwarded to DASD.
- Loss of CF connectivity (no active SFM policy): The checkpoint is moved, either to another structure or to DASD, according to the specification of OPVERIFY in the JES2 initialization parms. See 9.9.1 on page 199.
- Loss of CF connectivity (active SFM policy): Processing as for no active SFM policy.
- CF volatility change: Processing as specified on the CKPTDEF statement: issue a message, enter the checkpoint reconfiguration dialog, or ignore. See 9.9.3 on page 203.
- CF structure failure: JES2 does not support rebuild; processing is as for a CF connectivity failure, and the operator must use the checkpoint reconfiguration dialog to switch the checkpoint to DASD. See 9.9.2 on page 202.
- Operator-initiated rebuild: The operator must enter the JES2 checkpoint reconfiguration dialog. See 9.9.4 on page 203.

VTAM 4.2 generic resources:
- Structure disposition: DELETE; connection disposition: KEEP. Local persistent data may exist for LU6.1 and LU6.2 sessions with SYNCPT data. If a VTAM node fails, the other VTAMs provide the necessary cleanup, and generic resources can be restarted on another VTAM node.
- Loss of CF connectivity (no active SFM policy): VTAM initiates a rebuild of structure ISTGENERIC when any VTAM member in the sysplex loses connectivity. See 9.7.1 on page 196.
- Loss of CF connectivity (active SFM policy): VTAM 4.2 ignores REBUILDPERCENT and initiates the structure rebuild as soon as one VTAM member loses connectivity. See 9.7.1 on page 196.
- CF volatility change: No action. See 9.7.3 on page 196.
- CF structure failure: If structure ISTGENERIC fails, each VTAM attempts to initiate a structure rebuild. The new ISTGENERIC is replenished from the local data of each VTAM node in the generic resource configuration. See 9.7.2 on page 196.
- Operator-initiated rebuild: Supported. See 9.7.4 on page 196.

VTAM 4.3 generic resources:
- Structure disposition: KEEP; connection disposition: KEEP. Otherwise processing as for VTAM 4.2.
- Loss of CF connectivity (no active SFM policy): Processing as for VTAM 4.2.
- Loss of CF connectivity (active SFM policy): VTAM initiates a rebuild of the structure as per the active SFM policy WEIGHTs and the CFRM policy REBUILDPERCENT; VTAM 4.3 initiates the structure rebuild when REBUILDPERCENT is reached. See 9.7.1 on page 196.
- CF volatility change: No action. See 9.7.3 on page 196.
- CF structure failure: As for VTAM 4.2, each VTAM attempts to initiate a structure rebuild, and the new structure is replenished from the local data of each VTAM node in the generic resource configuration. See 9.7.2 on page 196.
- Operator-initiated rebuild: Supported. See 9.7.4 on page 196.

System logger:
- Structure disposition: KEEP; connection disposition: DELETE. If a system fails, other instances of the logger coordinate migration of log stream data that had not been written to DASD by the failed system.
- Loss of CF connectivity (no active SFM policy): The system logger initiates a structure rebuild. See 9.10.1 on page 203.
- Loss of CF connectivity (active SFM policy): The logger initiates a structure rebuild as per the active SFM policy WEIGHTs and the CFRM policy REBUILDPERCENT. See 9.10.1 on page 203.
- CF volatility change: The logger initiates a structure rebuild. See 9.10.2 on page 203.
- CF structure failure: The logger rebuilds the log stream structures into another CF. See 9.10.3 on page 203.
- Operator-initiated rebuild: Supported. See 9.10.4 on page 204.
Table 10 (Page 2 of 2). Subsystem Recovery Summary Part 1. To manually deallocate the structure:
- RACF database: Put all RACF data-sharing instances into non data sharing mode with RVARY NODATASHARE. See 9.6.4 on page 196.
- JES2 checkpoint: Not supported.
- VTAM 4.2 generic resources: Stop all connected instances of VTAM. See 9.7.5 on page 197.
- VTAM 4.3 generic resources: Stop all connected instances of VTAM. See 9.7.5 on page 197.
- System logger: All log stream exploiters must disconnect from the system logger. See 9.10.5 on page 204.
Table 11. Subsystem Recovery Summary Part 2. The table summarizes recovery actions for the subsystems for different failure types.

DB2 group buffer pool (GBP, cache structure):
- Connection disposition: DELETE.
- Loss of CF connectivity: Transactions using the GBP when connectivity is lost add their pages to the Logical Page List (LPL), making them unavailable to other transactions. New transactions attempting to access the affected GBP receive SQL rc=904, indicating that pages are not available. See 9.4.1 on page 189.
- CF volatility change: DB2 issues a warning message but does not initiate a structure rebuild.
- CF structure failure and operator-initiated rebuild: Not supported.
- Manual deallocation: Stop usage of the GBP by stopping DB2 instances, stopping databases, or adjusting buffer pools.

DB2 SCA (list structure):
- Structure disposition: KEEP; connection disposition: DELETE.
- Loss of CF connectivity: DB2 initiates a structure rebuild in an alternate CF, if possible; otherwise the data sharing group member fails. See 9.4.1 on page 189.
- CF volatility change: DB2 issues a warning message but does not initiate a structure rebuild.
- CF structure failure: DB2 initiates a structure rebuild in an alternate CF. See 9.4.2 on page 190.
- Operator-initiated rebuild: DB2 supports manual rebuild of the SCA structure via the command SETXCF START,REBUILD, without disruption to data sharing group members. See 9.4.4 on page 190.
- Manual deallocation: Stop all DB2 instances currently connected to the structure. The SCA structure has a disposition of KEEP, so use the command SETXCF FORCE. See 9.4.7 on page 199.

DB2/IRLM lock structure:
- Structure disposition: KEEP; connection disposition: KEEP (possibility of failed-persistent data).
- Loss of CF connectivity: DB2 initiates a structure rebuild in an alternate CF, if possible; otherwise the data sharing group member fails. See 9.4.1 on page 189.
- CF volatility change: DB2 issues a warning message but does not initiate a structure rebuild.
- CF structure failure: DB2 initiates a structure rebuild in an alternate CF. See 9.4.2 on page 190.
- Operator-initiated rebuild: DB2 supports manual rebuild of the lock structure via the command SETXCF START,REBUILD, without disruption to data sharing group members. See 9.4.4 on page 190.
- Manual deallocation: Stop all DB2 instances currently connected to the structure. The lock structure has a disposition of KEEP, so use the command SETXCF FORCE. See 9.4.7 on page 199.

IMS (IRLM lock structure, with possibility of failed-persistent data, and OSAM/VSAM cache structures):
- Loss of CF connectivity (no active SFM policy): IMS does not initiate a structure rebuild; the data sharing member that lost connectivity enters non data sharing mode. See 9.8.1 on page 197.
- Loss of CF connectivity (active SFM policy): IMS initiates a structure rebuild according to the active SFM policy WEIGHTs and the CFRM policy REBUILDPERCENT. See 9.8.1 on page 197.
- CF volatility change: Neither IMS nor IRLM takes any action if the CF becomes volatile. See 9.8.3 on page 198.
- CF structure failure: IMS initiates a structure rebuild in an alternate CF.
- Operator-initiated rebuild: IMS supports the manual rebuild of the lock, OSAM, and VSAM structures. See 9.8.4 on page 198.
- Manual deallocation: For the IRLM lock structure, identify all IRLMs connected to the structure and stop all DBMS instances connected to those IRLMs (note: the lock structure has a disposition of KEEP). For the OSAM/VSAM cache structures, stop all connected DBMS instances. See 9.8.5 on page 199.

SMSVSAM (VSAM RLS; cache structures and the IGWLOCK00 lock structure):
- Loss of CF connectivity (no active SFM policy): SMSVSAM attempts to rebuild both cache and lock structures. See 9.12.1 on page 205.
- Loss of CF connectivity (active SFM policy): SMSVSAM initiates a structure rebuild according to the active SFM policy WEIGHTs and the CFRM policy REBUILDPERCENT. See 9.12.1 on page 205.
- CF volatility change: No action. See 9.12.3 on page 206.
- CF structure failure: SMSVSAM attempts to rebuild the structure (either cache or lock). See 9.12.2 on page 205.
- Operator-initiated rebuild: SMSVSAM supports rebuild of both the lock and cache structures via the command SETXCF START,REBUILD. See 9.12.4 on page 206.
- Manual deallocation: SMSVSAM supports manual deallocation of the SMSVSAM cache structure through the VARY SMS command; there is no support for the IGWLOCK00 lock structure. See 9.12.5 on page 206.

Tape sharing (IEFAUTOS):
- Loss of CF connectivity (no active SFM policy): Allocation initiates an IEFAUTOS structure rebuild. See 9.11.1 on page 204.
- Loss of CF connectivity (active SFM policy): Allocation initiates an IEFAUTOS structure rebuild according to the active SFM policy WEIGHTs and the CFRM policy REBUILDPERCENT. See 9.11.1 on page 204.
- CF volatility change: No action. See 9.11.3 on page 204.
- CF structure failure: All connectors to IEFAUTOS attempt to start a rebuild; if a system cannot continue with the rebuild, it disconnects. See 9.11.2 on page 204.
- Operator-initiated rebuild: Supported. See 9.11.4 on page 204.
- Manual deallocation: Not supported. See 9.11.6 on page 205.
When a lock structure fails:

DXR143I IRLK REBUILDING LOCK STRUCTURE BECAUSE IT HAS FAILED OR AN IRLM LOST CONNECTION TO IT
DXR146I IRLK REBUILD OF LOCK STRUCTURE COMPLETED SUCCESSFULLY

When an XCF signalling structure fails:

IXC467I STARTED REBUILD FOR PATH STRUCTURE IXCPLEX_PATH1
        RSN: STRUCTURE FAILURE
        DIAG073: 08880001 092A0000 2000E800 000000000
IXC457I REBUILT STRUCTURE IXCPLEX_PATH1 ALLOCATED WITH 1000 LISTS
        WHICH SUPPORTS FULL SIGNALLING CONNECTIVITY AMONG 32 SYSTEMS
        AND UP TO 14428 SIGNALS
IXC465I REBUILD REQUEST FOR STRUCTURE IXCPLEX_PATH1 WAS SUCCESSFUL
        WHY REBUILT: STRUCTURE FAILURE

When a RACF structure fails:

IRRX007I RACF DATASHARING GROUP IS INITIATING A REBUILD FOR STRUCTURE IRRXCF00_P001.
ICH15019I INITIATING PROPAGATION OF RVARY COMMAND TO MEMBERS OF RACF
          DATA SHARING GROUP IRRXCF00 IN RESPONSE TO A REBUILD REQUEST.
ICH15020I RVARY COMMAND INITIATED IN RESPONSE TO THE REBUILD REQUEST
          HAS FINISHED PROCESSING

When a VTAM structure fails:

IST1381I REBUILD STARTED FOR STRUCTURE ISTGENERIC
. . .
IST1383I REBUILD COMPLETE FOR STRUCTURE ISTGENERIC
There are, as of now, two known variations on the way a structure failure is indicated:
VSAM or OSAM structure: there is currently no message stating that damage has been detected in a structure, nor is there a message when the structure is rebuilt, even though the rebuild is done automatically. The only indication that might appear would be U3033 abends in transactions whose database calls fail while the structure is being rebuilt.
JES2 checkpoint: A checkpoint structure which fails is treated by JES2 as an I/O error, and the checkpoint reconfiguration can be automatically initiated. This is further explained at 9.9, JES2 Recovery from a Coupling Facility Failure on page 199.
In this example, the system is connected to CF2 via CHPIDs 12 and 14.
IXL158I PATH 12 IS NOW NOT-OPERATIONAL TO CUID: FFF8 066
        COUPLING FACILITY 009672.IBM.51.000000060043
        PARTITION: 3 CPCID: 00
IXL158I PATH 14 IS NOW NOT-OPERATIONAL TO CUID: FFF8 067
        COUPLING FACILITY 009672.IBM.51.000000060043
        PARTITION: 3 CPCID: 00

IXC518I SYSTEM SF1 NOT USING 081
        COUPLING FACILITY 009672.IBM.51.000000060043
        PARTITION: 3 CPCID: 00
        NAMED CF2
        REASON: CONNECTIVITY LOST.
        REASON FLAG: 13300001.

D XCF,CF,CFNAME=CF2
IXC362I 16.38.36 DISPLAY XCF 101
CFNAME: CF2
   COUPLING FACILITY : 009672.IBM.51.000000060043
                       PARTITION: 3 CPCID: 00
   POLICY DUMP SPACE SIZE: 2000 K
   ACTUAL DUMP SPACE SIZE: 2048 K
   STORAGE INCREMENT SIZE: 256 K
   NO SYSTEMS ARE CONNECTED TO THIS COUPLING FACILITY
Note that the same messages will show up if the coupling facility becomes not operational.
IXG104I STRUCTURE REBUILD INTO STRUCTURE SYSTEM_OPERLOG 608
        HAS BEEN STARTED FOR REASON:
        COUPLING FACILITY VOLATILITY STATE CHANGE
IXG209I RECOVERY FOR LOGSTREAM SYSPLEX.OPERLOG 316
        IN STRUCTURE SYSTEM_OPERLOG COMPLETED SUCCESSFULLY.
IXG110I STRUCTURE REBUILD FOR STRUCTURE SYSTEM_OPERLOG IS COMPLETE. 317
        LOGSTREAM DATA DEFINED TO THIS STRUCTURE MAY BE LOST FOR CERTAIN LOGSTREAMS

D CF
IXL150I 23.01.31 DISPLAY CF 927
COUPLING FACILITY 009674.IBM.51.000000060041
                  PARTITION: 3 CPCID: 00
                  CONTROL UNIT ID: FFF6
NAMED CF1
COUPLING FACILITY SPACE UTILIZATION
   ALLOCATED SPACE                DUMP SPACE UTILIZATION
     STRUCTURES:  11264 K           STRUCTURE DUMP TABLES:     0 K
     DUMP SPACE:   2048 K           TABLE COUNT:                 0
     FREE SPACE:  17920 K           FREE DUMP SPACE:          2048 K
     TOTAL SPACE: 31232 K           TOTAL DUMP SPACE:         2048 K
                                    MAX REQUESTED DUMP SPACE:    0 K
   VOLATILE: YES                    STORAGE INCREMENT SIZE: 256 K
   CFLEVEL: 1
COUPLING FACILITY SPACE CONFIGURATION
                         IN USE        FREE        TOTAL
     CONTROL SPACE:     13312 K     17920 K      31232 K
     NON-CONTROL SPACE:     0 K         0 K          0 K
   SENDER PATH          PHYSICAL     LOGICAL
     13                 ONLINE       ONLINE
   SUBCHANNEL
     0361 0362
occurs. Deallocating a structure is discussed at Appendix B, Structures, How to ... on page 241. 3. Reallocate the structure. In most cases this translates into restarting the structure's exploiters. Whether this process is applicable, and how it can be applied to the IBM structure exploiters, is indicated for each of them in the paragraphs dedicated to their individual recovery.
All the current connectors to the structure allow the structure to be rebuilt.
There is at least one connection to the structure still active.
The alternate coupling facility is on the preference list in the active CFRM policy for the structure to be rebuilt.
The coupling facility candidate to receive the structure has enough free space to accommodate the new instance of the structure.
All exploiters have connectivity to the new instance of the structure.
Transactions which were using the GBP at the time of loss of connectivity add their pages to the Logical Page List (LPL), making these pages unavailable for other transactions. New transactions which try to access the affected GBP receive a SQL return code of -904, indicating that the pages are not available.
Repair the connectivity failure. As soon as connectivity to the GBP is restored, the affected data sharing group members will automatically reconnect to the GBP. You could also stop the affected DB2 members and restart them from a host MVS which still has connectivity to the GBP structure.
If the connectivity failure cannot be repaired, or if the GBP contents have been damaged, delete all the connections left to the GBP (these are failed-persistent connections, and SETXCF FORCE will have to be used). This deallocates the GBP, and one of the DB2 members then performs a damage assessment and marks the affected DB2 objects as GBP recovery pending (GRECP), with messages to indicate which databases have been affected. The next step is to restart the affected database with the START DB command. This reallocates the GBP, and the objects are recovered. This can be done from any DB2 member.
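A minimal command sketch of this recovery, assuming a GBP structure named DSNDB0G_GBP1, a DB2 command prefix of -DB1G, and an affected database named DBTEST1 (all three names are hypothetical), might be:

D XCF,STR,STRNAME=DSNDB0G_GBP1                        (list the failed-persistent connections)
SETXCF FORCE,CONNECTION,STRNM=DSNDB0G_GBP1,CONNAME=ALL
-DB1G START DATABASE(DBTEST1) SPACENAM(*)             (from any member; reallocates the GBP)

Forcing the last connection deallocates the GBP; the START DATABASE command then drives the damage assessment and recovery described above.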
The alternate coupling facility is on the active preference list for the involved structure, and has enough free processor storage to accommodate a new instance of the structure.
All participating DB2 members have connectivity to the alternate coupling facility.
The active REBUILDPERCENT threshold for the involved structure has been reached.
Loss of Connectivity to a GBP Structure: Having an active SFM policy does not affect the way a loss of connectivity to a GBP is handled. The recovery has to be performed as in 9.4.1.1, No Active SFM Policy, or Active Policy with CONNFAIL(NO) on page 189, that is, either of the following:
Recover connectivity to the GBP. Proceed with GBP deallocation and restart the affected database.
The simplest one is to stop all DB2 members in the data sharing group. This method is mandatory if the GBP to deallocate is GBP0 (GBP0 contains catalog and directory). As the GBP structure has a disposition of DELETE, it will be automatically deallocated. Reallocation will be performed upon restart of the DB2 instances.
If it is not possible to stop all members and the GBP to be deallocated is not GBP0, then one of the three following methods can be used.
Delete the virtual buffer pool by altering its size to 0. This initiates disconnection from the related GBP.
Stop all databases, thereby removing page set dependence on the GBP, which eventually results in disconnection from the GBP.
Stop only the page sets which use the associated buffer pool. This is the most granular way to minimize the impact of GBP deallocation.
To reallocate the structure, restart the stopped elements or alter the size of the virtual buffer pool, depending on the method selected for deallocation.
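For example, the first method could be implemented with the DB2 ALTER BUFFERPOOL command; the member command prefix -DB1G and the buffer pool name BP1 in this sketch are assumptions:

-DB1G ALTER BUFFERPOOL(BP1) VPSIZE(0)        (delete the virtual buffer pool; the member disconnects from the related GBP)
-DB1G ALTER BUFFERPOOL(BP1) VPSIZE(1000)     (restore the virtual buffer pool afterwards; the GBP is reallocated on next use)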
D XCF,STR,STRNAME=lock_structure_name
This displays at the console a list of all connectors; that is, the IRLMs currently using the structure or having connections in the failed-persistent state, if any.
D XCF,STR,STRNAME=IRLMLOCK1
STRNAME: IRLMLOCK1
STATUS: ALLOCATED
POLICY SIZE    : 32000 K
POLICY INITSIZE: N/A
REBUILD PERCENT: 1
PREFERENCE LIST: CF1 CF2
EXCLUSION LIST IS EMPTY
ACTIVE STRUCTURE
----------------
ALLOCATION TIME: 11/01/95 08:22:20
CFNAME         : CF1
COUPLING FACILITY: 009674.IBM.02.000000040020
                   PARTITION: 1 CPCID: 00
ACTUAL SIZE    : 32000 K
STORAGE INCREMENT SIZE: 256 K
VERSION        : ABE834ED BC6B2002
DISPOSITION    : KEEP
ACCESS TIME    : 0
MAX CONNECTIONS: 23
# CONNECTIONS  : 4

CONNECTION NAME  ID VERSION  SYSNAME JOBNAME ASID STATE
---------------- -- -------- ------- ------- ---- ------
IRLMGRP1$IRLA001 07 0007006D Z0      IRLMA   0081 ACTIVE
IRLMGRP1$IRLB002 02 0002006D J80     IRLMB   0038 ACTIVE
IRLMGRP1$IRLC003 01 0001007F J90     IRLMC   003B ACTIVE
IRLMGRP1$IRLD004 05 0005006E JA0     IRLMD   008D ACTIVE
2. For each one of the IRLMs identified above, identify the DB2 instances using the following command:
F irlm_name,STATUS
F IRLMD,STATUS
DXR101I IRLD STATUS SCOPE=GLOBAL 271
SUBSYSTEMS IDENTIFIED
NAME   STATUS   UNITS HELD   WAITING   RET_LKS
IMSD   UP       5            0         0
3. Stop all the DB2 instances identified by the previous command. Note that lock structures have a disposition of KEEP; therefore the SETXCF FORCE command will have to be used to complete the deallocation. SETXCF FORCE must also be used if any connection remains in the failed-persistent state. Reallocation is performed when the DB2 instances are restarted. Reallocation can be directed to another coupling facility by changing the active preference list before restarting the DB2s.
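As an illustration, using the IRLMLOCK1 structure shown in the display above, the sequence might be as follows; the DB2 command prefix -DB1G is an assumption:

-DB1G STOP DB2 MODE(QUIESCE)                 (repeat for each DB2 member identified)
D XCF,STR,STRNAME=IRLMLOCK1                  (verify that only failed-persistent connections remain)
SETXCF FORCE,CONNECTION,STRNM=IRLMLOCK1,CONNAME=ALL
SETXCF FORCE,STRUCTURE,STRNM=IRLMLOCK1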
D XCF,STR,STRNAME=sca_structure_name
This displays at the console a list of all connectors; that is, the DB2 members currently using the structure or having connections in the failed-persistent state, if any. 2. Stop all DB2 instances identified by the above command. Note that SCA structures have a disposition of KEEP; therefore the SETXCF FORCE command will have to be used to complete the deallocation. SETXCF FORCE must also be used if any connection remains in the failed-persistent state. Reallocation is performed when the DB2 members are restarted. Reallocation can be directed to another coupling facility by changing the active preference list before restarting the DB2s.
Note that the decision to partition is made by XCF itself, without consulting the XCF exploiters. An XCF signalling exploiter cannot in any way prevent XCF from partitioning a member out of the sysplex when XCF loses signalling connectivity. We recommend that both CTC paths and coupling facility structures be available for XCF signalling.
Have backup XCF signalling structures in another coupling facility (recommended).
Have backup CTC links for XCF signalling (recommended).
Modify the INTERVAL parameter in COUPLExx to account for the longer time (not recommended).
V XCF,sysname,OFFLINE
In this case the operator will be prompted with the following message:
IXC371D CONFIRM REQUEST TO VARY SYSTEM sysname OFFLINE. REPLY SYSNAME=sysname TO REMOVE sysname OR C TO CANCEL.
If no SFM policy is active, the operator must go to the system partitioned out of the sysplex and perform a system reset. If an SFM policy is active, the partitioned system will be automatically isolated and an I/O interface reset performed, as long as the system to be isolated shares coupling facility connectivity with any other operating MVS image in the sysplex. For further details on the isolate function refer to 2.15.2, The SFM Isolate Function on page 59.
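If no SFM policy is currently active, one can be started from the SFM couple data set with a command of the following form, where the policy name SFMPOL01 is an assumption:

SETXCF START,POLICY,TYPE=SFM,POLNAME=SFMPOL01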
RACF structure rebuild: because of the specific way structure rebuild is implemented in RACF, rebuilding into another coupling facility requires that you modify the active preference list first. See 9.6.3, Manual Invocation of Structure Rebuild on page 195.
SETXCF START,REBUILD,STRNM=IRRXCF00_P001
IXC367I THE SETXCF START REBUILD REQUEST FOR STRUCTURE 527
        IRRXCF00_P001 WAS ACCEPTED.
IXC521I REBUILD FOR STRUCTURE IRRXCF00_P001 HAS BEEN STOPPED
ICH15019I INITIATING PROPAGATION OF RVARY COMMAND 299
          TO MEMBERS OF RACF DATA SHARING GROUP IRRXCF00
          IN RESPONSE TO A REBUILD REQUEST.
IXC509I CFRM ACTIVE POLICY RECONCILIATION EXIT HAS STARTED. 300
        TRACE THREAD: 0000018F.
..........................
IXC509I CFRM ACTIVE POLICY RECONCILIATION EXIT HAS COMPLETED. 301
The rebuild always occurs as if LOC=NORMAL. For any one rebuild request to RACF, all the RACF structures currently allocated are rebuilt.
The implication is that the rebuild always scans the CFRM preference list, and it is likely that the same coupling facility will be selected again to receive the structure. To have the RACF structures rebuilt into a different coupling facility, one of the following conditions must be met:
The rebuild cannot be performed on the original coupling facility because of a permanent failure and there is another coupling facility available in the preference list.
The CFRM active policy is changed with a preference list designating another operational coupling facility as the best candidate for the allocation of the structure.
RVARY NODATASHARE
This command has a sysplex scope and initiates disconnection of the data sharing group members from the structures, and hence the structures are deallocated. The RACF structures are reallocated when you issue the following:
RVARY DATASHARE
The affected IRLMs remain active in failure status.
The batch jobs using the affected IRLMs abend.
The IMS/TMs or DBCTLs using the affected IRLMs are quiesced.
Dynamic backout is invoked for in-flight transactions.
If the connectivity is restored, IRLM reconnects automatically to the lock structure, IMS and DBCTL reconnect automatically to IRLM, and operations resume. If the connectivity cannot be restored, either of the following must be done:
Manually rebuild the lock structure into another coupling facility to which all IRLMs have connectivity, as shown in the example below.
Restart the IMS and DBCTL instances on a system where IRLM still has connectivity to the lock structure. Note that an IMS/DB instance can be restarted and use a different IRLM to process any new lock requests or previously retained locks. The restart of the new IMS instance can be implemented as an automatic operation using ARM (refer to 2.17, ARM: MVS Automatic Restart Manager on page 79).
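The manual rebuild could be initiated with a command of the following form, where the lock structure name IRLMLOCK1 is an assumption; LOC=OTHER requests allocation in a coupling facility other than the current one:

SETXCF START,REBUILD,STRNM=IRLMLOCK1,LOC=OTHER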
If batch jobs abended while the connectivity to the coupling facility was lost, you may need to run the Batch Backout utility to recover the databases used by those jobs.
IMS stops all databases with SHARELVL=2 or 3 (that is, stops data sharing). If IMS/TM is used, affected transactions are put in the suspend queue.
If the connectivity is restored, IMS reconnects automatically to the cache structure and starts the affected databases. The transactions are released from the suspend queue. If the connectivity cannot be restored, do either of the following:
Manually rebuild the cache structure into another coupling facility to which all IRLMs in the data sharing group have connectivity.
Restart the IMS instances on a system which still has connectivity to the cache structure.
The local buffers are invalidated. IMS requests a dynamic rebuild of the structure. Data sharing operations resume automatically on successful completion of the rebuild.
If the dynamic rebuild is not successful, a manual rebuild can be attempted after modifying the active preference list to designate a new best candidate coupling facility to rebuild into.
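For instance, assuming a new CFRM policy named CFRM02 has been defined with the desired preference list, and an OSAM cache structure named IMSOSAMCACHE (both names hypothetical), the sequence might be:

SETXCF START,POLICY,TYPE=CFRM,POLNAME=CFRM02     (activate the policy with the new preference list)
SETXCF START,REBUILD,STRNM=IMSOSAMCACHE          (rebuild into the new best candidate CF)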
D XCF,STR,STRNAME=lock_structure_name
This displays at the console a list of all connectors; that is, the IRLMs currently using the structure or having connections in the failed-persistent state, if any. 2. For each of the IRLMs identified above, identify the DBMS instances connected to this IRLM, using:
F irlm_name,STATUS
3. Stop all DBMS instances connected to this IRLM (IMS/DB, DBCTL or DL/I batch); this results in IRLM disconnecting from the lock structure. Note that the lock structures have a disposition of KEEP; therefore the SETXCF FORCE command will have to be used to complete the deallocation. The SETXCF FORCE command must also be used for those connections which remain in the failed-persistent state for a lock structure. Reallocation is performed when the IMS and/or DBCTL instances are restarted.
Use the D XCF,STR,STRNAME=cache_strname command to identify all the DBMS instances connected to the structure. Stop all the identified DBMS instances.
The structure will be automatically reallocated when the DBMS instances are started again.
CKPTDEF  CKPT1=(STRNAME=JES2CKPT_1,INUSE=YES,VOLATILE=NO),
         CKPT2=(DSNAME=SYS1.JES2.CKPT1,
                VOLSER=TOTSM1,INUSE=YES,VOLATILE=NO),
         NEWCKPT1=(STRNAME=JES2CKPT_2),
         NEWCKPT2=(DSNAME=SYS1.JES2.CKPT2,VOLSER=TOTPD0),
         MODE=DUPLEX,DUPLEX=ON,
         LOGSIZE=1,APPLCOPY=NONE,
         VERSIONS=(STATUS=ACTIVE,NUMBER=50,
                   WARN=80,MAXFAIL=0,NUMFAIL=0,
                   VERSFREE=50,MAXUSED=2),
         RECONFIG=NO,
         VOLATILE=(ONECKPT=DIALOG,ALLCKPT=DIALOG),
         OPVERIFY=NO
Figure 42. Sample Checkpoint Definition
On a connectivity failure to JES2CKPT_1, the checkpoint must be forwarded to JES2CKPT_2 by JES2. Note that the JES2 checkpoint structures are allocated as soon as they are assigned as a checkpoint, and they remain allocated (disposition=KEEP) even if the checkpoints are forwarded to DASD.
Connectivity is lost to the checkpoint structure:

IXC518I SYSTEM SC47 NOT USING 483
        COUPLING FACILITY 009672.IBM.02.000000040104
        PARTITION: 1 CPCID: 01
        NAMED CF02
        REASON: CONNECTIVITY LOST.
        REASON FLAG: 13300002.
Because of OPVERIFY=NO, the checkpoint reconfiguration is automatically initiated:

$HASP285 JES2 CHECKPOINT RECONFIGURATION STARTING
$HASP290 MEMBER SC47 -- JES2 CKPT1 IXLLIST LOCK REQUEST FAILURE 490
         *** CHECKPOINT DATA SET NOT DAMAGED BY THIS MEMBER ***
         RETURN CODE = 0000000C
         REASON CODE = 0C080C06
         RECORD = UNKNOWN
*$HASP275 MEMBER SC47 -- JES2 CKPT1 DATA SET - I/O ERROR - REASON CODE 491
          CF2
$HASP233 REASON FOR JES2 CHECKPOINT RECONFIGURATION IS CKPT1 I/O 492
         ERROR(S) ON 1 MEMBER(S)
$HASP285 JES2 CHECKPOINT RECONFIGURATION STARTED - DRIVEN BY 493
         MEMBER SC50
$HASP280 JES2 CKPT1 DATA SET (STRNAME JES2CKPT_2) IS NOW IN USE

JES2 informs the operator that there is no NEWCKPT1 defined any more, since the checkpoint has just been forwarded to the previously defined NEWCKPT1:

$HASP256 FUTURE AUTOMATIC FORWARDING OF CKPT1 IS SUSPENDED UNTIL 378
         NEWCKPT1 IS RESPECIFIED.
         ISSUE $T CKPTDEF,NEWCKPT1=(...) TO RESPECIFY
$DCKPTDEF
$HASP829 CKPTDEF 547
$HASP829 CKPTDEF CKPT1=(STRNAME=JES2CKPT_2,INUSE=YES,
$HASP829         VOLATILE=NO),
$HASP829         CKPT2=(DSNAME=SYS1.JES2.CKPT1,
$HASP829         VOLSER=TOTSM1,INUSE=YES,VOLATILE=NO),
$HASP829         NEWCKPT1=(DSNAME=,VOLSER=),
$HASP829         NEWCKPT2=(DSNAME=SYS1.JES2.CKPT2,
$HASP829         VOLSER=TOTPD0),MODE=DUPLEX,DUPLEX=ON,
$HASP829         LOGSIZE=1,APPLCOPY=NONE,
$HASP829         VERSIONS=(STATUS=ACTIVE,NUMBER=50,
$HASP829         WARN=80,MAXFAIL=0,NUMFAIL=0,
$HASP829         VERSFREE=50,MAXUSED=0),RECONFIG=NO,
$HASP829         VOLATILE=(ONECKPT=DIALOG,
$HASP829         ALLCKPT=DIALOG),OPVERIFY=NO
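In response to the $HASP256 message, the operator could respecify a new forwarding target for a future failure; the structure name JES2CKPT_3 in this sketch is an assumption:

$T CKPTDEF,NEWCKPT1=(STRNAME=JES2CKPT_3)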
If the OPVERIFY parameter had been coded OPVERIFY=YES, then the operator would be prompted to make the decision:
Connectivity is lost to the checkpoint structure:

IXC518I SYSTEM SC47 NOT USING 602
        COUPLING FACILITY 009672.IBM.02.000000040104
        PARTITION: 1 CPCID: 00
        NAMED CF01
        REASON: CONNECTIVITY LOST.
        REASON FLAG: 13300001.

JES2 initiates checkpoint reconfiguration, but prompts the operator to continue:

$HASP285 JES2 CHECKPOINT RECONFIGURATION STARTING
*$HASP275 MEMBER SC47 -- JES2 CKPT1 DATA SET - I/O ERROR - REASON CODE 609
          CF2
$HASP290 MEMBER SC47 -- JES2 CKPT1 IXLLIST LOCK REQUEST FAILURE 610
         *** CHECKPOINT DATA SET NOT DAMAGED BY THIS MEMBER ***
         RETURN CODE = 0000000C
         REASON CODE = 0C080C06
         RECORD = UNKNOWN
$HASP285 JES2 CHECKPOINT RECONFIGURATION STARTING
$HASP233 REASON FOR JES2 CHECKPOINT RECONFIGURATION IS CKPT1 I/O 611
         ERROR(S) ON 1 MEMBER(S)
$HASP285 JES2 CHECKPOINT RECONFIGURATION STARTED - DRIVEN BY 612
         MEMBER SC42
*$HASP273 JES2 CKPT1 DATA SET WILL BE ASSIGNED TO NEWCKPT1 STRNAME 270
          JES2CKPT_1
          VALID RESPONSES ARE:
          CONT     - PROCEED WITH ASSIGNMENT
          TERM     - TERMINATE MEMBERS WITH I/O ERROR ON CKPT1
          DELETE   - DISCONTINUE USING CKPT1
          CKPTDEF (NO OPERANDS)   - DISPLAY MODIFIABLE SPECIFICATIONS
          CKPTDEF (WITH OPERANDS) - ALTER MODIFIABLE SPECIFICATIONS
*161 $HASP272 ENTER RESPONSE
R 161,CONT
IEE600I REPLY TO 161 IS;CONT
$HASP280 JES2 CKPT1 DATA SET (STRNAME JES2CKPT_1) IS NOW IN USE
*$HASP256 FUTURE AUTOMATIC FORWARDING OF CKPT1 IS SUSPENDED UNTIL 275
          NEWCKPT1 IS RESPECIFIED.
          ISSUE $T CKPTDEF,NEWCKPT1=(...) TO RESPECIFY
$HASP255 JES2 CHECKPOINT RECONFIGURATION COMPLETE
JES2 issues a message to the operator to suspend or confirm the use of the structure as a checkpoint data set (VOLATILE=(ONECKPT=WTOR)).
JES2 automatically enters the checkpoint reconfiguration dialog (VOLATILE=(ONECKPT=DIALOG)).
JES2 ignores the volatility state of the structure (VOLATILE=(ONECKPT=IGNORE)).
OPERLOG
LOGREC
All recovery decisions resulting from a coupling facility failure affecting a logstream structure will be made by the system logger itself or the system operator. The system logger exploiters will either suspend their processing or switch to an alternate logging solution, if any, while the logstream recovery is in process.
VARY OPERLOG,HARDCPY,OFF
This is issued at each occurrence of OPERLOG.
SETLOGRC DATASET
This is issued at each occurrence of LOGREC. This works only if a LOGREC data set name was specified in IEASYSxx at the last IPL. The recommendation is to have a LOGREC data set defined in IEASYSxx.
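Once the logstream is available again, recording can be switched back; SETLOGRC accepts LOGSTREAM, DATASET, or IGNORE as its operand:

SETLOGRC LOGSTREAM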
For other exploiters of the system logger, if any, refer to the specific operating procedures for that product.
The tape devices which are not allocated are taken OFFLINE. The operator can vary them back online, but they will be dedicated to the single system they are online to. The tape devices which are allocated are kept online, but become dedicated to the systems that had them allocated.
If another instance of IEFAUTOS can eventually be created, with the proper connectivity, reconnection to the structure is automatically performed and the tape devices will again be sharable.
Note: If XCF loses access to all sysplex couple data sets, the system enters a nonrestartable wait state. If the parallel sysplex is running with only one sysplex couple data set, and that volume fails, then all systems in the sysplex enter nonrestartable wait states. Consideration should be given to automating the activation of the spare couple data set, triggered by the issuing of the message indicating there is no alternate couple data set:
IXC267I PROCESSING WITHOUT AN ALTERNATE COUPLE DATA SET FOR typename ISSUE SETXCF COMMAND TO ACTIVATE A NEW ALTERNATE
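The automation could respond with a command of the following form, where the data set name and volume serial are assumptions:

SETXCF COUPLE,TYPE=SYSPLEX,ACOUPLE=(SYS1.XCF.CDS30,TOTCDS)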
9.13.2 Coupling Facility Resource Manager (CFRM) Couple Data Set Failure
When a system loses access to the CFRM couple data set, the system enters a nonrestartable wait state. Loss of access to the CFRM policy implies that XES cannot ensure that connectors on this system are in a consistent state.
Any system in the sysplex at a pre-SP5.1.0 level will disable SFM for the entire sysplex.
Any system in the sysplex without connectivity to the SFM couple data set will disable SFM for the entire sysplex.
Not having a started SFM policy will disable SFM for the entire sysplex.
Loss of access to SFM policy by any system in the sysplex results in the inactivation of SFM policy. The sysplex reverts to pre-SP5.1.0 mechanisms for failure processing.
IWM012E POLICY ACTIVATION FAILED, WLM COUPLE DATA SET NOT AVAILABLE
IXC253I PRIMARY COUPLE DATA SET xxxxxx FOR ARM
        IS BEING REMOVED BECAUSE OF AN I/O ERROR DETECTED BY SYSTEM sysname
IXC263I REMOVAL OF THE PRIMARY COUPLE DATA SET xxxxxx FOR ARM IS COMPLETE
*IXC267I PROCESSING WITHOUT AN ALTERNATE COUPLE DATA SET FOR ARM
         ISSUE SETXCF COMMAND TO ACTIVATE A NEW ALTERNATE.
If access to the only ARM couple data set is lost, Automatic Restart Manager services are not available until a primary couple data set is made active. Automatic Restart Manager does not cause the system to enter a wait state when both of its couple data sets are lost. In this case, the elements registered with Automatic Restart Manager will be deregistered. The following messages may be issued:
IXC808I ELEMENTS FROM TERMINATED SYSTEM sysname WERE NOT PROCESSED BY THIS SYSTEM. ARM COUPLE DATA SET IS NOT AVAILABLE TO THIS SYSTEM
In this case, the system that issued this message does not have access to the ARM couple data set. Therefore, it cannot initiate restarts of automatic restart manager elements, if any, from the failed system. The other remaining systems in the sysplex can restart elements from the failed system.
IXC809I ELEMENTS REGISTERED ON SYSTEM sysname WERE DEREGISTERED DUE TO LOSS OF ACCESS TO THE ARM COUPLE DATA SET
The system issuing this message has lost access to the ARM couple data set. All elements running on this system will be deregistered by other systems in the sysplex that have access to the ARM couple data set. The deregistered programs continue to run. Once an ARM couple data set becomes available again, the elements cannot be reregistered without ending and restarting their jobs or started tasks.
The table summarizes which couple data sets are essential for continued operation of the sysplex, and the effect of losing access to each of them.

Sysplex (XCF), primary: XCF automatically switches to the available alternate, and issues message:
IXC267I PROCESSING WITHOUT AN ALTERNATE COUPLE DATA SET FOR typename. ISSUE SETXCF COMMAND TO ACTIVATE A NEW ALTERNATE.
If XCF loses access to all sysplex couple data sets, it enters a nonrestartable wait state (WAIT0A2-10).

CFRM, primary: XCF switches to the available alternate and issues message IXC267I as above. With no active alternate available, the system enters a nonrestartable wait state.

SFM: Sysplex Failure Management is disabled for the entire sysplex. The sysplex reverts to pre-SP510 mechanisms for failure processing.

WLM: WLM switches into independent state, and uses only local data. No data is transmitted to other members of the sysplex.

ARM: processing continues, but ARM services are no longer available until a new primary couple data set is made active.

LOGR: Logger terminates.
machine TOD instead of the ETR. When the sysplex timer is repaired, this system will then synchronize itself with the ETR. The other systems will then be allowed to IPL into the sysplex.
The IMS or IRLM fails within a system in the sysplex. The CEC or MVS image fails. A coupling facility fails.
The way in which support is provided in IMS V5 to handle coupling facility failures is discussed in 9.8.1.1, No Active SFM Policy, or an Active Policy with CONNFAIL(NO) on page 197.
Therefore, if a failed CICS is restarted within a predefined time interval, it can use the retained sessions immediately and there is no need for network flows to rebind them. The CICS sessions are held by VTAM in the recovery pending state and may be recovered during the emergency restart of CICS. There are some instances where it is not possible to reestablish a pre-existing session, such as:
Performing a COLD start after the CICS failure
The toleration interval has expired
VTAM, MVS or CPC failure
The application failure is recognized by either the end-of-task (EOT) or end-of-memory (EOM) resource manager that was established by the MVS system logger on behalf of the connector. The system logger ensures that all logstream data written to a logstream to which a connection still exists is flushed from the coupling facility and written to DASD. After all data is flushed to DASD, the system logger automatically disconnects the application from any logstreams to which it is still connected.
Multisystem sysplex when other logstream connections exist: Other instances of the MVS system logger in the sysplex are notified of the failure. The surviving instances of the MVS system logger coordinate among themselves to migrate logstream data that was not yet written to DASD by the failed system. Multisystem sysplex when no other logstream connections exist: Data still resident in the coupling facility continues to exist in the coupling facility. When another instance of the application connects to the logstream, it has access to the coupling facility data.
MVS/ESA operator modify command. The activation is done by the standby controller itself on a signal from XCF; ARM is not involved. If you do not have a standby controller, you will have to start one in order to continue running batch production.
Peer-to-Peer Remote Copy provides a mechanism for synchronous copying of data to the remote site, which means that no data is lost between the time of the last backup at the application system and the time of the recovery at the remote site. The impact on performance must be evaluated, since an application write to the primary subsystem is not considered complete until the data has also been transferred to the remote subsystem. Figure 43 on page 217 shows a sample Peer-to-Peer Remote Copy configuration. The Peer-to-Peer Remote Copy implementation requires ESCON links between the primary site 3990 and the remote (recovery) site 3990.
Extended Remote Copy (XRC) Extended Remote Copy provides a mechanism for asynchronous copying of data to the remote site and only data that is in transit between the failed application system and the recovery site is lost. Note that in general, the delay in transmitting the data from the primary subsystem to the recovery subsystem is measured in seconds. Figure 44 on page 218 shows a sample Extended Remote Copy configuration. The Extended Remote Copy implementation involves the transfer of data between the primary subsystem and the recovery subsystem under the control of a DFSMS/MVS host system, which can exist at the primary site, at the recovery site, or anywhere in between.
The 3990 Remote Copy solutions are data-independent; that is, beyond the performance considerations, there is no restriction on the data that can be mirrored at a remote site using these solutions.
Database Level Tracking With this level of support, the database is shadowed at the remote site, thus eliminating the need to recover the databases in the event of a primary site outage.
Recovery Level Tracking With this level of support, the databases are not shadowed. The logs are transmitted electronically to the remote site, and the databases must be recovered as part of the disaster recovery process.
RSR supports the recovery of IMS full function databases, Fast Path DEDBs, the IMS message queues and the telecommunications network. For more information on the IMS Remote Site Recovery feature, refer to the following sources:
MKTTOOLS packages:
IMS5RSR   IMS/ESA Remote Site Recovery Overview
IMSRSR    IMS Remote Site Recovery
Note that before the above rerouting of transactions can be effective, you must have addressed the issues of getting the databases available on the recovery site through the techniques discussed previously in 10.2.2, IMS Remote Site Recovery on page 216 and 10.2.1, 3990 Remote Copy on page 215 or some other technique. To provide for high availability for the TOR, you would want to either start the TOR on the alternate CPU or perhaps issue a VTAM command to install an alternate USS table which would direct 3270 logons to the alternate TOR. CICSPlex SM can trigger automation software by issuing an alert or console message when the TOR fails. Note that it is still necessary to consider the recovery at the remote site of the application data. Data sharing can only occur within a sysplex, but features such as IMS RSR described above, can significantly reduce the recovery time required for databases.
Obviously you must also have a coupling facility at the recovery site. DB2 does not provide a utility for remote site database shadowing. Transportation of required recovery information, such as logs, image copies and so on, to the remote site is manual. For more information on DB2 disaster recovery, refer to DB2 Data Sharing: Planning and Administration , SC26-3269.
A.2.1 LOADAA
IODF      37 SYS6 L06RMVS1 01
NUCLEUS   1
SYSCAT    TOTCAT113CCATALOG.TOTICFM.VTOTCAT
NUCLST    AA
IEASYM    (AA,L)
SYSPLEX   WTSCPLX1 X
LOADxx member for all systems in sysplex WTSCPLX1
IEASYM defines one or more suffixes of the IEASYMxx members of PARMLIB that are to be used. The IEASYMxx member is used to define the static system symbols; IEASYMxx is the only place where installations can define static system symbols. These are required to keep the management effort of cloning new systems to a minimum. L indicates that a list of member names is displayed in message IEA900I during IPL.
SYSPLEX defines the name of the sysplex in which the system participates. It is also the substitution text for the &SYSPLEX system symbol. Any nonblank character (in this case X) should be coded after the name to tell the system to issue message IXC217I and prompt the operator to respecify the suffix of COUPLExx if the &SYSCLONE system symbol is not unique for every system in the sysplex.
A.3.1 IEASYMAA
Figure 49 contains the IEASYMAA member for the sample sysplex. This is the member pointed to by the LOADxx member.
SYSDEF   SYMDEF(&CLNLST='AA')             /* CLONING NAME */
         SYSCLONE(&SYSNAME(3:2))
         SYMDEF(&IEASYSP='AA')
         SYMDEF(&APPCLST1='&SYSCLONE.')
         SYMDEF(&CMDLIST1='&SYSCLONE.,00')
         SYMDEF(&LNKLIST2='00')
         SYMDEF(&LPALIST2='00')
         SYMDEF(&LPALSTJ1='AA')
         SYMDEF(&MLPALST1='AA')
         SYMDEF(&RSULST01='00')
         SYMDEF(&SSNLST01='AA')
SYSDEF   HWNAME(ITSO942A)
         LPARNAME(T5)
         SYSPARM(AA)
         SYSNAME(SC47)
         SYMDEF(&MLPALST1='AA,AB')
SYSDEF   HWNAME(P101)
         LPARNAME(A1)
         SYSPARM(AA)
         SYSNAME(SC52)
         SYMDEF(&APPCLST1='AA')
         SYMDEF(&CMDLIST1='AA,00')
SYSDEF   HWNAME(P101)
         LPARNAME(A2)
         SYSPARM(AA)
         SYSNAME(SC53)
         SYMDEF(&APPCLST1='AA')
         SYMDEF(&CMDLIST1='AA,00')
SYSDEF   HWNAME(P201)
         LPARNAME(A1)
         SYSPARM(AA)
         SYSNAME(SC42)
         SYMDEF(&APPCLST1='AA')
         SYMDEF(&CMDLIST1='AA,00')
The first SYSDEF statement is global. The value parameters will apply to all systems in the sysplex. The remaining SYSDEF statements are local. The value parameters here apply only to the system that HWNAME or LPARNAME identifies. In the previous example, system SC47 will use command list members COMMND47 and COMMND00. The remaining systems use COMMNDAA and COMMND00. SYSPARM specifies that member IEASYSAA is to be concatenated with the default IEASYS00.
ALLOC=00,                        ALLOCATION DEFAULTS
CLOCK=00,                        TOD CLOCK INITIALIZATION
CMB=(UNITR,COMM,GRAPH,CHRDR),    ADDITIONAL CMB ENTRIES
CON=(00),                        CONSOLE DEFINITIONS
COUPLE=00,                       COUPLE DEFINITIONS
CSA=(2048,20480),                MVS/XA CSA RANGE
DIAG=01,                         CSA/SQA TRACING
DUMP=NO,                         DYNAMIC ALLOCATION ACTIVE (COMMND00)
FIX=00,                          FIX MODULES SPECIFIED /*J3*/
GRS=TRYJOIN,                     LETS GET THIS BABY GOING
GRSCNF=00,                       GRS CONFIG DEFINITIONS
GRSRNL=01,                       GRS RNLS DEFINITIONS
ICS=00,                          SELECT IEAICS00 INSTALL CNTL SPECS FOR SRM
LNK=00,                          SPECIFY LNKLST00
LNKAUTH=APFTAB,                  LINKLIST APF AUTHORIZATION VIA APFTAB
LPA=00,                          SELECT LPALST00 CONCATENATED LPA LIBRARY
LOGCLS=Y,                        WILL NOT BE PRINTED BY DEFAULT
LOGLMT=999999,                   MUST BE 6 DIGITS, MAX WTL MESSAGES QUEUED
LOGREC=LOGSTREAM,                LOGREC GOES TO LOGR LOGSTREAM
MAXCAD=25,                       CICSPLEX CMAS NUMBER OF COMMON DSPACES
MAXUSER=250,                     (SYS TASKS + INITS + TSOUSERS)
MLPA=02,                         SELECT IEALPA02 MODULES LOADED INTO PLPA
MSTRJCL=01,                      MSTJCL WITHOUT UADS & WITH IEFJOBS
NSYSLX=55,                       CICSPLEX CAS/ESSS LINKAGE INDEXES
OPI=YES,                         ALLOW WOL OVERRIDE TO IEASYS00
OPT=00,                          SPECIFY IEAOPT00 (SRM TUNING PARMETERS)
PAGE=(PAGE.&SYSNAME..PLPA,       PLPA PAGE DATA SET
      PAGE.&SYSNAME..COMMON,     COMMON PAGE DATA SET
      PAGE.&SYSNAME..LOCAL1,L),  LOCAL PAGE DATA SET
PAGTOTL=(8,3),                   ALLOW ADDITION 5 PAGE D/S AND 3 SWAP D/S
PAK=00,                          IEAPAK00
PLEXCFG=(MULTISYSTEM,OPI=NO),    MULTI-SYSTEM SYSPLEX ONLY
PROG=(00),                       DYNAMIC APF
REAL=512,                        ALLOWS 2 64K JOBS OR 1 128K JOB TO RUN V=R
RSVSTRT=25,                      RESERVED ASVT ENTRIES DEFAULT
RSVNONR=25,                      RESERVED ASVT ENTRIES DEFAULT
SCH=00,                          SCHEDULER LIST SCHED00
SMF=00,                          SELECT SMFPRM00, SMF PARMETERS
SMS=00,                          SMS PARAMETER
SQA=(3,18),                      MVS/XA SQA APPROX 1MB
SSN=00,                          SUBSYSTEM INITIALIZATION NAMES
SVC=00,                          SVC TABLE IEASVC00
VAL=00,                          SELECT VATLST00 DEFAULT
VIODSN=SYS1.&SYSNAME..STGINDEX,  DATASET NAME FOR STGINDEX-DS
VRREGN=512                       DEFAULT REAL-STORAGE REGION SIZE DEFAULT
Figure 50. IEASYS00
Figure 51 on page 225 contains the IEASYSAA member for the sample sysplex.
The LNK statement in IEASYSAA resolves to a concatenation of LNKLST00, as specified in the global SYSDEF statement in the IEASYMAA member in Figure 49 on page 223.
SYMDEF(&LNKLIST1='00')
The other statements resolve in the same manner.
A.3.3 COUPLE00
Figure 52 contains the COUPLE00 member for the sample sysplex.
COUPLE    SYSPLEX(WTSCPLX1)
          PCOUPLE(SYS1.XCF.CDS10)
          ACOUPLE(SYS1.XCF.CDS20)
          INTERVAL(85)
          OPNOTIFY(85)
          CLEANUP(30)
          MAXMSG(500)
          RETRY(10)
          CLASSLEN(1024)
/* DEFINITIONS FOR CFRM POLICY */
DATA      TYPE(CFRM)
          PCOUPLE(SYS1.XCF.CFRM1X)
          ACOUPLE(SYS1.XCF.CFRM2X)
/* DATASETS FOR SFM POLICY */
DATA      TYPE(SFM)
          PCOUPLE(SYS1.XCF.SFM10)
          ACOUPLE(SYS1.XCF.SFM20)
/* DATASETS FOR WLM POLICY */
DATA      TYPE(WLM)
          PCOUPLE(SYS1.XCF.WLM10)
          ACOUPLE(SYS1.XCF.WLM20)
/* DATASETS FOR LOGR POLICY */
DATA      TYPE(LOGR)
          PCOUPLE(SYS1.XCF.LOGR10)
          ACOUPLE(SYS1.XCF.LOGR20)
/* DATASETS FOR ARM POLICY */
DATA      TYPE(ARM)
          PCOUPLE(SYS1.XCF.ARM10)
          ACOUPLE(SYS1.XCF.ARM1X)
/* LOCAL XCF MESSAGE TRAFFIC */
LOCALMSG  MAXMSG(512)
          CLASS(DEFAULT)
/* PATH DEFINITIONS FOR DEFAULT SIGNALLING */
PATHIN    DEVICE(4010,4020,4030,4040,4050)
PATHIN    DEVICE(4018,4028,4038,4048,4058)
PATHOUT   DEVICE(5010,5020,5030,5050)
PATHOUT   DEVICE(5018,5028,5038,5058)
PATHOUT   STRNAME(IXC_DEFAULT_1,IXC_DEFAULT_2)
PATHIN    STRNAME(IXC_DEFAULT_1,IXC_DEFAULT_2)
Figure 52. COUPLE00
In the previous example the naming convention for the CTC device numbering is based on XYYZ, where:
X is 4 for inbound CTCs (PATHIN) or 5 for outbound CTCs (PATHOUT).
YY corresponds to the MVS system image number.
Z indicates to which of two ESCON directors the CTC is associated.
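For example, under this convention device 4010 is an inbound CTC (X=4) from MVS image 01 (YY=01) through the first ESCON director (Z=0), while device 5018 is an outbound CTC (X=5) to the same image through the second director (Z=8).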
M=J2G,M1=J2L&SYSCLONE,
N=SYS1,L=LINKLIB,U=,PN=SYS1,PL=PARMLIB,
PROC00='SYS1.PROCLIB',
PROC01='ESA.SYS1.PROCLIB',
OPCSTC='OPCESA.V1R3M0.COMMON.STC'
PGM=HASJES20,TIME=1440,DPRTY=(15,15)
DDNAME=IEFRDER
UNIT=&U,DSN=&PN..&PL(&M),DISP=SHR
UNIT=&U,DSN=&PN..&PL(&M1),DISP=SHR
DSN=&OPCSTC,DISP=SHR
DSN=&PROC00,DISP=SHR
DSN=&PROC01,DISP=SHR
DSN=&PROC01,DISP=SHR
DSN=&PROC00,DISP=SHR
DSN=&N..&L,DISP=SHR
This member uses both system symbolic substitution and JCL symbolic substitution to point to the appropriate members for JES2 initialization:
J2G for global initialization parameters
J2L&SYSCLONE for system-specific parameters, where &SYSCLONE resolves to the last two characters of the SYSNAME parameter specified in IEASYMAA (see Figure 49 on page 223).
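For example, on system SC42 the &SYSCLONE symbol resolves to 42 (the last two characters of SYSNAME), so M1=J2L&SYSCLONE resolves to member J2L42, which is shown in A.3.6, J2L42.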
A.3.5 J2G
Figure 54 contains the JES2 initialization member for the sample sysplex.
LOGON(1)  APPLID=WTSC&SYSNAME.          /* SYMBOLIC */
LOGON(2)  APPLID=&SYSNAME.RJE           /* SYMBOLIC */
SPOOLDEF  DSNAME=SYS1.HASPACE,
          VOLUME=TOTSP,
          BUFSIZE=3992,
          FENCE=NO,
          SPOOLNUM=32,
          TGBPERVL=5,
          TGSIZE=30,
          TGSPACE=(MAX=48864,WARN=80),
          TRKCELL=6
MASDEF    AUTOEMEM=ON,
          CKPTLOCK=ACTION,
          HOLD=0,
          SHARED=CHECK,
          XCFGRPNM=XCFJES2A,
          RESTART=YES,
          LOCKOUT=1000,                 /* CONTENTION */
          DORMANCY=(0,100),
          SYNCTOL=120
CKPTDEF   CKPT1=(STRNAME=JES2CKPT_1,INUSE=YES),
          CKPT2=(DSN=SYS1.JES2.CKPT1,VOL=TOTSM1,INUSE=YES),
          NEWCKPT1=(STRNAME=JES2CKPT_2),
          NEWCKPT2=(DSN=SYS1.JES2.CKPT2,VOL=TOTPD0),
          MODE=DUPLEX,
          DUPLEX=ON,
          APPLCOPY=NONE,
          OPVERIFY=NO,
          VOLATILE=(ONECKPT=DIALOG,ALLCKPT=DIALOG)
MEMBER(1) NAME=SC47
MEMBER(2) NAME=SC52
MEMBER(3) NAME=SC53
MEMBER(4) NAME=SC42
NJEDEF    DELAY=300,
          HDRBUF=(LIMIT=100,WARN=50),
          JRNUM=1,
          JTNUM=1,
          SRNUM=7,
          STNUM=7,
          LINENUM=40,
          MAILMSG=YES,
          MAXHOP=0,
          NODENUM=999,
          OWNNODE=1,
          PATH=1,
          RESTMAX=0,
          RESTNODE=100,
          RESTTOL=0,
          TIMETOL=0
N1        PATHMGR=YES,SUBNET=LOCAL
N4        PATHMGR=YES,SUBNET=WTSCNET
N5        PATHMGR=NO,SUBNET=WTSCPOK
N6        PATHMGR=YES,SUBNET=WTSCTEST
DESTID(MVS3827) DEST=WTSCMXA.U11
DESTID(MVS3900) DEST=WTSCMXA.LOCAL
JOBDEF    ACCTFLD=REQUIRED,   JCLERR=YES,
          JOBNUM=2000,        JNUMWARN=60,
          PRTYHIGH=10,        PRTYLOW=5,
          PRTYJECL=YES,       PRTYJOB=YES,
          RANGE=(1000-30000), PRTYRATE=144,
          JOBWARN=60
OUTDEF    COPIES=128,         NIFCB=STD3,
          NIUCS=GF15,         JOENUM=10000,
          JOEWARN=70
CONDEF    AUTOCMD=50,         BUFNUM=200,
          CONCHAR=$,          BUFWARN=60,
          SCOPE=SYSTEM,       DISPLEN=56,
          RDRCHAR=$
OFFLOAD1  DSN=SYS1.JES2.OFFLOAD1
OFFLOAD2  DSN=SYS1.JES2.OFFLOAD2
OFFLOAD3  DSN=SYS1.JES2.OFFLOAD3
OFFLOAD4  DSN=SYS1.JES2.OFFLOAD4
OFF1.JR   WS=(/)
OFF1.JT   DISP=KEEP,WS=(/)
OFF1.SR   WS=(/)
OFF1.ST   DISP=KEEP,WS=(/)
OFF2.JR   WS=(/)
OFF2.JT   DISP=KEEP,WS=(/)
OFF2.SR   WS=(/)
OFF2.ST   DISP=KEEP,WS=(/)
OFF3.JR   WS=(/)
OFF3.JT   DISP=KEEP,WS=(/)
OFF3.SR   WS=(/)
OFF3.ST   DISP=KEEP,WS=(/)
OFF4.JR   WS=(/)
OFF4.JT   DISP=KEEP,WS=(/)
OFF4.SR   WS=(/)
OFF4.ST   DISP=KEEP,WS=(/)
TPDEF
BUFDEF
SMFDEF
TSUCLASS  AUTH=ALL,
          BLP=YES,
          LOG=NO,
          OUTPUT=YES,
          REGION=2M,
          MSGCLASS=S,
          SWA=ABOVE,
          CONDPURG=NO
STCCLASS  AUTH=ALL,
          BLP=NO,
          LOG=NO,
          REGION=2M,
          MSGCLASS=S,
          SWA=ABOVE,
          TIME=(1440,0)
JOBCLASS(A-Z)  AUTH=ALL,
          BLP=YES,
          COMMAND=DISPLAY,
          JOURNAL=NO,
          LOG=YES,
          RESTART=NO,
          MSGLEVEL=(1,1),
          REGION=2M,
          SWA=ABOVE,
          TIME=(450,00)
JOBCLASS(0-9)  AUTH=ALL,
          BLP=YES,
          COMMAND=DISPLAY,
          JOURNAL=YES,
          LOG=YES,
          RESTART=YES,
          MSGLEVEL=(1,1),
          REGION=2M,
          SWA=ABOVE,
          TIME=(450,00)
JOBPRTY1  PRIORITY=9,TIME=1
JOBPRTY2  PRIORITY=8,TIME=2
JOBPRTY3  PRIORITY=7,TIME=4
JOBPRTY4  PRIORITY=6,TIME=8
JOBPRTY5  PRIORITY=5,TIME=16
JOBPRTY6  PRIORITY=4,TIME=32
JOBPRTY7  PRIORITY=3,TIME=64
JOBPRTY8  PRIORITY=2,TIME=128
JOBPRTY9  PRIORITY=1,TIME=256
INTRDR
PRINTER1  FSS=FSS382C,
          MODE=FSS,
          PRMODE=(PAGE,LINE),
          CLASS=IU,
          UCS=0,
          CKPTPAGE=100,
          MARK=YES,
          START=NO,
          ROUTECDE=U10
PRINTER3  FSS=FSS382B,
          MODE=FSS,
          PRMODE=(PAGE,LINE),
          CLASS=IU,
          UCS=0,
          CKPTPAGE=100,
          MARK=YES,
          START=NO,
          ROUTECDE=U12
/* PRINTER9 START=NO,UNIT=831,UCS=P11,FCB=STD2,CLASS=U */
PUNCH1    START=NO
PUNCH2    START=NO
PUNCH3    START=NO
PUNCH4    START=NO
ESTLNCT   NUM=25,INT=10000,OPT=0
ESTIME    NUM=10,INT=10,OPT=NO
OUTCLASS(A)    OUTPUT=PUNCH
OUTCLASS(B)    OUTDISP=(HOLD,HOLD)
OUTCLASS(C)    OUTPUT=PUNCH
OUTCLASS(D)    OUTDISP=(HOLD,HOLD)
OUTCLASS(E-J)  OUTDISP=(HOLD,HOLD)
OUTCLASS(K)    OUTDISP=(HOLD,HOLD)
OUTCLASS(L)    OUTPUT=DUMMY,OUTDISP=(PURGE,PURGE),TRKCELL=NO
OUTCLASS(M-R)
OUTCLASS(S-T)
OUTCLASS(U-W)
OUTCLASS(X)
OUTCLASS(Y)
OUTCLASS(Z)
OUTCLASS(0-9)
OUTPRTY1  RECORD=600,PAGE=10
OUTPRTY2  RECORD=1200,PAGE=20
OUTPRTY3  RECORD=3000,PAGE=50
OUTPRTY4  RECORD=12000,PAGE=200
OUTPRTY5  RECORD=15000,PAGE=250
OUTPRTY6  RECORD=20000,PAGE=300
OUTPRTY7  RECORD=25000,PAGE=350
OUTPRTY8  RECORD=30000,PAGE=400
OUTPRTY9  RECORD=40000,PAGE=500
A.3.6 J2L42
Figure 55 contains the JES2 initialization member that is unique for a member of the sysplex.
INITDEF  PARTNUM=20
INIT(1)   NAME=A,  CLASS=ABCDE,       START=YES
INIT(2)   NAME=A,  CLASS=ABCDE,       START=YES
INIT(3)   NAME=A,  CLASS=ABCDE,       START=YES
INIT(4)   NAME=A,  CLASS=ABCDE,       START=YES
INIT(5)   NAME=A,  CLASS=ABCDE,       START=YES
INIT(6)   NAME=B,  CLASS=01234,       START=NO
INIT(7)   NAME=B,  CLASS=01234,       START=NO
INIT(8)   NAME=B,  CLASS=56789,       START=NO
INIT(9)   NAME=C,  CLASS=A,           START=NO
INIT(10)  NAME=C,  CLASS=A,           START=NO
INIT(11)  NAME=C,  CLASS=A,           START=NO
INIT(12)  NAME=D,  CLASS=ABCDEFG,     START=NO
INIT(13)  NAME=D,  CLASS=ABCDEFG,     START=NO
INIT(14)  NAME=D,  CLASS=ABCDEFG,     START=NO
INIT(15)  NAME=E,  CLASS=0123456789,  START=NO
INIT(16)  NAME=E,  CLASS=0123456789,  START=NO
INIT(17)  NAME=E,  CLASS=0123456789,  START=NO
INIT(18)  NAME=Z,  CLASS=S,           START=YES
INIT(19)  NAME=Z,  CLASS=S,           START=YES
INIT(20)  NAME=Z,  CLASS=S,           START=YES
ATCSTRxx   system-specific start list
ATCCONxx   system-specific configuration list
APCICxx    system-specific CICS definitions
APNJExx    system-specific JES2/VTAM interface
CDRMxx     system-specific CDRM members
MPCxx      system-specific TRL local major node definitions
TRLxx      system-specific TRL definitions
APAPPCAA   global APPC application member
APHCMAA    global HCM application member
APISPFAA   global ISPF application member
APTCPAA    global TCP/IP application member
APTSOAA    global TSO application member
ECHOAA     global ECHO member
The global members utilize symbolic substitution; therefore, when cloning new MVS images into the parallel sysplex, no changes need to be made to these members. One example of such a member, APAPPCAA, is included in the following. All system-specific members have been included so as to relate to 7.1, Adding a New MVS Image on page 149.
A.4.1 ATCSTR42
********************************************************************
*                START LIST FOR VTAM IN IMG03/SC42                 *
********************************************************************
CONFIG=42,                                                        X
SSCPID=42,                                                        X
NOPROMPT,                                                         X
SSCPNAME=SC42M,                                                   X
NETID=USIBMSC,                                                    X
HOSTSA=42,                                                        X
NODETYPE=EN,                                                      X
CONNTYPE=APPN,                                                    X
APPNCOS=#INTER,          DEFAULT APPN COS                         X
CPCP=YES,                                                         X
SUPP=NOSUP,                                                       X
IOPURGE=180,                                                      X
HOSTPU=SC42MPU,                                                   X
PPOLOG=YES,                                                       X
DYNLU=YES,                                                        X
CRPLBUF=(208,,15,,1,16),                                          X
IOBUF=(182,440,19,,8,48),                                         X
LPBUF=(9,,0,,6,1)

Figure 56. ATCSTR42
A.4.2 ATCCON42
**********************************************************************
*                    CONFIG LIST FOR IMG03/SC42                      *
**********************************************************************
PATH42,       PATH DECK                                           X
TRL23,        TRL DEFINITIONS                                     X
MPC23,        MPC LOCAL MAJOR NODE                                X
CDRM42,       CDRMS                                               X
ECHOAA,       ECHO APPL                                           X
APTSOAA,      TSO APPLICATION                                     X
APISPFAA,     ISPF APPC LU                                        X
APNJE42,      JES2/VTAM INTERFACE                                 X
APCIC42,      CICS AND CPSM APPLICATIONS                          X
COSAPPN,      DEFAULT APPN COS TABLE                              X
APPNTGP,      TRANSMISSION GROUP PROFILE FOR APPN                 X
APAPPCAA,     APPC LU                                             X
APTCPAA,      TCP/IP TELNET TERMINALS                             X
CULN8A0

Figure 57. ATCCON42
A.4.3 APCIC42
***************************************************
*                                                 *
*               CICS DEFINITIONS                  *
*                                                 *
***************************************************
         VBUILD TYPE=APPL
CICSPAC1 APPL AUTH=(ACQ,VPACE,PASS),VPACING=0,EAS=5000,PARSESS=YES,  X
               SONSCIP=YES
CICSPAC2 APPL AUTH=(ACQ,VPACE,PASS),VPACING=0,EAS=5000,PARSESS=YES,  X
               SONSCIP=YES
CICSPAC3 APPL AUTH=(ACQ,VPACE,PASS),VPACING=0,EAS=5000,PARSESS=YES,  X
               SONSCIP=YES
CICSPAC4 APPL AUTH=(ACQ,VPACE,PASS),VPACING=0,EAS=5000,PARSESS=YES,  X
               SONSCIP=YES
CICSPFC1 APPL AUTH=(ACQ,VPACE,PASS),VPACING=0,EAS=5000,PARSESS=YES,  X
               SONSCIP=YES
CICSPFC2 APPL AUTH=(ACQ,VPACE,PASS),VPACING=0,EAS=5000,PARSESS=YES,  X
               SONSCIP=YES
CICSPFC3 APPL AUTH=(ACQ,VPACE,PASS),VPACING=0,EAS=5000,PARSESS=YES,  X
               SONSCIP=YES
CICSPFC4 APPL AUTH=(ACQ,VPACE,PASS),VPACING=0,EAS=5000,PARSESS=YES,  X
               SONSCIP=YES
*
CICSPCC1 APPL AUTH=(ACQ,VPACE,PASS,SPO),EAS=10,PARSESS=YES,APPC=NO,  X
               ACBNAME=CICSPCC1,VPACING=5,                           X
               SONSCIP=YES
*
CAS42    APPL AUTH=(ACQ),                                            X
               ACBNAME=CAS42,                                        X
               PARSESS=YES,                                          X
               MODETAB=EYUSMPMT

Figure 58. APCIC42
A.4.4 APNJE42
*************************************************** * * * JES2/VTAM INTERFACE * * * *************************************************** VBUILD TYPE=APPL WTSCSC42 APPL AUTH=(ACQ),EAS=5,ACBNAME=WTSCSC42,VPACING=7, MODETAB=NJETAB,DLOGMOD=PKNJE77
Figure 59. APNJE42
A.4.5 CDRM42
***********************************************************************
*                       CDRMS FOR IMG03 SC42                          *
***********************************************************************
CDRMSC   VBUILD TYPE=CDRM
         NETWORK NETID=USIBMSC
*
SC42M    CDRM SUBAREA=42,        SC42                                 *
              CDRDYN=YES,                                             *
              CDRSC=OPT,                                              *
              ISTATUS=ACTIVE
SC47M    CDRM SUBAREA=47,        SC47                                 *
              CDRDYN=YES,                                             *
              CDRSC=OPT,                                              *
              ISTATUS=INACTIVE
SC52M    CDRM SUBAREA=52,        SC52                                 *
              CDRDYN=YES,                                             *
              CDRSC=OPT,                                              *
              ISTATUS=INACTIVE
SC53M    CDRM SUBAREA=53,        SC53                                 *
              CDRDYN=YES,                                             *
              CDRSC=OPT,                                              *
              ISTATUS=INACTIVE
Figure 60. CDRM42
A.4.6 MPC03
*********************************************************************** * TRL LOCAL MAJOR NODE FOR IMG03 * *********************************************************************** TRL03L VBUILD TYPE=LOCAL TRL0305P PU TRLE=MPC0305,ISTATUS=ACTIVE,VPACING=0, X SSCPFM=USSSCS,CONNTYPE=APPN,CPCP=YES
Figure 61. MPC03
Note: The network node, in this example image 05, will require PU statements for all other images in the sysplex.
A.4.7 TRL03
*********************************************************************** * VTAM TRL DEFINITIONS FOR IMG03 * *********************************************************************** TRL03 VBUILD TYPE=TRL MPC0305 TRLE LNCTL=MPC,MAXBFRU=5, * READ=(4053,405B),WRITE=(5053,505B)
Figure 62. TRL03
Note: The network node, in this example image 05, will require TRLE statements for all other images in the sysplex.
A.4.8 APAPPCAA
APAPPC&SYSCLONE VBUILD TYPE=APPL
*
SC&SYSCLONE.APPC APPL ACBNAME=SC&SYSCLONE.APPC,                      *
               APPC=YES,                                             *
               AUTOSES=10,                                           *
               DDRAINL=NALLOW,                                       *
               DMINWNL=2,                                            *
               DMINWNR=2,                                            *
               DRESPL=NALLOW,                                        *
               DSESLIM=10,                                           *
               EAS=509,                                              *
               MODETAB=MTAPPC,                                       *
               SECACPT=CONV,                                         *
               SRBEXIT=YES,                                          *
               VERIFY=NONE,                                          *
               VPACING=2
SC&SYSCLONE.SRV APPL ACBNAME=SC&SYSCLONE.SRV,                        *
               APPC=YES,                                             *
               AUTOSES=10,                                           *
               DDRAINL=NALLOW,                                       *
               DMINWNL=2,                                            *
               DMINWNR=2,                                            *
               DRESPL=NALLOW,                                        *
               DSESLIM=10,                                           *
               EAS=509,                                              *
               MODETAB=MTAPPC,                                       *
               SECACPT=ALREADYV,                                     *
               SRBEXIT=YES,                                          *
               VERIFY=NONE,                                          *
               VPACING=2
SC&SYSCLONE.SGT APPL ACBNAME=SC&SYSCLONE.SGT,                        *
               APPC=YES,                                             *
               AUTOSES=10,                                           *
               DDRAINL=NALLOW,                                       *
               DMINWNL=2,                                            *
               DMINWNR=2,                                            *
               DRESPL=NALLOW,                                        *
               DSESLIM=10,                                           *
               EAS=509,                                              *
               MODETAB=MTAPPC,                                       *
               SECACPT=ALREADYV,                                     *
               SRBEXIT=YES,                                          *
               VERIFY=NONE,                                          *
               VPACING=20

Figure 63. APAPPCAA
//********************************************************************
//* DEFINE STGINDEX DATASET                                          *
//********************************************************************
//STGDEF   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  /* DEFINE STGINDEX */
  DEFINE CLUSTER (NAME(SYS1.SC42.STGINDEX) -
         CYLINDERS(5) VOLUME(MVS004) -
         KEYS(12 8) BUFFERSPACE(20480) -
         RECORDSIZE(2041 2041) REUSE) -
         DATA (CONTROLINTERVALSIZE(2048)) -
         INDEX (CONTROLINTERVALSIZE(4096))
/*
//********************************************************************
//* DEFINE SMF DATASETS                                              *
//********************************************************************
//SMFDEF   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFINE CLUSTER (NAME(SYS1.SC42.MAN1) -
         CYLINDERS(40) VOLUME(MVS004) -
         RECORDSIZE(26614 32767) NONINDEXED -
         SPEED BUFFERSPACE(737280) SPANNED -
         SHAREOPTIONS (2) REUSE -
         CONTROLINTERVALSIZE(26624))
  DEFINE CLUSTER (NAME(SYS1.SC42.MAN2) -
         CYLINDERS(40) VOLUME(MVS004) -
         RECORDSIZE(26614 32767) NONINDEXED -
         SPEED BUFFERSPACE(737280) SPANNED -
         SHAREOPTIONS (2) REUSE -
         CONTROLINTERVALSIZE(26624))
Figure 64 (Part 2 of 3). Allocating System Specific Data Sets
  DEFINE CLUSTER (NAME(SYS1.SC42.MAN3) -
         CYLINDERS(40) VOLUME(MVS004) -
         RECORDSIZE(26614 32767) NONINDEXED -
         SPEED BUFFERSPACE(737280) SPANNED -
         SHAREOPTIONS (2) REUSE -
         CONTROLINTERVALSIZE(26624))
/*
//*******************************************************************
//* CLEAR THE SPECIFIED SMF DATASETS                                 *
//*******************************************************************
//SMFCLR1  EXEC PGM=IFASMFDP
//SYSPRINT DD SYSOUT=*
//DUMPIN   DD DSN=SYS1.SC42.MAN1,DISP=SHR
//DUMPOUT  DD DUMMY
//SYSIN    DD *
  INDD(DUMPIN,OPTIONS(CLEAR))
/*
//SMFCLR2  EXEC PGM=IFASMFDP
//SYSPRINT DD SYSOUT=*
//DUMPIN   DD DSN=SYS1.SC42.MAN2,DISP=SHR
//DUMPOUT  DD DUMMY
//SYSIN    DD *
  INDD(DUMPIN,OPTIONS(CLEAR))
/*
//SMFCLR3  EXEC PGM=IFASMFDP
//SYSPRINT DD SYSOUT=*
//DUMPIN   DD DSN=SYS1.SC42.MAN3,DISP=SHR
//DUMPOUT  DD DUMMY
//SYSIN    DD *
  INDD(DUMPIN,OPTIONS(CLEAR))
/*
Figure 64 (Part 3 of 3). Allocating System Specific Data Sets
D CF,CFNAME=CF1
IXL150I 23.01.31 DISPLAY CF 927
COUPLING FACILITY 009674.IBM.51.000000060041
                  PARTITION: 3 CPCID: 00
                  CONTROL UNIT ID: FFF6
NAMED CF1
COUPLING FACILITY SPACE UTILIZATION
     ALLOCATED SPACE                DUMP SPACE UTILIZATION
1      STRUCTURES:  11264 K      3    STRUCTURE DUMP TABLES:     0 K
2      DUMP SPACE:   2048 K      4    TABLE COUNT:                 0
       FREE SPACE:  17920 K           FREE DUMP SPACE:         2048 K
       TOTAL SPACE: 31232 K           TOTAL DUMP SPACE:        2048 K
                                 5    MAX REQUESTED DUMP SPACE:   0 K
6    VOLATILE: YES                    STORAGE INCREMENT SIZE: 256 K
7    CFLEVEL: 1
COUPLING FACILITY SPACE CONFIGURATION
                          IN USE        FREE        TOTAL
8      CONTROL SPACE:    13312 K     17920 K      31232 K
9      NON-CONTROL SPACE:    0 K         0 K          0 K
10   SENDER PATH        PHYSICAL     LOGICAL      STATUS
       13               ONLINE       ONLINE       VALID
     SUBCHANNEL                                   STATUS
       0361                                       OPERATIONAL/IN USE
       0362                                       OPERATIONAL/IN USE
Note:
1 Allocated space for STRUCTURES. This is the coupling facility processor storage currently used by the allocated structures. It is a multiple of STORAGE INCREMENT SIZE. 2 DUMP SPACE is the space reserved to capture structure dump data in the coupling facility, before offloading it onto the dump data set. DUMP SPACE is the value given to DUMPSPACE in the CFRM policy active at structure allocation time, rounded to the next multiple of STORAGE INCREMENT SIZE.
3 STRUCTURE DUMP TABLES is the space currently allocated to captured structure dump data waiting to be offloaded onto dump data set.
4 TABLE COUNT is the number of captured dumps still in the coupling facility dump space.
5 MAX REQUESTED DUMP SPACE is the maximum amount of dump space which has been requested to be assigned to a dump table.
6 VOLATILE is the volatility status of the coupling facility.
7 CFLEVEL is the CFCC level currently running in this coupling facility.
8 CONTROL SPACE should be understood as Central Storage space in the coupling facility processor storage.
9 NON-CONTROL SPACE should be understood as Expanded Storage space in the coupling facility processor storage.
10 The following is information related to the status of the CFS CHPID(s) defined to this MVS image and connected to this coupling facility:
SENDER PATH is the CHPID number.
PHYSICAL can be either:
   ONLINE
   OFFLINE    There is no physical CHPID assigned to this MVS image. This can result from a definition error or from the CHPID being offline due to a CF CHP command.
LOGICAL can be either:
   ONLINE
   OFFLINE    There is no path to the coupling facility associated with this CHPID. This can result from a malfunction or from the path being offline due to a V PATH command.
STATUS can be either:
   VALID
   MISCABLED  Either the CFS CHPID is not connected to the coupling facility as it was defined during the configuration phase through HCD, or the HCD configuration phase did not complete properly.
   NOT OPERATIONAL:
      FACILITY PAUSED      The last path validation attempted received a Facility Paused status; the path is therefore not operational.
      PATH NOT AVAILABLE   The last path validation attempted received a Path Not Available status; the path is therefore not operational.
D XCF,STR,STRNM=SYSTEM_LOGREC
IXC360I 15.18.30 DISPLAY XCF 361
STRNAME: SYSTEM_LOGREC
STATUS: ALLOCATED
1  POLICY SIZE    : 32256 K
2  POLICY INITSIZE: 16128 K
   REBUILD PERCENT: N/A
   PREFERENCE LIST: CF01 CF02
   EXCLUSION LIST IS EMPTY
ACTIVE STRUCTURE
----------------
   ALLOCATION TIME: 10/16/95 12:52:08
   CFNAME         : CF02
   COUPLING FACILITY: 009672.IBM.02.000000040104
                      PARTITION: 1 CPCID: 01
3  ACTUAL SIZE    : 17152 K
   STORAGE INCREMENT SIZE: 256 K
4  VERSION        : ABD445FB B991D202
5  DISPOSITION    : DELETE
6  ACCESS TIME    : 0
7  MAX CONNECTIONS: 32
8  # CONNECTIONS  : 9
9  CONNECTION NAME  ID VERSION  SYSNAME JOBNAME ASID STATE
   ---------------- -- -------- ------- ------- ---- ------
   IXGLOGR_SC42     02 00020D96 SC42    IXGLOGR 0011 ACTIVE
   IXGLOGR_SC43     03 000309FB SC43    IXGLOGR 0011 ACTIVE
   IXGLOGR_SC47     01 00010077 SC47    IXGLOGR 0011 ACTIVE
   IXGLOGR_SC49     08 0008005C SC49    IXGLOGR 0011 ACTIVE
   IXGLOGR_SC50     05 000502F9 SC50    IXGLOGR 0011 ACTIVE
   IXGLOGR_SC52     04 0004041A SC52    IXGLOGR 0011 ACTIVE
   IXGLOGR_SC53     09 00090006 SC53    IXGLOGR 0011 ACTIVE
   IXGLOGR_SC54     07 000700A3 SC54    IXGLOGR 0011 ACTIVE
   IXGLOGR_SC55     06 00060230 SC55    IXGLOGR 0011 ACTIVE
Note:
1 POLICY SIZE is the maximum size which can be reached by the structure. This is the value given to the SIZE parameter in the CFRM policy active at structure allocation time, rounded to the next multiple of STORAGE INCREMENT SIZE. 2 POLICY INITSIZE is the size actually given to the structure at allocation time. This is the value of the INITSIZE parameter in the CFRM policy active at allocation time, rounded to the next multiple of STORAGE INCREMENT SIZE. Giving a value to INITSIZE implies that the user intends to use the ALTER function if necessary. 3 ACTUAL SIZE is the current size of the structure and is a multiple of the STORAGE INCREMENT SIZE.
4 VERSION is a pseudo-random number generated by XES at structure allocation time. It uniquely identifies this instance of the structure. Rebuilding a structure, or deallocating and reallocating the structure, will change the version number for the new instance of the structure. 5 DISPOSITION is DELETE or KEEP. This is set up by the first exploiter to connect to the structure (therefore at structure allocation) and is not adjustable by a policy parameter. 6 ACCESS TIME is the length of time (in tenths of a second) the connectors can tolerate not having access to the structure because of a structure SVC dump being in progress. This value is set up by the first exploiter to connect to the structure and is not adjustable by a policy parameter. In the case shown here, ACCESS TIME: 0 means that the structure cannot be included in an SVC dump. 7 MAX CONNECTIONS is the maximum total number of connections to this structure that can be active or failed-persistent at any point in time. This value is defined when formatting the CFRM data set with the IXCL1DSU utility. It pertains to all structures created in the sysplex while this couple data set is in use. 8 # CONNECTIONS is the current number of connectors to the structure. 9 The following is information relative to the current connections to the structure:
CONNECTION NAME is a value given by the connector internal code. CONNECTION ID is a value given by XES when granting the connection. CONNECTION VERSION is a pseudo-random number that uniquely identifies the instance of the connection. SYSNAME, JOBNAME and ASID allow you to locate the connector.

The STATE of the connection can be:

FAILED-PERSISTENT

DISCONNECTING
   The connection is in the process of disconnecting.

FAILING
   The connection is in the process of abnormally ending.

ACTIVE

ACTIVE &
   The connection is in the active state, but the connector has lost physical connectivity to the structure.

ACTIVE OLD
   The structure is being rebuilt and the connector is connected to the old structure.

ACTIVE &OLD
   The structure is being rebuilt and the connector is connected to the old structure, but it has lost physical connectivity to the structure.
ACTIVE NEW,OLD
   The structure is being rebuilt and the connector is connected to the old and new structures.

ACTIVE NEW,&OLD
   The structure is being rebuilt and the connector is connected to the old and new structures, and it has lost physical connectivity to the old structure.

ACTIVE &NEW,OLD
   The structure is being rebuilt and the connector is connected to the old and new structures, and it has lost physical connectivity to the new structure.

ACTIVE &NEW,&OLD
   The structure is being rebuilt and the connector is connected to the old and new structures, and it has lost physical connectivity to both the old and the new structure.
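As mentioned in note 2, a structure that was allocated at its INITSIZE can later be expanded toward its policy SIZE without a rebuild by using the ALTER function. A minimal sketch, reusing the SYSTEM_LOGREC structure from the display above with a target size chosen purely for illustration (the structure's connectors must permit altering):

SETXCF START,ALTER,STRNM=SYSTEM_LOGREC,SIZE=24576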
SETXCF FORCE,STRUCTURE,STRNAME=strname
D CF,STRUCTURE,STRNAME=strname
If any of the connections are in the Failed-Persistent state, force them out by using the following command:
SETXCF FORCE,CONNECTION,STRNM=strname,CONNAME=conname
SETXCF START,REBUILD,STRNM=IEFAUTOS,LOC=OTHER
IXC367I THE SETXCF START REBUILD REQUEST FOR STRUCTURE 879
IEFAUTOS WAS ACCEPTED.
IEF265I AUTOMATIC TAPE SWITCHING: REBUILD IN PROGRESS BECAUSE
THE OPERATOR REQUESTED IEFAUTOS REBUILD.
IEF265I AUTOMATIC TAPE SWITCHING: REBUILD IN PROGRESS BECAUSE
THE OPERATOR REQUESTED IEFAUTOS REBUILD.
IEF265I AUTOMATIC TAPE SWITCHING: REBUILD IN PROGRESS BECAUSE
THE OPERATOR REQUESTED IEFAUTOS REBUILD.
IEF265I AUTOMATIC TAPE SWITCHING: REBUILD IN PROGRESS BECAUSE
THE OPERATOR REQUESTED IEFAUTOS REBUILD.
IEF265I AUTOMATIC TAPE SWITCHING: REBUILD IN PROGRESS BECAUSE
THE OPERATOR REQUESTED IEFAUTOS REBUILD.
IEF265I AUTOMATIC TAPE SWITCHING: REBUILD IN PROGRESS BECAUSE
THE OPERATOR REQUESTED IEFAUTOS REBUILD.
IEF265I AUTOMATIC TAPE SWITCHING: REBUILD IN PROGRESS BECAUSE
THE OPERATOR REQUESTED IEFAUTOS REBUILD.
IEF268I AUTOMATIC TAPE SWITCHING IS AVAILABLE. 881
IEFAUTOS WAS SUCCESSFULLY REBUILT.
SETXCF START,REBUILD,STRNM=SYSTEM_LOGREC,LOC=OTHER
IXC367I THE SETXCF START REBUILD REQUEST FOR STRUCTURE 057
SYSTEM_LOGREC WAS ACCEPTED.
D XCF,STR,STRNM=SYSTEM_LOGREC
IXC360I 15.17.51 DISPLAY XCF 059
STRNAME: SYSTEM_LOGREC
 STATUS: REASON SPECIFIED WITH REBUILD START:
           OPERATOR INITIATED
         REBUILD PHASE: COMPLETE
 POLICY SIZE    : 32256 K
 POLICY INITSIZE: 16128 K
 REBUILD PERCENT: N/A
 PREFERENCE LIST: CF01 CF02
 EXCLUSION LIST IS EMPTY
 REBUILD NEW STRUCTURE
 ---------------------
 ALLOCATION TIME: 10/25/95 15:17:50
 CFNAME         : CF01
 COUPLING FACILITY: 009672.IBM.02.000000040104
                    PARTITION: 1 CPCID: 00
 ACTUAL SIZE    : 16128 K
 STORAGE INCREMENT SIZE: 256 K
 VERSION        : ABDFB755 B5945204
 DISPOSITION    : DELETE
 ACCESS TIME    : 0
 MAX CONNECTIONS: 32
 # CONNECTIONS  : 8
 REBUILD OLD STRUCTURE
 ---------------------
 ALLOCATION TIME: 10/16/95 12:52:08
 CFNAME         : CF02
 COUPLING FACILITY: 009672.IBM.02.000000040104
                    PARTITION: 1 CPCID: 01
 ACTUAL SIZE    : 17152 K
 STORAGE INCREMENT SIZE: 256 K
 VERSION        : ABD445FB B991D202
 ACCESS TIME    : 0
 MAX CONNECTIONS: 32
 # CONNECTIONS  : 8
 * ASTERISK DENOTES CONNECTOR WITH OUTSTANDING REBUILD RESPONSE
 CONNECTION NAME  ID VERSION  SYSNAME  JOBNAME  ASID STATE
 ---------------- -- -------- -------- -------- ---- ----------------
 *IXGLOGR_SC42    02 00020D97 SC42     IXGLOGR  0011 ACTIVE NEW,OLD
 *IXGLOGR_SC43    03 000309FC SC43     IXGLOGR  0011 ACTIVE NEW,OLD
 *IXGLOGR_SC47    01 0001007C SC47     IXGLOGR  0011 ACTIVE NEW,OLD
 *IXGLOGR_SC49    08 0008005D SC49     IXGLOGR  0011 ACTIVE NEW,OLD
 *IXGLOGR_SC50    05 000502FA SC50     IXGLOGR  0011 ACTIVE NEW,OLD
 *IXGLOGR_SC52    07 000700A5 SC52     IXGLOGR  0011 ACTIVE NEW,OLD
 *IXGLOGR_SC54    04 0004041C SC54     IXGLOGR  0011 ACTIVE NEW,OLD
 *IXGLOGR_SC55    06 00060231 SC55     IXGLOGR  0011 ACTIVE NEW,OLD
CF NAME(CF01) DUMPSPACE(2048) PARTITION(1) CPCID(00)
   TYPE(009672) MFG(IBM) PLANT(02) SEQUENCE(000000040104)
CF NAME(CF02) DUMPSPACE(2048) PARTITION(1) CPCID(01)
   TYPE(009672) MFG(IBM) PLANT(02) SEQUENCE(000000040104)
STRUCTURE NAME(IEFAUTOS) SIZE(640) REBUILDPERCENT(20)
   PREFLIST(CF01, CF02)
STRUCTURE NAME(IRRXCF00_B001) SIZE(332)
   PREFLIST(CF02, CF01) EXCLLIST(IRRXCF00_P001)
STRUCTURE NAME(IRRXCF00_P001) SIZE(1644)
   PREFLIST(CF01, CF02) EXCLLIST(IRRXCF00_B001)
STRUCTURE NAME(ISTGENERIC) SIZE(328)
   PREFLIST(CF02, CF01)
STRUCTURE NAME(IXC_DEFAULT_1) SIZE(16128)
   PREFLIST(CF02, CF01) EXCLLIST(IXC_DEFAULT_2)
STRUCTURE NAME(IXC_DEFAULT_2) SIZE(16128)
   PREFLIST(CF01, CF02) EXCLLIST(IXC_DEFAULT_1)
STRUCTURE NAME(JES2CKPT_1) SIZE(4096) INITSIZE(2048)
   PREFLIST(CF02, CF01) EXCLLIST(JES2CKPT_2)
STRUCTURE NAME(SYSTEM_LOGREC) SIZE(32256) INITSIZE(16128)
   PREFLIST(CF01, CF02)
STRUCTURE NAME(SYSTEM_OPERLOG) SIZE(1024)
   PREFLIST(CF02, CF01)
STRUCTURE NAME(TEST) SIZE(2048) INITSIZE(1024) REBUILDPERCENT(20)
   PREFLIST(CF01, CF02)
Figure 69. CFRM Policy Sample
D XCF,STR
IXC359I 10.14.49 DISPLAY XCF 020
STRNAME         ALLOCATION TIME    STATUS
IEFAUTOS        10/12/95 13:25:26  ALLOCATED
IRRXCF00_B001   10/16/95 16:55:19  ALLOCATED
IRRXCF00_P001   10/16/95 16:55:18  ALLOCATED
ISTGENERIC      10/12/95 14:35:55  ALLOCATED
IXC_DEFAULT_1   10/17/95 09:42:11  ALLOCATED
IXC_DEFAULT_2   --                 NOT ALLOCATED
JES2CKPT_1      10/12/95 14:46:43  ALLOCATED
SYSTEM_LOGREC   10/16/95 12:52:08  ALLOCATED
SYSTEM_OPERLOG  10/12/95 14:36:31  ALLOCATED
TEST            --                 NOT ALLOCATED
For the sake of the example, we create and activate a new CFRM policy with two major changes:

1. The SIZE of the IEFAUTOS structure is modified.
2. The IXC_DEFAULT_1 signalling structure is not defined in the new CFRM policy.

Note that both of these structures are currently allocated. We first check the current size of the IEFAUTOS structure:
D XCF,STR,STRNM=IEFAUTOS
IXC360I 10.15.43 DISPLAY XCF 022
STRNAME: IEFAUTOS
 STATUS: ALLOCATED
 POLICY SIZE    : 640 K
 POLICY INITSIZE: N/A
 REBUILD PERCENT: 20
 PREFERENCE LIST: CF01 CF02
 EXCLUSION LIST IS EMPTY
 ACTIVE STRUCTURE
 ----------------
 ALLOCATION TIME: 10/12/95 13:25:26
 CFNAME         : CF02
 COUPLING FACILITY: 009672.IBM.02.000000040104
                    PARTITION: 1 CPCID: 01
 ACTUAL SIZE    : 768 K
 STORAGE INCREMENT SIZE: 256 K
 VERSION        : ABCF45F7 0891D281
 DISPOSITION    : DELETE
 ACCESS TIME    : NOLIMIT
 MAX CONNECTIONS: 32
 # CONNECTIONS  : 9
 CONNECTION NAME  ID VERSION  SYSNAME  JOBNAME  ASID STATE
 ---------------- -- -------- -------- -------- ---- ------
 IEFAUTOSSC42     06 00060035 SC42     ALLOCAS  000F ACTIVE
 IEFAUTOSSC43     09 00090005 SC43     ALLOCAS  000F ACTIVE
 IEFAUTOSSC47     01 0001017B SC47     ALLOCAS  000F ACTIVE
 IEFAUTOSSC49     08 00080006 SC49     ALLOCAS  000F ACTIVE
 IEFAUTOSSC50     05 0005003B SC50     ALLOCAS  000F ACTIVE
 IEFAUTOSSC52     02 00020045 SC52     ALLOCAS  000F ACTIVE
 IEFAUTOSSC53     04 00040033 SC53     ALLOCAS  000F ACTIVE
 IEFAUTOSSC54     07 00070028 SC54     ALLOCAS  000F ACTIVE
 IEFAUTOSSC55     03 0003003F SC55     ALLOCAS  000F ACTIVE
The new CFRM policy is installed into the CFRM couple data set by the JCL shown in Figure 70 on page 252.
//ADDCFRM  JOB (999,POK),'L06R',CLASS=A,REGION=4096K,
//   MSGCLASS=T,TIME=10,MSGLEVEL=(1,1),NOTIFY=&SYSUID
//******************************************************************
//* JCL TO INSTALL A NEW CFRM POLICY
//******************************************************************
//STEP1    EXEC PGM=IXCMIAPU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DATA TYPE(CFRM) REPORT(YES)
  DEFINE POLICY NAME(TESTPK)
    CF NAME(CF01) DUMPSPACE(2048) PARTITION(1) CPCID(00)
       TYPE(009672) MFG(IBM) PLANT(02) SEQUENCE(000000040104)
    CF NAME(CF02) DUMPSPACE(2048) PARTITION(1) CPCID(01)
       TYPE(009672) MFG(IBM) PLANT(02) SEQUENCE(000000040104)
    STRUCTURE NAME(IEFAUTOS) SIZE(1000) REBUILDPERCENT(20)
       PREFLIST(CF01, CF02)
    STRUCTURE NAME(IRRXCF00_B001) SIZE(332)
       PREFLIST(CF02, CF01) EXCLLIST(IRRXCF00_P001)
    STRUCTURE NAME(IRRXCF00_P001) SIZE(1644)
       PREFLIST(CF01, CF02) EXCLLIST(IRRXCF00_B001)
    STRUCTURE NAME(ISTGENERIC) SIZE(328)
       PREFLIST(CF02, CF01)
    STRUCTURE NAME(IXC_DEFAULT_2) SIZE(16128)
       PREFLIST(CF01, CF02)
    STRUCTURE NAME(JES2CKPT_1) SIZE(4096) INITSIZE(2048)
       PREFLIST(CF02, CF01) EXCLLIST(JES2CKPT_2)
    STRUCTURE NAME(SYSTEM_LOGREC) SIZE(32256) INITSIZE(16128)
       PREFLIST(CF01, CF02)
    STRUCTURE NAME(SYSTEM_OPERLOG) SIZE(1024)
       PREFLIST(CF02, CF01)
    STRUCTURE NAME(TEST) SIZE(2048) INITSIZE(1024) REBUILDPERCENT(20)
       PREFLIST(CF01, CF02)
Figure 70. JCL to Install a New CFRM Policy
Once the policy has been installed, it is started as a new active policy:
SETXCF START,POL,TYPE=CFRM,POLNM=TESTPK
IXC511I START ADMINISTRATIVE POLICY TESTPK FOR CFRM ACCEPTED
IXC512I POLICY CHANGE IN PROGRESS FOR CFRM 025
TO MAKE TESTPK POLICY ACTIVE.
2 POLICY CHANGE(S) PENDING.
D XCF,STR
IXC359I 10.17.10 DISPLAY XCF 027
STRNAME         ALLOCATION TIME    STATUS
IEFAUTOS        10/12/95 13:25:26  ALLOCATED POLICY CHANGE PENDING
IRRXCF00_B001   10/16/95 16:55:19  ALLOCATED
IRRXCF00_P001   10/16/95 16:55:18  ALLOCATED
ISTGENERIC      10/12/95 14:35:55  ALLOCATED
IXC_DEFAULT_1   10/17/95 09:42:11  ALLOCATED POLICY CHANGE PENDING
IXC_DEFAULT_2   --                 NOT ALLOCATED
SYSTEM_LOGREC   10/16/95 12:52:08  ALLOCATED
SYSTEM_OPERLOG  10/12/95 14:36:31  ALLOCATED
TEST            --                 NOT ALLOCATED
In this example, we already knew what to expect from this CFRM change. In a real-life situation, pending changes may result from mistakes made in writing the new policy, and you may therefore have to compare the previous and current active policies to locate the differences; one way to obtain a listing of both is sketched below.
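A minimal report-only job, patterned on Figure 70 (the job name and job card parameters are illustrative; because SYSIN contains no DEFINE statements, IXCMIAPU only reports on the policies in the CFRM couple data set):

//CFRMRPT  JOB (999,POK),'CFRM REPORT',CLASS=A,MSGCLASS=T
//* REPORT-ONLY RUN: LIST THE POLICIES IN THE CFRM COUPLE DATA SET
//STEP1    EXEC PGM=IXCMIAPU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DATA TYPE(CFRM) REPORT(YES)

We now initiate a rebuild of the IEFAUTOS structure. This causes the creation of a new instance of the structure, allocated as per the new parameters in the active policy: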
SETXCF START,REBUILD,STRNM=IEFAUTOS,LOC=NORMAL
IXC367I THE SETXCF START REBUILD REQUEST FOR STRUCTURE 033
IEFAUTOS WAS ACCEPTED.
IEF265I AUTOMATIC TAPE SWITCHING: REBUILD IN PROGRESS BECAUSE 492
THE OPERATOR REQUESTED IEFAUTOS REBUILD.
IEF268I AUTOMATIC TAPE SWITCHING IS AVAILABLE. 035
IEFAUTOS WAS SUCCESSFULLY REBUILT.
IXC512I POLICY CHANGE IN PROGRESS FOR CFRM 793
TO MAKE TESTPK POLICY ACTIVE.
1 POLICY CHANGE(S) PENDING.
D XCF,STR
IXC359I 10.19.48 DISPLAY XCF 037
STRNAME         ALLOCATION TIME    STATUS
IEFAUTOS        10/17/95 10:19:14  ALLOCATED
IRRXCF00_B001   10/16/95 16:55:19  ALLOCATED
IRRXCF00_P001   10/16/95 16:55:18  ALLOCATED
ISTGENERIC      10/12/95 14:35:55  ALLOCATED
IXC_DEFAULT_1   10/17/95 09:42:11  ALLOCATED POLICY CHANGE PENDING
IXC_DEFAULT_2   --                 NOT ALLOCATED
JES2CKPT_1      10/12/95 14:46:43  ALLOCATED
SYSTEM_LOGREC   10/16/95 12:52:08  ALLOCATED
SYSTEM_OPERLOG  10/12/95 14:36:31  ALLOCATED
TEST            --                 NOT ALLOCATED
D XCF,STR,STRNM=IEFAUTOS
IXC360I 10.20.02 DISPLAY XCF 039
STRNAME: IEFAUTOS
 STATUS: ALLOCATED
 POLICY SIZE    : 1000 K
 POLICY INITSIZE: N/A
 REBUILD PERCENT: 20
 PREFERENCE LIST: CF01 CF02
 EXCLUSION LIST IS EMPTY
 ACTIVE STRUCTURE
 ----------------
 ALLOCATION TIME: 10/17/95 10:19:14
 CFNAME         : CF01
 COUPLING FACILITY: 009672.IBM.02.000000040104
                    PARTITION: 1 CPCID: 00
 ACTUAL SIZE    : 1024 K
 STORAGE INCREMENT SIZE: 256 K
 VERSION        : ABD565AC 32D98806
 DISPOSITION    : DELETE
 ACCESS TIME    : NOLIMIT
 MAX CONNECTIONS: 32
 # CONNECTIONS  : 9
 CONNECTION NAME  ID VERSION  SYSNAME  JOBNAME  ASID STATE
 ---------------- -- -------- -------- -------- ---- ------
 IEFAUTOSSC42     06 00060035 SC42     ALLOCAS  000F ACTIVE
 IEFAUTOSSC43     09 00090005 SC43     ALLOCAS  000F ACTIVE
 IEFAUTOSSC47     01 0001017B SC47     ALLOCAS  000F ACTIVE
 IEFAUTOSSC49     08 00080006 SC49     ALLOCAS  000F ACTIVE
 IEFAUTOSSC50     05 0005003B SC50     ALLOCAS  000F ACTIVE
 IEFAUTOSSC52     02 00020045 SC52     ALLOCAS  000F ACTIVE
 IEFAUTOSSC53     04 00040033 SC53     ALLOCAS  000F ACTIVE
 IEFAUTOSSC54     07 00070028 SC54     ALLOCAS  000F ACTIVE
 IEFAUTOSSC55     03 0003003F SC55     ALLOCAS  000F ACTIVE
To remove the last pending change, we have to deallocate the IXC_DEFAULT_1 structure. This is achieved by stopping all PATHINs and PATHOUTs to this structure:
SETXCF STOP,PI,STRNM=IXC_DEFAULT_1
IXC467I STOPPING PATHIN STRUCTURE IXC_DEFAULT_1 041
RSN: OPERATOR REQUEST
IXC307I SETXCF STOP PATHIN REQUEST FOR STRUCTURE IXC_DEFAULT_1 042
COMPLETED SUCCESSFULLY
SETXCF STOP,PO,STRNM=IXC_DEFAULT_1
IXC467I STOPPING PATHOUT STRUCTURE IXC_DEFAULT_1 044
RSN: OPERATOR REQUEST
IXC307I SETXCF STOP PATHOUT REQUEST FOR STRUCTURE IXC_DEFAULT_1 045
COMPLETED SUCCESSFULLY
IXC307I STOP PATH REQUEST FOR STRUCTURE IXC_DEFAULT_1 046
COMPLETED SUCCESSFULLY: NOT DEFINED AS PATHOUT OR PATHIN
IXC513I COMPLETED POLICY CHANGE FOR CFRM. 047
TESTPK POLICY IS ACTIVE.
DISPLAY XCF,CF
IXC361I 10.21.18 DISPLAY XCF 049
CFNAME   COUPLING FACILITY
CF01     009672.IBM.02.000000040104
         PARTITION: 1 CPCID: 00
CF02     009672.IBM.02.000000040104
         PARTITION: 1 CPCID: 01
D XCF,STR
IXC359I 10.21.25 DISPLAY XCF 051
STRNAME         ALLOCATION TIME    STATUS
IEFAUTOS        10/17/95 10:19:14  ALLOCATED
IRRXCF00_B001   10/16/95 16:55:19  ALLOCATED
IRRXCF00_P001   10/16/95 16:55:18  ALLOCATED
ISTGENERIC      10/12/95 14:35:55  ALLOCATED
JES2CKPT_1      10/12/95 14:46:43  ALLOCATED
SYSTEM_LOGREC   10/16/95 12:52:08  ALLOCATED
SYSTEM_OPERLOG  10/12/95 14:36:31  ALLOCATED
TEST            --                 NOT ALLOCATED
DEFINE POLICY NAME(CFRM07)
  CF NAME(CF01) DUMPSPACE(2048) PARTITION(1) CPCID(00)
     TYPE(009672) MFG(IBM) PLANT(02) SEQUENCE(000000040104)
  CF NAME(CF02) DUMPSPACE(2048) PARTITION(1) CPCID(01)
     TYPE(009672) MFG(IBM) PLANT(02) SEQUENCE(000000040104)
  STRUCTURE NAME(IEFAUTOS) SIZE(640) REBUILDPERCENT(20)
     PREFLIST(CF01, CF02)
  STRUCTURE NAME(IRRXCF00_B001) SIZE(332)
     PREFLIST(CF02, CF01) EXCLLIST(IRRXCF00_P001)
  STRUCTURE NAME(IRRXCF00_P001) SIZE(1644)
     PREFLIST(CF01, CF02) EXCLLIST(IRRXCF00_B001)
  STRUCTURE NAME(JES2CKPT_1) SIZE(4096) INITSIZE(2048)
     PREFLIST(CF02, CF01) EXCLLIST(JES2CKPT_2)
Figure 71. Original CFRM Policy
DEFINE POLICY NAME(TSTPK)
  CF NAME(CF01) DUMPSPACE(2048) PARTITION(1) CPCID(00)
     TYPE(009672) MFG(IBM) PLANT(02) SEQUENCE(000000040104)
  STRUCTURE NAME(IEFAUTOS) SIZE(640) REBUILDPERCENT(20)
     PREFLIST(CF01)
  STRUCTURE NAME(IRRXCF00_B001) SIZE(332)
     PREFLIST(CF01) EXCLLIST(IRRXCF00_P001)
  STRUCTURE NAME(IRRXCF00_P001) SIZE(1644)
     PREFLIST(CF01) EXCLLIST(IRRXCF00_B001)
  STRUCTURE NAME(JES2CKPT_1) SIZE(4096) INITSIZE(2048)
     PREFLIST(CF01) EXCLLIST(JES2CKPT_2)
Figure 72. New CFRM Policy
When starting the new policy, four pending changes are indicated against the structures and one pending change against the coupling facility itself.
SETXCF START,POL,TYPE=CFRM,POLNAME=TSTPK
IXC511I START ADMINISTRATIVE POLICY TSTPK FOR CFRM ACCEPTED
IXC512I POLICY CHANGE IN PROGRESS FOR CFRM 118
TO MAKE TSTPK POLICY ACTIVE.
8 POLICY CHANGE(S) PENDING.
D XCF,STR
IXC359I 17.01.41 DISPLAY XCF 120
STRNAME         ALLOCATION TIME    STATUS
IEFAUTOS        10/25/95 15:11:53  ALLOCATED POLICY CHANGE PENDING
IRRXCF00_B001   10/17/95 10:44:14  ALLOCATED POLICY CHANGE PENDING
IRRXCF00_P001   10/17/95 10:44:12  ALLOCATED POLICY CHANGE PENDING
JES2CKPT_1      10/12/95 14:46:43  ALLOCATED POLICY CHANGE PENDING
D XCF,CF
IXC361I 17.01.57 DISPLAY XCF 122
CFNAME   COUPLING FACILITY
CF01     009672.IBM.02.000000040104
         PARTITION: 1 CPCID: 00
CF02     009672.IBM.02.000000040104
         PARTITION: 1 CPCID: 01
         POLICY CHANGE PENDING
The pending changes are due to the following:

1. Changes to the structures already allocated in CF01 (the on-going coupling facility): their preference lists have been modified so as not to mention CF02. This pertains to structures IEFAUTOS and IRRXCF00_P001.

2. Changes to the structures already allocated in CF02 (the out-going coupling facility): their coupling facility is not defined in the new active policy. This pertains to structures IRRXCF00_B001 and JES2CKPT_1. Note, however, that all accesses to the structures in CF02 keep proceeding normally.

3. A change to coupling facility CF02 itself: it is no longer usable by the sysplex MVS images as a target of allocation or rebuild.

To resolve the pending changes:
The original policy can be re-installed and re-started. Or, if the new policy is to be kept as is: 1. The changes pending against CF01 structures are resolved using the following command:
SETXCF START,REBUILD,CFNAME=CF01,LOC=NORMAL
This causes the new structure attributes (in this case, the new preference list) to be taken into account. 2. The changes pending against the structures in CF02 are resolved with the following command:
SETXCF START,REBUILD,CFNAME=CF02,LOC=NORMAL
This will initiate the rebuild of all the structures currently in CF02 as per the active preference lists. That is, all the rebuilds will be done into CF01.
This works for IRRXCF00_B001, but it does not work for JES2CKPT_1, because JES2 does not support structure rebuild. The JES2 checkpoint therefore has to be moved onto DASD or into CF01 using the JES2 checkpoint reconfiguration dialog (sketched below). In either case, JES2CKPT_1 must eventually be deleted from CF02 using the following command:
SETXCF FORCE,STRUCTURE,STRNAME=JES2CKPT_1
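The checkpoint reconfiguration dialog itself is driven with JES2 commands from the operator console. A minimal sketch, assuming the default $ command prefix; JES2 then prompts, through WTOR messages, for the new checkpoint placement and for confirmation:

$T CKPTDEF,RECONFIG=YES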
3. When this is done, all the pending changes have been resolved, and a D XCF,CF will show CF01 as the only usable coupling facility for the sysplex members.
V XCF,sysname,OFFLINE
Further operator intervention is required if there is no SFM policy active, as shown in Figure 73. The partitioning is handled entirely without operator intervention (the system being varied out of the sysplex is automatically isolated using the SFM isolate function) if both of the following are true:

- There is an SFM policy active.
- An operational system in the sysplex shares coupling facility connectivity with the system to be isolated (the coupling facility is the intermediary in forwarding the isolation signal to the target system).
V XCF,SC42,OFF
*007 IXC371D CONFIRM REQUEST TO VARY SYSTEM SC42 OFFLINE.
REPLY SYSNAME=SC42 TO REMOVE SC42 OR C TO CANCEL.
R 7,SYSNAME=SC42
IEE600I REPLY TO 007 IS;SYSNAME=SC42
IXC101I SYSPLEX PARTITIONING IN PROGRESS FOR SC42
........................
*008 IXC102A XCF IS WAITING FOR SYSTEM SC42 DEACTIVATION.
REPLY DOWN WHEN MVS ON SC42 IS DOWN.
R 8,DOWN
IEE600I REPLY TO 008 IS;DOWN
........................
ISG178E GLOBAL RESOURCE SERIALIZATION HAS BEEN DISRUPTED.
GLOBAL RESOURCE REQUESTORS WILL BE SUSPENDED.
IEA257I CONSOLE PARTITION CLEANUP IN PROGRESS FOR SYSTEM SC42.
ISG011I SYSTEM SC42 - BEING PURGED FROM GRS COMPLEX
ISG013I SYSTEM SC42 - PURGED FROM GRS COMPLEX
ISG173I SYSTEM SC43 RESTARTING GLOBAL RESOURCE SERIALIZATION.
IXC105I SYSPLEX PARTITIONING HAS COMPLETED FOR SC42 321
- PRIMARY REASON: OPERATOR VARY REQUEST
- REASON FLAGS: 000004
Figure 73. VARY OFF a System without SFM Policy Active
V XCF,SC42,OFFLINE
*006 IXC371D CONFIRM REQUEST TO VARY SYSTEM SC42 OFFLINE.
REPLY SYSNAME=SC42 TO REMOVE SC42 OR C TO CANCEL.
R 6,SYSNAME=SC42
IEE600I REPLY TO 006 IS;SYSNAME=SC42
....................................................
IXC101I SYSPLEX PARTITIONING IN PROGRESS FOR SC42
....................................................
IXC105I SYSPLEX PARTITIONING HAS COMPLETED FOR SC42 255
- PRIMARY REASON: OPERATOR VARY REQUEST
- REASON FLAGS: 000004
IEA258I CONSOLE PARTITION CLEANUP COMPLETE FOR SYSTEM SC42.
Figure 74. VARY OFF a System with an SFM Policy Active
*242 IXC402D SC42 LAST OPERATIVE AT 18:40:14.
REPLY DOWN IF MVS IS DOWN OR INTERVAL=SSSSS TO SET A REPROMPT TIME.
R 242,DOWN
IEE600I REPLY TO 242 IS;DOWN
IXC101I SYSPLEX PARTITIONING IN PROGRESS FOR SC42
ISG011I SYSTEM SC42 - BEING PURGED FROM GRS COMPLEX
ISG013I SYSTEM SC42 - PURGED FROM GRS COMPLEX
IEA257I CONSOLE PARTITION CLEANUP IN PROGRESS FOR SYSTEM SC42.
IEA258I CONSOLE PARTITION CLEANUP COMPLETE FOR SYSTEM SC42.
IXC105I SYSPLEX PARTITIONING HAS COMPLETED FOR SC42 109
- PRIMARY REASON: SYSTEM STATUS UPDATE MISSING
- REASON FLAGS: 000008
Figure 75. System in Missing Status Update Condition and No Active SFM Policy
IXC101I SYSPLEX PARTITIONING IN PROGRESS FOR SC42
IEA257I CONSOLE PARTITION CLEANUP IN PROGRESS FOR SYSTEM SC42.
IEA258I CONSOLE PARTITION CLEANUP COMPLETE FOR SYSTEM SC42.
IXC105I SYSPLEX PARTITIONING HAS COMPLETED FOR SC42 879
- PRIMARY REASON: SYSTEM REMOVED BY SYSPLEX FAILURE MANAGEMENT
  BECAUSE ITS STATUS UPDATE WAS MISSING
- REASON FLAGS: 000100
Figure 76. System in Missing Status Update with an Active SFM Policy and CONNFAIL(YES)
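For reference, an SFM policy of the kind assumed in Figures 74 and 76 can be defined with the IXCMIAPU utility. The following is a minimal sketch; the policy name, weight, and job card details are illustrative assumptions, not values from the systems used in this book:

//SFMPOL   JOB (999,POK),'SFM POLICY',CLASS=A,MSGCLASS=T
//* DEFINE A SIMPLE SFM POLICY: ISOLATE A STATUS-MISSING SYSTEM
//* IMMEDIATELY, AND LET SFM HANDLE CONNECTIVITY FAILURES
//STEP1    EXEC PGM=IXCMIAPU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DATA TYPE(SFM)
  DEFINE POLICY NAME(SFMPOL1)
    CONNFAIL(YES)
    SYSTEM NAME(*)
      ISOLATETIME(0)
      WEIGHT(10)

The policy would then be activated with SETXCF START,POL,TYPE=SFM,POLNM=SFMPOL1. With ISOLATETIME(0), a system whose status update is missing is isolated as soon as the failure detection interval expires, which is what produces the message sequence of Figure 76.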
TERM, ACR
Specifying TERM as the first disruptive action will ABEND the failing workunit. The workunit's recovery will be given control; however, the recovery routine will not be allowed to retry. The terminating abend thereby eliminates the possibility of the recovery routine retrying back into misbehaving code. In a sysplex, specifying a SPINRCVY action of OPER is not recommended, because an operator may not respond quickly enough to prevent the remaining systems in the sysplex from partitioning the ailing system out of the sysplex. An example of a spin loop occurring on an MVS system and being resolved is shown in Figure 77 on page 264; a parmlib sketch for these settings follows the discussion of the figure. SYSA is in a sysplex with systems SYSB and SYSC. SYSA is running in a shared logical partition with SPINTIME=20 and SPINRCVY=TERM,ACR. At time minus 10, CPU 1 gets into a never-ending disabled loop. At time 0, CPU 0 tries to signal CPU 1 but gets no response. CPU 0 enters a spin loop waiting for CPU 1 to enable.
Time in |-------------------- MVS SYSTEM SYSA --------------------|
seconds   CPU 0                             CPU 1

  -10                                       Enters a never-ending
                                            disabled loop
   -7     Updates couple data set (CDS)     Looping
          with status
   -4     Updates CDS with status           Looping
   -1     Updates CDS with status           Looping
    0     SIGP to CPU 1; no response.       Looping
          Enters spin loop waiting for
          a response from CPU 1
          Spinning ...                      Looping
   20     SPINTIME timeout - an excessive   Writes ABEND071-10 to
          spin condition is declared.       LOGREC; redispatches
          Writes IEE178I to syslog and      the interrupted program
          issues SIGP RESTART to CPU 1.
          Recovery action is to continue
          to SPIN
          Spinning ...                      Looping
   40     An excessive spin condition is
          declared again; ACTION is to
          TERM CPU 1. Issues SIGP RESTART
   41                                       ABEND071-30; the
                                            disabled loop is
                                            terminated
   42     SIGP response received from       Responds to the SIGP
          CPU 1 (the signal sent at         issued at time 0
          time 0). Updates CDS with
          status

Figure 77. A Spin Loop on System SYSA Being Detected and Resolved
In this figure, after waiting 20 seconds for CPU 1 to enable, CPU 0 declared an excessive spin condition. MVS's response to the excessive spin condition at
time 20 was to collect diagnostic data on CPU 1 and to have the spinning routine on CPU 0 repeat the SPIN. After waiting an additional 20 seconds (time 40) for CPU 1 to enable, CPU 0 declared another excessive spin condition. At this point MVS selected the next excessive spin recovery action specified by SPINRCVY. The TERM action successfully ended the disabled loop on CPU 1 and resolved the spin loop on CPU 0. In this example, there was a 43-second lapse of time (-1 through +42) between updates of SYSA's status in the sysplex couple data set. During this time, SYSA would appear to be dormant to systems SYSB and SYSC. If SYSA's failure detection interval were 40 seconds, SYSB or SYSC might have initiated a partitioning action against SYSA to remove it from the sysplex before SYSA had a chance to recover from the spin loop. It is therefore very important to choose the XCF failure detection interval carefully; a parmlib sketch follows. Note: It is possible for a spin loop to tie up multiple CPUs in an MP environment. If SYSA had 10 engines, a spin loop on one CP could tie up all ten CPs and make the MVS image appear dormant to other systems in the sysplex.
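For reference, the spin values used in this example come from the EXSPATxx parmlib member, and the failure detection interval from COUPLExx. The following is a minimal sketch; the sysplex name and interval value are illustrative assumptions, and the exact keyword syntax should be verified against the MVS initialization and tuning documentation:

In EXSPATxx:

SPINTIME=20
SPINRCVY=TERM,ACR

In COUPLExx:

COUPLE SYSPLEX(PLEX1)
       INTERVAL(85)

Here INTERVAL(85) comfortably exceeds the 43-second status update gap seen above, so a spin loop that MVS can recover from by itself does not cause SYSB or SYSC to partition SYSA out of the sysplex.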
View Device Parameter / Feature Definition

Command ===> __________________________________________ Scroll ===>

Configuration ID . : MVSW1
Device number  . . : 018B          Device type . . . : 3390
Generic / VM device type . . . . : 3390

ENTER to continue.

Parameter/
Feature    Value  Req.       Description
OFFLINE    No                Device considered online or offline at IPL
DYNAMIC    Yes    <-------   Device supports dynamic configuration
ALTCTRL    No                Separate physical control unit path
SHARED     Yes               Device shared with other systems
SHAREDUP   No                Shared when system physically partitioned
2. Now, depending on the type of processor, you have to follow different steps to make the hardware dynamically reconfigurable:

ES/9000

a. HCD is used to build the IOCDS from the IODF. As shown in Figure 78 on page 268, option 2.2 is used to create the IOCDS.
Select one of the following tasks.

---> 2
       1.  Build production I/O definition file
       2.  Build IOCDS
       3.  Build IOCP input data set
       4.  Create JES3 initialization stream data
       5.  View active configuration
       6.  Activate configuration dynamically
       7.  Activate configuration sysplex-wide
       8.  Activate switch configuration
       9.  Save switch configuration
      10.  Build HCPRIO input data set
      11.  Build and manage S/390 microprocessor IOCDSs and IPL attributes
b. Hardware enablement of dynamic reconfiguration management is selected on the hardware console CONFIG frame, H=I/O DEFINITION selection.
H= I/O Definition
   1. Percent Expansion   Total : ______   Shared: ______
   2. Allow Modification
c. Perform a power-on reset, and IPL from the same IODF. Note that the IODF is pointed to by the LOADxx member.

9672

a. As shown in Figure 80 on page 269, option 2.11 is used to build the IOCDS on a 9672.
Select one of the following tasks.

---> 11
       1.  Build production I/O definition file
       2.  Build IOCDS
       3.  Build IOCP input data set
       4.  Create JES3 initialization stream data
       5.  View active configuration
       6.  Activate configuration dynamically
       7.  Activate configuration sysplex-wide
       8.  Activate switch configuration
       9.  Save switch configuration
      10.  Build HCPRIO input data set
      11.  Build and manage S/390 microprocessor IOCDSs and IPL attributes
You can also use HCD to create a stand-alone IOCP input deck on a diskette and load it via the HMC workstation. On a 9672 model 1, the HCD token will not be preserved and the resulting IOCDS file will not be dynamic-capable: you should first POR and IPL without dynamic reconfiguration capability, and then reload the IOCDS file through HCD to regain the dynamic capability. 9672 models 2 and 3 are dynamic I/O capable from the first power-on reset.

b. Enable the dynamic I/O configuration option in the RESET profile to be used in the next POR, as shown in Figure 81 on page 270.
c. Perform a power-on reset, and IPL using the same IODF.

d. Update the RESET profile to indicate that the system should use the last active IOCDS. This ensures that the system will come back after a power off with the same IOCDS as was active when the system went down. Because you may have done some dynamic I/O activates from MVS, this may not be the same IOCDS as was used at the last POR.

3. IPL the machine using the same IODF file.

4. You can verify that the hardware and software definitions are in synch by issuing the D IOS,CONFIG command and checking the results:
IOS506I 13.44.29 I/O CONFIG DATA 011
ACTIVE IODF DATA SET = SYS5.IODF23             <-- SW token
CONFIGURATION ID = MVSW1        EDT ID = 11
TOKEN: PROCESSOR DATE     TIME     DESCRIPTION
SOURCE: ITSO942A 95-09-21 11:54:18 SYS5 IODF23 <-- HW token
6. If you are running in LPAR mode, perform a software and hardware activation in the driving partition. For this activation, specify YES for "Allow hardware delete" on the Activate New Hardware and Software Configuration panel. In all the other partitions, perform a software-only activation. If you are running in basic mode, the hardware and software activation is done once, from the running MVS.

7. Configure the channel paths online.

8. Vary on the new devices.

9. If you have a specific IODF named in your LOADxx member, update it to point at the new IODF. We recommend you use ** as the IODF number in LOADxx, as this indicates that the system should use the IODF that matches the one currently active in the hardware (a sample statement is sketched below).
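For illustration only (the LOADxx IODF statement is column-sensitive; the high-level qualifier, configuration ID, and EDT ID below are taken from the D IOS,CONFIG example above, and the exact column layout should be verified in the MVS initialization documentation before use), an IODF statement using the ** suffix might look like:

IODF ** SYS5 MVSW1 11

With a suffix of **, the system selects the IODF whose token matches the token currently active in the hardware, so the member does not have to be edited after every dynamic change.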
IOS500I ACTIVATE RESULTS
TEST DETECTED CONDITIONS WHICH WOULD RESULT IN ACTIVATE FAILURE
REASON=0156,NOT ENOUGH SPACE TO ACCOMMODATE HARDWARE CHANGES
DESCTEXT=NET # [SUB|CU|LCU] TO BE ADDED = xxxxxxxx,
# [SUB|CU|LCU] AVAIL = yyy

There is not enough storage in the hardware system area (HSA) to store the changes to the hardware I/O configuration. More subchannels (SUB), control units (CU) or logical control units (LCU) must be available in the HSA to store the changes to the hardware I/O configuration. In the message text:

xxxxxxxx   The number of subchannels, CUs, or LCUs that the system is adding because of the configuration change.
yyy        The number of subchannels, CUs, or LCUs that are currently available in the HSA.
In any case, the system rejects the ACTIVATE request, and most of the time you will have to power-on reset the machine with a larger expansion factor for the HSA. Starting with MVS/ESA SP V5.1, the D IOS,CONFIG operator command has been extended to report HSA information relative to dynamic I/O. In this way you can monitor HSA growth and prevent the unplanned outages that occur when the machine is no longer dynamic-capable because of a lack of HSA storage. The command D IOS,CONFIG(HSA) or D IOS,CONFIG(ALL) provides the following as part of the IOS506I response:
xxx PHYSICAL CONTROL UNITS
xxx SUBCHANNELS FOR SHARED CHANNEL PATHS
xxx SUBCHANNELS FOR UNSHARED CHANNEL PATHS
xxx LOGICAL CONTROL UNITS FOR SHARED CHANNEL PATHS
xxx LOGICAL CONTROL UNITS FOR UNSHARED CHANNEL PATHS

where xxx indicates the available number.
The example applies to a 9021-711 based processor. Along with the CONFIG frame Percent Expansion values, information from the HCD Device Detail Report (or from the end of the IOCP IODEVICE report), as well as the total number of logical partitions defined in the IOCDS, is required to determine the resulting total number of subchannels. The following process can be used to determine the number of nonshared and shared subchannels to be added, as well as the total number of subchannels to be accommodated in the HSA, based on the user-specified percent expansion TOTAL and SHARED values. In this example, the user plans to increase the number of subchannels three-fold, and of that number one-half will be shared subchannels. To accomplish this, a percent expansion TOTAL value of 300 and a SHARED value of 50 would be specified.
-----> Information from HCD Device Detail Report
----------------------------------------------------------------
TOTALS FOR CHPIDS, SUBCHANNELS, AND CONTROL UNITS
                                           Additional
                       Shared  Nonshared   generated   Total  HSA Total
CHPIDs                   28        96         n/a        124     n/a
Physical Control Units    8       271         n/a        279     n/a
Subchannels             320      1938        1332       3590    4870
Logical Control Units     5       172          70        247     267
----------------------------------------------------------------
-----> CONFIG Frame H values
H= I/O Definition
   1. Percent Expansion   Total: 300   Shared: 50
----------------------------------------------------------------
-----> Total number of logical partitions is 5.
----------------------------------------------------------------
Determine expansion value:
   IOCDS total subchannels x expansion factor
   3590 x 3.00 = 10770
   (This number plus the IOCDS total, 3590 in the above calculation,
   cannot exceed the maximum subchannels in an IOCDS supported by
   the processor.)
----------------------------------------------------------------
Determine shared portion:
   Expansion value x shared factor
   10770 x .5 = 5385
----------------------------------------------------------------
Determine nonshared portion:
   Expansion value minus shared portion
   10770 - 5385 = 5385
----------------------------------------------------------------
Determine new HSA total:
   a. HSA total (from the IOCDS I/O Device Report
      or the HCD Device Detail Report) . . . . . . . . .   4870
   b. Nonshared addition (nonshared portion from above)    5385
   c. Shared addition (shared portion x total number
      of partitions: 5385 x 5) . . . . . . . . . . . . .  26925
   d. New HSA total  . . . . . . . . . . . . . . . . . .  37180
Glossary
The following terms and abbreviations are defined as they are used in this book. For terms that do not appear in this glossary, see the IBM Vocabulary for Data Processing, Telecommunications, and Office Systems, GC20-1699, or the glossaries of related publications. The following cross-references are used in this glossary:

Contrast with. This refers to a term that has an opposed or substantively different meaning.
Deprecated term for. This indicates that the term should not be used. It refers to a preferred term, which is defined in the glossary.
See. This refers the reader to multiple-word terms in which this term appears.
See also. This refers the reader to terms that have a related, but not synonymous, meaning.
Synonym for. This indicates that the term has the same meaning as a preferred term, which is defined in the glossary.
Synonymous with. This is a backward reference from a defined term to all other terms that have the same meaning.
A
abend. Abnormal end of task: termination of a task prior to its completion because of an error condition that cannot be resolved by recovery facilities while the task is executing.
AC. Alternating current.
ACF. See advanced communications function.
ACS. See automatic class selection.
activate. To load the contents of an SCDS into SMS address space storage and into an ACDS, or to load the contents of an existing ACDS into SMS address space storage. This establishes a new storage management policy for the SMS complex.
active configuration. In an Enterprise Systems Connection Director (ESCD), the configuration determined by the status of the currently active set of connectivity attributes. Contrast with saved configuration.
active control data set (ACDS). A VSAM linear data set that contains a copy of the most recently activated configuration (SCDS) and subsequent updates. All systems in an SMS complex use the ACDS to manage storage.
adapter. (1) A general term for a device that provides some transitional function between two or more devices. (2) In an Enterprise Systems Connection link environment, hardware used to join different connector types.
address. (1) A value that identifies a register, a particular part of storage, a data source, or a data sink. The value is represented by one or more characters. (2) To refer to a device or an item of data by its address. (3) The location in the storage of a computer where data is stored. (4) In data communication, the unique code assigned to each device or workstation connected to a network. (5) The identifier of a location, source, or destination.
address space. In ESA/390, the range of virtual storage addresses that provide each user with a unique address space and which maintains the distinction between programs and data within that space.
advanced communications functions (ACF). A group of IBM program products (ACF/VTAM, ACF/NCP, and more) that uses the concepts of SNA.
AIX. Advanced Interactive Executive.
alert. A unit of information, usually indicating the loss of a system resource, passed from one machine or program to a host to signal an error.
alphanumeric. Consisting of both letters and numbers and often other symbols, such as punctuation marks and mathematical symbols.
APA. All-points-addressable.
APAR. See authorized program analysis report.
application. (1) The use to which an information processing system is put; for example, a payroll application, an airline reservation application, a network application. (2) A collection of software components used to perform specific types of work on a computer.
asynchronous. Without regular time relationship. Unexpected or unpredictable with respect to the program's instructions, or to time. Contrast with synchronous.
authorized program analysis report (APAR). A report of a problem caused by a suspected defect in a current unaltered release of a program.
automatic class selection (ACS). A mechanism for assigning SMS classes and storage groups.
automatic dump. In DFHSM, the process of using DFDSS to automatically do a full volume dump of all allocated space on primary volumes to designated tape dump volumes.
auxiliary storage. Data storage other than main storage; usually, a direct access storage device.
availability. For a storage subsystem, the degree to which a data set can be accessed when requested by a user.

B

backup. The process of copying data and storing it for use in case the original data is somehow damaged or destroyed. In DFHSM, the process of copying a data set residing on a level 0 volume, level 1 volume, or a volume not managed by DFHSM to a backup volume. See automatic backup and incremental backup.
BASIC (beginner's all-purpose symbolic instruction code). An easy-to-use problem-solving language that lets you write programs in English-like statements.
batch. Pertaining to a program or operation that is performed with little or no interaction between the user and the system. Contrast with interactive.
block. A string of data elements recorded or transmitted as a unit. The elements may be characters, words, or physical records. (T)
bus. In a processor, a physical facility on which data is transferred to all destinations, but from which only addressed destinations may read in accordance with appropriate conventions. (I)

C

CA. (1) Channel address. (2) Communication adapter. (3) Common adapter.
catalog. A data set that contains extensive information required to locate other data sets, to allocate and deallocate storage space, to verify the access authority of a program or operator, and to accumulate data set usage statistics.
CDS. Configuration data set.
CDS. See control data set.
central processor. The electronic circuitry (and licensed internal code) responsible for the execution of the instructions that reside in main storage and constitute the operating system and the user applications.
central storage. Storage that is an integral part of the processor unit. Central storage includes both main storage and the hardware system area.
channel (CHN). (1) A path along which signals can be sent, for example, data channel, output channel. (A) (2) In the channel subsystem, each channel controls an I/O interface between the channel control element and the attached control units.
channel-attached. Pertaining to attachment of devices directly by data channels (I/O channels) to a computer. Synonym for local. Contrast with telecommunication-attached.
channel path. The physical medium by which a channel subsystem exchanges data with an I/O device in ESA/390 mode. A channel path can have byte or burst character, and up to eight paths can be assigned to a device from the same system.
channel subsystem (CSS). A collection of subchannels that directs the flow of information between I/O devices and main storage, relieves the processor of communication tasks, and does path management functions.
CICS. See Customer Information Control System.
CIM. See computer-integrated manufacturing.
class. See SMS class.
CLIST. A sequential list of commands and control statements assigned a single name; when the name is invoked the commands in the list are executed in sequential order.
CLIST (command list). A data set in which commands and possibly subcommands and data are stored for subsequent execution.
cluster controller. A device that can control the input/output operations of more than one device connected to it.
coax. Coaxial.
column. A vertical arrangement of data. Contrast with row.
command. (1) A request for system action. (2) A request from a terminal for the performance of an operation or the execution of a particular program. (3) A value sent on an I/O interface from a channel to a control unit that specifies the operation to be performed.
common carrier. In data communication, any government-regulated company that provides communication services to the general public.
complex. See SMS complex.
component. (1) Hardware or software that is part of a functional unit. (2) A functional part of an operating system; for example, the scheduler or supervisor.
computer-integrated manufacturing (CIM). A strategy that encompasses the integration of information from engineering design, production, business systems, and the plant floor.
config. (1) Configuration. (2) Configurator. (3) Configure.
configuration. (1) The arrangement of a computer system or network as defined by the nature, number, and the chief characteristics of its functional units. More specifically, the term configuration may refer to a hardware configuration or a software configuration. (I) (A) (2) In an Enterprise Systems Connection Director (ESCD), the physical interconnection capability determined by a set of attributes. The attribute values specify the connectivity control status and identifiers associated with the ESCD and its ports. See also active configuration, configuration matrix, connectivity attributes, and saved configuration.
configuration matrix. In an Enterprise Systems Connection Director (ESCD), an array of connectivity attributes, displayed in rows and columns, that can be used to alter both active and saved configurations.
configure. To describe to the system the devices and optional features installed on the system.
connected. In an Enterprise Systems Connection Director (ESCD), the attribute that, when set, establishes a dedicated connection. Contrast with disconnected.
connection. In an Enterprise Systems Connection Director (ESCD), an association established between two ports that provides a physical communication path between them.
connectivity. A term used to describe the physical interconnections of multiple devices/computers/networks employing similar or different technology and/or architecture together to accomplish effective communication between and among connected members involving data exchange and/or resource sharing.
connectivity. Relationship that establishes the eligibility of a given system in an SMS complex to access a VIO storage group, a pool storage group, and the individual volumes within a pool storage group. The relationship can be NOTCON (not connected), indicating ineligibility, or any of the following, all of which imply eligibility: ENABLE, QUIALL (quiesce all), QUINEW (quiesce new), DISALL (disable all), DISNEW (disable new).
console. A logical device that is used for communication between the user and the system. See also service console.
control data set (CDS). With respect to SMS, a VSAM linear data set containing configurational, operational, or communication information. SMS introduces three types of control data sets: source control data set, active control data set, and communications data set.
controller. A unit that controls input/output operations for one or more devices.
control unit. A general term for any device that provides common functions for other devices or mechanisms. Synonym for controller.
conversion. (1) In programming languages, the transformation between values that represent the same data item but belong to different data types. Information can be lost through conversion because accuracy of data representation varies among different data types. (2) The process of changing from one method of data processing to another or from one data processing system to another. (3) The process of changing from one form of representation to another; for example, to change from decimal representation to binary representation.
CP. (1) Control program. (2) Central processor.
CS. (1) Central storage. (2) Cycle steal.
CUA. Control unit address (channel, control unit, and device address).
Customer Information Control System (CICS). An IBM licensed program that enables transactions entered at remote terminals to be processed concurrently by user-written application programs. It includes facilities for building, using, and maintaining data bases.
customize. To change a data processing installation or network to meet the needs of particular users.
D
DASD. Direct access storage device.
data class. A list of allocation attributes that the system uses for the creation of data sets.
data stream. A continuous or concentrated flow of data bytes including control characters that will influence the processing of this string of bytes.
data system. Refers to the storage and retrieval of data, its transmission to terminals, and controls to provide adequate protection and ensure proper usage.
default. Pertaining to an attribute, value, or option that is assumed when none is explicitly specified.
device. A mechanical, electrical, or electronic contrivance with a specific purpose.
direct access storage device (DASD). A device in which access time is effectively independent of the location of the data.
disconnected. In an Enterprise Systems Connection Director (ESCD), the attribute that, when set, removes a dedicated connection. Contrast with connected.
diskette. A flexible magnetic disk enclosed in a protective container.
display. See display device, display image, and display screen.
display device. A device that presents information on a screen. See also display screen.
display image. Information, pictures, or illustrations that appear on a display screen. See also display device.
display screen. The surface of a display device on which information is presented to a user. See also display image.
duplex. Pertaining to communication in which data can be sent and received at the same time. Synonymous with full duplex.
dynamic. Pertaining to an operation that occurs at the time it is needed rather than at a predetermined or fixed time.
E

element. A major part of a component (for example, the buffer control element) or a major part of the system (for example, the system control element).
emulation. (1) The imitation of all or part of one system by another, primarily by hardware, so that the imitating system accepts the same data, executes the same programs, and achieves the same results as the imitated computer system. (2) The use of programming techniques and special machine features to allow a computing system to execute programs written for another system. (3) Imitation; for example, imitation of a computer or device. (4) Contrast with simulation.
end user. A person in a data processing installation who requires the services provided by the computer system.
ESA/390. Enterprise Systems Architecture/390.
ESCON. Enterprise Systems Connection.
esoteric name. A name used to define a group of devices having similar hardware characteristics, such as TAPE or SYSDA. See generic name.
event. (1) An occurrence or happening. (2) An occurrence of significance to a task; for example, the completion of an asynchronous operation, such as an input/output operation.
expanded storage. (1) Optional integrated high-speed storage that transfers 4K-byte pages to and from central storage. (2) Additional (optional) storage that is addressable by the system control program. Expanded storage improves system response and system performance. (3) All storage above 256MB. Storage between 64MB and 256MB can be partitioned between central storage and expanded storage.
extent. A continuous space on a DASD volume occupied by a data set or portion of a data set.

F

FC. Feature code.
feature. A part of an IBM product that can be ordered separately by the customer.
FORTRAN (formula translation). A mathematically oriented high-level programming language, useful for applications ranging from simple problem solving to large scale numeric systems using optimization techniques.
frame. (1) A housing for machine elements. (2) The hardware support structure, covers, and all electrical parts mounted therein that are packaged as one entity for shipping. (3) A formatted display.
full duplex. Synonym for duplex.
G
GDDM. Graphical Data Display Manager.
generic name. A name assigned to a class of devices (such as 3380) that is derived from the IODEVICE statement in the MVS configuration program. See esoteric name.
H
hardware system area (HSA). A logical area of central storage that is used to store Licensed Internal Code and control information (not addressable by application programs).
host (computer). (1) In a computer network, a computer that provides end users with services such as computation and data bases and that usually performs network control functions. (2) The primary or controlling computer in a multiple-computer installation.
HSA. See hardware system area.
I
ID. Identifier.
identifier (ID). (1) One or more characters used to identify or name a data element and possibly to show certain properties of that data element. (2) In an Enterprise Systems Connection Director (ESCD), a user-defined symbolic name of 24 characters or less that identifies a particular ESCD. See also password identifier and port address name.
incremental backup. In DFHSM, the process of copying a data set that has been opened for other than read-only access since the last backup version was created, and that has met the backup frequency criteria.
initialization. Preparation of a system, device, or program for operation.
initialize. To set counters, switches, addresses, or storage contents to zero or other starting values at the beginning of, or at prescribed points in, the operation of a computer routine. (A)
initial program load (IPL). The initialization procedure that causes an operating system to start operation.
input/output (I/O). (1) Pertaining to a device whose parts can perform an input process and an output process at the same time. (I) (2) Pertaining to a functional unit or channel involved in an input process, output process, or both, concurrently or not, and to the data involved in such a process.
input/output configuration data set (IOCDS). A configuration definition built by the I/O Configuration Program (IOCP) and stored on disk files associated with the processor controller.
input/output configuration program (IOCP). The program that defines the I/O configuration data required by the processor complex to control I/O requests.
intelligent printer data stream (IPDS). A type of printer control that allows you to present text, raster images, vector graphics, bar codes, and previously stored overlays at any point on a page.
interactive. Pertaining to a program or system that alternately accepts input and then responds. An interactive system is conversational; that is, a continuous dialog exists between user and system. Contrast with batch.
interface. (1) A shared boundary between two functional units, defined by functional characteristics, common physical interconnection characteristics, signal characteristics, and other characteristics as appropriate. (2) A shared boundary. An interface can be a hardware component to link two devices or a portion of storage or registers accessed by two or more computer programs. (3) Hardware, software, or both, that links systems, programs, or devices.
I/O. Input/output.
IOCDS. See input/output configuration data set.
I/O configuration. The collection of channel paths, control units, and I/O devices that attaches to the processor unit.
IOCP. See input/output configuration program.
IPDS. See intelligent printer data stream.
IPL. See initial program load.
JES (job Entry Subsystem) . A system facility for spooling, job queuing, and managing I/O. job . A unit of work to be done by a system. M a y consist of more that one program. job control language (JCL) . A problem-oriented language used to express statements in a job that identify the job or describe its requirements to an operating system.
L
LAN . LIC . See local area network. Licensed Internal Code.
link problem determination aid (LPDA) . A series of test commands executed by IBM DCE to determine which of various network components may be causing an error in the network. local . Pertaining to a device accessed directly without use of a telecommunication line. Synonym for channel-attached . Contrast with remote . local area network (LAN) . A data network located on the user s premises in which serial transmission is used for direct data communication among data stations. (T) It services a facility without the use of common carrier facilities. log . To record; for example, to log error information onto the system disk.
Glossary
279
logically partitioned mode (LPAR) . A mode that allows the operator to allocate hardware resources of the processor unit among several logical partitions. logical partition . In LPAR mode, a subset of the processor unit resources that is defined to support the operation of a system control program (SCP). logical unit (LU) . In SNA, a port to the network through which an end user can communicate with another end user. loop . (1) A sequence of instructions processed repeatedly while a certain condition prevails. (2) A closed unidirectional signal path connecting input/output devices to a network. LPAR . LPDA . See logically partitioned mode. See link problem determination aid.
mode . In any cavity or transmission line, one of those electromagnetic field distributions that satisfies Maxwell s equations and the boundary conditions. The field pattern of a mode depends on wavelength, refractive index, and cavity or waveguide geometry. (A) multidrop (network) . A network configuration in which there are one or more intermediate nodes on the path between a central node and an endpoint node. multiple preferred guests . A VM/XA facility that, with the Processor Resource/Systems Manager (PR/SM), supports up to six preferred virtual machines. See also preferred virtual machine . multiplexing . In data transmission, a function that permits two or more data sources to share a common transmission medium so that each data source has its own channel. MVS . Multiple Virtual Storage, consisting of MVS/System Product Version 1 and the MVS/370 Data Facility Product operating on a System/370 processor. See also MVS/XA . MVS/SP . Multiple Virtual Storage/System Product.
M
main storage . A logical entity that represents the program addressable portion of central storage. See also central storage. All user programs are executed in main storage. management class . A list of data set migration, backup, and retention attributes that DFHSM uses to manage storage at the data set level. MAP (manufacturing automation protocol) . A communication protocol used mainly to communicate between electronic equipment associated with the manufacturing process. master catalog . A catalog that points to user catalogs. See catalog . Mb. MB. Megabit. Megabyte; 1 048 576 bytes. A physical carrier of electrical or optical
MVS/XA . Multiple Virtual Storage/Extended Architecture, consisting of MVS/System Product Version 2 and the MVS/XA Data Facility Product, operating on a System/370 processor in the System/370 extended architecture mode. MVS/XA allows virtual storage addressing to 2 gigabytes. See also MVS .
N
NetView . An IBM licensed program used to monitor a network, manage it, and diagnose its problems. network . An arrangement of programs and devices connected for sending and receiving information. node . A junction point in a network, represented by one or more physical units.
medium . energy.
megabit (Mb) . A unit of measure for throughput. 1 m e g a b i t = 1 048 576 bits. megabyte (MB) . (1) A unit of measure for storage size. One megabyte equals 1 048 576 bytes. (2) Loosely, one million bytes. migration . In DFHSM , the process of moving a cataloged data set from a primary volume to a migration level 1 volume or migration level 2 volume, from a migration level 1 volume to a migration level 2 volume, or from a volume not managed by DFHSM to a migration level 1 or migration level 2 volume.
O
office system. A set of applications that provide support in areas like decision support, text services, electronic mail, data base access, and professional support. They integrate text, data, graphic and image processing.
offline. Not controlled directly by, or not communicating with, a computer. Contrast with online.
offload. To move data or programs out of a storage.
online. Being controlled directly by, or directly communicating with, a computer. Contrast with offline.
online. Pertaining to equipment, devices, or data under the direct control of the processor.
operating system (OS). Software that controls the execution of programs. An operating system may provide services such as resource allocation, scheduling, input/output control, and data management. (I) (A) Note: Although operating systems are predominantly software, partial or complete hardware implementations are possible.
option. (1) A specification in a statement, a selection from a menu, or a setting of a switch, that can be used to influence the execution of a program. (2) A hardware or software function that can be selected or enabled as part of a configuration process. (3) A piece of hardware (such as a network adapter) that can be installed in a device to modify or enhance device function.
OS. See operating system.

P

page. In a virtual storage system, a fixed-length block that has a virtual address and is transferred as a unit between real storage and auxiliary storage. (I) (A)
parallel channel. A data path along which a group of signals representing a character or any other entity of data can be sent simultaneously.
parameter. (1) A variable that is given a constant value for a specified application and that can denote the application. (2) An item in a menu for which the user specifies a value or for which the system provides a value when the menu is interpreted. (3) Data passed between programs or procedures.
Pascal. A high-level programming language that is effective for system development and technical problem solving.
password identifier. In an Enterprise Systems Connection Director (ESCD), a user-defined symbolic name of 24 characters or less that identifies the password user.
path. In a network, a route between any two nodes.
PCE. Processor controller element.
performance. For a storage subsystem, a measurement of effective data processing speed against the amount of resource that is consumed by a complex. Performance is largely determined by throughput, response time, and system availability.
PICK. An operating system made by PICK Systems for various applications written for asynchronous machines.
pool. See storage pool.
POR. See power-on reset.
port. (1) An access point for data entry or exit. (2) A connector on a device to which cables for other devices such as display stations and printers are attached.
port address name. In an Enterprise Systems Connection Director (ESCD), a user-defined symbolic name of 24 characters or less that identifies a particular port.
power-on reset. The state of the machine after a logical power-on before the control program is IPLed.
preferred virtual machine. A virtual machine that runs in the V=R area. The control program gives this virtual machine preferred treatment in the areas of performance, processor assignment, and I/O interrupt handling. See also multiple preferred guests.
processor controller element (PCE). Hardware that provides support and diagnostic functions for the processor unit. The processor controller communicates with the processor unit through the logic service adapter and the logic support stations, and with the power supplies through the power thermal controller. It includes: primary support processor (PSP), initial power controller (IPC), input/output support processor (IOSP), and the control panel assembly.
Processor Resource/Systems Manager (PR/SM). A function that allows the processor unit to operate several system control programs (SCPs) simultaneously in LPAR mode. It provides for logical partitioning of the real machine and support of multiple preferred guests. See also multiple preferred guests.
processor storage. (1) The storage in a processing unit. (2) In virtual storage systems, synonymous with real storage.
profile. Data that describes the significant characteristics of a user, a group of users, or one or more computer resources.
PROFS. Professional office system.
program temporary fix (PTF). A temporary solution or by-pass of a problem diagnosed by IBM as resulting from an error in a current unaltered release of the program.
protocol. (1) A set of semantic and syntactic rules that determines the behavior of functional units in achieving communication. (2) In SNA, the meanings of and the sequencing rules for requests and responses used for managing the network, transferring data, and synchronizing the states of network components. (3) A specification for the format and relative timing of information exchanged between communicating parties.
PR/SM. See Processor Resource/Systems Manager.
ps.
PTF. See program temporary fix.

S

session. In SNA, a logical connection between two network addressable units (NAUs) that can be activated, tailored to provide various protocols, and deactivated as requested.
SMS. See storage management subsystem.
SMS class. A list of attributes that SMS applies to data sets having similar allocation (data class), performance (storage class), or availability (management class) needs.
SNA. See Systems Network Architecture.
SQL/DS. Structured query language/data system.
standard. Something established by authority, custom, or general consent as a model or example.
station. (1) An input or output point of a system that uses telecommunication facilities; for example, one or more systems, computers, terminals, devices, and associated programs at a particular location that can send or receive data over a telecommunication line. (2) A location in a device at which an operation is performed; for example, a read station. (3) In SNA, a link station.
storage. A unit into which recorded text can be entered, in which it can be retained and processed, and from which it can be retrieved.
storage class. A list of storage performance and availability service requests.
storage group. VIO, a list of real DASD volumes, or a list of serial numbers of volumes that no longer reside on a system but that end users continue to reference in their JCL.
storage management subsystem (SMS). An operating environment that helps automate and centralize the management of storage. To manage storage, SMS provides the storage administrator with control over data class, storage class, management class, storage group, and ACS routine definitions.
storage pool. A predefined set of DASD volumes used to store groups of logically related data according to user requirements for service or according to storage management tools and techniques.
subchannel. The channel facility required for sustaining a single I/O operation.
subchannel (SCH). In ESA/370 mode, a group of contiguous words in the hardware system area that provides all of the information necessary to initiate, control, and complete an I/O operation.
subsystem. A secondary or subordinate system, or programming support, usually capable of operating independently of, or asynchronously with, a controlling system.
R
real time . Pertains to the actual time during which a physical process transpires. remote . Pertaining to a system, program, or device that is accessed through a telecommunication line. Contrast with local. request for price quotation (RPQ) . for a product. A custom feature
RETAIN . Remote technical assistance and information network. row . A horizontal arrangement of data. Contrast with column . RPQ . See request for price quotation.
S
SAA . See Systems Application Architecture. saved configuration . In an Enterprise Systems Connection Director (ESCD), a stored set of connectivity attributes whose values determine a ESCD configuration that can be used to replace all or part of the configuration currently active. Contrast with active configuration . SCP . SEC . System control programming. System engineering change.
service console . A logical device used by service representatives to maintain the processor unit and to isolate failing field replaceable units. The service console can be assigned to any of the physical displays attached to the input/output support processor.
282
independently of or asynchronously with a controlling system. synchronous . (1) Pertaining to two or more processes that depend on the occurrences of a specific event, such as common timing signal. (2) Occurring with a regular or predictable time relationship. system . (1) The processor unit and all attached and configured I/O and communication devices. (2) In information processing, a collection of machines, programs, and methods organized to accomplish a set of specific functions. system control programming (SCP) . IBM-supplied programming that is fundamental to the operation and maintenance of the system. It serves as an interface with licensed programs. system-managed storage . An approach to storage management in which the system determines data placement and an automatic data manager handles data backup, movement, space, and security. system reset (SYSRESET) . To reinitialize the execution of a program by repeating the initial program load (IPL) operation. Systems Application Architecture (SAA) . An architecture developed by IBM that consists of a set of selected software interfaces, conventions, and protocols, and that serves as a common framework for application development, portability, and use across different IBM hardware systems. Systems Network Architecture (SNA) . The description of the logical structure, formats, protocols, and operational sequences for transmitting information units through, and controlling the configuration and operation of, networks. S/370 . System/370 mode.
token . A sequence of bits passed from one device to another on the token-ring network that signifies permission to transmit over the network. It consists of a starting delimiter, an access control field, and an end delimiter. The access control field contains a bit that indicates to a receiving device that the token is ready to accept information. If a device has data to send along the network, it appends the data to the token. When data is appended, the token then becomes a frame. Token-Ring . A network with a ring topology that passes tokens from one attaching device (node) to another, complying with the IEEE 802.5 standard. A node that is ready to send can capture a token and insert data for transmission. topology . units. The geometric configuration of connected
track . A portion of a disk that is accessible to a given read/write head position. transmission control protocol/internet protocol (TCP/IP) . A public domain networking protocol with standards maintained by US Department of Defense to allow unlike vendor systems to communicate.
U
upgrade . To add features to a system. user ID . A predefined set of one to eight characters that uniquely identifies a user to the system.
V
V = R. Virtual equals real. virtual machine (VM) . (1) A functional simulation of a computer and its associated devices. Each virtual machine is controlled by a suitable operating system. VM/370 controls concurrent execution of multiple virtual machines on a single System/370. (2) In VM, a functional simulation of either a System/370 computing system or a System/370-Extended Architecture computing system. Each virtual machine is controlled by an operating system. VM controls concurrent execution of multiple virtual machines on a single system. virtual storage (VS) . (1) The storage space that can be regarded as addressable main storage by the user of a computer system in which virtual addresses are mapped into real addresses. The size of virtual storage is limited by the addressing scheme of the computer system and by the amount of auxiliary storage available, not by the actual number of main storage locations. (2) Addressable space that is apparent to the user as the processor storage space, from which the instructions and the data are mapped into the processor storage locations.
Glossary
S/390 . System/390. Any ES/9000 system including its associated I/O devices and operating system(s).
T
table . Information presented in rows and columns. TCP/IP . See transmission control protocol/internet protocol. telecommunication-attached . Pertaining to the attachment of devices by teleprocessing lines to a host processor. Synonym for remote . Contrast with channel-attached . terminal . In data communication, a device, usually equipped with a keyboard and display device, that can send and receive information.
283
virtual telecommunication access method (VTAM) . This program provides for workstation and network control. It is the basis of a System Network Architecture (SNA) network. It supports SNA and certain non-SNA terminals. VTAM supports the concurrent execution of multiple telecommunications applications and controls communication among devices in both single-processors and multiple processors networks. VM . See virtual machine. Virtual Machine/Extended Architecture.
W
wait . The condition of a processing unit when all operations are suspended. wide area network . A network that provides communication services to a geographic area larger than that served by a local area network. workstation . (1) An I/O device that allows either transmission of data or the reception of data (or both) from a host system, as needed to perform a job; for example, a display station or printer. (2) A configuration of I/O equipment at which an operator works. (3) A terminal or microcomputer, usually one connected to a mainframe or network, at which a user can perform tasks. write . To make a permanent or transient recording of data in a storage device or on a data medium.
VM/XA .
volume . A certain portion of data, together with its data carrier, that can be mounted on the system as a unit; for example, a tape reel or a disk pack. For DASD, a volume refers to the amount of space accessible by a single actuator. VSE (Virtual Storage Extended) . An operating system that is an extension of DOS/VS. A VSE system consists of a) a licensed VSE/Advanced Functions support and b) any IBM-supplied and user-written programs required to meet the data processing needs of a user. VSE and the hardware it controls form a complete data processing system. Its current version is called VSE/ESA. VSE/ESA (Virtual Storage Extended/Enterprise Systems Architecture) . The most advanced VSE system currently available.
X
XA . Extended architecture.
284
List of Abbreviations
ABEND   abnormal end
ACB   access method control block
ACF   advanced communications function (MVS-based software)
ACF/VTAM   advanced communications function for virtual telecommunications access method (MVS-based software)
ACR   alternate CPU recovery
AOC/MVS   automated operations control/multiple virtual storage (IBM)
AOR   application owning region
APAR   authorized program analysis report
APPC   advanced program-to-program communication
APPN   advanced peer-to-peer networking (IBM program product)
ARM   automatic restart manager component of MVS
ASID   address space identifier (MVS)
BBU   battery backup unit
BSDS   bootstrap dataset (DB2)
BTAM   basic telecommunications access method
BWO   Backup While Open (IBM DFHSM enhanced backup option)
CA   continuous availability
CD-ROM   (optically read) compact disk - read only memory
CDRM   cross-domain resource managers
CDS   configuration data set
CEC   central electronics complex, synonym for CPC
CEMT   Master Terminal Transaction (CICS)
CF   coupling facility
CFC   Coupling Facility Channel
CFR   Coupling Facility Receiver
CFRM   Coupling Facility Resource Manager
CFS   Coupling Facility Sender
CHP   channel path
CHPID   channel path id
CI   control interval
CICS   customer information control system (IBM)
CICS/ESA   customer information control system/enterprise systems architecture (IBM)
CICSVR   CICS VSAM recovery (IBM program product, MVS or VSE)
CKD   count key data
CKPT   checkpoint
CLIST   command list
CMAS   CICS Managing address space
CMOS   complementary metal oxide semiconductor
CPC   central processing complex
CPU   central processing unit
CSD   CICS system definition
CSS   channel subsystem
CTC   channel to channel
CU   control unit
DASD   direct access storage device
DB   data base
DBCTL   Data Base Control Subsystem
DBMS   data base management system
DBRC   data base recovery control (IMS)
DCAF   distributed console access facility
DCCF   disabled console communications facility (MVS)
DCE   Distributed Computing Environment (OSF)
DDDEF   a JCL dynamic allocation statement for MVS
DEDB   data entry data base
DFDSS   data facility data set services (IBM software product)
DFHSM   data facility hierarchical storage manager
DFSMS   Data Facility Storage Management Subsystem (MVS and VM)
DFSMS/MVS   Data Facility Storage Management Subsystem/MVS
DFSORT   data facility sort (IBM program product)
DFW   DASD fast write
DL/I   data language 1
DSA   dynamic storage area
DSI   dynamic system interchange (JES3)
EDT   eligible devices table (MVS control block)
EMIF   ESCON multiple image facility
ENF   event notification facility
ENQ   enqueue
EOM   end of memory
ESCD   ESCON director (ES/9000)
ESCON   enterprise systems connection (architecture, IBM System/390)
ETO   Extended Terminal Option (IMS DC)
ETR   external time reference
FOR   file owning region
GB   gigabyte (10**9 bytes or 1,000,000,000 bytes)
GBP   group buffer pool (DB2)
GRS   global resource serialization (MVS)
HCD   hardware configuration definition (MVS/SP)
HMC   hardware management console
HSA   hardware system area
I/O   input/output
IBM   International Business Machines Corporation
IDCAMS   the program name for access method services (OP SYS)
IEEE   Institute of Electrical and Electronics Engineers
IFCC   interface control check
IMS   information management system
IMS/DB   information management system/data base
IMS/ESA   information management system/enterprise systems architecture
IMS/VS   information management system/virtual storage
INIT   initialize/initial/initiate
IOCDS   I/O configuration data set
IOCP   I/O configuration program
IODF   input/output definition file
IOS   input/output supervisor
IPC   inter-processor communication
IPCS   interactive problem control system
IPDS   intelligent printer data stream (IBM)
IPL   initial program load
IRLM   IMS/VS resource lock manager
ISC   inter-system communications
ITSC   International Technical Support Center (IBM)
ITSO   International Technical Support Organization
JCL   job control language (MVS and VSE)
JES   job entry subsystem (MVS)
LAN   local area network
LCU   logical control unit
LIC   licensed internal code
LOGREC   logout recorder (error recording DB in OS/VS)
LPAR   logically partitioned mode
LPDA   link problem determination aid
LTERM   logical terminal
LU   logical unit
LX   link/linkage index
MAS   multi-access spool (JES2)
MCS   multiple console support
MIH   missing interruption handler
MP   multi-processing
MVS   multiple virtual storage (IBM System 370 & 390)
MVS/ESA   multiple virtual storage/enterprise systems architecture (IBM)
MVS/XA   multiple virtual storage/extended architecture (IBM)
NIP   nucleus initialization program
NJE   network job entry
OPC   operations, planning & control (IBM program product)
OPC/ESA   operations planning & control/enterprise systems architecture (IBM)
OSA   open systems adapter
OSAM   overflow sequential access method
PARM   parameter
PARMLIB   MVS initialization parameter library
PC   personal computer
PCE   processor controller element
POR   power on reset
PPRC   Peer-to-Peer Remote Copy (IBM 3990 Model 6)
PR/SM   processor resource/systems manager (IBM)
PTERM   physical terminal
PUBS   publications
RACF   resource access control facility
RAMAC   brand name and trademark of IBM
RC   return code
RDO   resource definition on-line (CICS)
RECON   recovery control (data set)
REXX   restructured extended executor language
RJP   remote job processing
RLS   record level sharing
RMF   resource measurement facility (MVS)
RNL   resource name lists
RSA   ring processing system authority message (MVS control block)
RSR   remote site recovery (IMS)
RTA   real-time analysis (CICS)
SCA   Shared Communications Area (MVS/XCF Coupling Facility)
SCDS   save control data set
SCH   subchannel
SE   service element
SFM   Sysplex Failure Manager
SID   system identification
SIGP   signal processor
SIT   system initialization table
SMF   system management facility
SMP/E   system modification program/extended (MVS)
SNA   systems network architecture (IBM)
SQL   structured query language
SSI   subsystem interface (MVS)
STC   started task control
STCK   store clock
SUBSYS   subsystem
SVC   supervisor call instruction (IBM System/360)
SYSCAT   system catalog
SYSDEF   system definition (frame)
SYSLOG   system log
SYSPLEX   systems complex
SYSRES   system residence file/disk
SYSRESET   system reset
TCP/IP   Transmission Control Protocol/Internet Protocol (USA, DoD, ARPANET; TCP=layer 4, IP=layer 3, UNIX-ish/Ethernet-based system-interconnect protocol)
TELNET   U.S. Dept. of Defense's virtual terminal protocol, based on TCP/IP
TM   transaction manager
TOD   time of day
TOR   terminal owning region
TSO   time sharing option
TSO/E   time sharing option extensions
TXT   text
UP   uni-processor
UPS   uninterruptible power supply/system
USERID   user identification
USS   unformatted system services (SNA)
VIO   virtual input output
VM   virtual machine (IBM System 370 & 390)
VM/XA   virtual machine/extended architecture (IBM)
VOLSER   volume serial
VSAM   virtual storage access method (IBM)
VTAM   virtual telecommunications access method (IBM) (runs under MVS, VM, & DOS/VSE)
VTAMLST   VTAM definition library
WTOR   write to operator with reply
XCF   cross-system coupling facility (MVS)
XES   cross-system extended services (MVS)
XRF   extended recovery facility
Index

Numerics
3088 configuration 13 maintenance 14 3174 MVS console 24 sysplex console attachment 21 3490 25 3990 concurrent copy 172, 175 DB2 data 105 dual copy 12 Extended Remote Copy 216 model 3 17 model 6 17, 53 Peer-to-Peer Remote Copy 215 remote copy 215 9021 711-based processors 144, 145 cross partition authority 67, 68, 69 9032 18, 146 9036-003 11 9037 10 9121 511-based processors 145 9672 clock 148 cross partition authority 68, 69 dynamic storage reconfiguration 145 HMC 21 image profile 67, 68, 69 IOCDS 143 power 27 R1 machines 144 R2 and R3 machines 144 9674 7, 27 9729 20 9910 27
A
Abbreviations 285 ABEND 75, 77, 85 ACDS 50, 52 ACQUIRE 69 Acronyms 285 ACTIVATE command 34 activation 70 ACTSYS parameter 64 alternate consoles 43 ALTGRP 43 AOC/MVS ARM interaction 80 ARM restart 110 description 110 graphical interface 111 NetView 110 shutdown 169 area data set 173 ARM See Automatic Restart Manager (ARM) ARMRESTART 84, 85 ARMRST 84 auto-switchable tape 89 Automatic Restart Manager (ARM) AOC/MVS interaction 110 automating sysplex failure management 58 characteristics 80 CICS definition 156 CICS implementation 83 CICS support 82 couple data set 35, 37, 81 DB2 support 85 definition 81 description 79 IMS element name 84 IMS element type 84 IMS support 84 IMSID 84 parameters 81 Policy TOTELEM keyword 81 subsystem interaction 82 SYSIMS 84 VTAM support 87 automation AOC/MVS 110 NetView 110 OPC/ESA 111 tools 110, 169 AUTOSWITCH 54 availability database considerations 171 DB2 subsystem 105 high 3 RLS database 108
B
backup database considerations 171 DB2 database 175 DL/1 database 173 VSAM database 172 batch database considerations 171 DB2 database 174 DL/1 database 173 moving workload 164 OPC/ESA 165 VSAM database 171 battery backup 8, 27 BCDS 52, 53 BLWSPINR 77
C
CANCEL parameter 85 central processing complex (CPC) number that can be managed by one HMC 21 central storage 145 CFCC See Coupling Facility Control Code (CFCC) CFRM See Coupling Facility Resource Manager (CFRM) channel card 144 ESCON 143 parallel 143 CICS 83 adding a subsystem 156 affinities 97 ARM Implementation 83 ARM Support 82 backup while open (BWO) considerations 172 CICSPlex SM 217 coupling facility structure 106 CSD 97 disaster recovery 217 failure 82 file-owning region 97 logging 109 moving workload 161 Resource Definition Online (RDO) 97 restarting a TOR 211 restarting an AOR 212 RLS control data set 107 RLS database 108 shared temporary storage 97 shutdown 166 SMSVSAM 106 starting 159 storage protection 98 topology 96 transaction isolation 98 VSAM structure 108 CICSPlex SM affinities 97 configuration 99 description 83, 98 disaster recovery considerations 217 CLOCKxx 11 cloning 6 CMOS processors 6 sysplex 6
CNGRPxx 45 Command Prefix Facility (CPF) 150 command prefixes 86 COMMDS 50, 52, 53 COMMNDxx 54 concurrent maintenance 9032 model 003 18 channel 144 CP 144 LIC patches 144 CONNFAIL 64, 65, 66 CONSOLE statement 44 consoles 9672 21 alternate 43 CONSID=0 22 extended MCS 22, 23 groups 43 integrated 22 JES3 89 master 22 MCS 22, 43 MSTCON 45 MVS 43 subsystem 22 system 22, 45 CONSOLxx 43 continuous availability 3 configuration 5 operations 3 couple data set alternate 35 Automatic Restart Manager (ARM) 35, 37, 80, 207 COUPLE00 member 149 Coupling Facility Resource Manager (CFRM) 35, 37, 54, 207 description 35 determining size 36 failure 206 performance and availability 37 placement 37 reformatting 126 spare 36 swapping 71 sysplex 35, 37, 206 Sysplex Failure Management (SFM) 35, 37, 207 System Logger (LOGR) 35, 37, 46, 208 Workload Manager (WLM) 35, 37, 207 COUPLExx INTERVAL parameter 58, 62, 68, 69, 73, 74 OPNOTIFY parameter 58, 76 sample member 226 SFM considerations 58 SMS group name 51 coupling facility alternate 8 CFCC 9 CFRM policy changes 125 CICS logstream 109 configuration 7 DB2 104, 105 DB2 structure 130 dump space 120 exploiters 128 IMS 102 IMS lock structure 128 JES2 structure 129 links 7, 35 logstream structure 131 maintenance 132 moving a structure 120 OSAM and VSAM structure 128 RACF structure 130 shared tape structure 131 shutdown procedure 134 SMSVSAM structure 131 structure allocation 117 configuration 7 connections 118 connectivity 8 DB2 9 definition 8 IEFAUTOS 131 IMS lock 128 ISTGENERIC 10 JES2 checkpoint 9, 129 last structure condition 15 LOGR 46 logstream 46, 131 OSAM and VSAM 128 RACF 8, 130 rebuilding 121 relocation 8 shared tape 131 SMSVSAM 131 system logger 9 VTAM 129 VTAM generic resources 10 XCF signalling paths 14 XCF structure 129 volatility 8 VSAM 108 VSAM RLS 106 VTAM structure 129 XCF structure 129 Coupling Facility Control Code (CFCC) 9 Coupling Facility Resource Manager (CFRM) couple data set 35, 37, 54 policy 54, 66, 73 PREFLIST statement 8 CSVDYNEX 55 CTC 13
D
DASD path configuration 17 DASD Fast Write (DFW) 37, 50 data set IMS 102 LOGREC 41, 149, 238 PAGE 149, 238 PAGE/SWAP 41 SMF 41, 149, 238 STGINDEX 41, 149, 238 data sharing 3 DATA TYPE 70 DB2 adding a member 158 ARM support 85 availability 105 Call Attachment Facility (CAF) 164 CICS and IMS considerations 164 coupling facility structure 9, 104 database considerations 174 database structure 130 description 103 disaster recovery 218 failure 82 moving workload 163 shutdown 167 starting 160 subsystem definition 105 subsystem parameters 105 TSO and batch considerations 164 DB2 group name 85 DCCF 24, 44, 45 DEACTIVATE 68 DEACTTIME 58, 64, 67, 68 DFSMS 50, 52 coupling facility structure 10 DFSMShsm considerations 165 moving workload 165 SMSVSAM structure 131 DFSMSdss 37, 53 DFSMShsm 52, 53, 165 DFSMShsm journal 52 DSI 91, 93 dual copy 12 DUPLEXMODE 9 DYNAMIC 34, 57 dynamic exits 55 dynamic I/O reconfiguration 33 dynamic sparing 15 dynamic storage reconfiguration 145 dynamic subsystem interface (SSI) 56
E
EDT 34 ELEMENT 85
EMIF 7 ENF 85 ENQ 53 ESCON channels 11, 143 CTC 13 devices 145 director 12, 18, 145, 146 EMIF 7 I/O configuration 11 logical paths 12, 18 manager 19 ESCON Manager 24 ESTORE 64 expanded storage 145 exploiting dynamic functions 55 EXSPATxx 58, 73, 77 Extended Remote Copy 216
F
FAILING 67 FAILSYS 64, 72 failure detection 74 fault-tolerant system 4 fiber 20 FORCE parameter 85 fuzzy backup 172 fuzzy image copy 174
G
generic resources 10 glossary 275 GRSCNFxx 73, 78 GRSRNLxx 53, 54
H
hardware management console (HMC) changing time 148 description 21 usage during NIP 21 HCD ACTIVATE function 34 adding I/O device 146 download IOCDS 143 dynamic capable IOCDS 33 TIMEOUT parameter 14 HCPYGRP 44 HMC See hardware management console (HMC) hot I/O 79 HSA 34 HWNAME 223
I
I/O configuration 11 connectivity 11 definition file (IODF) 35 devices 145 dynamic reconfiguration 33 ICKDSF logical path report 18 ICMF 7, 218 IEACMDxx 54 IEASYM 222 IEASYMxx 222, 223 IEASYSxx 41, 51, 224 IECIOSxx 38 IEFAUTOS 54 IEFAUTOS structure 131 IEFJFRQ 57 IEFSSI 56 IEFSSNxx 56, 86 IEFSSVT 56 IEFSSVTI 56 Image 69, 221 image profile 67, 68 IMS area data set 173 ARM support 84 cache directory 10 cloning 101, 102, 103, 157 coupling facility structure 102 database considerations 172 DEDB database 173 disaster recovery 216 failure 82 FFDB database 173 fuzzy image copy 174 IRLM definitions 102 lock structure 128 moving workload 163 OSAM and VSAM structure 128 RSR 216 shared data sets 102 shutdown 166 starting 160 subsystem identifier 101 SVC 102 terminal definition 101 topology 100 unique data sets 102 IMSID 84 indirect catalog 30 installation 70 INTERVAL description 74 interval detection 73 recommendations 62 SFM planning 58 values definitions 68 IOCDS 34, 35, 143 IOCP stand-alone 143 TIMEOUT parameter 14 IODF 33, 54 IPL load parameters 35 message suppression 21 IPLPARM members 222 IRLM lock structure 10 ISOLATE 63, 68 ISOLATETIME 58, 62, 64, 67, 68, 73, 76 ISTGENERIC structure 10 ITEM NAME 81 IXCARM 80, 82 IXCL1DSU 71, 81 IXCMIAPU 70, 72 IXCMIAPU utility 47 IXGCONN 49 IXGINVNT service 47
J
JCL CICS startup 82 started tasks 42 system symbols 42 JES2 checkpoint 8 performance 39 placement 38 reconfiguration 39 structure placement 38 checkpoint coupling facility structure CKPT structure 129 duplicated TSO logon 109 startup procedure 227 structure failure 39 JES3 adding a global 150 adding a local 150 adding a subsystem 150 ARM exploitation 87 CONSTD statement 92 DSI 91, 93 initialization stream CONSOLE statement 89 DEVICE statement 88, 89 MAINPROC statement 88, 150 OPTIONS statement 91 RJPWS statement 88 managed devices 34, 88 managing tape allocation 54 planning changes 87 PLEXSYN keyword 92 SMS support 52 SYN keyword 92
L
LIC patches 144 LOADxx 34, 222, 268 local UPS 27 log data sets 46 log data, duplexing 47 logger See system logger LOGR 46 LOGREC data set 41, 149, 238 logs 148 LOGSTREAM 9, 46 LPAR adding 34, 144 dynamic storage reconfiguration 145 isolation 68 processing weights 145 resetting 67 SFM parameters 64 storage acquisition 69 LPARNAME 223 LPDEF 67, 68, 69
M
master catalog 32 MAXELEM 81 MCDS 52, 53 messages IWM012E 207 IXC253I 208 IXC263I 208 IXC267I 207, 208, 209 IXC808I 208 IXC809I 208 undeliverable (UD) 22 MIH 38 MVS ACTIVATE command 34 adding a new SYSRES 151 adding an image 149 NIP console 21 removing an image 169 ripple IPL 154
N
N and N+1 29, 155 NAME 64 naming conventions 40 NetView description 110 focal point 110
O
OCDS 52, 53 OPC/ESA ARM support 80 controller 111 description 111 job routing 111 moving workload 165 shutdown 169 OPERLOG 213 OPNOTIFY 58, 76
P
PAGE data set 149, 238 PAGE/SWAP data set 41 parallel attached devices 146 channels 143 parallel sysplex sample configuration 221 PARMLIB See SYS1.PARMLIB PARTITION 64, 68 Peer-to-Peer Remote Copy 215 performance 6 PLEXSYN 92 policy 81 active SFM 70 Automatic Restart Manager (ARM) 81 Coupling Facility Resource Manager (CFRM) 54, 66, 73 DFSMS 50 Sysplex Failure Management (SFM) 71 XCF PR/SM 66 POR 33 power battery backup 8 failure 8 save state 27 subsystem 144 supply 144 UPS 8, 26 PR/SM 10 processing weights 145 processor 69 adding 143 bipolar 6 changing 144 CMOS 6 configuration for continuous availability 6 N+1 6 removing 143
R
RACF coupling facility structure 8 database 40 database structure 130 structure failure 40 RAMAC 11, 12, 15, 17 REBUILD 65 REBUILDPERCENT 66, 73 RECONFIG 64, 71 reconfiguration 33 redundancy 4 Remote Site Recovery (IMS) 216 reorganization database considerations 171 DB2 database 175 DL/1 database 174 VSAM database 172 REPORT 70 reserve 37, 38, 53 reset failing logical partition 67 RESETTIME 58, 64, 67, 68 RESMIL 78 restart groups 81 RPQ 8K1919 11 RSA 78 RVARY command 40
S
SCDS 50, 52 SCTC 13 SE See service element (SE) service element (SE) clock 148 SET 55 SETPROG EXIT 55 SETSSI 56, 57 SETXCF command 36, 70, 71, 81 SFM 66 See also Sysplex Failure Management (SFM) shared SYSRES 29, 30 shared tape 54, 89 single point of failure 4 single system image 40 SMF allocation sample 238 data sets 41 dynamic exit 55 system cloning 149 system identifier (SID) 40 time changes 148 SMSplex 50
spin loop 79 SPINRCVY 74, 77 SPINTIME 58, 73, 74, 75, 76, 77 SSIDATA 57 staging data sets 49, 50 standards 6 standby system 3 START 83 status update missing 37 STC 80 STGINDEX data set 41, 149, 238 STOR 69 STORAGE 69 STORE 64 subsystem adding a CICS subsystem 156 changing 160 CICS 166 CICS startup 159 DB2 167 DB2 startup 160 IMS 166 IMS startup 160 shutdown 165 starting 159 SYMDEF 223 SYN 92 SYNCHDEST 44, 45 synchronous WTO(R) 45 SYS1.PARMLIB CLOCKxx member 11, 146 CNGRPxx member 45 COMMNDxx member 22 considerations 40 CONSOLxx member 22, 43, 45 COUPLE00 member 149 DEFAULT statement 45 ETRMODE keyword 11, 146 ETRZONE keyword 11, 146 EXSPATxx member 79 GRSRNLxx 53, 54 IEACMDxx member 22 IEASYMxx member 149 IECIOSxx member 79 IEFSSNxx member 156, 158 members 222 OPER keyword 79 PLEXCFG parameter 90 SCHEDxx member 156 SPINRCVY keyword 74 SPINTIME keyword 74 SYNCHDEST keyword 45 TERM keyword 75 TOLINT keyword 73, 78 XCFPOLxx 66 SYS1.PROCLIB 227
SYS1.SAMPLIB 77 SYSCLONE 42 SYSCONS 22 SYSDEF 223 SYSIMS 84 SYSNAME 42 SYSPARM 223 sysplex CMOS only 6 keyword in LOADxx member 222 mixed 6 name 51 symbolic 42 timer 10, 146 Sysplex Failure Management (SFM) activation 69 active policies 70 ARM considerations 79 automating 57 couple data set 35, 37 isolate function 59 native environment considerations 64 planning 58 policies 70, 71 Policy ACTSYS keyword 64 CONNFAIL keyword 64 DEACTTIME keyword 64 ESTORE keyword 64 FAILSYS keyword 64 ISOLATETIME keyword 64 NAME keyword 64 PROMPT keyword 64 RECONFIG keyword 64 RESETTIME keyword 64 STORE keyword 64 SYSTEM keyword 64 TARGETSYS keyword 64 WEIGHT keyword 64, 65 PR/SM environment considerations 64 stopping 72 timing 67 utilization 72 SYSPLEX symbolic 42 system group 51 system logger address space failure 213 application failure 212 CICS definition 156 coupling facility sensitivity 8 description 46 logstream allocation 46 logstream structure 131 OPERLOG failure 213 sysplex failure 213 system failure 213 SYSTEM parameter 64
T
tape 54 tape switching 54 TARGETSYS 64, 69 TERM 75 time changing 146 9672 HMC and SE 148 IMS 146 SMF 148 daylight savings 146, 147 detection interval 73 local 146 log timestamps 148 MVS clocks 146 setting in MVS 11 standard 146, 147 summer 146 TOD clock 10 winter 146 zone offset 146 time-of-day (TOD) clock 10 TOLINT 73, 78 TOTELEM 81 TSO/E adding 159 description 109 moving workload 164
U
UCB 34 UIM 33 uninterruptible power supply (UPS) 8, 26 UPS See uninterruptible power supply (UPS)
V
VARY device command 54 VARY XCF 60, 62 volatile, coupling facility 39, 47, 49 VSAM database 171 VTAM APPN 112 ARM support 87 configuration 112 coupling facility structure 10 generic resources 10 ISTGENERIC structure 129 VTAMLST 232
W
WEIGHT 64, 65, 73 WLM See Workload Manager (WLM) workload balancing 4 batch 164 CICS 161 DB2 163 DFSMS 165 IMS 163 moving 161 TSO 164 Workload Manager (WLM) compatibility mode 79 couple data set 35, 37 sysplex recovery 80
X
XCF (Cross Systems Coupling Facility) address space 80 connectivity failure 65 couple data set 35, 37 failure detection interval 74 group name JES3 91, 150 PR/SM policy 66 signalling paths alternate 14 configuration 14 JES3 use 90 transport class 90 signalling structure 129 XCFPOLxx 66 XES 57 XRF 83
RED000
Your feedback is very important to help us maintain the quality of ITSO Bulletins. Please fill out this questionnaire and return it using one of the following methods:
Mail it to the address on the back (postage paid in U.S. only)
Give it to an IBM marketing representative for mailing
Fax it to: Your International Access Code + 1 914 432 8246
Send a note to REDBOOK@VNET.IBM.COM
Please rate on a scale of 1 to 5 the subjects below. (1 = very good, 2 = good, 3 = average, 4 = poor, 5 = very poor)
Overall satisfaction ____
Organization of the book ____
Accuracy of the information ____
Relevance of the information ____
Completeness of the information ____
Value of illustrations ____
Grammar/punctuation/spelling ____
Ease of reading and understanding ____
Ease of finding information ____
Level of technical detail ____
Print quality ____
Please answer the following questions:
a) If you are an employee of IBM or its subsidiaries:
   Do you provide billable services for 20% or more of your time? Yes____ No____
   Are you in a Services Organization? Yes____ No____
b) Are you working in the USA? Yes____ No____
c) Was the Bulletin published in time for your needs? Yes____ No____
d) Did this Bulletin meet your needs? Yes____ No____
   If no, please explain:
Comments/Suggestions:
Name
Address
Company or Organization
Phone No.
RED000
IBML
IBM International Technical Support Organization Mail Station P099 522 SOUTH ROAD POUGHKEEPSIE NY USA 12601-5400
Printed in U.S.A.
SG24-4503-00