
Ultra Enterprise 10000

Administration

ES-400

Student Guide

Sun Educational Services


SunService Division
Sun Microsystems, Inc.
MS UMIL02-105
901 San Antonio Road
Palo Alto, CA 94303
U.S.A.

Part Number 805-1385-01


Revision B, June 1998
Copyright 1998 Sun Microsystems, Inc., 901 San Antonio Road, Palo Alto, California 94303 U.S.A. All rights reserved.
This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution,
and decompilation. No part of this product or document may be reproduced in any form by any means without prior
written authorization of Sun and its licensors, if any.
Portions of this product may be derived from the UNIX® system, licensed from Novell, Inc., and from the Berkeley 4.3 BSD
system, licensed from the University of California. UNIX is a registered trademark in the United States and other countries
and is exclusively licensed by X/Open Company Ltd. Third-party software, including font technology in this product, is
protected by copyright and licensed from Sun’s suppliers.
RESTRICTED RIGHTS: Use, duplication, or disclosure by the U.S. Government is subject to restrictions of FAR 52.227-14(g)(2)(6/87) and FAR 52.227-19(6/87), or DFAR 252.227-7015(b)(6/95) and DFAR 227.7202-1(a).
Sun, Sun Microsystems, the Sun logo, and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the
United States and other countries. All SPARC trademarks are used under license and are trademarks or registered
trademarks of SPARC International, Inc. in the United States and other countries. Products bearing SPARC trademarks are
based upon an architecture developed by Sun Microsystems, Inc.
The OPEN LOOK® and Sun™ Graphical User Interfaces were developed by Sun Microsystems, Inc. for its users and
licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical
user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User
Interface, which license also covers Sun’s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun’s
written license agreements.
X Window System is a product of the X Consortium, Inc.
THIS PUBLICATION IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE, OR NON-INFRINGEMENT.

Contents
About This Course......................................................................................xv
Course Map........................................................................................ xvi
Module-by-Module Overview ...................................................... xvii
Appendices ................................................................................xix
Course Objectives............................................................................... xx
Topics Not Covered......................................................................... xxii
Introductions .................................................................................. xxiv
How to Use the Course Materials.................................................. xxv
How to Use the Icons.................................................................... xxvii
Typographical Conventions and Symbols ............................... xxviii
Ultra Enterprise 10000 Capabilities and Features................................1-1
Course Map........................................................................................ 1-1
Relevance............................................................................................ 1-2
Objectives ........................................................................................... 1-3
Ultra Enterprise 10000 ...................................................................... 1-4
Ultra Enterprise 10000 Features ...................................................... 1-6
Ultra Enterprise 10000 System Cabinet ......................................... 1-8
Installing AP ............................................................................1-13
Limitations for AP 2.0 (Solaris 2.5.1) ....................................1-13
Dynamic Reconfiguration.............................................................. 1-14
Operating System Support ............................................................ 1-17
Solaris Binary Compatibility .................................................1-18
SSP Operating System Levels................................................1-18
Operating System Enhancements................................................. 1-19
The SSP ............................................................................................. 1-21
SSP Logical Connectivity ............................................................... 1-24
The SSP User Environment............................................................ 1-25
System Accounts .....................................................................1-25
SSP Window ............................................................................1-25
Network Console Window....................................................1-26
Hostview ..................................................................................1-27

Copyright 1998 Sun Microsystems, Inc. All Rights Reserved. SunService June 1998
Hardware Configuration Control................................................. 1-29
Blacklist.....................................................................................1-29
Figure of Merit.........................................................................1-30
Diagnostic and Monitoring Tools................................................. 1-31
Bringup .....................................................................................1-31
Power On Self Test..................................................................1-32
OpenBoot PROM.....................................................................1-32
Status Monitoring and Display ..................................................... 1-33
SSP.............................................................................................1-33
SunVTS .....................................................................................1-33
redx...........................................................................................1-33
Resiliency Features ......................................................................... 1-34
DC Power .................................................................................1-34
System Boards .........................................................................1-35
Processors.................................................................................1-35
Memory ....................................................................................1-35
I/O Interface Subsystem ........................................................1-35
Redundant Components ................................................................ 1-36
Concurrent Serviceability .............................................................. 1-37
Error Logging .................................................................................. 1-39
Check Your Progress ...................................................................... 1-40
Think Beyond .................................................................................. 1-41
Architecture Overview..............................................................................2-1
Course Map........................................................................................ 2-1
Relevance............................................................................................ 2-2
Objectives ........................................................................................... 2-3
Enterprise 10000 Packaging............................................................. 2-4
Enterprise 10000 Component List................................................... 2-6
Data Interconnects .......................................................................... 2-10
Data Paths ................................................................................2-10
Address Paths..........................................................................2-11
Centerplane Configurability.......................................................... 2-13
The System Board ........................................................................... 2-15
Logical View ............................................................................2-16
Physical View (SBus I/O) ......................................................2-17
Mezzanine (Daughter) Board Packaging..................................... 2-18
SBus Mezzanine Packaging ........................................................... 2-20
PCI Mezzanine Packaging ............................................................. 2-21
Memory Subsystem ........................................................................ 2-22
I/O Subsystem ................................................................................ 2-24
Ultra Port Architecture...........................................................2-24
Logical View ............................................................................2-25
JTAG.................................................................................................. 2-26
Support Boards................................................................................ 2-27
Control Board .................................................................................. 2-29

Enterprise 10000 Client-Server Architecture............................... 2-31
System Failure Isolation Capabilities........................................... 2-32
Check Your Progress ...................................................................... 2-34
Think Beyond .................................................................................. 2-35
SSP Software Installation.........................................................................3-1
Course Map........................................................................................ 3-1
Relevance............................................................................................ 3-2
Objectives ........................................................................................... 3-3
Enterprise 10000 Network Planning .............................................. 3-4
Enterprise 10000 Network Configurations.................................... 3-5
Control Board Configurations......................................................... 3-6
Domain Network Configurations................................................... 3-7
SSP Privacy ................................................................................3-7
Sample hosts File ............................................................................. 3-8
Enterprise 10000 Network Planning Worksheet .......................... 3-9
The SSP Software ............................................................................ 3-10
SSP Accounts ...........................................................................3-11
The SSP Packages ............................................................................ 3-12
SSP Software Environment Variables .......................................... 3-14
Saving the SSP Configuration Files .............................................. 3-16
Installing and Configuring the SSP Solaris Software ................ 3-18
Installing the xntp Software.......................................................... 3-22
Preparing the System Files ............................................................ 3-24
Installing the SSP Software Packages........................................... 3-26
Configuring the SSP Environment ............................................... 3-28
Responding to the Questions ................................................3-29
Connecting to the Enterprise 10000 Host System ...................... 3-31
Reconfiguring the SSP .................................................................... 3-32
Changing the SSP Type .................................................................. 3-33
Switching to Spare ..................................................................3-33
Switching to Main ...................................................................3-34
/etc/inittab.........................................................................3-34
Dual Control Boards....................................................................... 3-35
Control Board Executive (cbe)..............................................3-36
The Control Board Configuration File ......................................... 3-37
Switching the Active Control Board ............................................ 3-39
Determining the Active Control Board................................3-40
Control Board Executive Image and Port Specification Files ... 3-41
Changing the Control Board Configuration ............................... 3-42
Lab..................................................................................................... 3-43
Check Your Progress ...................................................................... 3-44
Think Beyond .................................................................................. 3-45
System Operation.......................................................................................4-1
Course Map........................................................................................ 4-1
Relevance............................................................................................ 4-2

Objectives ........................................................................................... 4-3
Security Considerations ................................................................... 4-4
Introduction ...............................................................................4-4
General Comments on Security ..............................................4-4
Physical Security .......................................................................4-5
System Security .........................................................................4-6
Network Security ......................................................................4-8
SSP and Control Board Software Block Diagram......................... 4-9
Instances of Client Programs and Daemons ............................... 4-10
Platform Clients.......................................................................4-11
Domain Clients........................................................................4-11
SSP Platform Client Reference ...................................................... 4-12
SSP Domain Client Reference........................................................ 4-13
SSP Daemon Summary .................................................................. 4-14
SSP Daemons ................................................................................... 4-15
Control Board Server (cbs)....................................................4-15
The cb_reset Command ......................................................4-16
The cb_prom Command.........................................................4-16
Event Detector Daemon (edd)...............................................4-17
Event Detector Daemon (edd) Event Handling..................4-19
Event Detector Daemon (edd) Control ................................4-20
File Access Daemon (fad)......................................................4-21
Network Time Protocol Daemon (xntpd) ...........................4-22
The SNMP Daemon (snmpd) .................................................4-23
SNMP Trap Sink Server (straps) ........................................4-25
machine_server.....................................................................4-26
Domain Support Executables ........................................................ 4-27
System Operation............................................................................ 4-28
The hostinfo Command .............................................................. 4-29
hostview.......................................................................................... 4-30
hostview Performance Considerations....................................... 4-32
hostview Main Window ............................................................... 4-33
Main Window Processor Symbols................................................ 4-35
Selecting Items in the Main Window ........................................... 4-37
Help Window .................................................................................. 4-38
Main Window Buttons ................................................................... 4-39
The Failure Window....................................................................... 4-40
SSP Log Files.................................................................................... 4-41
Viewing a Messages File With hostview.................................... 4-42
Administering Power ..................................................................... 4-43
The power Command ..................................................................... 4-44
Examples ..................................................................................4-44
Automatic Recovery From a Power Outage .......................4-45
Power Control From Hostview..................................................... 4-46
Monitoring Power Levels in Hostview........................................ 4-47
Monitoring Temperature in Hostview......................................... 4-49

The fan Command ......................................................................... 4-51
Syntax .......................................................................................4-51
Usage.........................................................................................4-51
Controlling Fans From Hostview ................................................. 4-53
Monitoring Fans in Hostview ....................................................... 4-54
Lab..................................................................................................... 4-56
Check Your Progress ...................................................................... 4-58
Think Beyond .................................................................................. 4-59
Domains.......................................................................................................5-1
Course Map........................................................................................ 5-1
Relevance............................................................................................ 5-2
Objectives ........................................................................................... 5-3
Introduction ....................................................................................... 5-4
Inter-Domain Networking.......................................................5-6
Domain Configuration Requirements ........................................... 5-7
Domain Planning .............................................................................. 5-9
The eeprom.image Files ................................................................ 5-10
Creating eeprom.image Files........................................................ 5-12
hostid Information ........................................................................ 5-16
Obtaining Domain Status From the Command Line................. 5-17
domain_status.......................................................................5-17
domain_history.....................................................................5-18
SSP Domain Messages Files ..................................................5-18
Obtaining Domain Status From hostview ................................. 5-19
Switching Domains......................................................................... 5-20
domain_switch.......................................................................5-20
Specifying the Domain for an SSP Window........................5-21
Creating Domains From the Command Line ............................. 5-22
Creating Domains From hostview.............................................. 5-24
Removing Domains From the Command Line .......................... 5-26
Removing Domains From hostview........................................... 5-28
Renaming Domains From the Command Line........................... 5-29
Renaming Domains From hostview ........................................... 5-31
Creating a netcon Window for a Domain .................................. 5-32
Bringing Up a Domain From the Command Line ..................... 5-34
The bringup Command................................................................. 5-35
Configuring the Centerplane ................................................5-36
Bringing Up a Domain From hostview ...................................... 5-37
Overview of netcon ....................................................................... 5-38
Using netcontool.......................................................................... 5-40
netcon Session Types..................................................................... 5-42
netcontool Window Configuration ........................................... 5-44
netcontool Buttons....................................................................... 5-45
Blacklisting Components ............................................................... 5-46
Using the blacklist...................................................................... 5-48

Blacklisting Boards and Buses With hostview .......................... 5-49
Blacklisting Processors With hostview....................................... 5-50
Clearing the Blacklist File .............................................................. 5-51
Processor Sets .................................................................................. 5-52
Lab..................................................................................................... 5-54
Check Your Progress ...................................................................... 5-55
Think Beyond .................................................................................. 5-56
Installing Solaris in a Host Domain.......................................................6-1
Course Map........................................................................................ 6-1
Relevance............................................................................................ 6-2
Objectives ........................................................................................... 6-3
The Enterprise 10000 Solaris Environment ................................... 6-4
Configuring the SSP as a Boot Server ............................................ 6-6
Preparing the Domain ...................................................................... 6-8
Installing Solaris.............................................................................. 6-11
Booting the Domain for the First Time ........................................ 6-14
Installing Packages From the 2.6 SMCC Server Supplement CD-ROM ........................................ 6-16
Installing Packages From the 2.5.1 SMCC Hardware Updates CD-ROM ........................................ 6-18
Finishing the Installation - Solaris 2.6 .......................................... 6-20
Finishing the Installation - Solaris 2.5.1 ....................................... 6-21
Preinstalled Domain Software ...................................................... 6-22
Lab..................................................................................................... 6-23
Check Your Progress ...................................................................... 6-24
Think Beyond .................................................................................. 6-25
System Boot Process ..................................................................................7-1
Relevance............................................................................................ 7-2
Objectives ........................................................................................... 7-3
The SSP Boot Process........................................................................ 7-4
Prepare the SSP..........................................................................7-4
SSP Boot Process........................................................................7-4
Daemon Start Up.......................................................................7-5
The ssp_startup Script .................................................................. 7-6
Restartable Daemons ................................................................7-7
Domain Bringup Flow...................................................................... 7-8
The bringup Command................................................................... 7-9
Syntax .......................................................................................7-10
Execution ..................................................................................7-11
The hpost Command ..................................................................... 7-12
Syntax .......................................................................................7-12
Functions ..................................................................................7-14
hpost Control Files......................................................................... 7-15
.postrc ....................................................................................7-15

blacklist and redlist Files ..................................................... 7-17
blacklist................................................................................7-17
redlist....................................................................................7-18
The obp_helper Command .......................................................... 7-19
Syntax .......................................................................................7-20
Function....................................................................................7-20
Restarting obp_helper ..........................................................7-20
The download_helper Command ............................................... 7-21
Function....................................................................................7-22
Other Boot-Time Software ............................................................. 7-23
netcon_server.......................................................................7-23
Solaris........................................................................................7-24
Console Communication Paths..................................................... 7-25
The OpenBoot PROM..................................................................... 7-26
OpenBoot PROM Functions ..................................................7-26
obp..................................................................................................... 7-28
eeprom.image................................................................................. 7-29
Managing the eeprom.image Files .............................................. 7-31
OBP Environment Variables Specific to the Enterprise 10000 .............................................................................. 7-33
Boot Time Parameter ..............................................................7-33
Reset Handling ........................................................................7-34
The OBP Device Tree...................................................................... 7-36
Decoding an Interface Card Location ..................................7-36
Decoding a PCI Slot Location................................................7-37
Decoding a Processor Location .............................................7-38
Lab..................................................................................................... 7-40
Check Your Progress ...................................................................... 7-43
Think Beyond .................................................................................. 7-44
Alternate Pathing .......................................................................................8-1
Course Map........................................................................................ 8-1
Relevance............................................................................................ 8-2
Objectives ........................................................................................... 8-3
AP Concepts....................................................................................... 8-4
AP Implementation........................................................................... 8-5
AP Requirements .............................................................................. 8-9
Supported Devices .......................................................................... 8-10
Disk Devices ............................................................................8-10
Network Devices.....................................................................8-11
Installing AP .................................................................................... 8-12
Solaris 2.6..................................................................................8-12
Solaris 2.5.1...............................................................................8-13
Installing AP (Both Releases) ................................................8-13

Basic Alternate Pathing Concepts................................................. 8-14
Physical Paths ..........................................................................8-14
Meta-Disk .................................................................................8-15
Disk Pathgroup ............................................................................... 8-16
Meta-Network ................................................................................. 8-18
Network Pathgroup........................................................................ 8-19
Sample AP Configurations ............................................................ 8-20
AP With Mirroring.......................................................................... 8-21
Device Paths.............................................................................8-22
The AP State Database ................................................................... 8-23
AP Database Configuration Considerations............................... 8-24
Creating the AP Database.............................................................. 8-26
The apdb Command ...............................................................8-26
Refreshing the Databases.......................................................8-27
AP Databases on Alternate Pathed Disks.................................... 8-28
Viewing AP Database Status ......................................................... 8-29
The apconfig Command ......................................................8-29
Deleting a Copy of the AP Database ............................................ 8-31
Viewing Pathgroup Information .................................................. 8-32
Viewing Network Entries .............................................................. 8-33
Uncommitted Network Entries.............................................8-33
Committed Network Entries .................................................8-34
Planning Network Pathgroups and Meta-devices..................... 8-35
Meta-Network Interfaces .......................................................8-36
FDDI Devices ...........................................................................8-37
Creating a Network Pathgroup .................................................... 8-38
Activating the Meta-Device...................................................8-39
FDDI Setup Considerations........................................................... 8-41
Contacting the IEEE................................................................8-42
Switching a Network Pathgroup .................................................. 8-43
Deleting a Network Pathgroup..................................................... 8-45
Reversing an Uncommitted Delete.......................................8-46
Alternately Pathing the Primary Network Interface ................. 8-47
Boot Time Interface Failure ...................................................8-50
Viewing Disk Entries...................................................................... 8-51
Uncommitted Disk Entries ....................................................8-51
Committed Disk Entries.........................................................8-52
Disk Path Components................................................................... 8-53
Planning a Disk Pathgroup and Meta-disks ............................... 8-54
Meta-disk Configuration Example ............................................... 8-55
Creating a Disk Pathgroup and Meta-disks................................ 8-60
Using the Meta-devices .................................................................. 8-63
Disk Managers and AP .................................................................. 8-65
Using Volume Manager With AP.........................................8-65
Disabling DMP ........................................................................8-66
Using SDS With AP ................................................................8-66

Manually Switching the Active Path............................................ 8-67
Switching Back to the Primary Path............................................. 8-69
Automatic Disk Pathgroup Switching (AP 2.1).......................... 8-70
Deleting a Disk Pathgroup ............................................................ 8-72
Reversing an Uncommitted Delete.......................................8-73
AP and the Boot Disk ..................................................................... 8-74
Placing the Boot Disk Under AP Control .................................... 8-75
Removing AP Support From the Boot Disk ................................ 8-77
Using apboot With a Mirrored Boot Disk (Solaris 2.6 Only) ............. 8-78
Telling AP About the Mirror .................................................8-79
Removing the Mirror Information........................................8-79
The AP Recovery Boot Sequence .................................................. 8-80
Using AP in Single-User Mode ..................................................... 8-82
Lab..................................................................................................... 8-84
Check Your Progress ...................................................................... 8-87
Think Beyond .................................................................................. 8-88
Dynamic Reconfiguration ........................................................................9-1
Relevance............................................................................................ 9-2
Objectives ........................................................................................... 9-3
Dynamic Reconfiguration Capabilities.......................................... 9-4
When to Use DR ........................................................................9-5
dr-max-mem (Solaris 2.6) .................................................................. 9-7
dr-max-mem (Solaris 2.5.1) ............................................................... 9-8
Considerations.........................................................................9-10
DR Attach ......................................................................................... 9-14
Requirements...........................................................................9-14
Operation .................................................................................9-16
Attaching a Board With dr ............................................................ 9-18
Aborting the Attach Operation .............................................9-21
System Failures........................................................................9-21
Attaching a System Board With hostview ................................. 9-22
hostview Attach Buttons ......................................................9-22
I/O Device Reconfiguration After a DR Operation................... 9-26
Disk Devices ............................................................................9-27
Viewing System Information ........................................................ 9-28
Processor Configuration Information ..................................9-31
Memory Configuration Information ....................................9-32
Device Configuration Information .......................................9-35
Device Configuration Detail..................................................9-36
OBP Configuration Information ...........................................9-37
Viewing Suspend-Unsafe Devices........................................9-38
DR Detach ........................................................................................ 9-39
Requirements...........................................................................9-39
drain ........................................................................................9-41

complete_detach ..................................................................9-42
reconfig..................................................................................9-42
Finishing the Complete Detach Operation.................................. 9-43
Configuring for DR Detach ........................................................... 9-45
Enabling DR Detach ...............................................................9-45
I/O Devices..............................................................................9-46
Detaching Network Devices.......................................................... 9-48
FDDI..........................................................................................9-48
Causes for DR Failure.............................................................9-49
Detaching Non-Network Devices................................................. 9-50
DR Detach-Safe Devices................................................................. 9-52
Declaring a Driver Detach-Safe.............................................9-53
Unloading a Loaded Detach-Unsafe Driver ............................... 9-54
Using modunload ....................................................................9-55
Correctable Errors ...................................................................9-60
Detaching a Board With dr............................................................ 9-61
Aborting the Detach Operation ............................................9-63
Detaching a Board With hostview .............................................. 9-64
Beginning the Detach .............................................................9-64
hostview Detach Buttons......................................................9-67
Pageable and Permanent Memory ............................................... 9-69
Operation: Permanent Memory on the Target Board........9-71
Operating System Quiesce............................................................. 9-73
Operating System Quiesce Failures ............................................. 9-75
Suspend-Safe and Suspend-Unsafe Devices ............................... 9-77
Tape Devices ............................................................................9-78
Adding New Suspend-Safe Drivers ............................................. 9-79
Adding New Suspend-Bypass Drivers........................................ 9-81
Quiesce Operation .......................................................................... 9-83
DR and AP Interaction ................................................................... 9-86
DR Attach .................................................................................9-86
DR Detach ................................................................................9-87
Lab..................................................................................................... 9-88
Part 1: Using hostview..........................................................9-88
Part 2: Using the Command Line .........................................9-93
Check Your Progress ...................................................................... 9-94
Think Beyond .................................................................................. 9-95
Diagnostic Information...........................................................................10-1
Course Map...................................................................................... 10-1
Relevance.......................................................................................... 10-2
Objectives ......................................................................................... 10-3
Standard Domain Message Logs .................................................. 10-4
Bus Configurations and the Figure of Merit ............................... 10-5
Sample FOM Calculation............................................................... 10-9
Redlist and Blacklist Files ............................................................ 10-11

The autoconfig Command ........................................................ 10-13
Diagnostic Tools............................................................................ 10-15
hpost ......................................................................................10-15
SunVTS ...................................................................................10-16
prtdiag..................................................................................10-17
Correctable Memory Errors......................................................... 10-20
Enabling Reporting...............................................................10-21
System Failures.............................................................................. 10-22
Reboot Request.............................................................................. 10-23
Panic................................................................................................ 10-24
Considerations.......................................................................10-25
Watchdog, Redmode, and XIR Resets ....................................... 10-26
Saving a Panic Dump ...........................................................10-27
Heartbeat Failure (Hung Host) ................................................... 10-28
Manual Intervention With a Hung Host ...........................10-29
Arbstop ........................................................................................... 10-31
Creating a Hardware State Dump File ...................................... 10-33
redx................................................................................................. 10-35
Starting redx..........................................................................10-36
Technical Information for Escalation ......................................... 10-37
General Information Needed ..............................................10-37
Problem-Specific Information Needed ..............................10-38
Lab................................................................................................... 10-39
Check Your Progress .................................................................... 10-40
Think Beyond ................................................................................ 10-41
Configuring NTP ......................................................................................A-1
SSP and Domain Time Synchronization....................................... A-2
NTP Files ...................................................................................A-2
NTP Server Strata............................................................................. A-3
Synchronization Sources................................................................. A-4
Configuring Your Server or Client ................................................ A-5
Configuration Guidelines ............................................................... A-7
NTP Query Programs...................................................................... A-8
Inspecting Your Configuration ...................................................... A-9
OBP Device Aliases .................................................................................. B-1
OBP Device Aliases.......................................................................... B-2
Deleting Device Aliases........................................................... B-2
Creating a SCSI Disk Alias ............................................................. B-3
Creating a Storage Array Disk Alias ............................................. B-5
Creating a Network Device Alias .................................................. B-7
Glossary ......................................................................................... Glossary-1

About This Course

Course Goal
This course is designed to introduce students to the Ultra™ Enterprise™
10000 system. It explains the capabilities and configuration of the
system; shows how to load the software; discusses the operation and
management of the system and the configuration and use of its special
capabilities; and shows how to troubleshoot failures.

The course is intended for experienced system administrators and
hardware specialists with a thorough background in Sun™ SPARC™
system administration or Sun hardware maintenance.

Course Map

Each module begins with a course map that enables you to see what
you have accomplished and where you are going in reference to the
course goal. A complete map of this course is shown below.

Module-by-Module Overview

● Module 1 – Ultra Enterprise 10000 Capabilities and Features

This overview introduces the Enterprise 10000. It describes the
capabilities, features, terminology, and components of the
Enterprise 10000 system. All of the material is covered in detail in
later modules.

● Module 2 – Architecture Overview

This module provides a high-level overview of the Enterprise
10000 architecture, packaging, and hardware operation.

● Module 3 – SSP Software Installation

This module teaches you how to plan for and install the System
Service Processor software, and perform basic SSP administration
and setup tasks.


● Module 4 – System Operation

This module describes how to perform system management and
operational tasks for the Enterprise 10000 and its domains, both
from the command line and from Hostview.

● Module 5 – Domains

This module explains the concept of domains, and how to create,
destroy, manage, and configure them.

● Module 6 – Installing Solaris in a Host Domain

This module shows how to install Solaris in an Enterprise 10000
domain.

● Module 7 – System Boot Process

This module describes how the SSP and the Enterprise 10000 and
its domains boot. It discusses all the Enterprise 10000-specific
commands, daemons, and configuration files used in the boot
process.

● Module 8 – Alternate Pathing

This module explains the capabilities of the Alternate Pathing (AP)
feature, and how to plan, configure, and manage alternate paths
for both disks and network interfaces.

● Module 9 – Dynamic Reconfiguration

This module discusses the operation, restrictions, and
configuration requirements for Dynamic Reconfiguration (DR). It
covers the procedures for both attaches and detaches of system
boards to and from domains.

● Module 10 – Diagnostic Information

This module discusses the various problems that may occur in an
Enterprise 10000 system, what diagnostic tools and information
are available, and when and how to use them.


Appendices
● Appendix A – Configuring NTP

This appendix gives a more detailed discussion of configuring the
Network Time Protocol (NTP).

● Appendix B – OBP Device Aliases

This appendix discusses how to create devalias names in the
OpenBoot PROM.

Course Objectives

Upon completion of this course, you will be able to:

● Describe the differences between the Enterprise 10000 and other
Sun server systems.

● Describe the advanced features of the Enterprise 10000 system.

● Install and configure the SSP (System Service Processor).

● Obtain status and configuration data about the Enterprise 10000
from the command line and by using Hostview.

● Control the Enterprise 10000 system from the command line and
from Hostview.

● Load Solaris into an Enterprise 10000 domain.


● Describe the boot flow of an Enterprise 10000 domain.

● Configure, test, and boot an Enterprise 10000 domain.

● Describe the system configuration requirements for Dynamic
Reconfiguration and Alternate Pathing.

● Perform initial troubleshooting and error condition identification
for the Enterprise 10000 and its domains.

● Perform Dynamic Reconfiguration operations from the command
line and from Hostview, and understand their restrictions and
constraints.

● Perform disk and network Alternate Pathing operations and
understand their restrictions and constraints.

Topics Not Covered

This course does not cover the topics listed below.
Many of these topics are covered in other courses offered by Sun
Educational Services (SES). Refer to the Sun Educational Services
catalog for specific information and registration.

● Network administration – SA-380: Solaris 2.x Network
Administration

● General Solaris problem resolution – ST-350: Sun Systems Fault
Analysis Workshop

● Data center operations – RS-350: Operating Client-Server Systems

● System tuning – SP-280: Solaris 2.x Concepts and Tuning

● Disk storage management – SA-345: Volume Manager With a
SPARCstorage Array

Course Prerequisites

To be able to succeed in this course, you must have the prerequisite
training and experience shown in the overhead image above.

If you do not, you may be unable to complete the lab exercises or
perform the steps required to successfully install, support, and operate
the complex Enterprise 10000 environment.

Introductions

Now that you have been introduced to the course, introduce yourselves
to each other and to the instructor, addressing the items shown on the
above overhead.

How to Use the Course Materials

To enable you to succeed in this course, these course materials employ
a learning model that is composed of the following components:

● Course Map – Each module starts with an overview of the content
so you can see how the module fits into your overall course goal.

● Relevance – The relevance section for each module provides
scenarios or questions that introduce you to the information
contained in the module and provoke you to think about how the
module content relates to your interest in installing, configuring,
and operating the Enterprise 10000 environment.

● Additional Resources – Where you can look for more detailed
information, or for information on configurations, options, or other
capabilities, on the topics covered in the module or appendix.

● Overhead Image – Reduced overhead images for the course are
included in the course materials to help you easily follow where
the instructor is at any point in time. Overheads do not appear on
every page.


● Lecture – The instructor will present information specific to the
topic of the module. This information will help you learn the
knowledge and skills necessary to succeed with the exercises.

● Exercise – Lab exercises will give you the opportunity to practice
your skills and apply the concepts presented in the lecture. The
procedures presented in the lecture should help you in completing
the lab exercises.

● Check Your Progress – Module objectives are restated, sometimes
in question format, so that before moving on to the next module
you are sure that you can accomplish the objectives of the current
module.

● Think Beyond – Thought-provoking questions are posed to help
you apply the content of the module or predict the content in the
next module.

How to Use the Icons

The following icons are used in this course to represent various
training elements and alternative learning resources:

For Discussion – Indicates a small-group or class discussion on the
current topic is recommended at this time.

Additional Resources – Indicates additional reference materials are
available.

Caution – A potential hazard to data or machinery.

Warning – Anything that poses personal danger or irreversible
damage to data or the operating system.

Typographical Conventions and Symbols

The following table describes the typographical conventions and
symbols used in this course:

Typeface or
Symbol       Meaning                        Example

AaBbCc123    The names of commands,         Edit your .login file.
             files, and directories;        Use ls -a to list all files.
             on-screen computer output.     system% You have mail.

AaBbCc123    What you type, contrasted      system% su
             with on-screen computer        Password:
             output.

AaBbCc123    Command-line placeholder;      To delete a file, type rm
             replace with a real name       filename.
             or value.

AaBbCc123    Book titles, new words or      Read Chapter 6 in User’s Guide.
             terms, or words to be          These are called class options.
             emphasized.                    You must be root to do this.

Module 1 – Ultra Enterprise 10000 Capabilities and Features

Course Map
This module discusses the capabilities and features of the Sun Ultra
Enterprise 10000 system. It discusses the system hardware and
software components and describes the packaging of the system,
system configuration, and some of the Enterprise 10000 system’s
special features such as Alternate Pathing and Dynamic
Reconfiguration.


Relevance

For Discussion – The following questions are relevant to
understanding the content of this module:

1. What makes the Enterprise 10000 so different from other systems?

2. What special features does the Enterprise 10000 provide?

3. What are the capabilities of the Enterprise 10000?


Objectives

Upon completion of this module, you will be able to:

● Explain the capabilities and features of the Enterprise 10000
system.

● List the hardware and software components of the Enterprise
10000 system.

● Describe the Enterprise 10000 system packaging.

● Explain the special features of the Enterprise 10000 system, such as
domains, Alternate Pathing, and Dynamic Reconfiguration.

● Describe the concepts of system operation.

References

Additional resources – The following references can provide
additional details on the topics discussed in this module:

● Sun Enterprise 10000 System Hardware Installation and De-Installation
Guide

● Sun Enterprise 10000 System Overview Manual

● Ultra Enterprise 10000 SSP 3.1 User’s Guide


Ultra Enterprise 10000

The Ultra Enterprise 10000 system is a SPARC™/Solaris™ (UNIX®
System V Release 4) scalable symmetric multiprocessing (SMP)
computer system. It is an ideal general-purpose application and data
server for host-based or client/server applications such as online
transaction processing (OLTP), decision support systems (DSS), data
warehousing, communications services, or multimedia services.

The Enterprise 10000 system provides the following capabilities:

● Solaris compatibility: Solaris 2.5.1 HW 4/97 or later, or Solaris 2.6
HW 5/98 or later.

● Internal bus clock speed of 83.3 MHz, with UltraSPARC (SPARC V9)
processors running at a clock frequency of 250 or 333 MHz.

● Fast processing of up to 20 GFLOPS.

● More flexibility and better reliability, availability, and
serviceability (RAS) than comparable systems.


● Error-correcting interconnect: data and address buses are protected
by a combination of error correcting codes (ECC) and parity.

● Significant input/output (I/O) flexibility: support for up to 32
independent SBuses and 64 SBus slots, or 32 independent PCI
(Peripheral Component Interconnect) slots, or any mixture totaling
32 buses. Each system board supports a pair of buses of the same type.

● High I/O bandwidth: up to 3.2 Gbytes per second aggregate I/O
bus bandwidth. The Enterprise 10000 system’s individual buses do
64-bit transfers, yielding a data transfer rate of 100 Mbytes per
second per bus.

● Alternate Pathing support provides I/O flexibility by supporting
dual paths to disk devices or networks.

● No single points of hardware failure: no single component
prevents a properly configured Enterprise 10000 system from
automatically reconfiguring itself to resume execution after a
failure. This is achieved through a combination of redundancy and
alternate pathing architecture.

● System domains: groups of system boards can be arranged in up
to eight multiprocessor domains that can run independent copies
of Solaris concurrently. Each domain is completely isolated from
hardware or software errors that may occur in another domain.

● Dynamic Reconfiguration: enables the system administrator to
add, remove, or replace many system components online without
disturbing production usage.

● Hot swapping: power supplies, fans, and most board-level system
components can be exchanged while “hot”; that is, while the
system is active.

● Scalable configurations: memory and I/O slots can be added to the
Enterprise 10000 system while it is running, without forcing the
removal of any processors.

● Service and maintenance process flexibility: the System Service
Processor (SSP) connects to the Enterprise 10000 system via
standard Ethernet, permitting system administration from a
remote location. The SSP reports status information using the
Simple Network Management Protocol (SNMP).
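
The I/O figures quoted in the list above are consistent with one
another, as the short Python sketch below checks. The variable names
are illustrative only. The board count is not stated in the list; it is
derived here from 32 buses at two buses per system board.

```python
# Sanity check of the I/O figures quoted in the capability list.
# All names are illustrative; the input numbers come from the text.

MAX_IO_BUSES = 32             # up to 32 independent SBuses or PCI buses
BUSES_PER_BOARD = 2           # each system board supports a pair of buses
MAX_SBUS_SLOTS = 64           # up to 64 SBus slots
MBYTES_PER_SEC_PER_BUS = 100  # quoted per-bus data transfer rate

# Derived values (these follow from the figures above).
boards = MAX_IO_BUSES // BUSES_PER_BOARD
sbus_slots_per_bus = MAX_SBUS_SLOTS // MAX_IO_BUSES
aggregate_gbytes_per_sec = MAX_IO_BUSES * MBYTES_PER_SEC_PER_BUS / 1000

print("System boards implied:  ", boards)
print("SBus slots per SBus:    ", sbus_slots_per_bus)
print("Aggregate I/O bandwidth:", aggregate_gbytes_per_sec, "Gbytes/s")
```

The aggregate figure is simply the per-bus rate multiplied by the
maximum number of buses, which matches the 3.2 Gbytes per second
quoted above.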


Ultra Enterprise 10000 Features

The Enterprise 10000 provides features unique in the SMCC Ultra
Enterprise product line.

● System Service Processor (SSP). The SSP is used to monitor and
control the Enterprise 10000 server and to execute system
administration functions, such as shutting down, booting, and
establishing domains. The SSP monitors key conditions, such as air
inlet and processor temperatures, and issues warnings as appropriate.
It also lets you view and control power to the processor and I/O
cabinets, logs errors, and automatically reboots the host after
certain types of failures.

● Network Virtual Console (netcon). Virtually any SPARC terminal
in the network can open a host console session, which can read
output from and write input to the host console. Multiple host
console sessions can be open simultaneously, but only one at a
time can have permission to write to the host.


● Hostview Graphical User Interface (GUI). Hostview simplifies
monitoring and control of the Enterprise 10000 system host by
providing a series of easy-to-follow menu-driven screens. You can
access Hostview from the SSP or almost any workstation logged
in to the SSP environment.

● POST (Power On Self Test). POST on the Enterprise 10000 system
supports system-level tests, error reporting, and limited, isolated
testing of boards.

● Blacklisting. The Enterprise 10000 system lets you place the
identities of specific domain hardware components into a file to
prevent them from being used when the domain is being
configured during the boot process.
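
Conceptually, blacklisting is a filter applied to the hardware
inventory before a domain is configured. The Python sketch below
illustrates only that idea. The component names and the
one-name-per-line list are hypothetical and do not reflect the actual
blacklist file syntax used by the SSP software.

```python
def usable_components(inventory, blacklist_lines):
    """Return the components that may be configured into the domain.

    inventory       -- list of component names found in the domain
    blacklist_lines -- lines of a (hypothetical) blacklist, one name per
                       line; blank lines and '#' comments are ignored
    """
    blacklisted = set()
    for line in blacklist_lines:
        name = line.split("#", 1)[0].strip()  # drop comments and padding
        if name:
            blacklisted.add(name)
    # Preserve the inventory order; drop anything that is blacklisted.
    return [c for c in inventory if c not in blacklisted]


# Hypothetical component names, for illustration only.
inventory = ["board0.proc0", "board0.proc1", "board0.mem", "board1.proc0"]
blacklist = ["# suspected bad processor", "board0.proc1", ""]

print(usable_components(inventory, blacklist))
```

In this sketch the blacklisted processor is left out of the resulting
configuration, while the rest of its board remains available.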


Ultra Enterprise 10000 System Cabinet

Figure: Ultra Enterprise 10000 system cabinet, showing the System
Service Processor (SSP), access panel, styling panel, and processor
cabinet (includes disk trays). The I/O expansion cabinet is not shown.

The Enterprise 10000 cabinet is 1.8 m (70") high, 1 m (39") wide, and
1.3 m (50") deep. Fully configured, it weighs 638 kg (1400 pounds) and
draws 13.6 kVA of power.


The Enterprise 10000 system consists of a processor cabinet,
optional I/O cabinets, and an SSP.

The processor cabinet contains the main system components:

● System boards

● Centerplane

● Control boards

● A 48-volt power subsystem

● An alternating current (AC) sequencer subsystem

● Cooling subsystem

The system boards house the processors, I/O interface modules, SBus
cards, and system memory.

Additionally, an area is reserved in the processor cabinet for three
peripheral trays, which can only be removable storage module (RSM)
trays.

The I/O cabinets hold additional peripheral devices; they contain
peripheral trays and AC sequencer subsystems. They are standard Sun
data center peripheral enclosure cabinets.

System Domains

A domain is a set of one or more system boards that the SSP groups
together logically so that they appear as a stand-alone SPARC server
system.

Each domain must be configured with a disk to boot from, a network
connection, and sufficient memory and disk space to function
properly. A domain’s size is limited only by the number of system
boards it is configured with.

Some properties of domains are:

● Each domain can contain multiple boards.

● Each domain has its own independent copy of Solaris.

● Each domain has its own I/O devices.

● There is complete failure isolation between domains.

● A system board can only be in one domain at a time.

● All of a system board’s components belong to the domain and
may not be used by other domains.

● You may have from one to eight domains.

System boards in the same system domain:

● Have a common physical address space.

● Execute the same copy of the Solaris operating system.

● Can see all components on all system boards in the domain.
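
The ownership rules listed above can be illustrated with a small sketch. This is only a toy model: the class and method names are hypothetical and are not part of the SSP software. The limit of 16 boards comes from the packaging description in Module 2.

```python
MAX_BOARDS = 16   # system boards per Enterprise 10000
MAX_DOMAINS = 8   # "You may have from one to eight domains"


class Platform:
    """Toy model of board-to-domain assignment (hypothetical, not the SSP's data model)."""

    def __init__(self):
        self.owner = {}  # board number -> domain name

    def domains(self):
        """Return the set of domain names that currently own a board."""
        return set(self.owner.values())

    def assign(self, board, domain):
        """Give a free board to a domain, enforcing the rules in the text."""
        if not 0 <= board < MAX_BOARDS:
            raise ValueError(f"no such system board: {board}")
        if board in self.owner:
            # A system board can only be in one domain at a time.
            raise ValueError(f"board {board} already belongs to {self.owner[board]}")
        if domain not in self.domains() and len(self.domains()) >= MAX_DOMAINS:
            raise ValueError("at most eight domains are allowed")
        self.owner[board] = domain

    def boards_in(self, domain):
        """Return the sorted board numbers that make up a domain."""
        return sorted(b for b, d in self.owner.items() if d == domain)


platform = Platform()
platform.assign(0, "domain_a")
platform.assign(1, "domain_a")
platform.assign(2, "domain_b")
print(platform.boards_in("domain_a"))
```

Trying to assign board 0 to `domain_b` at this point raises an error, which mirrors the rule that a board and all of its components belong to exactly one domain.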

Alternate Pathing

Alternate Pathing (AP) provides a domain with the ability to have two
paths to the same I/O device, providing redundancy at the level of the
I/O controller and cabling.

Most network devices, including Ethernet and Fiber Distributed Data
Interface (FDDI) interfaces, are supported. The StorEdge™
A5000 and SPARCstorage™ Arrays are the supported disk devices.

The implementation of AP is similar to that of disk management
software like Sun Enterprise Volume Manager™ and Solstice™
DiskSuite™. AP creates a new level of device drivers (for meta-disks
and meta-networks), which then access one of the two physical device
paths.

The disk management software runs on top of these meta-devices. You
may use AP with the current versions of both Enterprise Volume
Manager and Solstice DiskSuite.

Application programs use the meta-device name, so they are not aware
that AP is present. Because the "active" path can be manually
switched with no interruption to active device traffic, the
application has no way to see the operation of AP.
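
The idea of a meta-device that hides two physical paths can be sketched as follows. This is an illustration only: the class, the device name, and the path names are hypothetical and do not reflect AP's real driver interface or naming scheme.

```python
class MetaDevice:
    """Toy stand-in for an AP meta-device with two physical paths (hypothetical)."""

    def __init__(self, name, primary, alternate):
        self.name = name                    # the only name applications use
        self.paths = [primary, alternate]   # two routes to the same device
        self.active = 0                     # index of the path carrying traffic

    def active_path(self):
        return self.paths[self.active]

    def switch(self):
        """Make the other path active; callers keep using self.name."""
        self.active = 1 - self.active
        return self.active_path()

    def write(self, data):
        # The application addresses the meta-device; the path is chosen here.
        return f"{len(data)} bytes via {self.active_path()}"


disk = MetaDevice("metadisk0", "controller-A", "controller-B")
print(disk.write(b"abc"))
disk.switch()
print(disk.write(b"abc"))
```

The application calls `write` on the same object before and after the switch, which is the sense in which the path change is invisible to it.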

The only requirement for using AP is that the domain must be
equipped with a second interface to the device, with a separate
controller and cable, or have a second interface to the same network.

If the system volume is on a device with an alternate path, AP, with
the help of the SSP, automatically attempts to boot from the
alternate path if the primary path fails. For AP 2.1, if the boot device is
mirrored and has an alternate path, the SSP will try all four paths.

All AP operations are completely managed from within the domain,
except for booting from an alternate path.

Installing AP
In Solaris 2.6, AP Version 2.1 is installed from the SMCC Supplements
CD-ROM. Its documentation can be found in the Hardware
AnswerBook (SUNWabhdw) on the same CD-ROM.

For Solaris 2.5.1, AP 2.0 is installed from its own CD-ROM and comes
with its own AnswerBook.

AP software components are installed on both the SSP and the host
domain.

Limitations for AP 2.0 (Solaris 2.5.1)
If an active path fails, you must manually switch from the failed
path to the alternate path. The exception is at boot time, for the boot
disk and primary network interface. AP 2.1 automatically switches
disk paths when a failure occurs.

You must also manually switch both disk and network active
alternate paths during DR operations.

Dynamic Reconfiguration

Dynamic Reconfiguration (DR) provides the ability to add a system
board to, or remove one from, a domain while that domain is live,
that is, while it is running Solaris.

DR can be used to remove a failed board from a running domain. For
example, a board can remain in use in the domain even though one of
its processors has failed. To replace the processor module without
incurring downtime, DR can disconnect the board from the domain.
Support personnel can then hot swap it out, replace the failing
processor module, reinsert the board in the system, and then add it
back to the domain.

DR is a complex mechanism, and requires careful planning and
operational procedures to fully use all of its capabilities.

You can use DR to:

● Add a system board upgrade

● Modify the domain hardware configuration

● Remove a faulty system board

● Create a test domain

Dynamic Reconfiguration has the following capabilities. It:

● Relocates any memory used on the board (or makes it available) to
the domain

● Removes processors from use (or adds them) to the domain

● Disconnects any I/O on a board (or adds it) to the domain

● Is built into the Solaris kernel

● Is completely controlled from the SSP, not the domain

● Does not require an OS reboot

For Solaris 2.6, DR is installed with the operating system. For Solaris
2.5.1, DR is installed from the Solaris SMCC Updates CD-ROM, and
comes with its own AnswerBook.

DR enables you to logically attach and detach system boards to and
from a domain without incurring machine downtime. DR can be used
in conjunction with hot swap, which is the process of physically
removing or inserting a system board while the system continues
running.

If a system board is being used by a domain, you must DR detach it
before you can power it off and remove it. After a new or upgraded
system board is inserted and powered on, you may DR attach it to any
domain.
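
The ordering described above (detach, power off, remove, then insert, power on, attach) can be sketched as a small state table. The state and action names here are hypothetical labels chosen for illustration; they are not Hostview menu items or dr shell commands.

```python
# Allowed (state, action) -> next state transitions for one system board.
TRANSITIONS = {
    ("attached", "dr_detach"): "detached",
    ("detached", "power_off"): "powered_off",
    ("powered_off", "remove"): "removed",
    ("removed", "insert"): "powered_off",
    ("powered_off", "power_on"): "detached",
    ("detached", "dr_attach"): "attached",
}


def step(state, action):
    """Apply one action, rejecting anything the sequence does not allow."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a board that is {state}") from None


def run(state, actions):
    """Apply a list of actions in order and return the final state."""
    for action in actions:
        state = step(state, action)
    return state


swap = ["dr_detach", "power_off", "remove", "insert", "power_on", "dr_attach"]
print(run("attached", swap))
```

Calling `step("attached", "power_off")` raises an error, reflecting the rule that a board in use by a domain must be DR detached before it is powered off.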

An additional advantage of Dynamic Reconfiguration and
hot swap is that on-line system upgrades can be performed. You can
add processors, memory, or an I/O interface card to the Enterprise
10000 system without disturbing any user activity.

You can execute the DR procedures through either the Hostview GUI
program or the dr shell application.

Operating System Support

Solaris 2.6 Hardware 5/98 or higher is pre-installed on one domain at
the factory, using the standard SunInstall™ process. The entire
distribution is installed, including SunVTS™. The Enterprise 10000
also supports Solaris 2.5.1 Hardware 4/97 running in a domain.

An Enterprise 10000 domain has kernel architecture sun4u1, which is
an extension of the sun4u architecture.

Only some of the architecture-specific binaries are different, and all
appropriate standard Sun Solaris patches will install.

In the domain, the only special change to the boot process is to
automatically start the cvcd (network virtual console) daemon, which
is only started for sun4u1 architecture machines.

Solaris Binary Compatibility


The Enterprise 10000 looks like any other SPARC architecture system
to applications.

● It is completely binary compatible with the Enterprise 6000 (except
where architecture-specific differences occur).

● All Sun compiler products are supported.

● All Sun unbundled products supported on the Enterprise 6000 are
supported.

● All "well-behaved" third-party products supported on the
Enterprise 6000 are supported.

● If a binary incompatibility is found, it will be fixed.

SSP Operating System Levels


On the SSP, only standard Solaris 2.5.1 (Hardware 4/97 or higher) is
installed. The SSP software is added from an unbundled CD-ROM
after the OS has been installed. An additional package in support of
AP is then added.

The SSP software will not run with Solaris 2.6 on the SSP, but will still
support Solaris 2.6 running in a domain.

Operating System Enhancements

The Enterprise 10000 system uses an extended version of the Solaris
environment. The following list outlines some of the enhancements
that have been implemented for the Enterprise 10000 system:

● Resource management

Solaris 2.6 provides a processor partitioning capability called
Processor Sets, which allows processors to be grouped together and
dedicated to a certain group of tasks. A processor set can then be
linked to a particular set of applications.

● Parallel processing

Sun's high-performance computing (HPC) products provide special
programming libraries, load management, and administration and
development tools that significantly improve the performance of
parallel applications. They also let parallel applications run
across multiple systems.

● Memory management

The ability to support up to 64 Gbytes of physical memory was
added to Solaris for the Enterprise 10000 system. User processes
continue to use 32 bits of virtual memory addressing.

● Added RAS features

These include:

● Dynamic Reconfiguration

● Hot swap of system components

● Alternate Pathing

● Extended internal error checking

● Extended hardware failure logging from the control board to
the SSP

● Common, detailed message logs on the SSP

The SSP

The System Service Processor (SSP) enables you to control and monitor
the Enterprise 10000 system. The SSP is built from a SPARCstation™ 5
with 64 Mbytes of random access memory (RAM) and a 1-Gbyte disk.
A CD-ROM is included for loading software onto the Enterprise 10000
system.

The SSP runs Solaris 2.5.1 in an OpenWindows™/OPEN LOOK
environment, plus the following software:

● Control and management daemons

The SSP runs a number of daemons that control and monitor the
Enterprise 10000 and its domains.

● Hostview

Hostview is a graphical user interface (GUI) that assists the system
administrator with management of the Enterprise 10000 system
hardware.

● Power on self test (POST)

This software ensures that the hardware in a domain is ready for
use. It is run during the boot process.

● OpenBoot™ PROM (OBP)

The OBP provides the domain’s interface into the hardware
environment. It provides configuration information to the
operating system running in the domain. It works just as the OBP
does in other SPARC systems.

● Network console (netcon)

The netcon software enables the system administrator to open an
Enterprise 10000 system console remotely using a line mode
interface.

The SSP enables the system administrator to perform the following
tasks:

● Boot domains.

● Perform automatic emergency shutdown in an orderly fashion.
For example, the SSP software automatically shuts down a domain
if the temperature of a processor within that domain rises above a
preset level.

● Create domains.

● Dynamically reconfigure a domain.

● Monitor and display the temperature and voltage levels of one or
more system boards or domains.

● Control fan operations.

● Monitor and control power to the components within a platform.

● Execute diagnostic programs in a domain such as POST.

In addition, the SSP environment:

● Warns you of potential problems, such as high temperatures or
malfunctioning power supplies.

● Notifies you when a software error or failure has occurred.

● Automatically reboots a domain after a system software failure
(such as a panic).

● Keeps logs of the interactions between the SSP and the domains.

SSP Logical Connectivity

The following diagram shows the logical external connectivity for the
Enterprise 10000 host and SSP. This diagram does not show the actual
physical network connections, which are discussed in Module 3.

[Figure: logical connectivity of the Enterprise 10000 host and SSP.
Labels: customer Ethernet; SBus Ethernet; to control board; to
optional control board; to SSP; to optional SSP; remote support;
public-switched network; system console (SSP); transceiver
(optional); telephone cable; modem; DTE; optional second SSP.]

The SSP is connected via Ethernet to the Enterprise 10000 system
control board. The control board has an embedded control processor
that interprets the Transmission Control Protocol/Internet Protocol
(TCP/IP) Ethernet traffic and converts it to Joint Test Action Group
(JTAG) control information. JTAG allows the control board to inspect
and control the Enterprise 10000 system components at very low
levels.

The SSP User Environment

System Accounts
There are two accounts on the SSP system, root and ssp. These are
used to manage the SSP itself and the Enterprise 10000, respectively.

SSP Window
An SSP window is a normal OpenWindows window into the Solaris
and SSP environments of the SSP itself.

To bring up an SSP window, you must log in as user ssp. You are then
prompted for the name of a domain that you want to manage. You can
switch the domain that you are managing at any time.

You can run the display software remotely by setting your DISPLAY
environment variable and granting display access with xhost.

Multiple SSP windows can be used simultaneously.

Network Console Window


A network console window is an SSP window that is hosting a netcon
session. A netcon window acts as the system console for a domain.

Multiple netcon windows can be open simultaneously, but only one at
a time can have write privileges (the ability to enter a command) to a
specific domain. When a netcon window is in read-only mode, you
can view messages, but you cannot enter any commands.

Remember that the Enterprise 10000 does not have a directly attached
console; it has no keyboard or serial ports. It can only be
communicated with over Ethernet.
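
The many-readers, one-writer rule for netcon windows can be sketched as follows. This is a conceptual model only: the class and method names are hypothetical, and the sketch assumes that new sessions start read-only and must ask for write permission, which may not match how netcon actually arbitrates write access.

```python
class DomainConsole:
    """Toy model of console sessions for one domain (hypothetical names)."""

    def __init__(self, domain):
        self.domain = domain
        self.sessions = set()   # every open session can read
        self.writer = None      # at most one session may write

    def open(self, session):
        # Assumption of this sketch: sessions start in read-only mode.
        self.sessions.add(session)

    def request_write(self, session):
        """Grant write permission only if no other session holds it."""
        if session not in self.sessions:
            raise KeyError(f"unknown session: {session}")
        if self.writer not in (None, session):
            return False
        self.writer = session
        return True

    def release_write(self, session):
        if self.writer == session:
            self.writer = None

    def send(self, session, command):
        """Pass a command to the domain console if the session may write."""
        if self.writer != session:
            raise PermissionError(f"{session} is read-only")
        return f"{self.domain}: {command}"


console = DomainConsole("domain_a")
console.open("window1")
console.open("window2")
print(console.request_write("window1"))
print(console.request_write("window2"))
```

After `window1` releases write permission, a second request from `window2` succeeds, so the write privilege moves between windows but is never shared.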

Hostview
The Hostview program is a graphical user interface (GUI) that
provides the same functionality as many of the SSP commands.

Hostview, written largely in Tcl/Tk, runs on the primary SSP and
provides the ability to graphically monitor the state of the entire
Enterprise 10000, its domains, and its components.

Almost all administration and maintenance commands can be
executed through Hostview.

Hostview enables you to perform the following actions:

● Power the Enterprise 10000 system and boards on and off.

● Power peripherals and I/O expansion cabinets on and off.

● Create domains.

● Boot the operating system in a domain.

● Manage Dynamic Reconfiguration.

● Start a netcon console window for each domain.

● Access the SSP log messages file for each domain.

● Remotely log in to each domain.

● Edit the blacklist file to disable or re-enable hardware
components in a domain.

● Monitor the Enterprise 10000 system status.

Hardware Configuration Control

The SSP provides two forms of configuration control: user control,
which is provided through a blacklist concept; and recovery control
(implemented as predefined user control), which is provided
automatically through the figure of merit concept.

Blacklist
The blacklist file lists system components, such as central
processing units (CPUs), address buses, I/O slots, or lower-level
subcomponents that are not to be included in the domain the next time
that the domain is configured. It can be modified by the user as
necessary.

Functional parts can also be placed in the blacklist file, for
example for diagnostic, benchmarking, or testing purposes.

The system will never automatically modify the blacklist file under
any circumstances.
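
The effect of the blacklist on domain configuration can be sketched as a simple filter. The component identifiers below are made up for illustration and are not the syntax of the real blacklist file.

```python
def usable_components(present, blacklist):
    """Return the components that may be configured into the domain.

    present   -- identifiers of the components found in the domain's boards
    blacklist -- identifiers the administrator has excluded
    """
    return [component for component in present if component not in blacklist]


# Hypothetical identifiers: processor 1 on board 0 is excluded for testing.
present = ["sb0.cpu0", "sb0.cpu1", "sb1.cpu0"]
blacklist = {"sb0.cpu1"}
print(usable_components(present, blacklist))
```

The filter only reads the blacklist. Nothing in the configuration step writes to it, in keeping with the statement that the system never modifies the file on its own.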

Figure of Merit
During domain configuration, the SSP determines the best possible
domain hardware configuration by assigning a figure of merit (FOM)
to each possible hardware configuration for the domain (a total of 45).
It then chooses the configuration with the highest FOM.

The FOM is computed by applying weighting factors (or values) to
various Enterprise 10000 architectural components. The weights can
be adjusted, and the outcome can be influenced by the blacklist file.

The FOM process is normally invisible to the system administrator.
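
The general idea of a weighted figure of merit can be sketched as below. The component kinds, the weights, and the scoring formula are all assumptions made for illustration; the guide does not describe the SSP's actual weights or calculation.

```python
# Hypothetical weights: how much each kind of usable component is worth.
WEIGHTS = {"cpu": 10, "memory_bank": 5, "io_slot": 2}


def figure_of_merit(config, weights=WEIGHTS):
    """Score a candidate configuration given as {component kind: count}."""
    return sum(weights[kind] * count for kind, count in config.items())


def best_configuration(candidates, weights=WEIGHTS):
    """Pick the candidate configuration with the highest figure of merit."""
    return max(candidates, key=lambda config: figure_of_merit(config, weights))


candidates = [
    {"cpu": 4, "memory_bank": 2, "io_slot": 2},
    {"cpu": 3, "memory_bank": 4, "io_slot": 4},
]
print(best_configuration(candidates))
```

With these example weights the second candidate wins (58 against 54) even though it has fewer processors, which shows how changing the weights changes which configuration is chosen.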

Diagnostic and Monitoring Tools

The Enterprise 10000 system uses several types of diagnostic and
monitoring tools. Several of them are SSP-based diagnostics, while
other tests run directly on the Enterprise 10000 host. These tests
include exercisers and hpost (host POST).

Bringup
Bring-up diagnostics provide static, repeatable testing that catches most
hard errors. Bring-up diagnostics log all failures to the system or
domain log file on the SSP. They can be run at varying levels of depth,
depending on the situation.

They are automatically run when a domain is configured active.

Power On Self Test


Power on self test (POST) exercises the Enterprise 10000 system logic
at a subcomponent level. It provides a high degree of accuracy in
locating the source of an error, enabling fault isolation to the failing
field-replaceable unit (FRU) with a high degree of confidence.

OpenBoot PROM
The primary task of the OpenBoot firmware is to record the domain
configuration and boot the operating system from either a disk or a
network interface. The firmware also provides extensive features for
testing the hardware and software interactively. It is very similar to the
OBP software found in other Sun systems.

Status Monitoring and Display

SSP
The SSP is the primary provider of services related to monitoring and
reporting on the status of the machine.

SunVTS
SunVTS, the on-line validation test suite, tests and validates hardware
functionality by running multiple diagnostic hardware tests on
configured controllers and devices.

redx
redx is an internal-use-only interactive hardware debugger for the
Enterprise 10000, like a hardware version of adb.

Resiliency Features

DC Power
The Enterprise 10000 system logic direct current (DC) power system is
managed at the system board itself. The 48-volt DC power is supplied
through a circuit protector to each system board. The 48-volt power is
converted through several small DC-to-DC converters on the board to
the specific lower voltages needed directly on the system board.
Failure of a DC-to-DC converter will affect only that particular system
board.

System power supplies can be hot swapped.

System Boards
System boards can be removed from and inserted into a powered-on
and operating Enterprise 10000 system (hot swap) for servicing the
on-board components.

Two control boards can be configured in the system for redundancy.

A centerplane support board (CSB) powers one-half of the centerplane.
Should a centerplane support board fail, the centerplane will continue
to operate in a degraded mode, with half of the address and data
buses.

Processors
A failed UltraSPARC™ processor can be isolated from the remainder
of the system by the POST process. As long as there is at least one
functioning processor in the configuration, the domain may be used.

Memory
There is one memory controller on each system board. If it fails, only
the memory on that system board is unavailable. As long as there is
sufficient memory left for the domain to run, it can be used.

I/O Interface Subsystem


If a SYSIO chip fails, both of the SBus slots (or the single PCI slot) that
it services become inaccessible. POST discovers this and removes the
interface from the domain’s configuration. This will
usually result in a loss of access to some I/O devices.

Alternate Pathing support for network interfaces and disk arrays can
provide the ability to transparently recover from most of these kinds of
failure.

Redundant Components

Every component in the system can be configured redundantly if the
customer desires. In addition, each system board is
capable of independent operation.

In addition to the system boards, redundantly configurable
components include:

● Control boards

● Centerplane support boards

● Disk storage

● Bulk power subsystems

● Bulk power supplies

● Peripheral controllers and channels

● SSP and its interfaces

Note that input power itself cannot be made redundant by the system.

Concurrent Serviceability

The most significant serviceability feature of the Enterprise 10000
system is the ability to replace system boards on line, while the OS
continues to run. With the exception of the centerplane, all of the
boards and power supplies in the system can be removed and replaced
during system operation, without scheduled downtime.

Note – Replacing the active control board or switching to the backup
control board requires the shutdown and reboot of all active domains.

Failing components are identified in the SSP failure logs in such a way
that the field-replaceable unit is clearly identified. Repair can be made
quickly and with only minor disruption, if any, usually at a convenient
time.

There are several features that enable service to be performed without
forcing scheduled downtime:

● All centerplane connections are point-to-point, making it possible
to logically isolate system boards with Dynamic Reconfiguration.

● The Enterprise 10000 system uses a distributed DC power system;
that is, each system board has its own power supply and
individual control. This type of power system enables each system
board to be powered on and off individually.

● All the system board interfaces to the centerplane have a loopback
mode that enables the system board to be tested (by POST) before
it is added to the system.

Error Logging

When uncorrectable hardware errors occur, information about the
error is saved on the SSP to help with problem determination. The
Enterprise 10000 system has extensive error logging capabilities.

By default, when any hardware error (except for some transient
memory failures) is detected, the error is logged by the hardware to
the SSP.

The SSP detects these errors by polling the control boards on a regular
basis. If the error is fatal, the affected domain is stopped, error log
information is collected by the SSP, and the domain is automatically
rebooted.

If a serious hardware failure occurs, the hardware produces a
dump of its internal status, some 90 Kbytes in length. Like a
Solaris panic dump, it can be used by support personnel to determine
the cause of the failure.

Check Your Progress

Before continuing on to the next module, check that you are able to
accomplish or answer the following:

❑ Explain the capabilities and features of the Enterprise 10000
system.

❑ List the hardware and software components of the Enterprise
10000 system.

❑ Describe the Enterprise 10000 system packaging.

❑ Explain the special features of the Enterprise 10000 system, such as
domains, Alternate Pathing, and Dynamic Reconfiguration.

❑ Describe the concepts of system operation.

Think Beyond

Why was the Enterprise 10000 designed in such a modular fashion?

Why does the Enterprise 10000 need a new kernel architecture?

How does the Enterprise 10000 load the operating system if the
console communicates over the Ethernet?

What kind of processing would be necessary to record a hardware
logout on the SSP after a serious Enterprise 10000 hardware failure?

Architecture Overview 2

Course Map
This module describes the architecture, construction, layout, and basic
hardware operation of the Enterprise 10000 system.

Relevance

For Discussion – The following questions are relevant to
understanding the content of this module:

1. How is the Enterprise 10000 different from the other Sun servers?

2. What are the components of the Enterprise 10000 system?

3. What can the Enterprise 10000 do?

4. How does the SSP work with the Enterprise 10000?

Objectives

Upon completion of this module, you will be able to:

● Describe the construction of the Enterprise 10000

● Describe the system board interface

● Explain the system board structure

● Recount centerplane operation

● Explain the control board structure

● Explain component packaging

● Understand the function of JTAG

● Describe the SSP interaction with the Enterprise 10000

References

Additional resources – The following references can provide


additional details on the topics discussed in this module:

● Sun Enterprise 10000 System Overview Manual

● Ultra Enterprise 10000 SSP 3.1 User’s Guide

● Sun Enterprise 10000 System Hardware Installation and De-Installation
Guide

Enterprise 10000 Packaging

The Enterprise 10000 physically provides the following:

● A maximum of 16 system boards, each supporting:

● Four UltraSPARC processor modules

● One I/O module providing two independent SBuses and four
SBus slots, or two PCI buses

● One memory module supporting one Gbyte (using 16-Mbit DRAMs)
or four Gbytes (using 64-Mbit DRAMs) of main storage

● One or two control boards to configure and manage the
Enterprise 10000 as well as communicate with the SSP.

● Dual centerplane support boards; each provides power and
clocks to half of the centerplane.

● An active centerplane with a 144-bit-wide, point-to-point data
interconnect and four address buses. The centerplane can operate
in degraded mode with failed address or data bus components.

● Eight bulk power supplies to provide 48-volt DC to each
centerplane slot.

● Sixteen fan trays, containing two muffin fans per tray (32 total
fans). An entire tray is hot swappable. Fan speed is controlled by
the SSP.
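
The per-board figures in the list above multiply out to the system maximums quoted elsewhere in this guide. A short check, using only numbers from the list (the variable names are just labels for this sketch):

```python
BOARDS = 16                 # maximum system boards
CPUS_PER_BOARD = 4          # UltraSPARC processor modules per board
SBUS_SLOTS_PER_BOARD = 4    # SBus slots per board
GBYTES_PER_BOARD = 4        # memory per board with 64-Mbit DRAMs
FAN_TRAYS = 16
FANS_PER_TRAY = 2

totals = {
    "processors": BOARDS * CPUS_PER_BOARD,
    "sbus_slots": BOARDS * SBUS_SLOTS_PER_BOARD,
    "memory_gbytes": BOARDS * GBYTES_PER_BOARD,
    "fans": FAN_TRAYS * FANS_PER_TRAY,
}

for name, value in totals.items():
    print(f"{name}: {value}")
```

The results (64 processors, 64 SBus slots, 64 Gbytes, 32 fans) agree with the 64-Gbyte physical memory limit and the 32 total fans stated in the text.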

Enterprise 10000 Component List

The Enterprise 10000 host consists of system boards, a
centerplane, centerplane support boards, control boards, peripherals,
and power and cooling subsystems.

Component | Function | Quantity per System
Centerplane | Contains address and data interconnect to all system boards | 1 (2 logical halves)
Centerplane support board | Provides the centerplane JTAG, clock, and control functions | 2
System board | Contains processors, memory, I/O subsystem, SBus and PCI boards, and power converters | Up to 16
Processor modules | Mezzanine boards that contain the UltraSPARC processor and support chips | Up to 64
Memory | Removable DIMMs (dual in-line memory modules) | Up to 16
I/O | Removable SBus boards | Up to 64
Control board | Controls the system JTAG, clock, fan, power, and Ethernet interface functions | Up to 2

48-volt power system
AC input module | Receives 220-volt AC, monitors it, and passes it to the power supplies | Up to 4
48-volt power supply | Converts AC power to 48-volt DC | 5 or 8
Circuit breaker panel | Interrupts power to various components within the system | 1
AC power sequencer | Receives 220-volt AC, monitors it, and passes it to the peripherals | 1 or more
Peripheral power supply | Converts AC power to DC for peripherals | (In peripheral cabinet)
Remote power control module | Connects the remote control line between two control boards and passes it to a master AC sequencer | 1
Fan centerplane | Provides power to the pluggable fan trays | 2
Fan trays | Each fan tray contains two fans for system cooling | 5 to 8 pairs

Component Locations

In general, system components are numbered top to bottom, left to
right. The back of the system has the lower-numbered components; the
front has the higher-numbered components. The components are:

● AC – AC input control modules

● CB – Control board

● CSB – Centerplane support board

● FT – Fan tray

● PDU – Input power sequencing and distribution unit

● PS – AC power supply

● RPC – Remote power control connectors

● SB – System board

[Figure labels: PS4 through PS7, AC2, AC3, FT8 through FT15, CSB1,
SB8 through SB15, CB1]

Figure 2-1 Front of Enterprise 10000 Cabinet

[Figure: back view of the cabinet. Labeled components: AC0 and AC1,
PS0 through PS3, PDU, RPC0 through RPC4, FT0 through FT7, CSB0, CB0,
and SB0 through SB7.]

Figure 2-2 Back of Enterprise 10000 Cabinet
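
The numbering rule (lower-numbered components at the back, higher-numbered at the front) can be expressed as a small lookup. The following Python sketch is only an illustration: the index ranges are read off the labels in Figures 2-1 and 2-2, and the function name cabinet_side is invented for this example and is not part of any Sun tool.

```python
import re

# Highest index found at the back of the cabinet for each component type,
# as read from the labels in Figures 2-1 and 2-2 (an interpretation).
BACK_MAX_INDEX = {"AC": 1, "PS": 3, "FT": 7, "CSB": 0, "CB": 0, "SB": 7, "RPC": 4}
# Highest index present anywhere in the cabinet for each type.
MAX_INDEX = {"AC": 3, "PS": 7, "FT": 15, "CSB": 1, "CB": 1, "SB": 15, "RPC": 4}


def cabinet_side(label: str) -> str:
    """Return 'back' or 'front' for a label such as 'SB9' or 'PDU'."""
    if label == "PDU":  # single unnumbered unit, shown at the back
        return "back"
    match = re.fullmatch(r"([A-Z]+)(\d+)", label)
    if not match or match.group(1) not in MAX_INDEX:
        raise ValueError(f"unknown component label: {label!r}")
    kind, index = match.group(1), int(match.group(2))
    if index > MAX_INDEX[kind]:
        raise ValueError(f"{label!r} is out of range for this cabinet")
    return "back" if index <= BACK_MAX_INDEX[kind] else "front"


for name in ("SB3", "SB9", "CB1", "PS2", "FT12"):
    print(name, cabinet_side(name))
```

For example, SB3 and PS2 resolve to the back of the cabinet, while SB9, CB1, and FT12 resolve to the front.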


Data Interconnects

The Enterprise 10000 provides multiple data and address buses to
connect the system boards through the centerplane.

Data Paths
The data bus consists of two pairs of unidirectional, two-level, 16 x 16
crossbar switches that transfer data packets between the 16 system
boards. This means that each system board is connected directly to
every other system board through the centerplane.

System data paths are separate 144-bit-wide data paths to and from
each system board slot. If all system boards request different
destinations, the system could do 16 simultaneous 64-byte transfers.
However, if two boards request the same destination, one must wait.

The data bus has a theoretical bandwidth of 21.3 Gbytes per second,
but in normal operation, 10.7 Gbytes per second is the limit based on a
combination of the data and address path bandwidths.


Address Paths
The Enterprise 10000 system provides four hardware address buses.
Each bus can be used to make data transfer requests to another
location in the domain, either on the same or on a different system
board. These buses are 48 bits wide including error correcting code
bits. Each bus is independent, meaning that there can be four distinct
address transfers occurring simultaneously.

The four address buses can carry as many as 167,000,000 transactions
per second at the initial 83.3 MHz centerplane clock speed. System
boards only see the messages to and from the system boards in their
own domain. The "crosses" in the crossbar are only enabled if their
respective system boards are in the same domain. This allows the
domains to configure their buses independently. All components have
access to all buses.
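
The bandwidth figures quoted in this section can be cross-checked with a little arithmetic. The Python sketch below is only an illustration. It takes the 83.3 MHz clock, the 16 system boards, the 64-byte transfer size, and the 167,000,000 transactions per second from the text, takes the four bus cycles per 64-byte block from the data interconnect figure, and assumes that a Gbyte here means 10^9 bytes. The function names are invented for this example.

```python
CLOCK_HZ = 83.3e6                          # initial centerplane clock speed
SYSTEM_BOARDS = 16                         # one data path per system board slot
BLOCK_BYTES = 64                           # size of one data transfer
CYCLES_PER_BLOCK = 4                       # bus cycles to move one 64-byte block
ADDRESS_TRANSACTIONS_PER_S = 167_000_000   # all four address buses combined


def data_peak_bytes_per_s() -> float:
    """Peak data rate if every board transfers a block at the same time."""
    bytes_per_cycle = BLOCK_BYTES / CYCLES_PER_BLOCK
    return SYSTEM_BOARDS * bytes_per_cycle * CLOCK_HZ


def address_limited_bytes_per_s() -> float:
    """Data rate when each address transaction moves one 64-byte block."""
    return ADDRESS_TRANSACTIONS_PER_S * BLOCK_BYTES


print(f"theoretical data bus: {data_peak_bytes_per_s() / 1e9:.1f} Gbytes/s")
print(f"address limited:      {address_limited_bytes_per_s() / 1e9:.1f} Gbytes/s")
```

Under these assumptions the two results round to 21.3 and 10.7 Gbytes per second, which matches the theoretical and normal-operation limits given for the data bus.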


[Figure: the 16 system boards (System board 0 through System board 15)
connected through the global data router on the centerplane. The data
bus is 144 bits wide, 16 x 16, and spans the full centerplane. Data
moves as 64-byte blocks between memory, the processor module, and the
Ecache; each 64-byte block takes 4 bus cycles to transfer.]


Centerplane Configurability

The centerplane is designed so that a single component failure will not
cause a system failure. This is accomplished by partitioning the
centerplane into two identical, independent sets of components that
operate together unless there is a failure. Should a failure occur, the
following degraded modes of operation can be used:

● The system will operate with one, two, or three address buses.
Performance degradation when operating with fewer than four
address buses or fewer than two data buses will be application
dependent. At configuration time the system will determine the
optimum combination and use it.

● The system can operate with one 72-bit data bus. Note that the
data bus bandwidth is two times the available address bus
bandwidth in a fully operational system. Therefore, with only one
72-bit data bus, the system is balanced for address and data
bandwidth.


● The system will operate with one or two address buses and one
72-bit data bus with a half-centerplane failure.

● The system board slots can be logically isolated and effectively
removed from the system.

Note – The active control board cannot be hot swapped without a
system boot, even after configuring the clock source to the alternate
control board.


The System Board

The system board itself is a multilayer printed circuit board that
connects the processors, main memory, and I/O subsystems to the
centerplane. Each system board can support the following:

● Four 333 MHz UltraSPARC-II (V.9) processor modules, each with a
supporting 4-Mbyte second-level cache, or 250 MHz CPUs with either
1 or 4 Mbytes of cache. CPU types may not be mixed in a system.

● Four memory banks with a capacity of up to 4 Gbytes per system
board (64 Gbytes per Enterprise 10000 system).

● Two I/O buses per system board, each with either 2 SBus slots or
1 PCI slot, giving a total of 32 SBuses with 64 SBus slots per
system or 32 PCI slots per system. PCI and SBus slots may not be
mixed on a system board, but may be mixed in a system and
domain.
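
The per-board limits above multiply out to the system-wide totals. The Python sketch below is a hedged illustration of that arithmetic, including the rule that SBus and PCI may be mixed across boards but not on one board. The function name system_totals and the "sbus"/"pci" strings are invented for this example; the numbers come from the list above.

```python
MAX_SYSTEM_BOARDS = 16
CPUS_PER_BOARD = 4
MAX_GBYTES_PER_BOARD = 4
IO_BUSES_PER_BOARD = 2
SLOTS_PER_BUS = {"sbus": 2, "pci": 1}   # slots on each of a board's two I/O buses


def system_totals(board_io: list[str]) -> dict[str, int]:
    """Maximum resources for a system, given each board's I/O type.

    board_io holds one entry per installed system board, either
    "sbus" or "pci" (a single board cannot mix the two).
    """
    if len(board_io) > MAX_SYSTEM_BOARDS:
        raise ValueError("at most 16 system boards fit in one system")
    totals = {
        "cpus": len(board_io) * CPUS_PER_BOARD,
        "memory_gbytes": len(board_io) * MAX_GBYTES_PER_BOARD,
        "sbus_slots": 0,
        "pci_slots": 0,
    }
    for io_type in board_io:
        if io_type not in SLOTS_PER_BUS:
            raise ValueError(f"unknown I/O type: {io_type!r}")
        totals[f"{io_type}_slots"] += IO_BUSES_PER_BOARD * SLOTS_PER_BUS[io_type]
    return totals


print(system_totals(["sbus"] * 16))
print(system_totals(["sbus"] * 8 + ["pci"] * 8))
```

A system with 16 SBus boards gives 64 CPUs, 64 Gbytes, and 64 SBus slots; 16 PCI boards would give 32 PCI slots instead.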


The system board is a complex computing system in its own right,
similar to a standalone four CPU UltraSPARC system. See the Sun
Enterprise 10000 System Overview Manual for more detail.

Logical View
[Figure: logical view of the system board. Labeled elements, from the
address side to the data side: four global address arbiters (GAARB)
and four global address routers (GAMUX); four coherency interface
controllers (CIC); the local address arbiter (LAARB); one memory
controller (MC) and three port controllers (PC); the UPA address
buses; four UltraSPARC processors, memory, and two I/O bridges; the
pack/unpack logic; the UPA data buses; the local data arbiter (LDARB);
four data buffers (XDB); the local data router (LDMUX); and the global
data arbiter (GDARB) and global data router (GDMUX).]


Physical View (SBus I/O)


[Figure: physical view of the system board, approximately 16.0 by 21.1
inches. Labeled elements: the centerplane connection, the I/O module
with four SBus cards, the memory module with four banks of eight DIMMs
each, four pack/unpack units, and four UltraSPARC processor modules.]


Mezzanine (Daughter) Board Packaging

The Enterprise 10000 is designed to be able to take advantage of future
technological improvements without making the current hardware
obsolete.

The processor modules connect directly to the system board. They can
easily be replaced if necessary, as they are individually mounted on
the system board.

Memory DIMMs and SBus and PCI cards can also be easily replaced.

However, technology will change. Perhaps the DIMM (Dual In-line
Memory Module) memory package will not be the most efficient
format in a few years. Or perhaps you want to exchange some SBus
cards for PCI cards or another I/O bus technology. In a conventional
design, the connectors for these components would then be useless,
which usually requires the expensive replacement of the entire system
board.


The Enterprise 10000 system boards are specially constructed with
intermediate mezzanine or daughter boards connecting the memory
DIMMs and the I/O cards to the system board. In the event that
memory or I/O interconnect technology changes, the mezzanine board
is replaced with one supporting the new interconnect, the new
components are added, and the system board goes back in service.

And, with Dynamic Reconfiguration, it may not even be necessary to
reboot the operating system. Normal processor, DIMM and I/O card
replacements can be made this way.


SBus Mezzanine Packaging

[Figure: SBus system board seen from the front. Labeled parts: the
memory mezzanine, the I/O mezzanine, the SBus cards, and the
processors.]

PCI Mezzanine Packaging

[Figure: PCI system board assembly. Labeled parts: the system board,
personality plate, PCI filler panel, PCI card, PCI front bracket, PCI
front cover, PCI I/O module, top PCI riser card, and bottom PCI riser
card.]

Memory Subsystem

Each Enterprise 10000 system board provides a daughter card that
holds up to thirty-two 128-Mbyte DIMMs. Using currently available
64-Mbit DRAM chips, a fully configured Enterprise 10000 system
today offers 64 Gbytes of system memory.

The daughter card supports two sizes of JEDEC-standard, 144-pin,
8-byte DIMMs: 32 Mbytes and 128 Mbytes. This flexibility enables the
amount of memory on a given system board to vary from 0 to 4
Gbytes.

DIMM Size     Full System Board   Full System
32 Mbytes     1 Gbyte             16 Gbytes
128 Mbytes    4 Gbytes            64 Gbytes

A system board can have zero to four banks of eight DIMMs each
installed. A bank must be fully populated. While all DIMMs on a
given system board must be the same size, DIMM sizes can be
different on the different system boards.
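
The population rules and the capacities in the table follow from simple multiplication. The Python sketch below is an illustration only: the function name board_memory_mbytes is invented here, and the rules it encodes (banks of eight DIMMs, zero to four banks, one DIMM size per board, 32-Mbyte or 128-Mbyte DIMMs) are the ones stated above.

```python
DIMMS_PER_BANK = 8
MAX_BANKS = 4
SUPPORTED_DIMM_MBYTES = (32, 128)
SYSTEM_BOARDS = 16


def board_memory_mbytes(banks: int, dimm_mbytes: int) -> int:
    """Memory on one system board, given fully populated banks of one DIMM size."""
    if not 0 <= banks <= MAX_BANKS:
        raise ValueError("a system board holds zero to four banks")
    if dimm_mbytes not in SUPPORTED_DIMM_MBYTES:
        raise ValueError("supported DIMM sizes are 32 and 128 Mbytes")
    return banks * DIMMS_PER_BANK * dimm_mbytes


for size in SUPPORTED_DIMM_MBYTES:
    full_board = board_memory_mbytes(MAX_BANKS, size)
    full_system = full_board * SYSTEM_BOARDS
    print(f"{size}-Mbyte DIMMs: {full_board // 1024} Gbytes per board, "
          f"{full_system // 1024} Gbytes per system")
```

Because a bank must be fully populated, the sketch takes a bank count rather than a DIMM count; a partially filled bank is simply not representable.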


A memory module can do a 64-byte read or write every four system
clocks (48 nanoseconds), providing a memory bandwidth of 1300
Mbytes per second. The memory system allows a 64-deep input
request queue per address bus at each memory controller.
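
The memory bandwidth figure can be checked from the numbers in the paragraph above. This short Python sketch is illustrative only and assumes that a Mbyte here means 10^6 bytes; the variable names are invented for the example.

```python
BLOCK_BYTES = 64            # one read or write
CLOCKS_PER_ACCESS = 4       # system clocks per 64-byte access
ACCESS_TIME_S = 48e-9       # 48 nanoseconds for those four clocks

clock_period_ns = ACCESS_TIME_S / CLOCKS_PER_ACCESS * 1e9
clock_mhz = 1 / (ACCESS_TIME_S / CLOCKS_PER_ACCESS) / 1e6
bandwidth_mbytes_per_s = BLOCK_BYTES / ACCESS_TIME_S / 1e6

print(f"clock period: {clock_period_ns:.0f} ns ({clock_mhz:.1f} MHz)")
print(f"memory bandwidth: {bandwidth_mbytes_per_s:.0f} Mbytes/s")
```

The result is roughly 1333 Mbytes per second per memory module, which the text rounds to 1300, and the implied 12-nanosecond clock period agrees with the 83.3 MHz clock speed quoted for the centerplane.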

The entire memory data path is protected by error correcting code
(ECC) mechanisms, and DIMM organization is specifically designed
such that each DRAM chip contributes only 1 bit to a 72-bit word of
data. Thus the complete failure of any single DRAM chip causes only
correctable single-bit errors.

High memory performance is ensured by offering extensive
interleaving. Interleaving is automatically configured across two, four,
or eight DIMM banks, depending on the installed memory
configuration. Four banks (one system board) is the default. An
interleave factor of eight gives better performance, but it prevents
the removal of a system board using Dynamic Reconfiguration. An
interleave factor of eight can still be specified if desired.


I/O Subsystem

Ultra Port Architecture


The Ultra Port Architecture (UPA) defines a separate address and data
system component interconnect for I/O. The UPA defines both the
processor and the I/O access interfaces to shared physical memory.

A UPA module logically plugs into a UPA port. The UPA module can
contain a processor, an I/O controller with interfaces to I/O buses, and
so forth. A UPA port uses separate packet-switched address and data
buses, and the address and data paths operate independently,
providing significantly better performance.

On a normal bus-based system only about 70 percent of the possible
bandwidth is available for transferring data, with the rest being used
for address and control functions. Separating these functions gives
addresses and data each 100 percent of the possible bandwidth
on their separate paths, and enables the implementation of each
function to be optimized differently.


Logical View

[Figure: logical view of the I/O module. Labeled elements: the data
side of the UPA with the Enterprise 10000 data buffer, the address side
of the UPA with the port controller (PC), and two SYSIO chips, each
with two SBus cards.]

The I/O daughter card contains the SYSIO application-specific
integrated circuits (ASICs), which provide the UPA to SBus interface,
and connections for 2 SBus cards. Each system board has two I/O
daughter cards. With 4 SBus slots per system board, this provides
64 SBus slots in a fully configured Enterprise 10000 system.

PCI attaches through the SYSIO chip as well, using a PCI version
instead of an SBus version. With 2 PCI slots per system board, you can
have 32 PCI slots in a fully configured system.

No SBus slots are preassigned, and there are no new bandwidth
restrictions on the SBus. Full 64-bit, 25 MHz SBus cards are supported.
High speed 64-bit SBus cards, and certain other SBus cards, however,
must always be placed in slot 0. They may not work properly in slot 1.
You may not be able to use SBus slot 1 in some configurations.


JTAG

The Joint Test Action Group (JTAG) technology, also known as
boundary scan, is an Institute of Electrical and Electronics Engineers
(IEEE) standard used for testing VLSI (Very Large Scale Integration)
electronics. It uses a four-wire serial interface to essentially enable a
board or component to test itself instead of requiring special testers.

Sun uses an extended form of JTAG, called JTAG+, to test, configure,
and monitor the Enterprise 10000. Most SSP applications and daemons
access the system through JTAG. All SSP application JTAG requests are
sent through the SSP Control Board Server daemon (cbs).

cbs sends the JTAG commands over TCP/IP to the Control Board
Executive (cbe) running on the control board. cbe monitors and
controls the Enterprise 10000 hardware under the direction of various
SSP applications.

The JTAG interface supports setting and monitoring various system
components, monitoring temperatures and power, measuring processor
module core resistors, and other functions.


Support Boards

The Enterprise 10000 system has two types of support boards in
addition to the system boards.

Two centerplane support boards (CSB) (both are required) supply
system clock signals and power to the centerplane.

The control board (CB) generates clock signals, JTAG scan signals and
control, and provides an Ethernet interface to SSP from the system.
Only one Control Board is required. A second Control Board may be
installed for redundancy, although only one may be active at a time.
Switching Control Boards requires a reset of the entire platform.


[Figure: a system board, a centerplane support board, and a control
board on one side of the centerplane.]

The other Centerplane Support and Control boards are directly behind
those shown (on the other side of the centerplane).


Control Board

The control board supports communication between the SSP and
Enterprise 10000 processors, and controls the support subsystems such
as power and cooling. Additionally, it monitors the health of the
system by using the JTAG hardware located on each system board.

The control board is a self-contained computer system consisting of a
SPARClite™ processor, memory, serial interface, Ethernet controller,
JTAG controller, 10BASE-T interface, and reset and control logic. It
controls the system JTAG, clock, fan, system and I/O power, and
Ethernet interface functions.

The control board contains a flash programmable, read-only memory
(PROM) that holds its initialization program and the network boot
support.


The Ethernet controller provides the link between the SSP and the
control board. The JTAG controller scans and controls the power to all
of the Enterprise 10000 components. The reset and control logic
performs various functions, such as monitoring the ambient air in the
inlet airstream and maintaining the system heartbeat mechanism.


Enterprise 10000 Client-Server Architecture

The Enterprise 10000 control board interface is accessed over an
Ethernet connection using TCP/IP. The control board executive, cbe,
runs on the control board; and the control board server, cbs, on the
SSP makes service requests to it. The SSP control board server, itself
a client of cbe, is a server to other SSP clients.


System Failure Isolation Capabilities

Some of the special system reliability features of the Enterprise 10000
design include:

● Point-to-point buses mean that a single board failure cannot bring
down a whole bus.

● Degraded interconnect operation is possible; the system

    ● Can run on four, three, two, or one address buses

    ● Can run on half the data bus

● Error correcting code (ECC) protection is provided on all
centerplane address and data paths.

● History buffers in the interconnect chips are used for pinpointing
where in a data or address path an error occurred.

● Two centerplane support boards

● Optional redundant control boards


● Modular power supplies with hot swapping

● Availability of at least one more power supply than needed

● On-board DC-to-DC converters


Check Your Progress

Before continuing on to the next module, check that you are able to
accomplish or answer the following:

❑ Describe the construction of the Enterprise 10000.

❑ Describe the system board interface.

❑ Explain the system board structure.

❑ Recount centerplane operation.

❑ Explain the control board structure.

❑ Explain component packaging.

❑ Understand the function of JTAG.

❑ Describe the SSP interaction with the Enterprise 10000.


Think Beyond

Why was the Enterprise 10000 built in such a modular fashion?

What do you think the system administration challenges will be for
the Enterprise 10000? The operational challenges?

Why were the control boards included in the system?

Why was so much attention paid to reliability and on-line repair?

SSP Software Installation 3

Course Map
This module describes how to install and configure the software
required on the SSP. It covers both Solaris and the SSP software for the
Enterprise 10000. It also describes how to boot and operate the SSP.


Relevance

For Discussion – The following questions are relevant to
understanding the content of this module:

1. What needs to be done to install the OS on the SSP?

2. What special software needs to be installed on the SSP?

3. What order dependencies might there be in preparing the SSP?

4. How does the installation of a main SSP differ from the spare SSP?

5. How do we maintain the ID PROM replacement files for the
Enterprise 10000 domains? What are they?


Objectives

Upon completion of this module, you will be able to:

● Plan the network for an Enterprise 10000 and its SSP

● Describe the software used on the SSP

● Describe the SSP configuration options

● Understand the restrictions of the SSP

● Completely install the SSP software

● Perform basic SSP commands and procedures

● Describe how to change the control board configuration.

References

Additional resources – The following references can provide
additional details on the topics discussed in this module:

● Ultra Enterprise 10000 SSP 3.1 User’s Guide

● Ultra Enterprise 10000 SSP 3.1 Release Notes

● Sun Enterprise 10000 System Overview Manual

● Sun Enterprise 10000 System Hardware Installation and De-Installation
Guide

● The man pages for the commands, daemons, and files


Enterprise 10000 Network Planning

Before you can install your system, you will need to know what your
network should look like. You have a large number of Enterprise 10000
system components that require network addresses:

● The primary and backup SSP

● One or two control boards

● Up to eight domains

Some of these will require multiple addresses. The SSP can have as
many as four subnets, and the domains at least two.

There are several different configurations that will work, but all have a
common factor: the control boards must be isolated from other
network traffic. The control boards are very sensitive to delay, and too
much interference could cause the Enterprise 10000 system to fail.

Caution – Always isolate the control boards on their own network.



Enterprise 10000 Network Configurations

The basic choices that you will have to make for the network
configuration are:

● Will the control boards be on an individual or a shared network?

● Will the domain-to-SSP interfaces be on a private or public
network?

● Will a backup SSP be configured?

Once these issues are decided, you can begin to assign hostnames and
Internet Protocol (IP) addresses and configure the network for the
system.

When making a decision, use the general principle that the more
isolation there is, the better.

The configuration of a spare SSP is easily accomplished after the basic
configuration decisions are made.


Control Board Configurations

The control boards can be configured two different ways. They can
share a subnet, or each can have its own subnet.

On a shared subnet, one host interface from the SSP is required. Two
are required for dual subnets.

Each control board and SSP network interface must have a unique host
name and IP address.

These connections must be on separate hubs from the control boards.


Domain Network Configurations

The domain network interfaces can be configured two different ways.
They can share a subnet, or each can have its own subnet.

In the public network configuration, one host interface from the SSP
and each domain is required. In the private network configuration, two
are required from each. Each domain and SSP network interface must
have a unique host name and IP address. This means that the SSP may
have a different hostname from the domains than it does from the
external network.

These connections must be on separate hubs from the control boards.
Remember to create the /etc/notrouter file on the SSP to prevent the
SSP from routing domain traffic.

SSP Privacy
You may or may not want to have the SSPs directly accessible from an
external network. If not, you must use the private network
configuration, and do not connect the interfaces shown with the
dashed lines.


Sample hosts File

The following is a sample /etc/inet/hosts file that shows all of the
SSP and domain connections for a dual control board subnet, public
domain network configuration.
# Internet host table
127.0.0.1 localhost loghost
#
# Control boards
192.9.201.10 jefferson # CB 0
192.9.201.70 madison # CB 1
# SSP interfaces
12.1.1.248 franklin # Outside interface
192.9.201.129 franklin-d # Private domain interface
192.9.201.15 franklin-c0 # CB 0 dedicated interface
192.9.201.75 franklin-c1 # CB 1 dedicated interface
12.1.1.248 hamilton # Backup SSP outside interface
192.9.201.130 hamilton-d # Backup SSP domain interface
192.9.201.16 hamilton-c0 # CB 0 dedicated interface
192.9.201.76 hamilton-c1 # CB 1 dedicated interface
# Private domain interfaces to the SSP
192.9.201.150 washington-ssp
192.9.201.151 addams-ssp
192.9.201.152 jackson-ssp
192.9.201.153 lincoln-ssp
192.9.201.154 grant-ssp
192.9.201.155 kennedy-ssp
192.9.201.156 roosevelt-ssp
192.9.201.157 chase-ssp
# Domain interfaces to the outside world
12.1.1.230 washington
12.1.1.231 addams
12.1.1.232 jackson
12.1.1.233 lincoln
12.1.1.234 grant
12.1.1.235 kennedy
12.1.1.236 roosevelt
12.1.1.237 chase

● The netmask for the SSP “internal” network 192.9.201 is
255.255.255.192.

● In this configuration, /etc/notrouter is created so that the SSP
will not route traffic from outside to the domain private interfaces.
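
To see how the 255.255.255.192 netmask divides the 192.9.201 network, the addresses from the sample hosts file can be grouped by subnet. The Python sketch below uses the standard ipaddress module and copies a representative set of entries from the sample file; the function name group_by_subnet is invented for this example.

```python
import ipaddress
from collections import defaultdict

NETMASK = "255.255.255.192"

# A selection of entries copied from the sample hosts file above.
INTERNAL_HOSTS = {
    "jefferson": "192.9.201.10",       # CB 0
    "franklin-c0": "192.9.201.15",
    "hamilton-c0": "192.9.201.16",
    "madison": "192.9.201.70",         # CB 1
    "franklin-c1": "192.9.201.75",
    "hamilton-c1": "192.9.201.76",
    "franklin-d": "192.9.201.129",
    "hamilton-d": "192.9.201.130",
    "washington-ssp": "192.9.201.150",
    "chase-ssp": "192.9.201.157",
}


def group_by_subnet(hosts: dict[str, str], netmask: str) -> dict[str, list[str]]:
    """Map each subnet (in prefix notation) to the hostnames that fall in it."""
    groups = defaultdict(list)
    for name, address in hosts.items():
        # strict=False lets a host address stand in for its network.
        network = ipaddress.ip_network(f"{address}/{netmask}", strict=False)
        groups[str(network)].append(name)
    return dict(groups)


for subnet, names in group_by_subnet(INTERNAL_HOSTS, NETMASK).items():
    print(subnet, names)
```

With this netmask the sample addresses fall into three 64-address subnets: 192.9.201.0/26 for control board 0 and its dedicated SSP interfaces, 192.9.201.64/26 for control board 1, and 192.9.201.128/26 for the private domain interfaces.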


Enterprise 10000 Network Planning Worksheet


[Worksheet figure. It provides blanks for the following entries.]

SSP 0 and SSP 1 (one set of entries for each SSP; the figure also
labels a QFE card)
    le0, Subnet 1 (CB0_subnet):   Hostname, IP address, CB0_subnet netmask
    hme0, Subnet 3 (dom_subnet):  Hostname, IP address, dom_subnet netmask
    hme1, Subnet 2 (CB1_subnet):  Hostname, IP address, CB1_subnet netmask

Enterprise 10000 server
    Platform name
    CB0, Subnet 1 (Hub 0):  Hostname, IP address, CB0_subnet netmask
    CB1, Subnet 2 (Hub 1):  Hostname, IP address, CB1_subnet netmask
    Domain 1 through Domain 8, Enet port, Subnet 3 (dom_subnet):
        Domain name, Hostname, IP address, dom_subnet netmask

Customer net (dom_subnet)
    NIS/NIS+ domain name
    DNS domain
    CB0_subnet netmask, dom_subnet netmask, CB1_subnet netmask

Notes:

• Netmasks must be the same within a subnet.
• Each hostname must be unique.
• Each IP address must be unique but within the respective subnet.
• Each control board must be on a separate subnet.
• To avoid confusion, for each domain, the domain name and
hostname should be the same.

This worksheet, taken from the Sun Enterprise 10000 System Hardware
Installation and De-Installation Guide, will help you plan your Enterprise
10000 control networks.
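
The worksheet rules that every hostname and every IP address must be unique are easy to check mechanically once the plan is written as a hosts file. The Python sketch below is one possible way to do that, not a Sun-supplied tool; the function names parse_hosts and find_duplicates are invented here, and the short SAMPLE text reuses a few lines from the sample hosts file in this module.

```python
from collections import defaultdict


def parse_hosts(text: str):
    """Yield (address, names) for each entry, skipping comments and blank lines."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        if names:
            yield address, names


def find_duplicates(text: str):
    """Return (addresses used by several entries, names used by several entries)."""
    names_by_address = defaultdict(list)
    addresses_by_name = defaultdict(list)
    for address, names in parse_hosts(text):
        # Aliases on the same line are fine; only repeated entries count.
        names_by_address[address].append(names[0])
        for name in names:
            addresses_by_name[name].append(address)
    dup_addresses = {a: n for a, n in names_by_address.items() if len(n) > 1}
    dup_names = {n: a for n, a in addresses_by_name.items() if len(a) > 1}
    return dup_addresses, dup_names


SAMPLE = """\
127.0.0.1 localhost loghost
12.1.1.248 franklin # Outside interface
12.1.1.248 hamilton # Backup SSP outside interface
12.1.1.230 washington
"""

dup_addresses, dup_names = find_duplicates(SAMPLE)
print("duplicate addresses:", dup_addresses)
print("duplicate names:", dup_names)
```

Run against these lines, the check reports 12.1.1.248 for both franklin and hamilton. In a real deployment the main and backup SSP would each need their own outside address to satisfy the uniqueness rule.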


The SSP Software

The SSP software normally comes preinstalled from the factory on the
SSP workstation. All that is required to prepare it for use with the
Enterprise 10000 system it accompanies is to reply to the configuration
questions the first time that it boots up.

However, should you lose the SSP system, you may need to reinstall
the system.

Remember that the sole function of the SSP system is to monitor and
control the Enterprise 10000 host system. It should be used for no
other function. The SSP is constantly monitoring the Enterprise 10000
host through the control boards and information from the active
domains. It must be available at any time to handle conditions that
could arise on the host. Never run any other applications on the SSP.

Installation of Solaris on the SSP is intentionally tailored to make it
difficult to use the SSP for any other function.


Regardless of the level of Solaris running in your domains, the SSP
software will only operate on Solaris 2.5.1. You must install Solaris
2.5.1 Hardware 4/97 or higher on your SSP system. Solaris 2.6 is not
supported.

SSP Accounts
There are two accounts on the SSP, root and ssp.

The root account is used to manage the SSP itself and is created when
Solaris is installed.

The ssp account is created when the SSP software is installed and is
used to control the Enterprise 10000 host and its domains. The default
password created for the ssp account by the SSP install process is ssp.
The install process also installs .cshrc and .login files. The account
assumes that it is running the C shell; do not modify this default.


The SSP Packages

To prepare the SSP, you must install all 12 of the following packages
on the SSP. These packages must be installed in a specific order. They
are given here in alphabetical order only for reference.

The SSP packages are provided on the System Service Processor (SSP)
3.1 for the Ultra Enterprise 10000 CD-ROM. Make sure that you apply
the appropriate current patches before attempting to communicate
with the Enterprise 10000 platform.


● SUNWsspdf – Data Files

● SUNWsspdo – SSP Domain Utilities

● SUNWsspdr – Dynamic Reconfiguration Utilities

● SUNWsspid – Inter-Domain Networking

● SUNWsspmn – Man pages

● SUNWsspob – Open Boot Prom Utilities

● SUNWsspop – Core Utilities

● SUNWssppo – POST Utilities

● SUNWsspr – SSP, Root

● SUNWsspst – Scan Tests

● SUNWsspue – User Environment

● SUNWuessp – SSP 3.0 AnswerBook


SSP Software Environment Variables

A number of environment variables are set for the ssp account on the
SSP. These specify locations for the SSP files.

● $SUNW_HOSTNAME

No default; it must be set to the name of the domain being
controlled.

● $SSPETC

This variable has the value /etc/opt/SUNWssp, and contains the
SSP startup scripts.


● $SSPVAR

This variable has the value /var/opt/SUNWssp, and contains:

● Platform and domain configuration files

● Log files

● Scan data files

● Daemon lock files

● $SSPLOGGER

This variable has the value /var/opt/SUNWssp/adm, and contains
log files.

● $SSPOPT

This variable has the value /opt/SUNWssp, and contains:

● SSP executables

● The man pages
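
The documented values can be compared with the current environment from a shell. The following is a minimal sketch, assuming a POSIX shell such as ksh; the function names ssp_default and check_ssp_env are hypothetical and are not part of the SSP software.

```shell
# Print the documented default value for an SSP environment variable.
# Returns non-zero for names without a default, such as SUNW_HOSTNAME.
ssp_default() {
    case "$1" in
        SSPETC)    echo /etc/opt/SUNWssp ;;
        SSPVAR)    echo /var/opt/SUNWssp ;;
        SSPLOGGER) echo /var/opt/SUNWssp/adm ;;
        SSPOPT)    echo /opt/SUNWssp ;;
        *)         return 1 ;;
    esac
}

# Compare each variable in the current environment with its documented value.
check_ssp_env() {
    for name in SSPETC SSPVAR SSPLOGGER SSPOPT; do
        expected=$(ssp_default "$name")
        eval "actual=\${$name:-}"
        if [ "$actual" = "$expected" ]; then
            echo "$name ok"
        else
            echo "$name differs: '$actual' (expected $expected)"
        fi
    done
    # SUNW_HOSTNAME has no default, so only report whether it is set.
    if [ -z "${SUNW_HOSTNAME:-}" ]; then
        echo "SUNW_HOSTNAME is not set"
    fi
}
```

Running check_ssp_env while logged in to the ssp account should report each variable as ok if the account is set up as described above.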


Saving the SSP Configuration Files

The SSP contains files that are difficult to rebuild if they are damaged
or lost. Back up these files on a regular basis. Remember that much of
the Enterprise 10000’s configuration information is loaded from the
SSP.

In the following steps, platform_name and domain_name refer to the
names of the platform and the currently defined domains for the SSP.
To view the currently defined domains, use the domain_status command
or inspect the $SSPVAR/.ssp_private/domain_config file.

1. Save the following files, which contain all of the eeprom.image
files shipped with your system:
$SSPVAR/.ssp_private/eeprom_save/*


2. If you have domains defined, the most current copy of the
eeprom.image for each domain is saved in a different location. For
each domain currently defined, save the following file:
$SSPVAR/etc/platform_name/domain_name/eeprom.image

Note – Save this file under the name eeprom.image.domain_name. These
files supersede some of the files saved in step 1.

3. If you have modified any of the following files, save them:
$HOME/.postrc
$HOME/.login
$HOME/.cshrc

4. If $SSPVAR/etc/platform_name/domain_name/.postrc
exists, save it as postrc.domain_name.

Do this for all domains that have a modified .postrc file.

5. The directory $SSPVAR/adm and its subdirectories contain log files
describing SSP system operation. These include messages files and
hardware state dump files for each domain. For future reference,
save the $SSPVAR/adm directory and its subdirectories.
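
The five steps above can be sketched as a single shell function. This is an illustration only, assuming a POSIX shell; the function name backup_ssp_config and its argument order are hypothetical, and the sketch assumes that every directory under $SSPVAR/etc/platform_name is a domain. It copies the startup files in step 3 whether or not they have been modified.

```shell
# backup_ssp_config SSPVAR PLATFORM DEST [HOME_DIR]
# Copies the files named in steps 1 through 5 into DEST.
backup_ssp_config() {
    sspvar=$1
    platform=$2
    dest=$3
    home=${4:-$HOME}

    mkdir -p "$dest/eeprom_save" "$dest/home" || return 1

    # Step 1: eeprom.image files shipped with the system.
    if [ -d "$sspvar/.ssp_private/eeprom_save" ]; then
        cp -r "$sspvar/.ssp_private/eeprom_save/." "$dest/eeprom_save/"
    fi

    # Steps 2 and 4: per-domain eeprom.image and .postrc, renamed per domain.
    # Assumption: each subdirectory of etc/PLATFORM is a domain directory.
    for dir in "$sspvar/etc/$platform"/*/; do
        [ -d "$dir" ] || continue
        domain=$(basename "$dir")
        if [ -f "$dir/eeprom.image" ]; then
            cp "$dir/eeprom.image" "$dest/eeprom.image.$domain"
        fi
        if [ -f "$dir/.postrc" ]; then
            cp "$dir/.postrc" "$dest/postrc.$domain"
        fi
    done

    # Step 3: startup files from the home directory, if present.
    for f in .postrc .login .cshrc; do
        if [ -f "$home/$f" ]; then
            cp "$home/$f" "$dest/home/"
        fi
    done

    # Step 5: the SSP log directory and its subdirectories.
    if [ -d "$sspvar/adm" ]; then
        cp -r "$sspvar/adm" "$dest/"
    fi
    return 0
}
```

A call such as backup_ssp_config "$SSPVAR" platform_name /some/backup/dir would collect the files in one place, ready to be written to tape or copied to another system.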


Installing and Configuring the SSP Solaris Software

Installing Solaris on the SSP is almost identical to a regular Solaris
installation. The software selection is tailored so that the SSP
performs only SSP functions.

You must install Solaris 2.5.1 at the Hardware 4/97 release or later.

The SSP should have a 1-Gbyte or larger disk.

1. Boot the installation CD-ROM on the SSP workstation.

2. When the Upgrade System? dialog is displayed, choose Initial.

3. When the System Type dialog is displayed, choose Standalone.

4. When the Software dialog is displayed, choose End User System
Support, then choose Customize.


5. When the Customize Software dialog is displayed, add the
following clusters and packages:

● Archive Libraries

● Basic Networking

● Expand the Font Server Cluster and add X Windows Optional Fonts

● Graphics Headers

● On-Line Manual Pages

● Point-to-Point Protocol

● Programming Tools and Libraries

Initially, not all of the packages in this cluster are selected.
Click twice on the cluster’s selection box to select all of the
packages.

● SunOS Header Files

● Tooltalk™ End User

Initially, not all of the packages in this cluster are selected. To
select all of the packages, click twice on the cluster’s selection
box.


6. While the Customize Software dialog is still displayed, delete the
following clusters and packages:

● Expand the OpenWindows Version 3 Cluster, and delete the
following packages:

● OpenWindows binary compatibility

● OpenWindows on-line handbooks

● Expand the Volume Management Cluster and delete the
Volume Management Graphical User Interface package.

● Delete the entire cluster of XGL™ Runtime libraries and files.

● Expand the XIL™ Runtime Environment Cluster, and delete
the following packages:

● XIL English Localization

● XIL Loadable Pipeline Library

7. Confirm that no entries are shown in the Unresolved Software
Dependencies area of the Customize Software dialog, then click
on OK.

8. When the Software dialog is displayed again, click on Continue.

9. When the Disks dialog is displayed, choose the disk on which the
software is to be installed, click on Add, then click on Continue.

10. When the Preserve Data? dialog is displayed, click on Continue.

11. When the Automatically Layout File Systems? dialog is displayed,
choose Manual Layout.

12. When the File System and Disk Layout dialog is displayed, choose
Customize.


13. In the Customize Disk screen, set up the disk partitions for the
root disk, and click on OK when you are done.

It is recommended that you allocate 256 Mbytes for swap
(partition 1), and leave the remainder in partition 0 for root (/).
You should have at least 745 Mbytes left.

14. When the File System and Disk Layout dialog is displayed again,
choose Continue if the layout is correct; otherwise, choose
Customize and go back to Step 13.

15. When the Mount Remote File Systems? dialog is displayed, choose
Continue.

16. When the Profile dialog is displayed, confirm your selections and
choose Begin Installation.

17. Choose Reboot to begin software installation.


This step, which installs the software and the current set of
patches, takes approximately 80 minutes to complete. When the
installation is complete, the SSP will reboot.


Installing the xntp Software

The public domain NTP software has been adapted to work on Solaris
to allow synchronization of the clocks between the SSP and the
domains, which is necessary for DR.

The version of NTP shipped with Solaris 2.5.1 works only in the
Enterprise 10000 environment. The 2.6 version works in a general
configuration. You can use the 2.5.1 and 2.6 versions together.

1. Insert the CD-ROM entitled Updates for Solaris Operating
Environment 2.5.1 Hardware 4/97 or a later 2.5.1 version, and install
the SUNWxntp package.
ssp# cd /cdrom/upd_sol_2_5_1_hw_497_smcc/SMCC
ssp# pkgadd -d. SUNWxntp


2. Configure the SSP to act as a time server for the host domains.

The default xntp configuration file, /etc/opt/SUNWxntp/ntp.conf,
configures the SSP to use its own clock as the time source to which
other systems running xntp can synchronize their clocks. You can use
this default configuration without changes.

If your network is currently running xntp and you want the SSP time
to be synchronized by other time servers, modify the SSP xntp
configuration files (as required) for your network configuration.

For more information on xntp, including information on accessing
very accurate network time hosts, see Appendix A, “Configuring
NTP.”


Preparing the System Files

1. Update your name service hosts registry (Network Information
Service [NIS or NIS+], /etc/hosts, or Domain Name Service
[DNS]) to include the host names and IP addresses of your control
board(s).

Note – These are not the host names and IP addresses of your
domains.

2. Update your name service ethers registry (NIS, NIS+, or
/etc/ethers) to include the host names and Ethernet addresses
of your control board(s).


3. Ensure that /etc/nsswitch.conf is properly configured.

If the SSP is using NIS or NIS+, you must edit the
/etc/nsswitch.conf file on the SSP to force it to use its local
/etc/ethers, /etc/bootparams, /etc/services, and
/etc/netmasks files before the NIS or NIS+ files.
Correct entries should appear as follows (for NIS):
ethers: files nis
netmasks: files nis
bootparams: files nis
services: files nis
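
The entries shown above can be verified with a short script. This is a minimal sketch, assuming a POSIX shell and an awk that accepts the -v option (nawk on Solaris); the function names files_first and check_nsswitch are hypothetical.

```shell
# files_first FILE DATABASE
# Succeeds when "files" is the first source listed for DATABASE in FILE.
files_first() {
    awk -v db="$2:" '
        $1 == db { found = 1; ok = ($2 == "files") }
        END      { exit !(found && ok) }
    ' "$1"
}

# check_nsswitch FILE
# Reports on the four databases that the SSP must resolve locally first.
check_nsswitch() {
    status=0
    for db in ethers netmasks bootparams services; do
        if files_first "$1" "$db"; then
            echo "$db: ok"
        else
            echo "$db: files is not listed first"
            status=1
        fi
    done
    return $status
}
```

Running check_nsswitch /etc/nsswitch.conf on the SSP prints one line for each database and returns a non-zero status if any entry needs to be corrected.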


Installing the SSP Software Packages

Make sure that the system is in single-user mode or at run level 1.

To install the SSP software packages from the SSP CD-ROM:

1. Insert the SSP CD-ROM into the CD-ROM drive.

2. Change to the /cdrom/ssp_3_1 directory.

3. Add the SSP packages in the following order:


ssp# pkgadd -d. SUNWsspue SUNWsspop SUNWsspmn SUNWsspdf SUNWsspst \
SUNWsspr SUNWsspdo SUNWsspob SUNWssppo SUNWsspid SUNWsspdr SUNWuessp

Caution – Install the SUNWssp packages in the order shown above.
Installing them in any other order causes problems with the SSP
configuration process.

4. After installing the SSP packages, reboot the SSP system.


Caution – Whenever you reboot the SSP, wait 3 minutes before you
perform any SSP commands. In the current release, this delay is
needed to enable the SSP software initialization process to complete.
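
Before running pkgadd, it can help to confirm that all 12 packages are present in the source directory. The following is a minimal sketch, assuming a POSIX shell and assuming that each package is stored as a directory named after the package, as is usual for a pkgadd source directory; the names SSP_PACKAGES and missing_ssp_packages are hypothetical.

```shell
# The 12 SSP packages in the required installation order.
SSP_PACKAGES="SUNWsspue SUNWsspop SUNWsspmn SUNWsspdf SUNWsspst SUNWsspr \
SUNWsspdo SUNWsspob SUNWssppo SUNWsspid SUNWsspdr SUNWuessp"

# missing_ssp_packages DIR
# Prints the name of each package that is not found under DIR.
# Prints nothing when all 12 packages are present.
missing_ssp_packages() {
    for pkg in $SSP_PACKAGES; do
        if [ ! -d "$1/$pkg" ]; then
            echo "$pkg"
        fi
    done
}
```

If missing_ssp_packages /cdrom/ssp_3_1 prints nothing, the packages can be passed to pkgadd in the order held in SSP_PACKAGES.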


Configuring the SSP Environment

If the SSP software was installed at the factory, the steps in this and
the following sections are the ones that you must perform to prepare
the SSP.

When the SSP system boots for the first time with the SSP software
installed, you are asked during the boot process for configuration
information about your Enterprise 10000 host.

You are asked the same questions if you run ssp_config.


Make sure that you have the following information available:

● Enterprise 10000 platform name

● Control board 0 host name and IP address

● Control board 1 host name and IP address (if present)

● Whether this is the main or a spare SSP

The system will request this information during the SSP boot process.

Responding to the Questions


1. Enter the name of the platform that this SSP will service.

The platform name is simply a name by which the SSP software
refers to the entire Enterprise 10000 host. The platform name is not
the same as the host name of a domain. A domain and a platform
may have the same name, because the domain and platform
names occupy different name spaces. The platform name is not in
the name service, and is not visible outside the SSP.

Note – If you make a mistake during this configuration session,
continue to the end of the prompts, where you will be given an
opportunity to correct any errors.

2. Define the control boards.

For each control board slot, indicate whether there is a control
board present and provide the host name for the respective control
board. If the IP address for a control board is not found, you will
be prompted for this information. If two control boards are
present, you will be asked which control board is the primary
(active) control board.


Here is a representative session:

Beginning setup of this workstation to act as a MAIN or SPARE SSP.

The platform name identifies the entire host machine to
the SSP software. The platform name occupies a different
name space than domain names(hostnames of bootable systems).

Please enter the name of the platform this ssp will service: presidents
Do you have a control board 0? (y/n): y
Please enter the host name of the control board 0 [presidentscb0]: jefferson
Do you have a control board 1? (y/n): y
Please enter the host name of the control board 1 [presidentscb1]: madison

Please identify the primary control board.

Is Control Board 0 [jefferson] the primary? (y/n) y

Platform name = presidents
Control Board 0 = jefferson => 192.9.200.90
Control Board 1 = madison => 192.9.200.120
Primary Control Board = jefferson

Is this correct? (y/n): y

Are you currently configuring the MAIN SSP? (y/n)

● For the main SSP, reply y to this prompt and you will see:
MAIN SSP configuration completed.
● For a spare SSP, reply n to this prompt, and you will see:
SPARE SSP configuration completed.

If you are configuring a spare SSP, you have finished its configuration
at this point.


Connecting to the Enterprise 10000 Host System

1. Log in as user ssp. The ssp account is created by the SSP
software installation process. The default ssp account password is ssp.

2. When prompted for the SUNW_HOSTNAME, use the platform name
for the Enterprise 10000 system.

3. Ensure that the Enterprise 10000 system is powered on. Use the
power command.
ssp% power -on -all


Reconfiguring the SSP

If you need to change the SSP platform configuration information, you
can run the ssp_config command as root.
# ssp_config

ssp_config is also used to initially configure the SSP, and asks you for
the same information that it did just after you installed the SSP
packages. Reboot the system after running it.

ssp_config does not, however, make any changes to the SSP’s Solaris
system identity as sys-unconfig does. You can run sys-unconfig to
change the SSP’s identity without needing to rerun ssp_config.

You can also use ssp_config if you need to change the characteristics
of the control boards. This is discussed in Module 5, "Domains."


Changing the SSP Type

The ssp_config command must be run as root.

Switching to Spare
To switch the main SSP to spare status, enter
# ssp_config

with no operands. This will remove the SSP daemon inittab line and
kill any active SSP daemons. You should then immediately configure a
spare SSP to become the new main SSP.

Warning – Do not start two SSPs with both active as main. This may
confuse the control board, requiring you to reset it and thus resetting
any active domains.


Switching to Main
To change an SSP from a spare to the main SSP, ensure the main SSP is
shut down, then on the spare SSP enter
# ssp_config spare

This changes /etc/inittab to start the SSP daemons on the new
main SSP system. The SSP daemons are started immediately.

Note – On every domain, the /etc/ssphostname file must contain the
new SSP’s host name. All domains must reference the same new SSP.

For more information on switching SSP systems, see the Sun Enterprise
10000 System Hardware Installation and De-Installation Guide.

/etc/inittab
The spare SSP contains a file named /etc/inittab.main that contains
the line to activate the SSP daemons. If you change the default
/etc/inittab, remember to change this file as well.
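
Because switching an SSP to spare removes the SSP daemon line from /etc/inittab and switching it to main adds the line, the presence of that line gives a quick hint about how an SSP is currently configured. The following is a minimal sketch, assuming a POSIX shell; the function name ssp_role is hypothetical, and the check looks only for an sp entry that invokes ssp_startup.sh. It does not confirm that the daemons are actually running.

```shell
# ssp_role INITTAB_FILE
# Prints "main" when the file contains the sp entry that runs
# ssp_startup.sh, and "spare" otherwise.
ssp_role() {
    if grep '^sp:.*ssp_startup\.sh' "$1" >/dev/null 2>&1; then
        echo main
    else
        echo spare
    fi
}
```

For example, ssp_role /etc/inittab can be run on each SSP before a switch to confirm that only one of them is configured as main.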

The line added to the end of /etc/inittab to start the SSP daemons
is:
sp:234:respawn:su - ssp -fc /etc/opt/SUNWssp/ssp_startup.sh 15 \
>/dev/null 2>&1 </dev/null # SUNWsspr


Dual Control Boards

A system should be configured with dual control boards for backup
purposes. Although you can manually switch between the control
boards, only one board is active at a time. Should a control board fail,
the backup board may be brought into service, avoiding a prolonged
outage. An inactive control board may be hot swapped.

One of the control boards is identified as the default or primary
control board during SSP configuration. The SSP will only
communicate with this control board, even if the alternate is installed.

If you must switch the primary control board because of a connection
failure or for other reasons, you must modify the control board
configuration file and reboot the SSP.

Note that this operation cannot be performed without rebooting all
running domains, because the control board is providing the
high-frequency system clocks for all boards in the platform. Changing
the control board forces a reset of all domains on the Enterprise 10000.


Control Board Executive (cbe)


When the system powers up, both control boards boot from the SSP.
Once cbe (the control board firmware) is booted, it waits indefinitely
for a connection from the SSP cbs daemon.

After the SSP is booted, cbs is started automatically. cbs is responsible
for all communication between the SSP and the primary control board.

cbs connects to the control board specified as the primary in the
control board configuration file and makes it the system master control
board. To connect to the other control board, you must update the
configuration file.


The Control Board Configuration File

The control board configuration file is named
$SSPVAR/.ssp_private/cb_config. It contains only one line.

The format of the line is:

platform_name:platform_type:cb0:status0:cb1:status1

where:

platform_name – The Enterprise 10000 platform name assigned at
installation time.

platform_type – Always Ultra-Enterprise-10000.

cb0 – Control Board 0 host name, if one is installed.

status0 – Indicates if control board 0 is the primary. P indicates
primary; anything else or blank indicates alternate.

cb1:status1 – Control Board 1 host name, if one is installed, and its
status.


For example:
presidents:Ultra-Enterprise-10000:jefferson:P:madison:

This example shows that there are two control boards installed in the
presidents platform. They have host names jefferson (which is the
primary) and madison.

Caution – Do not try to change the primary control board designation
by editing this file. Editing the file is not sufficient, and may cause
your domains to fail.
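
Reading the file is safe, and the colon-separated format makes it easy to extract the primary control board. The following is a minimal sketch, assuming a POSIX shell; the function name cb_primary is hypothetical. It only parses a line in the format described above and never modifies cb_config.

```shell
# cb_primary LINE
# Prints the host name of the control board whose status field is P.
# Returns non-zero if neither control board is marked as primary.
cb_primary() {
    echo "$1" | awk -F: '{
        if ($4 == "P")      print $3    # cb0 is the primary
        else if ($6 == "P") print $5    # cb1 is the primary
        else                exit 1
    }'
}
```

With the example line shown earlier, cb_primary 'presidents:Ultra-Enterprise-10000:jefferson:P:madison:' prints jefferson. On an SSP, the line could be supplied with "$(cat $SSPVAR/.ssp_private/cb_config)".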


Switching the Active Control Board

If you have dual control boards, you can switch the primary control
board. Doing so requires you to either power off the entire platform
or delete all of your domains, reconfigure the SSP, and then restore
system power or re-create the domains.

1. If possible, power off all system components except the control
boards. If you cannot power them off, use domain_remove to
delete all your domains.

2. Update your name service for the new control board addresses if
necessary. This may include a new MAC address if you have replaced a
control board.

3. Run ssp_config cb from the SSP, specifying the new control
board configuration. This is the same procedure performed when
booting the SSP for the first time. Do this on the main SSP and all
spare SSPs for the platform.

4. Reboot the SSP(s).


5. Using hostview, make sure that the J and C symbols appear in the
control board squares of the hostview display, which signifies that
the control boards are active.

6. Power the platform back on or re-create the domains, depending
on what you did in step 1.

7. Use bringup to bring up your domains.

Caution – Do not edit the cb_config file in place of this procedure.

Determining the Active Control Board
If you are unsure which control board is active, there are several
methods you can use to make the determination.

1. You can use Hostview. The active control board is the one
containing the J and C characters.

2. You can use snoop to watch the network traffic on the control
board subnet(s). The active control board is the only one that is
sending regular messages.

3. You can physically inspect the control boards. A control board
running the cbe correctly, whether or not it is the active control
board, has lights SW0 through SW6 rolling up and down. If the
control board is operational on the network, light SW7 is on. The
active control board has the JBC PORT CLAIM light on, just
underneath SW7.


Control Board Executive Image and Port Specification Files

The SSP is the boot server for the control board. Two files are
downloaded by the control board boot PROM using tftp during boot
time: the image of cbe and the port number specification file. These
files are located in /tftpboot on the SSP. Their naming conventions
are:
/tftpboot/XXXXXXXX – For the cbe image.
/tftpboot/XXXXXXXX.cb_port – For the port number.

where XXXXXXXX is the control board IP address in hex format.

For example, the /tftpboot files for a control board with an IP
address of 129.153.49.147 are:
/tftpboot/81993193
/tftpboot/81993193.cb_port
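
The file name is formed by converting each of the four octets of the IP address to two hexadecimal digits. The following is a minimal sketch of that conversion, assuming a POSIX shell; the function name ip_to_hex is hypothetical, and the use of uppercase hexadecimal letters is an assumption, because the example above contains digits only.

```shell
# ip_to_hex DOTTED_IP
# Prints the address as eight hexadecimal digits, two per octet.
# Assumption: letters are printed in uppercase.
ip_to_hex() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}
```

For the address in the example, ip_to_hex 129.153.49.147 prints 81993193, which matches the /tftpboot file names shown above.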


Changing the Control Board Configuration

Warning – For these changes to take effect, you must reset both
control boards. Remember that this will reset all active domains.

You can add a new control board or change the host name and IP
address of existing control boards. This is done with the ssp_config
cb command. The appropriate files in /tftpboot will be updated.

1. Shut down all running domains.

2. As root on the main and spare SSPs:

a. Update your name service(s) for the new control board names
and addresses.

b. Run ssp_config cb on the SSP.

c. Reply with the proper control board information.

3. Run cb_reset to reload the control boards from the main SSP.

4. Restart the domains.


Lab

1. Using the sample hosts file shown earlier in the module, diagram
that Enterprise 10000 network and fill out the network planning
worksheet using the provided information.

2. On your workstation:

a. Install Solaris, if necessary.

b. Install xntp from the SMCC updates CD-ROM, as shown in
“Installing the xntp Software.”

c. Configure the SSP for the lab host environment. Update the
hosts file from the handout information or a provided file. See
“Preparing the System Files” for more information.

d. Install the SSP packages, as shown in “Installing the SSP
Software Packages.”

e. Boot and properly configure the SSP, as shown in “Configuring
the SSP Environment.”

3. Use telnet to log in to the lab’s main SSP using the ssp account. On
that SSP:

a. Back up, then create, an eeprom.image file for your assigned
domain, as shown on pages 3-28 through 3-31.

b. Inspect the SSP system files and configuration.


Check Your Progress

Before continuing on to the next module, check that you are able to
accomplish or answer the following:

❑ Plan the network for an Enterprise 10000 and its SSP

❑ Describe the software used on the SSP

❑ Describe the SSP configuration options

❑ Understand the restrictions of the SSP

❑ Completely install the SSP software

❑ Perform basic SSP commands and procedures

❑ Describe how to change the control board configuration.


Think Beyond

Why is the SSP Solaris software profile edited the way it is?

What would happen if you used the SSP for purposes other than
monitoring the Enterprise 10000?

Why are the SSP packages order-dependent?

Why might you need to create or re-create eeprom.image files?

When would you use ssp_config or ssp_unconfig?

System Operation 4

Course Map
This module describes the commands and procedures used to operate
an Enterprise 10000 system. It discusses the interaction between the
SSP, the Enterprise 10000, and the domains, and the control of the
domains and the hardware. It also touches on error reporting, the
location of the various system logs, and security issues.


Relevance

For Discussion – The following questions are relevant to
understanding the content of this module:

1. What are the specific security issues related to the Enterprise
10000?

2. How does the SSP interact with the Enterprise 10000 platform?

3. How do you control the system hardware?

4. What is the relationship between Hostview and the command-line
commands?


Objectives

Upon completion of this module, you will be able to:

● Describe some of the Enterprise 10000 security issues

● Explain the functions of the SSP and host daemons

● Describe how the SSP and host daemons interact

● Understand Enterprise 10000 error reporting

● List the Enterprise 10000 SNMP interfaces

● Use all of the features of Hostview:

● Domain control

● Power control

● Fan control

● Log inspection

● Perform most of the Hostview functions from the command line

References

Additional resources – The following references can provide
additional details on the topics discussed in this module:

● Ultra Enterprise 10000 SSP 3.1 User’s Guide

● Sun Enterprise 10000 System Hardware Installation and De-Installation
Guide

● Sun Enterprise 10000 System Overview Manual

● The man pages for the commands and files


Security Considerations

Introduction
From a security perspective, the Ultra Enterprise 10000 presents a
variety of unique and interesting challenges.

General Comments on Security

Securing corporate computing resources such as the Enterprise 10000
requires considerable thought and planning. Because the Enterprise
10000 is a single corporate resource, it is important to consider where
it fits within the corporate security policy. It is equally important to
be aware that a system such as this may change the corporate security
policy.


The unique and flexible architecture of the Enterprise 10000 brings
some new security concerns to light. These concerns are the focus of
this section, which is not intended to completely address the issue of
securing corporate environments in general.

This discussion covers three aspects of security as it relates to the
Enterprise 10000: physical security, system security, and network
security.

For further information on security issues, check the CERT web page at
www.cert.org and attend the SC-380 course.

Physical Security
Physical security of the Enterprise 10000 is extremely important.
Because Dynamic Reconfiguration and Alternate Pathing make it possible
to remove components while the system remains operational,
unauthorized removal of components may occur.

Basic considerations:

● Secure the room within which the system is housed.

● Secure the SSP within the same environment, and ensure that all
networking connections are inside the environment as well.

● Generally, create a safe environment for the system using, for
example, raised floors (where appropriate), non-water based fire
extinguishing equipment, and so on.


System Security
System security on the Enterprise 10000 is complicated by its
domain architecture. Essentially, each configured domain of the
Enterprise 10000 is another "host" that must be secured.

Because the SSP controls the entire Enterprise 10000 system, its
security is also extremely important.

The following is a list of some items for consideration in each domain:

Users and Passwords

● Restrict users to their own accounts.

● Verify that each account has a unique UID (user ID).

● Verify that every account has a password.

● Ensure that all passwords are within acceptable security
guidelines.

● Use password aging.

● Disable direct root logins.

● Scrutinize /var/adm/authlog and /var/adm/sulog regularly.

● Use restricted shells for any "guest" accounts.
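
Two of the checks in this list can be automated with standard tools. The following is a minimal sketch, assuming a POSIX shell and the standard colon-separated passwd and shadow file formats; the function names duplicate_uids and accounts_without_password are hypothetical.

```shell
# duplicate_uids PASSWD_FILE
# Prints each UID that is assigned to more than one account.
duplicate_uids() {
    awk -F: '$1 !~ /^#/ && NF >= 3 { print $3 }' "$1" | sort -n | uniq -d
}

# accounts_without_password SHADOW_FILE
# Prints each account whose password field is empty.
accounts_without_password() {
    awk -F: '$2 == "" { print $1 }' "$1"
}
```

On a domain or on the SSP, these would typically be run as root against /etc/passwd and /etc/shadow. No output from either function means that the corresponding check passed.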

Make sure that you change the default ssp account password
immediately, and then continue to change it on a regular basis.

Make sure that you strictly limit the number of people who have
access to the SSP root and ssp accounts. Anyone with access to these
accounts can control all of the Enterprise 10000 domains.

Also, limit access to any accounts on the SSP to those who need to
control the Enterprise 10000. Using the SSP for other purposes can
significantly slow platform and domain support operations.


File Systems

● Define restrictive umask settings (such as 027).

● Eliminate setuid mounts wherever possible.

● Eliminate setuid scripts and programs wherever possible.

● Use utilities such as TIGER, COPS, or ASET to check for
unauthorized setuid and setgid files and other unauthorized
activity.

● Use utilities such as Tripwire to maintain a list of critical programs
and directories.

● Back up your file systems regularly, both on the SSP and for the
domains.
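
A basic inventory of setuid and setgid files can also be produced with find, independently of the utilities named above. The following is a minimal sketch, assuming a POSIX shell; the function name list_setid_files is hypothetical, and the sketch does not replace the more thorough checks that those utilities perform.

```shell
# list_setid_files DIR
# Prints every regular file under DIR that has the setuid (4000)
# or setgid (2000) permission bit set.
list_setid_files() {
    find "$1" -type f \( -perm -4000 -o -perm -2000 \) -print
}
```

Saving the output of list_setid_files / at regular intervals and comparing it with the previous run is one simple way to notice files that have gained these permissions unexpectedly.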


Network Security
The Enterprise 10000 has extensive general networking capabilities by
virtue of its multiple add-on SBus slots. Each domain can act as a
host or router independently of every other domain.

General network security issues for these domains are therefore
essentially the same as those for any host on the network. The general
topic of network security is complex and involved and is beyond the
scope of this course.

The unique network-related security issue with the Enterprise 10000 is
the private network between the SSP and the Enterprise 10000 itself.
Maintaining the privacy of this network is critical. Physical security of
this network is imperative.

Some considerations for the private SSP network are:

● Limit access to this network; if possible, restrict physical access to
the SSP.

● Construct a private network between the SSP, control boards, and
main domain network interface. Do not allow any other systems
on this network.

● Do not advertise the existence of this network.


SSP and Control Board Software Block Diagram

[Figure: block diagram of the SSP and control board software. The legend
distinguishes RPC, CBMP, and SNMP connections.]

Enterprise 10000 domains:

● Each domain runs Solaris and its own cvcd.

Enterprise 10000 hardware:

● Control board – Connects to the hardware through JTAG and runs cbe,
which is a slave to cbs.

SSP:

● netcon_server – Relays messages between netcon sessions and cvcd
or OBP.

● cbs – Controls all JTAG operations. Passes client requests to cbe.
Monitors cbe.

● straps – Listens for SNMP traps. Forwards messages to all connected
SNMP clients.

● snmpd – SNMP proxy agent. Manages the Enterprise 10000 database for
SNMP clients. Allows SNMP clients to monitor and control the database.

● edd – Uploads monitor scripts. Monitors Enterprise 10000 events.
Executes response action scripts.

● fad – Provides file locking services.

● machine_server – Services port registration requests from
netcon_server and snmpd. Services port lookup requests from various
clients. Routes error messages to the correct “messages” file.

● ssp_startup – Started by init. Starts daemons. Monitors daemons for
restart.

● hostview and other clients.

Files and data shown in the diagram:

● JTAG scan database: $SSPVAR/data/Ultra-Enterprise-10000

● MIB configuration and data: $SSPETC/snmp/

● cb_config, cb_port, domain_config, ssp_resource

● edd.emc, platform edd.erc, per domain edd.erc

● fad_files

● ssp_startup.main, ssp_startup.restart_main


Instances of Client Programs and Daemons

An Enterprise 10000 platform may host multiple domains, where each
domain runs its own copy of the operating system, independent of any
other domains. The client programs and daemons running on the SSP
are divided into three categories, depending on how many instances
are created for a platform and its domains.


Platform Clients

Non-Specific Platform Clients

For certain clients and daemons, exactly one instance is created on the
SSP, without regard to the platform or the number of domains that
exist on the platform. For these clients and daemons, the setting of the
environment variable SUNW_HOSTNAME is irrelevant. One, and only
one, instance will ever be created.

Specific Platform Clients

For other clients and daemons, one instance is started for the entire
platform. Currently, because the SSP can control only one platform,
this looks the same as the previous category.

Note – However, when a client or daemon is specific to a platform, the
setting of SUNW_HOSTNAME is important. SUNW_HOSTNAME must
identify the platform. This can be accomplished by setting
SUNW_HOSTNAME to the name of the platform or to the name of any
domain on the platform.

Domain Clients
For the other clients and daemons, one instance is created on the SSP
for each active domain on the platform. Before you execute a domain
client application, you must set SUNW_HOSTNAME to the proper
domain name. (hpost and bringup are examples of this type of client.)
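
Because domain clients read the domain from SUNW_HOSTNAME, a wrapper script can set the variable for one invocation only and so avoid acting on the wrong domain. The following Python sketch illustrates the idea under stated assumptions: the function names and the domain name "domain_a" are hypothetical, and a child Python process stands in for a real domain client.

```python
import os
import subprocess
import sys


def domain_env(domain_name, base_env=None):
    """Return a copy of the environment with SUNW_HOSTNAME set to the domain."""
    if not domain_name:
        raise ValueError("a domain name is required for domain clients")
    env = dict(os.environ if base_env is None else base_env)
    env["SUNW_HOSTNAME"] = domain_name
    return env


def run_domain_client(argv, domain_name):
    """Run a command with SUNW_HOSTNAME set only for that one invocation."""
    return subprocess.run(argv, env=domain_env(domain_name),
                          capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # Stand-in for a domain client: a child process that reports the variable.
    child = [sys.executable, "-c",
             "import os; print(os.environ['SUNW_HOSTNAME'])"]
    print(run_domain_client(child, "domain_a").stdout.strip())
```

Setting the variable per invocation, rather than once per login session, leaves the calling shell's own SUNW_HOSTNAME untouched.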


SSP Platform Client Reference

The commands and daemons listed in the above overhead run on the
SSP and are responsible for platform-wide operations on the
Enterprise 10000.


SSP Domain Client Reference

The commands listed in the above overhead run on the SSP and are
responsible for domain-specific operations on the Enterprise 10000.
Some of these commands are discussed later in this course.

Note – Remember that these commands do not take the domain name
as a command-line argument. Instead they use the setting of
SUNW_HOSTNAME. Make sure that it is correct before running these
commands.


SSP Daemon Summary

The SSP daemons play a central role in the UE10000’s operation. Each
daemon will be discussed more fully later, and is described in its
corresponding man page. The SSP daemons are:

cbs              The control board server provides central access to the
                 Enterprise 10000 control board for client programs
                 running on the SSP.

edd              The event detector daemon uploads event detection
                 scripts to control boards. When one of these scripts
                 detects an event, edd executes a response action script.

fad              The file access daemon provides distributed file access
                 services to SSP clients that need to monitor, read, and
                 write to the SSP configuration files.

machine_server   Provides machine services for netcon and routes host
                 messages to the proper messages file.

netcon_server    The connection point for all netcon clients.
                 netcon_server communicates with OBP through a control
                 board protocol and the OS using a TCP/IP protocol.

obp_helper       Begins execution of OpenBoot™. obp_helper terminates
                 when OBP does. During execution, obp_helper provides
                 services to OBP such as nvram simulation, IDprom
                 simulation, and time of day. netcon_server is part of
                 obp_helper.

snmpd            The SNMP proxy agent daemon listens to a UDP (User
                 Datagram Protocol) port for incoming requests, and
                 services the objects specified in
                 Ultra-Enterprise-10000.mib.

straps           The SNMP trap sink server listens to the SNMP trap port
                 for incoming trap messages and forwards received
                 messages to all connected clients.

xntpd            The network time protocol (NTP) daemon provides time
                 synchronization services. This service is used to
                 automatically synchronize SSP and domain times.

Caution – Never run these daemons manually unless directed to do so
by the product documentation or a support representative.


SSP Daemons

Control Board Server (cbs)


cbs is a server daemon that provides the SSP communication interface
to the Enterprise 10000. Whenever a client program running on the
SSP (such as Hostview, fan, or power) needs to access the Enterprise
10000 platform, the communication is handled by cbs. cbs
communicates directly with the control board executive (cbe) running
on the active control board. cbs converts client requests to the Control
Board Management Protocol (CBMP), which is understood by cbe.

Normal domain network traffic, such as FTP (File Transfer Protocol),
email, and so on, is handled through the normal TCP/IP interfaces
and not by cbs.


The cb_reset Command


The cb_reset command resets and reboots hung control boards. It
will cause the reset of all active domains. It should only be used when
the control boards have stopped responding, or when you need to
change the control boards’ IP addresses.

The cb_prom Command


The cb_prom command is used to update the flash PROM on the
control boards. It is used when the control board’s own POST and
network load routines must be updated.


Event Detector Daemon (edd)


edd uploads event detection scripts to the Enterprise 10000 control
board by using cbs, waits for an event to be generated by the scripts,
and then responds to the event by executing a response action script
on the SSP. The event detection scripts poll various conditions within
the platform including environmental conditions, signature blocks,
power supply voltages, performance data, and so forth. Event
handling is provided by response action scripts, which are run by edd
when an event is received.

The events are transmitted by SNMP traps. It is the responsibility of
the listening application (such as edd or hostview) to detect and
determine whether to respond to an event.

Warning – Changing edd scripts incorrectly may cause physical
damage to the Enterprise 10000 platform.



edd obtains many of its initial control parameters from the following
configuration files:

● The event response configuration files (edd.erc) specify how the
event detector will respond to events.

$SSPVAR/etc/platform_name/edd.erc provides configuration
information for the Enterprise 10000 platform.

$SSPVAR/etc/platform_name/domain_name/edd.erc provides
configuration information for a particular domain.

● $SSPVAR/etc/platform_name/edd.emc lists the events that edd
will monitor.
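
The path layout above can be sketched as a small Python helper that lists the edd configuration files for a platform and its domains. This is illustrative only: the function name is hypothetical, and the directory, platform name, and domain names in the example are placeholders, not values from an actual installation.

```python
import posixpath


def edd_config_paths(sspvar, platform_name, domain_names=()):
    """Build the edd.erc and edd.emc paths described in the list above."""
    base = posixpath.join(sspvar, "etc", platform_name)
    paths = {
        "platform edd.erc": posixpath.join(base, "edd.erc"),
        "edd.emc": posixpath.join(base, "edd.emc"),
    }
    for domain in domain_names:
        key = "domain %s edd.erc" % domain
        paths[key] = posixpath.join(base, domain, "edd.erc")
    return paths


# Placeholder values; substitute the real $SSPVAR, platform, and domains.
for label, path in edd_config_paths("/ssp/var", "platform_x", ["dom1"]).items():
    print(label, "->", path)
```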


Event Detector Daemon (edd) Event Handling


If an event detection script detects a change that requires an event, an
event message is generated and delivered to cbs by cbe. When it
receives the event message, cbs delivers the event to the SNMP agent,
snmpd, which then generates an SNMP trap, as shown in the first
diagram in the above overhead.

When edd receives the SNMP trap, it determines whether to initiate a
response action. If a response action is required, edd runs the
appropriate response action script (as a subprocess), as shown in the
second diagram in the above overhead.

As an example, in the above figures edd is shown running a response
action script for a high temperature event. While the response action
script is running, additional high temperature events may be
generated by the control board event monitoring scripts. edd does not
respond to those events (generated in response to the original high
temperature condition) until the first response script has finished.
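
The behavior described above, where repeat events are not acted on while a response script for the same condition is still running, can be modeled with a small Python class. This is a conceptual sketch, not edd's implementation: the class and event names are hypothetical, and because the text does not say whether edd discards or defers the repeat events, the sketch simply ignores them.

```python
class ResponseTracker:
    """Tracks which event types currently have a response script running."""

    def __init__(self):
        self._running = set()

    def on_trap(self, event_type):
        """Return True if a response should start, False if the trap is ignored."""
        if event_type in self._running:
            return False  # a response for this condition is still running
        self._running.add(event_type)
        return True

    def on_response_finished(self, event_type):
        """Mark the response script for this event type as complete."""
        self._running.discard(event_type)


tracker = ResponseTracker()
print(tracker.on_trap("high_temp"))   # first event: start the response script
print(tracker.on_trap("high_temp"))   # repeat while running: ignored
tracker.on_response_finished("high_temp")
print(tracker.on_trap("high_temp"))   # after completion: handled again
```

Tracking by event type means an unrelated event, such as a power event, is still handled while the high temperature response is running.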


Event Detector Daemon (edd) Control

edd_cmd

You can use the edd_cmd command to turn edd processing on and off.
edd_cmd -x stop stops edd processing, and edd_cmd -x start
restarts it.

Warning – Be careful turning off edd processing. The SSP will not be
able to respond to most requests for service from the Enterprise 10000,
such as power or over-temperature events, which could cause physical
damage to system components.


File Access Daemon (fad)


The file access daemon is used when an SSP configuration file is
updated. fad provides distributed file access services such as file
locking to all SSP clients that need to be aware of and make changes to
SSP configuration files. Once a file is locked by a client, all other clients
are prevented from locking that file until the first client releases the
lock.

fad is also used by some daemons and commands, such as bringup,
to ensure that only one copy of the command is in execution at a time.
The other copies wait, usually on file lock access, until the first copy
completes.
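
The locking rule described above (one holder per file, with everyone else refused until the holder releases) can be illustrated with an in-memory model. This Python sketch shows the semantics only; it does not represent fad's actual protocol or interfaces, and the class, client, and file names are hypothetical.

```python
class LockTable:
    """In-memory model of exclusive, per-file locks held by named clients."""

    def __init__(self):
        self._owners = {}  # file path -> client currently holding the lock

    def lock(self, path, client):
        """Grant the lock if the file is free; return True on success."""
        if path in self._owners:
            return False
        self._owners[path] = client
        return True

    def unlock(self, path, client):
        """Release the lock, but only for the client that holds it."""
        if self._owners.get(path) != client:
            return False
        del self._owners[path]
        return True


locks = LockTable()
print(locks.lock("domain_config", "client_1"))    # granted
print(locks.lock("domain_config", "client_2"))    # refused: already locked
print(locks.unlock("domain_config", "client_1"))  # released by the holder
print(locks.lock("domain_config", "client_2"))    # now granted
```

A command that must run one copy at a time can use the same idea: each copy tries to take a well-known lock and waits until the attempt succeeds.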


Network Time Protocol Daemon (xntpd)


The xntpd daemon keeps the time of day (TOD) clocks synchronized
between the SSP, the platform, and the domains. Each domain obtains
the current time from the SSP at boot time. If the time were allowed to
vary between the platform and the SSP, control events might not be
handled correctly (improper timeouts, for example). Close
synchronization of the SSP and domain clocks is also required to
support Dynamic Reconfiguration.

NTP works by calculating the accuracy of the internal system TOD
clocks and making regular small adjustments to each system.

The NTP configuration is set up during SSP and domain installation.
The normal configuration has the SSP serve its local time to the
domains.


The SNMP Daemon (snmpd)


The snmpd daemon is the Enterprise 10000 SNMP platform proxy
agent. It supports the SNMP Version 1 set, get, and getnext requests.
It generates SNMP traps for the events detected by the cbe running
the edd rules in the control board.

Since there is no software running on the Enterprise 10000 itself,
SNMP management is done from the SSP on behalf of the platform. The
SSP and domains still support their own individual SNMP agents if
configured to do so.

snmpd sends its traps to the SNMP trap sink server daemon (straps)
on the SSP, and possibly to other hosts and applications listening for
Enterprise 10000 SNMP events.



To change how the Enterprise 10000 responds to SNMP events, you
can modify the snmpd configuration files located in the
$SSPETC/snmp directory. These files are:

● Configuration file
$SSPETC/snmp/agt/Ultra-Enterprise-10000.snmpd.cnf

● MIB (Management Information Base) definition file (static data)
$SSPETC/snmp/Ultra-Enterprise-10000.mib

● MIB data file (dynamic data)
$SSPETC/snmp/Ultra-Enterprise-10000.dat

Warning – Changing the SNMP responses incorrectly may cause
physical damage to the Enterprise 10000 platform by interfering with
edd event processing.


SNMP Trap Sink Server (straps)


The straps (SNMP trap sink server) daemon monitors the SNMP trap
port. When an SNMP trap is received, it forwards the trap to all
connected clients without modification.

These clients include:

● hostview

● edd

● Customer-written SNMP managers


machine_server
The machine_server daemon performs several network support
functions for the Enterprise 10000 SSP daemons:

● Services TCP port registration for netcon_server

● Services UDP port registration for snmpd

● Services port lookup requests from clients

● Routes error messages to the proper domain or platform messages
file


Domain Support Executables

These files reside in
$SSPOPT/release/Ultra-Enterprise-10000/os_version and are executed
within a domain before the OBP is started.

These commands should be used or modified only by your service
provider; they are normally called internally by other programs rather
than executed on the command line. For example, they include the
tests run by hpost.

Warning – Improper use of these commands may result in failure or
damage to the system. System components could be put into an
unusable state.


System Operation

This section discusses the control and management of the Enterprise
10000 system. It covers:

● The Enterprise 10000 management line mode commands

● hostview – This is a GUI front-end to most SSP platform and
domain management commands.

● netcontool – This is a GUI interface to the netcon command.
netcontool simplifies the process of configuring and bringing up
netcon windows. It allows you to replace the tip control
commands used by netcon by clicking on buttons in netcontool.


The hostinfo Command

The hostinfo command allows you to determine the status of many
of the system components.

It has five operands:

● -F – Provides fan status information

● -S – Provides CPU signature information

● -h – Provides detailed CPU state information

● -p – Provides system component power status

● -t – Provides system temperature information

Most of this information (except for -S) is available from hostview.
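
For scripted monitoring, the option list above can be captured in a small lookup table. The following Python sketch only builds and validates a hostinfo command line; it does not run the command or parse its output, whose format is not described here. The function names are hypothetical.

```python
HOSTINFO_OPTIONS = {
    "-F": "fan status information",
    "-S": "CPU signature information",
    "-h": "detailed CPU state information",
    "-p": "system component power status",
    "-t": "system temperature information",
}


def hostinfo_argv(option):
    """Return the argument vector for one hostinfo query, or raise ValueError."""
    if option not in HOSTINFO_OPTIONS:
        raise ValueError("unknown hostinfo option: %r" % (option,))
    return ["hostinfo", option]


def available_in_hostview(option):
    """Per the text above, everything except -S is also available in hostview."""
    return option in HOSTINFO_OPTIONS and option != "-S"


for opt, meaning in sorted(HOSTINFO_OPTIONS.items()):
    print(" ".join(hostinfo_argv(opt)), "->", meaning)
```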


hostview

The hostview utility enables you to perform the following actions:

● Control power to the platform and system boards

● Create and delete domains

● Bring up domains

● Dynamically reconfigure the boards within a platform, logically
attaching or detaching them from a domain

● Start an SSP window for each domain

● Access the SSP log messages file for each platform or domain

● Remotely log in to each domain

● Edit the blacklist file to enable or disable hardware components
in a domain

● Start a netcon window

● Monitor the platform environment


To start hostview, run the hostview command in a window on the
SSP from the ssp account.
ssp% hostview &

Hostview does not provide any support for OS commands to be run
on the SSP or in the domains, including Alternate Pathing. It is
intended to control and display platform and domain status and
configuration.


hostview Performance Considerations

You only need to run one instance of hostview for a given platform,
although you can run more than one instance at a time to work with
the same platform. You can run hostview from any SSP window
where you have logged in as user ssp, regardless of the setting of
SUNW_HOSTNAME.

Each copy of hostview requires 5 to 10 Mbytes of virtual memory and
imposes a noticeable processing load on the SSP. Before running
multiple copies of hostview, which can overload the SSP, make certain
that the SSP has sufficient CPU power, real memory, and swap space
available.

Note – If you overload the SSP, you may prevent it from processing
requests from the control board.


hostview Main Window

When you start up hostview, the main window is displayed:

[Figure: hostview main window, with callouts for the Power, Temperature,
Fans, and Failure buttons; a support board, a control board, and a
system board; the selected board; the buses; and two domains, each
shown with a colored border.]

The main window provides a graphical view of the platform boards
and buses. The system boards are named SB0 through SB15, and their
processor numbers are shown. The control boards are named CB0 and
CB1. The centerplane support boards are named CSB0 and CSB1. The
buses are named ABUS0 through ABUS3, DBUS0, and DBUS1.


The system boards at the top of the display are in the order they
appear on the front of the physical platform. The system boards at the
bottom of the display are arranged in the order they appear on the
back of the physical platform.

If a system board is shown with no outline, the board is not part of a
domain and is not currently selected.

If a system board is part of a domain, a colored outline surrounds it.
The boards within a domain all have an outline of the same color.

A black outline (around any domain color outline) indicates that a
board is selected.


Main Window Processor Symbols

In the main window display, the shape and background color of a
processor symbol on a system board indicate the status of that
processor. For example, a diamond on a green background indicates
the processor is running the operating system.

The shape indicates the last known state of the processor:

◆ Operating system

● hpost

■ download_helper

▲ OBP

? Unknown


The background color of the symbol indicates the state of the
processor.

Green    Running

Maroon   Exiting

Yellow   Prerun (the OS is being loaded)

Blue     Unknown

Black    Blacklisted

Red      Redlisted

White    Present but not yet configured. An example is a board
         that has been just powered on but not initialized.
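
The two legends combine into a single reading of each processor symbol: the shape names the program last known to be on the processor, and the color names its state. The following Python sketch encodes both tables as dictionaries. The shape names ("diamond", "circle", and so on) and the function name are labels chosen for this illustration, not identifiers used by hostview.

```python
SHAPE_TO_PROGRAM = {
    "diamond": "operating system",
    "circle": "hpost",
    "square": "download_helper",
    "triangle": "OBP",
    "?": "unknown",
}

COLOR_TO_STATE = {
    "green": "running",
    "maroon": "exiting",
    "yellow": "prerun",
    "blue": "unknown",
    "black": "blacklisted",
    "red": "redlisted",
    "white": "present but not yet configured",
}


def describe_processor(shape, color):
    """Translate a symbol's shape and background color into plain words."""
    program = SHAPE_TO_PROGRAM.get(shape, "unknown")
    state = COLOR_TO_STATE.get(color, "unknown")
    return "%s, %s" % (program, state)


# The example from the text: a diamond on a green background.
print(describe_processor("diamond", "green"))
```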


Selecting Items in the Main Window

You can select one or more system boards in the main window. You
can also select one entire domain in the main window. You must select
a domain or a set of boards prior to performing some operations, such
as creating a domain.

● To select a single system board, click on it with the left mouse
button. The board is then selected (as indicated by a black outline),
and any other boards are deselected. This also selects the domain
to which the board is attached.

● To select more boards, click on the additional boards with the
middle mouse button. You can also deselect a currently selected
board by clicking on it with the middle mouse button.

● The selected boards are outlined in black.

Note – If you click the right mouse button on a system board, a
small selection menu is displayed that allows you to choose power or
temperature displays for that system board.


Help Window

When you select a topic from the Help menu, the Help Window is
displayed:

You can select the desired topic in the upper pane. The corresponding
man page or help information is displayed in the lower pane.


Main Window Buttons

The main hostview window contains four buttons. If an error has
occurred, the button or buttons corresponding to the affected area
are outlined in red.

● The Power button displays the Power Detail window, which
enables you to view the power status for the platform.

● The Temperature button displays the Temperature Detail window,
which enables you to view the temperature status for the boards
and components within the platform.

● The Fan button displays the Fan Detail window, which enables
you to view the status of the fans within the platform.

● When certain error conditions occur, the Failure button turns red.

Clicking on a button displays more detailed information about that
button’s area.


The Failure Window

If you click on a red Failure button, a window is displayed showing
the recent error conditions that have occurred.

The following types of error are reported by this mechanism:

● Host panic recovery in progress – The operating system
on a domain has failed and is recovering.

● Heartbeat failure recovery in progress – The SSP was
not receiving updated hardware information as expected.

● Arbitration stop recovery in progress – A parity error
or other fatal hardware error has occurred, and the domain is
recovering.

● Host reboot is in progress – The domain is being
manually rebooted.

● Power-on-bringup recovery in progress – The platform
and domains failed due to a power outage. Power has been
restored, and the system is bringing up the domains.


SSP Log Files

All of the domain messages, both normal and error messages for the
domain, are logged in the file:
$SSPLOGGER/domain_name/messages

where domain_name is the name of the domain for which the message
was issued. This is a copy of the domain’s /var/adm/messages file. It
is constantly updated.

Error messages for the Enterprise 10000 platform that are not specific
to a domain are logged in:
$SSPLOGGER/messages
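
The two locations above differ only in whether a domain name appears in the path, so a script that reviews recent messages can treat them uniformly. The following Python sketch assumes the caller supplies the value of $SSPLOGGER; the function names are hypothetical.

```python
import os
from collections import deque


def messages_path(ssplogger, domain_name=None):
    """Return the domain messages file, or the platform file if no domain is given."""
    if domain_name:
        return os.path.join(ssplogger, domain_name, "messages")
    return os.path.join(ssplogger, "messages")


def tail_messages(ssplogger, domain_name=None, count=20):
    """Return the last `count` lines of the selected messages file."""
    path = messages_path(ssplogger, domain_name)
    with open(path, errors="replace") as handle:
        # deque with maxlen keeps only the most recent lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]
```

For example, `tail_messages(os.environ["SSPLOGGER"], "domain_a", 50)` would return the last 50 lines logged for a domain named domain_a (a placeholder name), and omitting the domain name would read the platform file instead.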


Viewing a Messages File With hostview

1. Select the appropriate board.

● To view the messages file for a domain, select that domain
(system board) in the main hostview window.

● To view the messages file for the platform, make sure that no
domain (system board) is selected.

2. Choose File ➤ SSP Logs.

The SSP Logs window is displayed.

3. Edit the supplied command in the Command field, if necessary,
then choose Execute (or press Return). The selected messages file
is displayed in the main panel of the window.


Administering Power

The SSP gives you the ability to control the power status of individual
components of the Enterprise 10000 system.

Using either the command line or hostview, you can control power to:

● The entire Enterprise 10000 platform

● Individual system boards

● Individual fan trays

● Individual power supplies

● Remote peripheral cabinets


The power Command

Examples
● The power command with no options displays the status of each
power supply and external I/O power connections.
ssp% power

● To power on the entire Enterprise 10000 platform from the
command line, use the power -on command.
ssp% power -on -all

The Enterprise 10000 platform does not automatically boot any
domains when powered on. Each domain must be brought up
individually from the SSP with the bringup command.

● To power off the entire Enterprise 10000 platform, use:
ssp% power -B -off

You will need to turn the individual breakers back on by hand. Do
not do this if you do not have physical access to the system.


● To power off individual system boards, use:
ssp% power -off -sb n [n ...]

where n is a system board number. The power command
returns an error message if it finds that the owning domain is still
active, that is, processors are still running the operating
system. To force the power off without first shutting the
domain down, use the -f option.

● To power on only selected power supplies, use the -s option. See
the power man page for more information.

● To control the power of externally attached devices, use the -p
option. This example powers on the peripherals attached to
power control units 2 and 3. The -v option reports the status
of the remote peripherals.
ssp% power -p 2 3 -on

Automatic Recovery From a Power Outage


If the SSP suffers a power outage but the Enterprise 10000 platform
does not, the SSP automatically returns to the proper state when it
reboots. The Enterprise 10000 (and its domains) is not affected.

If both the SSP and the Enterprise 10000 platform suffer a power
outage, after the SSP has returned to its proper state, it checks whether
the following conditions are true:

1. The SSP has been up for less than 30 minutes.

2. The Enterprise 10000 platform currently has no power.

3. The last platform status snapshot indicates that the Enterprise
10000 platform was up and running.

If all of these statements are true, the SSP assumes that the power
outage affected both the SSP and the Enterprise 10000 platform, and
attempts to automatically power on the Enterprise 10000 platform. It
does not automatically try to bring up the domains.
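
The three conditions above amount to a single boolean test. The following Python sketch restates that decision; it is a model of the rule as described here, not the SSP's code, and the function and parameter names are hypothetical.

```python
def should_auto_power_on(ssp_uptime_minutes, platform_has_power,
                         snapshot_says_running):
    """Return True only when all three recovery conditions hold."""
    return (ssp_uptime_minutes < 30        # 1. SSP up for less than 30 minutes
            and not platform_has_power     # 2. platform currently has no power
            and snapshot_says_running)     # 3. last snapshot showed platform up


# SSP rebooted 5 minutes ago, platform is dark, and it was running before.
print(should_auto_power_on(5, False, True))
# Same situation, but the platform had been powered off deliberately.
print(should_auto_power_on(5, False, False))
```

The third condition is what keeps the SSP from powering on a platform that an administrator had intentionally shut down before the outage.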


Power Control From Hostview

You can use hostview to control power within a platform.

1. Select a board in the main hostview window by clicking on it
with the left mouse button.

2. Choose Control ➤ Power to display the Power Control and
Status window. The default power command is displayed. You can
add any operands that you want.

3. Click on Execute (or press Return) to run the command in the
window. The results are shown in the main panel of the window.

Usually, after powering on the necessary components, you would
execute bringup commands for the domains that you want to boot. If
the platform loses power due to a power outage, hostview continues
to display the state each domain was in before the power was lost
until power is restored.


Monitoring Power Levels in Hostview

Click on the Power button:

The Power Status Display window is displayed, showing the state of
every power supply and controllable system component.

In this window, the bulk power supplies are named PS0 through PS7.
The individual system board power supplies are numbered 0 through
15. The individual support board power supplies are named CSB0 and
CSB1, and the individual control board power supplies are named CB0
and CB1.


Click on a system board or power supply in the Power Status Display
window, and the Power Detail window for that component is displayed.

The Power Detail window shows the voltage for all of the power
supplies or measurement points on the component. The power levels
are given in volts.

If a bar is green, the voltage level is within the acceptable range. If a
bar is red, the voltage level is either too low or too high. Thus, a red
bar could be short or tall, depending on the problem.

Power levels can also be monitored from the command line with the
power command and the hostinfo -p command.


Monitoring Temperature in Hostview

You can use hostview to monitor temperature conditions for power
supplies, processors, ASICs, and other sensors located on system
boards, support boards, control boards, and the centerplane.

Click on the Temperature button.

The Thermal Status Display window is displayed.

The centerplane, support boards, control boards, and system boards
are shown in green if their temperatures are in the normal range, and
in red if not. A temperature of 80° C is considered too high.

Proper temperatures are calculated using the thermal calibration
information obtained from each component when it is installed or
when the SSP is first initialized.


To see the thermal detail for a component, click on it with the left
mouse button. The thermal detail window for a system board is shown.

The left panel of the system board detail shows the temperatures for
the five ASIC chips, named A0 through A4. The middle panel shows
the temperatures for the three power supplies, and the right panel
shows the temperatures for the four processors, named P0 through P3.

The temperatures are displayed in degrees Centigrade, and the values
are shown both numerically and as vertical bars. If a bar is green, the
temperature is within the acceptable range. If a bar is red, the
temperature is either too low or too high. The line over the bars shows
how close to the alert level the current temperatures are.

The detail windows for control boards, support boards, and the
centerplane are similar.

You can get temperature information from the command line with the
hostinfo -t command.


The fan Command

The fan command is used to control the speed and activity of the 16
fan trays in the Enterprise 10000. Normally the fans are controlled by
the SSP, but you can override this control if necessary.

Syntax
fan [-l {front | rear}] [-t tray_list] [-p {on | off}]

fan [-s {nominal | fast}]

Usage
● Display power and speed status of the fans at the front or rear of
the system.
-l {front | rear}


● To display power and speed of individual fan trays, use the -t
operand. tray_list is a space-separated list of fan tray numbers
expressed as integers, from 0 to 15 inclusive.
-t 0 3 5 7

● To control power to all of the fan trays, use the -p operand.
fan -p {on | off}

● To set the speed of all fans, use the -s operand. All fans always
run at the same speed. nominal is the default.
fan -s {nominal | fast}
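
The operand rules above can be sketched as a small argument builder. This is only an illustration in Python: build_fan_command is a hypothetical helper, not part of the SSP software, and it assumes that the -s form is never combined with the other operands because the syntax lists it separately.

```python
from typing import List, Optional, Sequence

VALID_LOCATIONS = ("front", "rear")
VALID_POWER = ("on", "off")
VALID_SPEEDS = ("nominal", "fast")


def build_fan_command(location: Optional[str] = None,
                      trays: Sequence[int] = (),
                      power: Optional[str] = None,
                      speed: Optional[str] = None) -> List[str]:
    """Return an argument list for the fan command (hypothetical helper)."""
    if speed is not None:
        # Assumption: -s is its own form of the command, as in the syntax.
        if location is not None or trays or power is not None:
            raise ValueError("-s cannot be combined with -l, -t, or -p")
        if speed not in VALID_SPEEDS:
            raise ValueError("speed must be nominal or fast")
        return ["fan", "-s", speed]

    args = ["fan"]
    if location is not None:
        if location not in VALID_LOCATIONS:
            raise ValueError("location must be front or rear")
        args += ["-l", location]
    if trays:
        for tray in trays:
            # Fan tray numbers are integers from 0 to 15 inclusive.
            if not 0 <= tray <= 15:
                raise ValueError("fan tray %d is out of range" % tray)
        args += ["-t"] + [str(tray) for tray in trays]
    if power is not None:
        if power not in VALID_POWER:
            raise ValueError("power must be on or off")
        args += ["-p", power]
    return args
```

For example, build_fan_command(trays=[0, 3, 5, 7]) yields the arguments for fan -t 0 3 5 7, and a tray number of 16 is rejected before any command is run.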


Controlling Fans From Hostview

You can control fan power and speed from within hostview.

1. Choose Control ➤ Fan. The Fan Control and Status window is
displayed. A fan command is shown in the Command field without
any options.

2. Add the desired set of options to the fan command and click
Execute (or press Return).

For example, to set the speed of the fans to high, type:

fan -s fast

You can also enter the fan command from the command line. For
more information, see the fan man page.


Monitoring Fans in Hostview

You can use hostview to monitor fan speeds and fan failures for the
32 fans located throughout the Enterprise 10000 platform.

Click on the Fan button.

The Fan Status Display window is displayed.

The fan trays are named FT0 through FT7 on the front, and FT8
through FT15 on the back. Each fan tray contains two fans. The color
of the fan tray symbol is green if both fans in the tray are functioning
at normal speed, amber if both fans are functioning at high speed, and
red if either fan within the fan tray has failed.
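
The color rules can be restated as a short Python sketch. The function names and the state strings "normal", "high", and "failed" are illustrative assumptions, not Hostview identifiers. The sketch also assumes 16 trays numbered 0 through 15, matching the tray_list range of the fan command.

```python
from typing import Sequence


def tray_location(tray: int) -> str:
    """Map a fan tray number to the side of the cabinet it is on."""
    if 0 <= tray <= 7:
        return "front"
    if 8 <= tray <= 15:
        return "rear"
    raise ValueError("fan tray %d is out of range" % tray)


def tray_color(fan_states: Sequence[str]) -> str:
    """Return the symbol color for a tray holding two fans."""
    if len(fan_states) != 2:
        raise ValueError("each fan tray contains two fans")
    if "failed" in fan_states:
        return "red"      # either fan has failed
    if all(state == "high" for state in fan_states):
        return "amber"    # both fans at high speed
    if all(state == "normal" for state in fan_states):
        return "green"    # both fans at normal speed
    # Mixed speeds are not described in the text, so they are rejected.
    raise ValueError("unexpected fan states: %r" % (tuple(fan_states),))
```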

The same information can be obtained from the command line using
hostinfo -F for fan settings, and hostinfo -p for fan power status.


For information on individual fan status, click on a fan tray symbol
with the left mouse button. The Fan Detail window is displayed.

The top circle indicates the inner (back) fan when you open the fan
tray, and the lower circle indicates the outer (front) fan. The color
surrounding each circle in the fan detail indicates the status of that fan.

The same information can be obtained from the command line using
hostinfo -F for fan state and speed, and hostinfo -p for fan power
status. Fan status can also be monitored with the fan command.


Lab

In this lab you will exercise the basic features of Hostview:

● Status of power, temperature, fans, errors, processors, and SSP
log views

● Power and fan control

● Help

1. Get platform status information from the command line:


ssp:domain% hostinfo -h
ssp:domain% hostinfo -p
ssp:domain% power
ssp:domain% hostinfo -t
ssp:domain% hostinfo -F
ssp:domain% fan
ssp:domain% hostinfo -S

2. Watch as the instructor uses different forms of the power
command. Observe the hostview display to see what effect each
command has. Note that if you do not specify -all or target
board(s), power is applied to the current domain as defined by
SUNW_HOSTNAME.
ssp:domain% power
ssp:domain% power -on
ssp:domain% power -off
ssp:domain% power -on -all
ssp:domain% power -off -sb 2 3
ssp:domain% power -off -all
ssp:domain% power -on -csb 0 1 -sb 0 1 2 3

3. Experiment with controlling the fans (on, off and speed) using the
fan command.


4. Examine the SSP environment variables and locate the log files.

5. Investigate the SSP daemons by looking at their configuration files.
You can look at:

● /etc/inittab

● The edd files in $SSPVAR/etc/platform_name

● The SNMP files in $SSPETC/snmp

● The ssp account home directory

● The $SSPETC directory

● The $SSPVAR directory

● The $SSPOPT directory


Check Your Progress

Before continuing on to the next module, check that you are able to
accomplish or answer the following:

❑ Describe some of the Enterprise 10000 security issues.

❑ Explain the functions of the SSP and host daemons.

❑ Describe how the SSP and host daemons interact.

❑ Understand Enterprise 10000 error reporting.

❑ List the Enterprise 10000 SNMP interfaces.

❑ Use all of the features of Hostview:

● Domain control

● Power control

● Fan control

● Log inspection

❑ Perform most of the Hostview functions from the command line.


Think Beyond

Why does the SSP need so many daemons?

What is the advantage of edd using a rule-driven mechanism?

When would you use the command-line commands instead of
Hostview?

What does the extensive use of SNMP by the SSP daemons imply for
network monitoring?

What else could be monitored that isn’t?

Domains 5

Course Map
This module discusses the concept of a domain in detail and describes
how to create, delete, and manage Enterprise 10000 domains, both
through the command line and using hostview.


Relevance

For Discussion – The following questions are relevant to
understanding the content of this module:

1. What kind of management do domains require?

2. What are the restrictions on configuring domains?

3. How do you access domains to control them?

4. How do you control a domain’s configuration?


Objectives

Upon completion of this module, you will be able to:

● Describe a domain.

● List the requirements for a domain.

● Describe the function of and create an eeprom.image file.

● Create, destroy, and rename a domain.

● Describe domain planning issues.

● Identify the SSP domain files.

● Describe blacklisting and how to manage a blacklist.

● Describe how to work with dual control boards.

References

Additional resources – The following references can provide
additional details on the topics discussed in this module:

● Ultra Enterprise 10000 SSP 3.1 User’s Guide

● Sun Enterprise 10000 System Hardware Installation and De-Installation
Guide

● Sun Enterprise 10000 System Overview Manual

● The man pages for the commands and files


Introduction

Like all other Sun SPARC systems, the Enterprise 10000 runs Solaris.
It also has the unique ability to divide itself into as many as eight
separate systems.

Each separate system, called a domain, appears to Solaris as a
standalone, self-contained SPARC system. The fact that there are other
domains simultaneously sharing the Enterprise 10000 platform with
the domain is not visible. Each domain is administered and managed
as though there were no other domains present.

Domains are implemented through special capabilities of the
Enterprise 10000 hardware and special software support in Solaris and
are configured and controlled from the SSP.

Domains can be created and destroyed, and system boards can be
placed in any desired domain. Using the Enterprise 10000 Dynamic
Reconfiguration capability, system boards can be added to and removed
from domains while the OS is running.


Domain Configurations

The SSP enables you to logically group system boards into Dynamic
System Domains, or simply domains, which are able to run their own
operating system and handle their own workload. They appear as
separate, standalone processors to each other and to the network.

Domains can be created and deleted without interrupting the
operation of other domains.

You can use domains for many purposes. For example, you can test a
new operating system version or set up a development and testing
environment in a domain. In this way, if problems occur, the rest of
your system is not affected.

As another example, you can configure several domains to support
different departments with one domain per department. In this case,
you might reconfigure the system into one domain to run a large job
over the weekend.

You may create as many domains as you want, but only eight may be
active at one time.


Inter-Domain Networking
You may have noticed references to Inter-Domain Networking (IDN)
in the documentation or in system messages. Please ignore these
references.

In order to be prepared for the future deployment of IDN, Sun’s
engineering groups have incorporated software facilities into current
releases of the software to support future use by IDN if or when it is
deployed. No IDN functionality exists in any current Sun software.

IDN is not a current capability of the Sun Enterprise 10000. IDN is a
future capability. Please note that engineering projects are not without
risk, so firm delivery dates are not available, and IDN functionality is
subject to significant change.


Domain Configuration Requirements

You can create a domain out of any arbitrary group of system boards.
You can have from one to eight domains, with 1 to 16 entire system
boards per domain. A system board can be in only one domain at a
time. You may not split system board components across domains.

The following conditions must be met to create a domain:

● The designated board(s) is (are) present and not in use by another
domain.

● At least one powered-on board has a network interface (for
netcon support). This interface must also support network
booting to allow Solaris to be loaded.

● At least one powered-on processor is present.

● The boards have sufficient memory to support the OS.

● The name given the new domain is unique and matches the host
name of the domain to be booted (this is a netcon requirement).
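
The board-assignment rules can be checked mechanically before any domain is created. The following Python sketch is illustrative only: check_domain_layout is a hypothetical helper, board numbers 0 through 15 are an assumption based on the 16-board maximum, and the sketch covers only the count and ownership rules, not the network, processor, or memory conditions.

```python
from typing import Dict, List, Set

MAX_ACTIVE_DOMAINS = 8
BOARD_NUMBERS = range(16)  # assumption: boards are numbered 0 through 15


def check_domain_layout(domains: Dict[str, Set[int]]) -> List[str]:
    """Return a list of problems found in a proposed set of active domains."""
    problems = []
    if not 1 <= len(domains) <= MAX_ACTIVE_DOMAINS:
        problems.append("there must be from one to eight domains")

    owner = {}  # board number -> name of the domain that holds it
    for name in sorted(domains):
        boards = domains[name]
        if not boards:
            problems.append("%s: no system boards" % name)
        for board in sorted(boards):
            if board not in BOARD_NUMBERS:
                problems.append("%s: board %d does not exist" % (name, board))
            elif board in owner:
                # A system board can be in only one domain at a time.
                problems.append("board %d is in both %s and %s"
                                % (board, owner[board], name))
            else:
                owner[board] = name
    return problems
```

An empty result means the layout passes these checks. Because domain names are dictionary keys, uniqueness of names is enforced by the data structure itself.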


Each domain should have its own disk interface and local disk from
which it can be booted. If a domain does not have its own disk, you
must always boot it from the network.

Domains can be reconfigured while running Solaris through Dynamic
Reconfiguration (DR). DR will be discussed in Module 9.

In support of DR, it’s usually best to keep your primary network
interface and boot device interface on the lowest numbered system
board in the domain. The reasons for this will be covered in Module 9.

Each system comes with one domain. Additional domains can be
added at any time by requesting additional domain keys from Sun.
The keys are used to create additional eeprom.image files.


Domain Planning

Consider all of the following network requirements before you start
to configure your own Enterprise 10000 system:

All SSP to Control Board (CB) connections must be on the same
subnetwork. Generally, it should be a dedicated subnet for managing
the Enterprise 10000. In addition, to control each domain, a network
link is required from the SSP to a network interface in the domain.
This may be on the same dedicated management subnet.

All SSP to CB Ethernet connections are made using 10BASE-T with
RJ-45 connectors. Direct connections can be made with the appropriate
cable type for security purposes.


The eeprom.image Files

Other Sun SPARC systems each contain an IDPROM (NVRAM) which provides
the system's hostid, serial number, and MAC (Media Access Control)
address, among other things. The Enterprise 10000 does not contain any
IDPROMs. Instead, this information is loaded from the SSP for each
domain, from files called eeprom.image files. The eeprom.image file
is a binary file that contains exactly what a real IDPROM would
contain.

Enterprise 10000 systems ship with eeprom.image files (on disk) for
the number of domains requested on the sales order, and a serial
number and key for each domain on paper. These image files are
located in:
$SSPVAR/.ssp_private/eeprom_save/eeprom.image.0
$SSPVAR/.ssp_private/eeprom_save/eeprom.image.1
$SSPVAR/.ssp_private/eeprom_save/eeprom.image.2
...


Creating eeprom.image Files

Additional serial numbers and keys can be obtained, up to the system
limit of eight total domains, by contacting Sun.

If you must re-create your eeprom.image files, you must have the
serial number and the EEPROM (Electrically Erasable Programmable
Read-Only Memory) key that was used to create your first domain
files. This information was shipped with your system.

If you are creating an eeprom.image file for a new domain, you must
obtain a new EEPROM key and hostid from your service provider for
that domain.

If you cannot find this information and do not have backups of the
eeprom.image files, you must contact your service provider for this
information.


To create new eeprom.image files:

1. Log in as user ssp on the main SSP.

2. When prompted for the SUNW_HOSTNAME, use either the platform
name or the name of an existing domain.

3. Change directories to $SSPVAR/.ssp_private/eeprom_save.

4. Create the files. The key and either a serial number or hostid must
be entered for each domain to be created.

● The first domain uses the serial number. Use the following
form of the sys_id command:
ssp% sys_id -f eeprom.image.domain_name -k key -s serial_number

where domain_name is the host name of the domain and
serial_number is the number provided with the key in the form
of 0xa65xxx.


● Other domains use the hostid. Use the following form of the
sys_id command:
ssp% sys_id -f eeprom.image.domain_name -k key -h hostid

where domain_name is the host name of the domain and hostid
is the number provided with the key in the form of 0x80a66xxx.

The key and serial number are related; you cannot mix them
indiscriminately. An incorrect key will not allow you to create the
eeprom.image file.

Caution – You must use the -f flag to prevent the default (template)
eeprom.image file from being overwritten.

5. Check the result with the sys_id -d command.

ssp% sys_id -d -f eeprom.image.domain_name
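
The two forms of the sys_id command differ only in whether the serial number (-s) or the hostid (-h) accompanies the key. The Python sketch below assembles the argument list for either form. build_sys_id_command is a hypothetical helper used for illustration, and it always supplies -f so that the template file is never the target.

```python
from typing import List, Optional


def build_sys_id_command(domain_name: str,
                         key: str,
                         serial_number: Optional[str] = None,
                         hostid: Optional[str] = None) -> List[str]:
    """Return sys_id arguments for creating one eeprom.image file."""
    # Exactly one of serial_number (first domain) or hostid (other domains).
    if (serial_number is None) == (hostid is None):
        raise ValueError("give either a serial number or a hostid, not both")

    # -f is always present so the default (template) file is not overwritten.
    args = ["sys_id", "-f", "eeprom.image." + domain_name, "-k", key]
    if serial_number is not None:
        args += ["-s", serial_number]
    else:
        args += ["-h", hostid]
    return args
```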


In the following example, 49933C54C64C858CD4CF is the key. A
secondary domain is being created. The key is case sensitive.
ssp% sys_id -f eeprom.image.jackson -k 49933C54C64C858CD4CF \
-h 0x80a66e05
ssp% sys_id -d -f eeprom.image.jackson

IDPROM in eeprom.image.jackson

Format = 0x01
Machine Type = 0x80
Ethernet Address = 0:0:be:a6:6e:5
Manufacturing Date = Wed Dec 31 16:00:00 1969
Serial number (machine ID) = 0xa66e05
Checksum = 0x3f

Back up the SSP eeprom.image files to tape or disk where they can be
accessed in the event of an SSP boot disk failure. These files are located
in the $SSPVAR/.ssp_private/eeprom_save directory.


Remember these points when you are creating eeprom.image files:

● You must delete or rename any existing eeprom.image file for the
domain for which you are making the new image file. If the file
exists and you try to recreate it, you will be given an ‘invalid key’
message.

● eeprom.image files may only be created on a main SSP.

● You will receive a checksum error message the first time a domain
starts with a new eeprom.image file. This is normal and may be
ignored.

● Any existing OBP device aliases, OBP environment parameter
settings, and nvramrc contents will be reset to the defaults in the
new eeprom.image file.

● You can see some of the environment variable parameter settings
with the strings command.

● The creation date will always read as your time zone offset from
the GMT time of January 1, 1970.
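
The last point can be illustrated with a short Python sketch. It assumes the stored date is simply time zero (midnight GMT, January 1, 1970) rendered in the local time zone. epoch_as_local is a hypothetical name, and the fixed UTC offset is supplied by the caller.

```python
from datetime import datetime, timedelta, timezone


def epoch_as_local(utc_offset_hours: float) -> str:
    """Format midnight GMT, January 1, 1970 in a zone at the given offset."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return epoch.astimezone(local_zone).strftime("%a %b %d %H:%M:%S %Y")
```

With an offset of -8 hours (Pacific Standard Time), the function returns Wed Dec 31 16:00:00 1969, which is the Manufacturing Date shown by sys_id -d for a newly created file.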


hostid Information

The table in the above overhead shows how the domain’s hostid, serial
number, and Ethernet MAC address are determined.

In the table, XXX in the domain addresses represents the hexadecimal
serial number assigned to the domain.

These address ranges may change in the future.

The hostid is 32 bits, in the hexadecimal format AABBCCCC, where:

● AA is the CPU type, always 80 for recent systems.

● BB is a block of vendor ID numbers assigned by SPARC
International.

● CCCC is assigned by Sun manufacturing.
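
The field layout can be made concrete with a small Python sketch. split_hostid and its dictionary keys are illustrative names. Treating the low 24 bits as the serial number is an inference from the sys_id -d sample output, where hostid 0x80a66e05 is reported with serial number 0xa66e05.

```python
from typing import Dict


def split_hostid(hostid: int) -> Dict[str, int]:
    """Split a 32-bit hostid of the form AABBCCCC into its fields."""
    if not 0 <= hostid <= 0xFFFFFFFF:
        raise ValueError("hostid must fit in 32 bits")
    return {
        "cpu_type": (hostid >> 24) & 0xFF,      # AA
        "vendor_block": (hostid >> 16) & 0xFF,  # BB
        "sequence": hostid & 0xFFFF,            # CCCC
        "serial_number": hostid & 0xFFFFFF,     # BBCCCC (inferred)
    }
```

For the hostid 0x80a66e05, the fields are 0x80, 0xa6, and 0x6e05.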


Obtaining Domain Status From the Command Line

A domain will be shown in either the domain_status file (if it is
activated) or the domain_history file (if it has been deleted), but not
both.

domain_status
The domain_status command displays the contents of the
domain_config file. It shows which domains may be activated (but
not which ones are active) and which system boards compose them.

The status listing has five columns:

● DOMAIN – The name of the domain.

● TYPE – The platform type. It is always Ultra-Enterprise-10000.

● PLATFORM – The name of the platform.

● OS – The Solaris release level.

● SYSBDS – Indicates which system boards make up the domain.


domain_history
The domain_history command displays the contents of the
domain_history file. It shows which domains have been removed but
may be re-created, and which system boards compose them.

Sample output would be:

franklin:presidents% domain_history
DOMAIN      TYPE                    PLATFORM    OS     SYSBDS
domain2     Ultra-Enterprise-10000  presidents  2.5.1  1
eric        Ultra-Enterprise-10000  presidents  2.5.1  1
nancy       Ultra-Enterprise-10000  presidents  2.5.1  3
bozo        Ultra-Enterprise-10000  presidents  2.5.1  5
addams      Ultra-Enterprise-10000  presidents  2.5.1  1
kennedy     Ultra-Enterprise-10000  presidents  2.5.1  5
lincoln     Ultra-Enterprise-10000  presidents  2.5.1  4
carter      Ultra-Enterprise-10000  presidents  2.5.1  0
bush        Ultra-Enterprise-10000  presidents  2.5.1  0
washington  Ultra-Enterprise-10000  presidents  2.5.1  1
franklin:presidents%

The file format and the fields are the same as for the domain_status
command.
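
Because both commands print the same five columns, one small parser can handle captured output from either. The Python sketch below is an illustration only. DomainEntry and parse_domain_listing are hypothetical names, and the parser assumes whitespace-separated columns with one or more board numbers at the end of each row.

```python
from typing import List, NamedTuple


class DomainEntry(NamedTuple):
    domain: str
    platform_type: str
    platform: str
    os: str
    sysbds: List[int]


def parse_domain_listing(text: str) -> List[DomainEntry]:
    """Parse captured domain_status or domain_history output."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip prompts, blank lines, and the column header.
        if len(fields) < 5 or fields[0] == "DOMAIN":
            continue
        # Skip anything whose trailing fields are not board numbers.
        if not all(field.isdigit() for field in fields[4:]):
            continue
        entries.append(DomainEntry(fields[0], fields[1], fields[2],
                                   fields[3], [int(f) for f in fields[4:]]))
    return entries
```

A caller could, for example, collect every board number that appears in the domain_status listing to see which boards are already assigned.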

SSP Domain Messages Files


Each domain has its own SSP messages file, named
$SSPLOGGER/domain_name/messages
where domain_name is the name of the domain.


Obtaining Domain Status From hostview

1. In the main hostview window, select a board from the domain for
which you want to obtain status information.

If the boards from the desired domain are not displayed, use the
View menu to display the desired domain (or all domains).

2. Select Configuration ➤ Domain ➤ Status. A window similar
to the Domain Status window (see above overhead) is displayed.

3. Click on Execute to display the status listing.


Switching Domains

You will have to "switch domains" if you have more than one domain
on your Enterprise 10000. Many of the SSP commands are domain-
specific, and take the identity of the domain they execute against from
the current setting of the SUNW_HOSTNAME environment variable.

Failure to set SUNW_HOSTNAME to the proper domain before
executing a command will cause the command to execute against
some other domain, usually with bad results for that domain.

domain_switch
The domain_switch command is a C shell alias installed by the SSP
packages that changes the value of the SUNW_HOSTNAME environment
variable and the prompt.
franklin:presidents% domain_switch new26
Switch to domain new26
franklin:new26%


If you leave out the domain name, you will get the following error:
franklin:presidents% domain_switch
Bad ! arg selector
franklin:presidents%

Specifying the Domain for an SSP Window


1. Open a new SSP window.

2. When you are prompted to provide a value for the environment
variable SUNW_HOSTNAME, specify the name of the domain that
you want to control and monitor from within that SSP window.

You can execute domain_switch in an existing window if you do not
want to open a new window.


Creating Domains From the Command Line

Use the domain_create command to create a new domain. Run it
from a window in the ssp account. The SUNW_HOSTNAME variable
in the window must be set to an existing domain or to the platform
name; which one does not matter.
ssp:domain% domain_create -d domain_name -b system_board_list \
-o os_version -p platform

● domain_name – The name of the new domain. It is usually
convenient if this is also the domain’s host name.

● system_board_list – Specifies the boards that make up this domain.
These boards must be installed and not in another domain, active
or not. The board numbers are separated by commas or spaces.

● os_version – The version of Solaris that will be loaded into the
domain, such as 2.5.1 or 2.6.

● platform – The name of the Enterprise 10000 platform that contains
the boards that will make up the new domain. This operand exists
for cases where the SSP controls multiple platforms.


If the boards that you want to include already belong to a domain, you
must remove the boards from the owning domain using DR or remove
the domain before you can use them.

If you want to reactivate a deleted domain in a different configuration,
you must enter all of the domain_create operands.

A re-created domain is deleted from the domain_history file when it
is moved to the domain_status file.

Note – In either case, the proper eeprom.image file must exist in the
$SSPVAR/.ssp_private/eeprom_save directory.

Examples

● Creating a new domain
ssp:domain% domain_create -d new26 -b 0 3 4 -o 2.6 -p presidents

● Re-creating a domain that previously existed (the configuration is
saved in the domain_history file) exactly as it was
ssp:domain% domain_create -d old251
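
The two example forms can be expressed as one argument builder. The Python sketch below is illustrative: build_domain_create is a hypothetical helper that enforces the all-or-nothing rule for the operands beyond -d, and it emits the board list space-separated as in the first example.

```python
from typing import List, Optional, Sequence


def build_domain_create(domain_name: str,
                        boards: Optional[Sequence[int]] = None,
                        os_version: Optional[str] = None,
                        platform: Optional[str] = None) -> List[str]:
    """Return domain_create arguments for a new or re-created domain."""
    args = ["domain_create", "-d", domain_name]
    extras = (boards, os_version, platform)

    # Only -d: re-create the domain exactly as saved in domain_history.
    if all(value is None for value in extras):
        return args

    # Otherwise every operand must be supplied.
    if any(value is None for value in extras) or not boards:
        raise ValueError("supply all of the boards, os_version, and platform")

    args += ["-b"] + [str(board) for board in boards]
    args += ["-o", os_version, "-p", platform]
    return args
```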


Creating Domains From hostview

1. In the main hostview window, select the board(s) that the domain
will contain.

a. Click the left mouse button on the first board.

b. Click the middle mouse button on any additional boards.
Remember that the boards you select may not currently belong
to any other domain.

2. Choose Configuration ➤ Domain ➤ Create. The Create Domain
window is displayed.

3. Fill in the domain name.

4. If all other fields are acceptable, click on Execute. You will see the
results of the command displayed in the window, just as if you
had run domain_create from the command line.


Note that the System Boards field indicates the boards that you
selected in the main Hostview window. The default OS version
and the default platform type are also shown. Note that the
platform name defaults as well.

If hostview successfully executes the command, it displays the
message Command completed in the information panel of the
window.

Note – Hostview can execute only one domain create or remove
command at a time. If you try to execute a second create or remove
command before the first is complete, it will fail.


Removing Domains From the Command Line

Use the domain_remove command to delete a domain. Run it from a
window in the ssp account. The value of the SUNW_HOSTNAME variable
in the window must be set to the name of the domain that you want to
remove. The domain must be inactive.

The syntax of domain_remove is:


ssp% domain_remove -d domain_name

The domain_remove command performs the following tasks:

1. Checks to see if the domain is running

2. Prompts the user to save selected domain log files

3. Edits the domain_config and domain_history files

4. Makes a copy of the domain’s eeprom.image file in the
eeprom_save directory

5. Removes the domain’s directories and files from the SSP


The domain_remove command asks whether to save certain domain
files. If you expect to re-create the domain, reply Y. If not, reply N to
delete them. Even if the files are deleted, domain_create can rebuild
them; it just takes a little longer. The deleted logs and other files are
lost.

Example

franklin:presidents% domain_remove -d hayes

domain_remove: The following subdirectories contain domain specific
information such as messages files, configuration files, and hpost dump files.
You may choose to keep these directories if you still need this information.
This domain may be recreated with or without this information being saved.
/var/opt/SUNWssp/adm/hayes
/var/opt/SUNWssp/etc/presidents/hayes

Keep directories (y/n)? y


Domain : hayes is removed !
franklin:presidents%


Removing Domains From hostview

1. In the main hostview window, select any board from the domain
that you want to remove. You only need to select one board to
identify the domain.

2. Choose Configuration ➤ Domain ➤ Remove. A window similar to
the above window is displayed.

3. If the displayed command is satisfactory, click on Execute;
otherwise, edit the command first. You will see the results of the
command displayed in the window.


Renaming Domains From the Command Line

To rename an existing domain, use the domain_rename command. The
new name must be unused and the domain being renamed must be down.

The syntax of domain_rename is:

ssp% domain_rename -d old_name -n new_name

The domain_rename command will rename the proper SSP directories
and files to effect the name change. You are still responsible for
making the necessary changes to your name service files. This would
include the hosts, ethers and perhaps bootparams entries as well as
any /etc/hostname.xxx files.

Remember to set the SUNW_HOSTNAME variable properly for the new
name in the ssp window, or use the domain_switch command.

Note – Remember that the domain name and Solaris host name must
match. You may need to do a sys-unconfig in the domain before
shutting it down.


Example

franklin:presidents% domain_status
DOMAIN   TYPE                    PLATFORM    OS     SYSBDS
jackson  Ultra-Enterprise-10000  presidents  2.5.1  2 3
new26    Ultra-Enterprise-10000  presidents  2.6    0 1
bozo     Ultra-Enterprise-10000  presidents  2.5.1  5
franklin:presidents% domain_rename -d bozo -n hayes
Domain : bozo is renamed to hayes !,
NOTE: The domain boot disk name may also need to be changed
franklin:presidents% domain_status
DOMAIN   TYPE                    PLATFORM    OS     SYSBDS
jackson  Ultra-Enterprise-10000  presidents  2.5.1  2 3
new26    Ultra-Enterprise-10000  presidents  2.6    0 1
hayes    Ultra-Enterprise-10000  presidents  2.5.1  5
franklin:presidents%


Renaming Domains From hostview

1. In the main hostview window, select a board from the domain
that you want to rename by clicking on it with the left mouse
button.

2. Choose Configuration ➤ Domain ➤ Rename. A window similar to
that shown above is displayed.

3. If the displayed command is satisfactory, click on Execute;
otherwise, edit the command first. You will see the results of the
command displayed in the window.


Creating a netcon Window for a Domain

Although netcon may be run without using OpenWindows or CDE, it
is much easier to use the GUI. Start netcon in an SSP window that
has its SUNW_HOSTNAME set to the proper domain name. To more
easily track which window contains the console for a given domain,
you can change the background color of the window to match that of
the domain in hostview.

1. Create the console window by using:

ssp% cmdtool -title domain_name -bg domain_color -fg black &

You could use a cmdtool, xterm or shelltool, depending on
your preferences and environment. The arguments are the same.

This provides a window title corresponding to the controlled
domain, and sets the color of the background of the window to the
corresponding hostview domain color. For a dark domain color
such as brown, you might set the foreground color to white.

Remember to set your TERM environment variable properly for the
type of window you are using.


The hostview domain colors are assigned to domains in the order
that they are brought up. The colors are:

● Domain 1 – white

● Domain 2 – orange

● Domain 3 – yellow

● Domain 4 – pink

● Domain 5 – brown

● Domain 6 – red

● Domain 7 – green

● Domain 8 – violet

Components not in a domain are colored grey.

2. Execute domain_switch in the window for the domain that you
want to control. The window’s initial SUNW_HOSTNAME value will
be inherited from the setting at the time you started the GUI
environment.

3. Start netcon in the window.

If the domain is up:

4. Log in to the domain as root.

5. To see all the messages issued for the domain, use the following
command:

# tail -f /var/adm/messages &

Some host domain messages are sent to the SSP console window
(but not all). This command enables you to see every message
from the domain in the domain’s netcon window.
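
The color table and the window command from step 1 can be combined in a small helper. This is only a sketch: the function names are made up for illustration, it assumes a POSIX-style shell (not the csh prompt shown in the examples), and the choice of which backgrounds count as dark enough for a white foreground is a guess based on the brown example above.

```shell
#!/bin/sh
# Map a domain's bring-up order (1-8) to its hostview color and pick a
# readable foreground. The color table is the one listed above; which
# backgrounds are treated as "dark" is an assumption.
domain_color() {
    case "$1" in
        1) bg=white ;;
        2) bg=orange ;;
        3) bg=yellow ;;
        4) bg=pink ;;
        5) bg=brown ;;
        6) bg=red ;;
        7) bg=green ;;
        8) bg=violet ;;
        *) bg=grey ;;                   # components not in a domain
    esac
    case "$bg" in
        brown|red|violet) fg=white ;;   # assumed dark backgrounds
        *) fg=black ;;
    esac
    echo "$bg $fg"
}

# Print (rather than run) the window command for a domain.
# Arguments: window title, bring-up order.
console_window_cmd() {
    set -- "$1" $(domain_color "$2")
    echo "cmdtool -title $1 -bg $2 -fg $3 &"
}

console_window_cmd domain_name 5
# prints: cmdtool -title domain_name -bg brown -fg white &
```

The helper only prints the command, so you can check the colors before opening a window.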


Bringing Up a Domain From the Command Line

Before bringing up a domain, you must ensure that all of its system
boards are powered up.

1. Ensure that the SUNW_HOSTNAME variable is set to the proper value.
You can confirm this with the env command. Do not trust the
prompt; SUNW_HOSTNAME may not have been set with
domain_switch. If the setting is incorrect, use the domain_switch
command to set it properly.

2. Power on all boards in the domain specified by SUNW_HOSTNAME
with the power command.

ssp% power -on

3. Bring up the domain.

ssp% bringup

4. Create the new console window for the domain and start netcon.
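
Because the prompt cannot be trusted, the check in step 1 is worth automating before anything is powered on. The sketch below assumes a POSIX-style shell; confirm_domain is a hypothetical helper, not an SSP command, and the chained use in the comment assumes that power and bringup return a non-zero status on failure.

```shell
#!/bin/sh
# Refuse to continue unless SUNW_HOSTNAME names the intended domain.
# confirm_domain is an illustrative helper, not part of the SSP software.
confirm_domain() {
    if [ "${SUNW_HOSTNAME:-}" = "$1" ]; then
        echo "SUNW_HOSTNAME is $1"
    else
        echo "SUNW_HOSTNAME is '${SUNW_HOSTNAME:-}', expected '$1';" \
             "run domain_switch first" >&2
        return 1
    fi
}

# Intended use on the SSP as user ssp. Left as a comment so that nothing
# is powered on or booted when this sketch is read into a shell:
#   confirm_domain domain_name && power -on && bringup
```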


The bringup Command

The bringup command activates a domain. It can bring the domain up
to the boot PROM level and start the boot process.

bringup checks to see if the domain is already running and will not
execute if it is (unless you use the -f option).

The operation of bringup is covered in Module 7, "System Boot
Process."


The basic syntax of the bringup command is:

bringup [-f] [-A [on|off]] [-C] [boot_arguments]

● -f – Force; do not check to see if the domain is running or all its
components are powered on.

Warning – bringup -f will crash the domain if it is already running.

● -A on|off – Set the OBP auto-boot? parameter value. This
changes its value just as if you had done a setenv auto-boot? at
the ok prompt.

● on – Boot to the OS.

● off – Stop at the OBP ok prompt.

● boot_arguments – Boot device name, -a, -s, -r, and so on.

Configuring the Centerplane


The Enterprise 10000 centerplane must be reconfigured when power is
first applied. The bringup command will automatically prompt for
permission to configure the centerplane if:

● Only one domain is configured, or

● No other domains are running, or

● The -C option is used to force centerplane configuration.

Warning – Configuring the centerplane will crash any running
domains on the entire platform.
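
Since a centerplane configuration crashes every running domain, it helps to work out in advance whether bringup is going to ask for it. The function below only restates the three conditions listed above as a sketch; its name and arguments are illustrative, and the counts must come from your own check of the platform (for example, with domain_status).

```shell
#!/bin/sh
# Predict whether bringup will prompt to configure the centerplane.
# Arguments: number of configured domains, number of OTHER domains
# currently running, and "yes" if -C will be given.
will_prompt_centerplane() {
    configured=$1 others_running=$2 force_c=$3
    if [ "$configured" -eq 1 ] || [ "$others_running" -eq 0 ] ||
       [ "$force_c" = yes ]; then
        echo yes
    else
        echo no
    fi
}

will_prompt_centerplane 3 2 no    # other domains are up, no -C
will_prompt_centerplane 3 0 no    # nothing else is running
```

If the answer is yes while other domains are running (which can only happen with -C), stop and reconsider before replying y to the prompt.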


Bringing Up a Domain From hostview

1. Select the domain you want to bring up by clicking on any system
board that belongs to it.

2. Choose Control ➤ Bringup to display the window that shows the
name of the selected domain.

3. Choose Execute to execute the bringup command.

4. After the bringup operation has completed, choose Terminal ➤
netcontool.

This command changes the SSP window to a netcon window and
displays the OBP prompt or OS prompt, depending on your
bringup command.

Warning – The force option (-f) is always assumed in hostview.


Overview of netcon

netcon allows you to connect to an Enterprise 10000 domain as a
master console. It replaces the keyboard and monitor or serial port A
connection used by other SPARC systems. All netcon sessions are
connected through and controlled by the domain netcon_server
daemon.

Typically, you log in to the SSP machine as user ssp and enter the
netcon command. This changes the window into a netcon window
for the domain specified by the SUNW_HOSTNAME environment
variable set in the SSP window.

On the netcon command, you can also specify:

● -g for Unlocked Write permission

● -l for Locked Write permission

● -f to force Exclusive Session mode


The netcontool command is similar to netcon. netcon differs in
that it provides no GUI, making it more useful for dial-in or
other low-speed network access.

If you have write permission, you can enter Solaris commands. You
can also enter special tip commands prefixed by tilde (~) to perform
the functions offered by the netcontool window.

Some of the more useful netcon control commands are:

~. – Exit from netcon

~# – Send a break (Stop-A)

~= – Switch from JTAG to cvcd mode or vice versa (covered in
Module 7)

~? – Show the status of all netcon sessions for this domain

There are also ~ commands for controlling netcon session permissions,
which will be covered later in this module.

To reconnect to netcon after exiting with ~., you must reenter the
netcon command.

netcon uses two distinct paths for communicating console
input/output between the SSP and a domain: the standard network
interface and the cbe interface. Usually, when the OS is up and
running, all console traffic flows over the network. If the local network
becomes inoperable, all interactive access to the domain is lost and, for
example, telnet, rlogin, and netcon sessions hang. You may need to
use the tip commands to try to reconnect to the domain.


Using netcontool

You can start netcontool in either of two ways:

● From an SSP window, type:

% domain_switch domain_name
ssp_name:domain_name% netcontool &

Note that SUNW_HOSTNAME must be set to the domain for which
you want to bring up a netcontool window before you run the
command.

● From the hostview window, select a board from the domain for
which you want to bring up a netcon window and then select
Terminal ➤ netcontool.

The netcontool window is displayed. The domain name is in its
title.


In the netcontool window, choose Connect. A new netcon window
is created for the domain.


netcon Session Types

There are four types of netcon sessions that you can request, either
when starting netcon or later. The ~ command given is how you
change to this mode from within a netcon session. Use the buttons on
the netcontool tool bar to change netcontool window states.

● Read Only Session (~^)

The default session type. It enables you to look at the netcon
traffic for the domain, but you cannot enter commands.

● Unlocked Write (-g) (~@)

Attempts to display a netcon window with unlocked write
permission. If this attempt succeeds, you can enter commands into
the console window, but your write permission is taken away
whenever another user requests unlocked write, locked write, or
exclusive session permission for the same domain.

● If another user currently has unlocked write permission, it is
changed to read only permission, and you are granted
unlocked write permission.


● If another user currently has locked write permission, you are
granted read only permission.

● If another user currently has exclusive session permission, you
cannot bring up a netcon window.

● If you are granted unlocked write permission and another user
requests unlocked write or locked write permission, you are
notified and your permission is changed to read only. You can
attempt to reestablish unlocked write permission at any time,
subject to the same constraints as your initial attempt to gain
unlocked write permission.

● Locked Write (-l) (~&)

Attempts to display a console window with locked write
permission. If you are granted locked write permission, no other
user can remove your write permission unless they request
exclusive session permission.

● If another user currently has locked write permission, you are
granted read only permission.

● If another user currently has exclusive session permission, the
command fails.

● Exclusive Session (-f) (~*)

Displays a console window with locked write permission,
terminates all other open console sessions for this domain, and
prevents new console sessions for this domain from being started.

You can change back to multiple session mode by choosing the
Rel. Write button to release write access or by choosing the
Disconnect button to end your console session. The command will
fail if any other user currently has exclusive session permission.

Any time your access permission changes due to another user
connecting and changing your permissions, netcon issues a message
informing you of your new permission status.
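
The rules for the four session types can be summarized as a small decision function. This is a model of the text above, not code from netcon itself: the function name and the keywords read, unlocked, locked, and exclusive are made up for illustration, and it assumes a POSIX-style shell.

```shell
#!/bin/sh
# Usage: netcon_grant REQUESTED HELD
#   REQUESTED  read | unlocked | locked | exclusive
#   HELD       strongest permission held by another user:
#              none | read | unlocked | locked | exclusive
# Prints the permission you would end up with, or "fail".
netcon_grant() {
    requested=$1 held=$2
    # An exclusive session blocks every other session and request.
    if [ "$held" = exclusive ]; then
        echo fail
        return 1
    fi
    case "$requested" in
        read)
            echo read ;;
        unlocked|locked)
            # A locked writer cannot be displaced by these requests.
            if [ "$held" = locked ]; then
                echo read
            else
                echo "$requested"
            fi ;;
        exclusive)
            echo exclusive ;;
        *)
            echo fail
            return 1 ;;
    esac
}

netcon_grant unlocked locked      # prints: read
netcon_grant exclusive locked     # prints: exclusive
```

Note that the model covers only what you are granted; in the unlocked and exclusive cases the other user's session is also downgraded or terminated, as described above.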


netcontool Window Configuration

Use the Console Configuration window to specify the new window
type as either Xterm, Shell Tool (shelltool), or Command Tool
(cmdtool). The netcon window is brought up in the specified type of
window. The default is Xterm.

Select the type of session in the left panel, the type of window in the
right panel, and then choose Done.

To display the netcon window, choose Connect on the netcontool
tool bar. netcon attempts to connect to your default domain or to the
domain that you specified in the Console Configuration window.

If no error occurs, the netcon window is displayed directly beneath
the netcontool tool bar. Note that these are two separate windows,
although they can affect each other. You can view messages in the
netcon window and, if you have write permission, enter commands.

Remember to set your TERM environment variable properly.


netcontool Buttons

The Disconnect button in the netcontool window disconnects it from
the domain and closes the console window. The netcontool tool bar
stays open.

The OBP/kadb button in the netcontool window causes a break to
the OpenBoot PROM (OBP) or kadb programs (a logical Stop-A).

The JTAG button, equivalent to using ~= from netcon, switches the
netcon communication path between JTAG and cvcd (TCP/IP).

The Lock Write, Unlock Write, and Excl. Write buttons request the
corresponding mode for the console window.

The Rel. Write button in the netcontool window releases any write
access and places the console window in read only mode.

The Status button displays information about all open netcon sessions
that are connected to the same domain as the current session. This can
be useful in determining which system currently has write permission
to the domain.


Blacklisting Components

You can use the SSP blacklist feature to configure out of use any of the
following Enterprise 10000 components:

● System boards

● Processors

● Address buses

● Data buses

● I/O controllers

● I/O adapter cards

● System board memory

● Memory DIMM groups

● Enterprise 10000 half-centerplane


● Port controller ASICs (Application Specific Integrated Circuits)

● Data buffer ASICs

● Coherent interface controller ASICs

You may want to configure out parts for benchmarking purposes,
error isolation, configuration testing, or configuration changes.

You would not use the blacklist feature in normal operation.


Using the blacklist

You would normally use blacklist on a component you believe may
be failing intermittently, or failing only after the system is booted.

If a component failure is detected by hpost (during bringup), the
failing component is automatically configured out of the system. This
component, however, is not blacklisted. You must blacklist the
component yourself if you do not want it checked during each
bringup. Once you blacklist a component, it is not checked again.

To blacklist a component, edit the blacklist file with a text editor,
and include the component. When a domain runs POST, hpost reads
the blacklist file and automatically configures out the components
specified in that file. Changes that you make to the blacklist file do
not take effect until bringup is run again on the affected domains.

The default blacklist is $SSPVAR/etc/platform_name/blacklist,
where platform_name is the name of the Enterprise 10000 platform. A
description of possible blacklist file contents and their format can be
found in the blacklist man page. The blacklist file location can be
changed in the .postrc file.
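
Before editing the blacklist by hand, it is prudent to keep a copy of the current file. The sketch below builds the default path described above and saves a backup. The function names and the .bak suffix are arbitrary choices for illustration, a POSIX-style shell is assumed, and the sketch does not apply if the location has been changed in .postrc. It deliberately writes nothing into the file, since the entry format is defined in the blacklist man page.

```shell
#!/bin/sh
# Print the default blacklist path for a platform.
# SSPVAR is expected to be set in the ssp user's environment.
blacklist_path() {
    echo "${SSPVAR:?SSPVAR is not set}/etc/$1/blacklist"
}

# Copy the current blacklist aside before editing it.
# The ".bak" suffix is an arbitrary choice.
backup_blacklist() {
    file=$(blacklist_path "$1") || return 1
    if [ -f "$file" ]; then
        cp -p "$file" "$file.bak" && echo "saved $file.bak"
    else
        echo "no blacklist file at $file"
    fi
}

# Example (platform_name is a placeholder):
#   backup_blacklist platform_name
```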


Blacklisting Boards and Buses With hostview

1. In the hostview window, select Edit ➤ Blacklist File.

The Blacklist Edit window is displayed.

2. Select the boards or buses that you want to blacklist.

To select only a single component, click on that component with
the left mouse button. To toggle the selection status of a
component without changing the status of any other component,
click on that component with the middle mouse button.

The selected components are displayed in black.

3. To save the changes, choose File ➤ Save.

4. To exit the Blacklist Edit window, choose File ➤ Close.


Blacklisting Processors With hostview

1. Choose Edit ➤ Blacklist File.

The Blacklist Edit window is displayed.

2. From the Blacklist Edit window, choose View ➤ Processors.

The Blacklist Edit window displays the processor view.

3. Select the processors that you want to blacklist.

The selection mechanism is the same as for buses or system
boards.

4. To save the changes, choose File ➤ Save.

5. To exit the Blacklist Edit window, choose File ➤ Close.


Clearing the Blacklist File

From the command line, you can edit the file with a text editor and
delete its contents, or just delete the file.

From hostview:

1. Choose Edit ➤ Blacklist File. The Blacklist Edit window is
displayed.

2. From the Blacklist Edit window, choose File ➤ New.

3. From the Blacklist Edit window, choose File ➤ Close.

This saves a new, empty, blacklist file.


Processor Sets

Processor sets are a feature provided by Solaris 2.6 that adds the
ability to "fence" or isolate groups of CPUs for use by specially
designated processes. This allows these processes to have guaranteed
access to CPUs that other processes, including the system itself,
cannot use. A processor may belong to only one processor set at a time.

Processor sets differ significantly from processor binding (the pbind
command). The pbind command designates a single specific processor
that the process must run on, but other processes can continue to use
that processor. With processor sets, the CPUs are dedicated to the
processes that are assigned to the set.

System CPUs can be grouped into one or more processor sets by the
psrset -c command. These processors will remain idle until processes
(technically, LWPs) are assigned to them by the psrset -b command.
Processors can be added to and removed from a processor set at any
time with the psrset -a and psrset -r commands, respectively. Processor
set definitions can be viewed with the psrset -p command.


Once a processor set has been created, processes are assigned to it by
the psrset -b command. These processes may run only on the
processors in the specified processor set, but they can run on any
processor in the set. Any child processes or new LWPs created by a
process in a processor set will be set to run in the same processor set.
Processes are unbound from a processor set with the psrset -u
command.

Only the root user may create, manage, and assign processes to these
processor sets.

The psrset -q command displays the processes bound to each
processor set, and the psrset -i command shows the characteristics
of each processor set.

As well as manually created processor sets, the system may also
provide automatically defined system processor sets. These processor
set definitions cannot be modified or deleted. Any user can assign
processes to system-defined processor sets, but other processes will
not be excluded from these CPUs. A processor may be in a system or
non-system processor set, but not both at the same time.

Processor sets can also be managed from within a program by using
the pset_create(2), pset_assign(2), and pset_bind(2) system calls.

If a processor set is completely removed from the system by DR
detach, the processes using the processor set will be unbound, and
able to run on any CPU in the system. If a large part of a processor
set is removed, you may need to use psrset -a to add additional
processors to the processor set.

The system does not remember processor set configuration across DR
operations. If you DR attach a board to the system that had been in
one or more processor sets, you must use psrset -a to add the CPUs
back to that set. System-defined processor sets will be automatically
restored.
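
The psrset options described in this section fit together as shown in the following dry-run sketch. It prints the commands instead of running them; the run wrapper and the function name are illustrative, the CPU IDs, set ID, and process ID are placeholders, and a POSIX-style shell is assumed. To execute the commands for real you would clear DRY_RUN on a Solaris 2.6 domain as root and substitute the set ID that psrset -c reports.

```shell
#!/bin/sh
# Dry-run walk-through of the psrset options described above.
# Leave DRY_RUN set to print the commands only.
DRY_RUN=1

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# CPU IDs 4 and 5, set ID 1, and PID 1234 are placeholders.
psrset_walkthrough() {
    run psrset -c 4 5      # create a set from CPUs 4 and 5
    run psrset -b 1 1234   # bind process 1234 to set 1 (assumed ID)
    run psrset -q 1234     # show the binding for that process
    run psrset -u 1234     # unbind the process again
}

psrset_walkthrough
```

After a DR attach, the same pattern applies with psrset -a set_id cpu_id to put the returning CPUs back into their set.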


Lab

1. Use telnet to connect to the lab’s main SSP as the ssp account.
On that SSP:

a. Back up, then create, an eeprom.image file for your assigned
domain, as shown beginning on page 5-12.

b. Inspect the SSP system files and configuration.

2. Create a domain from your assigned system board.

3. Use domain_status to see your domain.

4. Run bringup -A off to start OBP. If yours is the first domain to
start, you will be asked to configure the centerplane.

5. Start netcon and inspect the domain from the OBP. Look at the:

a. OBP environment variables with printenv

b. devalias list

c. OBP device tree (cd / at the ok prompt)

Optional:

6. Remove the domain.

7. Use domain_history to locate it.

8. Add the domain back again and run bringup -A off again.

Leave the domain at the ok prompt.


Check Your Progress

Before continuing on to the next module, check that you are able to
accomplish or answer the following:

❑ Describe a domain.

❑ List the requirements for a domain.

❑ Describe the function of and create an eeprom.image file.

❑ Create, destroy, and rename a domain.

❑ Describe domain planning issues.

❑ Identify the SSP domain files.

❑ Describe blacklisting and how to manage a blacklist.


Think Beyond

Why can’t a system board’s components be divided among domains?

Why might you want extra domains in the domain_history file?
What could you use them for?

When would you use hostview versus the command line?

Why is there a blacklist file? What other uses does it have?

When would two control boards be essential?

How could you minimize the disruption of switching between control
boards?

Module 6 – Installing Solaris in a Host Domain

Course Map

This module describes how to install and configure Solaris for an
Enterprise 10000 domain. These instructions assume that you have
both an SSP window and a netcon window open. They also assume that
there is no local CD-ROM drive attached to the domain being
installed.


Relevance

For Discussion – The following questions are relevant to
understanding the content of this module:

1. How is installing Solaris in an Enterprise 10000 domain different
from installing it on any other SPARC system?

2. What needs to be installed along with Solaris itself?

3. Are there any restrictions in the process?

Note – This module covers both the Solaris 2.5.1 and 2.6 releases.
Where no distinction is made, the material applies to both releases.


Objectives

Upon completion of this module, you will be able to:

● Describe the Enterprise 10000 Solaris environment.

● Install Solaris on an empty disk in an Enterprise 10000 domain.

● Install the appropriate SMCC Updates CD-ROM packages.

● Configure factory pre-installed Solaris in a domain.

● Perform the default configuration for NTP in a domain.

References

Additional resources – The following references can provide
additional details on the topics discussed in this module:

● Sun Enterprise 10000 System Hardware Installation and
De-Installation Guide

● Ultra Enterprise 10000 SSP 3.1 User’s Guide

● Solaris installation documentation

● Solaris release notes for the level of Solaris you are installing

● The man pages for the commands and files


The Enterprise 10000 Solaris Environment

Some reminders about the Enterprise 10000 Solaris environment:

● Each domain has its own, independent copy of Solaris. The
domains share nothing with each other.

● The minimum level of Solaris required for a domain is 2.5.1
Hardware 4/97 or Solaris 2.6 Hardware 5/98. Solaris 2.6
Hardware 3/98 will run in a domain without AP and DR.

● The platform architecture type is sun4u1. Some commands use the
default Ultra sun4u interfaces, others use specific Enterprise 10000
sun4u1 interfaces. There are sun4u1 subdirectories in system
directories such as /platform and /usr/platform.

● The Solaris 2.5.1 4/97 installation process installs a large group of
patches on completion of the basic Solaris installation. The
maintenance patches are integrated into the Solaris 2.6 releases.

● All applications that are not platform dependent should run in a
domain without change.


● Because a domain does not have a keyboard and monitor, or serial
ports, it is controlled through a netcon window from the SSP. The
OS sees the console as interface ttya.

● The domain name and host name for a domain must be the same,
and cannot duplicate any other host name in the name service
domain.


Configuring the SSP as a Boot Server

Since there are usually no CD-ROM drives installed in the domains,
you must set up the SSP as the boot server to load Solaris over the
network for each domain you are going to install.

On the main SSP:

1. Log in as root.

2. Add the domain host name and IP address to /etc/inet/hosts.

3. Add the Ethernet MAC address to /etc/ethers.

4. Insert the Solaris CD-ROM into the CD-ROM drive. After
inserting the CD-ROM, wait for the system to mount it.

Note – Remember that the SSP packages configure the SSP to have the
system automatically share every CD-ROM.

If you do this often, you may want to use setup_install_server
to copy the installation image to your hard drive.


If the host name for the new domain is contained within an existing
host name (such as starfire and starfire-ssp), you must ensure
that the new domain entry precedes all other host and SSP entries
in the /etc/inet/hosts file.
#
# Internet host table
#
140.55.22.87 starfire
127.0.0.1 localhost
140.55.22.88 starfire-ssp
140.55.22.89 otherhost

Caution – If this type of new domain name entry follows the other
host or SSP entry, the following step will not work correctly.

5. Set up the domain as an install client.

For Solaris 2.5.1 use:

ssp# cd /cdrom/solaris_2_5_1_hw497_sparc/s0
ssp# ./add_install_client new_domain_name sun4u1

For Solaris 2.6 use:

ssp# cd /cdrom/sol_2_6_598_sparc_smcc_server/s0/Solaris_2.6/Tools
ssp# ./add_install_client new_domain_name sun4u1

You might want to leave the boot server information configured to
do maintenance in the domain, since you cannot boot from a CD-ROM
to fix problems.

6. Double check that the CD-ROM has been shared with NFS share
options ro and anon=0. Correct these if it has not been, or your
boot will hang.
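
The ordering rule for /etc/inet/hosts is easy to get wrong, so it can be checked before running add_install_client. The function below is a sketch only: check_hosts_order is a made-up name, it assumes a POSIX-style shell and an awk that accepts -v (on Solaris of this era that would be nawk rather than /usr/bin/awk), and it treats any host name that contains the domain name as a conflicting entry.

```shell
#!/bin/sh
# Usage: check_hosts_order DOMAIN_NAME HOSTS_FILE
# Exit status 0: the domain entry comes before every entry whose name
#                contains the domain name (for example, name-ssp).
# Exit status 1: such an entry comes first.
# Exit status 2: the domain name is not in the file.
check_hosts_order() {
    domain=$1 hosts=$2
    awk -v d="$domain" '
        /^[ \t]*#/ || NF < 2 { next }            # skip comments and blanks
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break             # trailing comment
                if ($i == d) {
                    if (!exact) exact = NR       # first exact match
                } else if (index($i, d) > 0) {
                    if (!other) other = NR       # first containing name
                }
            }
        }
        END {
            if (!exact) exit 2
            if (other && other < exact) exit 1
            exit 0
        }' "$hosts"
}

# Example:
#   check_hosts_order starfire /etc/inet/hosts || echo "fix the order"
```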


Preparing the Domain

Note – The following procedure requires that the domain being
installed has already been created.

The steps that follow modify the usual suninstall procedures:

1. After the SSP is set up to be a boot server, log in to the SSP as user
ssp.

2. When prompted, specify the name of the domain that you want to
install.

3. In an SSP window, boot the domain to the OBP ok prompt by
typing:

ssp% bringup -A off

Remember that if this is the first domain to be brought up, you
will be prompted for permission to configure the centerplane.
Reply y.

This bringup will configure the Centerplane. Please confirm (y/n)? y


4. Start a netcon session for your domain. After a few minutes, the
OBP ok prompt will be displayed.
ssp% netcon

During the period while the OBP is initializing, you will see no
activity in netcon. The delay could take anywhere from 30
seconds to several minutes. This is normal. At the end of OBP
initialization, you will see the OBP banner and ok prompt in your
netcon window.

The extent of the delay depends on the size of the hardware
configuration. The OBP must probe every processor, memory
bank, and interface card slot on every system board in the domain,
which could take a while.

5. Check for duplicate devalias entries in the domain’s OBP. The
installation process will hang if there are duplicate device aliases
in the OBP.

If there are duplicate entries in the devalias list, remove them
with nvunalias. For example, if there is an extra net alias, use:

ok nvunalias net

The extra alias will not disappear from the devalias listing until
the next OBP reset is performed, but the alias has been deleted.

6. Create an alias with nvalias for the network interface that you
will boot from. Use show-nets to help determine the proper alias.
ok nvalias net ...


7. From the netcon window, boot the domain from the SSP by
typing:
ok boot net

If appropriate, do not forget to change the boot-device OBP
environment variable setting.

Note – If the domain hangs after you see the Solaris release and
copyright boot messages, this means that you forgot to share the
CD-ROM with NFS option anon=0.

If you use set-defaults to reset all of the OBP variables to their
default values, you may still use the nvrestore command to recover
the prior contents of the nvramrc parameter, restoring your device
aliases.
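
The duplicate check in step 5 can be tedious on a domain with many aliases. One approach, sketched below, is to copy the devalias listing from the netcon window into a file on the SSP and let the shell find repeated names. The function name is hypothetical, the file is assumed to hold one alias name and path per line, and a POSIX-style shell is assumed.

```shell
#!/bin/sh
# Usage: duplicate_aliases FILE
# FILE holds saved devalias output, one "alias path" pair per line.
# Prints each alias name that is defined more than once.
duplicate_aliases() {
    awk 'NF >= 2 { print $1 }' "$1" | sort | uniq -d
}

# Example (the file name is a placeholder):
#   duplicate_aliases /tmp/devalias.txt
```

Each name printed is a candidate for nvunalias at the ok prompt.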


Installing Solaris

The domain OS installation process is almost identical to a normal
Solaris installation. If you are installing on top of an existing OS, use
these steps as a guide.

1. At the prompt, select the appropriate terminal type, most likely
Sun Command Tool.

2. Proceed with the normal Solaris installation process, replying to
the prompts as necessary.

3. When the Software dialog is displayed, select the Entire
Distribution plus OEM Support option, then press F2.

4. When the Disks dialog is displayed, choose the disk on which the
software is to be installed, then press F2 to continue.

Note – If you choose a drive other than the one designated in the OBP
boot-device parameter, a warning message will appear later in the
installation process. Make sure that you update the OBP boot-device
parameter before booting the domain.


5. When the Automatically Layout File Systems? dialog is displayed,
press F4 to lay out the file systems manually.

6. When the File System and Disk Layout dialog is displayed, press
F4 to customize the layout.

7. In the Customize Disk screen, set up the disk partitions for the
root disk. Two disks are necessary if you are installing on disks
smaller than 2 Gbytes. If two disks are used, at least / and /usr
must be on the device specified in the OBP boot-device alias.

Note – If you plan to encapsulate your boot disk with Enterprise
Volume Manager, you will need to reallocate these partitions
accordingly.

Use the following minimum sizes for your disk partitions:

Slice  Mount    Minimum size            Notes
0      /        64 Mbytes               Minimum recommended size
1      swap     512 Mbytes
2      overlap  Actual total disk size
3      /var     512 Mbytes              This may have to hold large panic dumps
4      (none)   2 Mbytes                Reserved for the Alternate Pathing or
                                        Solstice DiskSuite products
5      /opt     425 Mbytes              This may be larger depending upon
                                        remaining space
6      /usr     512 Mbytes              Asian-language users may need more
                                        space here

Press F2 when you are done.

When setting up on a 4-Gbyte drive, the recommended sizes for slices
0 through 7 are 128, 1024, 4 Gbytes (the overlap slice), 1024, 2, 1024,
512, and 376 Mbytes, with slice 7 used for /export/home.
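
As a quick check of the figures above, the slice sizes can be added up to see how much disk each layout needs. The helper below is a sketch (sum_mb is an illustrative name, and a POSIX-style shell is assumed); slice 2 is left out of both sums because it overlaps the whole disk.

```shell
#!/bin/sh
# Add up a list of slice sizes given in Mbytes.
sum_mb() {
    total=0
    for size in "$@"; do
        total=$((total + size))
    done
    echo "$total"
}

sum_mb 64 512 512 2 425 512           # minimum layout: prints 2027
sum_mb 128 1024 1024 2 1024 512 376   # 4-Gbyte layout: prints 4090
```

The minimum layout comes to just under 2 Gbytes, which is consistent with the statement in step 7 that smaller disks require a second drive.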


8. When the Profile dialog is displayed, press F2 to confirm your
profile selections.

If you installed the operating system on a drive other than the
one designated as the OBP boot-device, you will receive a
warning similar to the following:
Warning
You have an invalid disk configuration because of the condition(s)
displayed in the window below. Errors should be fixed to ensure a
successful installation. Warnings can be ignored without causing the
installation to fail.

> To go back and fix errors or warnings, select Cancel.


> To accept the error conditions or warnings and continue with the
installation, select Continue.

WARNING: The boot disk is not selected or does not have a “/” mount
point (c0t3d0)

Ignore this warning, but remember that you will have to update
the OBP boot-device parameter. Press F2 to continue.

9. When the Begin Installing Solaris dialog is displayed, select Do not
reboot. Press F2 to begin installation. This step, which installs the
Solaris software and a large patch cluster, may take 80 minutes or
more to complete.

When the installation is complete, the Enterprise 10000 host will
display a # prompt on the console.

10. If the eeprom boot-device parameter is not correct, use the
eeprom command to set it properly.

# /usr/platform/sun4u1/sbin/eeprom boot-device=bootdisk

where bootdisk is the OBP devalias name for your boot disk.

11. Enter init 0 to take the domain to the OBP ok prompt.

Booting the Domain for the First Time

1. To remove the Solaris CD-ROM from the SSP CD-ROM drive, log
in to the SSP as root and unshare and eject it.

2. As user ssp, in an SSP window, bring up the new domain with the
bringup command.
ssp% bringup -A on

3. If this is the first domain to be brought up, you will be prompted
for confirmation to proceed with configuring the centerplane.
Reply y.
This bringup will configure the Centerplane. Please confirm (y/n)? y

4. Start a netcon session after the bringup completes. The next steps
are completed from the host domain console.

5. Respond to the prompts from sys-config for Solaris
configuration information. You may be asked for the following
items, depending on your naming service:

● IP address and netmask

● Name service (such as none, NIS, NIS+)

● Domain name, if any

● Time zone

● Time

6. Provide the new root password.

7. Enter the host name of the SSP and its IP address. Make sure that
you specify the SSP host name that corresponds to the domain
connection subnet. The name will be saved in /etc/ssphostname.

Press Enter if the system has properly located the SSP; otherwise,
enter the SSP’s host name.
Please enter hostname of SSP for Enterprise 10000_host [name-ssp]: sspname

The only times that you will be prompted for this information are
the first time a domain boots after installation or after the
ssp_unconfig command has been run in the domain.

8. Verify or correct the IP address of the SSP.


SSP Host Name: namessp
SSP IP Address: nnn.nnn.nn.nn

Is this correct (y or n):

The domain now finishes the boot sequence and provides the root
login prompt.

9. Install any appropriate additional patches.

Installing Packages From the 2.6 SMCC Server Supplement CD-ROM

There are several packages that must be installed from the SMCC
Server Supplement CD-ROM to finish the domain software
installation.

To install the SMCC Supplement packages:

1. On the SSP, insert the SMCC Server Supplement CD-ROM. Wait
for the system to mount it.

2. If the CD-ROM is in the SSP drive, log in to the SSP as root and
share the CD-ROM if it is not automatically shared.
ssp# share -F nfs -o ro /cdrom/supp_sol_2_6_598_smcc

3. Log in to the domain as root from netcon.

4. If the CD-ROM is not directly attached to the domain, access it
through /net or NFS-mount it on /cdrom in the domain.
# mkdir /cdrom
# mount ssp_name:/cdrom/supp_sol_2_6_598_smcc /cdrom

5. Using pkgadd, install at least the following packages from the
CD-ROM:

Package     Description
SUNWehea    Header file extensions
SUNWabhdw   SMCC Hardware AnswerBook
SUNWeman    Enterprise Solaris man pages

# cd /cdrom/SMCC
# pkgadd -d . SUNWehea SUNWabhdw SUNWeman

6. Unmount the CD-ROM from the domain. Do not delete /cdrom if
the CD-ROM was mounted by the system.
# cd /
# eject cdrom
# rmdir /cdrom

7. Apply any appropriate patches.

8. Reboot the domain.
# init 6

9. To remove the SMCC Server Supplement CD-ROM from the SSP CD-ROM
drive, log in to the SSP as root, then unshare and eject it.

Installing Packages From the 2.5.1 SMCC Hardware Updates CD-ROM

There are several packages that must be installed from the SMCC
Hardware Updates CD-ROM to finish the domain software
installation.

To install the SMCC Updates packages:

1. At the SSP, insert the SMCC Hardware Updates CD-ROM. Wait for
the Volume Manager to mount it.

2. If the CD-ROM is in the SSP drive, log in to the SSP as root and
share the CD-ROM if it is not automatically shared.
ssp# share -F nfs -o ro /cdrom/upd_sol_2_5_1_hw_497_smcc

3. Log in to the domain as root from netcon.

4. If the CD-ROM is not directly attached to the domain, NFS-mount
it on /cdrom in the domain.
# mkdir /cdrom
# mount ssp_name:/cdrom/upd_sol_2_5_1_hw_497_smcc /cdrom

5. Using pkgadd, install at least the following packages from the
CD-ROM:

Package     Description
SUNWabdr    Dynamic Reconfiguration AnswerBook
SUNWehea    Header file extensions
SUNWprtnu   Processor partition utilities
SUNWxntp    Network Time Protocol
SUNWabhdw   SMCC Hardware AnswerBook
SUNWeman    Enterprise Solaris man pages

# cd /cdrom/SMCC
# pkgadd -d . SUNWabdr SUNWehea SUNWprtnu SUNWxntp SUNWabhdw SUNWeman

6. Unmount the CD-ROM from the domain. Do not delete /cdrom if
the CD-ROM was mounted by Volume Manager.
# cd /
# eject cdrom
# rmdir /cdrom

7. Install any appropriate patches.

8. Reboot the domain.
# init 6

9. To remove the SMCC Hardware Updates CD-ROM from the SSP CD-ROM
drive, log in to the SSP as root, then unshare and eject it.

Finishing the Installation - Solaris 2.6

1. After the domain has rebooted, configure NTP for your local
network. To use the default configuration, create a file called
ntp.conf in /etc/inet. It should contain the following:
server ssp_domain_hostname prefer
server 127.127.1.0
fudge 127.127.1.0 stratum 9
#
driftfile /etc/inet/ntp.drift
#
disable auth
controlkey 1
requestkey 1
authdelay 0.000793
#
precision -18

2. Install and configure the Alternate Pathing software.

3. Reboot.
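As a rough sketch, the default configuration from step 1 can be generated with a small shell function so that the SSP host name is substituted consistently. The function name write_ntp_conf and the host name xf4-ssp are placeholders, and the sketch writes to a scratch file rather than directly to /etc/inet/ntp.conf.

```shell
#!/bin/sh
# Write the default ntp.conf shown in step 1 to a given path.
# $1 = SSP host name on the domain subnet, $2 = output file
write_ntp_conf() {
    ssp_host=$1
    out=$2
    cat > "$out" <<EOF
server $ssp_host prefer
server 127.127.1.0
fudge 127.127.1.0 stratum 9
#
driftfile /etc/inet/ntp.drift
#
disable auth
controlkey 1
requestkey 1
authdelay 0.000793
#
precision -18
EOF
}

# Example: write to a scratch file first and review it before
# copying it into place ("xf4-ssp" is a placeholder host name).
tmp_conf=${TMPDIR:-/tmp}/ntp.conf.$$
write_ntp_conf xf4-ssp "$tmp_conf"
cat "$tmp_conf"
```

Reviewing the scratch file before copying it to /etc/inet/ntp.conf keeps a typing mistake in the host name from reaching the running configuration.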

Finishing the Installation - Solaris 2.5.1

After the domain has rebooted:

1. Adjust the xntp configuration for your local network. To use the
default configuration, update /etc/opt/SUNWxntp/ntp.conf
by:

a. Changing the line:

server 127.127.1.7

to

server 127.127.1.9

b. Inserting the following line:

peer ssp_domain_hostname

For more information, see Appendix A.

2. Install and configure the Alternate Pathing software.

3. Reboot.

Preinstalled Domain Software

Solaris may be preinstalled in one of your domains for you when your
Enterprise 10000 is shipped from the factory. If this is the case, you can
boot the domain immediately, without going through the Solaris
installation process. The SMCC Supplement or Updates packages will
have been installed.

Just boot from the identified drive, which is usually c0t0d0s0.

You will still need to respond to the normal suninstall prompts and
then the ssp_config prompts. After configuring NTP, the domain will
be ready for use.

Make sure that your domain’s name service is properly updated for
the Enterprise 10000 control boards, domains and SSP.

Once again, make sure that you have installed any appropriate
patches.

Lab

Using this module as your guide:

1. Install Solaris in the domain if it has not already been done.
Remember to:

● Use bringup to activate the domain and start netcon.

● Confirm the domain ethers and IP addresses and use
add_install_client to set up for the domain on the lab SSP.

● Create any necessary OBP boot disk and net devaliases (see
Appendix B for help).

● Start the install process with boot net for the domain.

● Install the Entire Distribution plus OEM Support.

● Partition the boot drive as suggested.

● If necessary, create and set the proper boot-device alias in the
OBP.

● Respond to the SSP name prompt when the domain reboots.

● Install the SMCC Supplements or Updates packages.

2. Configure NTP, then reboot.

3. Inspect the domain’s Solaris environment.

4. Note which daemons are running that are specific to the
Enterprise 10000.

5. Inspect the NTP configuration and make sure that it is correct.
Check /var/adm/messages for the synchronization messages.

6. Shut down and then reboot the domain.

Check Your Progress

Before continuing on to the next module, check that you are able to
accomplish or answer the following:

❑ Describe the Enterprise 10000 Solaris environment.

❑ Install Solaris on an empty disk in an Enterprise 10000 domain.

❑ Install the appropriate SMCC Updates CD-ROM packages.

❑ Configure factory pre-installed Solaris in a domain.

❑ Perform the default configuration for NTP in a domain.

Think Beyond

Why would you not want to have a CD-ROM drive attached to your
domain?

Should you use the SSP as your boot server if you load the OS often?

Are there any special issues involved with installing patches on the
Solaris domain? On the SSP?

Should you synchronize NTP to an external clock? Why?

System Boot Process 7

Course Map
This module provides an explanation of the boot process. It discusses
the environment variables, executables, and order involved in the boot
processes of both the SSP and an Enterprise 10000 domain.

Relevance

For Discussion – The following questions are relevant to
understanding the content of this module:

1. How does an Enterprise 10000 domain boot? What’s different from
other Enterprise servers?

2. How does the SSP participate in the boot process?

3. What do the SSP daemons do during boot? What files are
involved?

4. What are the new Enterprise 10000 OBP environment parameters?

Objectives

Upon completion of this module, you will be able to:

● Describe the steps of the SSP boot process.

● List the SSP daemons and their functions.

● Explain how to boot a domain.

● List the steps of the domain boot process.

● Identify the files used in the domain boot process.

● Explain the domain hardware configuration process.

● Describe the purpose of the eeprom.image files.

● Decode SBus slot and processor physical locations.

● Describe the OBP environment variables specific to the
Enterprise 10000.

References

Additional resources – The following references can provide
additional details on the topics discussed in this module:

● Ultra Enterprise 10000 SSP 3.1 User’s Guide

● Sun Enterprise 10000 System Overview Manual

● Sun Enterprise 10000 System Hardware Installation and De-Installation
Guide

● man pages for the commands, daemons and files

The SSP Boot Process

Prepare the SSP

Power up the SSP monitor, CPU, and all external peripherals.

SSP Boot Process

The SSP runs its POST self-tests, then loads and begins executing the
Solaris kernel in the normal boot process.

/sbin/init runs, which loads /etc/inittab and executes the
commands specified in it.

On the main SSP, inittab includes a new line at the end to start the
ssp_startup script as a respawn process. The line is:
sp:234:respawn:su - ssp -fc /etc/opt/SUNWssp/ssp_startup.sh 15 \
>/dev/null 2>&1 </dev/null # SUNWsspr

Daemon Start Up
The ssp_startup script starts up the two platform daemons: edd and
snmpd. It then starts up the non-domain (platform) daemons in the
proper order (although the order of startup is not specified here): cbs,
machine_server, fad, and straps.

edd uploads the event detection scripts to the Enterprise 10000 control
board(s), waits for an event to be generated by the scripts running on
the active control board, and then responds to the event by executing
the proper response action script on the SSP.

The ssp_startup Script

Each time the main SSP boots, init runs the SSP startup script
$SSPETC/ssp_startup.sh. This startup script checks the SSP
environment for the availability of certain files and the state of the
Enterprise 10000 system itself, sets various environment variables,
and then starts the various SSP daemons.

The ssp_startup script is responsible for starting the SSP system
daemons in the order appropriate for the current configuration. It is
run automatically on each reboot of the main SSP directly from
/etc/inittab.

Caution – Do not run the ssp_startup command from the
command line. This may cause multiple instances of the SSP daemons
to run, potentially crashing the Enterprise 10000 platform.

The SSP daemons to be started at SSP boot time are specified in the
ssp_startup.main file. Each of these daemons is discussed in more
detail elsewhere. This list is provided here for reference.

● machine_server

● fad

● cb_reset

● cbs

● straps

● snmpd

● edd

● obp_helper

● netcon_server

Restartable Daemons
Many of the SSP daemons are monitored and restarted if they die,
because they are essential to the operation of the Enterprise 10000
system. These are specified in the ssp_startup.restart_main file
and are checked every 30 seconds. These daemons are:

● machine_server

● fad

● cbs

● straps

● snmpd

● edd
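The monitoring idea can be illustrated with a short sketch that compares the restartable list against a list of running process names. This is not how ssp_startup itself is implemented; the function name missing_daemons and the sample process list are hypothetical, and a real monitor would gather the process list itself and repeat the check every 30 seconds.

```shell
#!/bin/sh
# Daemons listed in the text as monitored and restarted.
RESTARTABLE="machine_server fad cbs straps snmpd edd"

# Print every monitored daemon that is absent from the list of
# running process names passed in $1 (space-separated).
missing_daemons() {
    running=" $1 "
    for d in $RESTARTABLE; do
        case $running in
            *" $d "*) ;;            # found, nothing to do
            *) echo "$d" ;;         # not running
        esac
    done
}

# Example with a made-up process list:
missing_daemons "cbs fad edd"
```

With the made-up list above, the function reports machine_server, straps, and snmpd as candidates for a restart.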

Domain Bringup Flow

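[Figure: Domain bringup flow, showing actions on the SSP and the
resulting activity in the domain]

On the SSP, as user ssp:

1. Set SUNW_HOSTNAME to the proper domain.

2. Issue power -on to the domain, if necessary.

3. Run bringup, which:

● Kills old obp_helper and netcon_server daemons

● Runs hpost, using .postrc and blacklist, and configures the
centerplane, if necessary (domain: POST tests, *.elf)

● Starts obp_helper (domain: download_helper, OBP,
eeprom.image, TOD value)

● Starts netcon_server

4. Run netcon (domain: ok prompt).

Communication path: cbs <-- RPC --> cbe <-- JTAG+ --> domain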

The bringup Command

The bringup command is run from the SSP to configure and boot the
current domain as defined by the SUNW_HOSTNAME environment
variable. It starts the same process that the reset command from the
ok prompt does for other Sun SPARC systems.

The bringup command is responsible for coordinating all the other
commands and processes required to configure, test, and boot the
domain. It runs on the SSP, coordinating the activities of the domain
being configured through the control board.

The bringup command must be run from a window of the ssp
account on the primary SSP. A separate bringup command is required
for each domain to be booted or configured.

Syntax
bringup [-f] [-F] [-p proc] [-Q boot_proc] [-gvCL] [-A {on | off}]
[-D {on | off}] [-l level] [-X blacklist_file_pathname] [boot_args]

● -f – Force; boot even if the domain is already running or a domain
component is powered off.

● -F – Reserved for edd automatic reboot scripts.

● -p – Specifies the preferred boot processor number.

● -Q – Run a shortened version of POST for crash recovery.

● -g – Allows the -l, -p, -C and -X options to be passed to hpost.

● -v – Requests hpost to produce detailed progress messages.

● -C – Configure the centerplane, crashing all running domains.

● -L – Requests hpost to use the -s and -v10 options, and to send
all logging to syslog.

● -A – Set the OBP auto-boot? environment variable to either
true (on) or false (off).

● -D – Set the OBP diag-switch? environment variable to either
true (on) or false (off).

● -l – Sets the diagnostic level for this run of hpost.

● -X – Specifies the path name of the blacklist file.

● boot_args is passed to the boot command, and specifies the boot
parameters such as the device alias to boot from, -r, -s, and so on.

Options -g, -l, -p, -C and -X are passed directly to hpost and are not
used by bringup.

Examples:
ssp:domain% bringup -A off
ssp:domain% bringup -A on net -s

Execution
When bringup is run, it:

1. Makes sure the necessary environment variables are set.

● $SSPVAR

● $SUNW_HOSTNAME

● $SSPETC

2. Parses command-line arguments and builds variables.

3. Checks that the domain is powered up using power unless -f is
specified.

4. Ensures no other bringup command is running for this domain by
using check_host. If one is, bringup quits unless -f was
specified.

5. Configures the centerplane if -C is specified or no other domain is
running.

6. Serializes the centerplane configuration if necessary.

7. Kills any existing obp_helper and netcon_server processes for
this domain.

8. Runs hpost.

9. Saves the boot processor ID in
$SSPVAR/etc/platform_name/domain_name/bootproc.

10. Starts obp_helper for the domain.

11. Starts netcon_server for the domain.
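Before running bringup, the environment check in step 1 can be repeated by hand. The sketch below is not taken from bringup; the function names are illustrative, and it assumes that domain_name in the bootproc path is the value of SUNW_HOSTNAME and that the platform name is supplied by the caller.

```shell
#!/bin/sh
# Return 0 only if every variable bringup checks in step 1 is set
# and non-empty; name the first missing one on stderr otherwise.
check_bringup_env() {
    for name in SSPVAR SUNW_HOSTNAME SSPETC; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            return 1
        fi
    done
    return 0
}

# Build the path from step 9. The platform name is passed in as $1
# because the text does not say where bringup obtains it.
bootproc_path() {
    echo "$SSPVAR/etc/$1/$SUNW_HOSTNAME/bootproc"
}
```

A possible use is `check_bringup_env && cat "$(bootproc_path platform_name)"` to see which boot processor the last bringup recorded for the current domain.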

The hpost Command

hpost stands for host POST; it coordinates the POST tests that run in
the host (domain).

Warning – Never run hpost directly. You can damage your
equipment by using incorrect options.

Syntax
hpost [-?] [-?postrc | -?blacklist | -?level | -?verbose]

hpost [-Ccfnqs] [-D[boardmask,][path]] [-d "comment"] [-g[path | none]]
[-Hboardmask,refproc] [-i[proc]] [-Jbus_mask] [-JJbus_mask] [-llevel]
[-pproc] [-Qproc[,skipmask]] [-R{redlist_file | none}] [-vlevel]
[-X{blacklist_file | none}] [-W] [-Zproc]

Options can be passed to hpost from the bringup command.

The hpost command runs on the SSP and directs the activities of the
domain configuration and initialization process through
communication with the Enterprise 10000 control board.

The hpost command performs normal system POST testing,
configures out blacklisted components, and builds the device tree used
by Solaris to locate and access all system internal and external
peripheral devices.

The POST program identifies and tests the physical components of the
uninitialized Enterprise 10000 hardware assigned to the domain,
configures what is operational and not blacklisted into an initialized
system, and prepares for the OBP.

The operation of the hpost command can be modified by the contents
of the .postrc file.

Functions
When hpost is run, it:

1. Ensures that the domain is ready to run OBP.

2. Identifies failing components.

3. Removes any failing components from the configuration.

4. Configures around any missing or failed components as necessary.

5. Tests the hardware components to give confidence that the
configured hardware functions correctly.

6. Prepares the initial device tree for the OBP.

When appropriate, it also:

● Ensures a replacement system board can be safely added to the
system and domain while the target domain is up and running.

● Dumps JTAG (internal hardware) state after a system crash.

● Enables hardware recording of errors.

hpost Control Files

.postrc

Warning – Be very careful with the contents of .postrc. You can
seriously damage your equipment by using incorrect directives.

.postrc is an ASCII text file containing configuration and execution
directives for hpost. It controls the options passed to POST. The
keywords are case sensitive. hpost or bringup command-line
arguments override .postrc directives.

There is no default .postrc file, and you are not required to have one.

hpost looks for .postrc in these places (and in this order). It uses the
first one it finds.

1. The current working directory

2. $SSPVAR/etc/platform_name/domain_name

3. The user’s home directory (default location is ~ssp/.postrc)
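To see which .postrc a given hpost run would pick up, the search order above can be mirrored in a few lines of shell. This is a sketch only, not hpost's own code; the function name find_postrc is hypothetical, and the platform and domain names are passed in as arguments.

```shell
#!/bin/sh
# Print the first .postrc found in the documented search order:
# current directory, then the domain directory, then $HOME.
# $1 = platform name, $2 = domain name. Returns 1 if none exists.
find_postrc() {
    for dir in "$PWD" "$SSPVAR/etc/$1/$2" "$HOME"; do
        if [ -f "$dir/.postrc" ]; then
            echo "$dir/.postrc"
            return 0
        fi
    done
    return 1
}
```

Running `find_postrc platform_name domain_name` from the directory where bringup will be started shows whether a stray .postrc in that directory would shadow the domain's own file.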

Warning – There are numerous .postrc directives, which are
explained in the postrc man page. Most are reserved for internal Sun
use and should not be used otherwise. Permanent damage to system
components could result.

The directives most likely to be used in a production system are:

● logfile [path] – Enable POST log file generation. This option is
on by default. The default log location is:
$SSPVAR/adm/domain_name/post/postMMDD.HHMM.log

● mem_board_interleave_ok – Improves performance by allowing
the system to use eight-way memory interleaving, making real
memory accesses faster, but this can prevent DR from removing
the system board. Only use this directive if you will not be
removing any system boards from this domain with DR.

There are many other directives, but they should not be used without
the guidance of Sun support or engineering personnel.
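The default log name can be reproduced with date, which helps when searching for the log of a particular run. The sketch assumes that MMDD is the month and day and HHMM the hour and minute at which hpost started; the function name post_log_path is illustrative.

```shell
#!/bin/sh
# Build the default POST log path for a domain, assuming MMDD is
# month and day and HHMM is hour and minute of the hpost run.
# $1 = domain name
post_log_path() {
    echo "$SSPVAR/adm/$1/post/post$(date +%m%d.%H%M).log"
}
```

For example, `ls "$SSPVAR/adm/domain_name/post"` lists the logs for a domain, and the name printed by post_log_path shows what a run started in the current minute would be called under this assumption.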

blacklist and redlist Files

During domain hardware configuration, hpost uses the blacklist
and redlist files if they are present. These files tell hpost not to
configure specific domain hardware components.

The distinction between the blacklist and the redlist files is fairly
simple. The blacklist says don’t use the component, while the
redlist says don’t even see it.

blacklist
The blacklist file tells hpost which hardware components in the
domain are not to be used. They will not be tested and are unavailable
to the domain for the life of the boot. The blacklist was discussed in
Module 5, "Domains."

redlist
The redlist file is for internal and development use only. It tells
hpost which system components are to be considered as not installed,
even if they are physically present. While redlisted components are
effectively blacklisted, redlisting components carries a price in
capability and performance. If any component on a board is redlisted,
POST cannot reset that board. Because some failures require a board
reset to clear them, this forces the entire board to become unusable
and, in some cases, the entire system can become unusable.

Warning – You can make your domain or the entire UE10000 platform
unusable by using the redlist incorrectly. Do not use the redlist
without specific directions from Sun support personnel.

The obp_helper Command

The Enterprise 10000 OpenBoot PROM (OBP) is not a hardware PROM
as it is on other Sun SPARC systems. It is actually a file loaded from
the SSP during domain bringup. Another SSP-resident file
(eeprom.image) replaces the traditional OBP ID PROM (NVRAM).

The bringup command starts obp_helper in the background, which
kills any previous obp_helper daemon for the domain. The
obp_helper command then executes download_helper, and,
subsequently, downloads and executes the OBP itself in the host
domain.

The obp_helper command runs on the SSP, communicating with the
domain through the control board. It provides an environment in
which the OBP can execute and provides OBP time-of-day and
EEPROM simulation services.

Syntax
obp_helper [-eivqr] [-o filename] [-d filename] [-m boot_proc]
[-A {on|off}] [-D {on|off}] [boot-arguments]

Function
● Loads download_helper into the domain.

● Configures domain memory and works with download_helper to load
the OBP code into memory.

● Loads the eeprom.image file and the Time of Day (TOD) into the
domain.

● Provides ongoing communication to the SSP for the OBP.

The obp_helper command also maintains the OBP EEPROM image
on the SSP. Whenever a change is made to an OBP environment
variable, the change is communicated to obp_helper, which updates
the value in the eeprom.image file for the domain on the SSP.

Restarting obp_helper
If the obp_helper daemon for a domain terminates, you can restart it
by running obp_helper -r with the proper domain SUNW_HOSTNAME
value set. Other than this case, do not try to run obp_helper from the
command line.

The download_helper Command

The download_helper and obp_helper commands work together to
provide support for the OBP. The download_helper command runs in
the domain, while the obp_helper command runs on the SSP. The
download_helper command is loaded into the domain by
obp_helper.

The download_helper command enables programs to be downloaded
to the domain memory. It provides an environment in which host
programs can execute without having to know how to relocate
themselves to memory.

download_helper is the first code that gets executed in the domain
after hpost has finished. It remains resident in the domain throughout
the life of the current reset and boot session.

Function
● Responsible for preparing the domain for OBP execution.
Remember that the OBP is just a program, and needs memory and
CPU resources properly set up for it to run.

● Loads the OBP and eeprom.image files.

● Sets the initial domain Time of Day (TOD) clock.

● Provides the interface to cbe for OBP command-line
communication with the SSP.

● Reflects EEPROM updates back to obp_helper.

● Handles all four kinds of reset and redmode conditions
(see Module 10).

Other Boot-Time Software

netcon_server
netcon_server is an SSP daemon started by bringup. There is one
netcon_server per active domain. It manages communications
between a domain’s various netcon sessions and the domain specified
by the SUNW_HOSTNAME environment variable.

When the domain is up, netcon_server connects the netcon
session(s) on the SSP to the cvcd daemon in the domain using TCP/IP.
When the domain is down, it connects the netcon sessions with the
OBP through the control board.

It also has the responsibility for updating the Enterprise 10000 SNMP
Management Information Base (MIB) with the final domain
configuration information.

Solaris
Modifications have been made to support the Enterprise 10000,
Dynamic Reconfiguration, and Alternate Pathing. All the normal
features of Solaris are provided.

The minimum Solaris 2.5.1 level required to support the Enterprise
10000 is 2.5.1 Hardware 4/97. For Solaris 2.6, Hardware 5/98 is
required, although Hardware 3/98 is supported without AP and DR.

Console Communication Paths

As discussed previously, the Enterprise 10000 provides two paths for
primary console communication. Since there is no keyboard
attachment and no serial ports, the use of normal Sun server console
communication paths is not possible.

When the domain is running, Solaris communicates through the
domain cvcd daemon, using a TCP/IP network interface.

If Solaris is not running, a different communication path must be
chosen. For example, when the domain is at the OBP level, it isn’t
possible to use TCP/IP. In this case, a path using JTAG via the active
Control Board is chosen.

The system automatically switches between the JTAG and TCP/IP
paths as the domain changes state. You can force the use of JTAG
either from the netcontool JTAG button or by using ~= in netcon.
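The path selection described above amounts to a small decision rule, sketched here for illustration. The state names solaris and obp and the force-jtag argument are invented for this example and are not values used by netcon or netcon_server.

```shell
#!/bin/sh
# Pick the console path for a domain state.
# $1 = domain state: "solaris" (OS running) or "obp" (ok prompt)
# $2 = optional "force-jtag", standing in for the JTAG button or ~=
# The state names and the flag are illustrative, not SSP values.
console_path() {
    if [ "${2:-}" = "force-jtag" ]; then
        echo "jtag"
        return 0
    fi
    case $1 in
        solaris) echo "tcpip" ;;   # through cvcd in the domain
        obp)     echo "jtag" ;;    # through the active control board
        *)       echo "unknown state: $1" >&2; return 1 ;;
    esac
}

console_path solaris
console_path obp
console_path solaris force-jtag
```

The three example calls print tcpip, jtag, and jtag, matching the automatic switch and the forced JTAG case described in the text.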

The OpenBoot PROM

The OpenBoot PROM (OBP) is a binary that is loaded into the
domain’s memory by obp_helper and download_helper. The OBP
code is stored on the SSP. It is essentially identical in function and
operation to the standard SPARC OBP.

OpenBoot PROM Functions

● Manage any NVRAM (Non-Volatile RAM) parameters

● Implement a pseudo-EEPROM (ID PROM with eeprom.image)

● Build the device tree

● Initialize OBP parameters for Dynamic Reconfiguration (DR)
support using dr-max-mem

● Initialize support for Alternate Pathing (AP) by sending the
current boot device path to the SSP

● Initialize the system and prepare for booting the kernel

● Load and initiate execution of the OS

● Report the system configuration to the OS

● Provide console and memory mapping services to the OS during
its early start-up sequence

● Implement the Network Virtual Console (cvcd) interface with
download_helper for the JTAG console path.

obp

obp is named after the OpenBoot PROM. obp is fundamental to the
boot process of a domain. It performs all of the functions of the OBP
on other Sun SPARC systems, and manages the ID PROM information.
The OBP binary file itself is located in
$SSPOPT/release/Ultra-Enterprise-10000/2/6/hostobjs/obp

The /2/6 portion of this path is specific to the version of the
operating system, in this case Solaris 2.6. A different version of the
operating system will have a different path name, such as 2/5/1. New
versions of the Enterprise 10000 OBP are provided by patches installed on
the SSP. Each domain running the same OS level uses the same obp
file.
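The mapping from an operating system version to the obp path can be sketched as follows. The function name obp_path is hypothetical; the only rule it encodes is the one shown in the text, where each dot in the version becomes a directory separator (2.6 becomes 2/6, 2.5.1 becomes 2/5/1).

```shell
#!/bin/sh
# Print the location of the obp file for a Solaris version such as
# "2.6" or "2.5.1", replacing each dot with a directory separator.
# $1 = Solaris version
obp_path() {
    ver_dir=$(echo "$1" | tr . /)
    echo "$SSPOPT/release/Ultra-Enterprise-10000/$ver_dir/hostobjs/obp"
}
```

With SSPOPT set in the ssp user's environment, `ls -l "$(obp_path 2.6)"` is one way to confirm that the file exists before copying it to a backup location.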

Just as on a regular server, obp builds the domain device tree, and
interprets and executes the FCode resident in the SBus cards.

Note – The OBP is a critical file for domain operation. You should
always have a backup copy of it.


eeprom.image

The eeprom.image file is a binary SSP file that takes the place of the
normal SPARC hardware ID PROM. It is loaded into the domain
during initialization along with the OBP and essentially "customizes"
the OBP for this domain. It contains the:

● OBP NVRAM area – Environment variables and devaliases created by the
user

● ID PROM area – Includes the hostid database

● Reboot information area

Each domain must have its own unique eeprom.image file because
each domain has a unique host ID and may have different OBP
environment variable settings.


Caution – You must have the eeprom.image file for a domain to be able
to boot the domain. While the eeprom.image file can be re-created with
the sys_id command, it is much wiser to back up all eeprom.image files
on a regular basis.

If you want to see the changes that have been made to the ID PROM
default settings, and additional devalias entries, you can run the
strings command on the eeprom.image file for the domain.


Managing the eeprom.image Files

The domain_create command uses a template version of eeprom.image,
which resides in:
$SSPVAR/.ssp_private/eeprom_save/eeprom.image.domain_name

The domain’s specific eeprom.image file is in:


$SSPVAR/etc/platform_name/domain_name/eeprom.image

If you make changes to OBP variables or devaliases, which would cause
changes to a normal ID PROM, obp_helper updates the working copy file
on the SSP and keeps up to five backup copies of previous versions in:
$SSPVAR/etc/platform_name/domain_name/eeprom.backup.0
$SSPVAR/etc/platform_name/domain_name/eeprom.backup.1
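Because every domain depends on its own eeprom.image, a regular backup is easy to script. The following Python sketch copies each platform_name/domain_name/eeprom.image file found under a directory into a dated backup tree. The function name, the destination layout, and the time stamp format are assumptions made for illustration; they are not part of the SSP software.

```python
import shutil
import time
from pathlib import Path


def backup_eeprom_images(ssp_etc, dest_root, stamp=None):
    """Copy every <platform>/<domain>/eeprom.image under ssp_etc.

    ssp_etc   -- directory corresponding to $SSPVAR/etc
    dest_root -- hypothetical backup directory chosen by the administrator
    stamp     -- subdirectory name; defaults to the current date and time
    Returns the list of files that were written.
    """
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    copied = []
    for image in sorted(Path(ssp_etc).glob("*/*/eeprom.image")):
        # Preserve the platform_name/domain_name layout in the backup.
        domain_dir = image.parent.relative_to(ssp_etc)
        target_dir = Path(dest_root) / stamp / domain_dir
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / image.name
        shutil.copy2(image, target)  # copy2 also keeps the time stamps
        copied.append(target)
    return copied
```

A call such as backup_eeprom_images("/var/opt/SUNWssp/etc", "/export/backup") could be run from cron on the SSP; both paths here are examples only.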


When you remove a domain (with domain_remove), the domain’s
eeprom.image template file is updated and the working copy is deleted.
The eeprom.image file is saved in the event that you want to re-create
the domain.

Remember that you can use sys_id to display the ID PROM area of
the eeprom.image for a domain.
ssp:domain% sys_id -d -f eeprom.image.domain_name

IDPROM in eeprom.image.domain_name

Format = 0x01
Machine Type = 0x80
Ethernet Address = 0:0:be:a6:6e:5
Manufacturing Date = Wed Dec 31 16:00:00 1969
Serial number (machine ID) = 0xa66e05
Checksum = 0x3f


OBP Environment Variables Specific to the Enterprise 10000

This list is current as of Enterprise 10000 OBP Version 3.2.4. These
parameters are discussed in more detail in later modules, but are listed
here to be near the OBP material.

Boot Time Parameter

dr-max-mem

● For Solaris 2.6 and above, any nonzero value enables DR.

● For Solaris 2.5.1, this parameter requests the OS to configure as
though more physical main store is present than was actually found at
boot time. This enables the OS to support the addition of memory
through DR. The total amount of memory supported by the domain will be
the value of dr-max-mem. A value of zero (0) allows no storage to be
added. Changes will not take effect until the next bringup unless
setenv-dr is used.


Reset Handling
More detail on these conditions is in Module 10, "Diagnostic
Information."

sir-sync?

● If set to TRUE, the OBP will try to perform an OBP sync operation
when a SIR (system initiated reset) occurs, caused by a request
from the OS.

xir-sync?

● If set to TRUE, the OBP will try to perform an OBP sync operation
when an XIR (externally initiated reset) occurs, caused by the
hostreset command.

redmode-sync?

● If set to TRUE, OBP will try to perform an OBP sync operation
when a REDMODE (red mode condition) occurs.

redmode-reboot?

● If set to TRUE, OBP will try to reboot from the default boot disk
when a REDMODE condition occurs.

watchdog-sync?

● If set to TRUE, OBP will try to perform an OBP sync operation
when a WATCHDOG (watchdog reset) occurs.


watchdog-reboot?

● If set to TRUE, OBP will try to reboot from the default boot disk
when a watchdog reset condition occurs.

When any of the three reset conditions (SIR, XIR, or Watchdog) or a
REDMODE condition occurs, hardware status information is collected by
download_helper and saved on the SSP. This provides detailed
information about the state of the machine at the time of the failure,
making it much more likely that the problem will be fixed quickly.

ssi-smr-size and idn-smr-size

These parameters are used in support of IDN, and are not discussed
further here. Always leave them set to zero.


The OBP Device Tree

Decoding an Interface Card Location


The PCI or SBus interface card location path for a device may come
from either the OBP device tree or from looking in the Solaris
/devices directory. You can decode this device specification to
determine the physical location of the associated card.

For example:

/sbus@41,0/qec@1,20000/qe@3,0

/sbus@41 = system board 0, SBus 1

Hex 41 = binary 0 1 0000 01, decoded as follows:

bit 6       1 = I/O, 0 = processor
bits 5-2    system board number (0-F)
bits 1-0    SBus number (0-1)

/qec@1 = SBus slot 1

The SBus slot number is in the range 0-1.

SYSIO 0 is the upper pair of SBus slots in the system board. Each
system board SBus slot is labelled with its SBus and slot number.


Decoding an SBus Number

System Board    SBus 0      SBus 1

0               /sbus@40    /sbus@41
1               /sbus@44    /sbus@45
2               /sbus@48    /sbus@49
3               /sbus@4c    /sbus@4d
4               /sbus@50    /sbus@51
5               /sbus@54    /sbus@55
6               /sbus@58    /sbus@59
7               /sbus@5c    /sbus@5d
8               /sbus@60    /sbus@61
9               /sbus@64    /sbus@65
10              /sbus@68    /sbus@69
11              /sbus@6c    /sbus@6d
12              /sbus@70    /sbus@71
13              /sbus@74    /sbus@75
14              /sbus@78    /sbus@79
15              /sbus@7c    /sbus@7d

Decoding a PCI Slot Location


PCI slots decode exactly the same as SBus slots. Since there is only one
PCI card per interface, the slot number is always zero.

For example, this means that /pci@68 would decode to board 10, slot
or position 0 (upper slot).
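The SBus and PCI decoding rules above can be expressed as a short Python sketch. The function name decode_io_path and the returned dictionary keys are illustrative choices, not part of OBP or Solaris. The bit layout (bit 6 set for I/O, bits 5-2 for the board, bits 1-0 for the bus) is inferred from the worked example and the table in this section.

```python
def decode_io_path(path):
    """Decode an I/O device path into board, bus, and slot numbers.

    path is a device path such as "/sbus@41,0/qec@1,20000/qe@3,0" or
    "/pci@68". The slot is taken from the second path component and is
    None when the path has no second component.
    """
    parts = [p for p in path.split("/") if p]
    bus_type, _, bus_unit = parts[0].partition("@")
    value = int(bus_unit.split(",")[0], 16)
    if not value & 0x40:
        raise ValueError("bit 6 is clear: not an I/O bus address")
    result = {
        "bus_type": bus_type,
        "board": (value >> 2) & 0xF,  # bits 5-2
        "bus": value & 0x3,           # bits 1-0
        "slot": None,
    }
    if len(parts) > 1 and "@" in parts[1]:
        slot_unit = parts[1].partition("@")[2]
        result["slot"] = int(slot_unit.split(",")[0], 16)
    return result


if __name__ == "__main__":
    print(decode_io_path("/sbus@41,0/qec@1,20000/qe@3,0"))
    print(decode_io_path("/pci@68"))
```

The same function works on paths copied from the /devices directory, provided that any leading /devices prefix is removed first.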


Decoding a Processor Location

/SUNW,UltraSPARC@1f,0

/SUNW,UltraSPARC@1f,0 = system board 7, proc 3

Hex 1f = binary 0 0 0111 11, decoded as follows:

bit 6       1 = I/O, 0 = processor
bits 5-2    system board number (0-F)
bits 1-0    processor number (0-3)


System Board  Processor 0             Processor 1             Processor 2             Processor 3

0             /SUNW,UltraSPARC@0,0    /SUNW,UltraSPARC@1,0    /SUNW,UltraSPARC@2,0    /SUNW,UltraSPARC@3,0
1             /SUNW,UltraSPARC@4,0    /SUNW,UltraSPARC@5,0    /SUNW,UltraSPARC@6,0    /SUNW,UltraSPARC@7,0
2             /SUNW,UltraSPARC@8,0    /SUNW,UltraSPARC@9,0    /SUNW,UltraSPARC@a,0    /SUNW,UltraSPARC@b,0
3             /SUNW,UltraSPARC@c,0    /SUNW,UltraSPARC@d,0    /SUNW,UltraSPARC@e,0    /SUNW,UltraSPARC@f,0
4             /SUNW,UltraSPARC@10,0   /SUNW,UltraSPARC@11,0   /SUNW,UltraSPARC@12,0   /SUNW,UltraSPARC@13,0
5             /SUNW,UltraSPARC@14,0   /SUNW,UltraSPARC@15,0   /SUNW,UltraSPARC@16,0   /SUNW,UltraSPARC@17,0
6             /SUNW,UltraSPARC@18,0   /SUNW,UltraSPARC@19,0   /SUNW,UltraSPARC@1a,0   /SUNW,UltraSPARC@1b,0
7             /SUNW,UltraSPARC@1c,0   /SUNW,UltraSPARC@1d,0   /SUNW,UltraSPARC@1e,0   /SUNW,UltraSPARC@1f,0
8             /SUNW,UltraSPARC@20,0   /SUNW,UltraSPARC@21,0   /SUNW,UltraSPARC@22,0   /SUNW,UltraSPARC@23,0
9             /SUNW,UltraSPARC@24,0   /SUNW,UltraSPARC@25,0   /SUNW,UltraSPARC@26,0   /SUNW,UltraSPARC@27,0
10            /SUNW,UltraSPARC@28,0   /SUNW,UltraSPARC@29,0   /SUNW,UltraSPARC@2a,0   /SUNW,UltraSPARC@2b,0
11            /SUNW,UltraSPARC@2c,0   /SUNW,UltraSPARC@2d,0   /SUNW,UltraSPARC@2e,0   /SUNW,UltraSPARC@2f,0
12            /SUNW,UltraSPARC@30,0   /SUNW,UltraSPARC@31,0   /SUNW,UltraSPARC@32,0   /SUNW,UltraSPARC@33,0
13            /SUNW,UltraSPARC@34,0   /SUNW,UltraSPARC@35,0   /SUNW,UltraSPARC@36,0   /SUNW,UltraSPARC@37,0
14            /SUNW,UltraSPARC@38,0   /SUNW,UltraSPARC@39,0   /SUNW,UltraSPARC@3a,0   /SUNW,UltraSPARC@3b,0
15            /SUNW,UltraSPARC@3c,0   /SUNW,UltraSPARC@3d,0   /SUNW,UltraSPARC@3e,0   /SUNW,UltraSPARC@3f,0
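The processor table follows a simple rule: the hexadecimal unit address is four times the system board number plus the processor number. A minimal Python sketch of the conversion in both directions follows. The function names are hypothetical, and the bit layout is the one inferred from the example and table in this section.

```python
def decode_processor(node):
    """Return (board, processor) for a node such as /SUNW,UltraSPARC@1f,0."""
    unit = node.strip("/").partition("@")[2]
    value = int(unit.split(",")[0], 16)
    if value & 0x40:
        raise ValueError("bit 6 is set: this is an I/O bus address")
    board = (value >> 2) & 0xF  # bits 5-2
    proc = value & 0x3          # bits 1-0
    return board, proc


def processor_node(board, proc):
    """Return the device tree node name for a board and processor."""
    if not 0 <= board <= 15 or not 0 <= proc <= 3:
        raise ValueError("board must be 0-15 and proc must be 0-3")
    return "/SUNW,UltraSPARC@%x,0" % (board * 4 + proc)


if __name__ == "__main__":
    print(decode_processor("/SUNW,UltraSPARC@1f,0"))
    print(processor_node(10, 2))
```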


Lab

The instructor will assign each group a domain. Working together,
perform the following exercises. Remember that other students are
using the system; be careful which command options you use.

1. Log in to the SSP as user ssp. Enter your domain name, when
prompted, for SUNW_HOSTNAME.

2. Using hostview, verify that the I/O power distribution unit(s) and
your domain system board(s) are on. (Note: I/O components will
not be on until enabled by the SSP power command.)

3. Connect to the domain with netcon, or if the netcon console is not
already running, right click on the root window and select
SSP ➤ Host Console to start netcontool.

4. Boot the domain to OBP. You may need to answer Y to the question
about configuring the centerplane if you receive it.
ssp:domain% bringup -A off

5. Observe the output of bringup and hpost.

6. In the netcon window:

● From the banner, note the domain memory size, serial number,
ethernet address and host ID.
● Use printenv and devalias to verify the boot disk.

7. Boot to the OS.


ok> boot

8. Observe the boot messages.

9. Log in as root. Use ps and df to look at what processes are running
and what file systems are mounted.
domain# ps -ef
domain# df -k


10. Shut down the system.

Note – You can use init 0, halt, or shutdown. Normally in a production
environment, you would use shutdown with a grace period (-g) to give
users some warning. Also, it is always a good idea to use uname before
executing a shutdown command to make sure that you are halting the
correct machine.

domain# uname -a
SunOS domain 5.5.1 Generic sun4u sparc SUNW,Ultra-Enterprise-10000
domain# shutdown -y -i0 -g0

11. Reboot the domain directly to the OS.


ssp:domain% bringup -A on
$SSPVAR/adm/$SUNW_HOSTNAME/post/

12. Examine the various log entries on the SSP.

Enterprise 10000 platform messages file:


/var/opt/SUNWssp/adm/messages

Domain messages file:


/var/opt/SUNWssp/adm/$SUNW_HOSTNAME/messages

SSP messages file:


/var/adm/messages

13. Start a new netcon session in an existing window. Start a second
session in another window. Start a third session using netcontool.

14. Shut down the domain normally.


15. Using the Enterprise 10000 CD-ROM in the SSP’s CD-ROM drive,
boot the domain from the CD-ROM to single-user mode. This can
be used for maintenance on the system disk (forgotten root
password, and so on). Remember, the SSP must be correctly set up
as a boot server for this to work.
ssp:domain% bringup -A on net -sw

16. Use ps and df to look at what processes are running and what file
systems are mounted.
domain# ps -ef
domain# df -k
Reboot to the disk.
domain# reboot

17. Once again, examine the log entries on the SSP.

Enterprise 10000 platform messages file:


/var/opt/SUNWssp/adm/messages

Domain messages file:


/var/opt/SUNWssp/adm/$SUNW_HOSTNAME/messages

SSP messages file:


/var/adm/messages


Check Your Progress

Before continuing on to the next module, check that you are able to
accomplish or answer the following:

❑ Describe the steps of the SSP boot process.

❑ List the SSP daemons and their functions.

❑ Explain how to boot a domain.

❑ List the steps of the domain boot process.

❑ Identify the files used in the domain boot process.

❑ Explain the domain hardware configuration process.

❑ Describe the purpose of the eeprom.image files.

❑ Decode SBus slot and processor physical locations.

❑ Describe the OBP environment variables specific to the
Enterprise 10000.


Think Beyond

Why are the SSP daemons only started on the main SSP?

What would you need to replace download_helper if it did not exist?

Why is the boot PROM loaded from the SSP? How else could it be
done?

Why is the bringup/hpost process so complex?

Alternate Pathing 8

Course Map
This module describes the concepts, configuration, restrictions,
operation, and control of Alternate Pathing (AP). AP gives you the
ability to have multiple paths to the same device from one domain,
providing an extra degree of fault resilience.


Relevance

For Discussion – The following questions are relevant to understanding
the content of this module:

1. Why does the Enterprise 10000 need Alternate Pathing support?

2. What are the advantages of Alternate Pathing?

3. Where is the AP configuration information kept?

4. What types of devices need AP support?


Objectives

Upon completion of this module, you will be able to:

● Describe the concepts of Alternate Pathing (AP).

● Discuss the supported device types.

● List the AP device restrictions.

● Install and set up the AP software.

● Use the AP command-line commands.

● Configure alternate network paths.

● Configure alternate disk array paths.

● Configure alternate paths for the boot drive.

References

Additional resources – The following references can provide additional
details on the topics discussed in this module:

● Sun Enterprise 10000 Alternate Pathing 2.1 User’s Guide

● Ultra Enterprise 10000 SSP 3.1 User’s Guide

● Sun Enterprise 10000 System Hardware Installation and De-Installation
Guide

● Sun Enterprise 10000 System Overview Manual

● The man pages for the commands and files


AP Concepts

Alternate Pathing (AP) enables you to have two physical paths to the
same A5000 or SSA storage array or network interface, transparent to
the operating system.

Only one path may be active at a time. If a path fails, the alternate path
can be configured active in place of the failed path. Path switching
does not always occur automatically; it may need to be performed
manually.

The system uses the meta-device, a name representing the end object
(such as the disk partition or network interface), but does not use the
physical path names to access the device.

For example, if /dev/dsk/c2t0d0s0 and /dev/dsk/c3t0d0s0 were paths to
the same device, the meta-device name for the device would be
/dev/ap/dsk/mc2t0d0s0.
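The naming rule in this example can be sketched in Python. The function name meta_disk_path is hypothetical. The sketch assumes that the path passed in is the one AP treats as the primary path, because the meta-device name is derived from it, and it handles only the /dev/dsk form shown here.

```python
def meta_disk_path(primary_path):
    """Derive the AP meta-disk node from the primary physical node.

    Example: /dev/dsk/c2t0d0s0 becomes /dev/ap/dsk/mc2t0d0s0.
    """
    prefix, _, leaf = primary_path.rpartition("/")
    if prefix != "/dev/dsk" or not leaf:
        raise ValueError("expected a path of the form /dev/dsk/cNtNdNsN")
    # The meta-device lives under /dev/ap and its name gains an "m" prefix.
    return "/dev/ap/dsk/m" + leaf


if __name__ == "__main__":
    print(meta_disk_path("/dev/dsk/c2t0d0s0"))
```

Note that the second path, /dev/dsk/c3t0d0s0, does not appear in the meta-device name at all.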

Note – The material covered in this module applies almost identically
to the AP 2.1 support that Solaris 2.6 provides for the Sun Enterprise
3x00, 4x00, 5x00, and 6x00 servers.

AP Implementation

[Figure: AP implementation. Disk side: user process, then
/dev/ap/dsk/meta-device, then the AP disk meta driver (ap_dmd), which
selects one of two alternates; each alternate consists of a disk driver
(e.g., ssd for SSA), a nexus driver (e.g., pln/SOC for SSA), and the
device (e.g., SSA disk array). Network side: user process, then
/dev/mxxx, then the stream head, the AP network meta driver (mxx), and
the stream end (xx driver, once per interface) with its driver routines,
leading to the physical network interface.]

Alternate Pathing provides redundancy at the level of the I/O
controller and cabling. For disks, in combination with mirroring, this
provides full I/O redundancy for critical system data along the entire
data path, providing enhanced failure resistance. Only one path is
active at a time.

Alternate Pathing supports Dynamic Reconfiguration, recognizing when
alternate paths are added and removed from the system. Support for
booting from a meta-disk is also provided.

The implementation of AP is similar to the approach taken by disk
management software such as Enterprise Volume Manager and Solstice
DiskSuite.

AP creates a new layer of device drivers (meta-disks and
meta-networks), which accesses one of two physical device drivers to
access the device. Applications and the OS components, including the
disk management software, use the meta-device name to access the
resource. Only the drivers know the actual physical paths.


No component other than AP is aware that the normal device paths are
to the same device. This can cause problems for applications that use
the physical paths instead of the meta-devices to scan or inspect disk
or network devices; they may identify the meta-device paths as
separate devices.

The active path can be manually switched to the alternate, at any time,
with no interruption to active traffic using the meta-device. Note that
there is no automatic switch-over to the alternate path if the active
path fails. In the case of Dynamic Reconfiguration, however, disk and
network paths will be automatically switched.

Meta-device definitions are stored in an AP state database that is used
early in the boot process. There are usually several copies of this
database. You must create the meta-devices yourself; the system will
not automatically create these for you.


AP can be used with Solstice DiskSuite or Volume Manager. Disk
management mirrors are implemented on top of the meta-disk devices,
giving them redundant paths to complement the mirrored data.


AP Requirements

To use Alternate Pathing, the following conditions must exist:

● The AP software must be installed properly on the SSP and the host
domain.

● You must have created the AP database.

● For disk alternate paths, both ports of the SSA to be used must be
connected to the domain.

● For network alternate paths, you must have two interfaces of the
same device type (such as qfe and qfe) on the same subnet.

AP supports paths attached to the same domain. AP does not support
pathgroups across domains.


Supported Devices

Disk Devices
AP supports the StorEdge A5000 (Solaris 2.6 only) and SPARCstorage
Arrays.

SCSI devices are not supported. The StorEdge A3000 is not supported,
but has its own internal AP capability.

After you set up Alternate Pathing for disks, you can use Solstice
DiskSuite Version 3.0 and Sun Volume Manager Versions 2.3, 2.4, and
2.5 normally. (However, on installation DMP will automatically disable
itself in Volume Manager 2.5 if AP is already installed.)

Caution – You must make sure that any AP devices used by these
products are used by their meta-device names only.


You can place your boot disk and primary network interface under AP
control. This makes it possible for the system to boot unattended, even
if the primary network or boot disk controller is not accessible, as long
as a usable alternate path for these devices is defined and available.

Network Devices
The network devices supported by AP are:

● SunFastEthernet™ 2.0 (hme)

● FDDI 3.0 (bf) (Solaris 2.5.1 only)

● FDDI 5.0 (nf) SAS and DAS

● LE Ethernet (le)

● QE Ethernet (qe)

● QFE Ethernet (qfe)

● The system network console (netcon)

Network device types may not be mixed in a meta-device.


Installing AP

Installation of AP for domains running Solaris 2.6 is very similar to
that for domains running Solaris 2.5.1. Support on the SSP has been
upgraded.

AP Version 2.1 supports Solaris 2.6. AP Version 2.0 supports Solaris
2.5.1. Do not use the wrong version for your OS level.

How to configure AP in a domain will be shown later. There is no
configuration for the SSP portion of AP.

Solaris 2.6
The AP 2.1 packages for Solaris 2.6 are provided on the SMCC Server
Supplements CD-ROM shipped with the Solaris 2.6 server media kit.


Solaris 2.5.1
The AP 2.0 packages for Solaris 2.5.1 are provided on the Alternate
Pathing 2.0 for the Ultra Enterprise 10000 CD-ROM shipped with each
system.

Installing AP (Both Releases)


To use AP, you must install the following packages:

On the SSP:

● SUNWapssp – AP subsystem (SSP)

On each Ultra Enterprise 10000 domain that will use AP:

● SUNWapr – AP subsystem (root)

● SUNWapu – AP subsystem (usr)

Documentation:

● SUNWabap – AP AnswerBook (AP 2.0 only; AP 2.1 documentation is in
the Hardware AnswerBook, SUNWabhdw)

● SUNWapdoc – AP man pages

The installation process uses the pkgadd command to install the AP
packages. There is no order dependency.

To install AP in the domain, share the AP CD-ROM from the SSP and
mount it to the domain using NFS. Apply any appropriate patches.


Basic Alternate Pathing Concepts

Physical Paths
For the purposes of AP, an I/O device is either a disk or network device.
The only types of disk device currently supported by AP are the
StorEdge A5000 (Solaris 2.6 only) and the SPARCstorage Array (SSA).
In this module, the term disk always refers to one of these devices.

An I/O adapter is the controller for an I/O device, such as an A5000
SOC+ adapter.

A device node is a path in the devices directory that is used to access a
physical device, such as /dev/dsk/c0t1d1s0.

The term physical path refers to the electrical path from the host to a
disk or network.


Meta-Disk
A meta-disk is a logical name that enables you to access a disk device
generically—you do not need to specify the particular path to the
device. You reference a meta-disk just as if it were a real device, using
an AP-specific device node such as /dev/ap/dsk/mc0t1d1s0. The
AP software determines which path is active and uses that path to
access the device.

In the above figure, /dev/ap/dsk/mc0t1d1s0 is used to access a slice
on a meta-disk, regardless of which pln port is currently active
(handling I/O) for the meta-disk. For the A5000, the sf ports
(representing an SOC+ adapter) are where AP activates the paths.


Disk Pathgroup

A disk pathgroup consists of two physical paths leading to the same
storage array. When a physical path is part of a pathgroup, it is called
an alternate path. An alternate path to a disk can be uniquely identified
by the pln or sf port that the alternate path uses.

Make sure that you understand the use of the term alternate. It means
either possible path, not just the spare path. The path in use is the
active alternate.

Only one alternate path at a time is allowed to handle disk I/O. The
alternate path that is currently handling I/O is called the active
alternate.

One of the alternate paths is designated the primary path. The primary
path is initially made the active alternate. Although you can change
which path is the active alternate, the primary path is always the same.


The primary path has several purposes.

● It is initially the active alternate.

● It provides the meta-disk name.

● It is used to identify the meta-disk.

You reference a disk pathgroup by specifying the pln or sf port (such
as pln1 or sf7) that corresponds to the primary path. For example, if
the primary path is sf1, the pathgroup name is msf1.
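The distinction between the primary path and the active alternate is easy to confuse, so the following Python sketch models it. This is a teaching model only and does not call any AP software; the class name, attribute names, and the sf3 port used in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DiskPathgroup:
    """Toy model of a disk pathgroup with exactly two alternates."""

    primary: str      # port of the primary path, for example "sf1"
    secondary: str    # port of the other alternate path
    active: str = ""  # port currently handling I/O

    def __post_init__(self):
        if self.primary == self.secondary:
            raise ValueError("a pathgroup needs two different ports")
        if not self.active:
            # The primary path is initially the active alternate.
            self.active = self.primary

    @property
    def name(self):
        # The pathgroup name always comes from the primary path.
        return "m" + self.primary

    def switch(self):
        """Make the other alternate active and return its port."""
        if self.active == self.primary:
            self.active = self.secondary
        else:
            self.active = self.primary
        return self.active
```

For example, DiskPathgroup("sf1", "sf3") starts with sf1 active. After switch(), sf3 is the active alternate, but the name is still msf1 because the primary path never changes.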

Some considerations:

● Both array interfaces in a pathgroup must be attached to the same
array

● Only one interface is active at a time through the meta-device

● There must be exactly two adapters in a pathgroup

For even more redundancy with an A5000:

● If you have two interface boards (IB), consider connecting a path to
each

● If you are using hubs in your configuration, use a separate hub for
each interface


Meta-Network

A meta-network, just like a meta-disk, is a logical interface that
enables you to access a network through either of two physical paths
without having to reference either path explicitly within your scripts
and programs. You reference a meta-network by using a meta-network
interface name such as mle1.

In the above figure, interface mle1 is used to access the meta-network,
regardless of which physical adapter (le1 or le6) is currently active
for the meta-network device.


Network Pathgroup

A network pathgroup consists of two network adapters connected to the
same physical network. The concept is identical to that of a disk
pathgroup.

To specify a network pathgroup, use the meta-network interface name,
such as mle1. Just as with a disk pathgroup, this is how you would
switch the active alternate.

Some considerations are:

● Both network adapters in a pathgroup must be attached to the same
subnet

● Only one adapter is active at a time

● Consider using a separate hub for each path for even more
redundancy

● There must be exactly two adapters in a pathgroup

● Both adapters must be of the same device type
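The considerations above can be turned into a simple pre-flight check before you define a network pathgroup. The Python sketch below is illustrative only: the function name is hypothetical, the subnet is compared as a plain string supplied by the caller, and the list of driver names is taken from the supported network devices listed earlier in this module.

```python
import re

# Driver names from the supported network device list in this module.
SUPPORTED_TYPES = {"hme", "bf", "nf", "le", "qe", "qfe"}


def check_network_pathgroup(adapters):
    """Return a list of problems for a proposed network pathgroup.

    adapters is a list of (interface_name, subnet) pairs, for example
    [("le1", "192.168.1.0/24"), ("le6", "192.168.1.0/24")].
    An empty list means that no problem was found.
    """
    problems = []
    if len(adapters) != 2:
        problems.append("a pathgroup must contain exactly two adapters")
    types = set()
    for name, _subnet in adapters:
        match = re.fullmatch(r"([a-z]+)(\d+)", name)
        if match is None or match.group(1) not in SUPPORTED_TYPES:
            problems.append("unsupported or malformed interface name: " + name)
        else:
            types.add(match.group(1))
    if len(types) > 1:
        problems.append("adapters must be of the same device type")
    if len({subnet for _name, subnet in adapters}) > 1:
        problems.append("adapters must be attached to the same subnet")
    return problems
```

The check cannot tell whether the two adapters use separate hubs; that remains a cabling decision.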


Sample AP Configurations

The above diagram shows how you can use AP to provide fault
tolerance for an Ethernet network and an A5000 storage array.

In this example, two network adapters, one each on Board 1 and
Board 2, are connected to the same network. Similarly, two SOC+
adapters on the two boards are connected to the same array.

In this situation, if Board 1 is to be removed through a Dynamic
Reconfiguration (DR) detach operation, AP can be used to switch all
device usage to Board 2 without interfering with any I/O operations
in progress.

Both PCI and SBus adapters are supported.


AP With Mirroring

AP is similar to, but not the same as, disk mirroring. Disk mirroring
replicates data to separate devices and thus achieves data redundancy.
AP, on the other hand, achieves pathing redundancy. Disk mirroring
and AP are complementary; you can use them together to achieve both
data redundancy and pathing redundancy.

In the above example, the mirroring occurs on top of AP, which enables
switching of the underlying adapters used to implement the SSA
mirror from one board to another without disruption of the disk
mirroring or any active I/O.

Note that Alternate Pathing does not provide mirroring itself.


Device Paths
The above diagram shows the path of an I/O operation in a Volume
Manager or Solstice DiskSuite mirrored environment using AP.

It also shows that access to each mirror on the storage arrays is
alternate pathed, providing four physical paths to the data. It is not
required that the mirror have alternate paths.


The AP State Database

AP maintains a database that contains information about all defined
meta-disks, meta-networks, and their corresponding alternate paths
and properties. Each domain will have its own database.

Conceptually, a single AP database is maintained in a single domain.
However, you should set up multiple copies of this database. In this
way, if a given database copy is not accessible or becomes corrupted,
AP can automatically begin to use a current, non-corrupted database
copy. All of the AP databases synchronize their contents during
system initialization and DR operations.

You must dedicate an entire raw disk slice, of at least 300 Kbytes, to
each AP database copy. You can use larger slices, but doing so wastes
disk space since AP won’t need it. It doesn’t matter which slice you
use.


AP Database Configuration Considerations

When choosing partitions for the AP database, remember that:

● You should set up at least three to five database copies.

● The database copies should have no I/O adapters in common with each
other. This helps protect against an adapter failure.

● The copies can be on any slice of any type of disk device. They do
not need to be on devices that AP supports, and do not need to
have alternate paths.

● Especially if you are using Dynamic Reconfiguration (DR), the
database copies should be on I/O adapters on different system
boards so that at least one database copy is always accessible if
one of the system boards is detached. Generally, you should have
one separate copy per system board.


● As configured at the factory, slice 4 of the root disk is
appropriately sized for an AP database (2 Mbytes) and is not
allocated to any other purpose.

● A subset of the AP database contents is automatically maintained
on the SSP for use at domain boot time. This database contains any
alternate pathing information for the domain’s boot disk.
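The placement guidelines in this list can be checked mechanically when you plan where the database copies will go. The Python sketch below is a planning aid under stated assumptions: the function name and the tuple layout are hypothetical, and you must supply the adapter and system board for each slice yourself, because the sketch does not query the system.

```python
def check_db_copies(copies):
    """Return warnings for a planned set of AP database copies.

    copies is a list of (raw_slice, adapter, board) tuples, for example
    ("/dev/rdsk/c0t3d0s4", "soc0", 0). The adapter and board values are
    labels chosen by the administrator.
    """
    warnings = []
    if len(copies) < 3:
        warnings.append("fewer than three database copies")
    adapters = [adapter for _slice, adapter, _board in copies]
    if len(set(adapters)) < len(adapters):
        warnings.append("some copies share an I/O adapter")
    boards = {board for _slice, _adapter, board in copies}
    if len(boards) < 2:
        warnings.append("all copies are on one system board")
    return warnings


if __name__ == "__main__":
    plan = [
        ("/dev/rdsk/c0t3d0s4", "soc0", 0),
        ("/dev/rdsk/c1t0d0s4", "soc1", 1),
        ("/dev/rdsk/c2t0d0s4", "soc2", 2),
    ]
    print(check_db_copies(plan))
```

An empty list means that the plan meets the three guidelines that can be checked this way; it does not verify slice size.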


Creating the AP Database

Before you can begin configuring AP, you must create at least one AP
database. The AP database is created with the apdb command. You can
use apdb to create the original database or a copy.

The apdb Command


# apdb -c /dev/rdsk/c0t3d0s4 -f

The -c (create) option is followed by the raw disk slice that will
contain the new AP database copy. Each copy requires its own
dedicated slice, which must be at least 300 Kbytes in size.

The -f (force) option is only necessary to create the first AP database
copy. It is not used otherwise.
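
The 300-Kbyte minimum can be checked against a slice size expressed in sectors, such as the sector count that prtvtoc reports. This is only an illustrative sketch: the function name slice_large_enough is hypothetical, and it assumes 512-byte sectors.

```shell
# Hypothetical helper (not an AP command): succeed if a slice of the given
# size in sectors meets the 300-Kbyte minimum for an AP database copy.
# Assumes 512-byte sectors.
slice_large_enough() {
    sectors=$1
    kbytes=$(( sectors * 512 / 1024 ))
    [ "$kbytes" -ge 300 ]
}
```

Under that assumption, 600 sectors is exactly 300 Kbytes, and a 2-Mbyte slice is 4096 sectors.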

The locations of the AP database slices are automatically recorded in
/etc/system. Be very careful if you try to change this list by hand.

If you have installed the AP software but have not created at least one
database copy, you will see the following messages on the console
early in the boot process:
WARNING: ap: no database locations
/sbin/apconfig: apd_pathgroup_reset: ioctl() failed.
/sbin/apconfig: ... errno 48
/sbin/apconfig: Error 48

Note – In Solaris 2.6, the last three message lines are not seen.

You will also see similar messages during Dynamic Reconfiguration.


These messages will not be displayed after you create the first AP
database copy.

Refreshing the Databases


You can refresh the disk copies from the memory copy of the AP
database with the apdb -Z command.
# apdb -Z
#

AP Databases on Alternate Pathed Disks

If you want an AP database copy to reside on an AP disk, you must
create two copies of the AP database. The AP configuration process can
only access database locations by the physical disk slice address, and
is not aware of meta-devices at this level.

You must create this database copy twice, specifying each of the
physical paths to the AP meta-disk. For example, if c1 and c9 are
connected to the same AP pathgroup, to create a copy of the AP
database residing on target 3, slice 4, use the following two
commands:
# apdb -c /dev/rdsk/c1t3d0s4 -f
# apdb -c /dev/rdsk/c9t3d0s4

The AP software will be aware of two copies of the database when
actually there is only one, because the disk is accessible through two
paths. This database "alias" is safe, because AP always updates and
accesses its database copies sequentially. The single physical copy is
updated twice with the same information, but this is insignificant overhead.

The aliasing works outside of AP. AP is not aware that the two database
entries refer to the same physical copy.
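
The two-command pattern above can be generated for any set of controllers that reach the same disk. The sketch below only prints the commands so that you can review them before running anything. The function name apdb_alias_commands is hypothetical, the sketch assumes the disk is addressed as d0 on every path, and it deliberately leaves out -f, which you would add by hand only when creating the very first database copy.

```shell
# Hypothetical helper (not an AP command): print, but do not run, one
# apdb -c command for each physical path to the same slice.
# usage: apdb_alias_commands TARGET SLICE CONTROLLER...
apdb_alias_commands() {
    target=$1
    slice=$2
    shift 2
    for ctlr in "$@"; do
        # Assumes the disk is d0 on every path.
        printf 'apdb -c /dev/rdsk/%st%sd0s%s\n' "$ctlr" "$target" "$slice"
    done
}
```

Calling apdb_alias_commands 3 4 c1 c9 prints the two apdb -c lines for c1t3d0s4 and c9t3d0s4.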

Viewing AP Database Status

The apconfig Command


To view the status of your AP database, use the apconfig command
with the -D option.
# apconfig -D

path: /dev/rdsk/c0t1d0s4
major: 32
minor: 12
timestamp: Thu Jul 27 16:24:27 1995
checksum: 687681819
corrupt: No
inaccessible: No
#

In this example, only one AP database had been created. If there had
been more than one copy, information about each copy would have
been listed.

The apconfig command provides the following data:

● The disk slice that contains this copy of the database.

● The major and minor number of the device that it resides on.

● The timestamp of the last update.

● A contents checksum.

● The corrupt field indicates whether the database contents
correlate with the checksum. If corrupt is set to Yes, the
database contents must be refreshed with the apconfig -z
command before it can be used. Usually the database will be
repaired automatically.

● The inaccessible field indicates whether the device that
holds the database can be reached. (The device may be
unavailable due to a device failure, system component failure
or DR.)
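
If you keep several database copies, scanning the status listing by eye becomes tedious. The following sketch filters apconfig -D style output for copies that need attention. The function name unhealthy_db_copies is hypothetical, and the sketch assumes that each copy is reported as a block of "name: value" lines beginning with a path: line, as in the single-copy example shown earlier.

```shell
# Hypothetical helper (not an AP command): read apconfig -D style output
# on standard input and print the path of every database copy that is
# marked corrupt or inaccessible, followed by the reason.
unhealthy_db_copies() {
    awk '
        $1 == "path:"                        { path = $2 }
        $1 == "corrupt:"      && $2 == "Yes" { print path, "corrupt" }
        $1 == "inaccessible:" && $2 == "Yes" { print path, "inaccessible" }
    '
}
```

A typical use would be apconfig -D | unhealthy_db_copies, where no output means that every listed copy is clean and reachable.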

Deleting a Copy of the AP Database

You also use the apdb command to delete a copy of the AP database.
# apdb -d /dev/rdsk/c0t1d0s4 -f

The -d (delete) option specifies the raw disk slice containing the copy
of the AP database that you want to delete.

The -f (force) option is required only when you are deleting either the
last or the next-to-last copy of the AP database.

If you delete the last copy of the AP database, the AP functionality is
no longer available. Make sure that you no longer try to access any AP
meta-devices. If you do, you will get errors and be unable to access the
devices. Make sure you check /etc/vfstab before you reboot, or you
might not be able to reboot.

Viewing Pathgroup Information

The AP database contains information about SCSI and network device
class pathgroups.

When a pathgroup is initially defined, its pathgroup definition is an
uncommitted database entry. The meta-disk or meta-network associated
with the uncommitted entry is not visible to the OS and cannot be
used until the new pathgroup is committed. Also, when a pathgroup is
deleted, the deletion must be committed before it takes effect.

These two states (uncommitted and committed) enable you to prepare
and test your environment before activating the meta-device. To
commit any uncommitted database entries, use apdb -C. All
uncommitted entries in the database will be committed, including
both new and deleted entries, for both disks and network interfaces.

Note that all uncommitted entries will remain in the AP database
indefinitely until you either commit them or remove them.

Viewing Network Entries

Uncommitted Network Entries


To view uncommitted network pathgroup entries, use the apconfig -N
command with the -u option. Only uncommitted network pathgroup
information will be displayed.
# apconfig -N -u

metanetwork: mle0 U
physical devices:
le2
le0 P A

The meanings of the flags are:

U - The pathgroup is uncommitted

P - This is the primary alternate path

A - This is the active alternate path
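
The flags can also be read by a script, for example to confirm which physical interface is currently active. The following is a minimal sketch that assumes the output layout shown above (a metanetwork: line, a "physical devices:" line, then one device per line with optional flags). The function name active_paths is hypothetical.

```shell
# Hypothetical helper (not an AP command): read apconfig -N style output
# on standard input and print "meta-network active-device" for every
# physical device that carries the A flag.
active_paths() {
    awk '
        $1 == "metanetwork:" { meta = $2; next }
        $1 == "physical"     { next }
        NF > 0 {
            for (i = 2; i <= NF; i++)
                if ($i == "A") print meta, $1
        }
    '
}
```

With the listing above as input, for example through apconfig -N -u | active_paths, the function reports mle0 le0.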

Committed Network Entries


To view committed network pathgroup entries, use the apconfig -N
command.
# apconfig -N

metanetwork: mle3
physical devices:
le4
le3 P A

Note – There is no way to see both types of entries with one
command.

Planning Network Pathgroups and Meta-Devices

To use Alternate Pathing, both physical networks within a network
pathgroup must be of the same type. For example, a network
pathgroup could consist of two le networks or two qe networks, but
not one of each.

Both devices in a network pathgroup should be physically connected
to the same network. For example, Ethernet adapters should be
connected to the same subnet. Remember that the meta-device looks
like a single physical device to the OS. (The meta-devices are
cloneable.)

While multiple physical network connections exist, only one adapter
at a time is active. The adapters should be on different system boards
so that Dynamic Reconfiguration (DR) operations (such as DR detach)
can be performed without affecting the alternates.

Meta-Network Interfaces
A meta-network interface name is derived from the name of the
primary alternate for that meta-network. A meta-network interface
name has the form mxxx where xxx is the primary interface name such
as le0.

For example, assume that the network adapters le0 and le1 connect
to the same Ethernet network. Meta-network device mle0 could
include these two adapters (if the primary adapter is specified as le0).
Similarly, QE Ethernet meta-network names have the form mqen. Note
that you cannot mix le and qe devices in the same pathgroup.

If le1 were specified as the primary, the meta-device name would be
mle1 even though it includes le0.
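
The naming rule is simple enough to express as a one-line function, which can be handy in scripts that rename /etc/hostname files. This is a sketch only. The function names meta_net_name and meta_hostname_file are hypothetical.

```shell
# Hypothetical helpers (not AP commands).

# Derive the meta-network interface name from the primary interface name
# by prefixing it with "m" (le0 becomes mle0).
meta_net_name() {
    printf 'm%s\n' "$1"
}

# Print the /etc/hostname file name that belongs to that meta-network.
meta_hostname_file() {
    printf '/etc/hostname.%s\n' "$(meta_net_name "$1")"
}
```

For a pathgroup whose primary is le1, meta_net_name le1 gives mle1, regardless of which other adapters the pathgroup includes.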

FDDI Devices
FDDI 5.0 meta-network names have the form mnfn. The nf networks
can be either SAS (Single-Attached Station) or DAS (Dual-Attached
Station). AP 2.0 (only) also supports FDDI 3.0 SAS bf devices. You
cannot mix bf and nf devices in a pathgroup.

Creating a Network Pathgroup

This example assumes that you are creating a network pathgroup
using physical interfaces le0 and le2, with le0 as the primary
interface.

1. Use apnet to create an uncommitted network pathgroup. The
apnet command creates the meta-interface names and updates the
AP database with the alternate paths.
# apnet -c -p le0 -a le2

The -c operand specifies creation of a pathgroup, and the -p and
the -a operands specify the primary and alternate paths,
respectively.

2. Verify the results with apconfig -N -u.
# apconfig -N -u

metanetwork: mle0 U
physical devices:
le2
le0 P A

3. Use apdb -C to commit the new database entries.


# apdb -C

4. Use apconfig -N to view the new network entries in the
database. Note that the U is now gone.
# apconfig -N

metanetwork: mle0
physical devices:
le2
le0 P A

Activating the Meta-Device


You are now ready to use the new pathgroup.

5. You must remove all direct usage of both members of the
pathgroup.

a. You may have to unplumb the physical interface.
# ifconfig le0 down unplumb

b. If the interface you will be configuring down is the main
network interface, or if it is the interface that you will be using
as you execute the commands to configure the meta-network,
follow one of the procedures in “Alternately Pathing the
Primary Network Interface” later in this module.

c. Either create an /etc/hostname.mxxx file (such as
/etc/hostname.mle0) for any meta-networks that you want to
configure at system reboot, or rename /etc/hostname.le0 to
/etc/hostname.mle0.

6. Activate the meta-network in the usual manner, using the meta-
network interface name instead of the physical network interface
name. This may be done by either rebooting the machine or by
manually configuring the network.

For example:
# ifconfig mle0 plumb
# ifconfig mle0 inet 192.9.201.150 netmask + broadcast + up
Setting netmask of mle0 to 255.255.255.224
# ifconfig -a
lo0: flags=849<UP,LOOPBACK,RUNNING,MULTICAST> mtu 8232
inet 127.0.0.1 netmask ff000000
mle0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
inet 192.9.201.150 netmask ffffffe0 broadcast 192.9.201.159
ether 0:0:be:a6:51:84

At this point, the network device node, in this example /dev/mle0, is
also active and can be used to access the network with Solaris
commands such as snoop.
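
The ifconfig output above shows the netmask in hexadecimal (ffffffe0) and a broadcast address of 192.9.201.159. The arithmetic behind those values can be reproduced in a few lines of shell, which is a useful sanity check before you bring a meta-network up. The function names mask_hex and broadcast_of are hypothetical, and the sketch assumes a POSIX shell that performs field splitting on unquoted expansions (sh, ksh, bash, dash).

```shell
# Hypothetical helpers (not Solaris commands). Each function body runs in
# a subshell, so the IFS change does not leak into the calling shell.

# Convert a dotted-quad netmask to the hexadecimal form ifconfig prints.
mask_hex() (
    IFS=.
    set -- $1                       # split into four octets
    printf '%02x%02x%02x%02x\n' "$1" "$2" "$3" "$4"
)

# Compute the broadcast address: keep the network bits of the address
# and set every host bit to 1.
broadcast_of() (
    IFS=.
    set -- $1 $2                    # $1..$4 address octets, $5..$8 mask octets
    printf '%d.%d.%d.%d\n' \
        $(( ($1 & $5) | (255 - $5) )) \
        $(( ($2 & $6) | (255 - $6) )) \
        $(( ($3 & $7) | (255 - $7) )) \
        $(( ($4 & $8) | (255 - $8) ))
)
```

For the example address, the last octet works out as (150 AND 224) OR 31 = 128 OR 31 = 159, which matches the broadcast address that ifconfig reports.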

FDDI Setup Considerations

FDDI interfaces have 48-bit MACIDs, similar to the way an Ethernet
interface has a MAC address. FDDI MACIDs, however, are different
for each interface, and must be unique.

An FDDI meta-device must have a unique MACID as well, and one that
does not duplicate the MACID of any other FDDI adapter on the
network. You will need to find an unused MACID for each FDDI
meta-device.

The MACID is set using the ether parameter of the ifconfig
command. You can see which MACIDs are in use on your network by
looking at the ifconfig ether field for each adapter.

To get a new MACID, you can either “create” a number by
transposing digits on an existing MACID of one of the meta-interface's
physical paths or get a specifically designated one from the IEEE. If
you create a number, it is important to verify that there is no other
hardware on the same network which is a legitimate user of this new
MACID.
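
Checking a candidate MACID against the addresses already in use is easy to get wrong by eye, because ifconfig prints each octet without a leading zero (for example 0:0:be:a6:51:84). The sketch below normalizes addresses before comparing them. The function names norm_mac and mac_in_use are hypothetical, and collecting the list of in-use addresses from the other hosts on the ring is left to you.

```shell
# Hypothetical helpers (not Solaris or AP commands).

# Normalize a colon-separated MAC address: lowercase, two digits per octet.
norm_mac() {
    printf '%s\n' "$1" | awk -F: '{
        out = ""
        for (i = 1; i <= NF; i++) {
            octet = tolower($i)
            if (length(octet) == 1) octet = "0" octet
            out = out (i > 1 ? ":" : "") octet
        }
        print out
    }'
}

# Succeed if the first argument matches any of the remaining arguments.
# usage: mac_in_use CANDIDATE IN_USE_ADDRESS...
mac_in_use() {
    candidate=$(norm_mac "$1")
    shift
    for used in "$@"; do
        if [ "$(norm_mac "$used")" = "$candidate" ]; then
            return 0
        fi
    done
    return 1
}
```

A candidate is acceptable only if mac_in_use fails for it against every address you have collected. The check cannot see adapters that you did not list.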

Without proper configuration, the meta-network defaults to the
MACID of the active alternate on boot, which would prevent an
interface switch from operating properly.

To ensure that the MACID is set properly at boot time, place
appropriate ifconfig commands for the FDDI meta-device in the
/etc/rcS.d/S30rootusr.sh start-up script.

Contacting the IEEE


If you want a guaranteed unique MACID for your meta-device, you
must obtain one from the official registration authority, the IEEE.

The allocation of FDDI Media Access Control Identifiers (MACIDs) is
described in RFC 1340, “Assigned Numbers,” July 1992. When
generating a MACID for your AP network interface, the new 48-bit
hardware address should be acquired from the IEEE Standards Office,
345 East 47th Street, New York, N.Y. 10017, U.S.A.

Switching a Network Pathgroup

Remember that you can switch the active interface of a network
pathgroup while the meta-interface is active. The change is recorded in
the state databases. The new active path will be used until you switch
back, even after a reboot.

To switch the active interface, use the apconfig command. The change
will occur immediately. There is no commit process for pathgroup
switching.
# apconfig -P mle0 -a le2

You can see that the switch has occurred by using the apconfig -N
command.
# apconfig -N

metanetwork: mle0
physical devices:
le2 A
le0 P

Note – Remember that switch operations take effect immediately;
there is no commit process for them.

Warning – When you switch interfaces, AP does not check that the
interface you are going to is the correct path. AP does not know if the
new interface is connected to the wrong subnet, disconnected, or
inoperative.

Deleting a Network Pathgroup

This example assumes that you are deleting meta-network device
mle0.

1. Make sure that the meta-interface is inactive. Use the ifconfig
unplumb command.
# ifconfig mle0 down unplumb

A network meta-device can be deleted only if it is in the
unplumbed state or the plumbed up state. Otherwise, AP ignores
the delete request and, depending on your configuration, may
display warning messages of the following form:
WARNING:mnf_setphyspath: APUNSET busy
WARNING:ap_db_commit: mnf3 not deleted, metadevice returned error 16

where mnf3 is the metadevice that is still active.

2. Delete the meta-interface using apnet -d, specifying the
pathgroup name.
# apnet -d mle0

3. To verify the results, use apconfig -N to view the AP database.


# apconfig -N

metanetwork: mle0 D
physical devices:
le2 A
le0 P

The D flag marks the pathgroup as being in the uncommitted
delete state. It may not be used in this state.

4. Commit the database with the apdb -C command.


# apdb -C

5. You can verify the deletion with apconfig -N.


# apconfig -N
#

No output is produced, indicating that there are no network
pathgroups. (This must have been the last one.)

Reversing an Uncommitted Delete


You can undo a network pathgroup deletion if the deletion has not
been committed.

To undo a deletion, use apnet -z, specifying the same meta-network
interface that you had deleted.
# apnet -z mle0
#

This restores the pathgroup to committed status (the D is
removed), and the pathgroup may be used again.

Once you have committed the deletion, you must re-create the
pathgroup to restore it.

Alternately Pathing the Primary Network Interface

The primary network interface between your Sun server and the other
machines on the network is the Ethernet interface on the same subnet
as the SSP. You can alternately path this interface. The primary
network interface is the only interface that can be auto-switched to its
alternate at boot time.

During the boot process, if the active interface for the primary network
fails, the OS attempts to find an alternate interface. Note that the AP
database in your domain is used to do this. While a subset of the host’s
AP database resides on the SSP, this is only used to switch the boot
drive if necessary. When the host is ready to configure the primary
network interface, the domain’s AP database is available.

The primary network interface is activated early in the boot process.
To create and activate an alternate path for it, you must shut it down,
because you cannot configure a meta-interface active when the
underlying network interfaces are in use by the system. However, the
primary network interface is difficult to configure down because it
supports the SSP interface that netcon uses. There are three ways to
solve this problem:

● Create the appropriate AP database entries, create a new
/etc/hostname.mxxx file or rename the corresponding
/etc/hostname.xxx file, and then reboot your domain.

● Set up a script file to perform the transition in your domain
without rebooting.

● Log in to your domain from another network interface so that you
can stay connected when the primary network interface is
disabled.

Create the pathgroup normally. The steps are:

1. Verify which interface is the primary network interface. The
primary interface is the one whose /etc/hostname.xxx file contains
the name found in /etc/nodename.
# cat /etc/nodename
washington
# cat /etc/hostname.qe0
washington
# cat /etc/hostname.qe4
lincoln

In this example, qe0 is the primary network interface.

2. Create the new network pathgroup, commit the change, and verify
the result.
# apnet -c -p qe0 -a qe4
# apdb -C
# apconfig -N
metanetwork: mqe0
physical devices:
qe4
qe0 P A

3. Create the hostname.mxxx file to automatically configure the
interface at boot time.
# mv /etc/hostname.qe0 /etc/hostname.mqe0

4. Remove any configuration file for the alternate network interface.


# rm -f /etc/hostname.qe4

At this point, the new meta-interface is ready to be activated.
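
Step 1 of the procedure above, identifying the primary network interface, can be sketched as a small function that compares each hostname file with the node name. The function name primary_interface is hypothetical. The optional directory argument is an assumption added so that the function can be tried against a scratch copy of the files instead of the live /etc directory.

```shell
# Hypothetical helper (not a Solaris or AP command): print the name of the
# interface whose hostname.<interface> file contains the node name.
# usage: primary_interface [DIRECTORY]     (DIRECTORY defaults to /etc)
primary_interface() {
    etc_dir=${1:-/etc}
    node=$(cat "$etc_dir/nodename") || return 1
    for f in "$etc_dir"/hostname.*; do
        [ -f "$f" ] || continue
        if [ "$(cat "$f")" = "$node" ]; then
            # Strip everything up to and including "hostname."
            printf '%s\n' "${f##*/hostname.}"
            return 0
        fi
    done
    return 1
}
```

With the files shown in step 1, the function prints qe0. It reports only the first match and fails if no hostname file matches the node name.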

Bring down the physical network interfaces and bring up the meta-
network interface in any of the following ways:

● Reboot the domain. The meta-network interface will be started
automatically on reboot. The individual interfaces qe0 and qe4
will not be automatically started, because their /etc/hostname.xxx
files have been renamed or deleted.

● If the machine has other network interfaces, log in to your domain
from one of these alternate networks, and reconfigure the primary
interface. Be sure to use the right values for your network.
# ifconfig qe0 down unplumb
# ifconfig qe4 down unplumb
# ifconfig mqe0 plumb
# ifconfig mqe0 inet 136.162.22.45 netmask + broadcast + up

● Generate a script to configure the qe0 and qe4 interfaces down,
then configure up the meta-network interface. This method does
not require you to reboot your domain, but you will briefly lose all
communication over the primary network interface.
# ifconfig -a
lo0: flags=849<UP,LOOPBACK,RUNNING,MULTICAST> mtu 8232
inet 127.0.0.1 netmask ff000000
qe0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
inet 136.162.22.45 netmask ffffff00 broadcast 136.162.22.255
ether 0:0:be:0:8:c5
# cat > /tmp/washington.restart
ifconfig qe0 down unplumb
ifconfig qe4 down unplumb
ifconfig mqe0 plumb
ifconfig mqe0 inet 136.162.22.45 netmask + broadcast + up
^D
# chmod 700 /tmp/washington.restart
# nohup /tmp/washington.restart &
# ifconfig -a
lo0: flags=849<UP,LOOPBACK,RUNNING,MULTICAST> mtu 8232
inet 127.0.0.1 netmask ff000000
mqe0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
inet 136.162.22.45 netmask ffffff00 broadcast 136.162.22.255
ether 0:0:be:0:8:c5
#

You can also execute these commands all on one line, separated
with semicolons. Ensure that you do not have any syntax errors.

Remember to remove any /etc/hostname.qe0 and
/etc/hostname.qe4 files, and add the /etc/hostname.mqe0 file.

Boot Time Interface Failure


If the primary network path fails at boot time, AP will switch the
primary interface to the other alternate. An automatic switch due
to an error will not occur at any other time.

Viewing Disk Entries

As with network pathgroups, use the apconfig command to view
disk pathgroup entries, but with the -S option.

Uncommitted Disk Entries

To view uncommitted disk pathgroup entries, add the -u option. Only
uncommitted disk pathgroup information will be displayed. The P, A,
and U indicators are the same as for network pathgroups.
# apconfig -S -u

c1 pln0 P A
c3 pln1
metadiskname(s):
mc1t5d0 U
mc1t4d0 U
mc1t3d0 U
mc1t2d0 U
mc1t1d0 U
mc1t0d0 U

Committed Disk Entries


To view committed disk pathgroup entries, use the apconfig -S
command.
# apconfig -S

c1 pln0 P A
c3 pln1
metadiskname(s):
mc1t5d0 R
mc1t4d0
mc1t3d0
mc1t2d0
mc1t1d0
mc1t0d0

The P next to pln0 indicates that pln0 is the primary path, and the A
indicates that pln0 is currently the active path. The R next to mc1t5d0
indicates that this is the root (boot) device.
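
A script that needs to know whether the boot disk is under AP control can look for the R flag in this listing. The following is a minimal sketch that assumes the output layout shown above. The function name root_metadisk is hypothetical.

```shell
# Hypothetical helper (not an AP command): read apconfig -S style output
# on standard input and print the name of any meta-disk flagged R (root).
root_metadisk() {
    awk '$1 ~ /^mc[0-9]+t[0-9]+d[0-9]+$/ && $2 == "R" { print $1 }'
}
```

With the listing above as input, the function prints mc1t5d0. No output means that no committed meta-disk is marked as the root device.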

Disk Path Components

Network pathgroups are built by specifying the physical adapter
names to the AP commands. Disk paths have physical names as
well, although they are not commonly seen.

You could build disk pathgroups by specifying each drive
individually, which would be tedious, or you could locate the
internal names for the disk paths.

In the case of the A5000, AP uses the sf port, or the Fibre Channel
connection to the host. The socal driver represents the SOC+
card, the sf driver represents the GBIC on the SOC+ card, and the ssd
driver is the physical disk driver. The ses driver is not seen in the
I/O path. It represents a monitor connection to an interface board.

AP uses the pln port for SPARCstorage Arrays, again representing
the Fibre Channel host connection. The soc driver corresponds to
the SOC interface card, the pln driver represents the FC/OM port in the
SOC card, and the ssd driver is the physical disk device driver.

The naming conventions are the same for the SOC and SOC+
adapters built into the Enterprise server I/O boards.

Planning a Disk Pathgroup and Meta-Disks

Note – AP 2.1 for Solaris 2.6 supports both A5000 and SSA devices. AP
2.0 for Solaris 2.0 only supports SSAs. Unless otherwise mentioned, all
commands apply to both releases and both disk arrays. Examples will
use SSAs to be able to apply to both releases; A5000 output is very
similar.

Before you begin to create a pathgroup, you must understand your
current disk configuration and plan your pathgroups carefully. This
means that you must understand your physical connections to your
A5000s and SSAs, as well as the various Solaris names for these
devices.

You must know your system hardware configuration to be able to
recognize when two pln ports are connected to the same SSA. You can
use the apinst command to display all pln and sf ports (such as
pln0 or sf3) and their disk device nodes (such as /dev/dsk/c1t0d0).

In conjunction with information from the ssaadm or luxadm
command, you can confirm your system configuration.

Meta-Disk Configuration Example

Running the apinst command, you receive the following output:


# apinst
pln3
/dev/dsk/c7t0d0
/dev/dsk/c7t1d0
/dev/dsk/c7t2d0
/dev/dsk/c7t3d0
/dev/dsk/c7t4d0
/dev/dsk/c7t5d0
pln2
/dev/dsk/c5t0d0
/dev/dsk/c5t1d0
/dev/dsk/c5t2d0
/dev/dsk/c5t3d0
/dev/dsk/c5t4d0
/dev/dsk/c5t5d0
pln0
/dev/dsk/c1t0d0
/dev/dsk/c1t1d0
/dev/dsk/c1t2d0
/dev/dsk/c1t3d0
/dev/dsk/c1t4d0
/dev/dsk/c1t5d0
pln1
/dev/dsk/c3t0d0
/dev/dsk/c3t1d0
/dev/dsk/c3t2d0
/dev/dsk/c3t3d0
/dev/dsk/c3t4d0
/dev/dsk/c3t5d0
#

Note – This apinst output has been edited to remove the non-Fibre
Channel devices such as SCSI controllers and peripherals. These
usually appear as ispx.
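
The port-to-controller mapping can be pulled out of the apinst listing mechanically. The sketch below assumes the layout shown above, in which each port name is followed by the device nodes reached through it. The function name port_controllers is hypothetical.

```shell
# Hypothetical helper (not an AP command): read apinst style output on
# standard input and print one "port controller" pair per port, taking
# the controller from the first device node listed under that port.
port_controllers() {
    awk '
        $1 ~ /^\/dev\/dsk\// {
            if (port != "" && !(port in seen)) {
                ctlr = $1
                sub(/^\/dev\/dsk\//, "", ctlr)      # c7t0d0
                sub(/t[0-9]+d[0-9]+.*$/, "", ctlr)  # c7
                print port, ctlr
                seen[port] = 1
            }
            next
        }
        NF > 0 { port = $1 }
    '
}
```

With the listing above, apinst | port_controllers would pair pln3 with c7, pln2 with c5, pln0 with c1, and pln1 with c3, in the order in which apinst lists the ports.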

The apinst command gives you the following information:

pln number   Controller   SSA WWN

0            1
1            3
2            5
3            7

Using the ssaadm disp command, you can get the World Wide Name
(WWN) for each controller. The WWN is a unique identifier for
every Fibre Channel device, much like an Ethernet MAC
address.

# ssaadm disp c1

SPARCstorage Array 110 Configuration
(ssaadm version: 1.20 97/05/14)
Controller path:/devices/sbus@45,0/SUNW,soc@0,0/SUNW,pln@b0000000,8a0e2f:ctlr
DEVICE STATUS
      TRAY 1          TRAY 2          TRAY 3
slot
1     Drive: 0,0      Drive: 2,0      Drive: 4,0
2     NO SELECT       NO SELECT       NO SELECT
3     NO SELECT       NO SELECT       NO SELECT
4     NO SELECT       NO SELECT       NO SELECT
5     NO SELECT       NO SELECT       NO SELECT
6     Drive: 1,0      Drive: 3,0      Drive: 5,0
7     NO SELECT       NO SELECT       NO SELECT
8     NO SELECT       NO SELECT       NO SELECT
9     NO SELECT       NO SELECT       NO SELECT
10    NO SELECT       NO SELECT       NO SELECT

CONTROLLER STATUS
Vendor: SUN
Product ID: SSA110
Product Rev: 1.0
Firmware Rev: 3.12
Serial Num: 00000083BE1D
Accumulate Performance Statistics: Enabled

For A5000s, you would use:


# luxadm disp c2

Note that the luxadm command includes the ssaadm command
functionality. You could use luxadm to obtain information for both
A5000 and SSA devices.

The ssaadm output (the Serial Num field) allows you to complete the table:

pln number   Controller   SSA WWN

0            1            00000083BE1D
1            3            00000083BE1D
2            5            00000083BC49
3            7            00000083BC49

You now can confirm that the same SSA is accessible through c1 (pln0)
and c3 (pln1), and the other through c5 (pln2) and c7 (pln3).

The two pathgroups therefore must consist of pln0 and pln1, and
pln2 and pln3.
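
Once the table is complete, grouping the ports by WWN gives the candidate pathgroups directly. The following sketch does the grouping for a three-column listing of port, controller, and WWN. The function name pathgroup_candidates and the input format are assumptions for illustration; neither apinst nor ssaadm produces this listing for you.

```shell
# Hypothetical helper (not an AP command): read "port controller wwn"
# lines on standard input and print each WWN followed by the ports that
# reach it, in the order in which each WWN first appears.
pathgroup_candidates() {
    awk '
        NF >= 3 {
            wwn = $3
            if (wwn in ports) {
                ports[wwn] = ports[wwn] " " $1
            } else {
                order[++n] = wwn
                ports[wwn] = $1
            }
        }
        END {
            for (i = 1; i <= n; i++)
                print order[i] ": " ports[order[i]]
        }
    '
}
```

Fed the four rows of the table above, the function lists pln0 and pln1 under 00000083BE1D and pln2 and pln3 under 00000083BC49. A WWN that shows only one port has no alternate path available.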

Creating a Disk Pathgroup and Meta-Disks

1. Use apdisk to create an uncommitted disk pathgroup. The apdisk
command creates the meta-disk names and updates the AP
database with the alternate paths for all six SSA disks.
# apdisk -c -p pln0 -a pln1

The -c operand specifies creation of a pathgroup, and the -p and
the -a operands specify the primary and alternate paths,
respectively.

2. Verify the results with apconfig -S -u.


# apconfig -S -u

c1 pln0 P A
c3 pln1
metadiskname(s):
mc1t5d0 U
mc1t4d0 U
mc1t3d0 U
mc1t2d0 U
mc1t1d0 U
mc1t0d0 U

Note that the entries are uncommitted.

3. Use apdb -C to commit the new database entries.


# apdb -C

4. Use apconfig -S to view the new disk entries in the database.


Note that the U is now gone.
# apconfig -S

c1 pln0 P A
c3 pln1
metadiskname(s):
mc1t5d0
mc1t4d0
mc1t3d0
mc1t2d0
mc1t1d0
mc1t0d0

5. Run drvconfig to create the new metadevice entries in the
/devices directory. The -i operand ensures that only AP
metadevices are created.
# drvconfig -i ap_dmd

6. Use the ls command to confirm that the device nodes have been
created.
# ls /devices/pseudo/ap_dmd*
/devices/pseudo/ap_dmd@0:128,blk
/devices/pseudo/ap_dmd@0:128,raw
/devices/pseudo/ap_dmd@0:129,blk
/devices/pseudo/ap_dmd@0:129,raw
/devices/pseudo/ap_dmd@0:130,blk
/devices/pseudo/ap_dmd@0:130,raw
...

7. Use apconfig -R to create the /dev directory links to the new
/devices directory nodes. /dev/ap/dsk and /dev/ap/rdsk links
for each possible partition on each drive will be created, just as
the disks command does for regular disk devices.
# apconfig -R

8. Use the ls command to confirm that the /dev links to the device
nodes have been created.
# ls -l /dev/ap/dsk
total 8
lrwxrwxrwx 1 root 40 Jul 27 16:47 mc1t0d0s0 -> ../../../devices/pseudo/ap_dmd@0:128,blk
lrwxrwxrwx 1 root 40 Jul 27 16:47 mc1t0d0s1 -> ../../../devices/pseudo/ap_dmd@0:129,blk
lrwxrwxrwx 1 root 40 Jul 27 16:47 mc1t0d0s2 -> ../../../devices/pseudo/ap_dmd@0:130,blk

Similar entries will exist for /dev/ap/rdsk.

Using the Meta-Devices

You must modify every reference to a physical device node (such as a
path name that begins with /dev/dsk or /dev/rdsk) to use the
corresponding meta-disk device node, the path that begins with
/dev/ap/dsk or /dev/ap/rdsk.
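
The path translation follows a regular pattern, so it can be sketched as a small function for use when you review files such as /etc/vfstab. The function name to_meta_path is hypothetical. The sketch assumes that the physical path you pass in uses the controller of the pathgroup's primary path (c1 in the earlier examples), because the meta-disk names shown earlier are built from that controller. A path through the alternate controller would be translated to a name that does not exist.

```shell
# Hypothetical helper (not an AP command): translate a physical disk path
# into the corresponding AP meta-disk path, for example
#   /dev/dsk/c1t0d0s0  ->  /dev/ap/dsk/mc1t0d0s0
# Assumes the input names the primary path's controller. Paths that do
# not begin with /dev/dsk/c or /dev/rdsk/c are printed unchanged.
to_meta_path() {
    printf '%s\n' "$1" |
        sed 's|^/dev/\(r\{0,1\}dsk\)/\(c[0-9]\)|/dev/ap/\1/m\2|'
}
```

For example, to_meta_path /dev/rdsk/c1t3d0s4 prints /dev/ap/rdsk/mc1t3d0s4. Review every translated entry by hand before editing system files.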

Warning – Remember that you can still access the device through
both physical paths when the meta-device is active if you specify the
physical path name. This may not be safe, as the OS environment as a
whole is not aware that the physical paths are related and can cause
data loss or corruption. To be safe, never access a meta-device through
the physical path unless you are very sure of what you are doing.

If a partition is currently mounted under a physical path name, it
should be unmounted and remounted under the meta-disk path name.
This can be done by changing the vfstab file and having the meta-device
become active on the next reboot. Do not do this for partitions
on the boot drive.

Note that if you are placing the boot disk under AP control, you will
also need to modify the vfstab file by using the apboot command.
See Appendix B for further information.

AP 2.1 supports mirrored boot drives. This means that you could have
four physical paths to your boot volume, two to each copy.


Disk Managers and AP

Using Volume Manager With AP


If you are using Volume Manager (VxVM) with AP, there are a few
points to remember.

● Volume Manager Versions 2.3, 2.4 and 2.5 are supported.

● After creating your disk pathgroups, Enterprise Volume Manager
must rescan the configuration to find the new devices. You must
either reboot or use the vxdctl command. This makes Volume
Manager recheck the system disk configuration and enables
SEVM to see the new meta-devices.
# vxdctl enable
● Also, when deleting pathgroups, ensure that SEVM is not using
the meta-disks in any way before you perform the deletion.

● DMP in Volume Manager 2.5 will not be installed if AP
meta-devices are found in the system. Its use with AP is not
supported.


Disabling DMP
SEVM 2.5 DMP is incompatible with both AP 2.0 and 2.1 (and with
Sun Cluster 2.0 and 2.1). It must be disabled to allow these products to
function correctly.

DMP will automatically disable itself on installation if SEVM is
installed after AP has been installed. (SEVM checks for the presence of
/kernel/drv/ap.)

If SEVM is installed before AP, DMP must be disabled manually. The
procedure is:
# rm -r /dev/vx/dmp /dev/vx/rdmp
# ln -s ../dsk /dev/vx/dmp
# ln -s ../rdsk /dev/vx/rdmp
# rm /kernel/drv/vxdmp

After this, remove the forceload: drv/vxdmp line from /etc/system
and reboot the domain.

Using SDS With AP


Only Solstice DiskSuite Version 3.0 is supported with AP. Later
versions are supported only if you are not using AP.


Manually Switching the Active Path

Note – You can perform a switch at any time, even while I/O is
occurring on the device. You might want to experiment with the
switching process to verify that you understand it and that your
system is set up properly, rather than wait until a critical situation
occurs.

Warning – When you switch paths, AP does not check that the path is
correct. That is, it does not check to see if the same device is accessed
by each path. It does determine whether or not that path is detached
or off line. You may want to verify the status of the path before
switching to it by using a command such as prtvtoc. AP does not
produce any error or warning messages if you switch to a path that is
not functioning properly. If you switch to a non-functioning path for
your boot disk, your system may crash if the path is not switched back
immediately.


1. Use apconfig -S to view the current configuration:


# apconfig -S

c1 pln0 P A
c3 pln1
metadiskname(s):
mc1t5d0
mc1t4d0
mc1t3d0
mc1t2d0
mc1t1d0

2. To perform the switch, use apconfig -P -a, where -P identifies
the pathgroup and -a specifies the path to become active.
# apconfig -P pln0 -a pln1

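The syntax is easy to misread because the pathgroup name and the
primary path name are the same. In this example, pln0 after -P is the
pathgroup name, while pln1 after -a is a physical path. Be careful.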

3. Verify the results with the apconfig -S command. You can see
that the active alternate has been switched to pln1.
# apconfig -S

c1 pln0 P
c3 pln1 A
metadiskname(s):
mc1t5d0
mc1t4d0
mc1t3d0
mc1t2d0
mc1t1d0

Note – Remember that switch operations take effect immediately.
There is no commit process for them.


Switching Back to the Primary Path

Use the apconfig command, specifying that the primary path is now
to be the active path.
# apconfig -P pln0 -a pln0
# apconfig -S

c1 pln0 P A
c3 pln1
metadiskname(s):
mc1t5d0
mc1t4d0
mc1t3d0
mc1t2d0
mc1t1d0

Do not confuse the pathgroup name with the physical path name. It is
easy to do.
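
The listings above follow a regular layout, so the primary and active paths can be picked out mechanically before a switch. The following Python sketch is illustrative only: it assumes a listing with a single pathgroup laid out exactly as in the examples in this module (controller, path name, optional P/A/T flags, then the meta-disk names), and it assumes the pathgroup is named after its primary path. The function names are hypothetical and are not part of AP.

```python
def parse_pathgroup(listing):
    """Split a single-pathgroup listing into path records and meta-disk names."""
    paths, metadisks = [], []
    in_metadisks = False
    for raw in listing.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if fields[0].startswith("metadiskname"):
            in_metadisks = True
        elif in_metadisks:
            metadisks.append(fields[0])
        else:
            # Assumed layout: controller, path name, then zero or more flags.
            paths.append({"ctlr": fields[0], "path": fields[1],
                          "flags": set(fields[2:])})
    return paths, metadisks


def path_with_flag(paths, flag):
    """Return the name of the first path carrying the flag, or None."""
    for entry in paths:
        if flag in entry["flags"]:
            return entry["path"]
    return None


def switch_back_command(paths):
    """Build the command that makes the primary path active again.

    Assumes the pathgroup name equals the primary path name.
    """
    primary = path_with_flag(paths, "P")
    return "apconfig -P {0} -a {0}".format(primary)


SAMPLE = """
c1 pln0 P
c3 pln1 A
metadiskname(s):
mc1t5d0
mc1t4d0
"""

paths, metadisks = parse_pathgroup(SAMPLE)
print(path_with_flag(paths, "A"))
print(switch_back_command(paths))
```

Keeping the pathgroup name and the path name in separate variables, even when they hold the same string, is one way to avoid the confusion this page warns about.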


Automatic Disk Pathgroup Switching (AP 2.1)

AP 2.1 provides the ability to automatically switch the active path of a
disk pathgroup. This will occur only under two conditions:

● The currently active path has failed

● DR requests the switch (Enterprise 10000 only)

If AP detects that a path has failed, it will be marked with a T in the
apconfig -S output.
# apconfig -S

c1 pln0 P A
c3 pln1 T
metadiskname(s):
mc1t5d0
mc1t4d0
mc1t3d0
mc1t2d0
mc1t1d0


When a path is marked T (tried), AP will not automatically switch to
it. You can reset the tried flag by:

● Rebooting the domain

● Using DR to detach and then reattach the board

● Resetting the flag manually with apdisk -w. Specify the tried path,
not the pathgroup name.
# apdisk -w pln1
#

Note – Resetting the flag manually should only be done after the cause
of the failure has been repaired.

You can still manually switch to a path marked tried with the
apconfig -P -a command.


Deleting a Disk Pathgroup

Note – All usage of the meta-disks in the pathgroup must be
discontinued (for example, the file systems must be unmounted)
before you can delete the pathgroup. If you do not, you will get an
"error 16" which means that a device in the pathgroup is still active.

1. Use apdisk -d, specifying the pathgroup name.


# apdisk -d pln0


2. To verify the results, use apconfig -S to view the AP database.


# apconfig -S

c1 pln0 P A
c3 pln1
metadiskname(s):
mc1t5d0 D
mc1t4d0 D
mc1t3d0 D
mc1t2d0 D
mc1t1d0 D
mc1t0d0 D

If the pathgroup was not previously committed, the apdisk -d
command immediately deletes it from the database. If the pathgroup
was previously committed, the deletion is not complete until the next
time you commit the database. In the example above, the pln0
pathgroup was previously committed, so the letter D indicates that it is
marked for deletion. It is not usable in this state.

3. Commit the database with the apdb -C command.


# apdb -C

4. You can verify the deletion with apconfig -S -u.

Reversing an Uncommitted Delete


You can undo a disk pathgroup deletion if the deletion has not been
committed.

To undo a deletion, use apdisk -z, specifying the same disk
pathgroup that you had deleted.
# apdisk -z pln0
#

This restores the pathgroup to committed status (the D is
removed), and the pathgroup may be used again.
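
The delete, commit, and undo rules described in this section can be summarized as a small state model. The Python sketch below is only a teaching aid under stated assumptions: the class and method names are hypothetical, it does not talk to the AP database, and it models nothing beyond the behavior described above (an uncommitted pathgroup is removed at once, a committed one is marked D until the next commit, and an uncommitted delete can be reversed).

```python
class ApDatabaseModel:
    """Toy model of pathgroup entries; not the real AP database format."""

    def __init__(self):
        # name -> {"committed": bool, "marked_d": bool}
        self.entries = {}

    def create(self, name):
        self.entries[name] = {"committed": False, "marked_d": False}

    def delete(self, name):
        """Models apdisk -d."""
        entry = self.entries[name]
        if entry["committed"]:
            entry["marked_d"] = True      # shown as D until the next commit
        else:
            del self.entries[name]        # never committed: gone immediately

    def undelete(self, name):
        """Models apdisk -z; only possible before the delete is committed."""
        self.entries[name]["marked_d"] = False

    def commit(self):
        """Models apdb -C."""
        for name in [n for n, e in self.entries.items() if e["marked_d"]]:
            del self.entries[name]
        for entry in self.entries.values():
            entry["committed"] = True


db = ApDatabaseModel()
db.create("pln0")
db.commit()                 # pln0 is now committed
db.delete("pln0")           # marked D, still present
db.undelete("pln0")         # D removed, usable again
db.delete("pln0")
db.commit()                 # deletion becomes permanent
print(sorted(db.entries))
```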


AP and the Boot Disk

To allow for unattended system boot even if the I/O adapter for the
boot disk fails, you can place your domain boot disk under AP control.

Because AP only works with A5000s (Solaris 2.6 only) and SSAs, the
boot device must reside on a drive of one of these types.

If you have encapsulated and mirrored your boot disk with Volume
Manager, AP will attempt to recover from boot device problems before
Volume Manager attempts to use a mirror drive. AP 2.1 will retry
using the mirror device if it has alternate paths.

This discussion applies to both AP 2.0 (for Solaris 2.5.1) and AP 2.1
(for Solaris 2.6) unless otherwise noted.


Placing the Boot Disk Under AP Control

1. Create an AP pathgroup for the physical path that includes the boot
disk.

2. Run apboot, specifying the boot meta-disk name, to define the
new AP boot device. apboot modifies /etc/vfstab and
/etc/system.

# apboot mc2t0d0

where mc2t0d0 is the meta-disk name of the boot disk.

apboot examines /etc/vfstab and replaces the physical device
name of the boot disk, such as /dev/dsk/c2t0d0sx, with the
meta-disk name, such as /dev/ap/dsk/mc2t0d0sx. It also edits
/etc/system so that the drivers required for AP boot disk usage
are force loaded.

Do not manually replace the physical devices in /etc/vfstab
with meta-disks for the boot disk. Instead, use the apboot
command to ensure that all required changes are made. Just
changing /etc/vfstab will prevent the system from booting.


3. Set the OBP environment variable boot-device to the physical
path most likely to be used for booting. Do not list multiple device
names from the devalias command, and do not include the other path.

4. Define an OBP devalias for the alternate boot device physical
path in case you need to perform a manual boot from the alternate
path. For a manual boot, set the OBP boot-device parameter to this
name in place of the existing value. Do not add it to the boot-device
parameter value.

5. At this point, just reboot the system to begin using the AP boot
device.

Warning – If you want to create a new AP database copy after you
have placed the boot disk under AP control, and the new database
copy is to be located on a partition controlled by a pln port that does
not control any of the current AP database copies, you must first
remove the boot disk from AP control. Make sure that the new AP
database has been created. Then place the boot disk under AP control
again. Failure to follow this procedure may cause the AP database to
become inaccessible during boot.

Caution – If you place the boot disk under AP control, you must
manually edit /etc/vfstab to also place other file systems that are
mounted during the boot process under AP control.

In the /etc/vfstab file, you must change the device to mount and
device to fsck paths for all of the other mount points that you want
to place under AP control.
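
The manual vfstab edit for non-boot file systems amounts to rewriting the first two fields of each entry. The Python sketch below shows the idea under several assumptions that are not stated in this guide: that a meta-disk name is formed by prefixing "m" to the device name as seen through the primary controller (as in the mc1t0d0s0 listings earlier in this module), that the mapping from each controller number to its pathgroup's primary controller number is supplied by you, and that boot disk entries are left alone for apboot to handle. All function and parameter names are hypothetical.

```python
import re

# /dev/dsk/c1t2d0s6 or /dev/rdsk/c1t2d0s6
PHYSICAL = re.compile(r"^/dev/(r?dsk)/c(\d+)(t\d+d\d+)(s\d+)$")


def to_meta_path(path, primary_ctlr, boot_disk=None):
    """Map a physical device path to its assumed meta-disk path."""
    match = PHYSICAL.match(path)
    if match is None:
        return path                       # "-" or a non-disk device
    kind, ctlr, target, slice_ = match.groups()
    if boot_disk is not None and "c{}{}".format(ctlr, target) == boot_disk:
        return path                       # boot disk: leave to apboot
    if ctlr not in primary_ctlr:
        return path                       # controller not in any pathgroup
    return "/dev/ap/{}/mc{}{}{}".format(kind, primary_ctlr[ctlr], target, slice_)


def rewrite_vfstab_line(line, primary_ctlr, boot_disk=None):
    """Rewrite the device-to-mount and device-to-fsck fields of one entry.

    Fields are rejoined with tabs, so the original spacing is not preserved.
    """
    if not line.strip() or line.lstrip().startswith("#"):
        return line
    fields = line.split()
    fields[0] = to_meta_path(fields[0], primary_ctlr, boot_disk)
    if len(fields) > 1:
        fields[1] = to_meta_path(fields[1], primary_ctlr, boot_disk)
    return "\t".join(fields)


# Example mapping: c1 is the primary and c3 the alternate of one pathgroup.
CTLRS = {"1": "1", "3": "1"}
entry = "/dev/dsk/c3t2d0s6 /dev/rdsk/c3t2d0s6 /export ufs 2 yes -"
print(rewrite_vfstab_line(entry, CTLRS, boot_disk="c1t0d0"))
```

Work on a copy of the file and compare it against /dev/ap/dsk before replacing /etc/vfstab, since a wrong entry can leave the domain in single-user mode.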


Removing AP Support From the Boot Disk

To remove AP support from your boot disk, use the apboot command
to specify a physical device node. For example, to use a non-AP device,
use

# apboot c2t0d0

apboot will also edit the /etc/system file to remove the force
loading of the AP kernel driver modules, because they are no longer
immediately needed when the boot disk is not an AP device.

Warning – If you place the boot disk under AP control and later
decide to remove the AP packages, you must first use apboot to
remove the boot disk from AP control. If you do not, the system on
that disk becomes unbootable.


Using apboot With a Mirrored Boot Disk (Solaris 2.6 Only)

The apboot command allows you to access a single boot drive through
a physical alternate path. If you want to mirror your boot drive, that is,
if you have two separate drives containing copies of the boot drive, you
can still use AP with them. The mirror must be built using the
meta-disk names for the devices.

The process works with either Sun Enterprise Volume Manager or
Solstice DiskSuite.

Make sure that you have the AP 2.1 support patch on the SSP.


Telling AP About the Mirror


To make AP aware of both drives:

1. Both boot drives must be on AP meta-disks. They should be in
separate pathgroups.

2. Use apboot as previously described to put one of the drives under
AP control.

3. Tell the system about the mirrored drive using apboot -m.

# apboot -m mc5t3d0

4. Create the mirror using the meta-disks if you have not already
done so with your disk management software.

5. Do not put either of the mirrors into the OBP boot-device
parameter.

Removing the Mirror Information


To remove a mirror from AP:

1. Undefine the mirror drive using apboot -u.

# apboot -u mc5t3d0

2. Make sure that your OBP boot-device parameter is still correct.


The AP Recovery Boot Sequence

If the boot disk is on an AP drive, the AP system can automatically
switch to the other alternate if the specified path fails to boot. This
section describes the switching process.

1. By default, the system is booted from the device and path
specified in the OBP variable boot-device. Note that this device
path may be different from the previously active alternate for the
boot disk.

2. OBP sends the specified boot-device path to the boot disk on the
SSP.

3. If a failure occurs, it is detected after about one minute. Then the
AP Reboot Host program is executed on the SSP.


Caution – One or two minutes may pass before visible action is taken,
so do not immediately intervene if you notice that the boot process has
failed.

If you attempt a manual recovery from a boot failure, be aware that
the automatic reboot recovery process will still be executing and may
override or interrupt your manual recovery commands.

4. The AP Reboot Host program retrieves the alternate path
information stored earlier by the OBP and sends this path to the
AP SSP daemon.

5. The AP SSP daemon looks up the alternate path for the boot disk
in the AP SSP database, then retries the boot process with the
other alternate path.

6. After the reboot succeeds, AP on the host determines the alternate
from which the system was booted and makes it the active
alternate.

7. On Solaris 2.6, if a boot device mirror was defined to AP and the
boot still fails, AP retries using the mirrored boot device paths.

If you change your disk configuration, remember to rerun the apboot
command. The SSP must always have current information.


Using AP in Single-User Mode

Normally, when your domain is up, you use the AP command
executables in /usr/sbin. However, if your domain comes up in
single-user mode because the boot process encountered a problem,
you can use the commands in /sbin.

The executables in /sbin do not use the AP daemon’s services, which
are not available until run level 2. If the system enters single-user
mode because of a problem related to AP, you may be able to fix the
problem by using the /sbin version of the commands.

Two AP-related problems that may cause the system to come up in
single-user mode are:

● According to the AP SSP database, two paths are supposed to lead
to a disk that needs to be mounted during the boot process, but
those paths actually lead to different disks.

This can only happen if you have changed the physical
configuration of the pathgroup without running the proper AP
commands to update the AP database.


● An active alternate for a disk, other than the boot disk, turns out to
be inaccessible and that disk is required during the boot process.

Only the boot drive will automatically have an alternate path tried
if the active alternate fails. All other drives required at boot time
(for example, they are listed in /etc/vfstab) must be switched
manually if their active alternate is not available.

These situations will occur only with disks, not network interfaces. In
either case, however, you may be able to use the AP commands in
/sbin to resolve the problem.


Lab

1. If they are not already installed, install the AP packages in your
domain.

____________________________________________________________

____________________________________________________________

2. Create two AP database copies, one on each system board.

____________________________________________________________

____________________________________________________________

3. Verify the requirements to set up alternate paths for disks and the
network interface. Make sure that your domain is configured
properly to create the network and disk pathgroups.

____________________________________________________________

____________________________________________________________

4. Create a meta-device for the primary network interface, using the
current primary path and one from the second system board.

____________________________________________________________

____________________________________________________________

5. Reconfigure to use the new meta-device as the primary network
interface. Use whichever of the three methods you want to activate
the new meta-network interface.

____________________________________________________________

____________________________________________________________

6. Switch the active alternate and verify continued access to the
network.

____________________________________________________________

____________________________________________________________

7. Confirm the presence of the same A5000 or SSA on two paths in
your domain. Use the apinst and ssaadm commands.

____________________________________________________________

____________________________________________________________

8. Create a pathgroup for the array. Remember to run apconfig -R.

____________________________________________________________

____________________________________________________________

9. Use apboot to alternate path your boot drive if it is on the array
and reboot. If not, mount a slice from the array to the domain.

____________________________________________________________

____________________________________________________________

10. Switch the active alternate and verify access to the drive.

____________________________________________________________

____________________________________________________________

11. If you are using AP 2.1, disconnect the active device path, if
possible. Watch what happens. Restore the path and reset the T
flag.

____________________________________________________________

____________________________________________________________

12. Check the state of the AP databases.

____________________________________________________________

____________________________________________________________


Check Your Progress

Before continuing on to the next module, check that you are able to
accomplish or answer the following:

❑ Describe the concepts of Alternate Pathing (AP)

❑ Discuss the supported device types

❑ List the AP device restrictions

❑ Install and set up the AP software

❑ Use the AP command-line commands

❑ Configure alternate network paths

❑ Configure alternate disk array paths

❑ Configure alternate paths for the boot drive


Think Beyond

What other types of devices might need to be supported by Alternate
Pathing? Why aren’t they?

What would happen if the AP databases weren’t on raw disk slices?

Why are the AP database locations recorded in /etc/system?

Why aren’t network pathgroups switched automatically?

Could you use Alternate Pathing to help improve your domain’s
performance? If so, how?

Dynamic Reconfiguration 9

Course Map
This module covers the operation, configuration, and management of
Dynamic Reconfiguration. It discusses the system requirements and
procedures for both DR Attach and DR Detach, interaction with AP,
and the restrictions and problems you may encounter during the DR
process.


Relevance

For Discussion – The following questions are relevant to
understanding the content of this module:

1. What does the OS need to do to support DR?

2. What does the hardware need to do to support DR?

3. When would you want to use DR?

4. When might you not want to use DR?


Objectives

Upon completion of this module, you will be able to:

● Describe the requirements for Dynamic Reconfiguration.

● List the DR process steps for attach and detach.

● Discuss the restrictions and problems that can occur with DR.

● Display DR information from both dr and hostview.

● Perform a DR attach from both dr and hostview.

● Perform a DR detach from both dr and hostview.

● Solve problems that prevent DR from succeeding.

● Manage AP and DR interaction.

References

Additional resources – The following references can provide
additional details on the topics discussed in this module:

● Ultra Enterprise 10000 SSP 3.1 User’s Guide

● Sun Enterprise 10000 Dynamic Reconfiguration User’s Guide

● Sun Enterprise 10000 Alternate Pathing 2.1 User’s Guide

● Sun Enterprise 10000 System Hardware Installation and De-Installation
Guide

● Sun Enterprise 10000 System Overview Manual

● The man pages for the commands and files


Dynamic Reconfiguration Capabilities

Dynamic Reconfiguration (DR) enables you to logically add (DR
Attach) or remove (DR Detach) an entire system board to or from a
running domain, allowing you to reconfigure your domain hardware
environment while the OS is running.

DR (and power) operations in one domain do not affect other domains,
except for the restriction that only one DR operation can be in progress
across the entire platform at a time.

Once you have removed the board from the domain, you may power it
off and physically remove it from the system. When you add a new
board, after installing it and powering it on, you can add it to a
domain.

You must add or remove one entire board to or from the domain at a
time. There is no DR mechanism for removing board components from
a domain, nor for DR operations involving multiple boards.


When to Use DR
You might reconfigure the system for several different reasons.

● Board Addition – You are adding a system board to a domain or a
new board to the system as a whole.

● Board Deletion – You are removing a system board from the
domain or the system as a whole.

● Board Replacement – You are replacing a system board with
another or have removed the system board to add or remove
components such as processors, memory, SBus cards, or I/O
devices.

● New Domain Creation – You are splitting system boards from one
or more existing domains or deleting domains entirely, to add the
system boards to a new or existing domain. Remember that it may
be easier to halt the old domain, then delete and re-create it to free
up system boards, rather than do multiple DR operations.


Dynamic Reconfiguration

Adding a new system board (DR attach) is easier than removing a
system board (DR detach).

DR attach adds the specified system board to a domain, which may be
already running the OS. When the board is added, its resources and
interfaces are configured into the operating system, and the OS can
use them without any difficulty.

When removing a board from a running domain, the OS must stop
using all of the board’s resources:

● Unschedule all work from the processors

● Remove all needed data from the board’s memory

● Stop using all I/O interfaces connected to the board

Depending on the use the OS makes of these resources, this may be a
time-consuming or even an impossible operation.

All DR operations are performed from the SSP, which interacts with
the control board and the DR daemon in the host domain.


dr-max-mem (Solaris 2.6)

dr-max-mem is used to signal the OS that the domain may do DR
operations.

For Solaris 2.6, it is a binary switch that enables or disables the
domain’s ability to do DR operations. Set to zero, no DR operations,
attaches or detaches, are permitted. Set greater than zero, DR
operations are permitted.

You can set dr-max-mem from the ok prompt with setenv or from the
OS with the eeprom command. Both methods require a bringup of the
domain before they take effect. You can also set dr-max-mem from the
ok prompt with setenv-dr, which only requires a reset (reboot) to
take effect since it makes the change in memory as well as on the SSP.

Its meaning for domains running Solaris 2.5.1 is significantly different.


dr-max-mem (Solaris 2.5.1)

The Solaris kernel has a number of memory-related data structures,
such as page structures, that are statically allocated at boot time. The
size of these data areas is based on the amount of physical memory in
the domain at boot time.

You can use DR attach to dynamically add a board and its physical
memory after the domain is booted. However, the extra memory
cannot be used unless enough memory data structures were allocated
at boot time to support it. In Solaris 2.5.1, they cannot be extended
dynamically after boot time.

To ensure that enough memory data structures to support DR attach
operations are built at boot time, the OBP environment variable
dr-max-mem can be used to specify the maximum size of physical
memory that the domain should support without requiring a reboot.


Caution – If dr-max-mem is specified as zero, neither attaches nor
detaches may be performed in the domain.

To calculate the proper value for dr-max-mem, add the amount of
memory most likely to be added during DR attaches to the current
memory size of the domain, then set dr-max-mem to the total.

If you add system boards to the domain totaling more memory than
dr-max-mem specifies, only the boards’ processors and I/O devices can
be attached (without a reboot). Memory on these system boards sits
idle.


Considerations
Do not set dr-max-mem too high. At least 8 Mbytes of system memory
is consumed for each 1 Gbyte of memory that the domain is to
support. If you never attach the memory, the reserved memory used
for the page structures is wasted.

Also, if dr-max-mem is too large relative to the current memory in the
domain, its size can impact the performance of the operating system
by leaving insufficient memory for OS uses. To prevent this, the
system limits the value of dr-max-mem based on the amount of
memory present in the domain at boot time.
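
The sizing rule and the overhead figure above can be combined into a quick estimate. The Python sketch below is an illustration only: the function names are hypothetical, the 8 Mbytes per Gbyte figure is treated as a lower bound taken from the text, and the upper limit that the system imposes on dr-max-mem is not modeled because its formula is not given here.

```python
MB_PER_GB = 1024
OVERHEAD_MB_PER_GB = 8   # "at least 8 Mbytes" per Gbyte supported


def suggested_dr_max_mem(current_mb, expected_attach_mb):
    """Current memory plus the memory most likely to be attached, in Mbytes."""
    return current_mb + expected_attach_mb


def min_overhead_mb(supported_mb):
    """Lower bound on memory consumed by page structures, in Mbytes."""
    return supported_mb * OVERHEAD_MB_PER_GB / MB_PER_GB


# Example: a 4-Gbyte domain that may later gain one board with 2 Gbytes.
setting = suggested_dr_max_mem(4096, 2048)
print(setting)                    # value to give dr-max-mem, in Mbytes
print(min_overhead_mb(setting))   # structures for all supported memory
print(min_overhead_mb(2048))      # portion wasted if the board is never attached
```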

If the value of dr-max-mem is set smaller than the amount of physical
memory present when the domain is booted, the operating system
instead uses a value equal to the current memory size. This means that
you cannot attach more memory to the domain, but you can detach
and then reattach up to the current amount of memory. The maximum
amount of memory you can reattach is the amount that was present
when the domain was booted. Additional memory attached is ignored.

In any case, the value of the actual dr-max-mem parameter is not
modified.

Actual dr-max-mem          Effective dr-max-mem    Effect on DR
Setting                    Setting

0                          0                       No DR
Memory size < 512 Mbytes   0                       No DR
< memory size              Memory size             Cannot increase
= memory size              Memory size             Cannot increase
> memory size              Value set               Increase to limit set
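
The rows of the table above can be read as a small function. The Python sketch below is one interpretation, not the kernel's actual logic: it assumes that the 512-Mbyte row applies whenever the domain has less than 512 Mbytes of memory, regardless of the setting, and it ignores the upper limit the system places on dr-max-mem. The function names are hypothetical.

```python
def effective_dr_max_mem(setting_mb, memory_mb):
    """Effective dr-max-mem (Mbytes) for a Solaris 2.5.1 domain, per the table."""
    if setting_mb == 0 or memory_mb < 512:
        return 0                       # No DR
    # A setting at or below the boot-time memory size is raised to that size.
    return max(setting_mb, memory_mb)


def additional_capacity_mb(setting_mb, memory_mb):
    """Memory that could still be attached, in Mbytes."""
    effective = effective_dr_max_mem(setting_mb, memory_mb)
    return max(effective - memory_mb, 0)


for setting, memory in [(0, 2048), (1024, 2048), (2048, 2048), (4096, 2048)]:
    print(setting, memory,
          effective_dr_max_mem(setting, memory),
          additional_capacity_mb(setting, memory))
```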


dr-max-mem Usage by Solaris 2.5.1

dr-max-mem must be set before the domain is booted.

To set dr-max-mem, type the following at the OBP prompt for the domain:
ok setenv-dr dr-max-mem NNNNN

where NNNNN is the total number of megabytes of memory to be
supported by the domain after all likely system boards are attached.
The value of dr-max-mem persists across domain reboots (like any
OBP environment parameter) and is only applicable to that particular
domain.

Note – Remember that if you use setenv or the eeprom command,
you must rerun bringup. setenv-dr enables you to boot without a
fresh bringup since it makes the change in memory as well as on the
SSP. You must still reboot Solaris.


At boot time, if the dr-max-mem is nonzero, you will see the following
messages:
DR: current memory size is XXXXX MBytes
DR: capacity to allow an additional YYYYY MBytes of memory

where XXXXX is the amount of physical memory currently available
to the operating system, and YYYYY is the difference between XXXXX
and dr-max-mem.

When a system board with memory is successfully attached or
detached, this message is displayed:
DR: capacity to allow an additional ZZZZZ MBytes of memory

where ZZZZZ represents the remaining amount of memory that can
still be attached.


DR Attach

Requirements
To be able to attach a system board to a domain:

● The target system board must be installed.

● The system board must be powered up.

● The system board must not belong to any domain.

● The system board must contain at least one processor.

● dr-max-mem must be set to a nonzero value prior to booting
Solaris.

If any of these requirements is not met, you will not be able to do the
DR attach.
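
The requirements above lend themselves to a simple pre-check. The Python sketch below is a paper model only: the Board record and the function name are hypothetical, and the values would have to come from your own inspection of the platform rather than from any DR interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Board:
    installed: bool
    powered_on: bool
    domain: Optional[str]   # None when the board belongs to no domain
    processors: int


def attach_blockers(board: Board, dr_max_mem_mb: int) -> List[str]:
    """Return the list of unmet DR attach requirements (empty means none)."""
    blockers = []
    if not board.installed:
        blockers.append("board is not installed")
    if not board.powered_on:
        blockers.append("board is not powered up")
    if board.domain is not None:
        blockers.append("board already belongs to domain " + board.domain)
    if board.processors < 1:
        blockers.append("board has no processor")
    if dr_max_mem_mb == 0:
        blockers.append("dr-max-mem was zero when Solaris booted")
    return blockers


spare = Board(installed=True, powered_on=True, domain=None, processors=4)
print(attach_blockers(spare, dr_max_mem_mb=6144))
```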

DR will block the attach if its requirements are not met. You can ask
DR if the requirements are met before starting the attach.

The domain dr_daemon tracks the state of the attach operation. For
example, once the Init Attach operation is completed successfully, the
daemon remembers this state. You can return to an unfinished DR
operation later and complete or abort the attach at that time.


Operation
DR attach is a two-step, and sometimes a three-step, process. The
first two steps are always required. The process is the same from
hostview or from the dr shell.

1. init_attach – Prepares the board to be attached to the domain.

● Updates the domain_config file.

● Runs hpost on the system board.

● Loads obp_helper to initialize the processors.

● Updates the centerplane to add the board to the domain.

● OBP probes the board devices and builds the device tree.

● The kernel builds initial /devices nodes.

2. complete_attach – Gives the board to the OS.

● The kernel brings the new processors on line.

● New memory is added to the system.

● All previously known I/O devices are configured.

3. reconfig – New I/O device configuration (if required).

● Run if the domain has not seen the devices before.

● Updates the /devices and /dev directories.

● Can be run from hostview or the command line.

On completion, you will probably want to run apconfig -F to
synchronize the AP database with the state of the domain.
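The attach steps can be viewed as a small state machine. The sketch below is an illustrative model only: the state names are the board states that hostview reports (Present, Init Attach, In Use), the commands are the dr shell commands, and the table and function names are hypothetical. The optional reconfig step does not change the board state, so it is left out.

```python
from enum import Enum


class BoardState(Enum):
    PRESENT = "Present"
    INIT_ATTACH = "Init Attach"
    IN_USE = "In Use"


# Allowed (state, command) pairs and the state each one leads to.
TRANSITIONS = {
    (BoardState.PRESENT, "init_attach"): BoardState.INIT_ATTACH,
    (BoardState.INIT_ATTACH, "complete_attach"): BoardState.IN_USE,
    (BoardState.INIT_ATTACH, "abort_attach"): BoardState.PRESENT,
}


def next_state(state, command):
    """Return the state after running command, or raise ValueError."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(
            "%s is not valid in state %s" % (command, state.value)
        ) from None
```

Because the state is remembered between steps, a board left in Init Attach can later be either completed or aborted, which is what the two transitions out of INIT_ATTACH express.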


Attaching a Board With dr

Note that the entire DR attach operation is run from the SSP, even
though it performs work in the domain.

1. Run the dr command in an SSP window to start the dr shell.

SUNW_HOSTNAME must be set to the name of the domain for which
you are performing the DR operation.

You will get a prompt from the dr shell.

ssp:domain% dr

Checking environment...
Establishing Control Board Server connection...
Initializing SSP SNMP MIB...
Establishing communication with DR daemon...

xf3: System Status - Summary

BOARD #: 0 1 2 5 6 8 9 10 11 13 physically present.
BOARD #: 4 7 being used by the system.
dr>


2. Run init_attach to attach the designated board. For this example,
Board 6 is being attached to a domain called xf3.
dr> init_attach 6
Initiate attaching board 6 to domain xf3.
Adding board 6 to domain_config file.
/opt/SUNWssp/bin/hpost -H40,28
Opening SNMP server library...

Significant contents of /export/home/ssp/.postrc:
blacklist_file ./bf
Reading centerplane asics to obtain bus configuration...
Bus configuration established as 3F.
phase cplane_isolate: CP domain cluster mask clear...
phase init_reset: Initial system resets...
phase jtag_integ: JTAG probe and integrity test...
phase mem_probe: Memory dimm probe...
phase iom_probe: I/O module type probe...
phase jtag_bbsram: JTAG basic test of bootbus sram...
phase proc1: Initial processor module tests...
phase pc/cic_reg: PC and CIC register tests...
phase dtag: CIC DTAG tests...
phase mem: MC register and memory tests...
phase io: I/O controller tests...
phase procmem2: Processor vs. memory II tests...
phase lbexit: Centerplane connection tests...
phase npb_mem: Non-Proc Board MC and memory tests...
phase final_config: Final configuration...
Configuring in 3F, FOM = 2048.00: 4 procs, 4 SCards, 1024 MBytes.
Creating OBP handoff structures...
Configured in 3F with 4 processors, 4 SBus cards, 1024 MBytes memory.
Interconnect frequency is 83.294 MHz, from SNMP MIB.
Processor frequency is 166.631 MHz, from SNMP MIB.
Boot processor is 6.0 = 24
POST (level=16, verbose=20, -H28,0040) execution time 3:07
hpost is complete.
obp_helper -H -m24
Board debut complete.
Reconfiguring domain mask registers.
Board attachment initiated successfully.
Ready to COMPLETE board attachment.Abort or complete the attach
operation.


3. After init_attach is finished, you can use drshow to see an
inventory of the board’s physical components.
dr> drshow board_number OBP

4. To complete the attach operation, use complete_attach.
dr> complete_attach 6
Completing attach for board 6.
Board attachment completed successfully.
dr>

5. After using complete_attach, the OS is aware of the new system
board. You can use drshow to display the logical I/O information
for the newly attached board. See the image above.

6. If you want, use the reconfig command to configure any new
devices into the OS. This is not necessary if the devices have been
attached to the domain before.


7. Use exit to terminate the dr shell.
dr> exit
ssp:domain%

The SSP shell prompt is again displayed.

8. Run apconfig -F to synchronize the AP database copies.

Aborting the Attach Operation
If you want to cancel the attach after you have called init_attach but
before complete_attach, use the abort_attach command.
dr> abort_attach board_number

System Failures
If the domain fails during the DR operation, it is frozen in its current
state. You may need to run bringup to clear the operation.


Attaching a System Board With hostview

With hostview, you perform the same steps to attach a system board
that you do with dr. hostview will fill in some defaults and command
fields for you, and give you the ability to track the progress of the
operation graphically.

hostview Attach Buttons
When you perform an attach operation using the hostview GUI
program, the following buttons appear at various times during the
attach process:

● Init Attach – Begins the attach operation. Once the operation
has been completed successfully, the label on this button changes
to complete.

● complete – Completes the attach operation.

● reconfig – Automatically reconfigures the device directories in
the domain.

● abort – Cancels the attach operation. This button is enabled after
the Init Attach operation has successfully completed.

● dismiss – Aborts the currently active step, and leaves the board in
its current state (Present, Init Attach, or In Use).

● help – Summary information for the DR attach operation.


1. From hostview, choose Configuration ➤ Board ➤ Attach.

2. Select the board to attach in the main hostview window.

3. Click on the top Select button, and the Board and Source Domain
fields will be filled in for you.

The source domain is the domain that the board currently belongs
to. If the board is not a member of any domain, the source domain
name will always be no_domain. This is filled in for you.

4. Fill in the target domain name or, in the main hostview window,
select the domain to which you want to attach the board.

To do this, select any board that is already a member of that
domain, then click the bottom Select button to fill in the Target
Domain field.

5. Click Execute.

If any errors occur, the error messages appear in the window.


6. If there are no errors, the Dynamic Reconfiguration window is
displayed.

7. Click on init attach.

Clicking on init attach begins the first phase of the board attach
process. When this phase is complete, the caption on the button
changes to complete.

The Init Attach operation may take a few minutes to complete,
because it runs hpost on the system board. The output from
hpost is shown in the window.


8. At this point, the system board is ready to be used by the OS. You
release it to the OS by clicking on complete.

The complete operation may take a minute or so to finish.

When it has been successfully completed, DR displays the following
message:
Board attachment completed successfully

The system board resources (processors, memory, and I/O devices) are
now available to the operating system.

9. Click on dismiss.

The DR attach operation is complete.

You should run apconfig -F from the command line to resynchronize
the AP state database copies with the domain.


I/O Device Reconfiguration After a DR Operation

The DR user interface enables you to reconfigure the system after a
DR attach or DR detach operation. This reconfiguration sequence is
exactly the same process done by the reconfiguration boot sequence
(boot -r). The reconfiguration sequence is usually required after a DR
attach to enable the OS to recognize the new I/O devices, if the
devices have not been in the domain before. It is not required, but may
be useful, after a DR detach operation, if the devices will not be
returned to the domain.

To perform a reconfiguration sequence, either choose the reconfig
button in hostview or enter the following sequence of commands:
# drvconfig; devlinks; disks; ports; tapes

When the reconfiguration sequence is executed after a board is
attached, device path names not previously seen by the system are
added to the /etc/path_to_inst file. The same path names are also
added to the /devices hierarchy, and the appropriate links to them
are created in the /dev directory. This is equivalent to the processing
that boot -r would perform.


Disk Devices

Caution – Disk controllers are numbered consecutively as they are
encountered by the disks program. All disk partitions are assigned
/dev names according to the disk controller number that disks
assigns. Unused controller numbers will be reused.

When reconfig is executed after a board is detached, the /dev links
for all the disk partitions on the detached board are deleted. The
remaining boards retain their original numbering. New disk controllers
on a newly attached board are assigned the lowest available numbers
by disks.

For example, suppose there are four system boards numbered 0 to 3.
You detach boards 1 and 2, then reattach board 2. If you now execute
disks, controller numbers from board 1 are reassigned to the
controllers on board 2 if the old board 1 controller numbers are the
lowest available numbers.
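The renumbering rule in this example can be illustrated with a short simulation. This is a sketch of the "lowest unused number first" behavior described above, not the disks program itself; the function name is hypothetical, and the scenario assumes one disk controller per board.

```python
def assign_controller_numbers(in_use, new_count):
    """Give new_count new controllers the lowest numbers not in in_use."""
    taken = set(in_use)
    assigned = []
    number = 0
    while len(assigned) < new_count:
        if number not in taken:
            assigned.append(number)
            taken.add(number)
        number += 1
    return assigned


# Boards 0 to 3 with one controller each use numbers 0, 1, 2, and 3.
# After boards 1 and 2 are detached and reconfig is run, only the
# numbers for boards 0 and 3 remain in use.
remaining = {0, 3}
# Reattaching board 2 gives its controller the old board 1 number.
board2_numbers = assign_controller_numbers(remaining, 1)
```

In this scenario the disks on board 2 come back under a different controller number than they had before, which is why mount tables and scripts that hard-code /dev names deserve a check after a reattach.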


Viewing System Information

Both dr and hostview enable you to display information about the
system board selected during DR operations and view the
suspend-unsafe devices. The information available is the same for
both dr and hostview.

For dr, this information is accessible using the drshow command.

From hostview, this information is available by clicking the cpu,
memory, device, obp, and unsafe buttons in the Attach or Detach
windows. Note that the cpu, memory, and device displays are only
enabled when the board is attached to the operating system.


System Information Using drshow

The drshow command is used only within the dr shell. It provides a
description of the current state of the domain resources.

This example shows the drshow IO command for a system board
being drained for detach.


Syntax

drshow UNSAFE [interval [count]]
drshow sb [report_type] [interval [count]]
drshow ALL [report_type] [interval [count]]

Where

● UNSAFE – Shows the domain’s current set of suspend-unsafe
devices.

● sb or ALL – Specifies which system board (or all boards) to display.

● report_type – Is CPU, IO, OBP, MEM, or DRAIN (type of
information to display). The default report type is CPU.

● interval – Specifies how often to display the information.

● count – Specifies how many times to display the information.

Note – If you specify interval or count incorrectly, the only way to
terminate the drshow output is to use Control-c, which will also
terminate the dr shell, aborting any DR operation in progress. The
state of the DR operation remains as it was before the Control-c.
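Because a mistyped interval or count can only be stopped with Control-c, it can help to validate the arguments before typing them into the dr shell. The helper below is a hypothetical sketch that assembles a drshow command line from the syntax shown above; it is not part of the SSP software, and its insistence on positive whole numbers for interval and count is an assumption.

```python
REPORT_TYPES = ("CPU", "IO", "OBP", "MEM", "DRAIN")


def drshow_command(target, report_type=None, interval=None, count=None):
    """Build a drshow command string, raising ValueError on bad input."""
    parts = ["drshow", str(target)]
    if str(target).upper() == "UNSAFE":
        if report_type is not None:
            raise ValueError("UNSAFE does not take a report type")
    elif report_type is not None:
        if report_type.upper() not in REPORT_TYPES:
            raise ValueError("unknown report type: %s" % report_type)
        parts.append(report_type.upper())
    if count is not None and interval is None:
        raise ValueError("count requires an interval")
    for value in (interval, count):
        if value is not None:
            if int(value) <= 0:
                raise ValueError("interval and count must be positive")
            parts.append(str(int(value)))
    return " ".join(parts)
```

For instance, drshow_command(6, "IO") yields the string for an I/O report on board 6, and passing a count without an interval is rejected before anything reaches the dr shell.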


System Information Using hostview

When you are using hostview to do a DR operation, you can view the
system information by using the buttons shown in the image.

If you click on All, all of the currently available items are displayed.

Processor Configuration Information
To view processor configuration information, click on the cpu button.

For each processor on the selected board, the window shows the
numeric ID, processor status (Online or Offline), and any bound
thread information.

Threads can be bound to a processor by using the pbind command.
Some operating system device drivers may bind threads to processors
to provide better servicing of a device. The number of user and system
threads and the process IDs of the bound threads are displayed. You
will always see one system thread, the CPU idle thread, even if the
processor is inactive.


Memory Configuration Information
To view memory configuration information, click on memory.

The Memory Configuration window is divided into system memory
information and information about a particular board. If a drain
operation is in progress, information showing the status of the
memory drain is also displayed.

Interleave board specifies which other system board this board is
interleaved with, if you have specified eight-way memory interleaving
(mem_board_interleave_ok in .postrc).



The system memory display includes:

● Current System – The total size of memory in the domain from
all boards.

● Attach Capacity – The amount of memory that can currently be
added to the domain by a DR attach operation (dr-max-mem less
the current system memory).

● dr-max-mem – The value of the OBP variable that determines the
maximum amount of memory the system can support.

The board memory window shows:

● The amount of memory on the selected board.

● The board it is interleaved with (if any).

● The highest and lowest physical pages that reside in this board’s
memory.

The memory drain display will show one of the following states:

● Unavailable – A suitable target memory area is not currently
available.

● Estimated – The estimated values reflect the memory
configuration if the drain operation were started at this time.

● In Progress – The drain operation is in progress.

● complete – The drain operation is complete; the board’s memory
is free.



The displayed values are:

● Reduction – The amount of memory that will be taken from the
domain when the board is detached.

● Remaining in System – The system memory size after the board
is detached.

● Percent Complete – How far the drain operation has progressed.

As an aid in tracking drain progress, the drain operation start time and
current time are also displayed.


Device Configuration Information


To view the device configuration information, click on the device
button.

The controllers or devices installed in each slot are listed. Devices are
listed by the instance number (for example, sd31). You can use
/etc/path_to_inst to decode these if necessary.
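Decoding an instance name such as sd31 by hand means finding the matching line in /etc/path_to_inst, where each entry holds a quoted physical path, an instance number, and a quoted driver name. The sketch below shows one way to automate that lookup. The function names are hypothetical and the device paths in SAMPLE are invented for illustration; they are not taken from a real system.

```python
import re

# One entry per line: "physical path" instance-number "driver name"
LINE_RE = re.compile(r'^"([^"]+)"\s+(\d+)\s+"([^"]+)"\s*$')
# Split a name such as sd31 into driver "sd" and instance 31.
NAME_RE = re.compile(r'^(.*?)(\d+)$')

# Hypothetical file contents, for illustration only.
SAMPLE = '''# comment lines start with a hash mark
"/sbus@54,0/SUNW,fas@0,8800000/sd@3,0" 31 "sd"
"/sbus@54,0/SUNW,hme@0,8c00000" 2 "hme"
'''


def parse_path_to_inst(text):
    """Map (driver, instance) to the physical device path."""
    table = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # comments and blank lines do not match
            path, instance, driver = match.groups()
            table[(driver, int(instance))] = path
    return table


def decode_instance(text, name):
    """Return the physical path for a name like sd31, or None."""
    match = NAME_RE.match(name)
    if not match:
        raise ValueError("not an instance name: %s" % name)
    driver, instance = match.group(1), int(match.group(2))
    return parse_path_to_inst(text).get((driver, instance))
```

The name split assumes the instance number is the entire trailing run of digits, so a driver whose own name ends in a digit would need special handling.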

This display shows the devices that are logically present to the OS; it
does not always show all the devices that are physically present on the
board. For example, controllers whose drivers are unattached will not
appear in the list.

The physical device display that is available using the obp button
shows all of the cards on the board that were configured in the system.


Device Configuration Detail


For more detailed information, highlight one or more controller(s) and
choose Detail. The current usage information for each device is shown.

The window includes an open count (if available) and the name by
which the device is known to the OS. This might be a disk partition,
meta-device, or an interface or instance name. Additional information
may be provided including the partition mount points, network
interface configuration, swap space usage, and meta-device usage.

There are some forms of device usage which may not be reported.
Examples are the raw disk partitions used for Solstice DiskSuite,
Alternate Pathing databases, and Sun Volume Manager.

If a controller or network interface is part of the AP database, the
window indicates whether it is an AP alternate and whether it is
active. For active AP alternates, the usage of the AP meta-device is
also displayed.


OBP Configuration Information


The OBP window contains board configuration information obtained
from the OBP device tree. This information is less detailed than that
available from the other windows described here, because it is at a
purely physical level.

For example, in the Init Attach state, only the I/O adapters are
known, not the devices attached to them or the memory interleave
configuration. The OBP window is usually most useful when a board
is in the Init Attach state.


Viewing Suspend-Unsafe Devices


To see a list of the suspend-unsafe devices across the entire domain—
not just those on the selected system board—click on the unsafe
button.

This display shows the suspend-unsafe devices that are currently open
(in use). This information is useful for determining the cause of
operating system quiesce errors from unsafe devices. In this example,
no unsafe devices are open.


DR Detach

Requirements
To be able to detach a system board from a domain:

1. dr-max-mem must be set to a nonzero value before Solaris is
booted, which confines all non-pageable kernel and OBP
(permanent) memory to one system board. This board is usually
the lowest numbered board in the domain.

2. Any alternate paths or mirrors of vital file systems and network
interfaces must be appropriately configured not to use the target
board.

3. Devices hosted by the target board must be explicitly closed before
completing the detach.


4. There must be sufficient swap space. All pageable memory is
flushed to swap during the drain, and you may be detaching swap
devices, so there must be enough remaining swap resources to
replace the lost swap space and memory.

In addition, you must have enough additional swap space to
complete the drain operation.

Warning – DR does not check to see if you will have enough swap
space before starting the detach. You can use DR to determine how
much memory needs to be drained from the board, then use swap -l to
determine if the current amount of system swap space is sufficient.
The system will hang if there is insufficient space.

5. Detach-unsafe devices must be closed or their drivers unloaded
with the modunload command.

6. Suspend-unsafe devices must be closed, their drivers unloaded
with the modunload command, or manually suspended.

DR will disallow the detach request (at some point) if its requirements
are not met.
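Since DR does not perform the swap check in requirement 4 for you, a rough manual check can be scripted. The sketch below sums the free column of swap -l output, which Solaris reports in 512-byte blocks, and compares it with the amount of memory to be drained. The function names and the sample device names are hypothetical, and the check is deliberately conservative: it ignores free physical memory on the remaining boards.

```python
BLOCK_BYTES = 512  # swap -l reports sizes in 512-byte blocks

# Hypothetical swap -l output, for illustration only.
SAMPLE = """swapfile             dev  swaplo blocks   free
/dev/dsk/c0t0d0s1   32,1      16 2097152 2097152
/dev/dsk/c4t0d0s1   32,33     16 1048576 1048576
"""


def free_swap_mb(swap_l_output, exclude=()):
    """Sum free swap in MB, skipping devices listed in exclude."""
    total_blocks = 0
    for line in swap_l_output.splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[0] == "swapfile":
            continue  # skip the header and anything unexpected
        if fields[0] in exclude:
            continue  # swap device that leaves with the board
        total_blocks += int(fields[4])
    return total_blocks * BLOCK_BYTES // (1024 * 1024)


def swap_is_sufficient(swap_l_output, drain_mb, exclude=()):
    """True if the remaining free swap covers the memory to drain."""
    return free_swap_mb(swap_l_output, exclude) >= drain_mb
```

Pass the swap devices hosted by the board being detached in exclude, so that space that is about to disappear is not counted.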


Detaching a System Board

System boards in use by the OS can be detached if they meet the
requirements previously discussed. Once you select a board, you
detach that board by performing two operations: Drain and Complete
Detach. You may also run reconfig, as for DR Attach, if you want.

drain
The primary function of the drain operation is to empty all of the
physical memory on the board being detached. If sufficient remaining
memory or swap space is not available when the drain operation is
requested, the request fails.

hostview and dr are available to monitor the drain operation. You can
view the current status of the drain operation including the number of
memory pages remaining to be emptied. The drain operation is
complete when all memory on the detaching board is free.

If you decide not to proceed with the detach operation, you can abort
the operation, and the board's memory is returned to regular usage.


complete_detach
Before the detach operation can be completed, you must terminate all
use of board resources (processors, memory, and I/O devices). DR
terminates the use of memory, processors, and network devices
automatically, while you must manually terminate the use of all
non-network I/O devices.

When a board is detached, any processes bound to its processors are
automatically unbound. You can use pbind to rebind them to other
processors.

reconfig
The reconfig step runs the drvconfig command and then the disks,
tapes, ports, and devlinks commands, deleting the removed
devices from the system’s configuration directories. Do not run
reconfig if you will later be returning the devices to the domain.


Finishing the Complete Detach Operation

When all board memory usage is terminated, you can try to complete
the detach process with the complete_detach operation.

The complete_detach command may fail for several reasons:

● A device is still in use – The detach operation will report which
one. Start the complete_detach operation again after you have
finished using the device.

● Abort – You can abort the detach operation at any time before it
completes.

If you abort the detach, the board's memory is returned to the OS and
all detached board devices are reattached.


Note – If the system configuration was modified to allow a board’s
removal, for example, file systems were unmounted or network
interfaces were unplumbed, you must restore their proper operating
state if the drain operation is aborted.

The board is now available for any desired use. You may:

● Attach the board to another domain.

● Remove it from the system by hot swap.

● Leave it in the system, unattached to a domain.

● Reattach it to this domain later.


Configuring for DR Detach

Because you are removing resources from the OS while it is running,
the resources must be in a specific state before they can be removed.

If the OS is configured such that the resources cannot be freed, the
board cannot be detached. If you will be detaching system boards, you
must configure the OS to avoid these problems.

Enabling DR Detach
DR detach requires that the OBP parameter dr-max-mem be set to a
nonzero value. This setting is required at the time the domain is
booted. If the value is zero, you will not be able to perform any DR
operations on the domain.


I/O Devices
To be able to remove system boards containing critical system
resources, the system must be properly configured.

You will need to plan ahead for this. If you cannot end usage of even
a single device attached through the board, you cannot remove the
board.

This means that critical system disks must be configured with
Alternate Pathing or disk mirroring, or both. If the root or /usr
partition is on a disk attached to a controller on the board to be
removed, the board cannot be detached unless there is a hardware
alternate path to the disk (using apboot) or the disk is mirrored using
controllers on other system boards.

The same applies to network controllers. The board that hosts the
interface that connects the SSP to the domain cannot be detached
unless an alternate path exists on another board.

A board hosting non-vital or replaceable system resources can be
detached whether or not there are alternate paths to the resources.
There is still a series of requirements that must be met:

● All of the board's devices still must be closed before the board can
be detached.

● All of its file systems must be unmounted.

● You must have the system discontinue using all of its raw
partitions on the drives being removed.

● All of its swap partitions must be deleted.

These actions are not done by the system; they must be done manually
before the board can be detached. You may have to kill processes that
have open files or devices using the board’s resources.


Detaching Network Devices

DR automatically terminates usage of all network interfaces on the
board that is being detached. When you run the complete_detach
operation, the dr_daemon identifies all of the configured network
interfaces on the board being detached and issues the following
ifconfig commands to each interface:
ifconfig interface down
ifconfig interface unplumb

FDDI
If FDDI interfaces are detached, DR kills the FDDI network monitoring
daemon before performing the detach operation, and then restarts it
after the detach is complete.

Note – The /usr/sbin/nf_snmd daemon for nf devices is neither
started nor stopped when a board that contains an FDDI interface is
attached.


Causes for DR Failure
The detach operation fails and the interface is left configured if an
interface on the board is:

● The active primary network interface for the domain.

● On the same subnet as the SSP host for the system. Because DR
operations are initiated on the SSP, control of the detach process
would be lost.

● For Solaris 2.5.1, the active alternate for an Alternate Pathing (AP)
meta-network device when the AP meta-device is plumbed.
(Manually switch the active path to one that is not on the board
being detached.)

Note – If the detach operation fails during complete_detach, any
deconfigured network interfaces are not reactivated.
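The network handling described in this section can be modeled as a check followed by a command list. The sketch below is not the dr_daemon implementation: the function name and the dictionary keys (primary, ssp_subnet, active_ap_alternate) are hypothetical labels for the three failure causes listed above, and the real daemon may evaluate interfaces in a different order.

```python
def plan_network_detach(interfaces):
    """Return the ifconfig commands for a board, or raise RuntimeError.

    Each interface is a dict with a "name" key and optional boolean
    keys "primary", "ssp_subnet", and "active_ap_alternate".
    """
    blockers = []
    for nic in interfaces:
        if nic.get("primary"):
            blockers.append("%s is the active primary interface" % nic["name"])
        if nic.get("ssp_subnet"):
            blockers.append("%s is on the SSP subnet" % nic["name"])
        if nic.get("active_ap_alternate"):
            blockers.append("%s is an active AP alternate" % nic["name"])
    if blockers:
        raise RuntimeError("; ".join(blockers))
    commands = []
    for nic in interfaces:
        commands.append("ifconfig %s down" % nic["name"])
        commands.append("ifconfig %s unplumb" % nic["name"])
    return commands
```

Running a check of this kind before complete_detach matters because, as the note above says, interfaces that were already deconfigured are not brought back if the detach fails partway through.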


Detaching Non-Network Devices

All non-network devices must be closed before they can be detached.

In the hostview device display and in the drshow I/O listing, there is
an open count field that indicates how many processes are using a
particular device. To see which processes have these devices open, use
the fuser command.


You must perform the following tasks for non-network devices:

● Unmount file systems, including Solstice DiskSuite meta-devices
that have a partition hosted from a controller on the board.

Note – Unmounting file systems may affect NFS client systems.

● If Alternate Pathing or Solstice DiskSuite mirroring (or both) are
used to access a device connected to the board, reconfigure them
so that the device or network is accessible from controllers on
other system boards.

● Remove Solstice DiskSuite databases from board-resident
partitions.

● Remove any private regions used by Volume Manager by
deporting the disk groups.

● Any A3000 controllers on the board that is being detached must be
idled or taken off line first, using the rm6 or rdacutil commands.

● Remove any swap files or partitions from the swap configuration.

● Either kill any process that directly opens a device or raw partition
from the board, or direct it to close the open device.

● If a detach-unsafe device is present on the board, close all
instances of the device, and use modunload to unload the driver.


DR Detach-Safe Devices

Not all device drivers support DR. This means that devices managed
by those drivers will require special attention during DR detach.

To support DR, a driver must be able to perform two specific kinds of
operations: DDI_DETACH, and the DDI_SUSPEND and DDI_RESUME pair.
These affect DR in different ways.

DDI_DETACH support is required to be able to detach a device from the
OS. It means that the driver can close the device as if it had never
used it at all. DDI_SUSPEND and DDI_RESUME must be supported if the
device must be quiesced.

You can detach a system board that hosts a device only if the driver for
that device supports the DDI_DETACH interface or if the device driver
is not currently loaded into memory. A driver that supports
DDI_DETACH is called detach-safe; a driver that does not support
DDI_DETACH is called detach-unsafe.


An example of a detach-unsafe device is the Token-Ring adapter.

If a device is detach-unsafe, the only way to detach a board that has a
device managed by that driver is to remove the driver from memory.

Declaring a Driver Detach-Safe
If you want to add a device to your system and you know that the
device and its driver can be safely detached, you should add the
device name to the detach-safe list in the /etc/system file. This list
enables you to add devices to the list already maintained by the
system.

Caution – If you are not sure whether a device can be safely detached,
ask your service provider or vendor. Do not add it to the list first.

Add new detach-safe device drivers to the /etc/system file in the
following format, where driverx represents the device driver module
name:
set dr:detach_safe_list1="driver1 driver2 ... drivern"
set dr:detach_safe_list2="driver1 driver2 ... drivern"
set dr:detach_safe_list3="driver1 driver2 ... drivern"
set dr:detach_safe_list4="driver1 driver2 ... drivern"
set dr:detach_safe_list5="driver1 driver2 ... drivern"

The /etc/system file can contain up to five detach-safe strings, each
no more than 128 characters long. Do not exceed 128 characters.
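
The 128-character limit is easy to overlook when a list grows. The following is a minimal sketch, not a Sun-supplied tool, of a shell helper that builds one detach-safe entry and refuses a list that is too long. The function name mk_detach_safe_line and the driver names drva and drvb are hypothetical, the sketch assumes that the limit applies to the text between the quotation marks, and it assumes a POSIX shell such as ksh.

```shell
# Sketch: build one /etc/system detach-safe entry.
# mk_detach_safe_line is a hypothetical helper, not part of Solaris or DR.
# Usage: mk_detach_safe_line LIST_NUMBER driver [driver ...]
mk_detach_safe_line() {
    if [ "$#" -lt 2 ]; then
        echo "usage: mk_detach_safe_line LIST_NUMBER driver [driver ...]" >&2
        return 1
    fi
    n=$1
    shift
    # Only detach_safe_list1 through detach_safe_list5 are described.
    case "$n" in
        [1-5]) ;;
        *) echo "list number must be 1 through 5" >&2; return 1 ;;
    esac
    # Join the remaining arguments with single spaces.
    list="$*"
    # Assumption: the 128-character limit applies to the quoted string.
    if [ "${#list}" -gt 128 ]; then
        echo "driver list is ${#list} characters; the limit is 128" >&2
        return 1
    fi
    printf 'set dr:detach_safe_list%s="%s"\n' "$n" "$list"
}
```

For example, mk_detach_safe_line 1 drva drvb prints a line that can be reviewed and then added to /etc/system by hand. The sketch only prints the entry; it never edits the file itself.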

Unloading a Loaded Detach-Unsafe Driver

To detach a system board with an active detach-unsafe device, you must
first unload the device driver for that device.

1. Stop all usage of the controller for the detach-unsafe device, and
stop all other controllers of the same type on all boards in the
domain.

Caution – Because the detach-unsafe driver must be unloaded with the
modunload command, you must stop use of that driver (device type)
on all system boards in the domain. The remaining controllers can be
used again after the DR detach is complete.

2. Manually close all such devices on the board, then use the
modunload command to unload their drivers.

3. Detach the system board.

4. You can now resume use of the remaining devices. The driver will
be reloaded by Solaris.

If you cannot execute the above steps, you can reboot your domain
with the board or interface card blacklisted, or you can remove the
board from the domain while the domain is down.

Note – Many third-party drivers do not properly support the Solaris
modunload interface. Use of this interface is rare during normal
operation, and so the interface is sometimes missing or works
improperly. You should test the driver modunload interface before
trying to use it with DR.

DR has a default list of devices that can be detached. If a DR detach
operation fails because the system board hosts a device that is not
included in the detach-safe list and its driver is loaded, the system
displays the message:
WARNING: DR: driver (xxx) not known to support DDI_DETACH

where xxx is the name of the driver module in /kernel/drv or
/usr/kernel/drv and in /etc/name_to_major.
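
When several boards or devices are involved, it can help to collect the driver names from these warnings in one pass. The following is a minimal sketch of a filter that reads message text and prints each driver name that DR reported. The function name unsafe_drivers is hypothetical, and the sketch assumes that the warning text appears exactly as shown above.

```shell
# Sketch: list the drivers named in DR "not known to support DDI_DETACH"
# warnings. unsafe_drivers is a hypothetical helper name.
# Reads message text on standard input; prints one driver name per line.
unsafe_drivers() {
    sed -n 's/.*WARNING: DR: driver (\([^)]*\)) not known to support DDI_DETACH.*/\1/p' |
        sort -u
}
```

If the domain logs console warnings to /var/adm/messages (an assumption about your syslog configuration), you could run unsafe_drivers < /var/adm/messages and then look up each name with modinfo.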

Using modunload
Once you have identified the driver that you need to remove from
memory, you must run the modunload command to delete it from the
kernel. First you must use the modinfo command to get the device
driver’s ID number. It is not always the same, and is not the driver
major number from /etc/name_to_major.

1. Run the modinfo command to get the driver ID. The driver ID is
the first number in the modinfo output, in this case 107.
# modinfo | grep tape
107 f66a0000 dfe9 33 1 st (SCSI tape Driver 1.173)

2. Then run the modunload command, specifying the driver number
from modinfo.
# modunload -i 107
#
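
Searching modinfo output with grep on a description word can match more than one module. The following is a minimal sketch that selects the ID by exact module name instead, based on the column layout in the sample output above (ID first, module name sixth). The function name module_id is hypothetical. On Solaris, the -v option needs nawk or /usr/xpg4/bin/awk rather than the older /usr/bin/awk.

```shell
# Sketch: print the module ID for an exact module name.
# module_id is a hypothetical helper name.
# Reads modinfo-style output on standard input.
# Column 1 is the module ID and column 6 is the module name,
# as in the sample line: 107 f66a0000 dfe9 33 1 st (SCSI tape Driver 1.173)
module_id() {
    awk -v name="$1" '$6 == name { print $1 }'
}
```

With this helper, modinfo | module_id st would print the ID to pass to modunload -i. Check the printed ID before unloading; the sketch prints nothing if the module is not loaded.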

Swap Space

System swap space should be configured as multiple partitions or files
on disks attached to controllers hosted by different system boards.
This allows any swap partition or file to be easily replaced with the
swap command. It also has a side benefit of providing better swap
performance because any swap load is spread over several I/O
controllers.

Caution – When memory or disk swap space is detached, there must
be enough memory and swap disk space remaining in the system to
accommodate all of the currently running programs.

The domain must contain enough remaining configured swap space so
it can flush pageable memory from the detaching system board. For
example, if you want to remove 1 Gbyte of memory from a 2-Gbyte
system, you may need 2 Gbytes of swap space just for DR.

If eight-way memory interleaving is supported, you could need up to
16 Gbytes of swap space for DR alone, depending on your system
configuration.

The amount of additional swap space that you will need is equal to the
amount of mainstore on two domain system boards. To be able to
handle every case, you need to plan using the largest memory amount
on any domain system board. The full amount will double if you are
using eight-way interleave, when it is supported by DR, because you
would need to empty twice as many boards.
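
The sizing rule above can be written as simple arithmetic. The following is a minimal sketch of that planning calculation: twice the largest memory amount on any domain system board, doubled again for eight-way interleave. The function name dr_swap_mbytes is hypothetical, and the result covers only the extra swap space for DR, not the swap space that your production workload needs.

```shell
# Sketch: estimate the additional swap space (in Mbytes) to plan for DR.
# dr_swap_mbytes is a hypothetical helper name.
# $1 = largest memory amount on any domain system board, in Mbytes
# $2 = 1 if eight-way interleave is in use, otherwise 0 (the default)
dr_swap_mbytes() {
    largest_mb=$1
    interleave8=${2:-0}
    # Two boards' worth of memory, planned from the largest board.
    need=$((largest_mb * 2))
    # Eight-way interleave doubles the number of boards to empty.
    if [ "$interleave8" -eq 1 ]; then
        need=$((need * 2))
    fi
    echo "$need"
}
```

For a domain whose largest board holds 1 Gbyte, dr_swap_mbytes 1024 prints 2048. For a 4-Gbyte board with eight-way interleave, dr_swap_mbytes 4096 1 prints 16384, which corresponds to the 16-Gbyte figure mentioned earlier.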

Insufficient swap space prevents DR from completing the detach of a
board that contains memory. If this happens, the memory drain phase
of the detach operation will not be able to complete, and you must
abort the detach operation.

Depending on how short of swap space you are, the DR operation may
fill all available swap space and take down the system, as mentioned
earlier.

Also, remember that you need enough available swap space to run
your production workload with adequate performance.

Caution – Make sure that you have enough space in the new primary
swap partition (and in /var) to contain a full domain panic dump,
approximately 500 to 800 Mbytes.

Memory Interleaving

If you have specified memory interleaving between system boards,
those system boards cannot be detached (at this time). This is because
DR does not yet support interboard (8-way) interleaving.

By default, the system does not set up system boards with interleaved
memory. To allow interleaving, the following line must be in .postrc:
mem_board_interleave_ok

If mem_board_interleave_ok is present, you probably will not be
able to detach a system board that contains memory from the domain.

Memory Usage

Before a board can be detached, the memory on that board must be
emptied by the operating system. This is what the DR detach drain
step does.

Emptying a system board’s memory means flushing its pageable
memory to swap space and copying its permanent memory (non-pageable
kernel and OBP memory) to another system board. When
permanent memory is on the detaching board, the operating system
must find other memory to receive the copy. This can be a slow
process.

You can use either hostview or the dr command drshow to see if a
board’s memory is pageable or permanent:
ssp:domain% dr
dr> drshow board_number mem

When permanent memory is detached, DR chooses a new memory
area on a different system board to contain the permanent memory of
the detaching board. This new memory area must fit the following
rules:

● It must be large enough to hold a copy of all of the non-pageable
memory to be moved.

● It must not be interleaved with memory on other boards.

The DR software automatically disallows the DR detach operation if
these conditions are not met.

If no acceptable target board is found, the detach operation is refused
with the following message:
Jul 28 06:00:00 unix: WARNING:dr_build_adg_detach_list:no target memory board found

Correctable Errors
Correctable memory error reporting can interfere with DR. The SSP
takes a recordstop dump each time a correctable error occurs, and
multiple dumps can prevent DR from completing its drain processing.

To prevent this, you may need to temporarily disable edd using the
edd_cmd command. Remember to restart edd processing after the DR
operation completes.

The long-term solution is to enable correctable memory error reporting
and replace the failing DIMM(s).

Detaching a Board With dr

Note that the entire DR detach operation is run from the SSP. The only
host operations required are those to prepare the I/O devices for the
detach.

Remember that SUNW_HOSTNAME must be set to the domain from which
you will be detaching the board.

1. Run the dr command in an SSP window to start the dr shell. You
will get a prompt from the dr shell:
ssp:domain% dr

Checking environment...
Establishing Control Board Server connection...
Initializing SSP SNMP MIB...
Establishing communication with DR daemon...

xf3: System Status - Summary

BOARD #: 0 1 2 5 8 9 10 11 13 physically present.
BOARD #: 4 6 7 being used by the system.
dr>

2. Drain the board with the drain command. You can drain only one
board at a time.

The drain command will return immediately, but the drain may
not be finished. If you want the drain command to complete only
when the board has been completely drained, use the wait option
(drain 6 wait).
dr> drain 6
Removing board 6 from domain_config file.
Start draining board 6
Board drain started. Retrieving System Info...

Bound Processes for Board 6

cpu user sys procs
--- ---- --- -----
24 0 1
25 0 1
26 0 1
27 0 1

Active Devices for Board 6

device opens name usage
------ ----- ---- -----
ssd384 0 /dev/rdsk/c5t0d0s4 AP database

Memory Drain for Board 6 - IN PROGRESS

Reduction= 1024 MBytes
Remaining in System= 1024 MBytes
Percent Complete= 99% (5696 KBytes remaining)

Drain operation started at Wed Oct 09 18:06:00 1996
Current time Wed Oct 09 18:06:34 1996
Memory Drain is in progress. When Drain has finished,
you may COMPLETE the board detach.

dr>

3. You can monitor the progress of the drain operation with drshow.
dr> drshow board_number drain

4. To determine if there are any active I/O devices on the board
being detached, use drshow IO.
dr> drshow board_number IO

5. When the drain has completed, use complete_detach to finish the
operation.
dr> complete_detach 6
Completing detach of board 6
Operating System has detached the board.
Processors on board 6 reset.
Reconfiguring domain mask registers.
Board 6 placed into loopback.
Board detachment completed successfully.
dr>

6. Use exit to terminate the dr shell.
dr> exit
ssp:domain%

The SSP shell prompt is again displayed.

7. Run apconfig -F to resynchronize the AP databases with the new
state of the domain.

Aborting the Detach Operation


You can cancel the detach operation at any time before you run
complete_detach. Use abort_detach board_number to do so.
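
Deciding between complete_detach and abort_detach depends on how far the drain has progressed. The following is a minimal sketch that pulls the percentage out of drain status text in the format shown in step 2. The function name drain_percent is hypothetical, and the sketch assumes that you have captured the status text yourself, for example by saving the window contents to a file; it does not drive the dr shell.

```shell
# Sketch: extract the drain completion percentage from saved status text.
# drain_percent is a hypothetical helper name.
# Reads text on standard input and prints the number from a line such as:
#   Percent Complete= 99% (5696 KBytes remaining)
drain_percent() {
    sed -n 's/^ *Percent Complete= *\([0-9][0-9]*\)%.*/\1/p'
}
```

The helper prints nothing if the text contains no Percent Complete line, so an empty result should be treated as unknown rather than as zero.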

Detaching a Board With hostview

With hostview, you perform the same steps to detach a system board
that you do with dr. hostview will fill in some defaults and command
fields for you and give you the ability to track the progress of the
operation graphically.

Beginning the Detach


1. From the hostview menu, choose Configuration ➤ Board ➤
Detach. The Board and Domain Selection window is displayed.

2. Select the board to detach in the main hostview window.

3. Click on Select. The Board and Source domain fields will be filled
in for you.

4. Click on Execute.

If the target domain is not active, the detach operation simply changes
the domain configuration file on the SSP and completes.

5. If the target domain is currently running, the Dynamic
Configuration window is displayed.

6. Click on the drain button.

The memory display enables you to monitor the progress of the drain
operation. The memory drain statistics are automatically updated at
regular intervals if you enable the Auto Update System Information
Displays option of the DR Properties window.

Continue with the next steps without waiting; they do not depend on
completion of the drain operation.

7. To determine which devices are active on the board, click on
Device. This starts the regularly updated DR Device Configuration
window.

8. You can configure the update time interval for the Hostview DR
windows by clicking on the properties button.

9. Terminate all usage of all board-resident I/O devices.

When the complete button is displayed, DR has finished draining
memory. The board is ready to be detached.

hostview Detach Buttons


The hostview detach window displays the following buttons at
various times during a detach operation:

● drain – Starts to drain memory. Once the drain operation is
finished, the drain button becomes the complete button.

● complete – Completes the detach operation.

● force – If the complete detach operation fails due to a forcible
quiesce condition, the force button is enabled, permitting you to
complete the detach operation by forcibly quiescing the domain.

● reconfig – Reconfigures the OS device directories in a domain.
Remember to use reconfig with extreme caution.

● abort – Cancels the current DR operation. This button is enabled
after the drain operation starts and remains enabled until the
complete detach operation starts.

● dismiss – Aborts the currently active step and leaves the board in
its current state (Present, Init Detach, or In Use).

● help – Summary information for the DR detach operation.

10. Choose complete.

complete may take several minutes to finish. When it finishes, the
board’s devices are detached from the operating system.

If the complete command fails, it may be due to any of the following
reasons:

● All of the on-line processors in the domain are on the board being
detached.

● Critical network interfaces are on the board being detached. You
must manually stop all usage of these network interfaces.

● Usage of some I/O devices on the detaching board has not been
stopped.

● Quiesce failed.

When the failure is resolved, you can select either complete or force to
complete the detach.

When the board is successfully detached, the following message is
displayed:
Board detachment completed successfully.

You can now either reconfigure the OS device directories or dismiss
the Detach window. The board can be powered off and removed (hot
swapped), attached to another domain, left in the system unattached,
or reattached at a later time.

11. Run apconfig -F to resynchronize the AP databases with the state
of the domain.

Pageable and Permanent Memory

Some of the virtual memory used by Solaris must be fixed in real
memory. This memory is contained in the kernel and contains internal
system control and configuration information such as page tables and
device control information. This memory is non-pageable: it cannot be
paged out of real memory and, in fact, cannot be moved at all. It is
referred to as permanent memory.

When performing a DR detach, a system board containing non-pageable
memory presents a special problem because the memory
cannot be paged out to swap to free the system board’s memory
banks, and cannot be moved without causing the failure of most
current activity in the system.

DR supports the removal of a system board with non-pageable
memory. Because it must move this non-pageable memory to free the
system board, it has to find a way to do so without disrupting the
activity and integrity of the system.

To do this, DR must quiesce the system. Quiesce implies that all system
operations will be suspended, including I/O operations, for the period
of time that it takes to move the data from the detaching system board
to a remaining board. The quiesce process can take a minute or more.

A quiesce is not needed for a system board with only pageable
memory.

Detaching a System Board

Operation: Permanent Memory on the Target Board


The drain process with permanent memory on the system board goes
through the same steps as it does on a board with only pageable
memory.

When dr-max-mem is not zero, the system allocates all permanent
memory only in the physical memory located on the lowest numbered
system board. This prevents the need to do a quiesce for every board
you might want to detach. If dr-max-mem is zero, permanent memory
can be allocated on any board.

Thus, it is always better to configure your system so that any board
that you will be detaching is not the lowest numbered board in the
domain. You would usually put your boot device and critical network
interfaces on this board.

The three steps of a DR operation with permanent memory are:

1. drain

The drain operation:

● Determines which remaining board will get the non-pageable
memory from the detaching board.

● Flushes all pageable memory from both boards to swap.

● Locks all free pages to prevent further use.

2. complete_detach

During the complete_detach operation:

● The OS in the domain is quiesced.

● The non-pageable memory from the detaching board is copied to
the new board.

● System board physical memory addresses are swapped.

● The OS is resumed.

OS suspend time is typically 30 to 60 seconds.

3. reconfig (if appropriate)

No changes from the DR operation without permanent memory.

Operating System Quiesce

During a DR Detach operation for a system board with permanent OBP
or kernel memory, the operating system is briefly quiesced. All activity
on the domain centerplane must cease for a few seconds during a
critical phase of the operation. This quiesce only affects the target
domain; other domains in the system are not affected.

Note – Remember that a quiesce is required only if the system board
being detached contains permanent memory. By carefully planning
your domain, you may be able to avoid the necessity for a quiesce.

Before it can quiesce, the operating system must temporarily suspend
all processes, processors, and device activities. If the operating system
cannot quiesce, the detach operation fails.

Reasons why a quiesce might fail include:

● A user thread in the domain did not suspend.

● Real-time processes are running in the domain (use ps -cf).

● A device that cannot be quiesced by the operating system, called a
suspend-unsafe device, is open.

You can retry the complete_detach operation as often as
necessary until the quiesce succeeds.
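
Before retrying, it is useful to know which processes are in the real-time scheduling class. The following is a minimal sketch that filters ps output for that class. The function name rt_processes is hypothetical. The sketch assumes the ps -ecf column layout, in which the scheduling class (CLS) is the fourth column and the process ID is the second.

```shell
# Sketch: print the process IDs of real-time (RT) class processes.
# rt_processes is a hypothetical helper name.
# Reads "ps -ecf" style output on standard input:
#   UID PID PPID CLS PRI STIME TTY TIME CMD
rt_processes() {
    # Skip the header line, then match on the CLS column.
    awk 'NR > 1 && $4 == "RT" { print $2 }'
}
```

On the domain, ps -ecf | rt_processes would list the candidates to examine. Whether a given real-time process can safely be suspended is still a judgment that you must make for each process.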

Operating System Quiesce Failures

A quiesce failure due to real-time processes or open suspend-unsafe
devices is known as a forcible condition. When you force the quiesce,
you give the operating system permission to continue with the quiesce
even if these conditions are still present.

Warning – Be very careful when using the force option. You could
do serious damage to critical system data by panicking the domain.

If a real-time process is running, determine whether suspending the
process would have an adverse effect on the system. If not, you can
force the operating system to quiesce. Otherwise, you can abort the
detach operation and try again later.

If any suspend-unsafe device is open and cannot be closed, you may
be able to manually suspend the device, then force the operating
system to quiesce. After the operating system resumes, you can
manually resume the device.

To manually suspend the device, you may have to:

● Close the device by killing the processes that have it open.

● Ask users not to use the device.

● Disconnect the device’s cable.

Be sure to test any procedures used to quiesce a device while it is open
prior to executing them on a production system.

For example, if a device that allows asynchronous unsolicited input is
open, you can disconnect its cables prior to quiescing the operating
system, and reconnect them after the operating system resumes. Doing
this prevents traffic from arriving from the device, and so the device
should not be able to access the domain centerplane.

If you cannot make a device suspend its activity, you should not force
the operating system to quiesce. Doing so could cause the domain to
crash or hang. Instead, delay the DR operation until the
suspend-unsafe device is no longer open.

Warning – If you attempt to do a forced quiesce operation when
suspend-unsafe devices are open, the domain may hang or crash if any
activity, even an interrupt, occurs on a suspend-unsafe
device during the period of system quiescence. However, the hang will
not affect other running domains.

Suspend-Safe and Suspend-Unsafe Devices

A suspend-safe device is one that does not use the domain centerplane
while the operating system is quiesced. This means that the device
must not transfer any data, reference memory, or generate any
interrupts during the quiesce operation.

A driver is considered suspend-safe if it supports operating system
quiescence (suspend and resume) and guarantees that when a suspend
request is successfully completed, the devices that the driver manages
will not attempt to access the domain centerplane, even if the device is
open when the suspend request is made. All other I/O devices are
considered suspend-unsafe when they are open.

The drivers provided by Sun that are known to be suspend-safe are:

● SCSI drivers sd, isp, esp, and fas

● SSA drivers soc, pln, and ssd

● A5000 drivers sf, socal and ses

● Nexus drivers sbus, pci and pei-pci

● Ethernet drivers hme (Sun FastEthernet™), qe (Quad Ethernet), qfe
(Quad Fast Ethernet) and le (Lance Ethernet)

● FDDI driver nf (NPI-FDDI)

The known suspend-unsafe drivers are:

● SCSI tape driver st

Tape Devices
The sequential nature of tape devices prevents them from being
reliably suspended in the middle of an operation and then resumed.
You cannot stop a read and restart it; the tape is moving. Therefore all
tape drivers are considered suspend-unsafe and cannot be quiesced.
Before executing a DR operation that requires a quiesce, make sure all
tape devices are closed or inactive and the driver is unloaded.

Adding New Suspend-Safe Drivers

The DR driver dr recognizes the devices that can be safely quiesced. It
contains an internal list of drivers that are suspend-safe or can be
safely ignored.

To add a new suspend-safe device to your system, you can describe
the device as suspend-safe by placing an entry in the /etc/system
file. This mechanism enables you to extend the default suspend-safe
list.

Warning – To be suspend-safe, the driver must support the
DDI_SUSPEND/DDI_RESUME interface. If you are not sure whether a
device driver supports this interface, ask your service provider or the
manufacturer of the device. Do not include it in the suspend-safe list
until you know for sure.

Warning – Tape devices are not suspend-safe; do not append such
devices to the suspend-safe list via the /etc/system file.

To add new devices that support quiesce to the /etc/system file, use
the following format, where drivern represents the device driver
module name:
set hswp:suspend_safe_list1="driver1 driver2 ... drivern"
set hswp:suspend_safe_list2="driver1 driver2 ... drivern"
set hswp:suspend_safe_list3="driver1 driver2 ... drivern"
set hswp:suspend_safe_list4="driver1 driver2 ... drivern"
set hswp:suspend_safe_list5="driver1 driver2 ... drivern"

The /etc/system file can contain up to five suspend-safe strings, each
no more than 128 characters long. Do not exceed 128 characters.
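
When there are more driver names than fit in one string, they have to be spread across the numbered lists. The following is a minimal sketch of one way to do that packing: it fills each string up to 128 characters, moves on to the next list number, and fails if more than five lists would be needed. The function name pack_suspend_safe is hypothetical, the sketch assumes that the limit applies to the text between the quotation marks, and it assumes a POSIX shell such as ksh.

```shell
# Sketch: spread driver names over suspend_safe_list1 .. suspend_safe_list5.
# pack_suspend_safe is a hypothetical helper, not part of Solaris or DR.
# Usage: pack_suspend_safe driver [driver ...]
pack_suspend_safe() {
    n=1      # current list number
    cur=""   # driver names collected for the current list
    out=""   # completed "set" lines, one per line
    for drv in "$@"; do
        if [ "${#drv}" -gt 128 ]; then
            echo "driver name too long: $drv" >&2
            return 1
        fi
        if [ -z "$cur" ]; then
            cand=$drv
        else
            cand="$cur $drv"
        fi
        if [ "${#cand}" -le 128 ]; then
            cur=$cand
        else
            # The current string is full: close it and start the next list.
            out="${out}set hswp:suspend_safe_list${n}=\"${cur}\"
"
            n=$((n + 1))
            cur=$drv
        fi
    done
    if [ -z "$cur" ]; then
        echo "no driver names given" >&2
        return 1
    fi
    if [ "$n" -gt 5 ]; then
        echo "driver names do not fit in five lists" >&2
        return 1
    fi
    printf '%sset hswp:suspend_safe_list%s="%s"\n' "$out" "$n" "$cur"
}
```

The sketch prints the entries for review and does not modify /etc/system. Confirm with the driver vendor that every name you pass supports DDI_SUSPEND and DDI_RESUME before using the output.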

Adding New Suspend-Bypass Drivers

Solaris has a preset list of devices that it ignores during the quiesce
process, making no attempt to quiesce them. These devices, which
include the OS pseudo devices, do not perform any actual I/O
operations and so do not need to be suspended during the quiesce.

Caution – Do not add suspend-unsafe devices to the suspend-bypass
list. The domain will hang or reset if an interrupt comes in from one of
these devices during DR.

Through the /etc/system file, you can add devices to the
suspend-bypass list that do not support quiescing but can be safely
ignored during the quiesce process.

To add new suspend-bypass devices to the /etc/system file, use the
following format, where drivern represents the device driver module
name:
set hswp:suspend_bypass_list1="driver1 driver2 ... drivern"
set hswp:suspend_bypass_list2="driver1 driver2 ... drivern"
set hswp:suspend_bypass_list3="driver1 driver2 ... drivern"
set hswp:suspend_bypass_list4="driver1 driver2 ... drivern"
set hswp:suspend_bypass_list5="driver1 driver2 ... drivern"

The /etc/system file can contain up to five suspend-bypass strings,
each no more than 128 characters long. Do not exceed 128 characters.

Quiesce Operation

The following transcript from a Solaris 2.6 system shows the domain
message traffic for a DR detach of the lowest-numbered system board,
Board 0, which contains permanent memory. Comments are made in
italics.

Other than the apconfig commands, nothing was entered from the
domain console.
The drain operation has completed.
DR op: DRAIN BOARD (board 0)...
# apconfig -N

metanetwork: mqe0
physical devices:
qe4
qe0 P A DR
# apconfig -S

c1 pln3
c0 pln0 P A DR
metadiskname(s):
mc0t5d0 R
mc0t4d0
mc0t3d0
mc0t2d0
mc0t1d0
mc0t0d0
Note that the interfaces on board zero are
marked DR for drain. The active interfaces are
still on board 0.
The SSP now signals the domain to perform the
complete_detach.
# DR op: MOVE CPU0 (move CPU0 from 0 to 4)
CPU 4 has been chosen to run the quiesce since 0
is being detached.
DR op: DETACH BOARD (board 0)...

The quiesce process begins.
NOTICE: hswp: Performing OS QUIESCE...
Suspending USER threads...
qe@0,0 suspended
qe@1,0 suspended
qe@2,0 suspended
qe@3,0 suspended
ssd@0,0 suspended
ssd@1,0 suspended
ssd@2,0 suspended
ssd@3,0 suspended
ssd@4,0 suspended
ssd@5,0 suspended
SUNW,pln@b0000000,8a0e2f (aka pln) suspended
SUNW,soc@0,0 (aka soc) suspended
The transfer is now being performed. It took
less than 30 seconds.
NOTICE: hswp: Performing OS RESUME...
qe@0,0 resumed
qe@1,0 resumed
qe@2,0 resumed
qe@3,0 resumed
SUNW,soc@0,0 (aka soc) resumed
SUNW,pln@b0000000,8a0e2f (aka pln) resumed
ssd@0,0 resumed
ssd@1,0 resumed
ssd@2,0 resumed
ssd@3,0 resumed
ssd@4,0 resumed
ssd@5,0 resumed
Resuming USER threads...
The quiesce is now over.
soc1: port 1 WWN 8a0e2f: Fibre Channel is OFFLINE
soc1: port 1 WWN 8a0e2f: Fibre Channel is ONLINE

DR op: Calling OBP to detach board 0
The memory has been moved so the board may be
detached from the domain.

WARNING: ap_write_data: can’t open device (118,644), errno = 6
WARNING: ap_sync_all_db: can’t write database copy (118,644)
WARNING: ap_write_data: can’t open device (118,644), errno = 6
WARNING: ap_sync_all_db: can’t write database copy (118,644)
WARNING: ap_write_data: can’t open device (118,644), errno = 6
WARNING: ap_sync_all_db: can’t write database copy (118,644)
WARNING: ap_write_data: can’t open device (118,644), errno = 6
WARNING: ap_sync_all_db: can’t write database copy (118,644)
WARNING: ap_write_data: can’t open device (118,644), errno = 6
WARNING: ap_sync_all_db: can’t write database copy (118,644)
AP was trying to update the database copies, but
one had been detached.
# apconfig -N

metanetwork: mqe0
physical devices:
qe4 A
qe0 P DE
# apconfig -S

c1 pln3 A
c0 pln0 P DE
metadiskname(s):
mc0t5d0 R
mc0t4d0
mc0t3d0
mc0t2d0
mc0t1d0
mc0t0d0
The AP interfaces on board 0 are now marked DE
for detached. Notice that the AP switch occurred
automatically (and quietly).
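
To spot the affected alternates quickly in longer listings, you can filter the apconfig output for the DR and DE flags. The following is a minimal sketch based only on the output format shown in this transcript, where the flag is the last word on the line. The function name ap_flagged is hypothetical.

```shell
# Sketch: show only the alternates flagged DR (draining) or DE (detached).
# ap_flagged is a hypothetical helper name.
# Reads "apconfig -N" or "apconfig -S" style output on standard input.
ap_flagged() {
    # The state flag is the last field on the line, as in: qe0 P DE
    awk '$NF == "DR" || $NF == "DE"'
}
```

For example, apconfig -N | ap_flagged would print only the qe0 line in the listing above. An empty result means that no alternate carries either flag.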

DR and AP Interaction

DR notifies the Alternate Pathing (AP) subsystem when system boards
are attached, detached, or placed in the detach drain state.

DR also asks AP about the pathgroups and alternates that are in the
AP database and what their status is (active or inactive).

DR Attach
● You must run apconfig -F to clear the detached flag on disk
pathgroup alternate paths that have been reattached.

● If you attach a board with an AP database copy, AP will
automatically synchronize it with the current copies at the end of
the attach process.

DR Detach
● If the board has a path to an AP database copy, the copy will be
disconnected and marked inaccessible in the other databases.

For Solaris 2.6:

● A functional disk pathgroup (the alternate is not marked T) on the
system board is automatically switched by DR during
complete_detach.

● DR and AP will combine to automatically switch both disk and
network pathgroups away from the board being removed. Both
must be manually switched in Solaris 2.5.1.

● AP pathgroups with one of the alternate paths on a board being
drained mark that alternate as DR.

● AP pathgroups with one of the alternate paths detached mark
the detached alternate as DE.

● The AP database state flags are not always correctly updated. Run
apconfig -F to refresh the state of all AP pathgroups.


Lab

Part 1: Using hostview


If you have not done so yet, complete the AP lab, then set dr-max-mem
to an appropriate value and reboot your domain.

1. Use DR attach to add an available board to your domain.

a. Select the board to attach to the domain.

b. Select Configuration ➤ Board ➤ Attach.

c. Click the top select button to fill in the board for the attach
operation.

d. Select the board currently in the domain in the main hostview
display.

e. Click on the bottom select button to fill in the Target domain.

f. Click on execute.

g. Click on unsafe to check for any open unsafe devices.

h. Select Init Attach to begin the attach operation. Observe the
console output.

i. Click on complete.

j. When the operation is complete, log in as root in the domain,
and verify that the OS is aware of the new hardware:
/usr/platform/sun4u1/sbin/prtdiag

k. From the command line, run apconfig -N and apconfig -S
and look at the state of your AP pathgroups.

l. Run apconfig -F to resynchronize the AP databases with the
domain.

m. Rerun apconfig -N and apconfig -S to see what has changed.

2. Open hostview and DR detach the board that you just added. Select
the target board first.

a. Select Configuration ➤ Board ➤ Detach.

b. Choose the top Select button to have hostview read in the
target board you selected. Leave the optional box empty. (This
box is not used for a detach.)

c. Click on the System Information buttons. Are there any
conditions that would prevent a DR Detach operation? If so,
resolve them.

d. Click on drain to begin the detach operation.

e. Observe the Memory Information pop-up window. When drain
is complete, dismiss the memory pop-up window.

f. Click on Complete to complete the detach.

g. Observe the netcon console messages.

h. When detach is complete, dismiss the DR window and note
changes in the hostview display for the target board.

i. From the command line, run apconfig -N and apconfig -S
and look at the state of your AP pathgroups.

j. Run apconfig -F to resynchronize the AP databases with the
domain.

k. Rerun apconfig -N and apconfig -S to see what has changed.

3. Power off the detached board, as if it were going to be hot
swapped.

a. Turn off power to this board. Select power and edit the
command line: power -off -sb X.

b. Select execute.

c. Look at the board’s status in hostview.

d. Turn power back on to the board you removed. Select power
and edit the command line: power -on -sb X.

e. Use power to show that it is back on.

4. Create a new domain on the SSP for the detached board:

a. Rename any existing eeprom.image file for the new domain.

ssp:domain% cd $SSPVAR/.ssp_private/eeprom_save
ssp:domain% mv eeprom.image.old_domain eeprom.save.old_domain

b. Create a new eeprom.image file using sys_id.

ssp:domain% sys_id ...

c. Create the domain.

d. Experiment with displaying the first and second domains or all
domains:
• View ➤ domain1
• View ➤ domain2
• View ➤ All Domains

5. Configure the OBP for the new domain.

● Create new devaliases for the new domain’s network interface
and system disk.
● Set the boot-device variable to your new disk devalias.
<#8> ok setenv boot-device new-disk-alias
● Set dr-max-mem so that you will be able to use DR on the
domain.
<#8> ok setenv-dr dr-max-mem 1024

6. Boot the new domain from the network as if you were going to
install the software from the CD-ROM in the SSP.
<#8> ok boot new-net-alias

Since the OS software is already installed, once you have verified
that the new domain is able to boot from the CD-ROM, stop the
install program, halt the system, and reboot the domain from its
boot disk.

7. While the domain is booting, examine the SSP directories for the
new domain:

$SSPVAR/adm/new_domain

$SSPVAR/etc/platform/new_domain

8. Do a sum on the new domain's eeprom.image file.


ssp:domain% cd $SSPVAR/etc/platform/domain2
ssp:domain% sum eeprom.image
18493 16 eeprom.image

Record the checksum.

9. Use hostview to remove the new domain:

a. Log in as root and halt the domain with shutdown or init.

b. Remove the domain.

c. Note that the new domain's directories are gone:

$SSPVAR/adm/domain2

$SSPVAR/etc/platform/domain2

10. Verify that the domain’s eeprom.image has been saved:

a. Execute the sum command on the domain's template file:

ssp:domain% cd $SSPVAR/.ssp_private/eeprom_save
ssp:domain% sum eeprom.image.new_domain

b. Record the checksum and compare it with that from step 8.
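The record-then-compare idea in steps 8 and 10 can also be sketched in Python. This is an illustration only: it uses a SHA-256 digest from the standard hashlib module, which is a different checksum from the one the sum command prints, and the function names are hypothetical.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded(path, recorded_digest):
    """Compare a file against a digest recorded at an earlier step."""
    return file_digest(path) == recorded_digest
```

Record the digest of the domain's eeprom.image before removing the domain, then call matches_recorded on the saved copy afterward.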


Lab

Part 2: Using the Command Line


1. Run apconfig -N and apconfig -S and look at the state of your
AP pathgroups.

2. Run apconfig -F to resynchronize the AP databases with the
domain.

3. Rerun apconfig -N and apconfig -S to see what has changed.

4. Use DR attach to reconnect the removed board after removing it
from its temporary domain, created in the last exercise.

5. Check and update the state of the AP databases.

6. Use the command line to re-create a new domain containing the
board that you just detached from the original domain.

7. Configure and boot the new domain to the OS from its disk. Use
only the command line.

8. Use hostint to force a panic in the new domain. You might want
to enter sync in the netcon window first.
ssp% domain_switch new_domain
ssp% hostint

a. Verify that the original domain continues to run.

b. Observe the automatic attempt to reboot the new domain.

9. Using the command line:

a. Halt the new domain.

b. Remove the new domain.

c. DR attach the board back into the original domain.


Check Your Progress

Before continuing on to the next module, check that you are able to
accomplish or answer the following:

❑ Describe the requirements for dynamic reconfiguration.

❑ List the DR process steps for attach and detach.

❑ Discuss the restrictions and problems that can occur with DR.

❑ Display DR information from both dr and hostview.

❑ Perform a DR attach from both dr and hostview.

❑ Perform a DR detach from both dr and hostview.

❑ Solve problems that prevent DR from succeeding.

❑ Manage AP and DR interaction.


Think Beyond

What devices do you have that might be detach-unsafe? Suspend-unsafe?

Why does the OS need to perform the quiesce operation when there is
permanent memory on the board being detached?

When would you not want to use DR?

What needs to be considered when configuring your system in support of
DR?

What would be the best way to combine two 8-board domains? Why?

Diagnostic Information 10

Course Map
This module discusses the various failures that might occur in an
Enterprise 10000 system and how to obtain and save diagnostic
information about them.


Relevance

For Discussion – The following questions are relevant to
understanding the content of this module:

1. Given the differences between the Enterprise 10000 and other
systems, what additional types of failure information might be
required?

2. What new types of failures might there be?

3. Where is this information likely to be found?

4. What types of information are available?


Objectives

Upon completion of this module, you will be able to:

● Understand how an Enterprise 10000 fails and recovers.

● Understand the role of the SSP in failure logging.

● Discuss the different types of failures.

● Find where failure information is recorded.

● Interpret failure information.

● Understand support information requirements.

References

Additional resources – The following references can provide
additional details on the topics discussed in this module:

● Ultra Enterprise 10000 SSP 3.1 User’s Guide

● Sun Enterprise 10000 System Hardware Installation and De-Installation
Guide

● Sun Enterprise 10000 System Overview Manual

● Appropriate Solaris documentation

● The man pages for the commands and files


Standard Domain Message Logs

The standard domain message logs contain important information that
indicates a problem in a single domain (or physical machine) that may
be due to some hardware or software failure.

These messages can range from a kernel panic to an over-temperature
reading from the hardware. User software errors can be logged,
including information about what was executed, boot failures and
status, and incorrect command semantics.

Each domain has its own SSP messages file named
$SSPVARLOGGER/domain_name/messages. Look in this file first if there
is any indication of a problem in a domain. It is a copy of the domain’s
own /var/adm/messages file.

The message file specifically for the Enterprise 10000 platform is
$SSPVARLOGGER/messages. This contains messages that are not
related to a specific domain or that aren’t issued by the domain.

Remember that the SSP records its own messages in its
/var/adm/messages file.


Bus Configurations and the Figure of Merit

The Enterprise 10000 system has four address buses and two data
buses. It can continue to run, in many cases, when one or more of
these buses have failed. There may be multiple configurations that will
work in these cases, but usually there is only one optimum
configuration.

The Enterprise 10000 can be configured to run with any combination
of one to four address buses. Under normal circumstances, fewer than
all four would only be used if failures were encountered on one or more
buses; performance would be reduced, but the system would
otherwise be fully functional. There are 15 possible combinations: one
4-bus, four 3-bus, six 2-bus, and four 1-bus configurations.

Similarly, the 144-bit data path is actually two 72-bit paths that
normally operate together. But remember that the Enterprise 10000 can
operate with either one or two data buses. One 2-bus and two 1-bus
data bus configurations are possible.

The address and data bus configurations are strictly independent, so
there are 45 (15 x 3) possible bus configurations for the system.
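The counts above can be checked by enumeration. This sketch assumes only that each bus is independently either in use or not, and represents a configuration as a bit mask with one bit per bus; the variable names are illustrative.

```python
from collections import Counter

address_masks = range(1, 16)  # non-empty subsets of four address buses
data_masks = range(1, 4)      # non-empty subsets of two data buses


def bus_count(mask):
    """Number of buses in use for a given bit mask."""
    return bin(mask).count("1")


address_by_count = Counter(bus_count(mask) for mask in address_masks)
data_by_count = Counter(bus_count(mask) for mask in data_masks)
total = len(address_masks) * len(data_masks)

print(dict(address_by_count))  # {1: 4, 2: 6, 3: 4, 4: 1}
print(dict(data_by_count))     # {1: 2, 2: 1}
print(total)                   # 45
```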

The bus configurations are denoted by the bit mask of the buses they
use. The address bus configurations are numbered 1–F, where F (or 15
decimal) is the normal four-address bus mode. The data bus
configurations are numbered 1–3, where 3 is the normal two-data bus
mode. In some cases a compound bus configuration is shown in POST
displays as a two-digit hex value with the data bus configuration first,
so the normal all-buses-active configuration is 3F.
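A compound configuration value can be decoded as follows. This is a minimal sketch: the function name is hypothetical, and the mapping of bit 0 to bus 0 is an assumption, since the text states only that the values are bit masks.

```python
def decode_config(code):
    """Split a two-hex-digit compound configuration into bus lists.

    The first digit is the data bus mask (1-3) and the second is the
    address bus mask (1-F). Bit 0 is assumed to be bus 0.
    """
    if len(code) != 2:
        raise ValueError("expected two hex digits, such as '3F'")
    data_mask = int(code[0], 16)
    address_mask = int(code[1], 16)
    if not 1 <= data_mask <= 3 or not 1 <= address_mask <= 15:
        raise ValueError("mask out of range")
    data_buses = [bit for bit in range(2) if data_mask & (1 << bit)]
    address_buses = [bit for bit in range(4) if address_mask & (1 << bit)]
    return data_buses, address_buses


print(decode_config("3F"))  # ([0, 1], [0, 1, 2, 3])
print(decode_config("35"))  # ([0, 1], [0, 2])
```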

In some cases of component failure, there are multiple ways in which a
system can be configured from the remaining resources. For example,
POST can choose to run on three address buses and keep a failing
board, or discard the board and use all four buses with the other
boards. Which configuration is chosen depends on what else is
present on the other boards. If the system had only one or two boards,
or if the board in question had all the SBus cards in the system, you
would keep the board. In a 16-board system you might fail the board
and run in four-bus mode on the other 15 boards.

As an example, consider the four-board domain shown above. If the
centerplane connection between boards 8 and 12
partially fails, there are two valid maximal configurations left:

● Boards 6, 8, and 14

● Boards 6, 12 and 14

Which one should be used?

The FOM calculation makes this determination. By considering how
many processors, how much memory, and how many interface cards
are in each possible configuration, it chooses the board configuration
with the most hardware resources available.

If the centerplane interconnect had only partially failed, the calculation
would also have considered which partial bus configuration would
still work and provide the best performance.

The configuration decision is made by a Figure of Merit (FOM)
calculation that computes a value for each possible bus configuration
and chooses the one with the highest FOM.

The FOM mechanism has several desirable properties: it is simple to
calculate, and it evaluates to 0 if any of the five components is 0,
independent of the others and their weights. This means that a
configuration is considered usable if, and only if, the FOM is >= 1.

You can see the FOM calculations during bringup processing if
.postrc contains a display_fom_calc {off|on} directive. If you
include it, you will see the FOM for each possible configuration. The
weights can also be adjusted.

Usually the system chooses configuration 3F, meaning that all buses
are available.


Sample FOM Calculation

The following is the output from a bringup run in a domain with six
system boards. The interface between a system board and an address
bus was disabled on two different boards. As you can see, the system
decided to configure with only two address buses, since it could best
access all of the system boards and their components that way.
...
phase final_config: Final configuration...

Config FigureOfMerit Procs MBytes IOcards DBuses ABuses
11 21888.00 24 3072 19 1 1
12 12800.00 20 2560 16 1 1
13 25600.00 20 2560 16 1 2
14 21888.00 24 3072 19 1 1
15 43776.00 24 3072 19 1 2
16 25600.00 20 2560 16 1 2
17 38400.00 20 2560 16 1 3
18 12800.00 20 2560 16 1 1
19 25600.00 20 2560 16 1 2
1A 13312.00 16 2048 13 1 2
1B 19968.00 16 2048 13 1 3
1C 25600.00 20 2560 16 1 2
1D 38400.00 20 2560 16 1 3
1E 19968.00 16 2048 13 1 3
1F 26624.00 16 2048 13 1 4
21 21888.00 24 3072 19 1 1
22 12800.00 20 2560 16 1 1
23 25600.00 20 2560 16 1 2
24 21888.00 24 3072 19 1 1
25 43776.00 24 3072 19 1 2
26 25600.00 20 2560 16 1 2
27 38400.00 20 2560 16 1 3
28 12800.00 20 2560 16 1 1
29 25600.00 20 2560 16 1 2
2A 13312.00 16 2048 13 1 2
2B 19968.00 16 2048 13 1 3
2C 25600.00 20 2560 16 1 2
2D 38400.00 20 2560 16 1 3
2E 19968.00 16 2048 13 1 3
2F 26624.00 16 2048 13 1 4
31 43776.00 24 3072 19 2 1
32 25600.00 20 2560 16 2 1
33 51200.00 20 2560 16 2 2

34 43776.00 24 3072 19 2 1
35 87552.00 24 3072 19 2 2
36 51200.00 20 2560 16 2 2
37 76800.00 20 2560 16 2 3
38 25600.00 20 2560 16 2 1
39 51200.00 20 2560 16 2 2
3A 26624.00 16 2048 13 2 2
3B 39936.00 16 2048 13 2 3
3C 51200.00 20 2560 16 2 2
3D 76800.00 20 2560 16 2 3
3E 39936.00 16 2048 13 2 3
3F 53248.00 16 2048 13 2 4

Configuring in 35, FOM = 87552.00: 24 procs, 19 Scards, 3072 MBytes.


...

You would then see the remainder of the bringup messages.
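The FOM values in the sample output above are consistent with a simple product of the five listed components, with memory counted in 64-Mbyte units. That formula is inferred from the sample rows and is not a documented definition; the 64-Mbyte divisor and the unit weights are assumptions, and adjusted weights in .postrc would change the results.

```python
def figure_of_merit(procs, mbytes, io_cards, data_buses, address_buses):
    """Product of the five components; zero if any component is zero."""
    return procs * (mbytes / 64) * io_cards * data_buses * address_buses


# A few rows copied from the sample output:
# (Procs, MBytes, IOcards, DBuses, ABuses)
rows = {
    "11": (24, 3072, 19, 1, 1),
    "17": (20, 2560, 16, 1, 3),
    "35": (24, 3072, 19, 2, 2),
    "3F": (16, 2048, 13, 2, 4),
}
listed = {"11": 21888.0, "17": 38400.0, "35": 87552.0, "3F": 53248.0}

# The inferred formula reproduces the FigureOfMerit column for these rows.
for config, fields in rows.items():
    assert figure_of_merit(*fields) == listed[config]

# The chosen configuration is the one with the highest value.
best = max(rows, key=lambda config: figure_of_merit(*rows[config]))
print(best)  # 35
```

Configuration 35 wins here because it keeps all 24 processors and both data buses, even though it runs on only two address buses.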


Redlist and Blacklist Files

When bringup is run, hpost will read two syntactically identical text
files called the redlist and blacklist files.

These files cause the system to ignore the presence of hardware
components, usually for purposes of bring up and test, physical
replacement of spares, benchmarking, and so on.

The blacklist file defines resources that hpost is to consider failed; it
may access them to reset or freeze them, but they are not to be
configured into the domain.

The redlist file is more restrictive; it defines resources whose state
must not be altered. Redlisted components are blacklisted by
implication. Because the restriction on not changing state means that
redlisted components cannot be reset, redlisting can, in some cases,
interfere with booting the rest of the domain. Redlisting should not be
used where blacklisting will suffice.

You can isolate problems by blacklisting components that may be
failing, to see whether the domain will then function. You can also
allow a domain to run by blacklisting a failed component.

The default location for the blacklist and redlist files,
$SSPVAR/etc/platform_name, and their file names, may be overridden
by directives in .postrc.

Changes specified in these files take effect the next time a domain runs
bringup. They will not be seen by an affected domain until bringup is
run.

Warning – You should never redlist a component without specific
instructions from Sun service personnel.


The autoconfig Command

The autoconfig command must be run when adding a new revision of
a system board to the system. It may also be required when moving a
system board to a new slot.

You do not need to run autoconfig if all of the system boards are at
the same revision level. If you are not sure, run it against the new
board(s).

Warning – Never run autoconfig against a system board that is
running the OS or on the centerplane if any domain is running the OS.
It will reset the domain or entire platform, respectively.

autoconfig checks chip IDs on selected boards against the SSP
scantool database, looking at specific chip ID files and board
signature files on the SSP. If it detects a new revision of a board, it
updates the appropriate SSP configuration database.

It uses and updates several files containing detailed hardware
configuration and status information.

● Chip ID files

● Board signature files

● Platform configuration file

● Board revision directories

● Chip revision directories

See the autoconfig man page for the command syntax. It takes
several minutes to run.

Warning – Do not run this command unless you are specifically
requested to do so by Sun support personnel.


Diagnostic Tools

Several types of diagnostic programs are available to supply
diagnostic information about the Enterprise 10000 system.

hpost
Although normally run by the bringup command, hpost can be run
from the SSP command line for diagnostic purposes.

Warning – hpost will crash a running domain. It does not check the
status of the domain before starting.

hpost can be run on one domain while other domains are running.

Warning – Make sure you have the right domain name specified in the
SUNW_HOSTNAME variable.

Warning – If you do not understand what an option does, do not use
it. You could cause significant damage to the system.

● -llevel, where level can be from 7 to 127. The default is 16. Levels
above 64 take considerable time and may not add much value for
field-level troubleshooting.

● -vlevel, where level can be from 0 to 255. The default is 20. Running
at level 255 is useful (once) to understand how hpost works;
however, overly verbose output can mask error reports. The
default verbose level will report all failures.

SunVTS
SunVTS, the on-line validation test suite, is a system exerciser that
tests and validates hardware functionality by running multiple
diagnostic hardware tests on most configured controllers and devices.

The latest version of SunVTS has several modes of testing, including
low-impact testing, which can run with minimum impact on customer
applications.

SunVTS can also be used to stress test hardware, either in or out of
the Solaris operating environment. By running multiple and
multithreaded diagnostic hardware tests, SunVTS verifies the system
configuration and the functionality of most hardware controllers and
devices.

SunVTS tests many board and system functions, as well as the
Fibre Channel, SCSI, and SBus interfaces. SunVTS also
accepts user-written scripts for automated testing.

prtdiag
prtdiag is a Solaris command that is run in the domain, and provides
a detailed view of the domain’s hardware configuration. To see the
entire platform’s hardware configuration, either all of the system
boards must be in one domain, or separate prtdiag output must be
combined from all of the domains.

prtdiag is only supported for the sun4u and sun4d architecture
systems, so it cannot be run on the SSP.

For additional information, refer to the prtdiag man page.

To run prtdiag, as root or a regular user, use:

% /usr/platform/sun4u1/sbin/prtdiag -v

This will display information similar to the following:
# /usr/platform/sun4u1/sbin/prtdiag -v
System Configuration: Sun Microsystems sun4u SUNW,Ultra-Enterprise-10000
System clock frequency: 83 MHz
Memory size: 1024 Megabytes

========================= CPUs =========================

                    Run   Ecache   CPU    CPU
Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
  0    0     0      250     1.0   US-II    1.1
  0    1     1      250     1.0   US-II    1.1
  0    2     2      250     1.0   US-II    1.1
  0    3     3      250     1.0   US-II    1.1
  1    4     0      250     1.0   US-II    1.1
  1    5     1      250     1.0   US-II    1.1
  1    6     2      250     1.0   US-II    1.1
  1    7     3      250     1.0   US-II    1.1

========================= Memory =========================

Memory Units: Size
          0: MB   1: MB   2: MB   3: MB
          -----   -----   -----   -----
Board 0    256     256       0       0
Board 1    256     256       0       0

========================= IO Cards =========================

     Bus   Freq
Brd  Type  MHz   Slot  Name                              Model
---  ----  ----  ----  --------------------------------  ----------------------
  0  SBus   25     0   qec/qe (network)                  SUNW,595-3198
  0  SBus   25     0   SUNW,soc/SUNW,pln                 501-2069
  0  SBus   25     1   QLGC,isp/sd (block)               QLGC,ISP1000
  1  SBus   25     0   qec/qe (network)                  SUNW,595-3198
  1  SBus   25     0   SUNW,soc                          501-2069
  1  SBus   25     1   QLGC,isp/sd (block)               QLGC,ISP1000

For diagnostic information,
see /var/opt/SUNWssp/adm/new26/messages on the SSP.
#


Correctable Memory Errors

Correctable memory errors, while having no effect on the operation of
the software (other than some potential performance degradation), can
be an early indicator of a bad memory DIMM that can cause a more
serious failure later.

A correctable memory error occurs when a line of data from memory
does not agree with its ECC (error correction code; like a checksum).
The correct data is determined by the hardware and passed to its
destination. No data is lost, and there is no danger of data corruption.
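To illustrate how an error correction code lets hardware recover the correct data, the sketch below uses a textbook Hamming(7,4) code, which corrects any single flipped bit in a 7-bit codeword. It is a teaching example only and is not the ECC scheme used by the Enterprise 10000 memory system, which protects much wider data words.

```python
def encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def correct(codeword):
    """Return (corrected codeword, position of the flipped bit or 0)."""
    c = [0] + list(codeword)  # shift to 1-based positions
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    position = s1 + 2 * s2 + 4 * s4  # syndrome names the bad position
    if position:
        c[position] ^= 1
    return c[1:], position


stored = encode([1, 0, 1, 1])
damaged = list(stored)
damaged[4] ^= 1  # flip the bit at position 5
repaired, position = correct(damaged)
print(repaired == stored, position)  # True 5
```

The syndrome both detects the mismatch and identifies which bit to flip back, which is why a single-bit error costs no data.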

Enterprise 10000 messages pinpoint the location of the failing DIMM
from the reported memory address, enabling you to locate and replace
the failing DIMM (possibly by using DR). Failures can be reported by
the OBP if the OS is not running, or by Solaris.

The output display constructed by the OBP looks like:

Board# 15 Bank# 1 P# P1 MM 0_0

Solaris will display the system board number, memory bank number,
and the location of the DIMM on the board. The error message from
the OS would look like:
Softerror: Intermittent ECC Memory Error SIMM
Board# 3 Bank# 0 P# P13 MM 0_3
ECC Data Bit 63 was corrected

where P13 and MM 0_3 are the silk-screened location labels on
the system board for the DIMM getting the correctable memory
errors.
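When scanning a messages file for these reports, the location line can be pulled apart with a regular expression. This sketch assumes the line format shown in the example above; the field names p_label and mm_label are hypothetical labels for the two silk-screen identifiers.

```python
import re

# Matches lines such as: Board# 3 Bank# 0 P# P13 MM 0_3
LOCATION = re.compile(
    r"Board#\s*(?P<board>\d+)\s+Bank#\s*(?P<bank>\d+)\s+"
    r"P#\s*(?P<p_label>\S+)\s+MM\s+(?P<mm_label>\S+)"
)


def parse_location(line):
    """Return the DIMM location fields from a message line, or None."""
    match = LOCATION.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["board"] = int(info["board"])
    info["bank"] = int(info["bank"])
    return info


print(parse_location("Board# 3 Bank# 0 P# P13 MM 0_3"))
# {'board': 3, 'bank': 0, 'p_label': 'P13', 'mm_label': '0_3'}
```

Counting how often the same location recurs gives a rough indication of which DIMM to schedule for replacement.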

This is not an emergency condition. A failing DIMM can be replaced at
any convenient time.

Enabling Reporting
To enable OS reporting of correctable memory errors, place the
following lines in the domain’s /etc/system file:
set report_ce_log=1
set report_ce_console=1

The first statement sends the messages to /var/adm/messages; the
second sends them to the console. You may use either or both.


System Failures

There are several conditions that can occur on the Enterprise 10000
that require specific handling or provide system state information that
should be saved. These conditions are:

● Reboot request

● Panic

● Watchdog/Redmode/XIR

● Heartbeat failure

● arbstop

Each of these is discussed in the following pages.


Reboot Request

A reboot request occurs when a command like reboot or init 6 is
issued from a domain.

When a reboot command is executed, the following steps occur:

● The OS halts cleanly and tells the OBP.

● The OBP makes a reboot request to the SSP.

● edd detects the request (sent by the control board).

● edd logs the event and locates the proper rule.

● edd reboots the domain with bringup -L -F.

● -F – No sanity checking (force)

● -L – Run hpost -s -v10 -QX

The reboot request can come from the reboot, shutdown, or init
commands, for example.


Panic

A panic occurs when the OS in a domain detects an unrecoverable
condition. The panic could be caused by either a hardware or a
software problem. You must inspect the panic log message to
determine its cause.

A panic can also be forced from the SSP by using the hostint or
sigbcmd commands.

When a panic is detected, the following steps occur:

● The kernel saves a dump of parts of itself to the primary swap
partition.

● edd detects the panic (a notice is sent by the control board).

● edd logs the panic and locates the proper rule.

● edd reboots the domain with bringup -L -F.

● The vmcore.n and unix.n files are created by savecore in the
domain in /var/crash/hostname.

The panic dumps can be analyzed with crash and kadb. You might
also want to use the SunSolve™ script ISCDA.

Considerations
● savecore must be enabled in /etc/init.d/sysetup to save the
crash dump. By default, panic dumps are not saved. Make sure
that you enable savecore. You don’t want to explain why you
don’t know what failed.

● The primary swap partition must be large enough to hold the raw
dump, which could be 500 Mbytes to 800 Mbytes in size. A partial
dump is useless, and only the primary swap partition will be used.

Make sure there is a large enough partition left after any DR
activity removes a primary swap partition.

● /var must be large enough to hold the dump transferred from
swap by savecore. Remember that you can have several dumps
in /var at the same time.

● If there is not enough room, savecore will completely fill whatever
partition contains /var while trying to save a dump.
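The space considerations above amount to a simple comparison, sketched here. The function names are hypothetical, the default of one retained dump is an assumption, and the sketch would run on any host with Python 3; shutil.disk_usage is part of the standard library.

```python
import shutil

MBYTE = 1024 * 1024


def has_room_for_dump(free_bytes, dump_mbytes, dumps_to_keep=1):
    """True if free_bytes can hold the given number of dumps."""
    return free_bytes >= dump_mbytes * MBYTE * dumps_to_keep


def check_directory(path, dump_mbytes, dumps_to_keep=1):
    """Check the file system that holds path, for example /var/crash."""
    free_bytes = shutil.disk_usage(path).free
    return has_room_for_dump(free_bytes, dump_mbytes, dumps_to_keep)


# 900 Mbytes free holds one 800-Mbyte dump but not two.
print(has_room_for_dump(900 * MBYTE, 800))     # True
print(has_room_for_dump(900 * MBYTE, 800, 2))  # False
```

Using the upper end of the expected dump size (800 Mbytes in the range quoted above) gives the more conservative answer.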


Watchdog, Redmode, and XIR Resets

These resets are defined in the SPARC Architecture Reference Manual,
and are functions of the SPARC chip, used when it detects certain
exceptional conditions.

● A watchdog reset occurs when a trap (interrupt) happens while the
kernel is already in a critical section of the trap-handler code.
While the UltraSPARC architecture makes watchdog resets less
common, they may still occur.

● A redmode is a condition that occurs when the system has received
too many concurrent traps and is running in a very restricted
(low-level kernel) environment.

● An xir is an externally initiated reset. An xir occurs when a specific
signal from outside the processor is sent to the SPARC chip. The
signal can be sent by the hostreset command from the SSP. It’s
like pushing a reset button.

All three of these conditions cause the target domain to halt.

When one of these conditions occurs:

● download_helper saves the processor registers to a special
hardware location.

● edd logs the event in the SSP and locates the proper rule.

● edd reads the domain register information through the control
board and writes it to an ASCII resetinfo file named like the
following:
$SSPLOGGER/domain_name/resetinfo.timestamp

● edd then runs bringup -L -F and restarts the domain.

resetinfo files are ASCII and do not require redx to view them. They
contain the domain processors’ registers at the time of the failure.

Hostview will notify you that one of these has occurred in the Failure
window, and the resetinfo dump may be viewed from that window.

Saving a Panic Dump


Usually the OBP environment parameters watchdog-sync?,
redmode-sync?, and xir-sync? would be set to true to allow you to
save a panic dump if one of these conditions occurs. The default
setting is false. (See Module 7 for more details.)


Heartbeat Failure (Hung Host)

A hung host occurs when a heartbeat failure is detected by the control
board. The processors in a domain regularly update a special heartbeat
register indicating that they are active. When the control board sees
that this register is not being updated, the domain is considered to be
hung and hung host recovery is started.
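The detection logic described above can be modeled as a monitor that watches for a register value that stops changing. This is a conceptual sketch only: the class, its method, and the 30-second timeout are hypothetical and do not describe the control board's actual implementation.

```python
class HeartbeatMonitor:
    """Flag a domain as hung when its heartbeat value stops changing."""

    def __init__(self, timeout):
        self.timeout = timeout      # seconds without change before "hung"
        self.last_value = None
        self.last_change = None

    def observe(self, value, now):
        """Record a register reading; return True if the domain looks hung."""
        if value != self.last_value:
            self.last_value = value
            self.last_change = now
            return False
        return (now - self.last_change) >= self.timeout


monitor = HeartbeatMonitor(timeout=30)
print(monitor.observe(1, now=0))    # False: first reading
print(monitor.observe(2, now=10))   # False: register was updated
print(monitor.observe(2, now=25))   # False: unchanged for only 15 seconds
print(monitor.observe(2, now=40))   # True: unchanged for 30 seconds
```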

This often occurs with I/O problems. Some I/O device/CPU
communication will suspend the CPU until the device responds. A
device's failure to respond will hang the processor.

When a hung host is detected, the following steps occur:

● edd detects and logs the hang (a notice is sent from the control
board)

● edd forces a panic in the domain

● edd reboots the domain with bringup -L -F


Manual Intervention With a Hung Host


Software or some other condition can make a host appear hung even
though the hardware heartbeat is still being updated. If this occurs,
you will need to intervene manually to recover from the problem. The
following recovery actions are ordered from the most state
information saved to the least. Do not try them all; stop when you get
a result.

1. Attempt another login to ensure that the domain is not
responding.

2. Attempt to break in to the domain OBP from your netcon session,
and request a panic dump.
~# (Breaks to OBP)
ok> sync

3. Force a panic from the SSP. A panic dump will be created.
ssp:domain% hostint


4. Force an XIR from the SSP. A resetinfo file will be created.
ssp:domain% hostreset

5. As a last resort, use a forced bringup from the SSP. No failure
information will be recorded at all.
ssp:domain% bringup -f
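
The ordering of the manual recovery actions can be written down as a small lookup table. The following is only a sketch for operations notes, not an SSP tool: the command strings are copied from the steps above, while the names RECOVERY_LADDER and next_action are hypothetical and nothing here runs a command.

```python
# Recovery actions from the numbered steps, ordered from most state saved
# to least. Each entry is (action, command typed, information saved).
# The table is illustrative only; it never executes anything.
RECOVERY_LADDER = [
    ("second login attempt", None, "confirms the domain is not responding"),
    ("break to OBP from netcon, then sync", "sync", "panic dump"),
    ("force a panic from the SSP", "hostint", "panic dump"),
    ("force an XIR from the SSP", "hostreset", "resetinfo file"),
    ("forced bringup from the SSP", "bringup -f", "no failure information"),
]


def next_action(steps_tried):
    """Return the next rung to try, or None when every rung has been tried."""
    if steps_tried < 0:
        raise ValueError("steps_tried must be non-negative")
    if steps_tried >= len(RECOVERY_LADDER):
        return None
    return RECOVERY_LADDER[steps_tried]


if __name__ == "__main__":
    for number, (action, command, saved) in enumerate(RECOVERY_LADDER, start=1):
        shown = command if command else "(no command)"
        print(f"{number}. {action}: {shown} -> {saved}")
```

Stop climbing as soon as one rung produces a result; each later rung preserves less diagnostic state than the one before it.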


Arbstop

An arbstop occurs when the system centerplane interconnect hardware
detects an error. The error could be an ECC error, a parity error, a bus
grant timeout, a hardware queue overflow, or any one of a number of
other problems.

Arbstops and related recordstops generally only affect the domain in
which they occur. An exception is errors that occur within the
centerplane itself, which can affect all configured domains.

If the error occurs on a system board, the arbitration stop request is
broadcast to all of the system boards in the domain, which then halt.
All other domains on the platform continue running.

If the problem is not local to a system board (for example, it is in the
centerplane), all domains are stopped.


When an arbstop occurs:

● The control board sends a message to the SSP.

● edd logs the arbstop event and locates the proper rule.

● edd then captures hardware history information by running
hpost -D.

● edd then starts a reboot of the domain(s) by running
bringup -L -F.

arbstop dumps are binary files that require redx to interpret them.
You can think of them as hardware equivalents of Solaris panic
dumps.


Creating a Hardware State Dump File

When the system fails, you usually just want to reboot and get on with
your work, but there may be useful information about bad hardware
that needs to be saved. The -D option of hpost will create a binary
dump of all the internal hardware state information that might be of
interest. This file can later be read by redx.

hpost -D should always be run by edd if an error is detected; you
should never run this command manually.

If the arbstop occurs while the domain is running, the dump file will
be:
$SSPLOGGER/domain_name/Edd_Arbstop_Dump-mm.dd.hh.mm:ss

If the arbstop occurs while hpost is running, the dump file will be:
$SSPLOGGER/domain_name/xfstate.mmdd.hhmm.ss

Note that both files are in the same directory as the SSP’s copy of
the domain /var/adm/messages file.
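
When you look through $SSPLOGGER/domain_name, the file name prefix tells you what kind of record you have and whether you need redx to read it. The sketch below only illustrates that mapping. The function name and the example file names are hypothetical, and treating a file named messages as the SSP copy of the domain messages file is an assumption based on the path listed for escalation.

```python
def classify_ssp_log_file(name):
    """Map a file name to (kind, how_to_read) using the prefixes in the text."""
    if name.startswith("resetinfo."):
        return ("watchdog, redmode, or XIR reset registers", "ASCII text")
    if name.startswith("Edd_Arbstop_Dump-"):
        return ("arbstop while the domain was running", "binary, read with redx")
    if name.startswith("xfstate."):
        return ("arbstop while hpost was running", "binary, read with redx")
    if name == "messages":
        # Assumption: the SSP copy of the domain messages file uses this name.
        return ("SSP copy of the domain messages file", "ASCII text")
    return ("unrecognized", "unknown")


if __name__ == "__main__":
    # Hypothetical file names; only the prefixes come from the text.
    for example in ("resetinfo.example", "Edd_Arbstop_Dump-example",
                    "xfstate.example", "messages", "other"):
        kind, reader = classify_ssp_log_file(example)
        print(f"{example}: {kind} ({reader})")
```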


In all cases, the fully qualified name of the file created and the mask of
boards in the created file are printed by hpost. Each system board or
half-centerplane included in the dump requires 4–5 Kbytes, making a
dump from a fully configured system approximately 90 Kbytes.
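
The 90-Kbyte figure can be checked with simple arithmetic. This sketch assumes a fully configured system means 16 system boards plus the two halves of the centerplane; those counts are assumptions stated in the code, and only the 4–5 Kbytes per unit comes from the text.

```python
SYSTEM_BOARDS = 16        # assumption: a fully configured Enterprise 10000
HALF_CENTERPLANES = 2     # assumption: the centerplane is dumped as two halves
KBYTES_PER_UNIT = (4, 5)  # range quoted in the text


def dump_size_range(boards, half_centerplanes=HALF_CENTERPLANES):
    """Return (low, high) dump size in Kbytes for the given unit counts."""
    units = boards + half_centerplanes
    low, high = KBYTES_PER_UNIT
    return units * low, units * high


if __name__ == "__main__":
    low, high = dump_size_range(SYSTEM_BOARDS)
    print(f"{SYSTEM_BOARDS} boards: {low} to {high} Kbytes")
```

With 18 units the upper bound is 18 x 5 = 90 Kbytes, which matches the approximate size given above.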

If hpost -D is run from the command line, it will ask for a 60-character
comment. Add something helpful, or just press Enter. You do not have
to add the date, time, domain name, platform name, SSP name, or the
mask of boards in the dump; these are all placed in the dump file
automatically.

Caution – Do not run hpost -D manually unless instructed to do so by
Sun support personnel. You may interfere with normal system
recovery.


redx

redx is an interactive hardware debugger that runs on the SSP. Think
of it as a version of adb intended to debug Enterprise 10000 hardware
and view arbstop dumps.

Generally, you should not need to use redx. It is intended for
use by trained Sun support personnel only.


redx runs in two modes: on-line and off-line.

● On-line – redx can read and write directly to the Enterprise 10000
hardware.

Warning – Do not use redx in on-line mode unless directed to do so
by Sun support personnel. Even reading some Enterprise 10000
registers can cause a running domain to crash.

● Local or off-line (-l operand) – Used to examine arbstop and
recordstop dump files.

The redx prompt shows which mode redx is running in.

● The on-line prompt is redx>.

● The local or off-line prompt is redxl>.

Starting redx
Start redx from the SSP. Make sure that the SUNW_HOSTNAME variable is
set for the proper domain.
ssp:domain% redx -l
Output window is open to child PID 13683 through fifo /tmp/redx_pipe13681
Environment DISPLAY = 129.153.40.26:0
redxl>

redx opens an output window to display the results of the commands.
If redx is run from a remote login, you will need to set the DISPLAY
variable and issue an xhost command.

To quit redx, use the exit command.
redxl> exit


Technical Information for Escalation

General Information Needed


For hardware and software system configuration information, review
and have accessible the following for each domain:

● prtdiag output

● sysdef output

● showrev -p or patchadd -p output


Problem-Specific Information Needed


● Indicate the type of problem, such as panic, arbstop, watchdog,
hang, or disk error, if you can.

● Indicate the severity of the problem, such as whether the system is
down or partially deconfigured.

● List all recent system changes, such as hardware, software, patches,
and new applications.

● Review and have accessible all relevant SSP-resident file
information from:

    ● /var/adm/messages

    ● /var/opt/SUNWssp/adm/domain_name/messages

    ● /var/opt/SUNWssp/adm/messages

    ● /var/opt/SUNWssp/adm/domain_name/post/POST

● Include the SSP Console log, if available, the domain .postrc
file(s), and any /etc/opt/SUNWssp/platform/blacklist or
redlist files.

● Have available the domain /etc/system files.

● For arbstops, have accessible the
/var/opt/SUNWssp/domain_name/arbstopdump file on the SSP.

● For watchdog resets, XIRs, and redmode resets, have accessible the
/var/opt/SUNWssp/domain_name/resetinfo file on the SSP.

● For panics, review and have accessible the core files from the
Enterprise 10000 host domain in /var/crash/domain_name.
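
The checklist above can be turned into a list of paths for a given domain and problem type. This is a minimal sketch, not a Sun-supplied tool: the paths are copied from the list, the problem-type keys and the function name escalation_files are hypothetical, and dom1 is a made-up domain name.

```python
SSP_VAR = "/var/opt/SUNWssp"

# Files to gather for every escalation (SSP-resident, from the list above).
COMMON_FILES = (
    "/var/adm/messages",
    SSP_VAR + "/adm/{domain}/messages",
    SSP_VAR + "/adm/messages",
    SSP_VAR + "/adm/{domain}/post/POST",
)

# One extra item per problem type. The keys are hypothetical labels.
PROBLEM_FILES = {
    "arbstop": SSP_VAR + "/{domain}/arbstopdump",
    "watchdog": SSP_VAR + "/{domain}/resetinfo",
    "xir": SSP_VAR + "/{domain}/resetinfo",
    "redmode": SSP_VAR + "/{domain}/resetinfo",
    "panic": "/var/crash/{domain}",  # on the domain itself, not on the SSP
}


def escalation_files(domain, problem_type):
    """Return the paths to have accessible for one domain and problem type."""
    paths = [template.format(domain=domain) for template in COMMON_FILES]
    extra = PROBLEM_FILES.get(problem_type)
    if extra is not None:
        paths.append(extra.format(domain=domain))
    return paths


if __name__ == "__main__":
    for path in escalation_files("dom1", "arbstop"):
        print(path)
```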


Lab

On the main SSP:

1. Create a .postrc file for your domain, if you do not already have
one, and add:
display_fom_calc on

Run bringup on the domain and watch the FOM calculation
display.

2. Run prtdiag from Solaris in your domain and look at the output.

3. Look at the contents of the various SSP log files.

4. Create a domain Solaris panic with hostint or hostreset.
Observe the dump creation process and watch the OS manage the
dump file. (Make sure that you have enabled savecore in
/etc/init.d/sysetup.)

5. Run bringup with hpost at the default test level (16) and
verbosity level 255. Observe the output.

6. Run bringup with hpost at test level 32 and the default verbose
level (20).


Check Your Progress

Check that you are able to accomplish or answer the following:

❑ Comprehend how an Enterprise 10000 fails and recovers.

❑ Understand the role of the SSP in failure logging.

❑ Discuss the different types of failures.

❑ Find where failure information is recorded.

❑ Interpret failure information.

❑ Understand support information requirements.


Think Beyond

What types of failure recovery need to be documented for operations
personnel?

What information should be available in the event of a hardware
failure?

How can prtdiag help determine the source of a problem?

Why must you exercise caution when using the hardware support
commands?

Configuring NTP A
This appendix summarizes some material from the Network Time
Protocol User’s Guide. It is intended to provide a quick summary of
some of the more important configuration issues for NTP.

For more detail, please refer to the Network Time Protocol User’s Guide
and the appropriate man pages.


SSP and Domain Time Synchronization

The Ultra Enterprise 10000 domains and the SSP keep time
independently but are kept in sync by the Network Time Protocol
(NTP) daemon. During the domain boot process, through NTP the
domain’s kernel asks the SSP for the time and then sets its time to
match.

If the date on the SSP is changed, the daemon changes the time in the
domain.

If you use the date command to change the time in a domain and it
differs from that of the SSP, the daemon immediately begins to
gradually adjust the domain’s time toward that of the SSP. This can
prove confusing to users and programs.

The only time you should use the date command to set the time in a
domain is if a problem prevents the domain from getting the proper
time from the SSP. Note that the domain’s clock device has no battery
backup. If the clock has not been initialized and this is detected, the
following message appears during the domain boot process:
“WARNING: TOD clock not initialized -- CHECK AND RESET THE DATE!”

If you see this message, as the domain superuser use the date
command to set the time as closely as possible to that shown on the
SSP, and the domain time should quickly sync up with it.

NTP Files
For Solaris 2.5.1, the NTP executables are installed by SUNWxntp in the
/opt/SUNWxntp/bin directory along with the key file (ntp.keys),
while the configuration file (ntp.conf) and drift file (ntp.drift) are
installed in the /etc/opt/SUNWxntp directory.

For Solaris 2.6, the executables are installed by SUNWntpr and
SUNWntpu in /usr/sbin and /usr/lib/inet, and the configuration
files are installed in /etc/inet.

In both cases, the NTP daemon is started during run level 2 processing
at boot time. The daemon’s name is xntpd for 2.5.1, and ntpd for 2.6.


NTP Server Strata

Time is distributed through a hierarchy of NTP servers, with each
server adopting a “stratum” that indicates how far away from an
external source of UTC it is operating. Stratum-1 servers have access to
an external time source, usually a radio clock synchronized to time
signal broadcasts from radio stations that explicitly provide a standard
time service. A stratum-2 server is one that is currently obtaining time
from a stratum-1 server, a stratum-3 server gets its time from a
stratum-2 server, and so on. To avoid long-lived synchronization
loops, the number of strata is limited to 15.

Each client in the synchronization subnet (which may also be a server
for other, higher-stratum clients) chooses exactly one of the available
servers to synchronize to, usually from among the lowest stratum
servers to which it has access. It is thus possible to construct a
synchronization subnet where each server has exactly one source of
lower stratum time to which it can synchronize. NTP prefers to have
access to several sources of lower stratum time (at least three) because
it can then apply an agreement algorithm to detect errors from any one
of these.

Normally, when all servers are in agreement, NTP chooses the best,
where “best” is defined in terms of lowest stratum (closest to
stratum-1), closest in terms of network delay and claimed precision.
While a goal should be to provide each client with three or more
sources of lower stratum time, several of these will only be providing
backup service and may be of lesser quality in terms of network delay
and stratum. That is still acceptable; a same-stratum peer that receives
time from other lower-stratum sources not accessed directly by the
local server can provide good backup service.
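
The "lowest stratum, then closest" preference can be illustrated with a simple sort key. This sketch is not the NTP selection algorithm, which also weighs claimed precision and applies an agreement step across several sources. The Source type, the pick_best function, and the example hosts are all hypothetical.

```python
from typing import NamedTuple


class Source(NamedTuple):
    name: str        # hypothetical label for a time source
    stratum: int     # 1 is closest to the external UTC source
    delay_ms: float  # measured network delay, in milliseconds


def pick_best(sources):
    """Prefer the lowest stratum, then the lowest delay; None if none usable."""
    usable = [s for s in sources if 1 <= s.stratum <= 15]
    if not usable:
        return None
    return min(usable, key=lambda s: (s.stratum, s.delay_ms))


if __name__ == "__main__":
    candidates = [
        Source("host-a", 2, 9.89),
        Source("host-b", 1, 280.62),
        Source("host-c", 1, 279.95),
    ]
    print(pick_best(candidates))
```

Note that a stratum-1 source wins here even though the stratum-2 source is much closer; that is a property of this simplified ordering only.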


Synchronization Sources

When a workstation is used in an enterprise network for a public or
private organization and appropriate lower-strata servers are not
available, you can explore some portion of the existing NTP subnet
running in the Internet.

Thousands of time servers are willing to provide a public
time-synchronization service. Some of these are listed in a file maintained
on the Internet host louie.udel.edu (128.175.1.3) on the path
pub/ntp/doc/clock.txt. This file is updated on a regular basis
using information provided voluntarily by various site administrators.
In most cases the keepers of those servers listed in the clock.txt file
provide unrestricted access without prior permission; however, in all
cases, it is considered polite to notify the administrator listed in the file
upon commencement of regular service. In all cases the access mode
and notification requirements listed in the file must be respected.

Other ways to explore the nearby subnet include use of the ntptrace
and ntpq programs, provided with the NTP packages. See their man
pages for more detail.

It is usually better to choose sources that are likely to be “close” to you
in terms of network topology, though you should not worry overly
about this if you are unable to determine who is close and who is not.
It is usually much more serious when a server becomes faulty and
delivers incorrect time than when it simply stops operating, because
an NTP-synchronized host normally can coast for hours or even days
without its clock accumulating a serious error of more than one
second. Selecting at least three sources from different operating
locations is the minimum recommended.

Normally, it is not considered good practice for a single workstation to
request synchronization from a primary (stratum-1) time server. At
present, these servers provide synchronization for hundreds of clients
(in many cases) and could become seriously overloaded if large
numbers of workstation clients requested synchronization directly.
Therefore, workstations with no local synchronization infrastructure
should request synchronization from nearby stratum-2 servers instead.


Configuring Your Server or Client

The NTP configuration file is /etc/opt/SUNWxntp/ntp.conf (2.5.1)
or /etc/inet/ntp.conf (2.6). This is an ASCII file conforming to the
usual comment and white space conventions. A working configuration
file might look like:
# peer config for 128.100.100.7 (expected to operate at stratum 2)
server 128.4.1.1 # rackety.udel.edu
server 128.8.10.1 # umd1.umd.edu
server 192.35.82.50 # lilben.tn.cornell.edu
driftfile /etc/inet/ntp.drift

This particular host is expected to operate as a client at stratum-2 by
virtue of the “server” keyword and the fact that two of the three
servers declared (the first two, actually) have radio clocks and usually
run at stratum-1. (This information can be obtained from the
clock.txt file.) The third server in the list has no radio clock but is
known to maintain associations with a number of stratum-1 peers and
usually operates at stratum 2. Of particular importance with the last
host is that it maintains associations with peers besides the two
stratum-1 peers mentioned. This can be verified using the ntpq
program included in the distribution (see the section "Inspecting Your
Configuration"). When configured using the server keyword, this
host can receive synchronization from any of the listed servers but can
never provide synchronization to them.
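
To check a configuration file before starting the daemon, it can help to list what it declares. The sketch below reads only the three keywords used in the example (server, peer, and driftfile) and treats everything after # as a comment. It is not a full ntp.conf parser, and the function name parse_ntp_conf is hypothetical.

```python
SAMPLE = """\
# peer config for 128.100.100.7 (expected to operate at stratum 2)
server 128.4.1.1       # rackety.udel.edu
server 128.8.10.1      # umd1.umd.edu
server 192.35.82.50    # lilben.tn.cornell.edu
driftfile /etc/inet/ntp.drift
"""


def parse_ntp_conf(text):
    """Collect server, peer, and driftfile declarations from ntp.conf text."""
    config = {"server": [], "peer": [], "driftfile": None}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        keyword = fields[0]
        if keyword in ("server", "peer") and len(fields) >= 2:
            config[keyword].append(fields[1])
        elif keyword == "driftfile" and len(fields) >= 2:
            config["driftfile"] = fields[1]
        # Any other keyword is ignored by this sketch.
    return config


if __name__ == "__main__":
    parsed = parse_ntp_conf(SAMPLE)
    print(len(parsed["server"]), "servers,", len(parsed["peer"]), "peers")
    print("driftfile:", parsed["driftfile"])
```

For the example file this reports three server associations, no peer associations, and the drift file path, which matches a host that only receives synchronization from the listed servers.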

Unless restricted using facilities described later, this host can provide
synchronization to dependent clients, which do not have to be listed in
the configuration file. Associations maintained for these clients are
transitory and result in no persistent state kept by the host. These
clients are normally not visible when using the ntpq program included
in the distribution; however, the daemon includes a monitoring
feature that caches a minimal amount of client information that is
useful for debugging and administrative purposes.


A time server expected to both receive synchronization from another
server, as well as to provide synchronization to it, is declared using the
“peer” keyword instead of the “server” keyword. In all other aspects
the server operates the same in either mode and can provide
synchronization to dependent clients or other peers. It is considered
good engineering practice to declare time servers outside the
administrative domain as “peer” and those inside as “server” in order
to provide redundancy in the global Internet while minimizing the
possibility of instability within the domain itself. A time server in one
domain can, in principle, heal another domain temporarily isolated
from all other sources of synchronization. However, it is probably
unwise for a casual workstation to bridge fragments of the local
domain that have become temporarily isolated.

One of the things the NTP daemon does when it is first started is to
compute the error in the intrinsic frequency of the clock on the
computer that it is running on. It usually takes about a day or so after
the daemon is started to compute a good estimate of this (and it needs
a good estimate to synchronize closely to its server). Once the initial
value is computed, it will change only by relatively small amounts
during the course of continued operation. The “driftfile” declaration
indicates to the daemon the name of a file where it may store the
current value of the frequency error so that, if the daemon is stopped
and restarted, it can reinitialize itself to the previous estimate and
avoid the day's worth of time it will take to recompute the frequency
estimate. Since this is a desirable feature, a “driftfile” declaration
should always be included in the configuration file.

If the daemon stops for some reason, the local platform time will
diverge from UTC (Coordinated Universal Time) by an amount that
depends on the intrinsic error of the clock oscillator and the time since
it last synchronized. In view of the length of time necessary to refine
the frequency estimate, every effort should be made to operate the
daemon on a continuous basis and limit the time when it is not
running.
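
A rough feel for how fast an unsynchronized clock diverges comes from multiplying the frequency error by the elapsed time. The 10 parts-per-million figure below is a hypothetical oscillator error chosen for illustration, not a measured value for any Sun platform, and the function names are made up for this sketch.

```python
def drift_seconds(frequency_error_ppm, elapsed_seconds):
    """Accumulated clock error for a constant frequency error."""
    return frequency_error_ppm * 1e-6 * elapsed_seconds


def hours_until(limit_seconds, frequency_error_ppm):
    """Hours of free-running operation before the error reaches the limit."""
    seconds = limit_seconds / (frequency_error_ppm * 1e-6)
    return seconds / 3600.0


if __name__ == "__main__":
    ppm = 10.0  # hypothetical frequency error
    print(f"error after one day: {drift_seconds(ppm, 86400):.3f} s")
    print(f"hours to reach 1 s:  {hours_until(1.0, ppm):.1f}")
```

Under that assumption the clock gains or loses a little under one second per day, which is consistent with the statement that a host can coast for hours or even days before accumulating an error of more than one second.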


Configuration Guidelines

When planning your network there are several configuration rules to
remember:

1. Do not synchronize a local time server to another peer at the same
stratum unless the latter is receiving time from lower stratum
sources the former does not talk to directly. This minimizes the
occurrence of common points of failure but does not eliminate
them in cases where the links to the primary sources of
synchronization are disrupted due to failures.

2. Do not configure peer associations with higher-stratum servers.
Let the higher strata configure lower-stratum servers, but not the
reverse. This greatly simplifies configuration file maintenance,
because there is usually a higher rate of configuration change in
the higher stratum clients (such as personal workstations).

3. Do not synchronize more than one time server in a particular
domain to the same time server outside that domain. Such a
practice invites common points of failure as well as raises the
possibility of overuse should the configuration file be
automatically distributed to a large number of clients.


NTP Query Programs

Three utility query programs are included with the NTP facility: ntpq,
ntptrace, and xntpdc. For more information on these, see their man
pages in /opt/SUNWxntp/man (2.5.1) or /usr/man (2.6).

ntpq sends queries and receives responses by using NTP standard
mode-6 control messages. It is most useful to query remote NTP
implementations to assess timekeeping accuracy and to expose bugs in
NTP configuration or operation.

ntptrace can be used to display the current synchronization path
from a selected host through the possibly intervening servers to the
primary source of synchronization, usually a radio clock.

xntpdc is a low-level internal debugging utility.


Inspecting Your Configuration

After starting the NTP daemon, run the ntpq program with the -n
switch, which avoids possible distractions due to name resolution.
Use the peers command to display a table showing the status of the
configured peers and possibly other clients using the daemon.

After operating for a few minutes, the display should be something
like:

 remote       refid        st when poll reach   delay  offset    disp
+128.4.2.6    132.249.16.1  2  131  256   373    9.89   16.28   23.25
*128.4.1.20   .WWVB.        1  137  256   377  280.62   21.74   20.23
-128.8.2.88   128.8.10.1    2   49  128   376  294.14    5.94   17.47
+128.4.2.17   .WWVB.        1  173  256   377  279.95   20.56   16.40

The meanings of the columns are:

● remote – Lists hosts that should agree with the entries in the
configuration file, plus any peers not mentioned in the file at the
same or lower level than your stratum that happen to be
configured to peer with you.

● refid – The current source of synchronization for that peer.

● st – The host’s stratum.

● when – The time since the peer was last heard, in seconds.

● poll – The polling interval, in seconds.

● reach – The status of the reachability register in octal format.

● The remaining entries show the latest delay, offset and dispersion
computed for that peer, in milliseconds.
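
The reach column is easier to read once it is expanded into bits. The sketch below relies on general NTP behavior that this guide does not spell out: the reachability register is treated as an 8-bit shift register in which the lowest bit records the most recent poll. The function name decode_reach is hypothetical.

```python
def decode_reach(octal_text):
    """Expand an octal reach value into 8 booleans, most recent poll first.

    Assumption: bit 0 is the most recent poll and a set bit means a reply
    was received for that poll.
    """
    value = int(octal_text, 8)
    if not 0 <= value <= 0o377:
        raise ValueError("reach must fit in 8 bits")
    return [bool((value >> bit) & 1) for bit in range(8)]


if __name__ == "__main__":
    for reach in ("377", "373", "376"):
        history = decode_reach(reach)
        marks = "".join("Y" if ok else "n" for ok in history)
        print(f"{reach}: {marks} ({sum(history)} of 8 polls answered)")
```

Under that assumption, 377 means all of the last eight polls were answered, 376 means only the most recent poll was missed, and 373 means the third most recent poll was missed.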

OBP Device Aliases B
This appendix describes the process of creating OBP device aliases
using the nvalias and devalias commands at the ok prompt.


OBP Device Aliases

The Enterprise 10000 system does not provide any default OBP device
aliases like the other Sun SPARC systems do. Because of the E10000’s
domain and DR capability, it is very difficult to determine which
interfaces or devices should have aliases, so none are created.

The E10000 does have several device aliases defined in the OBP, but
they are intended as examples only and should not be used for real
devices.

To boot the domain from a disk, network interface, or storage
array drive, you should first create an appropriate device alias in the
OBP to simplify the boot process. The alternative is to specify the full
OBP device tree physical path each time, which can be difficult.

Device aliases can be created with the devalias and the nvalias
commands. Aliases created with devalias only last until the system is
reset. Aliases created with nvalias last until they are deleted with
nvunalias. It is usually best to use nvalias, since these device aliases
will remain available.

The syntax for the devalias and nvalias commands is identical.

Deleting Device Aliases


If you are replacing a device alias, you must first delete it with the
nvunalias command. Having two or more active device aliases with
the same name may cause problems for the OBP, even if they are for
the same device.

Note that, when you delete a device alias, it will not disappear from
the devalias listing until the OBP reset (or boot) command has
been run. As long as the alias has been deleted, it is safe to recreate it,
even if it is still visible using devalias.


Creating a SCSI Disk Alias

By using the SBus and PCI card mapping table from Module 7, you
can determine the physical location of the SCSI host adapter card that
you wish to use.

Once you know this, you can create a disk device alias using the
show-disks command.

To use device t3d0 on the SCSI controller in board 3, SBus 0, slot 1:

1. Delete any old alias with the same name using nvunalias.
<#15> ok nvunalias bootdrive

2. The physical location of this SBus card is /sbus@4c,0/isp@1
(from the table in Module 7).

3. The show-disks command lists all of the disk devices (including
disk arrays) connected to the domain:
<#15> ok show-disks
a) /sbus@48,0/QLGC,isp@1,10000/sd
b) /sbus@4d,0/SUNW,soc@0,0/SUNW,pln@b0000000,8a1085/SUNW,ssd
c) /sbus@4c,0/QLGC,isp@1,10000/sd
q) NO SELECTION
Enter Selection, q to quit:

4. Item c is the proper SBus card, so choose c.
Enter Selection, q to quit: c
/sbus@4c,0/QLGC,isp@1,10000/sd has been selected.
Type ^Y ( Control-Y ) to insert it in the command line.
e.g. ok nvalias mydev ^Y
for creating devalias mydev for
/sbus@4c,0/QLGC,isp@1,10000/sd


5. Start to create the alias with nvalias.
ok nvalias bootdrive ^Y

6. The system will replace the ^Y with the device path chosen from
show-disks, leaving the full path on the command line with the
cursor at the end of the line. Do not press Enter yet.
<#15> ok nvalias bootdrive /sbus@4c,0/QLGC,isp@1,10000/sd

7. You will need to manually add the SCSI device txdx numbers to
the alias. This takes the format of @target_number,lun_number:slice,
where slice is a letter from a to h, corresponding to disk slices 0
through 7, respectively. For t3d0, slice 0, you would add @3,0:a.
<#15> ok nvalias bootdrive /sbus@4c,0/QLGC,isp@1,10000/sd@3,0:a

8. Press Enter to create the alias.

9. You can check the new alias with devalias if you wish.
<#15> ok devalias
bootdrive /sbus@4c,0/QLGC,isp@1,10000/sd@3,0:a
...
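
The suffix added in step 7 follows a fixed pattern, so it can be generated from the target, LUN, and slice numbers. This sketch only builds the string to type at the ok prompt. The function names are hypothetical, and it deliberately handles only single-digit target and LUN values, which covers the examples in this appendix.

```python
SLICE_LETTERS = "abcdefgh"  # slices 0 through 7 map to letters a through h


def disk_unit_suffix(target, lun, slice_number=0):
    """Build the @target,lun:slice suffix described in step 7."""
    if not 0 <= slice_number <= 7:
        raise ValueError("slice must be 0 through 7")
    if not (0 <= target <= 9 and 0 <= lun <= 9):
        # Larger values are outside the scope of this sketch.
        raise ValueError("this sketch handles single-digit target and LUN only")
    return f"@{target},{lun}:{SLICE_LETTERS[slice_number]}"


def disk_alias_path(controller_path, target, lun, slice_number=0):
    """Append the unit suffix to a path selected with show-disks."""
    return controller_path + disk_unit_suffix(target, lun, slice_number)


if __name__ == "__main__":
    # Path copied from the show-disks example; t3d0, slice 0.
    print(disk_alias_path("/sbus@4c,0/QLGC,isp@1,10000/sd", 3, 0, 0))
```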


Creating a Storage Array Disk Alias

By using the SBus and PCI card mapping table from Module 7, you
can determine the physical location of the SOC or SOC+ interface card
that you wish to use.

Once you know this, you can create a device alias using the
show-disks command.

You want to use the SSA with WWN 8a1085, connected to port b of its
SOC card. To specify drive t4d2 on this SSA, attached to the SOC card
in board 3, SBus 1, slot 0:

1. Delete any old alias with the same name using nvunalias.
<#15> ok nvunalias bootdisk

2. The physical location of this SBus card is
/sbus@4d,0/SUNW,soc@0 (from the table in Module 7).

3. The show-disks command lists all of the disk devices (including
disk arrays) connected to the domain:
<#15> ok show-disks
a) /sbus@48,0/QLGC,isp@1,10000/sd
b) /sbus@4d,0/SUNW,soc@0,0/SUNW,pln@b0000000,8a1085/SUNW,ssd
c) /sbus@4c,0/QLGC,isp@1,10000/sd
q) NO SELECTION
Enter Selection, q to quit:

4. Item b is the proper SBus card and has the correct WWN, so
choose b.
Enter Selection, q to quit: b
/sbus@4d,0/SUNW,soc@0,0/SUNW,pln@b0000000,8a1085/SUNW,ssd has been
selected.
Type ^Y ( Control-Y ) to insert it in the command line.
e.g. ok nvalias mydev ^Y
for creating devalias mydev for
/sbus@4d,0/SUNW,soc@0,0/SUNW,pln@b0000000,8a1085/SUNW,ssd


5. Start to create the alias with nvalias.
ok nvalias bootdisk ^Y

6. The system will replace the ^Y with the device path chosen from
show-disks, leaving the full path on the command line with the
cursor at the end of the line. Do not press Enter yet.
<#15> ok nvalias bootdisk /sbus@4d,0/SUNW,soc@0,0/SUNW,pln@b0000000,8a1085/SUNW,ssd

7. You will need to manually add the SCSI device txdx numbers to
the alias. This takes the format of @target_number,lun_number:slice,
where slice is a letter from a to h, corresponding to disk slices 0
through 7, respectively. For t4d2, slice 0, you would add @4,2:a.
<#15> ok nvalias bootdisk /sbus@4d,0/SUNW,soc@0,0/SUNW,pln@b0000000,8a1085/SUNW,ssd@4,2:a

8. Press Enter to create the alias.

9. You can check the new alias with devalias if you wish.
<#15> ok devalias
bootdisk /sbus@4d,0/SUNW,soc@0,0/SUNW,pln@b0000000,8a1085/SUNW,ssd@4,2:a
...


Creating a Network Device Alias

By using the SBus and PCI card mapping table from Module 7, you
can determine the physical location of the network interface card that
you want to use.

Once you know this, you can create a network device alias using the
show-nets command. The only other thing you may need to know, for
a quad Ethernet card, is which interface you want to use. The
show-nets command will show all of these interfaces, in reverse order
(3 through 0).

To use interface 0 of the qec card on board 3, SBus 0, slot 0:

1. Delete any old alias with the same name using nvunalias.
<#15> ok nvunalias bootnet

2. The physical location of this SBus card is /sbus@4c,0/qec@0 (from the
table in Module 7).

3. The show-nets command lists all of the network interfaces in the
domain:
<#15> ok show-nets
a) /sbus@48,0/qec@0,20000/qe@3,0
b) /sbus@48,0/qec@0,20000/qe@2,0
c) /sbus@48,0/qec@0,20000/qe@1,0
d) /sbus@48,0/qec@0,20000/qe@0,0
e) /sbus@4c,0/qec@0,20000/qe@3,0
f) /sbus@4c,0/qec@0,20000/qe@2,0
g) /sbus@4c,0/qec@0,20000/qe@1,0
h) /sbus@4c,0/qec@0,20000/qe@0,0
q) NO SELECTION
Enter Selection, q to quit:

4. Items e through h are on the proper SBus card. Interface 0 is the
proper connection, so choose h.
Enter Selection, q to quit: h
/sbus@4c,0/qec@0,20000/qe@0,0 has been selected.
Type ^Y ( Control-Y ) to insert it in the command line.
e.g. ok nvalias mydev ^Y
for creating devalias mydev for
/sbus@4c,0/qec@0,20000/qe@0,0


5. Create the alias using the nvalias command.
ok nvalias bootnet ^Y

6. The system will replace the ^Y with the device path chosen from
show-nets, leaving the full path on the command line. Press Enter
to create the alias.
<#15> ok nvalias bootnet /sbus@4c,0/qec@0,20000/qe@0,0

7. To verify the new alias, use devalias.
<#15> ok devalias
bootnet /sbus@4c,0/qec@0,20000/qe@0,0
...
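
The selection made in steps 3 through 5 can also be worked out away from the console. The following Python sketch is illustrative only and is not part of OBP. It takes the show-nets menu text from the example above, finds the entry for a given card and interface number, and prints the nvalias command that you would then type at the ok prompt. The helper name pick_interface and the idea of pasting the menu into a script are assumptions made for this illustration.

```python
import re

# Menu text copied from the show-nets example above.
SHOW_NETS_MENU = """\
a) /sbus@48,0/qec@0,20000/qe@3,0
b) /sbus@48,0/qec@0,20000/qe@2,0
c) /sbus@48,0/qec@0,20000/qe@1,0
d) /sbus@48,0/qec@0,20000/qe@0,0
e) /sbus@4c,0/qec@0,20000/qe@3,0
f) /sbus@4c,0/qec@0,20000/qe@2,0
g) /sbus@4c,0/qec@0,20000/qe@1,0
h) /sbus@4c,0/qec@0,20000/qe@0,0
q) NO SELECTION
"""


def pick_interface(menu, card_path, interface):
    """Return (menu letter, device path) for one qe interface on one card.

    card_path is the card location from the Module 7 table, for example
    /sbus@4c,0/qec@0. This helper is hypothetical, not an OBP command.
    """
    wanted_suffix = "/qe@%d,0" % interface
    for line in menu.splitlines():
        match = re.match(r"([a-z])\) (/\S+)$", line.strip())
        if match is None:
            continue  # not a device entry, such as "q) NO SELECTION"
        letter, path = match.groups()
        if path.startswith(card_path + ",") and path.endswith(wanted_suffix):
            return letter, path
    raise LookupError("no matching interface in the menu")


letter, path = pick_interface(SHOW_NETS_MENU, "/sbus@4c,0/qec@0", 0)
print("choose %s, then type: nvalias bootnet %s" % (letter, path))
```

Because show-nets lists the interfaces of each card in reverse order, matching on the interface number in the path is safer than counting menu positions.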

Glossary

.postrc
A text file that controls options in hpost(1M). Some of the
functions can also be controlled from the command line.
Arguments on the command line take precedence over lines in
the .postrc file, which take precedence over built-in defaults.
hpost -?postrc gives a terse reminder of the .postrc
options and syntax. See postrc(4).
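
The precedence rule in this entry (command line, then .postrc, then built-in defaults) can be pictured as a layered lookup. The Python sketch below is a conceptual illustration only, not how hpost is implemented. The option names and values are placeholders chosen for the example; see postrc(4) for the real directives.

```python
from collections import ChainMap


def effective_options(command_line, postrc, defaults):
    """Merge three option layers; earlier layers win over later ones."""
    return dict(ChainMap(command_line, postrc, defaults))


# Placeholder option names and values, for illustration only.
defaults = {"level": 1, "logfile": None, "verbose": 0}
postrc = {"level": 7, "logfile": "post.log"}
command_line = {"level": 16}

options = effective_options(command_line, postrc, defaults)
print(options["level"])    # taken from the command line
print(options["logfile"])  # taken from .postrc
print(options["verbose"])  # taken from the built-in defaults
```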
ARBSTOP
A condition that occurs when one of the Ultra Enterprise 10000
ASICs detects a parity error or equivalent fatal system error.
Bus arbitration is frozen, so all bus activity stops. The system is
dead until the SSP detects the condition by polling the status
registers of the Address Arbiter ASICs via JTAG, and clears the
error condition.
ASIC
Application-specific integrated circuit. Used in the Enterprise
10000 system context to mean any of the large main chips in the
design, including the UltraSPARC processor and data buffer
chips.
BBSRAM
See bootbus SRAM.
blacklist
A text file that hpost(1M) reads at start-up. It lists the
Enterprise 10000 system components that are not to be used or
configured into the system. The default path name for this file
can be overridden in the .postrc file (see postrc(4)) and on the
command line.

Board descriptor array
The description of the single configuration that hpost chooses.
It is part of the structure handed off to OBP.
Bootbus
A slow-speed byte-wide bus controlled by the processor port
controller ASICs, used for running diagnostics and boot code.
UltraSPARC starts executing code from bootbus when it exits
reset. In the Enterprise 10000 system, the only component on the
bootbus is the BBSRAM.
bootbus SRAM
A 256-Kbyte static RAM attached to each processor PC ASIC.
Through the PC, it can be accessed for read/write from JTAG or
the processor. It is downloaded at various times with
hpost(1M) and OBP start-up code, and provides shared data
between the downloaded code and the SSP.
Caching UPA master
A UPA module with master capability that also has a coherent
cache. The caching UPA master module participates in the
cache coherence protocol.
centerplane
A double-sided backplane where eight system boards, one
centerplane support board, and one control board plug
perpendicularly into each side.
centerplane support board
Board that plugs into the centerplane and supplies clocks,
JTAG, and control functions for one-half of the centerplane.
Normally, two centerplane support boards are used, each
plugging into opposite sides of the centerplane.
CIC
Coherency Interface Controller. Handles coherency transactions
for the three port controllers on a board. Connects to one of four
global address buses. Snoops for one quarter of the address
space.
control board
Board that plugs into the centerplane and provides the system’s
JTAG, clock, fan, power, serial interface, and Ethernet interface
functions.

correctable error
Any error that can be corrected so that processing can continue
with no loss of data. This includes automatic correction by
hardware, algorithmic correction by software, and correction by
hardware with software intervention. Correctable errors are all
hardware-initiated.
CS6400 system
System containing up to 64 SuperSPARC™ processors
interconnected by four XDbuses. The predecessor system to the
Enterprise 10000.
Data block
64 bytes. On a 128-bit UPA_Databus, four quadwords are
transferred, one quadword per clock cycle. On a 64-bit
UPA_Databus, eight doublewords are transferred, one
doubleword per clock cycle.
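
The transfer counts in this entry follow directly from the bus width. As a small worked check in Python (the function name is an illustrative assumption):

```python
DATA_BLOCK_BYTES = 64  # size of one data block


def transfer_cycles(bus_width_bits):
    """Clock cycles needed to move one data block over a bus of this width."""
    bytes_per_cycle = bus_width_bits // 8
    return DATA_BLOCK_BYTES // bytes_per_cycle


print(transfer_cycles(128))  # quadwords (16 bytes each) per data block
print(transfer_cycles(64))   # doublewords (8 bytes each) per data block
```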
Degraded mode
Mode in which only half of the centerplane is in use (72 bits of
data, two address buses, and one centerplane support board).
DIMM
Dual in-line memory module. A small printed circuit board
containing memory chips and some support logic. Dual refers to
the fact that the corresponding pins on each side of the edge
connector are unique, so that there is a dual row of pins.
Domain
A set of one or more system boards that act as a separate system
capable of booting the OS and running independently of any
other domains.
DR
Dynamic Reconfiguration.
DRAM
Dynamic RAM. Hardware memory chips that require periodic
rewriting to retain their contents. This process is called refresh.
In the Enterprise 10000 system, DRAM is used only on main
memory SIMMs, and on the control boards.
ECache
External cache. A 1/2-Mbyte to 4-Mbyte synchronous static
RAM second-level cache local to each processor module. Used
for both code and data. This is a direct-mapped cache.

ECC
Error correction code
Enterprise 10000
The successor to the CS6400 system. Has up to 64 UltraSPARC
processors and 64 GBytes of main storage, interconnected by a
UPA crossbar via the Gigaplane XB system bus.
Externally initiated reset (XIR)
Refer to Xir.
Fatal error
A class of unrecoverable errors which necessitate that the
machine be rebooted; may be hardware or software initiated.
This type of unrecoverable error will result in an arbiter stop
condition which requires SSP interaction.
FCS
First customer ship. The date a product will be shipped to
customers.
GAARB
Global address arbiter. Arbitrates for a global address bus.
Implemented by an Enterprise 10000 arbiter chip.
GAB
Global address buses. Four 16:16, 48-bit wide multiplexers that
connect together a coherent interface controller from each
system board. The multiplexers broadcast one of the inputs to
all the outputs. Implemented by 16 XMUX ASICs. Functions
like a snoopy bus for coherency purposes, but is really a point-
to-point address router.
GDARB
Global data arbiter. Arbitrates for the global data router’s 16x16
crossbar.
GDR
Global data router. Sixteen 16:1, 144-bit-wide multiplexers that
connect together the local data routers on each system board.
Implemented by 12 XMUX ASICs.
Gbits/sec
Gigabits per second.
Gbytes/sec
Gigabytes per second.

Giga
1,024 x 1,024 x 1,024 = 1,073,741,824.
Graphics port
Synonym for a UPA slave port.
HdwMTTI
Hardware mean time to interruption is defined as the average
number of hours between interrupts in production caused by a hardware
failure. Each automatic shutdown, reboot, panic, and so on is
counted as a separate interrupt, with the exception that only
one interrupt is counted when there is insufficient time between
interrupts for production to resume. Power failures, manual
shutdowns not caused by system failures, and an isolated
failure of a peripheral not affecting system operation are not
interrupts.
HdwMTTR
Hardware mean time to repair is defined as the average time
required to repair a failed system. In a properly configured
system, that is, one with alternate resources to allow concurrent
servicing, the HdwMTTR should not be a factor in calculating
customer availability.
Hostname
Another term for domain name. It is short for the environment
variable SUNW_HOSTNAME, which tells SSP commands which domain a
command is intended for.
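
As a rough illustration of how a script might target a domain, the Python sketch below builds an environment with SUNW_HOSTNAME set. The helper name env_for_domain and the domain name domain_a are hypothetical. Whether a given SSP command reads the variable this way should be confirmed against that command's man page.

```python
import os


def env_for_domain(domain_name, base_env=None):
    """Return a copy of the environment with SUNW_HOSTNAME set."""
    env = dict(os.environ if base_env is None else base_env)
    env["SUNW_HOSTNAME"] = domain_name
    return env


# "domain_a" is a placeholder domain name.
env = env_for_domain("domain_a")
print(env["SUNW_HOSTNAME"])
```

The resulting dictionary could be passed as the env argument of subprocess.run when a script launches an SSP command.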
I/O module
A daughter card containing two SYSIO chips. Each SYSIO
controls an SBus, with sockets for two SBus cards, and an
embedded SCSI and Ethernet interface.
I/O port
Synonym for a caching UPA master port.
JTAG
A serial scan interface specified by IEEE standard 1149.1. The
name comes from Joint Test Action Group, which initially
designed it. See JTAG+.

JTAG+
An extension of JTAG, developed by Sun, which adds a control
line to signal that board and ring addresses are being shifted on
the serial data line. Often referred to simply as JTAG.
Kilo
1,024
LAARB
Local address arbiter. Arbitrates for the local address router.
LAR
Local address router. Four bidirectional 3:1 multiplexers that
connect the three local address buses to four coherent interface
controllers. Implemented inside the four coherent interface
controllers on each board.
LDARB
Local data arbiter. Arbitrates for the local data router.
LDMUX
Local data mux. One mode of the XMUX.
LDR
Local data router. Two unidirectional 144-bit-wide 4:1
multiplexers that connect the four UPA databuses on a system
board with the global data router. Implemented by four XMUX
ASICs per system board.
Mbits/sec
Megabits per second.
MBus
A 64-bit wide, circuit switched bus, used by sun4m architecture
desktop systems from Sun Microsystems™.
Mbytes/sec
Megabytes per second.
MC
Memory Controller chip. Accepts memory addresses from the
four coherent interface controllers and data from the Starfire
data buffer (XDB), and performs reading and writing of 64-byte
blocks of data into one to four banks of memory.
Mega
1,024 x 1,024 = 1,048,576.

Memory bank
512 data bits and 64 ECC bits wide, made up of eight DIMMs.
Memory module
A daughter card containing 32 DIMMs.
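
The figures in the Memory bank and Memory module entries fit together arithmetically. The short Python check below assumes that the bits of a bank are spread evenly across its eight DIMMs, which the glossary does not state explicitly.

```python
DATA_BITS_PER_BANK = 512
ECC_BITS_PER_BANK = 64
DIMMS_PER_BANK = 8
DIMMS_PER_MODULE = 32

bank_width_bits = DATA_BITS_PER_BANK + ECC_BITS_PER_BANK
bits_per_dimm = bank_width_bits // DIMMS_PER_BANK       # assumes an even split
banks_per_module = DIMMS_PER_MODULE // DIMMS_PER_BANK
data_bytes_per_access = DATA_BITS_PER_BANK // 8

print(bank_width_bits)        # total bank width in bits
print(bits_per_dimm)          # bits contributed by each DIMM
print(banks_per_module)       # banks on one memory module
print(data_bytes_per_access)  # data bytes per bank access
```

The results agree with other entries: the memory controller handles 64-byte blocks in one to four banks, and the XMUX pack/unpack mode is 144/576 bits wide.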
OBP
OpenBoot PROM. The code that allows a SPARC system to
boot. Located in PROM on Sun servers.
PC
Port controller ASIC. Interfaces UPA modules to the Starfire
system. The PC controls address flow between the UPA port
and the four coherent interface controllers, and controls data
flow between the UPA port and the Starfire data buffer (XDB).
PCI
Peripheral component interconnect bus interface. An industry
standard I/O attachment bus used in Sun processors,
supplanting the SBus.
Platform
The official platform type for the Enterprise 10000 machine is
Ultra Enterprise 10000. The platform name is a logical name
given to an Enterprise 10000 machine. A platform or an
Enterprise 10000 machine does not correspond to any host on
the network.
POST
Power-on self-test.
PUP
Pack/unpack. A mode of the XMUX ASIC used on the memory
module.
Quadword
Sixteen bytes. The byte order is big endian. In noncached
read/write transactions, valid bytes within the quadword are
identified with a 16-bit bytemask.
R/W
Read/write.
RED_state trap (redmode)
A SPARC V9 processor takes a trap at offset 0xa0 (PA =
0x1ff f000 00a0) when the processor processes a reset, when a
trap occurs and TL is at MAX_TL - 1 (4 for the Enterprise XX00),
or when the system software has set PSTATE.RED to 1.

RO
Read only.
RPC
Remote procedure call.
SBus
A Sun-designed I/O bus, now an open standard.
SC2000
SPARCcenter™ 2000. Has up to 20 SuperSPARC processors
interconnected by two XDbuses.
SIMM
Single in-line memory module. Single refers to the fact that the
corresponding pins on each side of the edge connector are tied
together, so that there is only a single row of pins.
SIR
Software initiated reset. A software initiated reset is initiated by
an SIR instruction within any processor. This per-processor reset
has trap type 4 at physical address offset 0x80 (PA = 0x1ff
f000 0080).
SMP
Symmetric multiprocessor. Mainstream parallel systems.
Memory space is shared, and equally accessible to all the
processors. Caches are kept coherent by hardware mechanisms.
SOC
Serial optical channel. Connects two Fibre Channels to an SBus.
SOC+
Second generation fibre channel incorporating FC-AL, Fibre
Channel Arbitrated Loop.
SRAM
Static RAM. These are memory chips that retain their contents
as long as power is maintained.
SS1000
SPARCserver™ 1000. Has up to eight SuperSPARC processors
interconnected by one XDBus.
SSP
System service processor. A networked SPARCstation™ from
which the system is booted and diagnosed.

sun4d
The system architecture of the current server systems that use
the XDBus, such as the CS6400 system.
sun4m
The system architecture of current desktop systems using the
MBus.
sun4u
The system architecture of the UltraSPARC systems.
Sun Microelectronics
A division of Sun Microsystems that was formed in April 1993
to develop, design, and distribute SPARC technologies and
products worldwide. Sun Microelectronics’ portfolio includes
microprocessors, chipsets, modules, boards, technology
licenses, silicon and system packages and consulting services.
Sun Microelectronics has more than 500 employees working in
product development, engineering, marketing and international
sales and support.
SuperSPARC
SPARC processor used in the CS6400 system. Interfaces via the
XBus to the XDBus.
System board
A large circuit board, containing a memory module, four
UltraSPARC processor modules, and two I/O modules.
System clock
The interconnect clock. It is centrally distributed to all UPA
ports and within the interconnect.
System controller
Also referred to as the “SC.” This is the central controller in the
interconnect, and orchestrates the cache coherency, data flow,
flow control, and memory operations.
UDB
UltraSPARC data buffer. Two chips that connect the
UltraSPARC-I processor and its external cache to the 144-bit
wide UPA port.
UltraSPARC-II processor
SPARC processor used in the Starfire™ system. Interfaces to a
UPA port.

UltraSPARC-II processor module
A small circuit board containing an UltraSPARC-II processor,
two UltraSPARC-II data buffer chips, one external-cache tag
SRAM, and four external-cache data SRAMs.
Uncorrectable Error
Same as unrecoverable error.
Unrecoverable error
An error that cannot be corrected through hardware or
software action. The error is detected by the hardware and
indicates that data has been lost. This type of error is fatal and
results in an arbiter stop condition.
UPA
UltraSPARC port architecture. Defines the processor and DMA
interface to shared memory through a cache-coherent
interconnect for a family of uni- and multiprocessors designed
around the V.9 UltraSPARC processor.
UPA_Addressbus
The UPA Addressbus can be a bus, or a point-to-point
interconnection between the SCs and UPAs. For descriptive
purposes, the address path is sometimes also referred to as the
UPA_Addressbus.
UPA_Databus
The UPA interconnect data path can be a bus, a switch, or a
combination of the two. For descriptive purposes, the data path
is sometimes also referred to as the UPA_Databus.
UPA master port
A UPA port which can initiate data transfer actions on the
interconnect.
UPA slave port
A UPA port which can only be the recipient of a transaction. A
slave port does not generate transactions. A slave port has an
address space associated with it for programmed I/O, and
implements the port-ID registers. A slave port also handles
copyback requests for cache blocks in UPA ports which support
a coherent cache, and handles interrupt transactions in a UPA
port which contains a processor.

Watchdog reset (WDR)
A SPARC V9 processor signals itself internally to take a
Watchdog Reset (WDR) trap at physical address offset 0x40
(PA = 0x1ff f000 0040) when a trap occurs and TL is at
MAX_TL (5 for Enterprise X000).
WO
Write only.
XARB
Enterprise 10000 arbiter chip. Has modes to implement three
functional units: the local address arbiter, the local data arbiter,
and global address arbiter.
XBAR
Data crossbar, an application of the XMUX on the centerplane.
XBus
Connects between the MBus XBus Cache Controller (MXCC) of
a SuperSPARC, and one to four Bus Watchers (BW), which in
turn connect to the XDBus.
XDB
Enterprise 10000 data buffer chip. Buffers cache lines that are
in transit between a UPA data port or a memory bank and the
local data router.
XDBus
A 64-bit wide, 40–60 MHz, packet-switched, combined address
and data, set of 1-4 buses. Used by the CS6400 system.
Xir
Externally initiated reset. An externally initiated reset (Xir) is
sent to the CPU via the XIR pin. It causes a SPARC V9 XIR
which has a trap type 0x3 at physical address offset 0x60 (PA =
0x1ff f000 0060). It has higher priority than all other resets
except POR (power-on reset).
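
The reset entries in this glossary (WDR, Xir, SIR, and RED_state) each give a trap offset from a common vector base. The Python sketch below tabulates them, assuming the base physical address 0x1ff f000 0000 implied by the RED_state and Xir entries. The names and the formatting helper are illustrative assumptions.

```python
# Assumed vector base, written 0x1ff f000 0000 in this glossary.
RED_VECTOR_BASE = 0x1FFF0000000

# Offsets as given in the WDR, Xir, SIR, and RED_state entries.
RESET_OFFSETS = {
    "WDR": 0x40,
    "XIR": 0x60,
    "SIR": 0x80,
    "RED_state": 0xA0,
}


def reset_vector_pa(name):
    """Physical address of the trap vector for the named reset."""
    return RED_VECTOR_BASE + RESET_OFFSETS[name]


def format_pa(pa):
    """Format an address in the grouped 3-4-4 hex-digit style of this glossary."""
    digits = "%011x" % pa
    return "0x%s %s %s" % (digits[:3], digits[3:7], digits[7:])


for name in RESET_OFFSETS:
    print(name, format_pa(reset_vector_pa(name)))
```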
XMUX
Crossbar multiplexer chip. Has modes to implement four
functional units: the global address router, the local data router,
the global data router, and the 144/576-bit wide pack/unpack
on the memory module.

Index

Symbols abort_detach 9-63


$SSPETC 3-14 accounts
$SSPLOGGER 3-15 SSP 3-11
$SSPOPT 3-15 active control board 3-40
$SSPVAR 3-15 add_install_client 6-7
$SUNW_HOSTNAME 3-14 adding detach-safe devices 9-53
.postrc 7-15 alternate pathing
logfile 7-16 boot-device devalias 8-76
mem_board_interleave_ok 9- database partition 8-23
58 delete meta-device 8-72
/etc/hosts 3-8 primary network
/etc/inet/ntp.conf 6-20 considerations 8-47
/etc/init.d/sysetup 10-25 amount of memory
/etc/inittab 3-34, 7-4 attachable 9-33
/etc/notrouter 3-7 AP
/etc/opt/SUNWssp 3-14 boot 8-74
/etc/opt/SUNWxntp/ntp.conf recovery 8-80
6-21 concepts 8-14
/etc/ssphostname 6-15 configuration 8-20
/etc/system 9-53, 9-80, 9-82, FDDI 8-41
10-21, 10-38 installing 8-13
/opt/SUNWssp 3-15 meta-device
/tftpboot 3-41 drivers 8-5
/var/opt/SUNWssp 3-15 mirroring 8-21
/var/opt/SUNWssp/adm 3-15 primary network
~ netcon commands 5-39 interface 8-47
single user mode 8-82
Solstice Disk Suite 8-10, 8-66
A state database 8-23
A3000 8-10, 8-54, 9-51 supported devices 8-10
A5000 8-10, 8-54 versions 8-12
abort button 9-22, 9-67 Volume Manager 8-10, 8-65
abort_attach 9-21

AP database boundary scan 2-26
create 8-26 bringup 1-31, 7-19
deleting 8-31 control flow 7-8
on AP disks 8-28 Hostview 5-37
refresh 8-27 syntax 7-9
status 8-29 buttons
apboot 8-64, 8-75 abort 9-22, 9-67
mirrored drive 8-78 complete 9-22, 9-67
apconfig 8-29 CPU 9-31
apdb 8-26, 8-31 device 9-35
apdisk 8-60 dismiss 9-22, 9-67
apinst 8-55 force 9-67
apnet 8-38 help 9-22, 9-67
arbstop 10-31 reconfig 9-22, 9-67
array disk select 9-23
device alias B-5
attach C
complete 9-20, 9-25
dr shell 9-18 cabinet components 2-6
Hostview 9-22 cb_prom 4-16
init 9-19 cb_reset 4-16
attach buttons 9-22 cbe (control board
attachable memory 9-33 executive) 3-36
autoconfig 10-13 cbs (control board server) 3-36,
4-15
centerplane configuration 5-35
B centerplane support board 2-27
blacklist 1-29, 7-17 cmdtool 5-32
blacklisting components 5-46 colors
clearing 5-51 Hostview domain 5-33
Hostview 5-49 Hostview processors 4-36
processors 5-50 command line
board detach 9-39 creating domains 5-22
board location removing domains 5-26
PCI 7-37 commands
SBus 7-36 add_install_client 6-7
board, attach 9-23 apboot 8-75
boot apconfig 8-29
AP apdb 8-26, 8-31
mirrored disk 8-78 apdisk 8-60
recovery 8-80 apinst 8-55
AP disk 8-74 apnet 8-38
domain 7-8 autoconfig 10-13
SSP 7-4 bringup 7-19
boot-device and AP 8-76 cmdtool 5-32
bound threads 9-31 domain_create 5-22

domain_history 5-18 NTP A-1
domain_remove 5-26 2.5.1 6-21
domain_rename 5-29 2.6 6-20
domain_status 5-17 packages 3-29
domain_switch 5-20 saving 3-16
download_helper 7-21 Solaris 3-18
drshow 9-20 SSP
edd_cmd 4-20 boot server 6-6
fan 4-51 SSP network 3-6
hostinfo 4-29 configuration requirements,
hostint 10-24 domain 5-7
hostreset 10-26 configuring NTP 4-22
modunload 9-55 connecting the SSP 1-29
netcon control 5-39 console
netcon_server 7-23 paths 7-25
netcontool 5-40 control board 2-27, 2-29
ntpq A-8 active 3-40
ntptrace A-8 configuration
obp 7-28 changing 3-42
obp_helper 7-19 file 3-37
psrset 5-52 dual configuration 3-35
redx 10-35 loading 3-41
savecore 10-25 network configuration 3-6
sigbcmd 10-24 switching active 3-39
sys_id 5-12 control board executive
xntpdc A-8 (cbe) 3-36
complete button 9-22, 9-67 control board server (cbs) 3-36,
complete_attach 9-25 4-15
complete_detach 9-42 CPU button 9-31
dr shell 9-63 CPU configuration window 9-31
Hostview 9-68 creating
concurrent serviceability 1-37 domains
configuration command line 5-22
AP 8-20 Hostview 5-24
centerplane 5-35 eeprom.image 5-10
control board 3-37 cvcd 7-25
changing 3-42
control board network 3-6 D
domain network 3-7
meta-disk 8-57 daemons
meta-network 8-36 cbs 4-15
network 3-5 edd 4-17
network planning event handling 4-19
worksheet 3-9 fad 4-21
SSP 4-14
delete meta-device 8-72

detach 9-39 preinstalled software 6-22
buttons 9-67 properties 1-10
dr shell 9-61 removing
FDDI 9-48 command line 5-26
Hostview 9-64 Hostview 5-28
memory errors 9-60 rename
network devices 9-48 command line 5-29
non-network devices 9-50 renaming
processors 9-42 Hostview 5-31
detach-safe devices SSP control 5-21
adding 9-53 status
detach-safe list 9-53 command line 5-17
detach-unsafe 9-53 Hostview 5-19
devalias 6-9, B-1 switching 5-20
devalias boot-device and domain messages files 5-18
AP 8-76 domain_create 5-22
device alias B-1 domain_history 5-18
array disk B-5 domain_remove 5-26
disk B-3 domain_rename 5-29
network interface B-7 domain_status 5-17
device button 9-35 domain_switch 5-20
device tree 7-36 domains
OpenBoot PROM 7-36 configuration
diagnosing problems 10-15 requirements 5-7
disk download_helper 7-21
device alias B-3 DR and processor sets 5-53
pathgroup DR detach 9-39
automatic switch 8-70 DR overview 1-15
components 8-53 dr shell
create 8-60 attach 9-18
delete 8-72 detach 9-61
switch 8-68 drain 9-41
viewing 8-51 dr shell 9-62
dismiss button 9-22, 9-67 percent complete 9-34
DMP 8-65 drain button 9-67
domain driftfile A-6
/etc/hosts 3-8 driver
bringup AP meta-device 8-5
Hostview 5-37 dr-max-mem environment
creating variable 7-33, 9-8, 9-33
command line 5-22 drshow 9-20, 9-29
Hostview 5-24 dump
environment variables 7-33 hardware 10-33
Hostview colors 5-33 panic 10-25
netcon window 5-32
network configuration 3-7

E H
edd 4-17 hardware dump 10-33
event handling 4-19 help button 9-22, 9-67
edd_cmd 4-20 hostid 5-16
eeprom.image 7-29 hostinfo 4-29
backup copies 7-31 hostint 10-24
creating 5-10 hostreset 10-26
enabling detach 9-45 Hostview 4-30
encapsulated root 8-74 attach 9-22
environment variables blacklist 5-49
domain 7-33 buttons
dr-max-mem 9-8, 9-33 complete 9-25, 9-68
SSP 3-14 drain 9-67
error init_attach 9-22, 9-24
logging 1-39 memory 9-32
logs 4-41 colors
memory 10-20 processors 4-36
Ethernet address 5-16 controlling fans 4-53
exclusive session (netcon) 5-43 detach 9-64
buttons 9-67
F domain
bringup 5-37
fad 4-21 colors 5-33
fan 4-51 creating 5-24
controlling in Hostview 4-53 removing 5-28
monitoring in Hostview 4-54 renaming 5-31
speed 4-53 status 5-19
tray display 4-55 icons, meaning of 4-35
FDDI main window 4-33
AP 8-41 memory configuration
detach 9-48 window 9-32
meta-network names 8-37 monitoring fans 4-54
features monitoring power 4-47
SSP 1-21 monitoring temperature 4-49
fencing netcontool 5-40
processors 5-52 performance
figure of merit 1-30, 10-5 considerations 4-32
file symbols, meaning of 4-35
domain messages 4-41 windows
eeprom.image 5-10 detach properties 9-66
log 4-41 hostview 1-21
force button 9-67 hot swap 1-37
hpost
.postrc 7-15

-D option 10-33 memory
syntax 7-12 configuring for detach 9-58
hung host 10-28 draining, detach 9-41
errors 10-20
I detach 9-60
interleaving 9-58
I/O devices, configuring for pageable 9-69
detach 9-46 permanent 9-69
icons, Hostview 4-35 reduction, detach 9-34
ID PROM 5-10, 7-29 subsystem 2-22
idn-smr-size 7-35 usage 9-8
init_attach memory attach capacity 9-33
button 9-24 message files
dr shell 9-19 domain 4-41, 5-18
Hostview 9-22, 9-24 meta-disk 8-15
installation configuration 8-57
AP 8-13 meta-network 8-18
domain 6-11 configuration 8-36
SMCC software mirrored drive
packages 6-16 apboot 8-78
Solaris 3-18 mirroring
SSP software packages 3-26 AP 8-21
xntp 3-22 modunload 9-55
Inter-Domain Networking 5-6 monitoring power,
interface card locations 7-36 Hostview 4-47
interleaving memory 9-58 monitoring temperature,
Hostview 4-49
J
JTAG 1-37, 2-26 N
name service 3-24
L netcon 1-22, 1-26, 5-39
locations control commands 5-39
interface card 7-36 data paths 7-25
processor 7-38 overview 5-38
locked write (netcon) 5-43 session types 5-42
log file netcon ~ commands 5-39
domain message 10-4 netcon_server 7-23
SSP 4-41 netcontool 5-40
logfile .postrc directive 7-16 buttons 5-45
terminal type 5-44
network
M AP
MAC address 5-16 primary interface 8-47
mem_board_interleave_ok 7-16, AP devices 8-11
9-58 console paths 7-25

detach 9-48 P
inter-domain 5-6 packages
pathgroup configuration 3-29
create 8-38 pageable memory 9-69
delete 8-45 panic 10-24
switch 8-43 pathgroup
pathgroup viewing 8-33 disk 8-16
planning worksheet 3-9 automatic switch 8-70
network configuration 3-5 components 8-53
control board 3-6 create 8-60
domain 3-7 delete 8-72
SSP 3-6 switch 8-68
Network Console window 1-26 viewing 8-51
network interface network 8-19
device alias B-7 create 8-38
network planning 3-4 delete 8-45
non-network devices switch 8-43
detach 9-50 viewing 8-33
NTP pbind 9-31
configuration A-1 PCI
2.5.1 6-21 slot decoding 7-37
2.6 6-20 PCI card locations 7-36
configuring 4-22 peer 6-21
driftfile A-6 percent complete
files A-2 drain 9-34
server strata A-3 permanent memory 9-69
SSP 3-22 planning
synchronization A-4
ntpq A-8 port specification
ntptrace A-8 control board 3-41
nvalias 6-9, B-1 POST 7-12
NVRAM 5-10 power
monitoring power,
O Hostview 4-47
OBP 7-26 power command 4-44
device tree 7-36 Power On Self Test (POST) 1-22
obp command 7-28 primary network and AP 8-47
obp_helper 7-19 processor
OpenBoot PROM 1-22, 1-32, Blacklist
7-19, 7-26, 7-28, B-1 Hostview 5-50
device alias 6-9 detach 9-42
device tree 7-36 locations 7-38
environment variables 7-33 processor sets 5-52
DR 5-53

PROM serial number 5-16
OpenBoot 7-26 session types
properties of domains 1-10 netcon 5-42
SEVM 8-65
Q sigbcmd 10-24
single user mode
quiesce 9-88 AP 8-82
example 9-83 sir-sync? 7-34
failures 9-75 SMCC
purpose 9-73 software packages
2.5.1 6-18
R 2.6 6-16
RAS software
concurrent serviceability 1-37 installing packages 3-29
error logging 1-39 software packages
read only session (netcon) 5-42 SMCC
real time thread 9-74 2.5.1 6-18
reboot request 10-23 2.6 6-16
reconfig button 9-22, 9-67 SSP 3-12
reconfiguring installation 3-26
SSP 3-32 Solaris 6-4
redlist 7-18 SSP version 1-21
redmode 10-26 Solaris, installation and
redmode-reboot? 7-34 configuration 3-18
redmode-sync? 7-34 Solstice Disk Suite 8-10
reduction of memory, AP 8-66
detach 9-34 source domain, attach 9-23
redx 10-35 speed
removing domains controlling fan 4-53
Hostview 5-28 ssi-smr-size 7-35
rename SSP
domain /etc/hosts 3-8
command line 5-29 accounts 3-11
Hostview 5-31 boot 7-4
reset 10-26 boot server 6-6
handling 7-34 configuring 3-28, 3-29
restarted daemons 7-7 daemons 7-7
restart 7-7
domain control 5-21
S domain messages files 5-18
savecore 10-25 environment variables 3-14
saving configuration files 3-16 features 1-21
SBus files
slot decoding 7-36 backup 3-16
SBus card location 7-36 network configuration 3-6
select button 9-23 network privacy 3-7

reconfiguring 3-32 terminal type
software packages 3-12 netcontool 5-44
Solaris version 1-21 testing
type SunVTS 10-16
changing 3-33 threads
user environment 1-25 bound 9-31
window 1-25
SSP daemons 4-14 U
ssp_config 3-28, 3-32
ssp_startup 7-6 ultra port architecture 2-24
ssp_startup.main 7-7 unlocked write (netcon) 5-42
ssp_startup.restart_main 7-7 unsafe device 9-77
ssphostname 6-15 updates CD
state database contents 6-17, 6-19
AP 8-23 user environment 1-25
storage array disk
device alias B-5 V
strata versions
NTP A-3 AP 8-12
stratum-2 server A-3 viewing system
stratum-3 server A-3 information 9-28
SunVTS 10-16 Volume Manager 8-10, 8-65
support boards 2-27 boot 8-74
supported devices DMP 8-65
AP 8-10 vxdctl 8-65
suspend-bypass devices 9-81 VxVM 8-65
suspend-unsafe device 9-77 boot 8-74
swap space, configuring for
detach 9-56
switch W
active control board 3-39 watchdog 10-26
domains 5-20 watchdog-reboot? 7-35
symbols in Hostview 4-35 watchdog-sync? 7-34
synchronization subnet A-3 windows
sys_id 5-12 CPU configuration 9-31
system information, Hostview
viewing 9-28 detach properties 9-66
System Service Processor 1-21 Hostview main window 4-33
sys-unconfig 3-32 memory configuration 9-32
netcon 1-26
T netcontool 5-44
Network Console
target domain, attach 9-23
window 1-26
temperature
SSP window 1-25
monitoring temperature,
worksheet
Hostview 4-49
network planning 3-9

X
xir 10-26
xir-sync? 7-34
xntp
installing
SSP 3-22
xntpdc A-8

