Dolphin Express for MySQL

Installation and Reference Guide

Dolphin Interconnect Solutions ASA


This document describes the Dolphin Express software stack version 3.3.0. Published November 13th, 2007. Copyright © 2007 Dolphin Interconnect Solutions ASA
Published under the GNU General Public License v2

Table of Contents

1. Introduction & Overview
2. Requirements & Planning
3. Initial Installation
4. Update Installation
5. Verifying Functionality and Performance
6. Interconnect and Software Maintenance
7. Advanced Topics
8. MySQL Operation
9. FAQ
A. Self-Installing Archive (SIA) Reference
B. sciadmin Reference
C. Configuration Files
D. Platform Issues and Software Limitations

Abstract

This document describes the installation of the Dolphin Interconnect Solutions (DIS) Dolphin Express interconnect hardware and the DIS software stack, including SuperSockets, on single machines or on a cluster of machines. This software stack is needed to use Dolphin's Dolphin Express high-performance interconnect products and consists of drivers (kernel modules), user space libraries and applications, an SDK, documentation and more. SuperSockets drastically accelerate generic socket communication as used by clustered applications.

Chapter 1. Introduction & Overview

1. Who needs Dolphin Express and SuperSockets?

Clustered applications running on multiple machines that communicate via an Ethernet-based network often suffer from the delays that occur when data needs to be exchanged between processes running on different machines. These delays, caused by the communication time, make processes wait for data when they could otherwise perform useful work.

Dolphin Express is a combination of high-performance interconnect hardware that replaces the Ethernet network and a highly optimized software stack. One part of this software stack is SuperSockets, which implements a bypass of the TCP/UDP/IP protocol stack for standard socket-based inter-process communication. This bypass moves data directly via the high-performance interconnect and thereby reduces the minimal latency typically by a factor of 10 and more, with 100% binary application compatibility. Using this combined software/hardware approach with MySQL Cluster, throughput improvements of 300% and more for the TPC-C like DBT2 benchmark have already been measured on small clusters. For larger clusters, this advantage continues to increase as the communication fraction of the processing time increases.

2. How do Dolphin Express and SuperSockets work?

The Dolphin Express hardware provides the means for a process on one machine to write data directly into the address space of a process running on a remote machine. This can be done using either direct store operations of the CPU (for lowest latency) or the DMA engine of the Dolphin Express interconnect adapter (for lowest CPU utilization).

SuperSockets consists of both kernel modules and a user-space library. By being explicitly preloaded, the user-space library operates between the unmodified binary of the application and the operating system and intercepts all socket-related function calls. Based on the system configuration and a potential user-provided configuration, the library makes a first decision whether a function call will be processed by SuperSockets or by the standard socket implementation, and redirects it accordingly. The SuperSockets kernel module then performs the operation on the Dolphin Express interconnect. The implementation on kernel level makes sure that the SuperSockets socket implementation is fully compatible with the TCP/UDP/IP-based sockets provided by the operating system: if necessary, it can fall back and forward to Ethernet transparently, even when the socket is under load.
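The preload mechanism described above can be pictured with the generic LD_PRELOAD technique. The following is an illustration only: the library path and file name below are placeholders, not the actual SuperSockets file names, which depend on the installed software version (check the files installed under the Dolphin installation path, by default /opt/DIS, on your system).

# Illustration of running an unmodified socket application with a preloaded
# interception library. The library name below is a made-up placeholder, NOT
# the real SuperSockets library file:
LD_PRELOAD=/opt/DIS/lib64/libsockets_preload_example.so ./my_socket_application

The application binary itself is not changed; only the dynamic loader is told to resolve the socket calls through the preloaded library first.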

3. Terminology

We define some terms that will be used throughout this document.

adapter: A PCI-to-SCI (D33x series), PCI-Express-to-SCI (D35x series) or PCI-Express fabric (DXH series) adapter. This is the Dolphin Express hardware installed in the cluster nodes.

node: A computer which is part of the Dolphin Express interconnect, which means it has an adapter installed. All nodes together constitute the cluster.

CPU architecture: The CPU architecture relevant in this guide is characterized by the addressing width of the CPU (32 or 64 bit) and the instruction set (x86, Sparc, etc.). If these two characteristics are identical, the CPU architecture is identical for the scope of this guide.

link: A directed point-to-point connection in the SCI interconnect. Physically, a link is the cable leading from the output of one adapter to the input of another adapter.

ringlet: For an SCI interconnect configured in torus topology, the links are connected as multiple closed rings. These rings are called ringlets. For a two-dimensional torus topology, like when using D352 adapters, these rings can be considered to be the columns and rows.

cluster: All nodes constitute the cluster.

frontend: The single computer that is running the software that monitors and controls the nodes in the cluster. For increased fault tolerance, the frontend should not be part of the Dolphin Express interconnect it controls, although this is possible. Instead, the frontend should communicate with the nodes out-of-band, which means via Ethernet.

installation machine: The installation script is typically executed on the frontend, but can also be executed on another machine that is neither a node nor the frontend, but has network (ssh) access to all nodes and the frontend. This machine is the installation machine.

kernel build machine: The interconnect drivers are kernel modules and thus need to be built for the exact kernel running on the node (otherwise, the kernel will refuse to load them). To build kernel modules on a machine, the kernel-specific include files and kernel configuration have to be installed; these are not installed by default on most distributions. You will need to have one kernel build machine available which has these files installed (contained in the kernel-devel RPM that matches the installed kernel version) and that runs the exact same kernel version as the nodes. Typically, the kernel build machine is one of the nodes itself, but you can choose to build the kernel modules on any other machine that fulfills the requirements listed above.

network manager: The network manager is a daemon process named dis_networkmgr running on the frontend. It is part of the Dolphin software stack and manages and controls the cluster using the node managers running on all nodes. The network manager knows the interconnect status of all nodes. It reports status and performs actions like configuring the installed adapter or changing the interconnect routing table if necessary.

node manager: The node manager is a daemon process that is running on each node and provides remote access to the interconnect driver and other node status to the network manager.

self-installing archive (SIA): A self-installing archive (SIA) is a single executable shell command file (for Linux and Solaris) that is used to compile and install the Dolphin software stack, like SuperSockets and SISCI, in all required variants. It largely simplifies the deployment and management of a Dolphin Express-based cluster.

Scalable Coherent Interface (SCI): Scalable Coherent Interface is one of the interconnect implementations that can be used with the Dolphin Express software; the implementations offered by Dolphin are the D33x and D35x series of adapter cards. SCI is an IEEE standard.

SISCI: SISCI (Software Infrastructure for SCI) is the user-level API to create applications that make direct use of the Dolphin Express interconnect capabilities. Despite its inherited name, it also supports other interconnect implementations offered by Dolphin, like DSX.

4. Contact & Feedback: Dolphin Support

If you have any problems with the procedures described in this document, or have suggestions for improvement, please don't hesitate to contact Dolphin's support team via <support@dolphinics.com>. For updated versions of the software stack and this document, please check the download section at http://www.dolphinics.com.

Chapter 2. Requirements & Planning

Before you deploy a Dolphin Express solution, either by adding it to an existing system or by planning it into a new system, some considerations on the selection of products and the physical setup are necessary.

1. Supported Platforms

The Dolphin Express software stack is designed to run on all current cluster hardware and software platforms. Dolphin strives to support every platform that can run any version of Windows, Linux or Solaris and offers a PCI (Express) slot, and also supports and adapts to platforms that are several years old to ensure long-term support. Generally, we qualify certain platforms with our partners, which are then guaranteed to run and perform the qualified application optimally. Next to this general approach, we also test platforms internally and externally for general functionality and performance. For the hardware platforms qualified or tested with Dolphin Express, please see Appendix D, Platform Issues and Software Limitations. If you have questions about your specific hardware platform, please contact Dolphin support.

1.1. Hardware

The Dolphin Express hardware (interconnect adapters) complies with the PCI industry standard (either PCI 2.2 64bit/66MHz or PCI-Express 1.0a) and will thus operate in any machine that offers compliant slots. However, some combinations of CPU and chipset implementations offer sub-optimal performance, which should be considered when planning a new system. A few cases are documented in which bugs in the chipset have shown up with our interconnect, as it puts a lot of load onto the related components. For details, please see Appendix D, Platform Issues and Software Limitations.

Supported CPU architectures are x86 (32 and 64 bit), PowerPC and PowerPC64, Sparc and IA-64. The Dolphin Express interconnect is fully inter-operable between all supported hardware platforms, also with different PCI or CPU architectures. As usual, care must be taken by the applications if data with different endianess is communicated.

1.1.1. Recommended Node Hardware

The hardware platform for the nodes should be chosen from the supported platforms as described above. You need to make sure that each node / machine has one full-height, half-length PCI/PCI-X/PCI-Express slot available. The power consumption of Dolphin Express adapters is between 5W and 15W (consult the separate data sheets for details).

Note: Half-height slots can be used with the Dolphin DXH series of adapters. A half-height version of the SCI adapters will be available soon; please contact Dolphin support for availability.

Next to the Dolphin Express-specific requirements, you need to consult your MySQL Cluster expert / consultant on the recommended configuration for your application.

1.1.2. Recommended Frontend Hardware

The frontend only runs a lightweight network manager service, which does not impose special hardware requirements. However, the frontend should not be fully loaded, to ensure fast operation of the network manager service. The frontend requires a reliable Ethernet connection to all nodes.

1.2. Software

Dolphin Express supports a variety of operating systems that are listed below.

1.2.1. Linux

The Dolphin Express software stack can be compiled for all 2.6 kernel versions and most 2.4 kernel versions. Dolphin only provides source-based distributions, which are to be compiled for the exact kernel and hardware version you are using. A few extra packages (like the kernel include files and configuration) need to be installed for the compilation.

Please refer to the release notes of the software stack version you are about to install for the current list of tested Linux distributions and kernel versions. Installation and operation on Linux distributions and kernel versions that are not in this list will usually work as well, but especially the most recent Linux version may cause problems if it has not yet been qualified by Dolphin. Software stacks operating on different kernel versions are of course fully inter-operable for inter-node communication. (A quick way to record the relevant version information of a node is sketched at the end of this section.)

Dolphin Express fully supports native 32-bit and 64-bit platforms. On 64-bit platforms offering both 32-bit and 64-bit runtime environments, only the native 64-bit runtime environment is supported; however, SuperSockets will support 32-bit applications if the compilation environment for 32-bit is also installed.

1.2.2. Solaris

Solaris™ 2.6 through 9 on Sparc is supported (excluding SuperSockets).

Note: Support for Solaris 10™ on Sparc™ and AMD64 (x86_64) including SuperSockets is under development. Ask Dolphin support for the current status.

1.2.3. Windows

The Dolphin Express software stack operates on 32-bit and 64-bit versions of Windows NT 4.0, Windows 2000, Windows 2003 Server and Windows XP. We provide MSI binary installer packages for each Windows version.

TBA: More information on the available components under Windows.

1.2.4. Others

The Dolphin Express software stack excluding SuperSockets also runs on VxWorks™, LynxOS™ and HP-UX™. Contact Dolphin support with your requirements.
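To compare a node against the list of tested distributions and kernels in the release notes, it helps to record the exact kernel version and distribution release of each node. A minimal sketch, assuming an RPM-based distribution that provides /etc/*-release files:

# Print the kernel version and distribution release of this node; run it on
# every node and compare the output against the release notes:
uname -r
cat /etc/*-release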

2. Interconnect Planning

This section discusses the decisions that are necessary when planning to install a Dolphin Express interconnect.

2.1. Nodes to Equip with Dolphin Express Interconnect

Depending on the application that will run on the cluster, the choice of machines to equip with the Dolphin Express interconnect differs. Although it is possible to equip only a subset of the machines with Dolphin Express, doing so will introduce new bottlenecks. Please analyze the individual scenario for a definitive recommendation.

Note: The machine that runs the Dolphin network manager must not be equipped with the Dolphin Express interconnect, as this would reduce the level of fault tolerance.

2.1.1. MySQL Cluster

For best performance, all machines that run either NDB or MySQL server processes should be interconnected with the Dolphin Express interconnect. Machines that serve as application servers (clients sending queries to MySQL server processes) typically have little benefit from being part of the interconnect. Machines that serve as MySQL frontends (like the one running the MySQL Cluster management daemon ndb_mgmd) do not benefit from the Dolphin Express interconnect.

MySQL Cluster can also be used via the SCI Transporter™ provided by MySQL, using the SISCI interface provided by Dolphin. For more information, please refer to the FAQ chapter.

2.2. Physical Node Placement

Generally, nodes that are to be equipped with Dolphin Express interconnect adapters should be placed close to each other to keep cable lengths short. The maximum cable length is 10m for copper cables. For situations where nodes need to be placed at a significant distance, it is possible to use fiber instead of copper for the interconnect. Please ask Dolphin support for details.

Connecting nodes or blades that are less than 5cm apart, like 1U nodes in a 19" rack being a single rack unit apart, typically causes no problem as long as the effective bend radius is 25mm or more. The minimal cable bend radius is 25mm, which allows nodes to be placed 5cm apart.

For large clusters, it makes sense to arrange the nodes in analogy to the regular 2D-torus topology of the interconnect. This reduces costs and allows for better routing of the cables. Once you have decided on the physical node placement, the cables should be ordered in the according lengths. The Dolphin sales engineer will assist you in selecting the right cable lengths.

2.3. Interconnect Topology

For small clusters of just two nodes, a number of possible approaches exist:

• Lowest cost: Connect the two nodes with one single-channel D351 adapter in each node.
• Best scalability: Use D352 adapters to connect the two nodes. The second dimension of this 2D adapter will not be used, but it is possible to expand this cluster in a fault-tolerant way.
• Highest performance: Connect the two nodes with one dual-channel D350 adapter in each node. This increases the bandwidth and adds redundancy at the same time.

When going to 4 nodes or more, the topology has to be a multi-dimensional torus, typically a 2D torus built with D352 PCI-Express-to-SCI adapters. This is the typical topology for database clusters. With D352 adapters, you can build clusters of any size up to a recommended maximum of 256 nodes. For larger clusters, it is possible to use a 3D torus topology. For 3- and 4-node clusters, special topologies without any fail-over delays can be built. Please contact Dolphin support for other cabling scenarios.

It is possible to install more than one interconnect fabric in a cluster by installing two adapters into each node. These interconnect fabrics work independently from each other and very efficiently and transparently increase the bandwidth and throughput (a real factor of two for two fabrics) and add redundancy at the same time.

Note: For some chipsets, PCI performance does not scale well, reducing the performance improvement of a second fabric. If this feature is important for you, contact Dolphin support to make sure that you choose a chipset for which the full performance will be delivered.

Chapter 3. Initial Installation

This chapter guides you through the initial hardware and software installation of Dolphin Express and SuperSockets. This means that no Dolphin software is installed on the nodes or the frontend prior to these instructions. To update an existing installation, please refer to Chapter 4, Update Installation. To add new nodes to a cluster, please refer to "Adding Nodes".

1. Overview

The initial installation of Dolphin Express hardware and software will follow these steps, which are described in detail in the following sections:

1. Verification of installation requirements: study Section 2, "Installation Requirements".
2. Installation of interconnect adapters on the nodes (see Section 3, "Adapter Card Installation").
   Note: The cables should not be installed in this step!
3. Installation of software and cables. This step is refined in Section 4, "Software and Cable Installation".

2. Installation Requirements

The recommended installation procedure, which is described in this chapter, uses the Self-Installing Archive (SIA) distribution of the Dolphin Express software stack, which can be used for all Linux versions that support the RPM package format. Installation on other Linux platforms is covered in the section "Unpackaged Installation".

For the SIA-based installation of the full cluster and the frontend, the following requirements have to be met:

• Homogeneous cluster nodes: All nodes of the cluster are of the same CPU architecture and run the same kernel version. The frontend machine may be of different CPU architecture and kernel version!
   Note: The installation of the Dolphin Express software on a system that does not satisfy this requirement is described in the section "Installation of a Heterogeneous Cluster".
• RPM support: The Linux distribution on the nodes, the frontend and the installation machine needs to support RPM packages. Both major distributions from Red Hat and Novell (SuSE) use RPM packages.
   Note: On platforms that do not support RPM packages, it is also possible to install the Dolphin Express software. Please see the section "Unpackaged Installation" for instructions.
• Installed RPM packages: To build the Dolphin Express software stack, a few RPM packages that are often not installed by default are required: qt and qt-devel (> version 3), rpm-build, glibc-devel and libgcc (32- and 64-bit, depending on which binary formats should be supported), and the kernel header files and configuration (typically a kernel-devel or kernel-source RPM that exactly(!) matches the version of the installed kernel).
   Note: The SIA will check for these packages, report which packages might be missing, and offer to install them if the yum RPM management system is supported on the affected machine. If the qt RPMs are not available, the Dolphin Express software stack can be built nevertheless, but the GUI applications to configure and manage the cluster will not be available. All required RPM packages are within the standard set of RPM packages offered for your Linux distribution, but may not be installed by default.
• Disk space: To build the RPM packages, about 500MB of free disk space in the system's temporary directory (typically /tmp on Linux) is required on the kernel build machine and the frontend.
   Note: It is possible to assign SIA a specific temporary directory for building using the --build-root option.
• GUI support: For the initial installation, the installation machine should be able to run GUI applications via X. However, a GUI is not strictly required; see Section 2.2, "Non-GUI Installation", on how to install the software stack in this case.

A sketch for checking these requirements on a machine follows this list.
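The following is a minimal sketch of such a pre-installation check on an RPM-based node or build machine; the package names are the ones listed above, and the exact output format depends on your distribution's tools:

# Check that the build prerequisites listed above are installed:
rpm -q qt qt-devel rpm-build glibc-devel libgcc kernel-devel

# The kernel-devel (or kernel-source) package must match the running kernel exactly:
uname -r

# Roughly 500MB of free space is needed in the temporary build directory:
df -h /tmp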

2.1. Live Installation

Dolphin Express™ can be installed into a cluster which is currently under operation, without stopping the cluster application from working. It is only necessary to turn off each node once to install the adapter. This requires that the application running on the cluster can cope with single nodes going down. The software installation can be performed under load, although minor performance impacts are possible; please see the section "Installation under Load".

2.2. Non-GUI Installation

The Dolphin software includes two GUI tools:

• dishostseditor is a tool that is used to create the interconnect configuration file /etc/dis/dishosts.conf and the network manager configuration file /etc/dis/networkmanager.conf. It is needed once on the initial cluster installation, and each time nodes are added or removed from the cluster.
• sciadmin is used to monitor and control the cluster interconnect.

Note: If the required configuration files are already available prior to the installation, a GUI is not required at all.

2.2.1. No X / GUI on Frontend

If the frontend does not support running GUI applications, but another machine in the network does, you can still use the SIA-based installation. In this scenario, the dishostseditor will be compiled, installed and executed on the installation machine, and the generated configuration files will be transferred to the frontend by the installer. This installation mode is chosen by executing the SIA on the installation machine and specifying the frontend name when being asked for it. The only requirement is ssh access from the installation machine towards the frontend and all nodes.

2.2.2. No X / GUI Anywhere

If no machine in the network has the capability to run GUI applications, it is necessary to create the correct configuration files on another machine and store them in /etc/dis on the frontend before executing the SIA on the frontend (not on another machine). In this scenario, no GUI application is run at all during the installation.

To create the configuration files on another machine, you can either run the SIA with the --install-editor option if it is a Linux machine, or install a binary version of the dishostseditor if it is a Windows-based machine. Alternatively, you can send the necessary information for creating the configuration files to Dolphin support, which will then provide you with the matching configuration files and the cabling instructions. This information includes:

• external hostnames (or IP addresses) of all nodes
• adapter type and number of fabrics (1 or 2)
• hostnames (or IP addresses/subnet) which should be accelerated with SuperSockets (default is the list of hostnames provided above)
• planned interconnect topology (default is derived from number of nodes and adapter type)
• description of how the nodes are physically located (to avoid cabling problems)

3. Adapter Card Installation

To install an adapter into a node, the node needs to be powered down. Make sure you are properly grounded to avoid static discharges that may destroy the hardware. Insert the adapter into a free PCI slot that matches the adapter:

• any type of PCI slot for D33x adapters
• 4x, 8x or 16x PCI-Express slot for D351 and D352 adapters
• 8x or 16x PCI-Express slot for D350 adapters

It is recommended that all nodes are set up identically, thus use the same slot for the adapter (if the node hardware is identical). When the card is properly inserted and fixed, close the enclosure and power up the node again. The LED(s) on the adapter slot cover have to light orange. Proceed this way with all nodes.

You do not yet need to connect the cables at this point: detailed cabling instructions customized for the specific cluster will be created during the software installation, which will guide you through the cabling. Generally, though, the cables can safely be connected and disconnected with the nodes powered up, as SCI is a fully hot-plug capable interconnect.

Note: If you know how to connect the cables, you can do so now. Please be advised to inspect and verify the cabling for correctness as described in the remainder of this chapter.
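After powering a node back up, it can be reassuring to confirm that the operating system sees the new adapter before moving on. A minimal sketch; the exact vendor/device string printed by lspci depends on the adapter model, so the grep pattern below is only an assumption:

# List PCI devices and look for the Dolphin adapter; adjust the pattern if your
# adapter reports a different vendor/device string:
lspci | grep -i dolphin

# If unsure, save the full output and compare it with a listing taken before
# the adapter was installed:
lspci > /tmp/pci-with-adapter.txt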

. and that the default answer is marked by a capital letter. Self-Installing Archive (SIA) Reference.0.sh The script will ask questions to retrieve information for the installation. The kernel build machine needs to have the kernel headers and configuration installed. but can be any other machine if necessary (see Section 2.52 $ of: 2007/11/09 16:31:32 $) #+ #* Installing a full cluster (nodes and frontend) .Initial Installation • The SIA is executed on the installation machine with root permissions. • The node RPMs with the kernel modules are installed on all nodes. It will in turn configure all nodes according to the configuration files.3. Please note that the complete installation is logged to a file which is shown at the very top (here: /tmp/ DIS_install. • The binary RPMs for the nodes and the frontend are built on the kernel build machine and the frontend.. Therefore. the installer will ask you for the hostname of the designated frontend machine. respectively. while the frontend and the installation machine only compile user-space applications. The root passwords for all machines are required for this. This requires user interaction. A typical installation looks like this: [root@scimple tmp]# sh DIS_install_3. The cluster is now ready to utilize the Dolphin Express interconnect. In case of installation problems.sh Verifying archive integrity.log_140). and the network manager is installed and started on the frontend. For other operation modes. SIA offers to set this up during the installation. If password-less ssh access is not set up between the installation machine. please refer to Appendix A. • A number of tests are executed to verify that the cluster is functional and to get basic performance numbers. 9 . frontend and nodes. Starting the Software Installation Log into the chosen installation machine. If you answer n.2. installation and test operations on the remote nodes via ssh. “No X / GUI on Frontend”).3. All good. become root and make sure that the SIA file is stored in a directory with write access (/tmp is fine). such to install specific components on the local machine.0 #* Logfile is /tmp/DIS_install. the interconnect is not yet configured. the kernel modules are loaded and the node manager is started. • The cluster configuration files are transferred to the frontend. #+ #+ All available options of this script are shown with option '--help' # >>> OK to proceed with cluster installation? [Y/n]y # >>> Will the local machine <tiger-0> serve as frontend? [Y/n]y The default choice is to use the local machine as frontend. • On an initial installation. The installation machine is typically the machine to serve as frontend. You will notice that all questions are Yes/no questions. Execute the script: # sh DIS_install_<version>. At this stage.2. #* This script will install Dolphin Express drivers. password-less ssh to all remote nodes is required. 4. Each cluster needs its own frontend machine.log_140 on tiger-0 #* #+ Dolphin ICS . The SIA controls the building. this file is very useful to Dolphin support.1.Software installation (version: 1. tools and services #+ on all nodes of the cluster and on the frontend node. Uncompressing Dolphin DIS 3. which can be chosen by just pressing Enter. the dishostseditor is installed and executed on the installation machine to create the cluster configuration files.

#* NOTE: Cluster configuration files can be specified now, or be generated
#+ during the installation.
# >>> Do you have a 'dishosts.conf' file that you want to use for installation? [y/N]n

If you have prepared or received configuration files, they can be specified now by answering y. In this case, no GUI application needs to run during the installation. Because this is the initial installation, no installed configuration files could be found, and the cluster configuration is created later on using the GUI application dishostseditor. In this case, the hostnames of the nodes need to be specified (see below).

#* NOTE:
#+ No cluster configuration file (dishosts.conf) available.
#+ You can now specify the nodes that are attached to the Dolphin
#+ Express interconnect. The necessary configuration files can then
#+ be created based on this list of nodes.
#+
#+ Please enter hostname or IP addresses of the nodes one per line.
#* When done, enter a single colon (':').
#+ (proposed hostname is given in [brackets])
# >>> node hostname/IP address <colon ':' when done> []tiger-1
# >>> node hostname/IP address <colon ':' when done> [tiger-2] -> tiger-2
# >>> node hostname/IP address <colon ':' when done> [tiger-3] -> tiger-3
# >>> node hostname/IP address <colon ':' when done> [tiger-4] -> tiger-4
# >>> node hostname/IP address <colon ':' when done> [tiger-5] -> tiger-5
# >>> node hostname/IP address <colon ':' when done> [tiger-6] -> tiger-6
# >>> node hostname/IP address <colon ':' when done> [tiger-7] -> tiger-7
# >>> node hostname/IP address <colon ':' when done> [tiger-8] -> tiger-8
# >>> node hostname/IP address <colon ':' when done> [tiger-9] -> tiger-9
# >>> node hostname/IP address <colon ':' when done> [tiger-10]:

The hostnames or IP addresses of all nodes need to be entered, one per line. The installer suggests the hostnames in [brackets] if possible; to accept a suggestion, just press Enter. If a node has multiple IP addresses / hostnames, make sure you specify the one that is visible to the installation machine and the frontend. The data entered is verified to represent an accessible hostname. When all hostnames are entered, enter a single colon (':') to finish.

# >>> Can you access all machines (local and remote) via password-less ssh? [Y/n]y

The installer will later on verify that the password-less ssh access actually works. If you answer n, the installer will set up password-less ssh for you on all nodes and the frontend. You will need to enter the root password once for each node and the frontend during the installation. The password-less ssh access remains active after the installation. To disable it again, remove the file /root/.ssh/authorized_keys from all nodes and the frontend.

#* NOTE:
#+ The kernel modules need to be built on a machine with the same kernel
#* version and architecture as the interconnect nodes. You can specify another build
#* machine now. By default, the first given interconnect node is used for this.
# >>> Build kernel modules on node tiger-1 ? [Y/n]y

If you answer n at this point, you can enter the hostname of another machine on which the kernel modules are built. Make sure it matches the nodes for CPU architecture and kernel version.
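If you prefer to prepare the password-less root ssh access yourself before running the SIA, instead of letting the installer set it up, the following is a minimal sketch assuming OpenSSH with ssh-copy-id available and the node names used in the example transcript:

# Generate a key pair on the installation machine (skip if one already exists):
ssh-keygen -t rsa

# Copy the public key to the frontend and every node (enter each root password once):
for host in tiger-0 tiger-1 tiger-2 tiger-3 tiger-4 tiger-5 tiger-6 tiger-7 tiger-8 tiger-9; do
    ssh-copy-id root@$host
done

# To disable password-less access again later, remove /root/.ssh/authorized_keys
# on all nodes and the frontend, as described above.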

#* NOTE:
#+ Testing ssh-access to all cluster nodes and gathering configuration.
#+
#+ If you are asked for a password, you need to interrupt with CTRL-c
#+ and restart the script answering 'no' to the initial question about ssh.
#*
#* Testing ssh to other nodes ...
... testing ssh to tiger-0
... testing ssh to tiger-1
... testing ssh to tiger-2
... testing ssh to tiger-3
... testing ssh to tiger-4
... testing ssh to tiger-5
... testing ssh to tiger-6
... testing ssh to tiger-7
... testing ssh to tiger-8
... testing ssh to tiger-9
#+ OK: ssh access is working
#+ OK: nodes are homogenous
#* OK: found 1 interconnect fabric(s).

The ssh access is tested, and some basic information is gathered from the nodes to verify that the nodes are homogeneous, equipped with at least one Dolphin Express adapter, and meet the other requirements. If a required RPM package was missing, it would be indicated here with the option to install it (if yum can be used), or to fix the problem manually and retry. If the test for homogeneous nodes fails, the installer will exit and the installation needs to be restarted. In this case, please refer to the section "Installation of a Heterogeneous Cluster" for information on how to install the software stack.

#* NOTE:
#+ About to INSTALL Dolphin Express interconnect drivers on these nodes:
.. tiger-1
.. tiger-2
.. tiger-3
.. tiger-4
.. tiger-5
.. tiger-6
.. tiger-7
.. tiger-8
.. tiger-9
#+ About to BUILD Dolphin Express interconnect drivers on this node:
.. tiger-1
#+ About to install management and control services on the frontend machine:
.. tiger-0
#* Installing to default target path /opt/DIS on all machines
# >>> OK to proceed? [Y/n]y

The installer presents an installation summary and asks for confirmation. The software is installed to the default target path /opt/DIS (or the current installation path if this is an update installation). If you answer n at this point, the installer will exit and the installation needs to be restarted.

#* NOTE:
#+ It is recommended that interconnect nodes are rebooted after the
#+ initial driver installation to ensure that large memory allocations will succeed.
#+ You can omit this reboot, or do it anytime later if necessary.
# >>> Reboot all interconnect nodes (tiger-1 tiger-2 tiger-3 tiger-4 tiger-5 tiger-6 tiger-7 tiger-8 tiger-9)? [Y/n]

For optimal performance, the low-level driver needs to allocate some amount of kernel memory. This allocation can fail on a system that has been under load for a long time. If you are not installing on a live system, rebooting the nodes is therefore offered here. If chosen, the reboot will be performed by the installer without interrupting the installation procedure. You can also perform the reboot manually later on to achieve the same effect.

#* Building node RPM packages on tiger-1 in /tmp/tmp.AEgiO27908

#+ This will take some minutes...
#* Logfile is /tmp/DIS_install.log_983 on tiger-1
#* OK, node RPMs have been built.
#* Building frontend RPM packages on scimple in /tmp/tmp.dQdwS17511
#+ This will take some minutes...
#* Logfile is /tmp/DIS_install.log_607 on scimple
#* OK, frontend RPMs have been built.
#* Copying RPMs that have been built:
/tmp/node_RPMS/Dolphin-SCI-3.3.0-1.x86_64.rpm
/tmp/node_RPMS/Dolphin-SISCI-3.3.0-1.x86_64.rpm
/tmp/node_RPMS/Dolphin-SISCI-devel-3.3.0-1.x86_64.rpm
/tmp/node_RPMS/Dolphin-SuperSockets-3.3.0-1.x86_64.rpm
/tmp/frontend_RPMS/Dolphin-NetworkManager-3.3.0-1.x86_64.rpm
/tmp/frontend_RPMS/Dolphin-NetworkAdmin-3.3.0-1.x86_64.rpm
/tmp/frontend_RPMS/Dolphin-NetworkHosts-3.3.0-1.x86_64.rpm
/tmp/frontend_RPMS/Dolphin-SISCI-devel-3.3.0-1.x86_64.rpm

The binary RPM packages matching the nodes and the frontend are built and copied to the directory from where the installer was invoked. They are placed into the subdirectories node_RPMS and frontend_RPMS for later use (see the SIA option --use-rpms).

#* NOTE:
#+ To install/update the Dolphin Express services like SuperSockets, all running
#+ Dolphin Express services need to be stopped. This requires that all user
#+ applications using SuperSockets (if any) need to be stopped NOW.
# >>> Stop all Dolphin Express services (SuperSockets) NOW? [Y/n]y
#* OK: all Dolphin Express services (if any) stopped for upgrade.

On an initial installation, there will be no user applications using SuperSockets, so you can easily answer y right away.

#* Installing node tiger-1 OK.
#* Installing node tiger-2 OK.
#* Installing node tiger-3 OK.
#* Installing node tiger-4 OK.
#* Installing node tiger-5 OK.
#* Installing node tiger-6 OK.
#* Installing node tiger-7 OK.
#* Installing node tiger-8 OK.
#* Installing node tiger-9 OK.
#* Installing machine scimple as frontend.
#+ NOTE: You need to create the cluster configuration files 'dishosts.conf'
#+ and 'networkmanager.conf' using the graphical tool 'dishostseditor'
#+ which will be launched now.
#+
#+ If the interconnect cables are not yet installed, you can create detailed
#+ cabling instructions within this tool (File -> Get Cabling Instructions).
#+ Then install the cables while this script is waiting.
# >>> Are all cables connected, and do all LEDs on the SCI adapters light green? [Y/n]

The nodes get installed, and the drivers and the node manager are started on them. Then, the basic packages are installed on the frontend, and the dishostseditor application is launched to create the required configuration files /etc/dis/dishosts.conf and /etc/dis/networkmanager.conf if they do not already exist. The script will wait at this point until the configuration files have been created with dishostseditor, and until you confirm that all cables have been connected according to the cabling instructions.
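While the script is waiting, it can be useful to double-check on a node that the freshly installed kernel modules and the node manager actually came up. A rough sketch, assuming only that the Dolphin daemons and kernel modules carry names starting with dis_ (as the network manager daemon dis_networkmgr does); the exact names depend on the installed package version:

# On a node: look for Dolphin kernel modules and daemons (name prefix assumed):
lsmod | grep -i dis
ps -ef | grep -i dis_ | grep -v grep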

For typical problems at this point of the installation, please refer to Chapter 9, FAQ.

4.3. Working with the dishostseditor

dishostseditor is a GUI tool that helps gathering the cluster configuration, and is used to create the cluster configuration file /etc/dis/dishosts.conf and the network manager configuration file /etc/dis/networkmanager.conf.

4.3.1. Cluster Edit

When dishostseditor is launched, it first displays a dialog box where the global interconnect properties need to be specified (see Figure 3.1, “Cluster Edit dialog of dishostseditor”). A few global interconnect properties need to be set, and the position of each node within the interconnect topology needs to be specified.

Figure 3.1. Cluster Edit dialog of dishostseditor

Topology
The dialog will let you enter the selected topology information (number of nodes in X-, Y- and Z-dimension) according to the topology type you selected. The topology settings should already be correct by default if dishostseditor is launched by the installation script. However, if the cables are already in place, it is critical to verify that the actual cable installation matches the dimensions shown here if you install a cluster with a 2D- or 3D-torus interconnect topology. If the cables are not yet mounted (which is the recommended way of doing it), you simply choose the settings that match the way you plan to install.

The number of fabrics needs to be set to the minimum number of adapters in every node. The product of the nodes in every dimension needs to be equal (for regular topologies) or less (for irregular topology variants), i.e. a 12 node cluster can be set up as 3 by 4 or 4 by 3 or even 2 by 6.

the main pane of the dishostseditor will present the nodes in the cluster arranged in the topology that was selected in the previous dialog.conf accordingly. SuperSockets will try to use the Dolphin Express for any node in this subnet when it connects to another node of this subnet. Node Arrangement In the next step. and link 1 on the adapter board (the one where the plug is on the piggy-back board) is mapped to the Y-dimension. If the cluster has its own subnet. if a node gets assigned a new IP address within this subnet. if all your node communicate via an IP interface with the address 192. or a node has gone down and the interconnect traffic was rerouted).4. this option is recommend. SuperSockets will automatically fall back to Ethernet.0/8 here.sh and will send an e-mail to the address specified as alert target.4. active the checkbox Alert target and enter the alert target and the alert script to be executed. See section Section 1. 4. 4.Initial Installation can be set up as 3 by 4 or 4 by 3 or even 2 by 6. you would enter 192. Also. please refer to Section 1. Other alert scripts can be created and used. To change this topology and other general interconnect settings. a cell phone number to send an SMS).3.1.e.e. you need to configure SuperSockets for each node as described in the following section. If the font settings of your X server cause dishostseditor to print unreadable characters. “dishosts.3.168. “Notification on Interconnect Status Changes”. you can change the font size and the type with the drop-down box at the top of the windows. you don't need to change the SuperSockets configuration. which may require another type of alert target (i. next to the floppy disk icon. the setup script cannot verify that the cabling matches the dimensions that you selected.168. If this type of configuration is not possible in your environment. you can simplify the configuration by specifying the address of this subnet in this dialog. I. you can always click Edit in the Cluster Configuration area which will bring up the Cluster Edit dialog again. Remember that link 0 on the adapter boards (the one where the plug is right on the PCB of the adapter board) is mapped to the X-dimension. If using Dolphin Express is not possible. Status Notification In case you want to be informed on any change of the interconnect status (i. an interconnect link was disabled due to errors. but this type of configuration is not yet supported by dishostseditor. because one or both nodes are only equipped with an Ethernet interface. SuperSockets Network Address If your cluster operates within its own subnet and you want all nodes within this subnet to use SuperSockets (having Dolphin Express installed).2.1.conf” on how to edit dishosts. i. 4. 14 .e.3.e. Assigning more than one subnet to SuperSockets is also possible..2. For more information on using status notification. activate the Network Address field and enter the cluster IP subnet address including the mask.3. The default alert script is alert. To do so.*.1.
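Alert scripts are ordinary executables, so a custom one can be modelled on the installed alert.sh. The sketch below is only an illustration: the way the network manager passes the alert target and the status message to the script (assumed here to be the first and second argument) is an assumption, so check alert.sh for the actual calling convention before using anything like this.

#!/bin/sh
# Hypothetical alert script -- the argument order is an assumption, see alert.sh.
TARGET="$1"      # alert target configured in dishostseditor (e-mail address, phone number, ...)
MESSAGE="$2"     # status message describing the interconnect event
# Record the event locally and forward it by e-mail; replace the mail call with
# an SMS gateway command if the alert target is a cell phone number.
logger -t dis_alert "$MESSAGE"
echo "$MESSAGE" | mail -s "Dolphin Express interconnect alert" "$TARGET"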

Figure 3.2. Main dialog of dishostseditor

At this point, you need to arrange the nodes (marked by their hostnames) such that the placement of each node in the torus as shown by dishostseditor matches its placement in the physical torus. You do this by assigning the correct hostname for each node by double-clicking its node icon, which will open the configuration dialog of this node. In this dialog, select the correct machine name from the drop-down list, which is the hostname as seen from the frontend. You can also type a hostname if a hostname that you specified during the installation was wrong.

Initial Installation Figure 3. then SuperSockets will use this subnet address and will not allow for editing this property on the nodes. Node dialog of dishostseditor After you have assigned the correct hostname to this machine. in a fail-over setup. choose the menu item File Create Cabling Instructions. Choosing a static socket means that the mapping between the node (its adapters) and the specified hostname/IP address is static and will be specified within the configuration file dishosts. This incurs a certain initial overhead when the first connection is set up.conf. Please do this also when the cables are actually installed: you really want to verify if the actual cable setup matches the topology you just specified.3. Cabling Instructions You should now generate the cabling instructions for your cluster. 16 .3. Use this option if nodes change their IP addresses or node identities move between physical machines. This hostname or IP address will be dynamically resolved to the DolpinExpress interconnect adapter that is installed in the machine with this hostname/IP address. you may need to configure SuperSockets on this node. If you set this option for both fields.e. You can save and/or print the instructions. This hostname or IP address will be statically assigned to this physical node (its DolpinExpress interconnect adapter). i. It is a good idea to print the instructions so you can take them with you to the cluster. dynamic Enter the hostname or IP address for which SuperSockets should be used. If you selected the Network Address in the cluster configuration dialog (see above). this is not really relevant. but resolves only the explicitly specified IP addresses and not all IP addresses of a subnet. To create the cabling instruction. This option is similar to using a subnet. static 4. SuperSockets will therefore resolve the mapping between adapters and hostnames/IP addresses dynamically. but as the mapping is cached. Enter the hostname or IP address for which SuperSockets should be used. although the related kernel modules will still be loaded. you can choose between 3 different options for each of the currently supported 2 SuperSocket-accelerated IP interfaces per node: disable Do not use SuperSockets. All nodes will use this identical file (which is automatically distributed from the frontend to the nodes by the network manager) to perform this mapping. This option works fine if the nodes in your cluster don't change their IP addresses over time. Otherwise. SuperSockets can not be used with this node.3.

Location of channel A and channel B on D350 adapter Please consider the hints below for connecting the cables: • Never apply force: • The plugs of the cable will move into the sockets easily.5. and the same location of IN and OUT connectors. please proceed with section Section 4. The setup script will wait with a question for you to continue: # >>> Are all cables connected. Cluster Cabling If the cables are already connected. Note In order to achieve a trouble-free operation of your cluster. For D352 (D350). “Location of link 0 and link 1 on D352 adapter”and Figure 3.5. Connecting the cables Please proceed by connecting the nodes as described by the cabling instructions generated by the dishostseditor. and do all LEDs on the SCI adapters ligtht green? [Y/n] 4. For both links (channels). respectively. • The cables have a minimum bend diameter of 5cm. setting up the cables correctly is critical.2. “Location of channel A and channel B on D350 adapter”. the IN connectors are located at the lower end of the adapter.1. The cabling instructions refer to link 0 and link 1 if you are using D352 adapters (for 2D-torus topology). It is critical that you correctly locate the different links/channels on the back of the card.4. Location of link 0 and link 1 on D352 adapter Figure 3. Make sure the orientation is correct. 17 . and the IN and OUT connectors. Figure 3. Each of the two links/channels will form an independent ring with its adjacent adapters. This is illustrated in Figure 3. The cables can be installed while nodes are powered up. while the connectors for link 1 (channel B) are located on the piggy-back board.Initial Installation 4. and thus has an IN and OUT connector to connect to these adjacent adapters. link 0 (channel A) is formed by the connectors that are directly connected to the PCB (printed circuit board) of the adapter.4. and channel A and channel B in case of D350 adapters being used (for dual-channel operation).4. “Verifying the Cabling”.4. and the OUT connectors at the top of the adapter. Please take your time to perform this task properly. The D351 adapter has only a single link (0).4.

the LED should turn green and emit a steady light (not blinking). Do not tighten only one screw of the plug. • Fasten gently. as this is likely to tilt the plug within the connector. the LED will still turn green.1. swap the cable of the problematic connection with a working one and observe if the problem moves with the cable. and apply a maximum of 0. • Observe LEDs: When an adapter has both input and output of a link connected to it's neighboring adapter.4.2. 18 . • Contact Dolphin support if you can not make the LEDs turn green after trying all proposed measures. • Power-cycle the nodes with the orange LEDs according to Q: 1. Re-insert and fasten the plug according to the guidelines above. Use a torque screw driver if possible. • Don't mix up links: When using a 2D-torus topology. but not to the grey CHH cables (part number D707). Verifying the Cabling Important A green link LED indicates that the link between the output plug and input plug could be established and synchronized. • LEDs are also placed on the top of the adapter • the PCI/PCI-X/PCI-Express bus connector is mounted on the lower side of the adapter The left pair of connectors on the Dolphin Express interconnect adapter is what we refer to as Link 0.4 Nm. It does not assure that the cable is actually placed correctly! It is therefore important to verify once more that the cables are plugged according to the cabling instructions generated by the dishostseditor! If a pair of LEDs do not turn green. please perform the following steps: • Disconnect the cables. • If the LEDs still do not turn green. • Fasten evenly. When fastening the screws of the plugs. use a different cable. it is important not to connect link 0 of one adapter with link 1 of another adapter. • If the LEDs still do not turn green. The cabling test of sciadmin will reveal such cabling errors. but packet routing will fail. the minimum bend diameter is 10cm. Note If the links have been mixed up. Make sure you connect an Output with an Input plug. and then the other one. make sure you fasten both lightly before tightening them. In order to determine a left side this you may hold the Dolphin Express interconnect adapter in a vertical position: • the blue "O" (indicating the OUT port) should be located at the top. link 0 is the left pair of connectors on the Dolphin Express SCI interconnect adapter when the adapter is placed in a vertical position. With the CHH cables. 4. As decribed above. As a rule of thumb: do not apply more torque with the screw driver than you possibly could using only your finger (if there was enough space to grip the screw).Initial Installation Note This specification applies to black All-Best cables (part number D706).1. Link 1 is the right pair of connectors on the Dolphin Express interconnect adapter when the adapter is placed in a vertical position.

node tiger-1: TEST RESULT: *PASSED* .. Please confirm that all cables are connected and all LEDs shine gren. ..1. and the installation will proceed. node #* OK.. Otherwise.. It is located in the frontend_RPMS and node_RPMS directories. It should report TEST RESULT: *PASSED* for all nodes: #* NOTE: Testing static interconnect connectivity between nodes....... node .. you can answer "Yes" to the question "Are all cables connected. node .... node tiger-4: TEST RESULT: *PASSED* . node tiger-2: TEST RESULT: *PASSED* . a number of tests are run on the cluster to verify that the SCI interconnect was set up correctly and delivers the expected performance.. You will see output like this: #* NOTE: .7. Success in this test means that all adapters have been configured correctly.Initial Installation When you are done connecting the cables..... 4... the SISCI-devel RPM needs to be #+ installed. #* OK. node tiger-9: TEST RESULT: *PASSED* If this test reports errors or warning. The network manager will be started on the frontend. and do all LEDs on the SCI adapters ligtht green? " and proceed with the next section to finalize the software installation.. and that the cables are inserted properly. node .. node tiger-6: TEST RESULT: *PASSED* .. you are done with the installation and can start to use your Dolphin Express accelerated cluster.. After this. node .. Static SCI Connectivity Test The Static SCI Connectivity Test verifies that links are up and all nodes can see each other via the SCI interconnect.5.5. Finalising the Software Installation Once the cables are connected... “Interconnect Validation with sciadmin”)... node tiger-3: TEST RESULT: *PASSED* . node tiger-5: TEST RESULT: *PASSED* . no more user interaction is required. node tiger-8: TEST RESULT: *PASSED* . 4. you should let the installer continue and analyse the problems using sciadmin after the installation finishes (see Section 4. checking for cluster configuration to take effect: tiger-1: tiger-2: tiger-3: tiger-4: tiger-5: tiger-6: tiger-7: tiger-8: tiger-9: #* Installing remaining frontend packages #* NOTE: #+ To compile SISCI applications (like NMPI)...7. configuring all cluster nodes according to the configuration specified in dishosts.. you are offered to re-run dishostseditor to validate and possibly fix the interconnect configuration.. node . 19 . “Interconnect Validation with sciadmin” to learn about the individual tests and how to fix problems reported by each test.conf. refer to the next subsections and Section 4. all LEDs have turned green and you have verified the connections. node tiger-7: TEST RESULT: *PASSED* .. node . If the problems persist. If no problems are reported (like in the example above). node .. node .

sciadmin serves as a single-point-of-control and manage the Dolphin Express interconnect in your cluster. The SuperSockets latency is rated based on our platform validation experience. Handling Installation Problems If for some reason the installation was not successful. single-byte latency: 3.. SuperSockets Configuration Test The SuperSockets Configuration Test verifies that all nodes have the same valid SuperSockets configuration (as shown by /proc/net/af_sci/socket_maps). 4.00 us .. #+ Remember to use LD_PRELOAD=libksupersockets. you need to specify --enforce. # >>> Do you want to start the GUI tool for interconnect adminstration (sciadmin)? [y/N]n #* RPM packages that were used for installation are stored in #+ /tmp/node_PRMS and /tmp/frontend_PRMS. SuperSockets are working well. In this case. • To avoid that the binary RPMs are built again. it's name is printed at the very beginning of the installation. SuperSockets Performance Test The SuperSockets Performance Test runs a simple socket benchmark between two of the nodes. and then configure SuperSockets by calling dis_ssocks_cfg. If a failure is reported. it means the the interconnect configuration did not propagate correctly to this node.2.5.6. you should contact Dolphin support. The installation finishes with the option to start the administration GUI tool sciadmin. If the rating indicates that SuperSockets are not performing as expected. #+ Checking Ethernet performance . and performance is reported for both cases. Latency rating: Very good.. wait for a minute. use the option --use-rpms or simply run the SIA in the same directory as before where it can find the RPMs in the node_RPMS and frontend_RPMS subdirectories. or if it shows that a fallback to Ethernet has occurred.so for all applications that #+ should use Dolphin Express SuperSockets. #+ SuperSockets performance tests done. #* NOTE: #+ Verifying SuperSockets performance for tiger-2 (testing via tiger-1). existing RPM packages of the same or even more recent version will not be replaced. Every installation attempt creates a differently named logfile.5. You should check if the dis_nodemgr service is running on this node. single-byte latency: 56.3. #* OK: Cluster installation completed. #+ No SuperSocket configuration problems found. it is important that you supply the installation log (see above). 4. Please consider: • By default. please contact Dolphin Support.63 us #+ Checking Dolphin Express SuperSockets performance . you can easily and safely repeat it by simply invoking the SIA again. Please provide all installation logfiles. To enforce re-installation with the version provided by the SIA. It shows an overview of the status of all adapters and links of a 20 .7. 4. Success in this test means that the SuperSockets service dis_supersockets is running and is configured identically on all nodes. • To start an installation from scratch.. start it. Please also include the configuration files that can be found in /etc/dis on the frontend. The benchmark is run once via Ethernet and once via SuperSockets. a hint to use LD_PRELOAD to make use of SuperSockets and a pointer to the binary RPMs that have been used for the installation. you can run the SIA on each node and the frontend using the option --wipe to remove all traces of the Dolphin Express software stack and start again. If you still fail to install the software successfully. Interconnect Validation with sciadmin Dolphin provides a graphical tool named sciadmin.Initial Installation 4... 
#* NOTE: Verifying SuperSockets configuration on all nodes. If not.
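To illustrate the recovery options mentioned above, the SIA could be re-invoked as follows; the version string and the /tmp path are placeholders, and the three invocations show independent alternatives rather than one fixed procedure:

# sh DIS_install_<version>.sh --wipe                          (on each node and the frontend: remove all traces)
# sh DIS_install_<version>.sh --install-all --enforce         (enforce re-installation of the version provided by the SIA)
# sh DIS_install_<version>.sh --install-all --use-rpms /tmp   (re-use the RPMs previously built under /tmp)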

Verify that the node manager service is running: On Red Hat: # service dis_nodemgr status On other Linux variants: # /etc/init. It is also possible to download a binary version for Windows that runs without the need for extra compilation or installation. You can use sciadmin on any machine that can connect to the network manager on the frontend via a standard TCP/IP socket. You have to make sure that connections towards the frontend using the ports 3445 (sciadmin).log 21 . 2. It also provides means to manually control the interconnect. Installing sciadmin sciadmin had been installed on the frontend machine by the SIA if this machine is capable to run X applications and has the Qt toolkit installed. To solve this problem: 1. please see /var/log/dis_nodemgr. b. “Cabling Correctness Test”.Initial Installation cluster and allows to perform detailed status queries. After it has been started. we will only describe how to use sciadmin to verify the newly installed Dolphin Express interconnect. If the frontend does not have these capabilities. Click the Connect button in the tool bar and enter the appropriate hostname or IP address of the network manager. If this is not the case: a. please refer to Appendix B. but can also be run by non-root users.7. It will be within the PATH after you login as root. Starting sciadmin sciadmin will be installed in the sbin directory of the installation path (default: /opt/DIS/sbin/sciadmin).1.d/dis_nodemgr start should tell you that the node manager has started successfully.3. 3444 (network manager) and 3443 (node manager) are possible (potentially firewall settings need to be changed). you will need to connect to the network manager controlling your cluster. all nodes and interconnect links should be shown green. Cluster Overview Normally.4. 4. sciadmin Reference. or use the Dolphin-NetworkAdmin RPM package from the frontend_RPMS directory (this RPM will only be there if it could be build for the frontend). If a node is plotted red. inspect and set options and perform interconnect tests. This is a requirement for a correctly installed and configured cluster and you may proceed to Section 4. it means that the network manager can not connect to the node manager on this node. Try to start the node manager: On Red Hat: # service dis_nodemgr start On other Linux variants: # /etc/init. Make sure that the node is powered and has booted the operating system. 4. sciadmin will present you a graphical representation of the cluster nodes and the interconnect links between them. meaning that their status is OK.7.7. Here.d/dis_nodemgr status should tell you that the node manager is running.2. you can install it on any other machine that has these capabilities using SIA with the --install-frontend option. 4. If the node manager fails to start. For a complete description of sciadmin.7.
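How to open the ports listed above depends on the firewall in use and is not covered by the Dolphin documentation. As a rough sketch for a Red Hat-style system using iptables (restrict the rule to your cluster's source addresses as appropriate):

# iptables -I INPUT -p tcp --dport 3443:3445 -j ACCEPT
# service iptables save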

conf on all nodes and the frontend. If you run this test while your cluster is in production. you might experience communication timeouts. Such a Fabric Quality Test can be started for each installed interconnect fabric (0 or 1) from within sciadmin via Cluster Fabric * Test Traffic. Verify that the cable connections are placed within one ringlet: i.4. When you arrive at node B. Look up the path of cable connections between node A and node B in the Cabling Instructions that you created (or still can create at this point) using dishostseditor. 22 . 2. a more intense test is required. which also leads to increased communication delays.7. start over with this verification loop. Cabling Correctness Test sciadmin can validate that all cables are connected according to the configuration that was specified in the dishostseditor. please refer to the system documentation to determine the required steps 4. If you can't find a problem for the first problem reported. Try to fix the first reported problem by tracing the cable connections from node A to node B: a. 3. You will typically get more than one error message in case of a cabling problem. Please proceed as follows: 1. If the test detects a problem. To perform the cable test. If this is not the case. SuperSockets in operation will fall back to Ethernet during this test. 4. you might experience communication timeouts. Make sure that the service is configured to start in the correct runlevel (Dolphin installation makes sure this is the case). re-run the cable test to verify if this change solves all problems. To verify the actual signal quality of the interconnect fabric. and which is now stored in /etc/dis/dishosts. Warning Running this test will stop the normal traffic over the interconnect as the routing needs to be changed.Initial Installation c. verify the cable connections for all following pairs of node reported bad. b. On Red Hat: # chkconfig --add 2345 dis_nodemgr on On other Linux variants. as such a problem does in most cases affect more than one pair of nodes. make sure: i. If you run this test while your cluster is in production.7. it will inform you that node A can not communicate with node B although they are supposed to be within the same ringlet. ii. Fabric Quality Test The Cable Correctness Test performs only minimal communication between two nodes to determine the functionality of the fabric between them. This Cabling Correctness Test runs for only a few seconds and will verify that the nodes are cabled according to the configuration provided by the dishostseditor. Warning Running this test will stop the normal traffic over the interconnect as the routing needs to be changed. After the first change. select Cluster -> Test Cable Connections. do the same check for the path back from node B to node A. Each cable plug is connected to the right link (0 or 1) as indicated by the cabling instructions. That each cable plug is securely fitted into the socket of the adapter.5. Along the path. ii.

SuperSockets will use the Dolphin Express interconnect for low-latency. and will transparently fall back to Ethernet when connecting to nodes outside the cluster. This can be achieved by two means as described in the next two sections. like bending it to sharply or too much force on the plugs. Perform the previous check for all other node pairs. b. Exchange the cables between the most close pair of nodes one-by-one with a cable of a connection for which no errors have been reported. Normally. This test will run for a few minutes. high-bandwidth communication inside the cluster. as it tests communication for about 20 seconds between each pair of nodes within the same ring. unplug cable and re-fasten it. No configuration change is required for the application as the same host names/IP v4 addresses can be used. Thus. It will then report if any CRC errors or other problems have occurred between any pairs of nodes. a pair of nodes located next to each other will show up. If the test reports communication errors. depending on the size of your cluster. If communication errors persist. an communication error does not mean data might get lost. To make an application use SuperSockets. 23 . please proceed as follows: 1. Cable plugs need to be placed in the connectors on the adapters evenly (not tilted) and securely fastened. Generic Socket Applications All applications that use generic BSD sockets for communication will be accelerated by SuperSockets. it will take 8 * ( 3 + 2 +1) * 20 seconds = 16 minutes. the problem might be with one of the adapters. if any errors are reported.8. every communication error reduces the performances. then this cable might be damaged. However. i. 4.8. locate the pair of nodes which is located most closely (has the smallest number of cable connections between them). 4. though.1. you will want your cluster application to make use of the increased performance. Please contact Dolphin support for further analysis.Initial Installation SuperSockets in operation will fall back to a second fabric (if installed) or to Ethernet during this test. if nodes are located next to each other) for being properly mounted: a. or are reported. and an optimally set up Dolphin Express interconnect should not show any communication errors. A small number of communication errors is acceptable. then re-run the test. If the communication error remains unchanged. Remember (note down) which cables you exchanged. No excessive stress on the cable. This means. change cables to locate a possibly damaged cable: a. All relevant socket types are supported by SuperSockets: TCP stream sockets as well as UDP and RDS datagram sockets. you need to preload a dynamic library on application start. 2. If errors are reported between multiple pairs of nodes. 3. Please contact Dolphin support if in doubt. Note Any communication errors reported here are either corrected automatically by retrying a data transfer (like for CRC errors).and software has been installed and tested. 4. for a 4 by 4 2D-torus cluster which features 8 rings with 4 nodes each. Check the cable connection on the shortest path between these two nodes (a single cable. ii. Please contact your sales representative for exchange. Making Cluster Application use Dolphin Express After the Dolphin Express hard. which also leads to increased communication delays. If in doubt. b. Run the Fabric Quality Test after each cable exchange. If the communication errors move with the cable you just exchanged.

Launch with LD_PRELOAD As an alternative to using this wrapper script. To verify that the preloading works.8. A better solution than setting LD_LIBRARY_PATH is to configure the dynamic linker ld to include these directories in its search path.. To verify that the dynamic linking is the problem.1 root root 48731 Nov 14 12:43 /opt/DIS/lib64/libksupersockets.3.3.so. To have i.1 root root 901 Nov 14 12:43 lrwxrwxrwx 1 root root 25 Nov 14 12:50 lrwxrwxrwx 1 root root 25 Nov 14 12:50 -rw-r--r-.so.la /opt/DIS/lib/libksupersockets.so. The default locations are /opt/DIS/lib/libksupersockets.so. use the ldd command on any executable.1 root root 19746 Nov 14 12:43 -rw-r--r-. If this is not the case.2 (0x00000033ec600000) The library libksupersockets.1. Use man ldconfig to learn how to achieve this. you need to configure the dynamic linker manually.3 /opt/DIS/lib64/libksupersockets.1.0 Also.0 /opt/DIS/lib/libksupersockets.so -> libksupersockets. This script is installed to the bin directory of the installation (default is /opt/DIS/bin) which is added to the default PATH environment variable.6 => /lib64/tls/libc. please verify that they use SuperSockets as follows: 1. the socket benchmark netperf run via SuperSockets™. for sh-style shells such as bash: export LD_PRELOAD=libksupersockets.0 => /lib64/tls/libpthread..2 (0x00000033ecb00000) /lib64/ld-linux-x86-64.so. i.6 (0x00000033ec800000) libdl. if you did not install via RPM.Initial Installation 4.1.so and /opt/DIS/ lib64/libksupersockets. 24 . and libksupersockets.so $ ldd netperf libksupersockets.so and verify again with ldd: $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/DIS/lib:/opt/DIS/lib64 $ echo $LD_PRELOAD libksupersockets.e. the netperf binary mentioned above: $ export LD_PRELOAD=libksupersockets.a /opt/DIS/lib64/libksupersockets.so $ ldd netperf .so (0x0000002a95577000) libpthread.so 4.2.so. make sure the library file actually exists.so.so.3.so.e.so.8.so on 64-bit platforms.3. The dynamic linker is configured accordingly on installation of the RPM.3. Troubeshooting If the applications you are using do not show increased performance.3 /opt/DIS/lib/libksupersockets.1 root root 65160 Nov 14 12:43 -rw-r--r-.so -> libksupersockets.1 root root 29498 Nov 14 12:43 -rw-r--r-.1.3.so => /opt/DIS/lib64/libksupersockets. set LD_LIBRARY_PATH to include the path to libksupersockets.a /opt/DIS/lib/libksupersockets. i. you can also make sure to set LD_PRELOAD correctly to preload the SuperSockets library.0 (0x00000033ed300000) libc.so.3 -> libksupersockets.2 => /lib64/libdl.so has to be listed at the top position. you just need to run them via the wrapper script dis_ssocks_run that sets the LD_PRELOAD environment variable accordingly.1 root root 899 Nov 14 12:43 lrwxrwxrwx 1 root root 25 Nov 14 12:50 lrwxrwxrwx 1 root root 25 Nov 14 12:50 -rw-r--r-.3 /opt/DIS/lib/libksupersockets.8. start the server process on node server_name like dis_ssocks_run netperf and the client process on any other node in the cluster like dis_ssocks_run netperf -h server_name 4.la /opt/DIS/lib64/libksupersockets..so.so. make sure that the dynamic linker is configured to find it in this place. Launch via Wrapper Script To let generic socket applications use SuperSockets™.3 -> libksupersockets.so /opt/DIS/lib64/libksupersockets.e.so actually is a symbolic link on a library with the same name and a version suffix: $ ls -lR /opt/DIS/lib*/*ksupersockets* -rw-r--r-.so.

2. and you should not see a message about SuperSockets not being configured.e. when running out of resources. only the system port numbers below 1024 are excluded from using SuperSockets.8. 2. The active configuration is shown in /proc/net/af_sci/socket_maps: # cat /proc/net/af_sci/socket_maps IP/net Adapter NodeId List ----------------------------------------------172. The examle above shows a four-node cluster with a single fabric and a static SuperSockets configuration. If you can't solve the problem. please refer to Section 2. 4.0 ( November 13th 2007 ) is running.4/32 0x0000 72 0 0 Depending on the configuration variant you used to set up SuperSockets. Nov 7th 2007 (built Nov 14 2007) running.5.3. 3. 1.5.16.3.0 ( November 13th 2007 ) is running. 4. Dolphin SISCI 3.16. Check the system log for messages of the SuperSockets kernel module. The SISCI library libsisci. Verify the configuration of the SuperSockets to make sure that all cluster nodes will connect and communicate via SuperSockets.3/32 0x0000 68 0 0 172.16.1/32 0x0000 4 0 0 172. By default. “Software”. 3. DMA transfers or remote interrupts. the content of this file may look different. “SuperSockets Configuration”).16.so is installed on the nodes by default.Initial Installation 2.Martin". or the application itself have been explicitly been exclued from using SuperSockets. 5.1. Make sure that the host names/IP addresses used effectively by the application are the ones that are configured for SuperSockets.5. for both applications that should communicate via SuperSockets.2/32 0x0000 8 0 0 172. but it must never be empty and should be identical on all nodes. Dolphin Node Manager is running (pid 3172). Don't forget to check if the port numbers used by this application. which will accelerate one socket interface per node. Native SCI Applications Native SCI applications use the SISCI API to use the Dolphin Express hardware features like transparent remote memory access. It will report problems all problems.3. You need to make sure that the preloading of the SuperSockets library described above is effective on both nodes. 25 . “dishosts. # cat /var/log/messages | grep dis_ssocks It is a good idea to monitor the system log while you try to connect to a remote node if you suspect problems being reported there: # tail -f /var/log/messages For an explanation of typical error messages. i. At least the services dis_irm and dis_supersockets need to be running. 6. please refer to Section 1. especially if the nodes have multiple Ethernet interfaces configured. Make sure that the SuperSockets kernel module (and the kernel modules it depends on) are loaded and configured correctly on both nodes. please contact Dolphin Support.5. For more information on the configuration of SuperSockets.conf”.0 "St. Dolphin SuperSockets 3. Check the status of all Dolphin kernel modules via the dis_services script (defaut location /opt/DIS/ sbin): # dis_services status Dolphin IRM 3. but you should verify the current configuration (see Section 2.
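As suggested in the troubleshooting steps above, a more permanent alternative to setting LD_LIBRARY_PATH is to make the dynamic linker aware of the SuperSockets library directories. A minimal sketch, assuming a distribution that reads drop-in files from /etc/ld.so.conf.d (the file name dis.conf is arbitrary):

# cat > /etc/ld.so.conf.d/dis.conf << EOF
/opt/DIS/lib
/opt/DIS/lib64
EOF
# ldconfig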

Note
The SISCI library is only available in the native bit width of a machine. This implies that on 64-bit machines, only 64-bit SISCI applications can be created and executed, as there is no 32-bit version of the SISCI library on 64-bit machines.

To compile and link SISCI applications like the MPI implementation NMPI, the SISCI-devel RPM needs to be installed on the respective machine. This RPM is built during installation and placed in the node_RPMS and frontend_RPMS directories, respectively.

4.8.3. Kernel Socket Services

SuperSockets can also be used to accelerate kernel services that communicate via sockets. However, such services need to be adapted to actually use SuperSockets (a minor modification to make them use a different address family when opening new sockets). If you are interested in accelerated kernel services like iSCSI, GNBD or others, please contact Dolphin Support.
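If the SISCI-devel package is missing on a machine where you want to build SISCI applications, it can be installed from the packages built by the SIA. An example invocation (the file name is taken from the installation transcript earlier in this guide and will differ with version and architecture):

# rpm -Uhv /tmp/node_RPMS/Dolphin-SISCI-devel-3.3.0-1.x86_64.rpm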

You can specifiy --install-node or --install-frontend here to update only the current node or the frontend (you need to execute the SIA on the respective node in these cases!) --batch Using this option. the update installation can be performed in a fully automatic manner without manual intervention. This will make Dolphin Express functionality unavailable for the duration of this update. Stop the applications using Dolphin Express on all nodes. This option is recommended. the update of a 16-node cluster takes about 30 minutes. non-interactive and enforced installation of a specific driver version (provided via the SIA) with a reboot of all nodes will be invoked as follows: # sh DIS_install_<version>. --reboot --enforce As an example. Update Installation This chapter describes how an existing Dolphin Express software stack is to be updated to a new version using the SIA.sh --install-all --batch --reboot --enforce 4. Typically. This option can safely be used if no configuration changes are needed. This option is recommended if you are unsure about the state of the installation.Chapter 4. or due to resource problems. assuming the default answers to all questions which would otherwise be posed to the user. A complete update is also required in case or protocol incompatibilities between the installed version and the version to be installed. followed by the installation of the new package. If this is applies. Therefore. the script will run without any user interaction. This step can be omitted if you choose the --reboot option below. and if you know that all services/applications using Dolphin Express are stopped on the nodes. This option will enforce the uninstallation of the installed package. Dolphin Express software supports "rollling upgrades" between all release unless explicitly noted otherwise in the release notes. This kind of update needs to be performed node by node. The updated Dolphin Express services will be running on the nodes and the frontend. It requires that you stop all applications which use the Dolphin Express software 27 . Rolling Update A rolling update will keep your cluster and all its services available on all but one node. Proceed as follows to perform the complete update installation: 1. this convenient update method is recommended if you can afford some downtime of the whole cluster. Run the SIA on the frontend with any combination of the following options: --install-all 2. 3. 1. Such incompatibilities are rare and will be described in the release notes. Wait for the SIA to complete. Complete Update Opposed to the initial installation. packages on a node or the frontend will only be updated if the new package has a more recent version than the installed package. 2. Become superuser on the frontend. This is the default installation variant and will update all nodes and the frontend. the complete. a rolling update is not possible. Rebooting the nodes in the course of the installation will avoid any problems when loading the updated drivers. By default. but you will need to update the system completely in one operation. Such problems can occur because the drivers are currently in use.

See the --reboot option in the next step. and that the applications are shut down properly. but you have to reboot it to enable the updated services. However. Run the SIA with the --install-node --use-rpms <path> options to install and updated RPM packages and start the updated drivers and services. Before performing a rolling update.sh --build-rpm The created binary RPM packages will be stored in the subdirectories node_RPMS and frontend_RPMS which will be created in the current working directory. If you had run the SIA in /tmp in step 1. If the services can not be stopped for some reason. Note The SIA will also try to stop all services when doing an update installation. you will notice that this node will show up as disabled (not active). “Installing from Binary RPMs” for more information. 5. The <path> parameter to the --use-rpms option has to point to the directory where the binary RPM packages have been built (see step 1). A reboot is not required if the services were shut down successfully in step 4. 2. you would issue the following command: # sh DIS_install_<version>. you can still update the node. 3. Stop all applications on this node that use Dolphin Express services. This means your systems needs to tolerate applications going down on a single node.3. Build the new binary RPM packages for this node: # sh DIS_install_<version>. Please see Section 2. Tip To save a lot of time. in this case the updated Dolphin Express services will not become active until you restart them (or reboot the machine). please refer to the release notes of the new version to be installed if it supports a rolling update of the version currently installed. you can use the binary RPM packages built on the first node that is updated on all other nodes (if they have the same CPU architecture and Linux version). Perform the following steps on each node: 1. like a MySQL server or NDB process. Performing this step explicitly will just assure that the services can be stopped. Note It possilbe to install the updated files while the applications are still using Dolphin Express services. you need to perform a complete update (see previous section). Stop all Dolphin Express services on this node using the dis_services command: # dis_services stop Stopping Dolphin SuperSockets drivers Stopping Dolphin SISCI driver Stopping Dolphin Node Manager Stopping Dolphin IRM driver [ [ [ [ OK OK OK OK ] ] ] ] If you run sciadmin.Update Installation stack (like a database server using SuperSockets) on the node you intend to update. Log into the node and become superuser (root). If this is not the case. 4. 28 .sh --install-node --use-rpms /tmp Adding the option --reboot will reboot the node after the installation has been successful. but recommend to allow the low-level driver the allocation of suffcient memory resources for remote-memory access commuincation.

Important
If the services could not be stopped in step 4, a reboot is required to allow the updated drivers to be loaded. Otherwise, the new drivers will only be installed on disk, but will not be loaded and used.

6. The updated services will be started by the installation and are available for use by the applications. If the services failed to start, this can be caused by situations where the memory is too fragmented for the low-level driver (see above); a reboot of the node will fix the problem.

If for some reason you want to re-install the same version, or even an older version of the Dolphin Express software stack than is currently installed, you need to use the --enforce option.
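Putting the per-node steps above together, updating one node of a running cluster might look as follows. The node name, the path holding the previously built RPMs, and the way your own applications (for example a local MySQL server or NDB process) are stopped and restarted are placeholders for your environment:

# ssh tiger-2                                                (log into the node and become root)
# <stop your applications that use Dolphin Express on this node>
# dis_services stop
# sh DIS_install_<version>.sh --install-node --use-rpms /tmp
# <restart your applications, then check in sciadmin that the node is active again>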

Dolphin SuperSockets 3. Do not reboot the nodes now! Tip You can speed up this node installation by re-using binary RPMs that have been build on another node with the same kernel version and the same CPU architecture. Power off the node. Proceed with the next node until all nodes have the Dolphin Express hardware and software installed. Start all your own applications on the current node and make sure the whole cluster operates normally. located in the directory where you launched the SIA. 30 . but not configur 4. use the --use-rpms option to tell SIA where it can find matching RPMs for this node. It will be necessary to power off single nodes in the course of this installation (unless your machines support PCI hotplug . Nov 7th 2007 (built Nov 14 2007) loaded. Because MySQL Cluster™ is fault-tolerant by default. this operation will report errors which can be ignored. After the first installation on a node. Dolphin Node Manager is running (pid 3172). Installing the Dolphin Express hardware For an installation under load. Installing the drivers on the nodes On all nodes. To do so.Chapter 5. However. Manual Installation This chapter explains how to manually install the different software packages on the nodes and the frontend. “Adapter Card Installation”). and how to install the software if the native package format of a platform is not supported by the Dolphin installer SIA. 1. Do not yet connect any cables! Power on the node and boot it up. # dis_services status Dolphin IRM 3.3. and install the Dolphin Express adapter (see Section 3. Shut down your application processes on the current node. 2.in this case. Dolphin SISCI 3.0 ( November 13th 2007 ) is running. This is a local operation which will build and install the drivers on the local machine only. 1.3. When installing on another node with the same Linux kernel version and CPU architecture. 6. so it does not have to build them once more. please contact Dolphin support). Stop the SuperSockets™ service: # service dis_supersockets stop Stopping Dolphin SuperSockets drivers [ OK ] 5. 2. This type of installation does not require more than one node at a time being offline. 2. the binary RPMs are located in the directories node_RPMS and frontend_RPMS.0 ( November 13th 2007 ) is running. As the Dolphin Express hardware is not yet installed. perform the following steps for each node one by one: 1. performance may suffer to some degree. proceed as follows: 1.3. run the SIA with the option --install-node. The Dolphin Express drivers should load successfully now. Verify this via dis_services: 3. Installation under Load This section describes how to perform the initial Dolphin Express™ installation on a cluster in operation without the requirement to stop the whole cluster from operating. although the SuperSockets™ service will not be configured.Martin". Copy these sub-directories to a path that is accessible from the other nodes. this will not stop your cluster application.0 "St.

you need to perform the following steps on each node one-by-one: 1. Creating the cluster configuration files If you have a Linux machine with X available which can run GUI applications.conf to optimally fit your cluster. To make your application use SuperSockets™. “Working with the dishostseditor”. this can be achieved by simply starting the process via the dis_ssocks_run wrapper script (located in /opt/DIS/bin by default). Cable Installation Using the cabling instructions created by dishostseditor in the previous step. Otherwise. you should create the directory /etc/dis and make it writable for root: # mkdir /etc/dis # chmod 755 /etc/dis After the SIA has completed the installation. Dolphin Express™ and SuperSockets™ are ready to use. see above). Because SuperSockets™ fall back to Ethernet transparently. “Making Cluster Application use Dolphin Express” to determine the best way to have you application use SuperSockets™. Start all your own applications on the current node and make sure the whole cluster operates normally. like: $ dis_ssocks_run mysqld_safe 3. Note This single-node installation mode will not adapt the driver configuration dis_irm. This might be necessary for clusters with more than 4 nodes. 2. run the SIA with the --install-editor option to install the tool dishostseditor. Start all services on all the nodes: # dis_services start Starting Dolphin IRM 3. “Verifying Functionality and Performance”.conf” to perform recommended changes. run the SIA with the --install-frontend option. Make sure you create the cabling instructions needed in the next step. This will start the network manager. “dis_irm. this step is performed on the frontend. Typically. the interconnect cables should now be connected (see Section 4. your applications will start up normally independently from applications on the other nodes already using SuperSockets™ or not. 5. or contact Dolphin support. 7.4.1.conf which you have just created to the frontend and place it there under /etc/dis (you may need to create this directory. but your application is still running on Ethernet. On the frontend machine.3. 4. proceed with the next step. copy the configuration files dishosts.Manual Installation 3. start the tool dishostseditor (default installation location is /opt/DIS/sbin): # /opt/DIS/sbin/dishostseditor Information on how to work with this tool can be found in Section 4. 8. 31 . which will then configure the whole cluster according to the configuration files created in the previous steps. After you have performed these steps on all nodes.3. all applications that have been started accordingly will now communicate via SuperSockets™. Please refer to Section 3.8.0 ( November 13th 2007 ) Starting Dolphin SuperSockets drivers [ [ [ [ OK OK OK OK ] ] ] ] 6.conf and networkmanager.3. Ideally. If this is the case. Refer to Section 4. At this point.0 ( November 13th 2007 ) Starting Dolphin Node Manager Starting Dolphin SISCI 3. Verify the functionality and performance according to Section 1. “Cluster Cabling”). Shut down your application processes on the current node. If the dishostseditor was run as root on the frontend.

“dis_irm. use the --use-rpms option to tell SIA where it can find matching RPMs for this node. power up all nodes again. Dolphin Node Manager is running (pid 3172). or contact Dolphin support. the binary RPMs are located in the directories node_RPMS and frontend_RPMS.conf to optimally fit your cluster. Verify this via dis_services: # dis_services status Dolphin IRM 3. Dolphin SuperSockets 3. Installing the drivers on the nodes 1. 2. although the SuperSockets™ service will not be configured.0 ( November 13th 2007 ) is running.0 ( November 13th 2007 ) is running. Tip You can speed up this node installation by re-using binary RPMs that have been build on another node with the same kernel version and the same CPU architecture. Creating the cluster configuration files If you have a Linux machine with X available which can run GUI applications. Dolphin SISCI 3.Manual Installation 2. you should create the directory /etc/dis and make it writable for root: # mkdir /etc/dis # chmod 755 /etc/dis 32 . “Adapter Card Installation”).e. proceed as follows: 1.Martin". On all nodes.3. 2. Installing the Dolphin Express hardware Power off all nodes. After the first installation on a node. To do so. located in the directory where you launched the SIA. Note This single-node installation mode will not adapt the driver configuration dis_irm. run the SIA with the --install-editor option to install the tool dishostseditor.1. or even different operating systems). Ideally. Stop the SuperSockets™ service: # service dis_supersockets stop Stopping Dolphin SuperSockets drivers [ OK ] 3. This is a local operation which will build and install the drivers on the local machine only.0 "St. When installing on another node with the same Linux kernel version and CPU architecture.3. this step is performed on the frontend. and install the Dolphin Express adapter (see Section 3. 1. Do not yet connect any cables! Then. This might be necessary for clusters with more than 4 nodes. run the SIA with the option --install-node. so it does not have to build them once more. Please refer to Section 3. but not configur 3. 2. The Dolphin Express drivers should load successfully now. Linux kernel version).conf” to perform recommended changes. Installation of a Heterogeneous Cluster This section describes how to perform the initial Dolphin Express™ installation on with heterogeneous nodes (different CPU architecture. Copy these sub-directories to a path that is accessible from the other nodes.3. Nov 7th 2007 (built Nov 14 2007) loaded. different operating system version (i. If this is the case.

3. start the tool dishostseditor (default installation location is /opt/ DIS/sbin): # /opt/DIS/sbin/dishostseditor Information on how to work with this tool can be found in Section 4. Depends on Dolphin-SCI. “Verifying Functionality and Performance”. “Working with the dishostseditor”. 3. Dolphin-NetworkHosts Installs the GUI application dishostseditor for creating the cluster configuration files on the frontend.3. plus the run-time I libary and header files for the SISCI API on a node. To be installed on all nodes. Otherwise. the interconnect cables should now be connected (see Section 4.conf and networkmanager.Manual Installation After the SIA has completed the installation.0 ( November 13th 2007 ) Starting Dolphin Node Manager Starting Dolphin SISCI 3.3.4. Dolphin-SISCI User-level access to the adapter capabilites via the SISCI API. This will start the network manager. Cable Installation Using the cabling instructions created by dishostseditor in the previous step. 33 .conf which you have just created to the frontend and place it there under /etc/dis (you may need to create this directory). Start all services on all the nodes: # dis_services start Starting Dolphin IRM 3. “Cluster Cabling”).3. Installs the dis_irm kernel module. Make sure you create the cabling instructions needed in the next step. To be installed on all nodes. Installs the dis_mbox. RPM Package Structure The Dolphin Express™ software stack is organized into a number of RPM packages. Also installs some template configuration files for manual editing. dis_msq and dis_ssocks kernel modules and the dis_supersockets service. Depends on Dolphin-SCI. On the frontend machine. which will then configure the whole cluster according to the configuration files created in the previous steps. Manual RPM Installation It is of course possible to manually install the RPM packages on the nodes and the frontend. Dolphin-SuperSockets Standard Berkeley sockets with low latency and high bandwidth. To be installed on all nodes. Some of these packages have inter-dependencies.1. Dolphin-SCI Low-level hardware driver for the adapter. 7. 5. the node manager daemon and the dis_irm and dis_nodemgr services on a node. proceed with the next step. run the SIA with the --install-frontend option. copy the configuration files dishosts. Installs the dis_sisci kernel module and the dis_sisci service. If the dishostseditor was run as root on the frontend.0 ( November 13th 2007 ) Starting Dolphin SuperSockets drivers [ [ [ [ OK OK OK OK ] ] ] ] 6. Verify the functionality and performance according to Section 1. and the redirection library for preloading on a node. 4. This section describes how to do this if it should be necessary.

and install it to a path that you specify. This will take some minutes. Depending on whether this machine will be a node or the frontend. please proceed as follows: 1. 3. and the resulting RPMs are stored in three directories: node_RPMS Contains the binary RPM packages for the driver and kernel modules to be installed on each node. These RPM packages can be installed on every node with the same kernel version. just enter the directory and install them all with a single call of the rpm command.Manual Installation To be installed on the frontend (and addtionally other machines that should run dishostseditor). you have to install different drivers or services from there. which talks to all node managers on the nodes. In this case. sciadmin talks to the network manager and can be installed on any machine that has connection to the frontend. The Dolphin-SISCI-devel and Dolphin-NetworkAdmin packages can also be installed on other nodes. a non-package based installation via a tar-archive is supported. To install using this method. not being the frontend or any of the nodes. From there. or any other machine on which SISCI applications should be compiled and linked. To be installed on the frontend. like: # cd node_RPMS # rpm -Uhv *. Become superuser: $ su # 2. RPM Build and Installation On each machine. you have to perform the actual driver and service installation using scripts provided with the installation. Unpackaged Installation Not all target operating systems are supported with native software packages. Dolphin-NetworkAdmin Contains the GUI application sciadmin for managing and monitoring the interconnect. for development and administration. the matching binary RPM packages need to be built by calling the SIA with the --build-rpm option. Create the tar archive from the SIA. This type of installation will build all software for both. The source RPM packages contained in this directory can be used to build binary RPMs on other machines using the standard rpmbuild command. Dolphin-NetworkManager Contains the network manager on the frontend.2. Depends on Dolphin-NetworkHosts. respectively. To be installed on the frontend (or any other machine). To be installed on the frontend.rpm 4. Installs the service dis_networkmgr. Dolphin-SISCI-devel To compile and link applications that use the SISCI API on other machines than the nodes. this RPM installs the header files and library plus examples and documentation on any machine. Contains the binary RPM packages for the user-level managing software to be installed on the frontend. This type of installation installs the complete software into a directory on the local machine. node and frontend. and upack it: 34 . frontend_RPMS source_RPMS To install the packages from one directory.

Unpackaged Installation

Not all target operating systems are supported with native software packages. In this case, a non-package based installation via a tar archive is supported. This type of installation will build all software for both node and frontend and install it to a path that you specify, i.e. it installs the complete software into a directory on the local machine. From there, you have to perform the actual driver and service installation using scripts provided with the installation; depending on whether the machine will be a node or the frontend, you have to install different drivers or services.

To install using this method, please proceed as follows:

1. Become superuser:

$ su
#

2. Create the tar archive from the SIA, and unpack it:

# sh DIS_install_<version>.sh --get-tarball
#* Logfile is /tmp/DIS_install.log_260 on node1
#*
#+ Dolphin ICS - Software installation (version: 1.31 $ of: 2007/09/27 15:05:05 $)
#+
#* Generating tarball distribution of the source code
#* NOTE: source tarball is /tmp/DIS.tar.gz
# tar xzf DIS.tar.gz

3. Enter the created directory and configure the build system, specifying the target path <install_path> for the installation. We recommend that you use the standard path /opt/DIS, but you can use any other path. The installation procedure will create subdirectories (like bin, sbin, lib, lib64, man, doc, etc.) relative to this path and install into them.

# cd DIS
# ./configure --prefix=/opt/DIS

4. Build the software stack using make. Check the output when the command returns to see if the build operation was successful.

# make
...
# make supersockets
...

5. If the build operations were successful, install the software:

# make install
...
# make supersockets-install
...

Tip: You can speed up the installation on multiple nodes if you copy the installation directory over to the other nodes, provided they feature the same Linux kernel version and CPU architecture. The best way is to create a tar archive:

# cd /opt
# tar czf DIS_binary.tar.gz DIS

Transfer this file to /opt on all nodes and unpack it there:

# cd /opt
# tar xzf DIS_binary.tar.gz

6. Install the drivers and services depending on whether the local machine should be a node or the frontend. It is recommended to first install all nodes, then the frontend, and then configure and test the cluster from the frontend.

For a node, install the necessary drivers and services as follows:

1. Change to the sbin directory in your installation path:

# cd /opt/DIS/sbin

2. Invoke the scripts for driver installation using the option -i. The option --start will start the service after a successful installation:

# ./irm_setup -i --start
# ./nodemgr_setup -i --start
# ./sisci_setup -i --start
# ./ssocks_setup -i

Note: Please make sure that SuperSockets are not started yet (do not provide the option --start to the ssocks_setup script).

You can remove a driver from the system by calling the respective script with the option -e. Help is available via -h.

Repeat this procedure for each node.

For the frontend, install the necessary services and perform the cluster configuration and test as follows:

1. Change to the sbin directory in your installation path:

# cd /opt/DIS/sbin

2. Invoke the script for service installation using the option -i:

# ./networkmgr_setup -i --start

You can remove the service from the system by calling the script with the option -e.

3. Configure the cluster via the GUI tool dishostseditor:

# ./dishostseditor

For more information on using dishostseditor, please refer to Section 4.3, “Working with the dishostseditor”.

4. Test the cluster via the GUI tool sciadmin:

# ./sciadmin

For more information on using sciadmin to test your cluster installation, please refer to Appendix B, sciadmin Reference and Section 1, “Verifying Functionality and Performance”.

5. Enable all services, including SuperSockets, on all nodes:

# dis_services start
Starting Dolphin IRM 3.3.0 ( November 13th 2007 )        [  OK  ]
Starting Dolphin Node Manager                            [  OK  ]
Starting Dolphin SISCI 3.3.0 ( November 13th 2007 )      [  OK  ]
Starting Dolphin SuperSockets drivers                    [  OK  ]

Note: This command has to be executed on the nodes, not only on the frontend!

0 "St.e. Verifying Functionality and Performance When installing the Dolphin Express™ software stack (which includes SuperSockets™) via the SIA. only the user-space service dis_networkmgr (the central network manager) needs to be running. the user-space service dis_nodemgr (node manager. The tool to perform this test is scidiag (default location /opt/DIS/sbin/scidiag). dis_sisci (upper level hardware services) and dis_ssocks (SuperSockets) need to be running. 1.1.1.4..0 ( November 13th 2007 ) is running.0 ( November 13th 2007 ) is running. 1.7. the software and hardware work correctly. for Red Hat-based Linux distributions. which talks to the central network manager) needs to be active for configuration and monitoring. Because the drivers do also appear as services. 1. It is used in the same way as the individual service command provided by the distribution: # dis_services status Dolphin IRM 3. Nevertheless. Nov 7th 2007 (built Nov 14 2007) running. If any of the required services is not running. “Cabling Correctness Test”. Dolphin provides a script dis_services that performs this task for all Dolphin services installed on a machine.3. Dolphin SuperSockets 3. 1. On the frontend. The tests go from the most low-level functionality up to running socket applications via SuperSockets™. Availability of Drivers and Services Without the required drivers and services running on all nodes and the frontend. I. you will find more information on the problem that may have occured in the system log facilities. Call dmesg to inspect the kernel messages. This means.Chapter 6.3. Dolphin SISCI 3.Martin". but this has already been done in the Cable Connection Test. or check /var/log/messages for related messages. it is very likely that both. 37 . which means that all nodes can communicate with all other nodes via the Dolphin Express interconnect by sending low-level control packets and performing remote memory access. Interconnect and Software Maintenance This chapter describes how to perform a number of typical tasks related to the Dolphin Express interconnect. Next to these kernel drivers. It will also check if all cables are plugged in to the adapters. the cluster will fail to operate. the following sections describe the tests that allow you to verify the functionality and performance of your Dolphin Express™ interconnect and software stack. Low-level Functionality and Performance The following sections describe how to verify that the interconnect is setup correctly. you can do # service dis_irm status Dolphin IRM 3.1.1.3. you can query their status with the usual tools of the installed operating system distribution. Cable Connection Test To ensure that the cluster is cabled correctly. Static Interconnect Test The static interconnect test makes sures that all adapters are working correctly by performing a self-test.2. and determines if the setup of the routing in the adapters is correct (matches the actual hardware topology). the basic functionality and performance is verified at the end of the installation process by some of the same tests that are described in the following sections. that if the tests performed by the SIA did not report any errors. please perform the cable connection test as described in Section 4. On the nodes.1.3. the kernel services dis_irm (low level hardware driver).. Dolphin Node Manager is running (pid 3172).0 ( November 13th 2007 ) is running.3. 1.

6. Link alive in adapter 0. Link 0 .2. and using one adapter per node looks like this: =========================================================================== SCI diagnostic tool -. Local adapter 0 ok.SciDiag version 3. Probe of local node ok. to perform the static interconnect test on a full cluster.uptime 11356 seconds Link 1 . SRAM test ok for Adapter 0 LC-3 chip accessible from blink in adapter 0.3.MCP55 .downtime 0 seconds Cable insertion ok.9-42.0. 0x37610de Local adapter 0 > Type NodeId(log) NodeId(phys) SerialNum PSB Version LC Version PLD Firmware SCI Link frequency B-Link frequency Card Revision Switch Type Topology Type Topology Autodetect OK: SCI SCI SCI SCI OK: OK: OK: OK: OK: ==> : : : : : : : : : : : : : D352 140 0x2204 200284 0x0d66706d 0x1066606d 0x0001 166 MHz 80 MHz CD not present 2D Torus No Psb chip alive in adapter 0.ELsmp #1 SMP Fri Oct 6 06:28:26 CDT 2006 x86_64 x86_64 x86_64 GNU/Linu Number of configured local adapters found: 1 Hostbridge : NVIDIA nForce 570 . and if the adapters in each node can see all remote adapters installed in the other nodes. This means.6d ( September 6th 2007 ) Date : Thu Oct 4 14:20:45 CEST 2007 System : Linux tiger-9 2. you will basically need to run scidiag on each node and see if any problems with the adapter are reported.2. An example output of scidiag for a node which is part of a 9-node cluster configured in a 3 by 3 2D-torus. ******************** ******************** TOPOLOGY SEEN FROM ADAPTER 0 Adapters found: 9 Switch ports found: 0 ----.uptime 11356 seconds Link 0 .List of all ranges (rings) found: In range 0: 0004 0008 0012 In range 1: 0068 0072 0076 In range 2: 0132 0136 0140 REMOTE NODE INFO SEEN FROM ADAPTER 0 Log | Phys | resp | resp nodeId | nodeId | conflict | address 4 | 0x0004 | 0| 8 | 0x0104 | 0| 12 | 0x0204 | 0| 68 | 0x1004 | 0| 72 | 0x1104 | 0| 76 | 0x1204 | 0| 132 | 0x2004 | 0| 136 | 0x2104 | 0| 140 | 0x2200 | 0| ---------------------------------- | | 0| 0| 0| 0| 0| 0| 0| 0| 0| resp type | | 0| 0| 0| 0| 0| 0| 0| 0| 0| resp data | req | | timeout | 0| 4| 0| 1| 0| 0| 0| 2| 0| 0| 0| 0| 0| 1| 0| 1| 0| 0| TOTAL 4 1 0 2 0 0 1 1 0 38 .Interconnect and Software Maintenance Running scidiag on a node will perform a self test on the local adapter(s) and list all remote adapters that this adapter can see via the Dolphin Express interconnect.downtime 0 seconds Link 1 .6d ( September 6th 2007 ) =========================================================================== ******************** VARIOUS INFORMATION ******************** Scidiag compiled in 64 bit mode Driver : Dolphin IRM 3.
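Since scidiag has to be run on every node, it can be convenient to collect the results from one central machine. The following helper is only a sketch: the node names node1 to node4 are hypothetical placeholders, and password-less ssh access as root to all nodes is assumed.

#!/bin/sh
# Run scidiag on each node and print only the summary lines of interest.
for n in node1 node2 node3 node4; do
    echo "=== $n ==="
    ssh root@$n /opt/DIS/sbin/scidiag | grep -E 'TEST RESULT|warning|error|Topology Type'
done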

scidiag discovered 0 note(s).
scidiag discovered 0 warning(s).
scidiag discovered 0 error(s).
TEST RESULT: *PASSED*

The static interconnect test passes if scidiag delivers TEST RESULT: *PASSED* and reports the same topology (remote adapters) on all nodes. If scidiag reports warnings or errors, or reports a different topology on different nodes, this can happen if cables are not correctly connected, i.e. plugged in without the screws being tightened. More information on running scidiag is provided in ???, where you will also find hints on what to do if scidiag reports warnings or errors.

1.4. Interconnect Load Test

While the static interconnect test sends only a few packets over the links to probe remote adapters, the Interconnect Load Test puts significant stress on the interconnect and observes if any data transmissions have to be retried due to link errors. Before running this test, make sure your cluster is cabled and configured correctly by running the tests described in the previous sections.

1.4.1. Test Execution from sciadmin GUI

This test can be performed from within the sciadmin GUI tool. Please refer to Appendix B, sciadmin Reference for details.

1.4.2. Test Execution from Command Line

To run this test from the command line, simply invoke sciconntest (default location /opt/DIS/bin/sciconntest) on all nodes. All instances of sciconntest will connect and start to exchange data, which can take up to 30 seconds.

Note: It is recommended to run this test from the sciadmin GUI (see previous section) because it will perform a more controlled variant of this test and give more helpful results.

The output of sciconntest on one node which is part of a 9-node cluster looks like this:

/opt/DIS/bin/sciconntest compiled Oct 2 2007 : 22:29:09
----------------------------
Local node-id       : 76
Local adapter no.   : 0
Segment size        : 8192
MinSize             : 4
Time to run (sec)   : 10
Idelay              : 0
No Write            : 0
Loopdelay           : 0
Delay               : 0
Bad                 : 0
Check               : 0
Mcheck              : 0
Max nodes           : 256
rnl                 : 0
Callbacks           : Yes
----------------------------
Probing all nodes
Response from remote node 4
Response from remote node 8
Response from remote node 12
Response from remote node 68
Response from remote node 72
Response from remote node 132
Response from remote node 136
Response from remote node 140
Local segment (id=4, size=8192) is created.
Local segment (id=4, size=8192) is shared.
Local segment (id=8, size=8192) is created.
Local segment (id=8, size=8192) is shared.

Connect to remote segment. node 136 Remote segment on node 136 is connected. they might be temporarily disabled. size=8192) is created. size=8192) is created.00 (ms) SCICONNTEST_REPORT_END SCI_CB_DISCONNECT:Segment removed on the other node disconnecting. Local segment (id=68. size=8192) is created. size=8192) is created. Connect to remote segment. Although this test can be run while a system is in production. If the test reports failures.00 (ms) node 136 : Found node 136 : Number of failiures : 0 node 136 : Longest failiure : 0.00 (ms) node 12 : Found node 12 : Number of failiures : 0 node 12 : Longest failiure : 0. size=8192) is shared. Local segment (id=72. The test passes if all nodes report 0 failures for all remote nodes.00 (ms) node 68 : Found node 68 : Number of failiures : 0 node 68 : Longest failiure : 0. Connect to remote segment. node 72 Remote segment on node 72 is connected. Local segment (id=140. Local segment (id=68. size=8192) is shared. Connect to remote segment.. SCICONNTEST_REPORT NUM_TESTLOOPS_EXECUTED 1 NUM_NODES_FOUND 8 NUM_ERRORS_DETECTED 0 node 4 : Found node 4 : Number of failiures : 0 node 4 : Longest failiure : 0. size=8192) is shared.00 (ms) node 140 : Found node 140 : Number of failiures : 0 node 140 : Longest failiure : 0. node 68 Remote segment on node 68 is connected.. Local segment (id=12. size=8192) is created. Local segment (id=132. node 4 Remote segment on node 4 is connected..00 (ms) node 72 : Found node 72 : Number of failiures : 0 node 72 : Longest failiure : 0. Local segment (id=132. Connect to remote segment. node 132 Remote segment on node 132 is connected. Connecting to 8 nodes Connect to remote segment. size=8192) is shared. node 8 Remote segment on node 8 is connected. 40 . node 12 Remote segment on node 12 is connected. size=8192) is shared. Local segment (id=136. Connect to remote segment. but you have to take into account that performance of the productive applications will be reduced significantly while this test is running.00 (ms) node 132 : Found node 132 : Number of failiures : 0 node 132 : Longest failiure : 0. size=8192) is shared.. Local segment (id=140. The numerical node identifies shown in this output are the node ID numbers of the adapters (which identify an adapter in the Dolphin Express™ interconnect). Local segment (id=136.00 (ms) node 8 : Found node 8 : Number of failiures : 0 node 8 : Longest failiure : 0. Local segment (id=72. If links actually show problems. size=8192) is created. stopping all communication until rerouting takes place. node 140 Remote segment on node 140 is connected. you can determine the closest pair(s) of nodes for which these failures are reported and check the cabled connection between them.Interconnect and Software Maintenance Local segment (id=12. Connect to remote segment.

39 MBytes/s 4096 12.00 us 341. bytes. To simply gather all relevant low-level performance data.24 MBytes/s 65536 192. bytes. start the client-side benchmark with the options -client and -rn <node id of A>. bytes. bytes.33 MBytes/s scipp The minimal round-trip latency for writing to remote memory should be below 4µs.sh can be called in the same way. bytes.00 us 341. For the D33x and D35x series of Dolphin Express adapters.39 us 81. bytes.89 MBytes/s 32 0.09 MBytes/s 64 0.5.20 us 40.56 us 340.2µs maximal bandwidth for streaming writes to remote memory: 340 MB/s --------------------------------------------------------------Segment Size: Average Send Latency: Throughput: --------------------------------------------------------------4 0.00 us 341.32 MBytes/s 16384 48. average average average average average average average average average average average retries= retries= retries= retries= retries= retries= retries= retries= retries= retries= retries= 1292 365 359 357 4 346 871 832 1072 1643 2738 3. Determine the Dolphin Express node id of both nodes using the query command (default path /opt/DIS/ bin/query).72 MBytes/s 8 0.00 us 41 .03 us 341.45 MBytes/s 8192 24. It will run all of the described tests.01 us 4.99 us 10.Interconnect and Software Maintenance 1. The tests that are relevant for this are scibench2 (streaming remote memory PIO access performance).20 us 19. Interconnect Performance Test Once the correct installation and setup and the basic functionality of the interconnect have been verified. 3. the script sisci_benchmarks.25 us 254. On node A. bytes. Ping Ping Ping Ping Ping Ping Ping Ping Ping Ping Ping Pong Pong Pong Pong Pong Pong Pong Pong Pong Pong Pong round round round round round round round round round round round trip trip trip trip trip trip trip trip trip trip trip latency latency latency latency latency latency latency latency latency latency latency for for for for for for for for for for for 0 4 8 16 32 64 128 256 512 1024 2048 bytes.99 us 17.20 us 80.44 MBytes/s 16 0. the following results can be expected for each test using a single adapter: scibench2 minimal latency to write 4 bytes to remote memory: 0. like: $ scibench2 -client -rn 4 4. bytes. dma_bench (streaming remote memory DMA access performance) and intr_bench (remote interrupt performance).17 MBytes/s 256 0.74 us 344. start the server-side benchmark with the options -server and -rn <node id of B>.49 us 7. The Dolphin Express node id is reported as "Local node-id".89 MBytes/s 512 1.90 MBytes/s 2048 6.69 us 3.05 MBytes/s 1024 3. like: $ scibench2 -server -rn 8 2. On node B. it is possible to perform a set of low-level benchmarks to determine the base-line performance of the interconnect without any additional software layers.98 us 4. bytes.30 us 6.26 us 6.1.00 us 341.37 us 348.04 us 341.03 MBytes/s 32768 96. The test results are reported by the client. scipp (request-response remote memory PIO write performance). All these tests need to run on two nodes (A and B) and are started in the same manner: 1. The average number of retries is not a performance metric and can vary from run to run.58 us 4.49 us 343.94 us 3.22 MBytes/s 128 0. bytes.
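As stated above, all of these benchmarks are started in the same manner as scibench2. The following sketch shows how the remaining tests and the sisci_benchmarks.sh wrapper would be invoked for the same two example nodes (node IDs 4 and 8); the installation path /opt/DIS/bin is assumed here.

On node A (node ID 4):
$ /opt/DIS/bin/dma_bench -server -rn 8
$ /opt/DIS/bin/intr_bench -server -rn 8
$ /opt/DIS/bin/sisci_benchmarks.sh -server -rn 8

On node B (node ID 8):
$ /opt/DIS/bin/dma_bench -client -rn 4
$ /opt/DIS/bin/intr_bench -client -rn 4
$ /opt/DIS/bin/sisci_benchmarks.sh -client -rn 4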

On Red Hat systems. Once the status of SuperSockets is running.2.2. The following number have been measured with RHEL 4 (Linux Kernel 2. Here.50 12. Typically. SuperSockets Functionality A benchmark that can be used to validate the functionality and performance of SuperSockets is installed as /opt/ DIS/bin/socket/sockperf. average retries= 8192 bytes.2. you can verify their actual configuration via the files in /proc/net/ af_sci.41 241.85 200.60 50. while the maximum bandwidth (for larger blocks) is at about 250MB/s: 64 128 256 512 1024 2048 4096 8192 16384 32768 65536 19. it means that a configuration file could not be parsed correctly.57 24.06 us intr_bench The interrupt latency is the only performance metric of these tests that is affected by the operating system which always handles the interrupts and can therefore vary.6. The basic usage requires two machines (n1 and n2). “SuperSockets Configuration”.conf and supersockets_profiles. If the status shown here is loaded.73 270.29 44. run the client side of the benchmark like: $ sockperf -h n1 42 . you can verify them according to the reference in Section 2.43 226.08 23.2. which IP address (or network mask) the local node's SuperSockets know about. dma_bench The typical DMA bandwidth achieved for 64kB transfers is 240MB/s. At any time. 15. but not configured. it means that the SuperSockets configuration failed for some reason. The configuration can be performed manually like # /opt/DIS/sbin/dis_ssocks_cfg If this indicates that a configuration file is corrupted.Interconnect and Software Maintenance Ping Pong round trip latency for Ping Pong round trip latency for 4096 bytes.36 21. Start the server process on node n1 without any parameters: $ sockperf On node n2.26 6.99 MBytes/s MBytes/s MBytes/s MBytes/s MBytes/s MBytes/s MBytes/s MBytes/s MBytes/s MBytes/s MBytes/s 1. average retries= 4974 29.40 162. This file should be non-empty and identical on all nodes in the cluster.9): Average unidirectional interrupt time : Average round trip interrupt time : 7.69 20. dis_ssocks_cfg. /proc/net/af_sci 1.05 76.conf) from the default versions that have been installed in /opt/DIS/etc/dis.330 us.30 81. SuperSockets Status The general status of SuperSockets can be retrieved via the SuperSockets init script that controls the service dis_supersockets. sockperf. the file socket_maps shows you. you can re-create dishosts. SuperSockets Functionality and Performance This section describes how to verify that SuperSockets are working correctly on a cluster.74 144.82 us us us us us us us us us us us 3.00 us 9401 53.conf using the dishostseditor and restore modified SuperSockets configuration files (supersockets_ports.80 34. 1.665 us. this can be done like # service dis_supersockets status which should show a status of running.1.42 118.63 19.25 26.

16 7.21 44.26 2215 MB/s 0.20 11.39 52.27 11.38 17.55 17. Recent machines deliver latencies below 3µs.48 0. 1.90 59.47 6.33 6.93 1.3.17 11.87 72.97 4014 65536 64 225.91 111112 64 1000 5. typical Ethernet latencies start at 20µs and more. SuperSockets Utilization To verify if and how SuperSockets are used on a node in operation.05 14. the file /proc/net/af_sci/stats can be used: $ cat /proc/net/af_sci/stats STREAM sockets: 0 DGRAM sockets: 0 TX connections: 0 RX connections: 0 Extended statistics are disabled.82 6.Interconnect and Software Maintenance The output for a working setup should look like this: # # # # # # # # # # # sockperf 1.69 11. please verify if SuperSockets are running and configured as described in the previous section.pattern: sequential wait for data: blocking recv() send mode: blocking client/server pairs: 1 (running on 2 cores) socket options: nodelay 1 communication pattern: PINGPONG (back-and-forth) bytes loops avg_RTT/2[us] min_RTT/2[us] max_RTT/2[us] msg/s 1 1000 4.50 4.77 30.65 85411 192 1000 6.81 115889 12 1000 4.08 22318 8192 512 47.66 42764 2048 1000 15.04 90400 128 1000 5.16 10.32 120177 8 1000 4. In case of latencies being to high.73 224.12 0.38 4.86 2.06 7.14 13.29 4.20 114233 48 1000 4.67 117247 4 1000 4.40 22.60 43. Only the root user can do this: # echo enable >/proc/net/af_sci/stats 43 .79 67.54 94687 80 1000 5. verify that the environment variable LD_PRELOAD is set to libksupersockets.66 5. For more detailed information.31 4.08 116537 16 1000 4.20 11.41 5.47 77291 256 1000 6. but LD_PRELOAD also needs to be set correctly on the server side.67 15.26 3. See Section 4.43 131.01 11.40 1. Latencies above 10µs indicate a problem.56 123. Also. This is reported for the client in the second line of the output (see above).17 116468 24 1000 4.17 The latency in this example starts around 4µs.53 5.24 102.16 3.52 132.68 10.45 10596 16384 256 72.31 18.41 86.80 112.so.87 11.35 .67 18.72 32792 4096 1000 22.29 92473 112 1000 5.88 10.68 230.37 5.16 116251 32 1000 4.18 7.19 46.so address family: 2 client node: n2 server nodes: n1 sockets per process: 1 .37 8.24 17. and on older machines.08 93170 96 1000 5.96 87033 160 1000 5.28 5.59 11.12 11.test stream socket performance and system impact LD_PRELOAD: libksupersockets.30 4.41 73314 512 1000 8.20 80.17 9.8.30 6.54 145. The first line shows the number of open TCP (STREAM) and UDP (DGRAM) sockets that are using SuperSockets.05 6862 32768 128 124.85 5. “Making Cluster Application use Dolphin Express” for more information on how to make generic socket applications (like sockperf) use SuperSockets.20 78.29 4.24 79383 224 1000 6.52 59766 1024 1000 11.16 91. the extended statistics need to be enabled.45 8.03 33.79 3.74 5. the latency may be higher.25 14.
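To generate SuperSockets traffic that shows up in these statistics, you can for example re-run the sockperf test described above with the preload library set explicitly on both sides, as required for SuperSockets acceleration. This is only a sketch; it assumes sockperf is found via the PATH (the default location is /opt/DIS/bin/socket/sockperf).

On node n1 (server):
$ LD_PRELOAD=libksupersockets.so sockperf

On node n2 (client):
$ LD_PRELOAD=libksupersockets.so sockperf -h n1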

With enabled statistics, /proc/net/af_sci/stats will display a message size histogram (next to some internal information). When looking at this histogram, please keep in mind that the listed receive sizes (RX) may be incorrect, as they refer to the maximal number of bytes that a process wanted to recv when calling the related socket function; many applications use larger buffers than actually required. Thus, only the send (TX) values are reliable.

To observe the current throughput on all SuperSockets-driven sockets, the tool dis_ssocks_stat can be used. Supported options are:

-d   Delay in seconds between measurements. This will cause dis_ssocks_stat to loop until interrupted.
-t   Print a time stamp next to each measurement point.
-w   Print all output to a single line.
-h   Show available options.

Example:

# dis_ssocks_stat -d 1 -t
(1 s) RX: 162.83 MB/s TX: 168.43 MB/s  ( 0 B/s 0 B/s )  Mon Nov 12 17:59:33 CET 2007
(1 s) RX: 149.82 MB/s TX: 165.65 MB/s  ( 0 B/s 0 B/s )  Mon Nov 12 17:59:34 CET 2007

The first two pairs show the receive (RX) and send (TX) throughput via Dolphin Express of all sockets. The number pair in parentheses shows the throughput of sockets that are operated by SuperSockets, but are currently in fallback (Ethernet) mode. Typically, there will be no fallback traffic.

2. Replacing SCI Cables

The Dolphin Express interconnect is fully hot-pluggable. If one or more cables need to be replaced (or just need to be disconnected and reconnected again for some reason), you can do this while all nodes are up and running. However, if the cluster is in production, you should proceed cable by cable, and not disconnect all affected cables at once, to ensure continued operation without significant performance degradation.

To replace a single SCI cable, proceed as follows:

1. Disconnect the cable at both ends. The LEDs on the affected adapters will turn yellow for this link, and the link will show up as disabled in the sciadmin GUI.

2. Properly (re)connect the (replacement) cable. Observe that the LEDs on the adapters within the ringlet light green again after the cable is connected. The link will show up as enabled in the sciadmin GUI.

3. To verify that the new cable is working properly, two alternative procedures can be performed using the sciadmin GUI:

• Run the Cluster Test from within sciadmin. Note that this test will stop all other communication on the interconnect while it is running.

• If running the Cluster Test is not an option, you can check for potential transfer errors as follows:
  a. Run Scidiag Clear to reset the statistics of both nodes connected to the cable that has been replaced.
  b. Operate the nodes under normal load for some minutes.
  c. Perform Scidiag -V 1 on both nodes and verify if any error counters have increased, especially the CRC error counter.

If any of the verifications did report errors, make sure that the cable is plugged in cleanly, and that the screws are secured. If the error should persist, swap the position of the cable with another one that is known to be working, and observe if the problem wanders with the cable. If it does, the cable is likely to be bad. Otherwise, one of the adapters might have a problem.

3. Replacing a PCI-SCI Adapter

In case an adapter needs to be replaced, proceed as follows:

1. Power down the node. When you run sciadmin on the frontend, you will see the icon of the node turn red within the GUI representation of the cluster nodes. The network manager will automatically reroute the interconnect traffic. Communication between all other nodes will continue uninterrupted during this procedure.

2. Unplug all cables from the adapter. Remember (or mark the cables) into which plug on the adapter each cable belongs.

3. Unmount the old adapter from the node to be replaced, and insert the new one into the node.

4. Connect the SCI cables in the same way they had been connected before.

5. Power up the node. Make sure that all LEDs on all adapters in the affected ringlets light green again.

To verify the installation of the new adapter:

1. The icon of the node in the sciadmin GUI must have turned green again.

2. The output of the dis_services script should list all services as running.

3. Run Scidiag -V 1 for this adapter from sciadmin. No errors should be reported, and all nodes should be visible to this adapter.

4. Perform the cable test from within sciadmin to ensure that the cabling is correct (see Section 4.7, “Cabling Correctness Test”). If this is not an option, please use scidiag from the command line to verify the functionality of the interconnect (see Section 1.3, “Static Interconnect Test”).

Warning: Running the cable test will stop other traffic on the interconnect for the time the test is running, which can be up to a minute.

4. Physically Moving Nodes

If you move a node physically without changing the cabling, it is obvious that no configuration changes are necessary. If however you have to exchange two nodes, or for other reasons place a node which has been part of the cluster at another position within the interconnect that requires the node to be connected to other nodes than before, then this change has to be reflected in the cluster configuration as well. This can easily be done using either dishostseditor or a plain text editor.

Please proceed as follows:

1. Power down all nodes that are to be moved. When you run sciadmin on the frontend, you will see the icons of these nodes turn red within the GUI representation of the cluster nodes. The network manager will automatically reroute the interconnect traffic.

Warning: Powering down more than one node will make other nodes not accessible via SCI if the powered-down nodes are not located within one ringlet.

2. Move the nodes to the new location and connect the cables to the adapters. Do not yet power them up!

3. Power up the nodes. Once they come up, their configuration will be changed by the network manager to reflect their new position within the interconnect.

4. Update the cluster configuration file /etc/dis/dishosts.conf on the frontend by either using dishostseditor or a plain text editor:

• If using dishostseditor, load the original configuration (when running dishostseditor on the frontend, it will be loaded automatically) and change the positions of the nodes within the torus. Save the new configuration.

• When using a plain text editor, exchange the hostnames of the nodes in this file. You can also change the adapter and socket names accordingly (which typically contain the hostnames), but this will not affect functionality.

5. Restart the network manager on the frontend to make the changed configuration effective.

6. Perform the cable test from within sciadmin to ensure that the cabling is correct (see Section 4.7, “Cabling Correctness Test”). If this is not an option, please use scidiag from the command line to verify the functionality of the interconnect (see Section 1.3, “Static Interconnect Test”).

Warning: Running the cable test will stop other traffic on the interconnect for the time the test is running, which can be up to a minute.

5. Replacing a Node

In case a node needs to be replaced, proceed as follows concerning the SCI interconnect:

1. Power down the node. When you run sciadmin on the frontend, you will see the icon of the node turn red within the GUI representation of the cluster nodes. The network manager will automatically reroute the interconnect traffic.

2. Unplug all cables from the adapter. Remember (or mark the cables) into which plug on the adapter each cable belongs.

3. Unmount the adapter from the node to be replaced, and insert it into the new node.

4. Power up the node and install the software by running the SIA with the option --install-node.

5. Connect the SCI cables in the same way they had been connected before. Make sure that all LEDs on all adapters in the affected ringlets light green again.

To verify the installation after the SIA has finished:

1. The icon of the node in the sciadmin GUI must have turned green again.

2. The output of the dis_services script should list all services as running.

3. Perform the cable test from within sciadmin to ensure that the cabling is correct (see Section 4.7, “Cabling Correctness Test”). If this is not an option, please use scidiag from the command line to verify the functionality of the interconnect (see Section 1.3, “Static Interconnect Test”).

Warning: Running the cable test will stop other traffic on the interconnect for the time the test is running, which can be up to a minute.

In case more than one node needs to be replaced, please consider the following advice: to ensure that all nodes that are not being replaced can continue to communicate via the SCI interconnect while other nodes are replaced, you should replace nodes in a ring-by-ring manner: power down nodes within one ringlet only, and bring this group of nodes back to operation before powering down the next group of nodes.

6. Adding Nodes

To add new nodes to the cluster, please proceed as follows:

1. Install the adapter in the nodes to be added and power the nodes up.

2. Install the DIS software stack on all nodes via the --install-node option of the SIA, but make sure that Ethernet communication towards the frontend is working.

Important: Do not yet connect the SCI cables.

3. Change the cluster configuration using dishostseditor:

1. Load the existing configuration.

2. In the cluster settings, change the topology to match the topology with all new nodes added.

3. Change the hostnames of the newly added nodes via the node settings of each node.

4. Also make sure that the socket configuration matches those of the existing nodes.

5. If desired, create and save or print the cabling instructions for the extended cluster.

6. Save the new cluster configuration. If you are not running dishostseditor on the frontend, transfer the saved files dishosts.conf and cluster.conf to the directory /etc/dis on the frontend.

4. Restart the network manager on the frontend. If you run sciadmin, the new nodes should show up as red icons.

5. Cable the new nodes:

1. First create the cable connections between the new nodes.

2. Then connect the new nodes to the cluster. Because the topology change might cut off nodes from the cluster at the "wrong" end, proceed cable-by-cable, which means you disconnect a cable at an "old" node and immediately connect it to a new node (without disconnecting another cable first). This will ensure continued operation of all "old" nodes.

3. When you are done, all LEDs on all adapters should light green, and all node icons in sciadmin should also light green.

6. Perform the cable test from within sciadmin to ensure that the cabling is correct (see Section 4.7, “Cabling Correctness Test”). All other nodes should continue to stay green. If this is not an option, please use scidiag from the command line to verify the functionality of the interconnect (see Section 1.3, “Static Interconnect Test”).

Warning: Running the cable test will stop other traffic on the interconnect for the time the test is running, which can be up to a minute.

7. Removing Nodes

To permanently remove nodes from the cluster, please proceed as follows:

1. Change the cluster configuration using dishostseditor:

1. Load the existing configuration.

2. In the cluster settings, change the topology to match the topology with all nodes removed. You have to make sure that the hostnames and the placement within the new topology for the remaining nodes are correct. To do this, change the hostnames of nodes by double-clicking their icon and changing the hostname in the displayed dialog box. If the SuperSocket™ configuration is based on the hostnames (not on the subnet addresses), make sure that the name of the socket interface matches a modified hostname.

3. If desired, create and save or print the cabling instructions for the reduced cluster.

4. Save the new cluster configuration. If you are not running dishostseditor on the frontend, transfer the saved files dishosts.conf and cluster.conf to the directory /etc/dis on the frontend.

2. Restart the network manager on the frontend. If you run sciadmin, the removed nodes should no longer show up.

3. Uncable the nodes to be removed one by one, making sure that the remaining nodes are cabled according to the cabling instructions generated above.

4. Perform the cable test from within sciadmin to ensure that the cabling is correct (see Section 4.7, “Cabling Correctness Test”). All other nodes should continue to stay green. If this is not an option, please use scidiag from the command line to verify the functionality of the interconnect (see Section 1.3, “Static Interconnect Test”).

Warning: Running the cable test will stop other traffic on the interconnect for the time the test is running, which can be up to a minute.

5. On the nodes that have been removed from the cluster, the Dolphin Express software can easily be removed using the SIA option --wipe, like:

# sh DIS_install_<version>.sh --wipe

This will remove all Dolphin software packages, services and configuration data from the node. If no SIA is available, the same effect can be achieved by manually uninstalling all packages that start with Dolphin-, removing the configuration directory /etc/dis, and removing potentially remaining installation directories (like /opt/DIS).
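For the manual cleanup without the SIA, a minimal sketch could look as follows. Check the package list before removing anything; the exact package names depend on the installed version.

# rpm -qa 'Dolphin-*'
# rpm -e $(rpm -qa 'Dolphin-*')
# rm -rf /etc/dis /opt/DIS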

Chapter 7. MySQL Operation

This chapter covers topics that are specific to operating MySQL and MySQL Cluster with SuperSockets.

1. MySQL Cluster

Using MySQL Cluster with Dolphin Express and SuperSockets will significantly increase throughput and reduce the response time. SuperSockets operate fully transparently, and no change in the MySQL Cluster configuration is necessary for a working setup. Please read below for some hints on performance tuning and trouble-shooting specific to SuperSockets with MySQL Cluster.

1.1. NDBD Deadlock Timeout

In case you experience timeout errors between the ndbd processes on the cluster nodes, you will see error messages like

ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction

Such timeout problems can have different reasons, like dead or overloaded nodes. Generally, you should try to verify that no node is overloaded or stuck. One other possible reason could be that the time for a socket fail-over between Dolphin Express and Ethernet exceeded the current timeout, which by default is 1200ms. This should rarely happen, but to solve this problem, please proceed as follows:

1. Verify the state of the Dolphin Express interconnect by checking the logfile of the network manager (/var/log/scinetworkmanager.log) to see if any interconnect events have been logged.

2. If there are repeated logged error events for which no clear reason (such as manual intervention or node shutdown) can be determined, you should test the interconnect using sciadmin (see the section “Traffic Test”). If no events have been logged, it is very unlikely that the interconnect or SuperSockets are the cause of the problem.

3. Increase the value of the NDBD configuration parameter TransactionDeadlockDetectionTimeout (see the MySQL reference manual). As the default value is 1200ms, increasing it to 5000ms might be a good start. You will need to add this line to the NDBD default section in your cluster configuration file:

[NDBD DEFAULT]
TransactionDeadlockDetectionTimeout: 5000

1.2. SuperSockets Poll Optimization

SuperSockets offer an optimization of the functions used by processes to detect new data or other status changes on sockets (poll() and select()). This optimization typically helps to increase performance. It has however been shown that for certain MySQL Cluster setups and loads, this optimization can have a negative impact on performance. It is therefore recommended that you evaluate which setting of this optimization delivers the best performance for your setup and load.

This optimization can be controlled on a per-process basis and with a node-global default. Explicit per-process settings override the global default. For a per-process setting, you need to set the environment variable SSOCKS_SYSTEM_POLL to 0 to enable the optimization, or to 1 to disable the optimization and use the functions provided by the operating system. To set the node-global default, the setting of the variable system_poll needs to be changed in /etc/dis/supersockets_profiles.conf. Please refer to Section 2.2, “SuperSockets Configuration” for more details.
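As an illustration of the per-process setting, the environment variable can simply be set when starting the process whose poll()/select() behaviour you want to change. The value 1 shown here (optimization disabled) is only an example; evaluate both settings for your load as recommended above, and adapt the way ndbd is started to your installation.

$ SSOCKS_SYSTEM_POLL=1 ndbd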

1.3. SCI Transporter

Prior to SuperSockets, MySQL Cluster could use SCI for communication by means of the SCI Transporter. The SCI Transporter is a SISCI-based data communication channel offered by MySQL which is no longer maintained. It has been shown that for typical MySQL Cluster setups and loads, performance with SuperSockets is significantly better than with the SCI Transporter. Platforms that have SuperSockets available should use those instead of the SCI Transporter.

2. MySQL Replication

SuperSockets also significantly increase the performance of replication setups (speedups of up to a factor of 3 have been reported). No MySQL configuration changes are necessary. All that is necessary is to make sure that all MySQL server processes involved run with the LD_PRELOAD variable set accordingly.
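One way to make sure a MySQL server process runs with the preload library set is to export the variable in the environment from which the server is started. This is only a sketch; how mysqld is actually started (init script, mysqld_safe, etc.) depends on your installation.

# export LD_PRELOAD=libksupersockets.so
# mysqld_safe --user=mysql &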

but i. FAILED or UNSTABLE. SuperSockets will fall back to communicate via Ethernet if it is available. 1.Chapter 8. Can be either UP. the content of this chapter is not relevant. While in status UNSTABLE. Can be either UP. After a certain period of less frequent internal status changes (which are continuously recorded by network manager). or the removal of an environment variable. Notification Interface When the network manager invokes the specified script or executable. This is unlikely and does not necessarily make an alert script fail. For most installations.. Advanced Topics This chapter deals with techniques like performance analysis and tuning. DIS_ALERT_VERSION The version number of this interface (currently 1). 1 or 2. DEGRADED. irregular topologies and debug tools. the content of this variable needs to be an email address. Can be 0. FAILED UNSTABLE 1. This target address is provided by the user when the notification is enabled (see below). 1. The following variables are set: DIS_FABRIC DIS_STATE The number of the fabric for which this notification is generated. The previous state of the fabric.e.1. If the interconnect is changing states frequently (i. UNSTABLE is a state which is only visibly externally.e. and the user needs to make sure that the content of this variable is useful for the chosen alert script. The content of these variables can be evaluated by the script or executable. but one or more interconnect links have been disabled. Disabling links can either happen manually via sciadmin. if the alert script should send an email. but the overall performance of the interconnect may be reduced. or through the network manager because of problems reported by the node managers. It will be increased if incompatible changes to the interface need to be introduced. it hands over a number of parameters by setting environment variables. DIS_OLDSTATE DIS_ALERT_TARGET This variable contains the target address for the notification. FAILED or UNSTABLE. all nodes can still communicate via Dolphin Express. The new state of the fabric. the network manager will enable verbose logging (to /var/log/ scinetworkmanager. Notification on Interconnect Status Changes The network manager provides a mechanism to trigger actions when the state of the interconnect changes. In status DEGRADED. but a script 51 .e. DEGRADED or FAILED. Interconnect Status The interconnect can be in any of these externally visible states: UP DEGRADED All nodes and interconnect links are functional. the external state will again be set to either UP. All nodes are up.2. I. because nodes are rebooted one after the other). which could be a change in the possible content of an existing environment variable. The action to be triggered is a user-definable script or executable that is run by the network manager when the interconnect status changes. One or more nodes are down (the node manager is not reachable via Ethernet). These nodes can not communicate via Dolphin Express. and/or a high number of links has been disabled which isolates one or more nodes from the interconnect.log) to make sure that no internal events are lost. DEGRADED. the interconnect will enter the state UNSTABLE.
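The bundled alert.sh is the reference for how such a script can look. The following stand-alone sketch only illustrates how the environment variables of the notification interface can be evaluated; it assumes a working mail command on the frontend and that the alert target is an email address.

#!/bin/sh
# Minimal example alert script for the network manager notification interface.
# DIS_FABRIC, DIS_STATE, DIS_OLDSTATE, DIS_ALERT_TARGET and DIS_ALERT_VERSION
# are set by the network manager before this script is invoked.
SUBJECT="Dolphin fabric $DIS_FABRIC: $DIS_OLDSTATE -> $DIS_STATE"
{
    echo "Notification interface version: $DIS_ALERT_VERSION"
    echo "Fabric:    $DIS_FABRIC"
    echo "Old state: $DIS_OLDSTATE"
    echo "New state: $DIS_STATE"
    date
} | mail -s "$SUBJECT" "$DIS_ALERT_TARGET"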

1.3. Setting Up and Controlling Notification

Dolphin provides an alert script /opt/DIS/etc/dis/alert.sh (for the default installation path) which sends out an email to the specified alert target. Any other executable can be specified instead. Please consider that this script will be executed in the context of the user running the network manager (typically root), so the permissions to change this file should be set accordingly.

1.3.1. Configure Notification via the dishostseditor

Notification on interconnect status changes is configured via the dishostseditor:

1. In the Cluster Edit dialog, tick the check box above Alert target as shown in the screenshot below.

2. Enter the alert target and choose the alert script by pressing the button and selecting the script in the file dialog.

3. To disable notification, clear the check box above Alert target again.

To make the changes done in this dialog effective, you need to save the configuration files (to /etc/dis on the frontend) and then restart the network manager:

# service dis_networkmgr restart

1.3.2. Configure Notification Manually

If the dishostseditor can not be used, it is also possible to configure the notification by editing /etc/dis/networkmanager.conf. Notification is controlled by two options in this file:

-alert_script <file>: This parameter specifies the alert script <file> to be executed.

-alert_target <target>: This parameter specifies the alert target <target> which is passed to the chosen alert script.

To disable notification, these lines can be commented out (precede them with a #). After the file has been edited, the network manager needs to be restarted to make the changes effective:

# service dis_networkmgr restart
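As a sketch, the two notification-related lines could then look like this in /etc/dis/networkmanager.conf. The alert target shown here is a made-up address, and the rest of the file keeps whatever options your installation already uses.

-alert_script /opt/DIS/etc/dis/alert.sh
-alert_target admin@example.com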

1.3.3. Disabling and Enabling Notification Temporarily

Once the notification has been configured, it can be controlled via sciadmin. This is useful if the alerts should be stopped for some time. To disable alerts, open the Cluster Settings dialog and switch the setting next to Alert script as needed. This is a per-session setting and will be lost if the network manager is restarted.

Warning: Make sure that the messages are enabled again before you quit sciadmin. Otherwise, interconnect status changes will not be notified until the network manager is restarted.

1.3.4. Verifying Notification

To verify that notification is actually working, you should provoke an interconnect status change manually. This can easily be done from sciadmin by disabling any link via the Node Settings dialog of any node.

2. Managing IRM Resources

A number of resources in the low-level driver IRM (service dis_irm) are run-time limited by parameters in the driver configuration file /opt/DIS/lib/modules/<kernel version>/dis_irm.conf (for the default installation path). This file contains numerous parameter settings; for those parameters that are relevant for changes by the user, please refer to Section 3.1, “dis_irm.conf”.

Typically, to change (increase) default limits, the dis_irm.conf file needs to be changed on each node. Generally, you should edit and test the changes on one node, and then copy the file over to all other nodes. To make changes in the configuration file effective, you need to restart the dis_irm driver. Because all other drivers depend on it, it is necessary to restart the complete software stack on the nodes:

# dis_services restart
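A small helper like the following can distribute a tested dis_irm.conf and restart the stack on all nodes. It is only a sketch: the node names node1 to node4 are placeholders, password-less ssh/scp access as root is assumed, and it assumes that the <kernel version> directory in the path corresponds to the output of uname -r on the nodes.

#!/bin/sh
# Copy the locally tested dis_irm.conf to all nodes and restart the Dolphin stack.
CONF=/opt/DIS/lib/modules/$(uname -r)/dis_irm.conf
for n in node1 node2 node3 node4; do
    scp "$CONF" root@$n:"$CONF"
    ssh root@$n dis_services restart
done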

2.1. Updates with Modified IRM Configuration

You need to be careful when updating RPMs on nodes with a modified dis_irm.conf. If you directly use RPMs to update the existing Dolphin-SCI RPM, like using rpm -U, the existing and modified dis_irm.conf will be moved to dis_irm.conf.rpmsave, and the default dis_irm.conf will replace the previously modified version. You will need to undo this file renaming. If you update your system with the SIA as described in Chapter 4, Update Installation, the SIA will take care that the existing dis_irm.conf is preserved and stays effective.

Chapter 9. FAQ

This chapter lists problems that have been reported by customers and the proposed solutions.

1. Hardware

1.1.1. Although I have properly installed the adapter in a node and its LEDs light orange, I am told (i.e. during the installation) that this node does not contain an SCI adapter!

The SCI adapter might not have been recognized by the node during the power-up initialization after power was applied again. The specification requires that a node needs to be powered down for at least 5 seconds before being powered up again. To make the adapter be recognized again, you will need to power down the node (restarting or resetting is not sufficient!), wait for at least 5 seconds, and power it up again. If this does not fix the problem, please contact Dolphin support.

1.1.2. Between some pairs of nodes the communication works fine, but some nodes can not see some other nodes via the SCI interconnect. All cables are connected, all required services and drivers are running on all nodes, and all LEDs shine green on the adapter boards. What's wrong?

These symptoms indicate that the cabling is not correct, i.e. the links 0 and 1 (x- and y-direction in a 2D torus) are exchanged. To fix the cable problem, proceed as follows:

1. Run the cable test from sciadmin (Server -> Test Cable Connections).

2. Create a cabling description via dishostseditor (File -> Get Cabling Instructions) and fix the cabling between the nodes that have been reported in the cable test.

3. Repeat steps 1 and 2 until no more problems are reported.

If no problem is reported, the problem should be solved. If the error should persist, please contact Dolphin support.

1.1.3. The SCI driver dis_irm refuses to load, or the driver installation never completes. Running dmesg shows that the syslog contains the line Out of vmalloc space.

The problem is that the SCI adapter requires more virtual PCI address space than supported by the installed kernel. This problem has so far only been observed on 32-bit operating systems. There are two alternative solutions:

1. If you are building a small cluster, you may be able to run your application with less SCI address space. You can change the SCI address space size for the adapter card by using sciconfig with the command set-prefetch-mem-size; a value of 64 or 16 will most likely overcome the problem. This operation can also be performed from the command line, using the option -c to specify the card number (1 or 2) and -spms to specify the prefetch memory size in Megabytes:

# sciconfig -c 1 -spms 64
Card 1 - Prefetch space memory size is set to 64 MB

A reboot of the machine is required to make the changes take effect.

2. If reducing the prefetch memory size is not desired, the related resources in the kernel have to be increased. For x86-based machines, this is achieved by passing the kernel option vmalloc=256m and the parameter uppermem 524288 at boot time. This is done by editing /boot/grub/grub.conf as shown in the following example:

title CentOS-4 i386 (2.6.9-11.ELsmp)
        root (hd0,0)
        uppermem 524288
        kernel /i386/vmlinuz-2.6.9-11.ELsmp ro root=/dev/sda6 rhgb quiet vmalloc=256m
        initrd /i386/initrd-2.6.9-11.ELsmp.img


2. Software
2.1.1. The service dis_irm (for the low-level driver of the same name) fails to load after it has been installed for the first time. Please follow the procedure below to determine the cause of the problem. 1. Verify that the adapter card has been recognized by the machine. This can be done as follows:
[root@n1 ~]# lspci -v | grep Dolphin 03:0c.0 Bridge: Dolphin Interconnect Solutions AS PSB66 SCI-Adapter D33x Subsystem: Dolphin Interconnect Solutions AS: Unknown device 2200

If this command does not show any output similar to the example above, the adapter card has not been recognized. Please try to power-cycle the system according to FAQ Q: 1.1.1. If this does not solve the issue, a hardware failure is possible. Please contact Dolphin support in such a case. 2. Check the syslog for relevant messages. This can be done as follows:
# dmesg | grep SCI

Depending on which messages you see, proceed as described below:

SCI Driver : Preallocation failed
The driver failed to preallocate memory which will be used to export memory to remote nodes. Rebooting the node is the simplest solution to defragment the physical memory space. If this is not possible, or if the message appears even after a reboot, you need to adapt the preallocation settings (see Section 3.1, “dis_irm.conf”).

SCI Driver: Out of vmalloc space
See FAQ Q: 1.1.3.

3. If the driver still fails to load, please contact Dolphin support and provide the driver's syslog messages:
# dmesg > /tmp/syslog_messages.txt

2.1.2. Although the Network Manager is running on the frontend, and all nodes run the Node Manager, configuration changes are not applied to the adapters. I.e., the node ID is not changed according to what is specified in /etc/dis/dishosts.conf on the frontend. The adapters in a node can only be re-configured when they are not in use. This means that no adapter resources must be allocated via the dis_irm kernel module. To achieve this, make sure that upper layer services that use dis_irm (like dis_sisci and dis_supersockets) are stopped. On most Linux installations, this can be achieved like this (dis_services is a convenience script that comes with the Dolphin software stack):
# dis_services stop ... # service dis_irm start ... # service dis_nodemgr start ...

2.1.3. The Network Manager on the frontend refuses to start. In most cases, the interconnect configuration /etc/dis/dishosts.conf is corrupted. This can be verified with the command testdishosts. It will report problems in this configuration file, as in the example below:
# testdishosts socket member node-1_0 does not represent a physical adapter in dishosts.conf



DISHOSTS: signed32 dishostsAdapternameExists() failed

In this case, the adapter name in the socket definition was misspelled. If testdishosts reports a problem, you can either try to fix /etc/dis/dishosts.conf manually, or re-create it with dishostseditor. If this does not solve the problem, please check /var/log/scinetworkmanager.log for error messages. If you can not fix the problem reported in this logfile, please contact Dolphin support providing the content of the logfile. 2.1.4. After a node has booted, or after I restarted the Dolphin drivers on a node, the first connection to a remote node using SuperSockets does only deliver Ethernet performance. Retrying the connection then delivers the expected SuperSockets performance. Why does this happen? Make sure you run the node manager on all nodes of the cluster, and the network manager on the frontend being correctly set up to include all nodes in the configuration (/etc/dis/dishosts.conf). The option Automatic Create Session must be enabled. This will ensure that the low-level "sessions" (Dolphin-internal) are set up between all nodes of the cluster, and a SuperSockets connection will immediately succeed. Otherwise, the set-up of the sessions will not be done until the first connection between two nodes is tried, but this is too late for the first connection to be established via SuperSockets. 2.1.5. Socket benchmarks show that SuperSockets are not active as the minimal latency is much more than 10us. The half roundtrip latency (ping-pong latency) with SuperSockets typically starts between 3 and 4us for very small messages. Any value above 7us for the minimal latency indicates a problem with the SuperSockets configuration, benchmark methodology of something else. Please proceed as follows to determine the reason: 1. Is the SuperSockets service running on both nodes? /etc/init.d/dis_supersockets status should report the status running. If the status is stopped, try to start the SuperSockets service with /etc/init.d/ dis_supersockets start. 2. Is LD_PRELOAD=libksupersockets.so set on both nodes? You can check using the ldd command. Assuming the benchmark you want to run is named sockperf, do ldd sockperf. The libksupersockets.so should appear at the very top of the listing. 3. Are the SuperSockets configured for the interface you are using? This is a possible problem if you have multiple Ethernet interfaces in your nodes with the nodes having different hostnames for each interface. SuperSockets may be configured to accelerate not all of the available interfaces. To verify this, check which IP addresses (or subnet mask) are accelerated by SuperSockets by looking at /proc/net/af_sci/socket_maps (Linux) and use those IP addresses (or related hostnames) that are listed in this file. 4. If the SuperSockets service refuses to start, or only starts into the mode running, but not configured, you probably have a corrupted configuration file /etc/dis/dishosts.conf: verify that this file is identical to the same file on the frontend. If not, make sure that the Network Manager is running on the frontend (/etc/init.d/dis_networkmgr start). 5. If the dishosts.conf files are identical on frontend and node, they could still be corrupted. Please run the dishostseditor on the frontend to have it load /etc/dis/dishosts.conf; then save it again (dishostseditor will always create syntactically correct files). 6. Please check the system log using the dmesg command. Any output there from either dis_ssocks or af_sci should be noted and reported to <support@dolphinics.com>. 2.1.6. 
2.1.6. I am running a mixed 32/64-bit platform, and while the benchmarks latency_bench and sockperf from the DIS installation show good SuperSockets™ performance, other applications only show Ethernet performance for socket communication.



Please use the file command to verify whether the applications that fail to use SuperSockets are 32-bit applications. If they are, please verify whether the 32-bit SuperSockets™ library can be found as /opt/DIS/lib/libksupersockets.so (this is a link). If this file is not found, it could not be built due to a missing or incomplete 32-bit compilation environment on your build machine. This problem is indicated by the SIA message #* WARNING: 32-bit applications may not be able to use SuperSockets. If 32-bit libraries cannot be built on a 64-bit platform, the RPM packages will still be built successfully (without 32-bit libraries included), as many users of 64-bit platforms only run 64-bit applications. To fix this problem, make sure that the 32-bit versions of the glibc and libgcc-devel packages are installed on your build machine, and re-build the binary RPM packages using the SIA option --build-rpm, making sure that the warning message shown above does not appear. Then, replace the existing DolphinSuperSockets RPM package with the one you have just built. Alternatively, you can perform a complete re-installation.

2.1.7. I have added the statement export LD_PRELOAD=libksupersockets.so to my shell profile to enable the use of SuperSockets™. This works well on some machines, but on other machines, I get the error message
ERROR: ld.so object 'libksupersockets.so' from LD_PRELOAD cannot be preloaded : ignore

whenever I log in. How can this be fixed?

This error message is generated on machines that do not have SuperSockets installed; on these machines, the linker cannot find the libksupersockets.so library. The fix is to set the LD_PRELOAD environment variable only if SuperSockets™ are running. For an sh-type shell such as bash, use the following statement in the shell profile ($HOME/.bashrc):
[ -d /proc/net/af_sci ] && export LD_PRELOAD=libksupersockets.so

2.1.8. How can I permanently enable the use of SuperSockets™ for a user?

This can be achieved by setting the LD_PRELOAD environment variable in the user's shell profile (i.e. $HOME/.bashrc for the bash shell). This should be done conditionally by checking whether SuperSockets™ are running on this machine:
[ -d /proc/net/af_sci ] && export LD_PRELOAD=libksupersockets.so

Of course, it is also possible to perform this setting globally (in /etc/profile).

2.1.9. I cannot build SISCI applications that are able to run on my cluster because the frontend (where the SISCI-devel package was installed by the SIA) is a 32-bit machine, while my cluster nodes are 64-bit machines (or vice versa). I fail to build the SISCI applications on the nodes as the SISCI header files are missing. How can this deadlock be solved?

When the SIA installed the cluster, it stored the binary RPM packages in the directories node_RPMS, frontend_RPMS and source_RPMS. You will find a SISCI-devel RPM that can be installed on the nodes in the node_RPMS directory. If you cannot find this RPM file, you can recreate it on one of the nodes using the SIA with the --build-rpm option. Once you have the Dolphin-SISCI-devel binary RPM, you need to install it on the nodes using the --force option of rpm, because the library files conflict between the installed SISCI and the SISCI-devel RPM:
# rpm -i --force Dolphin-SISCI-devel.<arch>.<version>.rpm
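If the node_RPMS directory from the original installation is no longer available, the SISCI-devel package can be rebuilt on one of the nodes with the SIA, as described above; a sketch (the exact archive file name depends on the release):

# ./DIS_install_<version> --build-rpm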


Appendix A. Self-Installing Archive (SIA) Reference

Dolphin provides the complete software stack as a self-installing archive (SIA). This is a single file that contains the complete source code as well as a setup script that can perform various operations. A short usage information will be displayed when calling the SIA archive with the --help option.

1. SIA Operating Modes

This section explains the different operations that can be performed by the SIA.

1.1. Full Cluster Installation

The full cluster installation mode will install and test the full cluster in a wizard-like guided installation, i.e. compiling, installing and testing the required software on all nodes and the frontend. All required information will be asked for interactively, and it will be tested if the requirements to perform the installation are met. This mode is the default mode, but can also be selected explicitly.

Option: --install-all

1.2. Node Installation

Only build and install the kernel modules and the Node Manager service needed to run an interconnect node. Kernel header files and the kernel configuration are required (package kernel-devel).

Option: --install-node

1.3. Frontend Installation

The frontend installation will only build and install those RPM packages on the local host that are needed to have this host run as a frontend, but no GUI packages (like qt and X packages).

Option: --install-frontend

1.4. Installation of Configuration File Editor

Build and install the GUI-based cluster configuration tool dishostseditor. This tool is used to define the topology of the interconnect and the placement of the nodes within this topology. With this information, it can create the detailed cabling instructions (useful to cable non-trivial cluster setups) and the cluster configuration files dishosts.conf and networkmanager.conf needed on the frontend by the Network Manager. Kernel headers and configuration (kernel-devel) as well as the GUI development package (qt-devel) are needed on the local machine.

Option: --install-editor

1.5. Building RPM Packages Only

Build all source and binary RPMs on the local machine. Both kernel-devel and qt-devel are needed; due to limitations of the current build system, the kernel headers and configuration are still required for Linux systems.

Option: --build-rpm

1.6. Extraction of Source Archive

It is possible to extract the sources from the SIA as a tar archive DIS.tar.gz in the current directory. This is required to build and install on non-RPM platforms, or when you want source-code access in general.

Option: --get-tarball
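For orientation, a few typical invocations of the archive; the exact file name of the SIA depends on the software version you downloaded, so these command lines are only a sketch:

# ./DIS_install_<version>                  (default operation: guided full cluster installation, same as --install-all)
# ./DIS_install_<version> --build-rpm      (only build the source and binary RPM packages on this machine)
# ./DIS_install_<version> --get-tarball    (extract the source archive DIS.tar.gz instead of installing)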

2. SIA Options

Next to the different operating modes, a number of options are available that influence the operation. Not all options have an impact on all operating modes.

2.1. Node Specification

In case you want to specify the list of nodes not interactively but on the command line, you can use the option --nodes together with a comma-separated list of hostnames and/or IP addresses.

Example: --nodes n01,n02,n03,n04

If this option is provided, existing configuration files like /etc/dis/dishosts.conf will not be considered.

2.2. Installation Path Specification

By default, the complete software stack will be installed to /opt/DIS. To change the installation path, use the --prefix option. It is recommended to install into a dedicated directory that is located on a local storage device (not mounted via the network). When doing a full cluster install (--install-all, or default operation), the same installation path will be used on all nodes, the frontend and potentially the installation machine (if different from the frontend).

Example: --prefix /usr/dolphin

This will install into /usr/dolphin.

2.3. Installing from Binary RPMs

If you are re-running an installation for which the binary RPM packages have already been built, you can save time by not building these packages again, but using the existing ones. The packages have to be placed in two subdirectories node_RPMS and frontend_RPMS, just as the SIA does. Then, provide the name of the directory containing these two subdirectories to the installer using the --use-rpms option.

Example: --use-rpms $HOME/dolphin

The installer does not verify if the provided packages match the installation target, but the RPM installation itself will fail in this case.

2.4. Preallocation of SCI Memory

It is possible to specify the number of Megabytes per node that the low-level interconnect driver dis_irm should allocate on startup for exportable memory segments. The amount of this memory determines i.e. how many SuperSocket™-based sockets can be opened. By default, the driver allocates 8 + N*MB Megabytes of memory, with N being the number of nodes in the cluster and MB = 4 by default. A maximum of 256MB will be allocated. The factor MB can be specified on installation using the --prealloc option. The default setting was chosen to work well with clustered databases. Change this setting if you know you will need more memory to be exported, or will use a very high number of stream sockets per node (datagram sockets are multiplexed and thus need less resources).

Example:

--prealloc 8

On a 16 node cluster, this will make the dis_irm allocate 8 + 16*8 = 136MB on each node. Setting MB to 0 is also valid (8 MB will be allocated). Setting MB to -1 will disable all modifications to this configuration option, and the fixed default of 16MB will be preallocated independently from the number of nodes. This option changes a value in the module configuration file dis_irm.conf; it is only effective on an initial installation, i.e. when upgrading an existing installation, an existing configuration file dis_irm.conf will never be changed.

Note

The operating system can not use preallocated memory for other purposes - it is effectively invisible.

2.5. Configuration File Specification

When doing a full cluster install, the installation script will automatically look for the cluster configuration files dishosts.conf and networkmanager.conf in the default path /etc/dis on the installation machine. If these files are not stored in the default path (i.e. because you have created them on another machine or received them from Dolphin and stored them someplace else), you can specify this path using the --config-dir option.

Example: --config-dir /tmp

The script will look for both configuration files in /tmp. If you need to specify the two configuration files being stored in different locations, use the options --dishostsconf <filename> and --networkmgr-conf <filename>, respectively, to specify where each of the configuration files can be found.

2.6. Enforce Installation

If the installed packages should be replaced with the packages built from the SIA you are currently using even if the installed packages are more recent (have a higher version number), use the option --enforce. This will enforce the installation of the same software version (the one delivered within this SIA) on all nodes and the frontend, no matter what might be installed on any of these machines. This option can be very useful if you are upgrading an already installed cluster.

Example: --enforce

2.7. Batch Mode

In case you want to run the installation unattended, you can use the --batch option to have the script assume the default answer for every question that is asked. Additionally, you can avoid most of the console output (but still have the full logfile) by providing the option --quiet. I.e., to enforce the installation of newly compiled RPM packages and reboot the nodes after the installation, you could issue the following command on the frontend:

# ./DIS_install_<version> --batch --reboot --enforce >install.log

After this command returns, your cluster is guaranteed to be freshly installed unless any error messages can be found in the file install.log.

2.8. Non-GUI Build Mode

When building RPMs only (using the --build-rpm option), it is possible to specify that no GUI applications (sciadmin and dishostseditor) should be built. This is done by providing the --disable-gui option. This removes the dependency on the QT libraries and header files for the build process.

Example: --disable-gui

2.9. Software Removal

To remove all software that has been installed via the SIA, simply use the --uninstall option:

--uninstall

This will remove all packages from the node and stop all drivers (if they are not in use). A more thorough cleanup, including all configuration data and possible remains of non-SIA installations, can be achieved with the --wipe option:

--wipe

This option is a superset of --uninstall.
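For example, to clean a node before a fresh installation (the archive file name is release-dependent; this is only a sketch):

# ./DIS_install_<version> --uninstall      (remove the installed packages, keep configuration data)
# ./DIS_install_<version> --wipe           (remove packages, configuration data and leftovers of non-SIA installations)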

Appendix B. sciadmin Reference

1. Startup

Check out sciadmin -h for startup options. In order to connect to the network manager you may either start sciadmin with the -cluster option, or choose the Connect button after the startup is complete. Type the hostname or IP-address of the machine running the Dolphin Express network manager.

Note

Only one sciadmin process can connect to the network manager at any time. If you should ever need to connect to the network manager while another sciadmin process is blocking the connection, you can restart the network manager to terminate this connection. Afterwards, you can connect to the network manager from your sciadmin process (which needs to be running on a different machine than the other sciadmin process).

2. Interconnect Status View

2.1. Icons

As a visual tool, sciadmin uses icons to let the user trigger actions and to display information by changing the icon shape or color. The icons with all possible states are listed in the tables below.

Table B.1. Node or Adapter State

• Dolphin Network Manager has a valid connection to the Dolphin Node Manager on the node.
• Dolphin Network Manager cannot reach the node using TCP/IP.
• The adapter is wrongly configured, broken, or the driver is in an invalid state.
• The adapter has gone into a faulty state where it cannot read system interrupts and has been isolated by the Dolphin IRM driver.

Table B.2. Link State

• Green pencil strokes indicate that the links are up.
• Red pencil strokes indicate that a link is broken; typically, a cable is unplugged or not seated well.
• Yellow pencil strokes indicate that links have been disabled. Links are typically disabled when there are broken cables somewhere else in the ringlet and automatic fail over recovery has been enabled.
• Blue pencil strokes indicate that an administrator has chosen to disable this link in sciadmin. This may be done if you want to debug the cluster.
• A red dot (cranberry) indicates that the node has lost connectivity to other nodes in the cluster.

2.2. Operation

2.2.1. Cluster Status

The area at the top right informs about the current cluster status and shows settings of sciadmin and the connected network manager.

• Fabric status shows the current status of the fabric: UP, DEGRADED, FAILED or UNSTABLE (see below).
• Topology shows the current topology of the fabric.
• Auto Rerouting shows the current automatic fail over recovery setting (On, Off or default).
• Check Interval SCIAdmin shows the number of seconds between each time the Network Manager sends updates to the Dolphin Admin GUI.
• Check Interval Network Manager shows the number of seconds between each time the Network Manager receives updates from the Node Managers.

A number of settings can be changed in the Cluster Settings dialog that is shown when pressing the Settings button. Fabric is UP when all nodes are operational and all links are ok and therefore plotted in green.

Figure B.1. Fabric is UP

Fabric is DEGRADED when all nodes are operational and some links are broken, but we still have full connectivity. In the snapshot below, the input cable of link 0 of node tiger-1, which is the output cable at node tiger-3, is defunct (this typically means unplugged) and therefore the link is plotted in red. The other links on this ring have become disabled by the network manager and are therefore plotted in yellow. All other links are functional and plotted in green. To get more information on the interconnect status for node tiger-1, get its diagnostics via Node Diag -V 1.

sciadmin Reference Figure B. the input cable of link 0 and the output cable of link 1 are defunct. and not tiger-7 as it would be the case for non-interleaved cabling. In this case. and SuperSockets-driven sockets will have fallen back to Ethernet. Node tiger-1 can not communicate via SCI in this situation. the link 1 output cable of tiger 1 is the link 1 input cable of tiger-4.2. 66 . Fabric is DEGRADED Fabric is in status FAILED if several links are broken in a way that breaks the full connectivity. Because the cluster is cabled in an interleaved pattern.

Figure B.3. Fabric has FAILED due to loss of connectivity

The fabric status is also set to FAILED if one or more nodes are dead, as such a node can not be reached via SCI. The reason for a node being dead can be:

• Node is not powered up. Solution: power up the node.
• Node has crashed. Solution: reboot the node.
• The IRM low-level driver is not running. Solution: start the IRM driver like # service dis_irm start
• The node manager is not running. Solution: start the node manager like # service dis_nodemgr start
• The adapter is in an invalid state or is missing. Please check the node, and also consider the related topic in the FAQ (Q: 1.1).

Figure B.4. Fabric has FAILED due to dead nodes

2.2.2. Node Status

The status of a node icon tells if a node is up/dead or if a link is broken, disabled (either by the network manager or manually) or up. When selecting a node you will see details in the Node Status area:

• Serial number: a unique serial number given in production.
• Adapter Type: the Dolphin part number of the adapter.
• Adapter number: the number of the adapter selected.
• SCI Link 0: current status of link 0 (enabled or disabled).
• SCI Link 1: current status of link 1 (enabled or disabled).

3. Node and Interconnect Control

3.1. Admin Menu

The items in the Admin menu specify information that is relevant for the Dolphin Admin GUI.

Figure B.5. Options in the Admin menu

• Connect to the network manager running on the local or a remote machine.
• Disconnect from the network manager.
• Refresh Status of the node and interconnect (instead of waiting for the update interval to expire).
• Switch to Debug Statistics View will show the value of selected counters of each adapter instead of the node icons, which is useful for debugging fabric problems.

3.2. Cluster Menu

The commands in the cluster menu are executed on all nodes in parallel and the results are displayed by sciadmin.

Figure B.6. Options in the Cluster menu

Each fabric in the cluster has a sub-menu Fabric <X>. When choosing one of the fabric options, the command will be executed on all nodes in that fabric. Within this sub-menu, the Diag (-V 0), Diag (-V 1) and Diag (-V 9) options are diagnostics functions that can be used to get more detailed information about a fabric that shows problem symptoms.

• Diag (-V 0) prints only errors that have been found.
• Diag (-V 1) prints more verbose status information (verbosity level 1).
• Diag (-V 9) prints the full diagnostic information including all error counters (verbosity level 9).

• Diag -clear clears all the error counters in the Dolphin Express interconnect adapters. This helps to observe if error counters are changing.
• Diag -prod prints production information about the Dolphin Express interconnect adapters (serial number, card type, firmware revision etc.).
• The Test option is described in Section 4.2, "Traffic Test".

The other commands in the Cluster menu are:

• Settings displays the Cluster Settings dialog (see below).
• Test Cable Connections is described in Section 4.1, "Cable Test".
• Toggle Network Manager Verbose Settings to increase/decrease the amount of logging from the Dolphin Network Manager.
• Select the Arrange Fabrics option to make sure that the different adapters in your hosts are connected to the same fabric. This option is only displayed for clusters with more than one fabric.
• Reboot cluster nodes reboots all cluster nodes after a confirmation.
• Power down cluster nodes powers down all cluster nodes after a confirmation.

3.3. Node Menu

The options in the Node menu are identical to the options in the Cluster and Cluster Fabrics <X> menus, only that commands are executed on the selected node only. The only additional option is Settings, which is described in Section 3.5, "Adapter Settings".

Figure B.7. Options in the Node menu

3.4. Cluster Settings

The Dolphin Interconnect Manager provides you with several options on how to run the cluster.

Figure B.8. Cluster configuration in sciadmin

• Check Interval Admin alters the number of seconds between each time the Network Manager sends updates to the SCIAdmin GUI.
• Check Interval Network Manager alters the number of seconds between each time the Network Manager receives updates from the Node Managers.
• Topology lets you select the topology of the cluster, while Topology found displays the auto-determined topology. Changes to the topology setting can be performed with dishostseditor.
• Nodes in X,Y,Z dimension shows how the interconnect is currently dimensioned. Changes to the dimension settings can be performed with dishostseditor.
• Auto Rerouting lets you decide to enable automatic fail over recovery (On), choose to freeze the routing to a current state (Off), or use the default routing tables in the driver (Default); the latter also means that no automatic rerouting will take place.
• Automatic Create Sessions to new nodes lets you decide if the Network Manager shall create sessions to all available nodes.
• Remove Session to dead nodes lets you decide whether to remove the session to nodes that are unavailable.
• Wait before removing session defines the number of seconds to wait until removing sessions to a node that has died or became inaccessible by other means.
• Alert script lets you choose to enable/disable the use of a script that may alert the cluster status to an administrator.

3.5. Adapter Settings

The Advanced Settings button in the node menu allows you to retrieve more detailed information about an adapter and to disable/enable links of this adapter, as shown in the screenshot below.

Figure B.9. Advanced settings for a node

• Link Frequency sets the frequency of a link. It is not recommended to change the default setting. A changed value will not become effective until the IRM driver is restarted on the node, which has to be done outside of sciadmin.
• Prefetch Memsize shows the maximum amount of remote memory that can be accessed by this node. Setting this value too high (> 512MB) can cause problems with some machines, especially for 32bit platforms.
• SCI LINK 0 / 1 / 2 allows to set the way a link is controlled:
  • Automatic lets the network manager control the link, enabling and disabling it as required by the link and the interconnect status.
  • Disabled forces a link down. This is a per-session setting (the link will be under control of the network manager if it is restarted). A manually disabled link is marked blue in the sciadmin interconnect display and is only required as a temporary measure for trouble shooting. The disable link option can also be used as a temporary measure to disable an unstable adapter or ringlet so that it does not impose unnecessary noise on the adapters. If such an unlikely event occurs, please contact Dolphin support.

Warning

Please note that when Auto Rerouting is enabled (default setting), disabling a link within a ringlet will disable the complete ringlet. Disabling too many links can thus isolate nodes from access to the Dolphin Express interconnect.

Figure B.10. Link disabled by administrator (Disabling the links on the machine with hostname tiger-5 takes down the corresponding links on the other machines that share the same ringlet.)

4. Interconnect Testing & Diagnosis

4.1. Cable Test

Test Cable Connections tests the cluster for faulty cabling by reading serial numbers and adapter numbers from the other nodes on individual rings only, which means it serves to ensure that the physical cabling matches the interconnect description in the dishosts.conf configuration file. Using this test, you can verify that the cables are connecting the right nodes via the right ports. This test is very useful after a fresh installation, but also every time you worked on the cabling. It will only take a few seconds to complete and display its results in an editor. This allows you to copy or print the test result to fix the described problems right at the cluster.

Warning

Please note that while this test is running, all traffic over the Dolphin Express interconnect will be blocked. SuperSockets will fall back to Ethernet while this test is running. Although this will not introduce any communication errors except the delay, it is therefore recommended to run the test on an idle cluster.

Figure B.11. Result of running cable test on a good cluster

Figure B.12. Result of cable test on a problematic cluster

4.2. Traffic Test

The Test option for each fabric of a cluster verifies the connection quality of the links that make up the fabric. It will search for bad connections by imposing the maximum amount of traffic on individual rings and observing the internal error counters of all adapters involved. A fabric should show no errors on this test. A typical problem that can be found with this test are not well-fitted cables, as these cause CRC errors on the related link. Such CRC errors do not cause data corruption, as the corrupted packet will be detected and retransmitted, but the performance will decrease to some degree. If errors are displayed, the cable connections between the affected nodes should be verified. For more information, please see Section 4.7.5, "Fabric Quality Test".

Note

To perform this test, the SISCI RPM has to be installed on all nodes. This is the case if the installation was performed via SIA. If SISCI is not installed on a node, an error will be logged and displayed as shown below.

Warning

Please note that while this test is running, all traffic over the Dolphin Express interconnect will be blocked. SuperSockets will fall back to Ethernet while this test is running. Although this will not introduce any communication errors except the delay, it is therefore recommended to run the test on an idle cluster.
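A quick way to check whether the SISCI package is present on a node before running the test is to query the RPM database; the exact package name may vary between releases, so this is only a sketch:

# rpm -qa | grep -i sisci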

Figure B.13. Result of fabric test without installing all the necessary RPMs

Figure B.14. Result of fabric test on a proper fabric

Appendix C. Configuration Files

1. Cluster Configuration

A cluster with Dolphin Express interconnect requires one combined configuration file dishosts.conf for the interconnect topology and the SuperSockets acceleration of existing Ethernet networks, and another file networkmanager.conf that contains the basic options for the mandatory Network Manager. Templates of these files can be found in /opt/DIS/etc/dis/. Both of these files should be created using the GUI tool dishostseditor. Using this tool vastly reduces the risk of creating an incorrect configuration file.

1.1. dishosts.conf

The file dishosts.conf is used as a specification of the Dolphin Express interconnect (in a way just like /etc/hosts specifies nodes on a plain IP based network). If dynamic information read from the network contradicts the information read in the dishosts.conf file, the Dolphin network manager and diagnostic tools will assume that components are misconfigured, faulty or removed for repair.

It is a system wide configuration file and should be located with its full path on all nodes at /etc/dis/dishosts.conf. dishosts.conf is by default automatically distributed to all nodes in the cluster when the Dolphin network management software is started. Therefore, edit and maintain this file on the frontend only. Normally, there is no reason to edit this file manually: you should create and maintain dishosts.conf by using the dishostseditor GUI (Unix: /opt/DIS/sbin/dishostseditor). A syntactical and semantic validation of dishosts.conf can be done with the tool testdishosts.

To make changes in dishosts.conf effective, the network manager needs to be restarted like

# service dis_networkmgr restart

In case that SuperSockets settings have been changed, dis_ssocks_cfg needs to be run on every node, as SuperSockets are not controlled by the network manager.

The following sections describe the keywords used.

1.1.1. Basic settings

DISHOSTVERSION [ 0 | 1 | 2 ]

The version number of the dishosts.conf is specified after the keyword DISHOSTVERSION. The DISHOSTVERSION should be put on the first line of the dishosts file that is not a comment. DISHOSTVERSION 0 designates a very simple configuration (see Section 1.1.3, "Miscellaneous Notes"). DISHOSTVERSION > 0 maps hosts/IPs to adapters, which in turn are mapped to nodeids and physical adapter numbers by means of the ADAPTER entries. DISHOSTVERSION 1 or higher is required for running with multiple adapter cards and transparent fail over etc. DISHOSTVERSION 2 provides support for dynamic IP-to-nodeId mappings (sometimes also referred to as virtual IPs).

HOSTNAME: <hostname/IP>

Each cluster node is assigned a unique dishostname, which has to be equal to its hostname or the node's IP-address. The hostname is typically the network name (as specified in /etc/hosts). Examples:

HOSTNAME: host1.dolphinics.no
HOSTNAME: 193.69.165.21

ADAPTER: <physical adaptername> <nodeId> <adapterNo>

A Dolphin network node may hold several physical adapters. Information about a node's physical adapters is listed right below the hostname. All nodes specified by a HOSTNAME need at least one physical adapter.


This physical adapter has to be specified on the next line after the HOSTNAME. The physical adapters are associated with the keyword ADAPTER.
#Keyword  name      nodeid  adapter
ADAPTER:  host1_a0  4       0
ADAPTER:  host1_a1  4       1

STRIPE: <virtual adaptername> <physical adaptername 1> <physical adaptername 2>

Defines a virtual striping adapter comprising two physical adapters, which will be used for automatic data striping (also referred to as channel bonding). Striping adapters will also be used as redundant adapters in the case of network failure.
STRIPE: host1_s host1_a0 host1_a1

REDUNDANT: <virtual adaptername> <physical adaptername 1> <physical adaptername 2>

Defines a virtual redundant adapter comprising two physical adapters, which will be used for automatic fail over in case one of the fabrics fails.
REDUNDANT: host1_r host1_a0 host1_a1

1.1.2. SuperSockets settings
The SuperSockets configuration is responsible for mapping certain IP addresses to Dolphin Express adapters. It defines which network interfaces are enabled for Dolphin SuperSockets.

SOCKET: <host/IP> <physical or virtual adaptername>

Enables the given <host/IP> for SuperSockets using the specified adapter. In the following example we assume host1 and host2 have two network interfaces each, designated host1pub, host1prv, host2pub and host2prv, but only the 'private' interfaces hostXprv are enabled for SuperSockets, using a striping adapter:
SOCKET: host1prv host1_s
SOCKET: host2prv host2_s
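Putting the keywords together, a minimal dishosts.conf for such a two-node cluster with two fabrics might look as follows. This is only a sketch: hostnames, node IDs and adapter names are purely illustrative, and dishostseditor remains the recommended way to generate the file.

DISHOSTVERSION 1

HOSTNAME: host1
ADAPTER:  host1_a0 4 0
ADAPTER:  host1_a1 4 1
STRIPE:   host1_s host1_a0 host1_a1

HOSTNAME: host2
ADAPTER:  host2_a0 8 0
ADAPTER:  host2_a1 8 1
STRIPE:   host2_s host2_a0 host2_a1

SOCKET: host1prv host1_s
SOCKET: host2prv host2_s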

Starting with DISHOSTVERSION 2, SuperSockets can handle dynamic IP-to-nodeId mappings, i.e. a certain IP address does not need to be bound to a fixed machine but can roam in a pool of machines. The address resolution is done at runtime. For such a configuration a new type of adapter must be specified:

SOCKETADAPTER: <socket adaptername> [ SINGLE | STRIPE | REDUNDANT ] <adapterNo> [ <adapterNo> ... ]

This keyword basically only defines an adapter number, which is not associated to any nodeId. Example:
SOCKETADAPTER: sockad_s STRIPE 0 1

Defines socket adapter "sockad_s" in striping mode using physical adapters 0 and 1. The resulting internal adapter number is 0x2003. Such socket adapters can now be used in order to define dynamic mappings, and, in extension to DISHOSTVERSION 1 whole networks can be specified for dynamic mappings: SOCKET: [ <ip> | <hostname> | <network/mask_bits> ] <socket adapter> Enables the given address/network for SuperSockets and associates it with a socket adapter. It is possible to mix dynamic and static mappings, but there must be no conflicting entries. Example:
SOCKET: host1 sockad_s
SOCKET: host2 sockad_s
SOCKET: host3 host3_s
SOCKET: 192.168.10.0/24 sockad_s



1.1.3. Miscellaneous Notes
• Using multiple nodeids per node is supported. This can be used for some advanced high-availability switch configurations.

• A short version of dishosts.conf is supported for compatibility reasons, which corresponds to DISHOSTVERSION 0. Please note that neither virtual adapters nor dynamic socket mappings are supported by this format. Example:
#host/IP   nodeid
host1prv   4
host2prv   8

1.2. networkmanager.conf
The networkmanager.conf specifies the startup parameters for the Dolphin Network Manager. It is created by the dishostseditor.

1.3. cluster.conf
This file must not be edited by the user. It is a configuration file of the Network Manager that consists of the user-specified settings from networkmanager.conf and derived settings of the cluster (nodes). It is created by the Network Manager.

2. SuperSockets Configuration
The following sections describe the configuration files that specifically control the behaviour of Dolphin SuperSockets™. Next to these files, SuperSockets retrieve important configuration information from dishosts.conf as well. To make changes in any of these files effective, you need to run dis_ssocks_cfg on every node. Changes do not apply to sockets that are already open.
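A sketch of how this could be done from the frontend for a small cluster; the hostnames are placeholders, and the path to dis_ssocks_cfg is assumed to be under the default installation prefix /opt/DIS:

# for n in node1 node2 node3 node4; do ssh $n /opt/DIS/sbin/dis_ssocks_cfg; done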

2.1. supersockets_profiles.conf
This file defines system-wide settings for all SuperSockets™ applications using LD_PRELOAD. All settings can be overridden by environment variables named SSOCKS_<option> (like export SSOCKS_DISABLE_FALLBACK=1).

SYSTEM_POLL [ 0 | 1 ]
    Usage of the poll/select optimization. Default is 0, which means that the SuperSockets optimization for the poll() and select() system calls is used. This optimization typically reduces the latency without increasing the CPU load. To only use the native system methods for poll() and select(), set this value to 1.

RX_POLL_TIME <int>
    Receive poll time [µs]. Default is 30. Increasing this value may reduce the latency, as the CPU will spin longer waiting for new data before it blocks sleeping. Reducing this value will send the CPU to sleep earlier, but this may increase message latency.

TX_POLL_TIME <int>
    Transmit poll time [µs]. Default is 0, which means that the CPU does not spin at all when no buffers are available at the receiving side. Instead, it will immediately block until the receiver reads data from these buffers (which makes buffer space available again for sending). The situation of no available receive buffers rarely occurs, and increasing this value is not recommended.

MSQ_BUF_SIZE <int>
    Message buffer size [byte]. Default is 128KB. This value determines how much data can be sent without the receiver reading it. It has no significant impact on bandwidth.



MIN_DMA_SIZE <int>
    Minimum message size for DMA [byte]. Default is 0 (DMA disabled).

MAX_DMA_GATHER <int>
    Maximum number of messages gathered into a single DMA transfer. Default is 1.

MIN_SHORT_SIZE <int>
    Switch point [byte] from INLINE to SHORT protocol. Default depends on the driver.

MIN_LONG_SIZE <int>
    Switch point [byte] from SHORT to LONG protocol. Default depends on the driver.

FAST_GTOD [ 0 | 1 ]
    Usage of accelerated gettimeofday(). Default is 0, which disables this optimization. Set to 1 to enable it.

DISABLE_FALLBACK [ 0 | 1 ]
    Control fallback from SuperSockets™ to native sockets. Default is 0, which means fallback (and fallforward) is enabled. To ensure that only SuperSockets™ are used (i.e. for benchmarking), set it to 1.

ASYNC_PIO [ 0 | 1 ]
    Usage of fully asynchronous transfers. Default is 1, which means that the SHORT and LONG protocols are processed by a dedicated kernel thread. By this, the sending process is available immediately, and the actual data transfer is performed asynchronously. This generally increases throughput and reduces CPU load without affecting small message latency. To disable asynchronous transfers, set this option to 0; in this case, all data transfers are performed by the CPU that runs the process that called the send function.
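For a one-off run it is often easier to override a single option through the corresponding environment variable than to edit this file. A sketch, reusing the benchmark example from the FAQ (values and binary name are illustrative):

$ SSOCKS_DISABLE_FALLBACK=1 LD_PRELOAD=libksupersockets.so ./sockperf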

2.2. supersockets_ports.conf
This file is used to configure the port filter for SuperSockets. If no such file exists, all ports will be enabled by default. It is, however, recommended to exclude all system ports. A suitable port configuration file is part of the SuperSockets software package; you can adjust it to your specific needs.
# Default port configuration for Dolphin SuperSockets
# Ports specifically enabled or disabled to run over SuperSockets.
# Any socket not specifically covered, is handled by the default:
EnablePortsByDefault yes

# Recommended settings:
# Disable the privileged ports used by system services.
DisablePortRange tcp 1 1023
DisablePortRange udp 1 1023

# Disable Dolphin Interconnect Manager service ports.
DisablePortRange tcp 3443 3445

The following keywords are valid:

EnablePortsByDefault [ yes | no ]
    Determines the policy for unspecified ports.

DisablePortRange [ tcp | udp ] <from> <to>
    Explicitly disables the given port range for the given socket type.

EnablePortRange [ tcp | udp ] <from> <to>
    Explicitly enables the given port range for the given socket type.
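As a hypothetical illustration of the syntax, a site that only wants MySQL client traffic (TCP port 3306) to run over SuperSockets could invert the default policy:

EnablePortsByDefault no
EnablePortRange tcp 3306 3306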

3. Driver Configuration
The Dolphin drivers are designed to adapt to the environment they are operating in; therefore, manual configuration is rarely required. The upper limit for memory allocation of the low-level driver is the only setting that may need to be adapted for a cluster, but this is also done automatically during the installation.


Warning

Changing parameters in these files can affect reliability and performance of the Dolphin Express™ interconnect. Only do so if instructed by Dolphin support.

3.1. dis_irm.conf

dis_irm.conf is located in the lib/modules directory of the DIS installation (default /opt/DIS) and contains options for the hardware driver (dis_irm kernel module). Only few options are to be modified by the user.

Warning

Changing other values in dis_irm.conf than those described below may cause the interconnect to malfunction.

Whenever a setting in this file is changed, the driver needs to be reloaded to make the new settings effective.

3.1.1. Resource Limitations

These parameters control memory allocations that are only performed on driver initialization.

dis_max_segment_size_megabytes
    Sets the maximum size (in MiB) of a memory segment that can be allocated for remote access. Valid values: integers > 0. Default: 4.

max-vc-number
    Maximum number of virtual channels (one virtual channel is needed per remote memory connection, i.e. 2 per SuperSocket connection); values > 16384 are typically not necessary. Valid values: integers > 0. Default: 1024.

3.1.2. Memory Preallocation

Preallocation of memory is recommended on systems without IOMMU (like x86 and x86_64). The problem is memory fragmentation over time, which can make it hard to allocate large segments of contiguous physical memory after the system has been running for some time. To overcome this situation, options have been added to let the IRM driver allocate blocks of memory upon initialization and to provide memory from this pool under certain conditions for allocation of remotely accessible memory segments. These options deal with the memory pre-allocation in the driver:

number-of-megabytes-preallocated
    Defines the number of MiB of memory the IRM shall try to allocate upon initialization. The upper limit is the consumed memory; some systems may lock up if too much memory is requested. Valid values: 0: disable preallocation; > 0: MiB to preallocate in as few blocks as possible. Default: 16 (may be increased by the installer script).

use-sub-pools-for-preallocation
    If the IRM fails to allocate the amount of memory specified by number-of-megabytes-preallocated, it will by default repetitively decrease the amount and retry until success. By enabling use-sub-pools-for-preallocation, the IRM will instead continue to allocate memory (possibly in small chunks) until the amount specified by number-of-megabytes-preallocated is reached. Valid values: 0: disable sub-pools; 1: enable sub-pools. Default: 1.

block-size-of-preallocated-blocks
    To allocate not a single large block, but multiple blocks of the same size. Pre-allocating memory this way is useful if the application to be run on the cluster uses many memory segments of the same (relatively small) size. Unit: bytes. Valid values: 0: don't preallocate memory in this manner; > 0: size in bytes (will be aligned upwards to page size boundary) of each memory block. Default: 0.

number-of-preallocated-blocks
    The number of blocks to be preallocated (see previous parameter). Valid values: 0: don't preallocate memory in this manner; > 0: number of blocks. Default: 0.

minimum-size-to-allocate-from-preallocated-pool
    This sets a lower limit (in KiB) on the size of memory segments the IRM may try to allocate from the preallocated pool; the IRM will always request additional memory from the system when resolving memory requests smaller than this size. The minimum size is defined in 1K blocks; due to the needs of the preallocation mechanism, there is a "hard" lower limit of one SCI_PAGE (currently 8K). Valid values: 0: always allocate from pre-allocated memory; > 0: try to allocate memory that is smaller than this value from non-preallocated memory. Default: 0.

try-first-to-allocate-from-preallocated-pool
    Directs the IRM when to try to use memory from the preallocated pool. Valid values: 0: the preallocated memory pool becomes a backup solution, only to be used when the system can't honor a request for additional memory; 1: the IRM prefers to allocate memory from the preallocated pool when possible. Default: 1.
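The exact syntax of dis_irm.conf should be taken from the comments in the file shipped with the installation. Assuming a simple option=value format, enabling a larger preallocated pool could look roughly like this (the values are only an illustration):

number-of-megabytes-preallocated=64
use-sub-pools-for-preallocation=1

Remember that the driver needs to be reloaded before such a change takes effect.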

3.1.3. Logging and Messages

link-messages-enabled
    Control logging of non critical link messages during operation. Valid values: 0: no link messages; 1: show link messages. Default: 0.

notes-disabled
    Control logging of non critical notices during operation. Valid values: 0: show notice messages; 1: no notice messages. Default: 0.

warn-disabled
    Control logging of general warnings during operation. Valid values: 0: show warning messages; 1: no warning messages. Default: 0.

dis_report_resource_outtages
    Control logging of out-of-resource messages during operation. Valid values: 0: no messages; 1: show messages. Default: 0.

notes-on-log-file-only
    Control printing of driver messages to the system console. Valid values: 0: also print to console; 1: only print to syslog. Default: 0.

3.2. dis_ssocks.conf

Configuration file for the SuperSockets™ (dis_ssocks) kernel module. If a value different from the default is required, edit and uncomment the appropriate line:

#address_family=27
#rds_compat=0
#min_dma_size=0
#min_short_size=1009
#min_long_size=8192
#rx_poll_time=30
#tx_poll_time=0

The following keywords are valid:

tx_poll_time
    Transmit poll time [µs]. Default is 0, which means that the CPU does not spin at all when no buffers are available at the receiving side. Instead, it will immediately block until the receiver reads data from these buffers (which makes buffer space available again for sending). The situation of no available receive buffers rarely occurs, and increasing this value is not recommended.

rx_poll_time
    Receive poll time [µs]. Default is 30. Increasing this value may reduce the latency, as the CPU will spin longer waiting for new data before it blocks sleeping. Reducing this value will send the CPU to sleep earlier, but this may increase message latency.

min_dma_size
    Minimum message size for using DMA (0 means no DMA). Default is 0.

min_short_size
    Minimum message size for using the SHORT protocol. Default and maximum is 1009.

min_long_size
    Minimum message size for using the LONG protocol. Default is 8192.

address_family
    AF_SCI address family index. Default value is 27. If not set, the driver will automatically choose another index between 27 and 32 until it finds an unused index. If this value is set explicitly, this value will be chosen, and no search for unused values is performed. The chosen index can be retrieved via the /proc file system like cat /proc/net/af_sci/family. Generally, this value is only required if SuperSockets should be used explicitly without the preload library.

rds_compat
    RDS compatibility level. Default is 0.
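For example, to pin the address family to a fixed value, uncomment and adjust the corresponding line in dis_ssocks.conf and then restart the SuperSockets service; whether a plain service restart is sufficient on your installation should be verified:

address_family=28

# /etc/init.d/dis_supersockets stop
# /etc/init.d/dis_supersockets start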

Appendix D. Platform Issues and Software Limitations

This chapter lists known issues of Dolphin Express with certain hardware platforms and limitations of the software stack. Some of these limitations can be overcome by changing default settings of runtime parameters to match your requirements.

1. Platforms with Known Problems

Intel Chipset 5000V
    Due to a PCI-related bug in the chipset Intel 5000V, limitations on how this chipset can be used with SCI have to be considered. This only applies to very specific configurations; one example of such a configuration is the motherboard SuperMicro X7DVA-8 when the onboard SCSI HBA is used and the SCI adapter is placed in a specific slot. Please contact Dolphin support if this situation applies to you.

Intel IA-64
    The write performance to remote memory within the kernel is low on IA-64 machines, limiting the SuperSockets performance. We recommend to use x86_64 platforms instead. If you plan to use such a platform, please contact Dolphin support in advance.

2. IRM Resource Limitations

The IRM (Interconnect Resource Manager) manages the hardware and related software resources of the Dolphin Express interconnect. Some resources are allocated once when the IRM is loaded. The default settings are sufficient for typical cluster sizes and usage scenarios. However, if you hit a resource limit, it is possible to increase the respective setting (like the Maximum Number of Nodes).

3. SuperSockets

UDP Broadcasts
    SuperSockets do not support UDP broadcast operations.

Heterogeneous Cluster Operation (Endianess)
    By default, SuperSockets are configured to operate in clusters where all nodes use the same endian representation (either little endian or big endian). This avoids costly endian conversions and works fine in typical clusters where all nodes use the same CPU architecture. Only if you mix nodes with Intel or AMD (x86 or x86_64) CPUs with nodes using PowerPC- or Sparc-based CPUs, this default setting won't work. In this case, an internal flag needs to be set.

Sending and Receiving Vectors
    The vector length for the writev() and sendmsg() functions is limited to 16. For readv() and recvmsg(), the vector length is not limited.

Socket Options
    The following socket options are supported by SuperSockets™ for communication over Dolphin Express:

    • SO_DONTROUTE (implicit, as SuperSockets don't use IP packets for data transport and thus are never routable)
    • TCP_NODELAY

    • SO_REUSEADDR
    • SO_TYPE

    The following socket options are passed to the native (fallback) socket:

    • SO_SENDBUF and SO_RECVBUF (the buffer size for the SuperSockets is fixed).
    • All other socket options are not supported (ignored).

Resource Limitations
    SuperSockets allocate resources for the communication via Dolphin Express by means of the IRM. Therefore, the resource limitations listed for the IRM indirectly apply to SuperSockets as well. To resolve such limitations if they occur (i.e. when using a very large number of sockets per node), please refer to the relevant IRM section above. SuperSockets log messages to the syslog for two typical out-of-resources situations:

    • No more VCs available: the maximum number of virtual channels needs to be increased (see Section 2, "IRM Resource Limitations").
    • No more segment memory available: the amount of pre-allocated memory needs to be increased (see Section 2, "IRM Resource Limitations").

Fallforward for Stream Sockets
    While SuperSockets offer fully transparent fall-back and fall-forward between Dolphin Express-based communication and native (Ethernet) communication for any socket (TCP or UDP) while it is open and used, there is currently a limitation on sockets when they connect: a socket that has been created via SuperSockets and connected to a remote socket while the Dolphin Express interconnect was not operational will not fall forward to Dolphin Express communication when the interconnect comes up again. Instead, it will continue to communicate via the native network (Ethernet). This is a rare condition that typically will not affect operation. If you suspect that one node does not perform up to expectations, you can either contact Dolphin support to help you diagnose the problem, or restart the application making sure that the Dolphin Express interconnect is up. Removal of this limitation, as well as a simple way to diagnose the precise state of a SuperSockets-driven socket, is scheduled for updated versions of SuperSockets.
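A quick way to spot these two situations on a node is to scan the kernel log for the messages quoted above; the exact wording may differ between driver versions, so this is only a sketch:

# dmesg | grep -i "no more"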
