Professional Documents
Culture Documents
Related Links...........................................................................................................................35
Introduction
High-performance computing is now within reach for many businesses by clustering industry-
standard servers. These clusters can range from a few nodes to hundreds of nodes. In the
past, wiring, provisioning, configuring, monitoring, and managing these nodes and providing
appropriate, secure user access was a complex undertaking, often requiring dedicated
support and administration resources. However, Microsoft® Windows® Compute Cluster
Server 2003 simplifies installation, configuration, and management, reducing the cost of
compute clusters and making them accessible to a broader audience.
Windows Compute Cluster Server 2003 is a high-performance computing solution that uses
clustered commodity x64 servers that are built with a combination of the Microsoft Windows
Server® 2003 Compute Cluster Edition operating system and the Microsoft Compute Cluster
Pack. The base operating system incorporates traditional Windows system management
features for remote deployment and cluster management. The Compute Cluster Pack
contains the services, interfaces, and supporting software needed to create and configure the
cluster nodes, as well as the utilities and management infrastructure. Individuals tasked with
Windows Compute Cluster Server 2003 administration and management have the advantage
of working within a familiar Windows environment, which helps enable users to quickly and
easily adapt to the management interface.
Windows Compute Cluster Server 2003 is a significant step forward in reducing the barriers to
deployment for organizations and individuals who want to take advantage of the power of a
compute clustering solution.
- Integrated software stack. Windows Compute Cluster Server 2003 provides an
integrated software stack that includes operating system, job scheduler, message passing
interface (MPI) layer, and the leading applications for each target vertical.
- Better integration with IT infrastructure. Windows Compute Cluster Server 2003
integrates seamlessly with your current network infrastructure (for example, Active
Directory®), enabling you to leverage existing organizational skills and technology.
- Familiar development environment. Developers can leverage existing Windows-based
skills and experience to develop applications for Windows Compute Cluster Server 2003.
Microsoft Visual Studio® is the most widely used integrated development environment
(IDE) in the industry, and Visual Studio 2005 includes support for developing HPC
applications, such as parallel compiling and debugging. Third-party hardware and
software vendors provide additional compiler and math library options for developers
seeking an optimized solution for existing hardware. Windows Compute Cluster Server
2003 supports the use of MPI with Microsoft’s MPI stack, or the use of stacks from other
vendors.
This step-by-step guide is based on the highly successful cluster deployment at National
Center for Supercomputing Applications (NCSA) at the University of Illinois at Champaign-
Urbana. The cluster was built as a joint effort between NCSA and Microsoft, using commonly
available hardware and Microsoft software. The cluster was composed of 450 x64 servers,
achieving 4.1 teraflops (TFLOPs) on 896 processors using the widely accepted LINPACK
benchmark. Figure 1 shows the cluster topology used for the NCSA deployment, including the
public, private, and MPI networks.
• Windows Server2003 R2
Standard x64 Edition
• Microsoft Compute Cluster
Pack
• SQL Server2005 Standard
Public Network Head Node Edition x64
(GigE)
• Windows Server2003
Compute Cluster Edition
• Microsoft Compute
Cluster Pack
• All other required apps
and drivers installed
Internet
• Windows Server2003
Enterprise Edition 8x6
• Automated
Deployment Services
• Compute node images
• DNS & DHCP
• Active Directory
Service Node
(ADS Server)
•
Windows Server2003
Compute Cluster
Edition
• Compute
Figure 1 Supported cluster topology similar to NCSA-deployment Cluster Pack
topology
MPI Network(IB)
Although every IT environment is different, Compute
this guide can serve as a basis for setting up your
Nodes
large-scale compute cluster. If you need additional guidance, see the Related Links section at
the end of this guide for more resources.
Note
The intended audience for this document is network administrators who have at least two
years’ experience with network infrastructure, management, and configuration. The example
deployment outlined in this document is targeted at clusters in excess of 100 nodes. Although
the steps discussed here will work for smaller clusters, they represent steps modeled on large
deployments for enterprise-scale and research-scale clusters.
Note
This is Version 1 of this document. To download the latest updated version, visit the Microsoft
Web site (http://www.microsoft.com/hpc/). The update may contain critical information that
was not available when this document was published.
Additional node types that can be used but are not required are remote administration nodes
and application development nodes. For an overview of device roles in the cluster, see the
Windows Compute Cluster Server 2003 Reviewers Guide
(http://www.microsoft.com/windowsserver2003/ccs/reviewersguide.mspx).
Your cluster also depends on the number and types of networks used to connect the nodes.
The Reviewer’s Guide discusses the topologies that you can use to connect your nodes, by
using combinations of private and public adapters for message passing between the nodes
and system traffic among all of the nodes. For the cluster detailed in this guide, the head node
and service node have public and private adapters for system traffic, and the compute nodes
have private and message passing interface (MPI) adapters. (Note: This is not a supported
topology but is very similar to one that is.) Consult the Reviewer’s Guide for the advantages of
each network topology.
Lastly, you should consider the level of cluster expertise, networking knowledge, and amount
of management time available on your staff to dedicate to your cluster. Although deployment
and management is simplified with Windows Compute Cluster Server 2003, keep in mind that
no matter what the circumstances, a large-scale compute cluster deployment should not be
taken lightly. It is important to understand how management and deployment work when
Note
The head node and the network services node each use two Gigabit Ethernet network
adapters; both the compute nodes and the head nodes use the private MPI network, though
the head node’s MPI interface was disabled for this specific deployment. Also, the service
node requires a 32-bit operating system, since ADS will only work with 32-bit, but you can run
the operating system on 32-bit or 64-bit hardware. (This is a custom configuration used on the
QFE KB910481
QFE KB914784
Note
Every item in Table 4 must have an entry or the input file will not work properly. If you do not
have a value for a field, use a hyphen ‘-‘ for the field instead.
Latest network adapter drivers: Contact the manufacturer of your network adapters for the
most recent drivers. You will need to install these drivers on your cluster nodes.
The service node provides all the back-end network services for the cluster, including
authentication, name services, and image deployment. It uses standard Windows technology
and services to manage your network infrastructure. The service node has two Gigabit
Ethernet network adapters and no MPI adapters. One adapter connects to the public network;
the other connects to the private network dedicated to the cluster.
There are five tasks that are required for installation and configuration:
1. Install and configure the base operating system.
2. Install Active Directory, Domain Name Services (DNS), and DHCP.
3. Configure DNS.
4. Configure DHCP.
5. Enable Remote Desktop for the cluster.
Install and configure the base operating system. Follow the normal setup procedure for
Windows Server 2003 R2 Enterprise Edition, with the exceptions as noted in the following
procedure.
Install Active Directory, DNS, and DHCP. Windows Server 2003 provides a wizard to
configure your server as a typical first server in a domain. The wizard configures your
server as a root domain controller, installs and configures DNS, and then installs and
configures DHCP.
6. On the Network Boot Service Settings page, make sure that Use this path is selected.
Insert the Windows Server 2003 R2 Enterprise Edition x86 CD into the drive. Browse to
the CD drive, or type the drive containing the CD, and then click Next.
7. On the Windows PE Repository page, select Location of Windows PE. Browse to the
folder containing the WinPE binaries (for example, C:\WinPE). In the Repository name
text box, type a name for your repository (for example, NodeImages). Click Next.
8. On the Image Location page, type the path to the folder where the images will be stored.
These must be on the second partition that you created on your server (for example,
E:\Images). The folder will be created and shared automatically. Click Next.
9. If ADS Setup Wizard detects more than one network adapter in your computer, the
Network Settings for ADS Services page is displayed. In the Bind to this IP address
drop-down list, select the IP address that the ADS services will use to distribute images
on the private network, and then click Next.
10. On the Installation Confirmation page, click Install.
11. On the Completing the Automated Deployment Services Setup Wizard page, click
Finish. Close the Automated Deployment Services Welcome dialog box.
12. To open the ADS Management console, click Start, click All Programs, click Microsoft
ADS, and then click ADS Management.
13. Expand the Automated Deployment Services node, and then select Services. In the
center pane, right-click Controller Services, and then click Properties. On the Controller
Service Properties page, select the Service tab, and then change Global job template to
boot-to-winpe. For the Device Identifier, select MAC Address. For the WinPE Repository
Name, type NodeImages or the repository name that you created earlier. Click Apply,
and then click OK.
14. In the ADS Management console, right-click Image Distribution Service, and then click
Properties. Select the Service tab, and ensure that Multicast image deployment is
selected. Click OK.
Share the ADS certificate. ADS creates a computer certificate when it is installed. This
certificate is used to identify all computers in the cluster. The certificate must be shared so
that the compute node image can import the certificate and then use it during the
configuration process.
Import ADS templates. ADS includes several templates that are useful when managing your
nodes, including reboot-to-winpe and reboot-to-hd. The templates are not installed by default;
you must add them to ADS using a batch file. You also need to add the compute cluster
templates to ADS so that you can capture and deploy the compute node image on your
network.
Add devices to ADS. Follow the normal setup procedure for Windows Server 2003 R2
Enterprise Edition, with the exceptions noted later.
If your company uses a proxy server to connect to the Internet, you should configure your
server so that it can receive system and application updates from Microsoft.
1. To configure your proxy server settings, open Internet Explorer®. Click Tools, and
then click Internet Options.
2. Click the Connections tab, and then click LAN Settings.
3. On the Local Area Network (LAN) Settings page, select Use a proxy server for
your LAN. Enter the URL or IP address for your proxy server.
4. If you need to configure secure HTTP settings, click Advanced, and then enter the
URL and port information as needed.
5. Click OK three times, and then close Internet Explorer.
When you have finished configuring your server, click Start, click All Programs, and then
click Windows Update. This will ensure that your server is up-to-date with service packs and
software updates that may be needed to improve performance and security.
There are two tasks that are required for installing and configuring your head node:
1. Install and configure the base operating system.
2. Install and configure SQL Server 2005 Standard Edition.
When you have configured the base operating system, you can install SQL Server 2005
Standard Edition on your head node.
If your company uses a proxy server to connect to the Internet, you should configure your
head node so that it can receive system and application updates from Microsoft.
1. To configure your proxy server settings, open Internet Explorer. Click Tools, and then
click Internet Options.
2. Click the Connections tab, and then click LAN Settings.
3. On the Local Area Network (LAN) Settings page, select Use a proxy server for
your LAN. Enter the URL or IP address for your proxy server.
4. If you need to configure secure HTTP settings, click Advanced, and then enter the
URL and port information as needed.
5. Click OK three times, and then close Internet Explorer.
When you have finished configuring your server, click Start, click All Programs, and then
click Windows Update. This will ensure that your server is up-to-date with service packs and
software updates that may be needed to improve performance and security. You should elect
to install Microsoft Update from the Windows Update page. This service provides service
packs and updates for all Microsoft applications, including SQL Server. Follow the instructions
on the Windows Update page to install the Microsoft Update service.
When you have configured the base operating system, you can then install and configure the
ADS Agent and the Compute Cluster Pack on your image.
To install and configure the ADS Agent and Compute Cluster Pack
1. Copy the ADS binaries to a folder on the compute node. Browse to the folder, and then
run ADSSetup.exe.
2. A Welcome page appears. Click Install ADS Administration Agent. The Administration
Agent Setup Wizard starts. Click Next.
3. On the License Agreement page, select I accept the terms of the license agreement,
and then click Next.
4. On the Configure Certificates page, select Now. Type the fully-qualified path to the
certificate share on the service node (for example, \\servicenode \Certificate\ adsroot.cer).
Click Next.
5. On the Configure the Agent Logon Settings page, select None, and then click Next.
6. On the Installation Confirmation page, click Install.
7. On the Completing the Administration Agent Setup Wizard page, click Finish. Close
the Automated Deployment Services Welcome page.
8. Insert the Compute Cluster Pack CD into the head node. The Microsoft Compute
Cluster Pack Installation Wizard appears. Click Next.
9. On the Microsoft Software License Terms page, select I accept the terms in the
license agreement, and then click Next.
10. On the Select Installation Type page, select Join this server to an existing compute
cluster as a compute node. Type the name of the head node in the text box (for
example, HEADNODE). Click Next.
11. On the Select Installation Location page, accept the default. Click Next.
12. On the Install Required Components page, a list of required components for the
installation appears. Each component that has been installed will appear with a check
next to it. Select a component without a check, and then click Install.
13. Repeat the previous step for all uninstalled components. When all of the required
components have been installed, click Next. When the Microsoft Compute Cluster Pack
completes, click Finish.
When you have installed and configured the ADS Agent and Compute Cluster pack, you can
update your image with the latest service packs, and then prepare your image for deployment.
Deploy the image to nodes on the cluster. When you have captured the compute node
image to the service node, you can deploy the image to compute nodes on the cluster.
Disable Windows Firewall on all nodes on the cluster. The Compute Cluster Administrator
console enables you to define how the firewall is configured on all cluster node network
adapters. For best performance on large-scale deployments, it is recommended that you
disable Windows Firewall on all interfaces.
Approve nodes that have joined the cluster. When you deploy Compute Cluster Edition
nodes, they have joined the cluster but have not been approved to participate or process any
jobs. You must approve them before they can receive and process jobs from your users.
Add users and administrators to your cluster. In order to use and maintain the cluster, you
must add cluster users and administrators to your cluster domain. This will make it possible
for others to submit jobs to the cluster, and to perform routine administration and maintenance
on the cluster. If your organization uses Active Directory, you will need to create a trust
relationship between your cluster domain and other domains in your organization. You will
also need to create organizational units (OUs) in your cluster domain that will act as
containers for other OUs or users from your organization. You may need to work with other
groups in your company to create the necessary security groups so that you can add users
from other domains to your compute cluster domain. Because each organization is unique, it
is not possible to provide step-by-step instructions on how to add users and administrators to
the cluster domain. For help and information on how best to add users and administrators to
your cluster, see Windows Server Help.
Table 6: Troubleshooting
Issue Mitigation Details
Application Hangs
N/A For ease of debugging, switch off MPI environment variable
shared memory communication MPICH_DISABLE_SHM
using environment variable If you disable the shared memory,
MPICH_DISABLE_SHM=1 MSMPI stops looking at communication
Note: This is done from the between processors and focuses on
network queues (node to node)
command line with the command: communication.
mpiexec –env VARIABLE SETTING
-env OTHERVARIABLE
OTHERSETTING
Note: You can also set up
WinDbg for just-in-time debugging
with the command Windbg –I
Application Fails
Network The last line of the MPI output file Output file
connectivity issue gives you information on network
errors
Note: The stdout output is located
where you route it in your job; i.e.,
it is specified by the /stdout:
switch to “job submit”
SYN protection Turn SYN protection off Registry setting for SYN attack
interferes with completely on all CNs (leave SYN protection:
connectivity under protection active on the head KEY_LOCAL_MACHINE\System\CurrentContr
heavy load node to avoid denial-of-service) olSet\Services\Tcpip\Parameters\SynAttackPr
otect=0
To deploy this setting to all nodes:
clusrun /all reg add
HKLM\SYSTEM\CurrentControlSet\Service
s\Tcpip\Parameters /v
SynAttackProtect /t REG_DWORD /d 0 /f
The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the
date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment
on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.
This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS
DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this
document may be reproduced, stored in, or introduced into a retrieval system, or transmitted in any form or by any means (electronic,
mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft
Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in
this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not
give you any license to these patents, trademarks, copyrights, or other intellectual property.
© 2007 Microsoft Corporation. All rights reserved.
Microsoft, Active Directory, Internet Explorer, Virtual Studio, Windows, the Windows logo, and Windows Server are either registered
trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.