
INF-110

GPFS Installation

Overview
Plan the installation
Before installing any software, it is important to plan the GPFS installation: choose the hardware, decide which kind of disk connectivity to use (direct-attached or network-attached disks), select the network capabilities (which depend largely on the disk connectivity), and, perhaps most important, verify that your application can take advantage of GPFS.

Install the packages

At this point, the GPFS architecture has been defined and the machines have Linux installed. It is now time to install the packages on all the nodes that will be part of the GPFS cluster.

Create the GPFS cluster

Once the GPFS packages are installed on the nodes, you need to create the GPFS cluster. To create the GPFS cluster, you need a file that contains all of the node host names or IP addresses, which you then pass to the mmcrcluster command. This command creates cluster data information on all nodes chosen to be part of the GPFS cluster. If a new node needs to be added to an existing GPFS cluster, the mmaddcluster command can be used.

Overview (continued)
Start GPFS
After the nodeset is created, you should start it before defining the disks. Use the mmstartup command to start the GPFS daemons.

Disk definition

All disks used by GPFS in a nodeset have to be described in a file, and this file is then passed to the mmcrnsd command. This command gives a name to each described disk and ensures that all the nodes included in the nodeset are able to access the disks by their new names.

Creating the file system

Once the cluster, the nodeset(s), and the disks have been defined, then it is time to create the file system. With GPFS, the mmcrfs command is used for that purpose. There are many options that can be activated at this time, like file system automounting, file system block size, data or metadata replication, and so on.

Mounting the file system

Finally, you have to mount the file system after it has been created. Once the file system has been mounted, it can be used by the nodes for read and write operations. If you set the auto-mount option, your GPFS file system will be mounted automatically when the nodes reboot.

Setup GPFS Environments


Add the GPFS binary directory to your $PATH environment on all nodes. Type:
mkdir -p /cfmroot/etc/profile.d

Create /cfmroot/etc/profile.d/mmfs.sh, which contains:


PATH=$PATH:/usr/lpp/mmfs/bin
MANPATH=$MANPATH:/usr/lpp/mmfs/man

Type:
chmod 755 /cfmroot/etc/profile.d/mmfs.sh
cfmupdatenode -a
cp /cfmroot/etc/profile.d/mmfs.sh /etc/profile.d
. /etc/profile.d/mmfs.sh

This way, you will distribute /etc/profile.d/mmfs.sh to all nodes, including its attributes.

Install GPFS
The GPFS install and update files are located on the management node in the /lab/gpfs directory. Extract updates.
mkdir -p /tmp/gpfs/updates
cp -r /lab/gpfs/* /tmp/gpfs
cd /tmp/gpfs/updates
tar zxvf *update.tar.gz

Install GPFS and the updates on the management node.


cd /tmp/gpfs; rpm -ivh gpfs*rpm
cd /tmp/gpfs/updates; rpm -Uvh gpfs*rpm

Copy the packages to the compute nodes.

dsh -a mkdir -p /tmp/gpfs/updates
cd /tmp/gpfs
dcp -a *rpm /tmp/gpfs
dcp -a updates/*rpm /tmp/gpfs/updates

Install GPFS on the compute nodes, then install the GPFS updates.



dsh -a 'cd /tmp/gpfs; rpm -ivh gpfs*rpm'

dsh -a 'cd /tmp/gpfs/updates; rpm -Uvh gpfs*rpm'

Prepare kernel
Since GPFS code works at the kernel level (as kernel extensions), it depends heavily on the kernel level to run properly. Therefore, you have to build your GPFS open source portability module before building a GPFS cluster, and a kernel source tree is required for that. You may check the list of supported kernel versions at the following site: http://www-1.ibm.com/servers/eserver/clusters/software/gpfs_faq.html

Lab Note: There are a few patches that should be applied. Read the FAQ in the future. In this Lab we will not apply the patches, to save time.

Create a link to the kernel source:
cd /usr/src
ln -s linux-2.4 linux

Clean up tree
cd /usr/src/linux
make mrproper

Prepare kernel (continued)


Check the content of the VERSION, PATCHLEVEL, SUBLEVEL, and EXTRAVERSION variables in the /usr/src/linux/Makefile file to match the release version of your kernel.
Use uname -r to check your version, e.g. 2.4.21-27.ELsmp.

Edit Makefile
VERSION = 2
PATCHLEVEL = 4
SUBLEVEL = 21
EXTRAVERSION = -27.ELsmp

Copy the kernel configuration file. Type:

cp configs/kernel-2.4.21-i686-smp.config .config

make oldconfig
make dep

Build the GPFS open source portability layer


You have to build the GPFS open source portability layer manually on one node (in our case, the management node) and then copy the resulting binaries to all nodes. Below are the steps to build the GPFS open source portability layer. Also, check the /usr/lpp/mmfs/src/README file for more up-to-date information on building the GPFS Open Source portability layer:
export SHARKCLONEROOT=/usr/lpp/mmfs/src
cd /usr/lpp/mmfs/src/config
cp site.mcr.proto site.mcr

Edit the /usr/lpp/mmfs/src/config/site.mcr file. There are some sections that need to be checked:
/* $Id: site.mcr.proto,v 1.442.2.5 2004/06/07 15:45:28 gjertsen Exp $ */
........
/* Linux distribution (select/uncomment only one) */
/* LINUX_DISTRIBUTION = REDHAT_LINUX */
LINUX_DISTRIBUTION = REDHAT_AS_LINUX
........
/* #define LINUX_DISTRIBUTION_LEVEL 80 */
........
/* Linux kernel versions supported for each architecture */
#define LINUX_KERNEL_VERSION 2042127

cd ..
make World
make InstallImages



Distribute the GPFS portability layer


Copy the binaries built above to the /cfmroot/usr/lpp/mmfs/bin directory and distribute them to all nodes using the cfmupdatenode command or your own scripts:
mkdir -p /cfmroot/usr/lpp/mmfs/bin
cd /usr/lpp/mmfs/bin
cp mmfslinux lxtrace tracedev dumpconv /cfmroot/usr/lpp/mmfs/bin
cfmupdatenode -a

Creating the GPFS nodes descriptor file


ssh to node1. All GPFS commands should be run from nodes that will be running GPFS. The management node will NOT be running GPFS.
ssh node1

When creating your GPFS cluster, you need to provide a file containing a list of node descriptors, one per line for each node to be included in the cluster, including the storage nodes. Each descriptor must be specified in the form:
NodeName:NodeDesignations

where:
NodeName            The host name or IP address of the node, used for GPFS daemon-to-daemon communication.

NodeDesignations    An optional, "-" separated list of node roles. Roles include: manager|client and quorum|nonquorum.


Creating the GPFS nodes descriptor file (continued)


Create a file /tmp/gpfs.allnodes with a list of your nodes and their roles. Ensure there is at least one node with quorum and manager roles defined. For example:
node1:manager-quorum
node2:manager-quorum
node3:quorum
node4:

The above file signifies that we have four nodes in our GPFS cluster.
Node1 has the configuration manager and quorum roles.
Node2 has the configuration manager and quorum roles.
Node3 has the quorum role.
Node4 uses the defaults of nonquorum and client roles.


Defining the GPFS cluster


Run the mmcrcluster command to define the GPFS cluster. Define node1 as the primary configuration server, node2 as the secondary (for GPFS configuration data), ssh as the remote shell command, and scp as the remote file copy command. For example:

mmcrcluster -p node1 -s node2 -n /tmp/gpfs.allnodes -r /usr/bin/ssh -R /usr/bin/scp

Tue Aug 10 14:00:46 CDT 2004: mmcrcluster: Processing node node1.cluster.net
Tue Aug 10 14:00:48 CDT 2004: mmcrcluster: Processing node node2.cluster.net
Tue Aug 10 14:00:49 CDT 2004: mmcrcluster: Processing node node3.cluster.net
Tue Aug 10 14:00:50 CDT 2004: mmcrcluster: Processing node node4.cluster.net
Tue Aug 10 14:00:55 CDT 2004: mmcrcluster: Initializing needed RSCT subsystems.
mmcrcluster: Command successfully completed


After creating the cluster definitions, you can see the definitions using the mmlscluster command. Type:
mmlscluster

Starting GPFS
After creating the GPFS cluster, you can start the GPFS services on every node in the cluster by issuing the mmstartup command with the -a parameter. The -a parameter will start GPFS on all nodes in the cluster. Type:
mmstartup -a

Note: To shut down GPFS, type: mmshutdown -a (do not type it now)


Prepare Disks (Skip)


For Fiber disks, create arrays, LUNs, and mappings:
Use the CSM and GPFS Redbook as a guide
Use the GPFS documentation
Use the DS4xxx (FAStT) documentation

Lab Note: We were unable to obtain DS4xxx controllers and disks.

For each disk to be used for GPFS on each node, use fdisk to remove any partitions.

NOTE: We will be using disk /dev/hdc on node1-node4.

For Example:
ssh node1
fdisk /dev/hdc

Use m for a list of commands to display and remove partitions.


Disk definitions
A GPFS cluster with NSD network attached servers means that all access to the disks, and replication, goes through one or two storage attached servers (also known as storage nodes). If your cluster has an internal network segment, this segment will be used for this purpose. If a disk is defined with only one storage attached server, and that server fails, the disk becomes unavailable to GPFS. If the disk is defined with two NSD network attached servers, GPFS automatically transfers the I/O requests to the backup server.

Lab Note: We were unable to provide Fiber storage. You will be unable to define two paths to the storage.

Lab Note: The four nodes in your cluster (e.g. node1 - node4) each contain a single 40GB drive (/dev/hdc). You will use this as your GPFS storage.


Creating Network Shared Disks (NSDs)


You will need to create a descriptor file before creating your NSDs. This file should contain information about each disk that will be an NSD, and should have the following syntax:
DeviceName:PrimaryNSDServer:SecondaryNSDServer:DiskUsage:FailureGroup

DeviceName          The real device name of the external storage partition (such as /dev/hdc).

PrimaryNSDServer    The host name of the server that the disk is attached to. Remember you must always use the node names defined in the cluster definitions.

SecondaryNSDServer  The server where the secondary disk attachment is connected.

DiskUsage           The kind of information that should be stored on this disk. The valid values are data, metadata, and dataAndMetadata (default).

FailureGroup        An integer value (0 to 4000) that identifies the failure group to which this disk belongs. All disks with a common point of failure must belong to the same failure group. The value -1 indicates that the disk has no common point of failure with any other disk in the file system. GPFS uses the failure group information to ensure that no two replicas of data or metadata are placed in the same group and thereby become unavailable due to a single failure. When this field is not specified, GPFS automatically assigns a failure group (higher than 4000) to each disk.
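For illustration only (this is not used in this lab, which has direct-attached /dev/hdc disks and no replication), a hypothetical twin-tailed configuration with two NSD servers per disk and two failure groups might use descriptor lines like the following; storage1 and storage2 are assumed server names, not nodes in this lab:

/dev/sdb1:storage1:storage2:dataAndMetadata:1
/dev/sdc1:storage2:storage1:dataAndMetadata:2

With data and metadata replication enabled at file system creation, GPFS would then keep one replica in each failure group.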


Creating Network Shared Disks (NSDs) (continued)


Create a new file /tmp/descfile
E.g.:

/dev/hdc:node1::dataAndMetadata:-1
/dev/hdc:node2::dataAndMetadata:-1
/dev/hdc:node3::dataAndMetadata:-1
/dev/hdc:node4::dataAndMetadata:-1

Now create the Network Shared Disks by using the mmcrnsd command:
mmcrnsd -F /tmp/descfile -v no

After successfully creating the NSDs for the GPFS cluster, mmcrnsd comments out the original disk device line and writes the GPFS-assigned global name for that disk on the following line. Use cat /tmp/descfile to see the changes.
cat /tmp/descfile
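For example, a rewritten entry might look roughly like the following; the exact GPFS-assigned NSD name and any trailing fields may differ:

# /dev/hdc:node1::dataAndMetadata:-1
gpfs1nsd:::dataAndMetadata:-1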

You can see the new device names by using the mmlsnsd command.
mmlsnsd

Creating the GPFS file system


Once you have your NSDs ready, you can create the GPFS file system. In order to create the file system, you will use the mmcrfs command, where you must define the following attributes in this order:
The mount point.
The name of the device for the file system.
The descriptor file (-F).

Type:

mmcrfs /gpfs1 /dev/gpfs1 -F /tmp/descfile -A yes -B 256K -n 4 -v no

Validate with mmlsdisk


mmlsdisk gpfs1

Mount the file systems. Exit node1 and type the following from the mgmt1 node:
dsh -a mount -a

Validate with df. You should have a single 156GB filesystem spanning 4 disks in 4 nodes available to all nodes.
dsh -a df

Please review the CSM and GPFS Redbook and GPFS documentation for a list of administrative functions.

Removing GPFS (Skip)


Often it is desirable to completely remove GPFS and start over. The most common cause is SSH and DNS setup issues that cause distributed GPFS commands to fail. Cleanup can be difficult.

Remove GPFS from the management node:
rpm -e gpfs.base gpfs.docs gpfs.gpl gpfs.msg.en_US
rm -rf /var/mmfs

Remove GPFS from all nodes.


dsh -a rpm -e gpfs.base gpfs.docs gpfs.gpl gpfs.msg.en_US
dsh -a rm -rf /var/mmfs

Do not remove any SRC or RSCT components.


Authentication
HPC clusters require a global authentication solution that enables all nodes to view all users with the same properties. Many authentication solutions exist. The most common are:
NIS
LDAP
File synchronization

File synchronization is the most popular with HPC clusters and is the most scalable solution for very large clusters. It is also easy to set up. Create a cluster user on your management node:
useradd (username)
For example: useradd bob

Authentication (continued)
Backup the existing /etc/passwd and /etc/group files first. If for any reason /etc/passwd gets corrupted, you will be unable to log in even as root. A reboot to single-user mode will be required to recover the backup.
dsh -a cp /etc/passwd /etc/passwd.SAVE
dsh -a cp /etc/group /etc/group.SAVE

Push the /etc/passwd and /etc/group files to all nodes.
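Since cfmupdatenode is already used in this lab to distribute files from /cfmroot, a minimal sketch of this push step (an assumption about the intended method, not commands given in the lab) might be:

cp /etc/passwd /cfmroot/etc/passwd
cp /etc/group /cfmroot/etc/group
cfmupdatenode -a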

Verify:

dsh -a grep (username) /etc/passwd
For example: dsh -a grep bob /etc/passwd (check the output)

Each time a new user is added, a node is added, or a node is reinstalled, run cfmupdatenode -a again.

Generate SSH keys for each cluster user (root is NOT a cluster user). rsh clusters may need to create a .rhosts file per user.
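The key-generation commands are not shown in the lab. A minimal sketch, assuming a passwordless RSA key and that the user's home directory will be shared across the nodes via NFS in the next section (so authorized_keys only needs to be populated once), might be:

su - bob
mkdir -p ~/.ssh && chmod 700 ~/.ssh
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
exit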

File Systems
Like authentication, HPC clusters also require a global file system solution enabling all nodes to view the same files with the same properties. There are many solutions available. The most common are:
NFS
GPFS

GPFS is usually not required for user, application, and library directories. GPFS is best suited for data directories.

In this lab we will create two global name spaces.

NFS: /home for user applications
NFS: /usr/local for system applications and libraries

To set up NFS you must first export the /home and /usr/local file systems from your management node. Append the following lines to your /etc/exports file:
/home       *(rw,no_root_squash,sync)
/usr/local  *(rw,no_root_squash,sync)


Restart NFS.
service nfs restart
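The corresponding mount step on the compute nodes is not spelled out here. A minimal sketch, assuming mgmt1 is the NFS server name resolvable by the nodes and that /etc/fstab entries are acceptable, might be:

dsh -a 'echo "mgmt1:/home /home nfs defaults 0 0" >> /etc/fstab'
dsh -a 'echo "mgmt1:/usr/local /usr/local nfs defaults 0 0" >> /etc/fstab'
dsh -a mount -a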

File Systems (continued)


Verify
dsh -a ls -l /home | grep (user name you added, e.g. bob)
For example: dsh -a ls -l /home | grep bob

This verification checks that both the file systems and authentication are working properly. Your dsh output should have listed the /home/username directory for your cluster user AND the user should have owned the directory, e.g.
node1: drwx------ 5 bob bob 4096 Mar 24 05:01 bob
node2: drwx------ 5 bob bob 4096 Mar 24 05:01 bob
node3: drwx------ 5 bob bob 4096 Mar 24 05:01 bob
node4: drwx------ 5 bob bob 4096 Mar 24 05:01 bob

Also verify that /usr/local was mounted.


dsh -a df | grep /usr/local


MPICH-IP
MPICH is a freely available, portable implementation of MPI, the standard for message-passing libraries; this version runs over IP.

MPICH URL: http://www-unix.mcs.anl.gov/mpi/mpich

Install MPICH for the GNU compiler:
mkdir -p /tmp/mpi
cp /lab/hpc/mpich*tar.gz /tmp/mpi
cp /lab/hpc/mpimaker /tmp/mpi
export MPICHROOT=/usr/local/mpich
cd /tmp/mpi
./mpimaker mpich-1.2.7 up gnu ssh

A successful build should return:


mpimaker: 1.2.7 up gnu ssh build start
mpimaker: 1.2.7 up gnu ssh make
mpimaker: 1.2.7 up gnu ssh build successful
MPICH installed in /usr/local/mpich/1.2.7/ip/up/gnu/ssh

Please check config.cmd, make.log, install.log, and configure.log in /usr/local/mpich/1.2.7/ip/up/gnu/ssh for errors. config.cmd records the command used to build MPICH. If the build failed, check the files config.cmd, make.log, install.log, and configure.log in /tmp/mpi/mpich-1.2.7.


mpiiotest
mpiiotest is a simple utility to test parallel file systems. su to the user you created earlier:
su (user name you added, e.g. bob)
For example: su bob

Copy mpiiotest to the user's home directory:


mkdir ~/bench/
cp /lab/hpc/mpiiotest.tgz ~/bench/
cd ~/bench/
tar zxvf mpiiotest.tgz

Build mpiiotest
export MPICH=/usr/local/mpich/1.2.7/ip/i686/up/gnu/ssh
export PATH=$MPICH/bin:$PATH
cd ~/bench/
make clean
make

mpiiotest (continued)
Set up the user's environment:
ssh node1
cd ~/bench
export MPICH=/usr/local/mpich/1.2.7/ip/i686/up/gnu/ssh
export PATH=$MPICH/bin:$PATH

Create a file "machinefile" with 1 entries per node, e.g.:


node1
node2
node3
node4

Open another xterm on your workstation machine as root and type:


xhost +

Type the following as root on your management node:


dsh -a chmod 777 /gpfs1

Type on one line:

mpirun -machinefile machinefile -np 4 mpiiotest --filename /gpfs1/test --filesize 10240 --blocksize 64 --display mgmt1:0 -g 1000x30


mpiiotest (continued)
First, mpiiotest creates the file in parallel. Each red band represents the write progress of the corresponding process. When the bar is completely red, the file has been written. Next, mpiiotest reads the created file. Each blue band represents the read progress of the corresponding process. When the bar is completely blue, the file has been read.


Exit back to the mgmt1 node.

mpiiotest (continued)
The performance of any filesystem is affected by the blocksize used by that filesystem versus the blocksize that the application is using. Since the GPFS filesystem was set up with a 256K blocksize, the optimal blocksize for this test should be 256K. Test this by trying a couple of different blocksizes, recording the total read and write performance for each run. Type each of the following on one line:
mpirun -machinefile machinefile -np 4 mpiiotest --filename /gpfs1/test --filesize 10240 --blocksize 128 --display mgmt1:0 -g 1000x30

mpirun -machinefile machinefile -np 4 mpiiotest --filename /gpfs1/test --filesize 10240 --blocksize 256 --display mgmt1:0 -g 1000x30

mpirun -machinefile machinefile -np 4 mpiiotest --filename /gpfs1/test --filesize 10240 --blocksize 512 --display mgmt1:0 -g 1000x30

Modify GPFS Block Size


If time permits, modify the blocksize of the GPFS filesystem and rerun the mpiiotest benchmark with the same three blocksizes used above.

Follow the steps above to remove GPFS (see Removing GPFS).
Follow the steps above to reinstall GPFS (see Install GPFS).
Follow the steps above to reconfigure GPFS (Creating the GPFS nodes descriptor file through Creating the GPFS file system), but modify the mmcrfs command to read as follows:
mmcrfs /gpfs1 /dev/gpfs1 -F /tmp/descfile -A yes -B 64K -n 4 -v no

Type the following as root on your management node:


dsh -a chmod 777 /gpfs1

Follow the steps above to rerun the mpiiotest benchmark with the three blocksizes (see the mpiiotest sections above).
