
MareNostrum Training

IBM Systems Group


Greg Rodgers
Peter Morjan

Sept 27, 2005


Agenda

Date      Time         Instructor                    Topics
Tuesday   9:30-11:00   Greg Rodgers                  Blade Cluster Architecture, JS20 Overview, MareNostrum Layout
Sept 27   11:30-1:00   Greg Rodgers                  Network Overview and Linux Services
          1:00-2:30                                  LUNCH
          2:30-4:00    Greg Rodgers                  Storage Subsystem
          4:30-6:00    Greg Rodgers & Peter Morjan   DIM and Image Management
Some detail on these charts will be added during class.


Final charts will be available after class.

High-Capacity Multi-Network Linux Cluster Model

A multi-purpose, multi-user supercomputer.

[Diagram: compute blades attached to three networks: the Myrinet high-speed fabric, a reliable gigabit network served by POWER servers, and the service LAN.]

Multiple Networks in BladeCenter Clusters

3 networks in the BladeCenter cluster architecture:
- Service network: out-of-band systems management
- Reliable gigabit network: global access, net boot, image service, and GPFS
- High-speed fabric: distributed-memory applications (e.g. MPI) and optional I/O

Features:
- The out-of-band service network gives physical security. The systems management network is isolated from users. BladeCenters are controlled by SNMP commands on the service network (see the example below).
- The cluster can be brought up without the high-speed fabric.
- The highly reliable GbE network helps to diagnose and recover from complex high-speed fabric issues.
- GbE bandwidth is sufficient for the root file system, which allows for diskless image management.
- Independent I/O traffic: heavy file I/O won't impact a concurrent MPI user.
- A 2nd gigabit interface is available for expansion.
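A minimal illustration of out-of-band access, assuming the net-snmp command-line tools on a management node, a hypothetical management-module address on the service LAN, and the default read community; the BladeCenter-specific OIDs for power and blade control come from the management-module MIB and are not reproduced here:

    # Query a BladeCenter management module over the service network:
    snmpwalk -v 1 -c public 192.168.1.10 system
    snmpget  -v 1 -c public 192.168.1.10 sysName.0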


The MareNostrum Blade Cluster

[Diagram: 172 BladeCenters with 2406 blades, a 2560-port Myrinet 2000 switch, 20 DS4100 storage nodes, the FORCE10 gigabit network, p615 servers, and the service LAN.]

BladeCenter System Management Methodology

[Diagram: a cluster management server connects over Ethernet to the BladeCenter chassis management module. The management module controls the power modules, blowers, control panel, Ethernet switch module, and shared CD-ROM/floppy, and reaches each processor blade's service processor over an I2C interface for VPD, LEDs, voltage, temperature, CPU status, and flash update. Redundant system components are not shown.]

JS20 Blades, BladeCenter and Compute Racks

[Diagram: one JS20 blade = 17.6 GF, one BladeCenter chassis of 14 blades = 246 GF, one compute rack of 6 chassis (84 blades) = 1.48 TF.]


New Technologies used in MareNostrum

- Hi-density Myrinet interconnect: significant reduction in switching hardware; MPI performance that scales.
- IBM advanced semiconductor technology (CMOS10S, 90 nm): high speeds at low power.
- 2.2 GHz PowerPC 970FX processor: industry-leading 64-bit commodity processor; record price/performance in HPC workloads.
- Hi-density gigabit switch with 48-port linecards.
- IBM BladeCenter integration: record cluster density; improved cluster operating efficiency (power, space, cooling); speed of installation.
- Enterprise scale-out FAStT IBM storage (TotalStorage 4100) with GPFS on 2000+ nodes: reliable and scalable global-access filesystem.
- IBM e1350 support: provides cluster-level testing, integration, and fulfillment.
- Linux 2.6: enterprise and performance features to exploit the POWER architecture (VMX, large pages, modular boot).
- Diskless node capability: improved node reliability; reduced installation and maintenance costs; flexibility to change node personality.



Anatomy of a Blade

[Annotated board layout: CPU 1 and CPU 2, DIMM 1-4, drive 1 and drive 2, the I/O expansion daughter card, battery, buzzer, switches SW3 and SW4, and the connectors to the front panel, the I/O expansion card, and the midplane.]


JS20 Blade


JS20 Blade Logic

[Block diagram: two PPC 970 processors and four DDR DIMMs (with VRM) attach to the U3 northbridge; a 16-bit HyperTransport link runs to the AMD 8131 HyperTransport PCI-X tunnel and, over an 8-bit link, to the AMD 8111 HyperTransport I/O hub with NVRAM, flash, Super I/O, IDE, serial port, and USB keyboard/mouse and FDD/CD-ROM; the PCI-X bus carries the BCM5704S dual gigabit Ethernet; the Hawk service processor connects via SMBus and RS-485; HDM connectors lead to the midplane and to an optional Fibre Channel or external gigabit daughter card.]

JS20 processor blade:
- 2-way PPC970 SMP
- Northbridge with memory controller and HyperTransport I/O bus
- AMD HyperTransport tunnel to PCI-X; AMD southbridge
- 2 or 4 DIMMs (up to 4-8 GB)
- BladeCenter service processor
- 2 x 1 Gb Ethernet on board, PCI-X attached (Broadcom)
- Optional additional I/O daughter card, PCI-X attached: 2 x 1 Gb Ethernet (Broadcom), 2 x 2 Gb FibreChannel (QLogic), or Myrinet
- Single-wide blade; 14 blades per chassis; 84 servers (168 processors) in a 42U rack


VMX vs MMX/SSE/SSE2

VMX:
- 32 x 128-bit VMX registers
- No interference with FP registers
- No context or mode switching
- Max. throughput: 8 flops/cycle
- Element types: char, short, int, float

MMX / SSE / SSE2:
- 8 x 128-bit SSE registers plus 8 x 64-bit MMX registers
- MMX registers == FP registers, so MMX stalls FP
- Context switching required for MMX
- Max. throughput: 2 flops/cycle
- Element types: char, short, int, long, float, double

Much more about VMX on Friday.



MareNostrum Rack Summary

34 xSeries e1350 racks:
- 29 compute racks (RC01-RC29): 171 BC chassis with OPM and Gb ESM; 2406 JS20+ nodes with Myrinet cards
- 1 gigabit network rack (RN01): 1 Force10 E600 for the Gb network; 4 Cisco 3550 48-port switches
- 4 Myrinet racks (RM01-RM04): 10 Clos 256+256 Myrinet switches; 2 Myrinet Spine 1280s

8 pSeries 7014-T42 racks:
- 1 operations rack (RH01): 1 7316-TF3 display; 2 p615 management nodes; 2 HMC 7315-CR2; 3 remote async nodes; 3 Cisco 3550 (installed on site); 1 BCIO (installed on site)
- 7 storage server racks (RS01-RS07): 40 p615 storage servers; 20 FAStT100 controllers; 20 EXP100 expansion drawers; 560 250 GB SATA disks

27 BladeCenter 1350 xSeries racks (RC01-RC27)

Box summary per rack:
- 6 BladeCenter chassis (7U each)

Cabling:
- External: 6 10/100 cat5 from the MMs; 6 Gb from the ESMs to the E600; 84 LC cables to the Myrinet switch
- Internal: 24 OPM cables fanning out to the 84 LC cables


MareNostrum Rack Names

[Floor-plan grid of rack positions, one row per line:
RS07  RC07  RC11  RC19  RC27
RS06  RC06  RC10  RC18  RC26
RS05  RC05  RM04  RC17  RC25
RS04  RC04  RM03  RC16  RC24
RH01  RN01  RM02  RC15  RC23
RS03  RC03  RM01  RC14  RC22
RS02  RC02  RC09  RC13  RC21
RS01  RC01  RC08  RC12  RC20
Legend: RCxx blade centers, RMxx Myrinet switches, RSxx storage servers, RH01 operations rack and display, RN01 gigabit switch and 10/100 Cisco switches.]


MareNostrum Logical Names

[The same floor plan annotated with logical names. BladeCenter management modules follow the sNNcMmm pattern (server group NN, chassis M), e.g. s41c3mm; the p615 storage and management servers are s01-s41 (plus cabeza and hmc1 in the operations rack); the Myrinet Clos switches appear as mc0-mc8 with spines ms0-ms2; the Force10 is e600 and the 10/100 switches are cisco1-cisco7.]
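A small illustrative loop (not an official MareNostrum tool) that expands the sNNcMmm management-module naming convention visible above, assuming up to four chassis per sNN group:

    for s in $(seq -w 1 41); do
      for c in 1 2 3 4; do
        echo "s${s}c${c}mm"
      done
    done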


Mare Nostrum Scaled Floor Plan v14

- 31 x 11 tiles (60 cm x 60 cm)
- 18.6 m x 6.6 m = 123 sq m
- 18.6 m x 8.2 m = 153 sq m (including AC)

[Scaled floor-plan drawing showing the rack rows, the E600 and Cisco switches, the back door to the loading dock, and the usual legend of rack types.]

1 operations pSeries rack (RH01)

Box summary:
- 1 display
- 2 HMC
- 2 p615
- 3 16-port remote async nodes (RAN#0-2)
- BCIO (manually installed)
- 3 Cisco 3550 (manually installed)

Cabling, external:
- 2 Gb from the p615s to the E600
- 40 serial lines from RAN#0-2 to the p615s
- 8 Gb from the BCIO to the E600
- 40 cat5 from the p615s to the ciscos
- 4 cat5 uplinks from the ciscos in RN01

Cabling, internal:
- HMC to RAN#0; RAN#0 to RAN#1; RAN#1 to RAN#2
- 2 p615s to RAN#0
- KVM display to HMC
- p615 cat5 to cisco
- BCIO MM cat5 to cisco
- 2 cat5 uplinks from cisco to cisco

[Rack elevation (RH01): BCIO BladeCenter (7U), display (1U), two p615 (4U each), HMC and backup HMC (7315, 4U each), and the three RANs (serial mux).]

Final placement is subject to on-site analysis.

Note: one of the p615s in this operations rack will provide diskless image support for 3 BladeCenters.



BladeCenter I/O Switch Flexibility

Ethernet switch modules:
- Layer 3 Nortel switch
- Layer 2/3 Cisco switch
- Layer 3/7 Nortel switch
- D-Link switch

Pass-thru module:
- Optical pass-thru

MareNostrum Networks
- Gigabit network
- Myrinet network
- Service network
- Serial network (the p615s' remote management network)


MareNostrum Networks

[Network topology diagram.]


1 network xSeries 1350 rack (RN01)

Box summary:
- 1 FORCE10 E600 (16U)
- 4 Cisco 3550
- 24U free

Cabling, external (very heavy, 390 cables in total):
- 162 Gb from the BCs in the compute racks to the E600
- 8 Gb from the BCIO to the E600
- 42 Gb from the p615s to the E600
- 163 cat5 from the MMs to the ciscos
- 12 cat5 from the Myrinet switches to the ciscos
- 3 cisco uplinks to the ciscos in RH01
- Future option: 42 fiber GigE cables to an E600 fiber card

Cabling, internal:
- 1 10/100 from the Force10 service port to a cisco


Gigabit Network: Force10 E600
- Interconnection of the BladeCenters
- Used for system boot of every BladeCenter
- 212 internal network cables: 170 for blades, 42 for file servers
- 76 ports available for external links


MareNostrum Network Review

[The cluster diagram again: 172 BladeCenters with 2406 blades, the 2560-port Myrinet 2000 switch, 20 DS4100 storage nodes, the FORCE10 gigabit network, p615 servers, and the service LAN.]


Myrinet Switch Internals
- LED diagnostic display
- PPC Linux diagnostic module
- 14U aluminum chassis with handles
- Integrated quad-ported spine slots (4 x 64)
- 16 x 16 host port slots
- Front-to-rear air flow
- Hot-swap redundant power supply


Myrinet Switch Cabling
- 126 host cables per side, from one full 84-blade rack bundle and one half-rack bundle; call these H84B and H42B bundles. Each switch manages three racks.
- 64 quad cables routed vertically upward to the spine from the 4 center cards; call this bundle a Q64B. There are 10 Q64Bs.
- Avoid blocking the air intake at the bottom. The worst-case blockage is by 2 Q64Bs at the top switch. Ensure enough slack to swap the middle power supplies.
- The LCD display will not be blocked.


4 Myrinet xSeries 1350 racks (RM01-RM04)

Box summary:
- 10 Clos 256x256 switches
- 2 Spine 1280s

Cabling:
- External: 12 10/100 cat5; 2364 LC cables
- Internal: 640 quad spine cables (routed over the top)

[Rack elevation: each rack holds three 14U Myrinet chassis; the top positions of RM02 and RM03 hold the two Spine 1280s and the remaining ten positions hold Clos 256x256 switches.]

The complex Myrinet cabling is covered in more detail in the next charts.


Myrinet Spine Cabling

- The 10 Q64B bundles from the center 4 cards of the switches provide 640 quad cables at the top of the Myrinet racks, redistributed into 4 Q160B bundles. Cables are 5 m.

[Diagram: the Q64B bundles from the Clos 256x256 switches converge into Q160B bundles feeding the two Spine 1280s.]


Myrinet Cable Bundle Summary
- 640 quad 5 m interswitch cables
- 2364 host fiber cables
- Note: 8 racks have their 84-way bundle split into two 42-way bundles below the Myrinet rack.

[Diagram: the same Q64B and Q160B bundle routing as on the previous chart.]


Myrinet 2560-port full bisection

[Diagram of the 2560-port full-bisection Myrinet topology.]


Local Customization of Linux Services

- All scripts for Linux services should be in /etc/init.d.
- All scripts should be installed with the insserv command (a skeleton is sketched below).
- All scripts should follow the rules for specifying dependencies.
- See: man init.d, man insserv.
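A minimal sketch of an LSB-style init script, using a hypothetical service name "dim_helper"; the INIT INFO header is what insserv reads to compute start/stop ordering:

    #!/bin/sh
    ### BEGIN INIT INFO
    # Provides:          dim_helper
    # Required-Start:    $network $remote_fs
    # Required-Stop:     $network $remote_fs
    # Default-Start:     3 5
    # Default-Stop:      0 1 2 6
    # Description:       Example locally customized service
    ### END INIT INFO
    case "$1" in
      start) echo "Starting dim_helper" ;;   # real start actions go here
      stop)  echo "Stopping dim_helper" ;;   # real stop actions go here
      *)     echo "Usage: $0 {start|stop}"; exit 1 ;;
    esac

Installed, for example, with:

    cp dim_helper /etc/init.d/ && insserv dim_helper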




MareNostrum Storage Subsystem

- Each POWER storage server manages 56 blades and, for redundancy, half of a 7 TB Fibre Channel storage node.
- Root filesystems are contained on SCSI disks; the Fibre Channel storage is used for the parallel file system.
- The storage servers are both image servers and GPFS storage servers.

[Diagram: a pair of POWER storage servers, each with local SCSI disks, shares one FAStT100-controlled Fibre Channel storage node; this building block is repeated 20 times.]


140 TB Storage Subsystem

- 20 x 7 TB storage server nodes
- Each storage server node consists of: 2 p615, 1 FAStT100 controller with 3.5 TB, and 1 EXP100 SATA drawer with 3.5 TB
- Space: 2 p615 = 8U, FAStT100 = 3U, EXP100 = 3U; total per storage node = 14U
- 3 nodes per rack

[Rack layout: RS01 holds SN01-SN03, RS02 SN04-SN06, RS03 SN07-SN09, RS04 SN10-SN11 plus 14U free, RS05 SN12-SN14, RS06 SN15-SN17, RS07 SN18-SN20.]


6 storage pSeries racks with 3 storage nodes each (RS01, RS02, RS03, RS05, RS06, RS07)

Box summary (per rack):
- 6 p615
- 3 FAStT100
- 3 EXP100

Cabling:
- External: 12 10/100 cat5; 12 Gb; 12 Myrinet; 6 serial
- Internal: 2 p615 to each FAStT100; each FAStT100 to its EXP100

[Rack elevation: three stacks of p615 (4U), p615 (4U), FAStT100 (3U), EXP100 (3U).]


1 storage pSeries rack with 2 storage nodes (RS04)

Box summary:
- 4 p615
- 2 FAStT100
- 2 EXP100

Cabling:
- External: 8 10/100 cat5; 8 Gb; 8 Myrinet; 4 serial
- Internal: 2 p615 FC to each FAStT100; each FAStT100 to its EXP100

[Rack elevation: two stacks of p615 (4U), p615 (4U), FAStT100 (3U), EXP100 (3U), with 14U free.]


p615 Remote Control

- The HMC can remote-power and provide a console to any p615. <NEED ACCESS TO DOCUMENT PROCESS>
- See the HMC manual for instructions: "Effective System Management using the IBM Hardware Management Console for pSeries". The manual has a lot of material on partitioning that is not relevant to MareNostrum. Remember: no partitioning means one partition per system.
- Two key commands you'll need to learn: mkvterm and chsysstate (see the sketch below).
- If you write scripts on the HMC, back them up somewhere else; an HMC reload will wipe them out. Recommend scripting the remote console command.
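A hedged sketch of the two commands, assuming a managed-system name such as "s01" as reported by lssyscfg; the actual names and required options on the MareNostrum HMCs may differ, so check the HMC manual:

    # Power a p615 on or off from the HMC command line:
    chsysstate -m s01 -r sys -o on
    chsysstate -m s01 -r sys -o off
    # Open (and later close) a virtual terminal console to that system
    # (a -p <partition name> option may also be required):
    mkvterm -m s01
    rmvterm -m s01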


Serial cabling for p615 service network

- 3 RANs are needed; only 2 are shown.
- No connection to a 7040 frame is required.
- The managed system is each individual p615 server.

[Serial cabling diagram.]


p615 performance

- Optimal adapter placement depends on the bus structure.
- The built-in 10/100/1000 is an optimal I/O interface.
- The built-in 10/100 is used for the service network.
- Adapters on the MareNostrum p615s: two Myrinet cards (4 MB), 1 Emulex Fibre Channel adapter, 1 fiber gigabit card (not used).


p615 performance (continued)



DIM

- DIM = Diskless Image Management. (DIM is not a great name; the blades are not really diskless.)
- Prototyped on MareNostrum to operate blades as if they were diskless.
- DIM is utility software, copyright IBM, made available to BSC, not for redistribution.
- Other advantages:
  - Asynchronous image management
  - Single-image maintenance
  - Speed: no noticeable performance degradation, even with oversubscribed Ethernet
  - Zero blade install time
  - No Linux distro modification
  - Efficient image management: over 2 million RPMs on MareNostrum, efficient yet not minimalistic
  - Local hard drive available for user /scratch


Basic DIM Process

- Install Linux and manage the image on the master blade(s). Multiple blades can be used for different images, e.g. gnode (s41c3b13) and mnode (s41c3b14).
- Clone the blade image using dim_update_master: a brute-force rsync of the root directories.
- Distribute the master copy to the clones as read-only and read-write parts: an intelligent rsync with filters. This can be done with the blades up or down (see the sketch below).
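This is not the actual dim_update_master code, just a minimal sketch of the kind of rsync cloning it performs, assuming the master blade is reachable as s41c3b14 and a hypothetical /dim image tree on the image server:

    # Pull the master blade's root filesystem into the image tree,
    # skipping volatile pseudo-filesystems:
    rsync -aHx --numeric-ids --delete \
          --exclude /proc --exclude /sys --exclude /tmp \
          s41c3b14:/ /dim/image/master/
    # Push the shared read-only part out to one clone's directory
    # (the per-node name s01c1b01 is made up for the example):
    rsync -aH --delete /dim/image/master/ /dim/image/nodes/s01c1b01/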


DIM versus Warewulf

- DIM scales to thousands of nodes with a 2-level hierarchy.
- DIM has large shared read-only parts of the image: fast updates, storage efficiency, a complete (not minimalistic) distro, and good use of caching on the image server.
- DIM can update images during operation, with or without a running client. Warewulf (like Rocks) rebuilds the image for any change, such as a new RPM or a new user.
- DIM uses loopback-mounted filesystems on the image server to control quota (see the sketch below).
- DIM also allows several types of network data transport, including NFS, NBD, and iSCSI.
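Not DIM's actual implementation, just a generic sketch of how a loopback-mounted filesystem can cap the size of one node's read-write image area; the paths, node name, and size are hypothetical:

    # Create a 512 MB file-backed ext3 filesystem for one node's read-write image:
    dd if=/dev/zero of=/dim/rw/s01c1b01.img bs=1M count=512
    mkfs.ext3 -F /dim/rw/s01c1b01.img
    mkdir -p /dim/rw/s01c1b01
    mount -o loop /dim/rw/s01c1b01.img /dim/rw/s01c1b01
    # The node's read-write area can never grow beyond the 512 MB backing file.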


Diskless Image Management

- Extensive use of Linux 2.6 dynamic loading and linuxrc (a simplified boot-flow sketch follows).
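Not the DIM linuxrc itself, just a much-simplified sketch of what an initrd linuxrc for an NFS-root blade typically does; the module names, addresses, and paths are assumptions, and the exported root is assumed to contain an /initrd mount point:

    #!/bin/sh
    # Load the drivers needed before the real root filesystem exists:
    insmod /modules/tg3.ko      # on-board Broadcom gigabit driver
    insmod /modules/nfs.ko
    # Bring up the boot interface (the real script would take this from DHCP):
    ifconfig eth0 10.1.1.101 netmask 255.255.0.0 up
    # Mount the read-only root exported by the image server and switch to it:
    mount -t nfs -o ro,nolock 10.1.1.1:/dim/ro/root /newroot
    cd /newroot
    pivot_root . initrd
    exec chroot . /sbin/init </dev/console >/dev/console 2>&1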


DIM Services Required

- DHCP: /etc/dhcpd.conf is the master database (an example stanza is sketched below).
- NFS: an NFS server is required on all DIM image servers, and an NFS client is required on the DIM client nodes.
- rsync: required on the DIM image servers.
- ssh
- tftp
- xinetd
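Not the MareNostrum configuration itself, just a generic sketch of the kind of per-blade stanza a netboot dhcpd.conf carries; the host name, MAC address, IP addresses, and file name are made up:

    host s01c1b01 {
      hardware ethernet 00:0d:60:aa:bb:cc;
      fixed-address 10.1.1.101;
      next-server 10.1.1.1;            # DIM image/tftp server for this blade
      filename "zImage.initrd.js20";   # netboot image fetched over tftp
    }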
