
POWER8

<Note> This material was prepared as of October 2014 and is subject to change at any time.

1 KOLON Techline
POWER Systems Product Portfolio

Power7 Product Lineup

• PowerLinux 7R1/7R2 • PowerLinux 7R4 • Power 710/720 • Power 730/740 • Power 750 • Power 760 • Power 770 • Power 780

Power8 Product Lineup

• Power S812L/S822L • Power S822 • Power S814 • Power S824

POWER8 New Naming

Power Systems S812L

S = Scale-out
8 = POWER8
1 = 1 socket
2 = 2U
L = Linux
Power Systems April Announcement

       1 Socket                  2 Socket

2U     POWER S812L               POWER S822/S822L
       512GB (Future 1TB)        1TB (Future 2TB)
       12SFF or 8SFF + 6SSD      12SFF or 8SFF + 6SSD
       6 PCIe Gen3               9 PCIe Gen3

4U     POWER S814                POWER S824
       512GB (Future 1TB)        1TB (Future 2TB)
       12SFF or 18SFF            12SFF or 18SFF + 6SSD
       7 PCIe Gen3               11 PCIe Gen3

Power Systems April Announcement Summary

New 2014 Power Systems


Power S812L (8247-21L, one socket), GA 3Q14
  Cores / Frequency: 1x 10c 3.42 GHz, 1x 12c 3.02 GHz
  Memory: 512 GB
  Disk: 12SFF or 8SFF + 6SSD
  CAPI: Max 1
  Max PCIe: 2x PCIe G3 x16, 4x PCIe G3 x8
  O/S: Red Hat, SUSE, Ubuntu

Power S822L (8247-22L, two socket), GA 2Q14
  Cores / Frequency: 2x 10c 3.42 GHz, 2x 12c 3.02 GHz
  Memory: 1 TB
  Disk: 12SFF or 8SFF + 6SSD
  CAPI: Max 2
  Max PCIe: 4x PCIe G3 x16, 5x PCIe G3 x8
  O/S: Red Hat, SUSE, Ubuntu

Power S822 (8284-22A, one socket upgradeable or two socket), GA 2Q14
  Cores / Frequency: 1x 6c 3.89 GHz, 1x 10c 3.42 GHz, 2x 6c 3.89 GHz, 2x 10c 3.42 GHz
  Memory: 1 TB
  Disk: 12SFF or 8SFF + 6SSD
  CAPI: Max 2
  Max PCIe: 4x PCIe G3 x16, 5x PCIe G3 x8
  O/S: AIX, Red Hat, SUSE

Power S814 (8286-41A, one socket), GA 2Q14
  Cores / Frequency: 1x 6c 3.02 GHz, 1x 8c 3.72 GHz, 1x 4c 3.02 GHz
  Memory: 512 GB
  Disk: 12SFF or 18SFF
  CAPI: Max 1
  Max PCIe: 2x PCIe G3 x16, 5x PCIe G3 x8
  O/S: AIX, System i, Red Hat, SUSE

Power S824 (8286-42A, one socket upgradeable or two socket), GA 2Q14
  Cores / Frequency: 1x 6c 3.89 GHz, 1x 8c 4.15 GHz, 2x 6c 3.89 GHz, 2x 8c 4.15 GHz, 2x 12c 3.52 GHz
  Memory: 1 TB
  Disk: 12 SFF or 18 SFF + 8 SSD
  CAPI: Max 2
  Max PCIe: 4x PCIe G3 x16, 7x PCIe G3 x8
  O/S: AIX, System i, Red Hat, SUSE

1 & 2 Socket Servers

New Scale-Out Servers with POWER8 technology

• 1 socket: 4U S814
• 2 socket: 2U and 4U S822 and S824

Linux-only Power Systems (Not called “PowerLinux”)

• 2 socket: 2U S822L

Model     MTM         Sockets       Form factor
S814      8286-41A    1 Socket      4U
S824      8286-42A    1~2 Socket    4U
S822      8284-22A    1~2 Socket    2U
S822L     8247-22L    2 Socket      2U

POWER8 Extended Operating System

All applications which run these OS levels will run on POWER8

• AIX 6.1 or 7.1
• IBM i 7.1 or 7.2
• Linux: RHEL 6, SUSE 11, Ubuntu 14

POWER8 - Changes at a Glance

POWER8 New / Enhanced Feature

Feature                                     Technology Enhancement

Cores per chip                              8 cores → 12 cores

Cache                                       L2: 256 KB / core → 512 KB / core
                                            L3: 80 MB / chip → 96 MB / chip
                                            L4: (new) max 128 MB / chip

Bandwidth                                   Memory: 100 GB/s → 230 GB/s
                                            Max I/O: 40 GB/s → 96 GB/s

Simultaneous Multi-Threading                SMT4 → SMT8

Memory                                      Transactional Memory
                                            Nova 1060 MHz → Centaur 1600 MHz

PCIe                                        Gen2 → Gen3
                                            GX++ → PCIe Direct

Internal I/O Performance                    Easy Tiering support (with SSD)

Coherent Accelerator Processor Interface    One CAPI adapter per socket

Endian                                      Big Endian → Bi-Endian
                                            LE Linux: Ubuntu (1H), SUSE (2H)

Mobile CoD                                  Enterprise Systems Pools

Virtualization                              PowerVM & PowerKVM

POWER8 : Leadership & Innovation (Details)

Each entry lists the feature, the technology enhancement, and a description.

Designed for Big Data

CPU
• 8 cores to 12 cores
• SMT4 to SMT8
With 12 cores per socket, more data is processed within a single socket. Previous models supported 4-way SMT (Simultaneous Multi-Threading), whereas POWER8 supports 8-way SMT, so more operations can run concurrently.

Cache
• L2: 256 KB / core → 512 KB / core
• L3: 80 MB / chip → 96 MB / chip
• L4: (new) max 128 MB / chip
The L2 cache is 512 KB per core, double that of the previous model, so more data can be processed at higher speed. These caches, present throughout the system, continue from the previous generation to be the largest among current systems.

Memory
• Transactional Memory
• Nova 1060 MHz to Centaur 1600 MHz
An off-chip L4 cache has been added through the memory buffer chip, and up to 512 GB of memory can be installed per CPU socket (expansion to 1 TB planned). This strengthens workloads such as large in-memory databases.

Bandwidth
• Memory: 100 GB/s → 230 GB/s
• Max I/O: 40 GB/s → 96 GB/s (Gen2 to Gen3, PCIe Direct)
Across all paths (memory to CPU, cache to cache, CPU to I/O), bandwidth is improved 2 to 3 times over the previous model, so large volumes of data can be processed faster.

Superior Cloud Economics

PowerKVM
• PowerKVM
The open-source-based PowerKVM included with the Linux-only products makes it easier to build Linux-based clouds.

Performance
• 2.1x versus x86 Ivy Bridge
A 1.5x per-core performance improvement over the previous generation provides strong economics. (Performance figures are based on the certified SAPS results announced on April 29.)

More Linux
• Red Hat, SUSE, Ubuntu
Ubuntu has been added to the POWER8 Linux-only products to address customers' varied operating and cloud environments, and more Linux operating systems are planned.

Open Innovation Platform

CAPI
• Coherent Accelerator Processor Interface
• One CAPI adapter per socket
CAPI, newly introduced with POWER8, lets customers further strengthen the capabilities their workloads require. Through CAPI, external accelerators such as GPGPUs (General-Purpose GPUs) or FPGAs (Field-Programmable Gate Arrays) can be attached directly to the CPU. These GPGPUs or FPGAs provide the functions that individual solutions on the system specifically require, through dedicated hardware logic or programming. Because the external accelerator shares the same memory address space as the CPU, complexity is reduced and acceleration is available at memory speed.

OpenPOWER Foundation
• Continued expansion of the OpenPOWER Foundation and adoption of new POWER8 technology
Founded by IBM, Google, NVIDIA, Mellanox, and Tyan, the OpenPOWER Foundation currently consists of 25 global technology companies and continues to grow. Among Korean companies, Samsung Electronics joined the OpenPOWER Foundation this past February, and SK hynix has since joined as well, supporting more advanced memory technology for the open server ecosystem.
POWER8
Processor

POWER Processor Technology

                   POWER5/5+    POWER6/6+    POWER7/7+      POWER8
Year               2004         2007         2010           2014
Process            130/90 nm    65/65 nm     45/32 nm       22 nm
Cores              2            2            8 / 8          12
Threads            SMT2         SMT2         SMT4 / SMT4    SMT8
On-chip caching    1.9MB        8MB          2 + 32/80MB    6 + 96MB
Off-chip caching   36MB         32MB         None           128MB
Sust. Mem. B/W     15GB/s       30GB/s       100GB/s        230GB/s
Peak I/O B/W       6GB/s        20GB/s       40GB/s         96GB/s

POWER8: Extreme Big Data Optimization, on-chip accelerators
POWER9: Extreme Analytics Optimization

POWER8 Processor

Technology
• 22nm SOI, eDRAM, 15 ML, 650mm2

Cores
• 12 cores (SMT8)
• 8 dispatch, 10 issue
• 16 execution pipe
• 2X internal dataflow/queue
• Enhanced prefetching
• 64K data cache
• 32K instruction cache

Accelerators
• Crypto & Mem expansion
• Transactional Memory
• VMM assist
• Data Move / VM Mobility

Energy Management
• On-chip Power Management Micro-controller
• Integrated Per-core VRM

Caches
• 512 KB SRAM L2 / core
• 96 MB eDRAM shared L3
• Up to 128 MB off-chip L4

Memory
• Up to 230 GB/s bandwidth
• Up to 1 TB capacity / socket

Bus Interfaces
• Durable open memory attach
• Robust SMP Interconnect
• Integrated PCIe Gen3
• CAPI

[Die diagram: 12 cores, each with its own L2 and an 8M L3 region, around the L3 cache & chip interconnect, with memory controllers, accelerators, SMP links, and PCIe]

Scale Out Systems - DCMs and POWER8 Chips

1S & 2S servers use DCM (Dual Chip Module)

– 1 DCM fills 1 socket (similar to POWER7+ 750 / 760)
– 1 DCM has two Scale Out POWER8 chips
– 1 DCM can provide 6-core, 8-core, 10-core or 12-core sockets

6-core Processor Chip
• 362 mm2
• 22nm SOI w/ eDRAM

Strengthen Cores
• 8 Threads per Core

Caches
• D Cache: 64KB
• L2: 512KB
• L3: 8 MB per Region, Total: 48MB

Fine Grained Power Management
• On Chip power management

Excellent I/O bandwidth per socket

Integrated SMP Interconnect w/ improved “Flatness”
• 2-Hop fabric topology

On Chip PCIe Controller

[Die diagram: 6 cores, each with its own L2 and an 8M L3 region, around the L3 cache & chip interconnect, with memory controller, accelerators, local and remote SMP links, and PCI Gen 3 links]
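
The cache figures on this slide can be cross-checked with simple arithmetic. The sketch below assumes one 8 MB L3 region per core (as the die diagram suggests) and two identical 6-core chips per DCM; the function names are illustrative only, and the result for chips with fewer active cores is not stated in this material.

```python
# Values taken from the slide; the per-core region layout is an assumption
# read off the die diagram (one 8 MB L3 region next to each core).
L3_REGION_MB = 8
CHIPS_PER_DCM = 2


def chip_l3_mb(cores_per_chip: int) -> int:
    """Total L3 on one chip, assuming one L3 region per core."""
    return cores_per_chip * L3_REGION_MB


def dcm_l3_mb(cores_per_chip: int) -> int:
    """Total L3 in one socket (DCM), assuming two identical chips."""
    return CHIPS_PER_DCM * chip_l3_mb(cores_per_chip)


if __name__ == "__main__":
    print(chip_l3_mb(6))  # 48, matching "Total: 48MB" on the slide
    print(dcm_l3_mb(6))   # 96, matching the 96 MB quoted for 12 cores
```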

POWER8
SMT

POWER8 Multi-threading Options

SMT1: Largest unit of execution work


SMT2: Smaller unit of work, but provides greater amount of execution work per cycle
SMT4: Smaller unit of work, but provides greater amount of execution work per cycle
SMT8: Smallest unit of work, but provides the maximum amount of execution work per cycle

Can dynamically shift between modes as required: SMT1 / SMT2 / SMT4 / SMT8
Mixed SMT modes supported within same LPAR
– Requires use of “Resource Groups”

[Charts: two bar charts of SMT modes. One compares P7 SMT1 with P8 SMT1, SMT2, SMT4, and SMT8 (axis 0 to 4); the other compares SMT1, SMT2, SMT4, and SMT8 (axis 0 to 2.5).]

rPerf – Multiple SMT Levels

                      SMT1     SMT2     SMT4     SMT8

Power S814
  6-core 3.0 GHz      48.3     70.1     91.1     97.5
  8-core 3.7 GHz      71.4    103.5    134.5    143.9

Power S824
  6-core 3.8 GHz      59.9     86.9    112.9    120.8
  12-core 3.8 GHz    116.8    169.4    220.2    235.6
  8-core 4.1 GHz      82.3    119.3    155.1    166.0
  16-core 4.1 GHz    160.4    232.7    302.4    323.6
  24-core 3.5 GHz    209.1    303.2    394.2    421.8

Power S822
  6-core 3.8 GHz      59.9     86.9    112.9    120.8
  12-core 3.8 GHz    116.8    169.4    220.2    235.6
  10-core 3.4 GHz     88.2    127.8    166.2    177.8
  20-core 3.4 GHz    171.9    249.3    324.0    346.7
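
To see how throughput scales with the SMT level, each row of the rPerf table can be normalized to its SMT1 value. The sketch below does this for three rows copied from the table; the dictionary layout and function name are illustrative choices, not part of the original material.

```python
# rPerf values copied from the table above, ordered (SMT1, SMT2, SMT4, SMT8).
RPERF = {
    "S814 6-core 3.0 GHz": (48.3, 70.1, 91.1, 97.5),
    "S824 24-core 3.5 GHz": (209.1, 303.2, 394.2, 421.8),
    "S822 20-core 3.4 GHz": (171.9, 249.3, 324.0, 346.7),
}


def smt_scaling(values):
    """Return each SMT level's rPerf relative to SMT1, rounded to 2 places."""
    base = values[0]
    return tuple(round(v / base, 2) for v in values)


if __name__ == "__main__":
    for name, values in RPERF.items():
        print(name, smt_scaling(values))
```

For these rows, SMT2 comes out at roughly 1.45x, SMT4 at roughly 1.9x, and SMT8 at roughly 2.0x the SMT1 rPerf.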

POWER8 OS Support

Compatible Mode Architecture

Each row lists POWER6 MODE (and POWER6+ Mode)*, POWER7 MODE (No POWER7+ Mode), and POWER8 MODE.

Threads
• POWER6: 2-Thread SMT
• POWER7: 4-Thread SMT, IntelliThreads
• POWER8: 8-Thread SMT

Protection keys
• POWER6: 8 Protection Keys *(16 in P6+ Mode)
• POWER7: 32 Protection Keys, User Writeable AMR
• POWER8: 32 Protection Keys, User Writeable AMR

Vector and encryption
• POWER6: VMX (Vector Multimedia Extension / AltiVec)
• POWER7: VSX (Vector Scalar Extension)
• POWER8: VSX2, In-Core Encryption Acceleration

Affinity
• POWER6: Affinity OFF by Default
• POWER7: CPU/Memory Affinity Enhancements ON by Default, HomeNode, 3-tier Memory, MicroPartition Affinity
• POWER8: HW Memory Affinity Tracking Assists, MicroPartition Prefetch, Concurrent LPARs per Core

Scaling
• POWER6: 64-core / 128-thread Scaling
• POWER7: 64-core / 256-thread Scaling, 256-core / 1024-thread Scaling
• POWER8: > 1024-thread Scaling, Hybrid Threads, Transactional Memory, Active System Optimization HW Assists

Active Memory Expansion
• POWER6: N/A
• POWER7: Active Memory Expansion
• POWER8: HW Accelerated/Assisted Active Memory Expansion

Accelerators
• POWER6: N/A
• POWER7: P7+: AME compression acceleration and Encryption acceleration
• POWER8: Coherent Accelerator / FPGA Attach

AIX Levels

Timeline: 11/2012, 2/2012, 3/2013, 5/2013, 8/2013, 9/2013, 10/2013, 12/2013, 2Q/2014, 3Q/2014

AIX 6 TL7: SP6, SP7, SP8, SP9, SP10
AIX 6 TL8: SP1, SP2, SP3, SP4, SP5
AIX 6 TL9: SP1, SP3

AIX 7 TL1: SP6, SP7, SP8, SP9, SP10
AIX 7 TL2: SP1, SP2, SP3, SP4, SP5
AIX 7 TL3: SP1, SP3

Legend:
• P7 or P6 Modes with Virtual I/O
• P7 or P6 Modes with Full I/O Support
• P8, P7 or P6 Modes with Full I/O Support

POWER8
CAPI

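CAPI (Coherent Accelerator Processor Interface) Overview

Virtual Addressing
• Adapter-based accelerators use the same virtual memory addresses as the CPU
• Removes overhead from the OS, device drivers, and so on

Hardware Managed Cache Coherence
• Adapter-based accelerators can take part in “lock” activity like an ordinary app thread
• Greatly reduces latency in I/O and communication models

Processor Service Layer (PSL)
• Provides a robust interface to apps on the server
• Offloads complexity / content from the CPU

Customizable Hardware Application Accelerator
• Can host specific system software, middleware, user applications, and so on
• Written to the interface provided by the PSL

[Diagram: POWER8 with its Coherence Bus and CAPP, connected over PCIe Gen 3 (transport for encapsulated messages) to an FPGA or ASIC that contains the PSL and the custom hardware application]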
CAPI (Coherent Accelerator Processor Interface)

• Flash memory storage is attached to POWER8 using CAPI
• Removes 97% of the instruction path length when an application issues Read/Write commands
• Saves 10 cores per 1 million IOPS

Conventional I/O path (20,000 instructions):
• Application: Read/Write Syscall
• FileSystem: strategy() / iodone()
• LVM: strategy() / iodone()
• Disk & Adapter DD: Pin buffers, Translate, Map DMA, Start I/O; then Interrupt, unmap, unpin, Iodone scheduling

CAPI flash path (< 500 instructions):
• Application: Posix Async I/O Style API, aio_read() / aio_write()
• User Library
• Shared Memory Work Queue
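
The 97% figure follows from the two instruction counts on this slide. A minimal sketch of the arithmetic, using 500 as the upper bound for the CAPI path (the slide says "< 500"); the function name is illustrative:

```python
def path_length_reduction(before: int, after: int) -> float:
    """Fraction of the instruction path removed when going from before to after."""
    return 1 - after / before


# 20,000 instructions on the conventional path, at most 500 on the CAPI path.
reduction = path_length_reduction(20_000, 500)

if __name__ == "__main__":
    print(f"{reduction:.1%}")  # 97.5%
```

Because the CAPI path is below 500 instructions, the reduction is at least 97.5%, consistent with the 97% quoted on the slide.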

POWER8
Memory

POWER8 Memory Buffer Chip

POWER8 Memory Cards
• Capacity: 16 GB / 32 GB / 64 GB
• 1600 MHz
• Memory Sparing - RAS improvement
• 8 Cards per socket (Scale-Out Systems)

Intelligence Moved into Memory
• Scheduling logic, caching structures
• Energy Mgmt, RAS decision point
– Formerly on Processor
– Moved to Memory Buffer

Processor Interface
• 9.6 GB/s high speed interface
• More robust RAS
• “On-the-fly” lane isolation/repair

Performance Value
• End-to-end fastpath and data retry (latency)
• Cache → latency/bandwidth, partial updates
• Cache → write scheduling, prefetch, energy

[Diagram: DRAM chips connected through DDR interfaces to the Memory Buffer (Scheduler & Management, 16MB Memory Cache), which connects to the processor over the POWER8 Link]

POWER8 Memory Organization (Max Config shown)

• Up to 32 DDR ports
• Up to 410 GB/s
• 8 high-speed memory channels
• 8 GB/s of bandwidth per channel
• Up to 192 GB/s of memory bandwidth

[Diagram: POWER8 DCM connected to eight memory cards, each with 128 GB of DRAM chips and a memory buffer with 16MB of cache]

• Up to 1 TB / Socket
• First P8 Systems: 512 GB / Socket
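
The capacity figures in this diagram can be cross-checked with simple arithmetic: eight memory cards per socket, each carrying DRAM plus a 16 MB buffer cache. The sketch below assumes all eight positions hold cards of the same size; the names are illustrative only.

```python
CARDS_PER_SOCKET = 8        # from the memory buffer slide (Scale-Out Systems)
BUFFER_CACHE_MB = 16        # cache on each memory buffer chip


def socket_memory_gb(card_gb: int, cards: int = CARDS_PER_SOCKET) -> int:
    """DRAM capacity per socket, assuming identical cards in every position."""
    return card_gb * cards


def socket_l4_mb(cards: int = CARDS_PER_SOCKET) -> int:
    """Combined buffer cache (L4) per socket."""
    return BUFFER_CACHE_MB * cards


if __name__ == "__main__":
    print(socket_memory_gb(128))  # 1024 GB, i.e. "Up to 1 TB / Socket"
    print(socket_memory_gb(64))   # 512 GB, the limit on the first P8 systems
    print(socket_l4_mb())         # 128 MB, the maximum L4 quoted earlier
```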

Active Memory Expansion

Like POWER7, provides POWER8 advantage


Expand memory beyond physical limits
More effective server consolidation
• Run more application workload / users per partition
• Run more partitions and more workload per server
60-day trial like Power 7xx
AIX only

#4793 Power Active Memory Expansion Enablement 1 8820

Memory Performance/Configuration Insights

Can mix different size DIMMs

– Cannot mix sizes within a pair
– Can mix different size pairs on a server

Always plug in pairs, except for one DIMM possible on 1-socket servers
– 2-socket servers always have a minimum of two DIMMs (one pair min)
• Above true even if only 1 socket populated
• STRONGLY urge for performance, at least one DIMM pair per DCM
• Having two DIMM pairs per DCM is a very good thing (gives 50% of bandwidth)
– 1-socket server can have a single DIMM for entry price reasons
• When any additional memory is added, the resulting configuration must consist of valid pairs
• STRONGLY urge for performance, at least one DIMM pair per DCM
• Having two DIMM pairs per DCM is a very good thing (gives 50% of bandwidth)

Performance testing not done yet with servers with less-than-max memory configurations to understand detailed trade off
considerations. Testing not planned prior to announce. ????to GA????
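
The plugging rules above can be expressed as a small check. This is a minimal sketch under stated assumptions: DIMMs are given as a list of sizes in GB in which consecutive entries form a pair, each DCM has 8 DIMM positions, and bandwidth is modeled as proportional to populated positions. Only the 50% point for two pairs comes from the slide; the linear model and all names are hypothetical.

```python
def valid_dimm_config(server_sockets: int, dimm_sizes_gb: list) -> bool:
    """Check the pairing rules for a server with 1 or 2 sockets."""
    count = len(dimm_sizes_gb)
    if count == 0:
        return False
    if count == 1:
        # A single DIMM is allowed only on a 1-socket server.
        return server_sockets == 1
    if count % 2 != 0:
        # Beyond a single DIMM, memory must form complete pairs.
        return False
    # Sizes must match within a pair; different pairs may differ.
    pairs = zip(dimm_sizes_gb[0::2], dimm_sizes_gb[1::2])
    return all(first == second for first, second in pairs)


def bandwidth_fraction(dimms_on_dcm: int, positions_per_dcm: int = 8) -> float:
    """Assumed linear model: share of a DCM's memory bandwidth in use."""
    return dimms_on_dcm / positions_per_dcm


if __name__ == "__main__":
    print(valid_dimm_config(1, [16]))              # True: entry 1-socket config
    print(valid_dimm_config(2, [16]))              # False: 2-socket needs a pair
    print(valid_dimm_config(2, [16, 16, 32, 32]))  # True: mixed pair sizes
    print(bandwidth_fraction(4))                   # 0.5: two pairs per DCM
```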
POWER8
IO (PCI)

POWER8 Integrated PCI Gen 3

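Native PCI: the PCIe Gen3 interface is integrated directly into the processor, removing the intermediate bridge logic and improving I/O performance.

Native PCIe Gen 3 support
• PCIe Gen 3 interface implemented directly on the POWER8 processor
• Replaces the existing GX and I/O Bridge technology
• Reduced latency
• Gen3 x16 bandwidth support (32 GB/s)

Used as the transport layer for the CAPI protocol
• External acceleration devices connect directly to the processor over PCIe Gen 3 lanes
• The protocol runs over the PCIe Gen 3 lanes

[Diagram: the earlier path of GX Bus → I/O Bridge → PCIe G2 → PCI Devices, replaced on POWER8 by native PCIe Gen 3]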

PCIe Gen3

Gen1 x8 Gen2 x8 Gen3 x8


2.5 GHz

Though these cards physically look the same … and fit in the same slots
Gen3 cards/slots have up to 2X more bandwidth than Gen2 cards/slots
Gen3 cards/slots have up to 4X more bandwidth than Gen1 cards/slots

– More virtualization
– More consolidation saving PCI slots and I/O drawers
– More ports per adapter

A Gen1 x8 PCIe adapter has a theoretical max (peak) bandwidth of 4 GB/sec. A Gen2 x8 adapter has a peak bandwidth of 8 GB/sec. A Gen3 x8 adapter has a peak bandwidth of 16 GB/sec.

[Chart: peak vs. sustained bandwidth for Gen1, Gen2, and Gen3 (axis 0 to 18)]


PCIe x8 and x16

POWER8 servers have x8 AND x16 PCIe slots

Compared to POWER7+ PCIe Gen2 x8 slot,


a POWER8 PCIe Gen3 x16 slot has a peak bandwidth of 4X
(2X going Gen2 to Gen3 plus 2X going x8 to x16)

x1 x4 x8 x16

x8 x16

x16 slot/card has more connections than a x8 slot/card


“x16” or “x8” refers to the number of lanes. More lanes = more physical connections = more bandwidth
A x8 card can be placed in a x16 slot, but only uses half the connections
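
The peak figures quoted in this deck (4, 8, and 16 GB/s for Gen1, Gen2, and Gen3 x8, and 32 GB/s for Gen3 x16) all follow from a per-lane rate that doubles each generation. The sketch below derives the per-lane values from those x8 figures; the table and function names are illustrative, and the numbers are theoretical peaks, not sustained throughput.

```python
# Per-lane peak bandwidth in GB/s, derived from the x8 figures in this deck
# (4, 8, and 16 GB/s divided by 8 lanes).
PEAK_GB_PER_LANE = {1: 0.5, 2: 1.0, 3: 2.0}


def peak_bandwidth_gb(generation: int, lanes: int) -> float:
    """Theoretical peak bandwidth of a PCIe slot or card."""
    return PEAK_GB_PER_LANE[generation] * lanes


if __name__ == "__main__":
    print(peak_bandwidth_gb(3, 8))    # 16.0
    print(peak_bandwidth_gb(3, 16))   # 32.0
    # Gen3 x16 versus a POWER7+ Gen2 x8 slot: 2X from Gen3, 2X from x16.
    print(peak_bandwidth_gb(3, 16) / peak_bandwidth_gb(2, 8))  # 4.0
```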

Considerations for Using PCIe x16 and x8 Slots

• Adapter types for which PCIe x16 should be considered
– CAPI cards: PCIe x16
– 2-port 40Gb Ethernet and IB cards: better performance when installed in PCIe x16
– The following adapters are supported only in PCIe x16 slots:
#5901/#5278(LP)/#EL10(LP) PCIe Dual-x4 SAS Adapter
#5287(LP)/#5288 PCIe2 2-port 10GbE SR Adapter

• Most cards work in any slot
– All low profile slots = 2U box
– All full-high slots = 4U box
– All slots support PCIe Gen3

PCIe Slots - High Level

                                                  4U                2U
                                              1S 4U   2S 4U    1S 2U   2S 2U

Total PCIe slots (all hot swap)                  7      11        6       9
Required* LAN adapter
(available for client use)                       1       1        1       1
PCIe slots after required* LAN adapter           6      10        5       8
However if use high performance,
expanded function backplane                     -1      -1       -1      -1
PCIe slots after required* LAN and
if using high performance backplane              5       9        4       7

• PCIe slots are all Gen3 slots


• 2U are all low profile and 4U are all full high
• There is no PCI expansion drawer announced. There is an SOD.
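
The slot accounting in the table above is a simple subtraction: start from the total, remove one slot for the required LAN adapter, and remove one more if the high performance, expanded function backplane is used. A minimal sketch with the totals copied from the table; the names are illustrative:

```python
# Total PCIe slots per configuration, copied from the table above.
TOTAL_SLOTS = {"1S 4U": 7, "2S 4U": 11, "1S 2U": 6, "2S 2U": 9}


def free_slots(config: str, high_performance_backplane: bool = False) -> int:
    """Slots left after the required LAN adapter and optional backplane."""
    slots = TOTAL_SLOTS[config] - 1  # required LAN adapter
    if high_performance_backplane:
        slots -= 1                   # backplane consumes one more slot
    return slots


if __name__ == "__main__":
    for config in TOTAL_SLOTS:
        print(config, free_slots(config), free_slots(config, True))
```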

PCIe Slots - More Detail -- x8 and x16

Form factor                            4U          4U           4U          2U           2U
Model                                  S814        S824         S824        S822         S822 / S822L
MTM                                    8286-41A    8286-42A     8286-42A    8284-22A     8284-22A / 8247-22L
Sockets                                1S          Only 1S in   2S          Only 1S in   2S
                                                   2S box                   2S box

Total PCIe slots                       7           7            11          6            9
  x16                                  2           2            4           2            4
  x8                                   5           5            7           4            5

Required LAN adapter
(available for client use)             1 x8        1 x8         1 x8        1 x8         1 x8

PCIe slots after required LAN adapter  6           6            10          5            8
  x16                                  2           2            4           2            4
  x8                                   4           4            6           3            4

However if use high performance,
expanded function backplane            -1 x8       -1 x8        -1 x8       -1 x8        -1 x8

PCIe slots after required LAN and if
using high performance backplane       5           5            9           4            7
  x16                                  2           2            4           2            4
  x8                                   3           3            5           2            3

• PCIe slots are all Gen3 slots (Higher MHz used than Gen2 = 2x theoretical bandwidth)
• Some slots are x16 and some are x8. (x16 have 2x theoretical bandwidth)
