
Optimizing OpenSolaris* for Xeon
May 2008

For Sun Tech Days


Agenda

– Intel Server Advances
– Intel and Sun collaboration
– Key Development Areas
– Summary/Call to Action

Intel is a trademark of Intel Corporation in the U.S. and other countries.

* Other names and brands may be claimed as the property of others.


Executive Summary
• One-year anniversary of the Intel/Sun collaboration agreement
• Engineering teams show excellent collaboration
• Collaboration intensifies in 2008, with more projects in flight
• Both companies are very upbeat about the SW collaboration, which is meeting all its goals
• Deep and long-term engineering engagement and relationship

– Solaris + Intel Architecture + 1 year = New opportunities for our developers and customers in 2008
• Strong Intel roadmap
• Best-in-class mission-critical OS positioned to take advantage of new Intel server technologies
• Solaris openness, Indiana
• IBM, Dell to OEM Solaris
• Choice of virtualization environments
• Expansion of Sun SW portfolio
Intel Server Advances
Intel’s Sustained Architecture Leadership
Stable roadmap for continued software innovation
“Tick Tock” cadence: each process generation pairs a Shrink/Derivative (shrink) with a New Microarchitecture (innovate), 2 years per pair
• 65nm
– Shrink/Derivative: Presler · Yonah · Dempsey
– New Microarchitecture: Intel® Core™ Microarchitecture
• 45nm
– Shrink/Derivative: Penryn Family
– New Microarchitecture: Nehalem
• 32nm
– Shrink/Derivative: Westmere
– New Microarchitecture: Sandy Bridge

See the whitepaper “Intel Architecture and Silicon Cadence”:
http://download.intel.com/technology/eep/cadence-paper.pdf

Source: Intel. All future products, computer systems, dates, and figures specified are
preliminary based on current expectations, and are subject to change without notice.
Intel® Quad-Core - A Superior Design

Dual-die vs Monolithic:
• Faster to design: 6-9 mos
• Lower cost
– Smaller die size
– Better yield (~20%)
– Lower mfg cost (~12%)
• Better supply
• Extends to 45nm

Intel Core™ uArch:
• Leading Perf and Perf/W
• 64-bit
• Intel Virtualization Tech.

[Diagram: four cores (Core 0 to Core 3), each with a 32KB L1 instruction cache and a 32KB L1 data cache; two 4 MB shared L2 caches, each with its own Front Side Bus interface]

Large L2 cache:
• 2X competitors size
• Lower latency (vs L3)
• Fewer cache misses
• More efficient inclusive design
• Reduces bus traffic

Socket compatible:
• From dual-core through to 45nm quad-core

Front-side Bus: up to 1333MHz
• Enables uniform access to shared memory

Leading performance, low cost and extensible


Legal Disclaimers
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate
performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration
may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or
components they are considering purchasing. For more information on performance tests and on the performance of Intel
products, visit http://www.intel.com/performance/resources/limits.htm or call (U.S.) 1-800-628-8686 or 1-916-356-3104.

All dates and products specified are for planning purposes only and are subject to change without notice

Relative performance is calculated by assigning a baseline value of 1.0 to one benchmark result, and then dividing the actual
benchmark result for the baseline platform into each of the specific benchmark results of each of the other platforms, and
assigning them a relative performance number that correlates with the performance improvements reported.
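
As a worked illustration of the relative-performance calculation described above, the following minimal Python sketch divides each platform's benchmark result by the baseline platform's result. The platform names and scores are invented for illustration and are not results from this presentation.

```python
def relative_performance(results, baseline):
    """Return each platform's result divided by the baseline platform's result.

    The baseline platform itself maps to 1.0.
    """
    base = results[baseline]
    if base <= 0:
        raise ValueError("baseline result must be positive")
    return {name: score / base for name, score in results.items()}


# Hypothetical benchmark scores (higher is better); not measured data.
scores = {"baseline_platform": 80.0, "platform_a": 100.0, "platform_b": 120.0}
relative = relative_performance(scores, "baseline_platform")

for name, ratio in relative.items():
    print(f"{name}: {ratio:.2f} relative to baseline")
```

With these made-up inputs the baseline maps to 1.0 and the other two platforms to 1.25 and 1.5.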

SPEC, SPECint2000, SPECfp2000, SPECint2006, SPECfp2006, SPECjbb, SPECWeb are trademarks of the Standard Performance
Evaluation Corporation. See http://www.spec.org for more information.

Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, virtual machine monitor
(VMM) and, for some uses, certain platform software enabled for it. Functionality, performance or other benefits will vary
depending on hardware and software configurations and may require a BIOS update. Software applications may not be
compatible with all operating systems. Please check with your application vendor.

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor
series, not across different processor sequences. See http://www.intel.com/products/processor_number for details.

Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear
facility applications. All dates and products specified are for planning purposes only and are subject to change without notice

* Other names and brands may be claimed as the property of others.

Copyright © 2007-2007 Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon and Intel Core are trademarks or
registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
Quad-Core Intel® Xeon® Processor 5400 series based platforms
Performance Comparison of 5400 Series versus AMD Opteron*

[Bar chart: Relative Performance, higher is better. Series: Quad-Core Intel Xeon 5400 Series; Quad-Core AMD Opteron 1.9 GHz, 2.0 GHz, 2.3 GHz, and 2.5 GHz. Benchmarks: SAP-SD* 2-TierΦ, SPECOMPM*2001€, Abaqus Explicit 6.6-1β, TPC-C*Φ, Cinebench*Φ, Linpack*Φ, SPECfp*_rate2006€, BlackScholes*Φ, SPECint*_rate2006€, SPECjbb*2005€, SPECWeb*2005€, Fluent 6.3 (9 Workloads bmk)β, 3dsmax*Φ, SPECfp*_rate_base2006€, SPECint*_rate_base2006€]

Best available Dual-Core AMD Opteron* results used as baseline.
Quad-Core Intel Xeon’s sustained leadership continues
Data Source: Published, measured, submitted or approved results as of April 7, 2008. See backup for details;
€ Dual-Core AMD Opteron* Model 2222SE (3.0GHz) Φ Dual-Core AMD Opteron* Model 2220SE (2.80 GHz); β Dual-Core AMD Opteron* Model 2218 (2.60 GHz);
Copyright © 2008, Intel Corporation.
Quad-Core Intel® Xeon® Processor 7300 Series based Servers
Comparison to AMD Opteron* MP on Performance and Energy-Efficiency (Perf/Watt)

Data source: Published/Measured/Submitted results as of Sept 12, 2007. See backup for details

[Bar chart: Xeon 7300 series results relative to the baseline, grouped by workload (TPC-C, SAP-SD, Integer, Java, FSI), with Quad-Core AMD Opteron* 2.0GHz results shown for comparison]

Performance Comparison using Xeon 7350
Baseline: Best published Dual-Core AMD Opteron* results
Perf/Watt Comparison using Xeon 7340
Xeon 7350 – Quad-Core Intel® Xeon® Processor X7350; Xeon 7340 – Quad-Core Intel® Xeon® Processor E7340; # Dual-Core AMD Opteron* Model 8222SE (3.0 GHz); $ Dual-Core AMD Opteron* Model 8220SE (2.80 GHz); ^ Dual-Core AMD Opteron* Model 8220 (2.80 GHz, 95 Watt TDP)

Sun and Intel Collaboration
Sun-Intel Solaris Collaboration is Significant

• Broad multi-year strategic alliance
• Sun roadmap commitment
– 1P, 2P, 4P, >4P
• Telco, WS, Enterprise
• Intel endorsement of Solaris* as a mainstream OS for Intel® Xeon® processors
• Joint investment in engineering, design, and marketing alliance for Solaris (and Java*)

Get the best software and hardware for mission-critical applications


Solaris on Xeon Experience
• What worked before (Solaris 10):
– Over 800 x86 systems supported in Hardware Compatibility List http://www.sun.com/bigadmin/hcl
– 146 Intel-based servers supported by Solaris 10 (vs. 86 AMD servers, 61 SPARC servers)
– Performance optimizations to win World Record SPECint using Sun Studio 12
– Majority of existing certified x64 Solaris applications already run on Intel Arch.
– Fully supported by Sun on Intel servers, workstations, mobile, and desktop from multiple OEMs
• What we’re improving:
– Power-on, errata, memory/string operation speedups for Penryn, Nehalem
– Microcode update for serviceability
– Drivers for Intel wireless, graphics, ICH storage, manageability
– Xen / VT roadmap, IOAT
– Power optimizations (Powertop, improved P-States, C-States, NPTM, etc)
– Xeon enhancements for Fault Management
Solaris Development Model
[Diagram: development flow between Intel, the community, and Sun across the Sun firewall. Labels: Sun, Community, Intel, Intel-Platform project, OpenSolaris, Nevada, Selective back ports, Solaris 10 Updates, Intel-based HW, Hidden projects, Sun Firewall; “6m beat rate” is marked in two places]
Key Development Areas
Areas of Development
• Performance Enhancement
• CPU Performance tools
• Compiler Vectorization and Tools
• Power Management
• Driver Support
• I/O Acceleration Technology
• Virtualization Technology
• Predictive Self-Healing

Join us: http://opensolaris.org

Joint work touches most significant areas of OS


Intel Core Silicon Enabling

• Microcode update – increased serviceability
• IOMMU – increased DMA capability and security
• Extended xAPIC – extends processor addressability for interrupt delivery (up to 4G-1 cores)
• ICHx – LAN, AHCI, manageability (AMT)
• CPUID – led to a 4.5% SPECint performance gain on C2D
• MONITOR/MWAIT – replacing halt leads to a 1.2x speedup in certain microbenchmarks
Join us at http://opensolaris.org/os/project/intel-platform/
Performance Enhancements
• Goal: Use current and future Intel technologies to improve Solaris performance
– Libc optimizations
• memcpy(), memmove(), and memset()
– Optimized to use SSE2 and/or SSSE3 instructions
– Significant performance improvements as measured by libMicro
– Available soon in OpenSolaris
• str(n)cpy(), str(n)cmp() and strlen()
– Optimized to use SSE2 and/or SSSE3 instructions
– Significant performance improvements as measured by libMicro
– Available soon in OpenSolaris
– Kernel optimizations in progress
– bzero(), bcopy(), kcopy(), etc.
Power Management
• P-states – Active Power Management
– Performance states. Each P-state runs at a different frequency and voltage, so a lower-frequency P-state saves energy while code is still running.
• C-states – Idle Power Management
– C0 – the CPU executes code; no other C-state executes code
– C1 – HALT instruction; no instructions are executed
– C2 – like C1, no code is executed; the clock is stopped
– C3 – FSB shut down; no snooping, and caches can be shut off
• T-state
– Emergency brake
• Parts of ACPI:
– Static tables that the BIOS creates
– Capture the platform power capabilities (how many P/C-states, power, switch latency, etc.)

Stay as long as you can in deeper C-states
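
To illustrate why residency in deeper C-states matters, here is a minimal Python sketch that computes average power as a residency-weighted sum over the C-states. The per-state power figures and residency fractions are hypothetical; real values depend on the platform and would come from measurement.

```python
def average_power(residency, power_watts):
    """Residency-weighted average power.

    residency   -- fraction of time spent in each state (must sum to 1.0)
    power_watts -- assumed power draw in each state, in watts
    """
    total = sum(residency.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("residency fractions must sum to 1.0")
    return sum(frac * power_watts[state] for state, frac in residency.items())


# Hypothetical per-state power draw in watts; not measured data.
POWER = {"C0": 40.0, "C1": 20.0, "C2": 10.0, "C3": 5.0}

# Same 10% busy time, but the idle time is spent in C1 versus C3.
shallow_idle = {"C0": 0.10, "C1": 0.90, "C2": 0.0, "C3": 0.0}
deep_idle = {"C0": 0.10, "C1": 0.0, "C2": 0.0, "C3": 0.90}

print(f"idle in C1: {average_power(shallow_idle, POWER):.1f} W")
print(f"idle in C3: {average_power(deep_idle, POWER):.1f} W")
```

Under these assumed numbers, the same workload averages 22.0 W when idle time is spent in C1 and 8.5 W when it is spent in C3.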


Power Management Development Areas
• PowerTOP available for Solaris
– Shows what wakes your system out of its power-saving mode
– Uses DTrace

[Screenshot: PowerTOP output showing C-State residency, P-State residency, ACPI info, and top causes for wakeup]

Download at http://www.opensolaris.org/os/project/tesla/work/powertop
Power Management Development Areas
• Lots of kernel improvement areas
• Tickless kernel
• Power-friendly scheduling
• P-State improvement
• C-state support
• HPET timer
• Interrupt binding

Join us at http://opensolaris.org/os/project/tesla/
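
As a rough model of why a tickless kernel helps idle power, the Python sketch below counts CPU wakeups during an idle interval under a fixed periodic clock tick versus waking only when a timer is actually due. The 100 Hz tick rate and the timer deadlines are illustrative assumptions, and the function names are hypothetical; this is not how the Solaris clock code is structured.

```python
import math


def periodic_tick_wakeups(idle_seconds, tick_hz):
    """Wakeups in an idle interval when the clock fires at a fixed rate."""
    return math.floor(idle_seconds * tick_hz)


def tickless_wakeups(idle_seconds, timer_deadlines):
    """Wakeups in an idle interval when the CPU sleeps until the next timer.

    Deadlines that coincide are serviced by a single wakeup.
    """
    due = {t for t in timer_deadlines if 0 <= t < idle_seconds}
    return len(due)


TICK_HZ = 100                            # assumed tick rate, for illustration
pending_timers = [0.25, 0.25, 0.8, 1.5]  # hypothetical deadlines in seconds

print("periodic:", periodic_tick_wakeups(1.0, TICK_HZ))
print("tickless:", tickless_wakeups(1.0, pending_timers))
```

In this model a one-second idle period costs 100 wakeups with the periodic tick and 2 without it, which leaves far more uninterrupted time for deeper C-states.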
OpenSolaris vs “best in class” power use
[Chart: power use by platform component for OpenSolaris (SNV b87) versus a “Best in Class” OS. Components: CPU, GMH, ICH, Memory, PCI x16 slot, LAN, Backlight, PS2, Serial I/O, CLK, SATA, USB]

We have more work to do for Solaris to be best-in-class


IO Acceleration Technologies
Intel® I/O Acceleration Technology: supported features by controller

Controllers:
(A) Intel® GbE Controller (Gilgal, Ophir)
(B) Intel® 82575 Gigabit Controller, Intel® 82598 10GbE Controller
(C) Next gen Gigabit Controller, Next gen 10GbE Controller

Supported feature                   (A)    (B)      (C)
Intel® QuickData Technology         ✓      ✓        ✓
LAN stateless offloads:
  Header/data split                 ✓      ✓        ✓
  Receive Side Scaling              ✓      ✓        ✓
  TX/RX checksum offload            ✓      ✓        ✓
  TCP segmentation                  ✓      ✓        ✓
  Header-splitting / replication           ✓        ✓
  Receive Side Coalescing                  ✓ (Intel® 82598 10GbE Controller)   ✓ (Next gen 10GbE Controller)
Message Signaled Interrupts         MSI    MSI-X    MSI-X
Direct Cache Access                        ✓        ✓
Low Latency Interrupt                      ✓        ✓

IOAT v1 and v2 in progress
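
Receive Side Scaling, listed among the offloads above, spreads incoming flows across receive queues by hashing packet header fields, so that all packets of one flow land on the same queue. The Python sketch below shows the idea for a TCP/IPv4 4-tuple. It uses CRC32 as a stand-in hash (network controllers typically use a keyed Toeplitz hash instead), and the function name and addresses are illustrative assumptions.

```python
import socket
import struct
import zlib


def rss_queue(src_ip, src_port, dst_ip, dst_port, num_queues):
    """Pick a receive queue for a TCP/IPv4 flow from its 4-tuple."""
    key = struct.pack(
        "!4sH4sH",
        socket.inet_aton(src_ip), src_port,
        socket.inet_aton(dst_ip), dst_port,
    )
    # CRC32 is only a stand-in for the hardware hash function.
    return zlib.crc32(key) % num_queues


# Hypothetical flows from one client to one server on different source ports.
flows = [("192.0.2.10", 40000 + i, "192.0.2.1", 80) for i in range(8)]
for flow in flows:
    print(flow, "-> queue", rss_queue(*flow, num_queues=4))
```

Because the queue depends only on the flow's header fields, per-flow packet ordering is preserved while different flows can be processed on different CPUs.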


Today’s Virtualization Usage Models
Usage model: End User Value (models shown along a Time axis)
• Static Server Consolidation: Reduce CapEx, increase utilization
• Mainframe Migration: OS and HW freedom for mission critical applications
• Multi-OS Workstation: Workstation consolidation without compromise on graphics performance
• High Availability/Disaster Recovery: Maintain high levels of business continuity
• Dynamic Load Balancing: Reduce OpEx, streamline resource utilization balancing real-time computing demands with capacity

[Diagrams: for each usage model, App/OS stacks running on VMMs over hardware (HW)]
Without hardware support
• What the VMM Does …
– Emulates a complete hardware environment for every Virtual Machine
– Allocates platform resources
– Isolates execution in each virtual machine

[Diagram: virtual machines VM1 to VMn, each with App and OS, running on a Virtual Machine Monitor over Shared Physical Hardware: Memory, Processors, Graphics, Network, Storage, KY/MS]

Virtualization solutions without hardware support work, but they have limitations and require frequent software intervention
Sun xVM and Innotek VirtualBox
Complete Virtualization and Management: Desktop to Datacenter
Unlocking Virtualization on Xeon
• Intel® Virtualization Technology
• Interoperability
• Performance optimizations
• Manageability at scale
• Availability
• Security and compliance
Intel® Virtualization Technology Evolution

Vector 3: IO Device Focus
• VT-c: Assists for IO sharing
– PCI IOV compliant devs
– VMDq: Multi-context IO
– End-point DMA translation caching
– IO virtualization assists

Vector 2: Chipset Focus
• VT-d: Core support for IO robustness & performance via DMA remapping
• VT-d2: Interrupt filtering & remapping; VT-d extensions to track PCI-SIG IOV

Vector 1: Processor Focus
• VT-x: Close basic processor “virtualization holes” in Intel® 64 & Itanium CPUs
• VT-x2: Richer/faster: Intel VT FlexPriority, FlexMigration, EPT, VPID, ECRR, APIC-V
• VT-x3: Perf improvements for interrupt intensive env, faster VM boot

VMM Software Evolution (timeline: Past, 2005, 2010)
• Software-only VMMs: Binary translation, Paravirtualization, Device emulations
• Simpler and more secure VMM through use of hardware VT support
• Better IO/CPU perf and functionality via hardware-mediated access to memory
• Richer IO-device functionality and IO resource sharing

VMM software evolution over time, with hardware support
We are adding VT-d, VT-d2, VT-x, and VT-x2 into Solaris xVM
xVM Server Enabling in Solaris

• xVM Server
– Today: V1.0 will support VT-x, extended page tables, VTPR, WBINVD for better performance, reliability
– Tomorrow: Future version supports VT-d and VT-d2 device assignment and interrupt remapping for higher performance
• xVM VirtualBox
– Today: VT-x, good performance
– Tomorrow: Blazing fast on Intel Architecture
• xVM Ops Center
– Today: Intel Architecture support
– Tomorrow: Device assignment

Join us at http://opensolaris.org/os/community/xen/
Fault Management Architecture

• Error – an incorrect signal, datum, result
– Observation that is a symptom of a fault
– Old systems only know how to report errors
– Diagnosis left to humans
• Fault – a defect that may produce errors
– The output of the diagnosis of errors
– Something we can associate with an impact and a corrective action
– Diagnosis software automates the steps
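
The error-versus-fault distinction above can be sketched in a few lines of Python: errors are raw observations, and a diagnosis step turns a pattern of errors into a fault with an impact and a corrective action. The sketch below diagnoses a fault once a component reports n errors within t seconds. The class name, thresholds, component name, and message strings are hypothetical and do not reflect the actual Solaris FMA interfaces.

```python
from collections import defaultdict, deque


class ThresholdDiagnosis:
    """Diagnose a fault when a component reports n errors within t seconds."""

    def __init__(self, n, t):
        self.n = n
        self.t = t
        self.events = defaultdict(deque)   # component -> recent error times
        self.faulty = set()

    def report_error(self, component, timestamp):
        """Record one error; return a fault record if the threshold trips."""
        if component in self.faulty:
            return None                    # already diagnosed
        window = self.events[component]
        window.append(timestamp)
        # Drop errors that fall outside the time window.
        while window and timestamp - window[0] > self.t:
            window.popleft()
        if len(window) >= self.n:
            self.faulty.add(component)
            return {
                "fault": component,
                "impact": "component may produce further errors",
                "action": "schedule replacement",
            }
        return None


engine = ThresholdDiagnosis(n=3, t=60.0)
results = [engine.report_error("DIMM-2", ts) for ts in (0.0, 100.0, 110.0, 120.0)]
print(results[-1])
```

The isolated error at time 0 ages out of the window, so the fault is only diagnosed after three errors arrive within 60 seconds of each other.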
FMA and Intel® Xeon® processors
• Fault Management Architecture in Solaris saves millions in service costs
• Intel platform support – Bensley and Caneland platforms
• Error injection: ensures that FMA code paths work correctly
• Reporting of physical location of failed DIMMs
• Future processors – new RAS features in Nehalem

[Diagram: Intel platform block diagram (North Bridge, ESB2, PXH-V, CPU1, CPU2, x8 DDR2 FBD 16GB, Intel LAN Zoar, PCI-e and PCI-X slots, SATA x6, SCSI, IPMI) annotated with “FMA model”, “Error injection”, and “Location of failed DIMMs”]

RAS support is great for 2 and 4 socket servers


Developer Tools
• Sun Studio 12 Compiler (released June 2007) with Xeon-specific optimizations
• Sun Studio Performance Analyzer: latest Intel Architecture performance counters
• Threading Building Blocks for Solaris
– threadingbuildingblocks.org
• Transitive® QuickTransit®
– Run Solaris/SPARC binaries on Solaris/Xeon

Sun Studio Compiler Optimization Flags
• Aggressive – For large projects
– -fast -xtarget=woodcrest -m64 -xvector=simd,lib -xipo -xprofile=collect/use -Wu,-sched_first_pass=1
– -xtarget=woodcrest expands to “-xarch=ssse3 -xchip=core2 -xcache=32/64/8:4096/64/16”
– SSSE3 code generation, core2 architecture optimization and cache configuration selections, O5 level optimization, and inter-procedural optimization
– Enable instruction scheduler for FP calculation on IA
– Profile guided optimization

• Medium – For most applications
– -fast -xtarget=woodcrest -m64 -Wu,-sched_first_pass=1
– All aggressive optimization but no IPO and profile guidance

• Low – For extra precise floating point calculations
– -O -xtarget=woodcrest -m64
– O3 (medium) optimization level
– Quickest compilation

Use Sun Studio to optimize your application on Solaris/IA


Solaris system-level tuning
• Tuning is critical for best performance
– Solaris is designed for safe handling of heavy, mixed workloads
“out-of-the-box”; tune for optimal handling of specific workload
characteristics
• Processor binding/scheduling
– Monitor application for threads that dominate CPU
– Tie these to CPUs in dedicated processor set to guarantee
resource without contention
– Shield application CPUs from interrupts
– Use Fixed-Priority scheduling class for critical processes
• Network stack tuning
– Update driver: tuning as new NICs appear
– Solaris buffers – size to avoid retransmissions without consuming
too much memory
• Look at applications that communicate with the app
– Analyze with DTrace
– InfiniBand for lowest-latency interconnection
– Running on the same box using containers for ultimate low-latency
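
For the buffer-sizing point in the network tuning list above, one common starting point is the bandwidth-delay product: the number of bytes that must be in flight to keep a link full. The Python sketch below computes it with integer arithmetic. The link speeds and round-trip times are illustrative assumptions, not measurements, and the function name is hypothetical.

```python
def bandwidth_delay_product_bytes(bandwidth_bps, rtt_us):
    """Bytes in flight needed to keep a link full.

    bandwidth_bps -- link speed in bits per second
    rtt_us        -- round-trip time in microseconds
    """
    # bits/s * microseconds / 1e6 gives bits; dividing by 8 gives bytes.
    return bandwidth_bps * rtt_us // 8_000_000


# Illustrative link parameters.
gigabit_lan = bandwidth_delay_product_bytes(1_000_000_000, 500)       # 0.5 ms RTT
ten_gig_wan = bandwidth_delay_product_bytes(10_000_000_000, 20_000)   # 20 ms RTT

print(f"1 Gb/s, 0.5 ms RTT: {gigabit_lan} bytes")
print(f"10 Gb/s, 20 ms RTT: {ten_gig_wan} bytes")
```

A buffer much smaller than this figure can leave the link underused, while one much larger mainly consumes memory, which is the trade-off the slide describes.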
Desktop/Mobile Driver Support - Wireless Driver
• www.opensourcewireless.org
• Focus on 4965; future Wi-Fi devices planned
• Downloadable uCode and dual-licensed header files
• Phase 1 – completed
– 802.11 A/B/G
– Infrastructure mode
– Power/temperature calibration (FCC regulatory)
– Rx sensitivity calibration
– WEP
• Phase 2 – expected completion Jun
– 802.11 A/N
– WPA

The 4965 wireless driver is working, with improvements on the way


Device Drivers Support - Others

• Graphics
– All Intel graphics silicon is supported
• AMT
– HECI driver and LMS service are available for AMT 3.0
– AMT 4/5 support is being planned
• NICs
• Others
– Audio codec, USB, etc.

Intel platform laptops and desktops are supported.


Summary/Call to Action

• Intel platform and Solaris bring the best technology to the end user
• Intel and Sun teams at full strength through the community
• Results are significant across various kernel areas
– Performance, drivers, FMA, virtualization, etc.

• Call to action
– Run OpenSolaris/Solaris on the latest Intel server platforms
– Join development with us in the OpenSolaris projects
