
DCB and FCoE Deep dive

Jaromír Pilař (jpilar@cisco.com), Consulting Systems Engineer, CCIE 2910

© 2006 Cisco Systems, Inc. All rights reserved.

Cisco Confidential

What Is I/O Consolidation


IT organizations operate multiple parallel networks:
IP and other LAN protocols over an Ethernet network
SAN over a Fibre Channel network
HPC/IPC over an InfiniBand network

I/O consolidation carries all three types of traffic on a single network

Servers have a common interface adapter that supports all three types of traffic

IPC: Inter-Process Communication



Consolidation: one of the major trends in the data center


But where is the main consolidation potential? The majority of ports in a fabric are in the access layer, regardless of fabric type => the access layer has the highest potential for consolidation

Different fabrics (network, SAN, HPC) have different requirements => do we have a technology that can serve them all at once? If we do => is the technology mature and affordable enough to be deployed on a massive scale?


I/O Consolidation in the Network

[Figure: two servers, each with processor and memory. Without consolidation, separate I/O adapters connect to Storage, LAN, and IPC. With consolidation, a single I/O subsystem carries Storage, LAN, and IPC traffic.]

I/O Consolidation in the Host


Fewer CNAs (Converged Network Adapters) instead of NICs, HBAs, and HCAs
Limited number of interfaces for blade servers
All traffic goes over 10 GE
[Figure: a server with two FC HBAs (FC traffic), three NICs (Ethernet traffic), and two HCAs (IPC traffic) compared with a server with two CNAs]

Cabling and I/O Consolidation


Merging the Requirements


LAN/IP
Must be Ethernet
Too much investment
Too many applications that assume Ethernet

Storage
Must follow the Fibre Channel model
Losing frames is not an option

IPC (Inter-Process Communication)
Doesn't care about the underlying network, provided that:
It is cheap
It is low latency
It supports APIs like OFED, RDS, MPI, sockets


Why Have Consolidation Attempts Not Succeeded Yet?


Previous attempts
Fibre Channel: never credible as data network infrastructure
InfiniBand: not Ethernet
iSCSI: not Fibre Channel

Before PCI-Express there was not enough I/O bandwidth in the servers
It needs to be Ethernet, but 1 GE didn't have enough bandwidth

Drivers for 10GE to the Servers


Multicore CPU architectures allowing bigger and multiple workloads on the same machine
Server virtualization driving the need for more bandwidth per server due to server consolidation
Growing need for network storage driving the demand for higher network bandwidth to the server
Multicore CPUs and server virtualization driving the demand for higher-bandwidth network connections

Enabling Technologies


Three Challenges + One


Why Are Frames Lost?


Collision
No longer present in full duplex Ethernet

Transmission Error
Very rare in the data center

Congestion
Most common cause

Congestion is a switch issue, not a link issue
A full duplex IEEE 802.3 link does not lose frames
It must be dealt with in the bridge/switch, i.e., by IEEE 802.1


Can Ethernet Be Lossless?


Yes, with the Ethernet PAUSE frame
Defined in IEEE 802.3, Annex 31B
The PAUSE operation is used to inhibit transmission of data frames for a specified period of time
Ethernet PAUSE transforms Ethernet into a lossless fabric
[Figure: Switch A and Switch B joined by an Ethernet link. Switch B's queue is full, so it sends PAUSE to Switch A, which stops transmitting.]

How PAUSE Works


When the receive queue crosses a threshold, the receiver sends a PAUSE frame
The sender stops sending frames for the interval of time carried in the PAUSE frame
After that interval, the sender starts sending frames again

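Let's Compare PAUSE with FC Buffer-to-Buffer Credit


Eight credits pre-agreed
[Figure: port A sends frames against the pre-agreed credits; the receiver returns an R_RDY for each buffer it frees]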
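The credit mechanism can be illustrated with a short Python sketch. It is only a toy model under stated assumptions: the class name `CreditedSender` and its methods are hypothetical, and the only facts taken from the slide are the eight pre-agreed credits and the R_RDY primitive that returns one credit.

```python
class CreditedSender:
    """Toy model of FC buffer-to-buffer credit (names are illustrative only)."""

    def __init__(self, agreed_credits: int = 8):
        self.agreed_credits = agreed_credits  # credits agreed at login
        self.credits = agreed_credits         # credits currently available

    def can_send(self) -> bool:
        # A frame may be sent only while at least one credit remains.
        return self.credits > 0

    def send_frame(self) -> None:
        if not self.can_send():
            raise RuntimeError("no credit left: sender must wait for R_RDY")
        self.credits -= 1

    def receive_r_rdy(self) -> None:
        # Each R_RDY from the receiver returns one credit, up to the agreed total.
        self.credits = min(self.credits + 1, self.agreed_credits)


sender = CreditedSender(agreed_credits=8)
for _ in range(8):
    sender.send_frame()
blocked = not sender.can_send()   # all eight credits are consumed
sender.receive_r_rdy()
unblocked = sender.can_send()     # one buffer was freed by the receiver
```

Unlike PAUSE, the sender here never needs to be told to stop: it simply runs out of credits, so the receiver's buffers cannot be overrun.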


PAUSE Frame Format


PAUSE Frame
A standard Ethernet frame, not tagged

Destination MAC = 01:80:C2:00:00:01
Source MAC = source station MAC
EtherType = 0x8808 (MAC Control frame)
Opcode = 0x0001 (PAUSE)
Pause_Time = the time the link needs to remain paused, in pause quanta (512 bit times each)
Pad = 42 bytes
CRC

There is a single Pause_Time for the whole link
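As a rough illustration of this layout, the following Python sketch assembles the 60 bytes that precede the CRC and converts a Pause_Time value into seconds. The function names are hypothetical, and the CRC is left to the hardware or driver.

```python
import struct

PAUSE_DEST_MAC = bytes.fromhex("0180C2000001")  # reserved multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
PAD_BYTES = 42


def build_pause_frame(src_mac: bytes, pause_quanta: int) -> bytes:
    """Return a PAUSE frame without the trailing CRC (illustrative helper)."""
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    if not 0 <= pause_quanta <= 0xFFFF:
        raise ValueError("Pause_Time is a 16-bit field")
    fields = struct.pack("!HHH", MAC_CONTROL_ETHERTYPE, PAUSE_OPCODE, pause_quanta)
    return PAUSE_DEST_MAC + src_mac + fields + bytes(PAD_BYTES)


def pause_seconds(pause_quanta: int, link_bits_per_second: float) -> float:
    """One pause quantum is 512 bit times on the link."""
    return pause_quanta * 512 / link_bits_per_second


frame = build_pause_frame(bytes.fromhex("020000000001"), pause_quanta=0xFFFF)
longest_pause_10ge = pause_seconds(0xFFFF, 10e9)  # about 3.36 ms on 10 GE
```

A Pause_Time of zero is the conventional way to tell the peer to resume immediately.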


Why Is PAUSE Not Widely Deployed?


Inconsistent implementations
Standard allows for asymmetric implementations
Easy to fix

PAUSE applies to the whole link
Single mechanism for all traffic classes
This may cause traffic interference, e.g., storage traffic paused due to congestion on IP traffic


Priority Flow Control (PFC)


a.k.a. PPP (Per Priority Pause)
PFC enables PAUSE functionality per Ethernet priority

IEEE 802.1Q defines eight priorities
Traffic classes are mapped to different priorities:
No traffic interference
IP traffic may be paused while storage traffic is being forwarded, or vice versa
Requires independent resources per priority (buffers)

High level of industry support
Cisco distributed proposal
Standard track in IEEE 802.1Qbb

IEEE 802.1Q Tag: EtherType = IEEE 802.1Q (16 bits), Priority (3 bits), CFI (1 bit), VLAN ID (12 bits)
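The priority that PFC acts on lives in the 16 bits that follow the 802.1Q EtherType. A minimal Python sketch of packing and unpacking those bits is shown below; the helper names `pack_tci` and `unpack_tci` are made up for this example.

```python
def pack_tci(priority: int, cfi: int, vlan_id: int) -> int:
    """Pack Priority (3 bits), CFI (1 bit), and VLAN ID (12 bits) into 16 bits."""
    if not 0 <= priority <= 7:
        raise ValueError("priority must be 0..7")
    if cfi not in (0, 1):
        raise ValueError("CFI is a single bit")
    if not 0 <= vlan_id <= 0x0FFF:
        raise ValueError("VLAN ID is a 12-bit field")
    return (priority << 13) | (cfi << 12) | vlan_id


def unpack_tci(tci: int) -> tuple:
    """Return (priority, cfi, vlan_id) from the 16-bit tag control field."""
    return (tci >> 13) & 0x7, (tci >> 12) & 0x1, tci & 0x0FFF


tci = pack_tci(priority=3, cfi=0, vlan_id=100)
priority, cfi, vlan_id = unpack_tci(tci)
```

With three bits of priority there are exactly eight values, which is why PFC can pause up to eight classes independently.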


Priority Flow Control in Action


[Figure: Switch A's eight transmit queues and Switch B's eight receive queues, one per priority (One through Eight), share a single Ethernet link. Switch B sends a PAUSE for one priority and only the matching transmit queue on Switch A stops; the other priorities keep flowing.]

PFC Frame Format


Priority Flow Control
Similar to the PAUSE frame

Destination MAC = 01:80:C2:00:00:01
Source MAC = source station MAC
EtherType = 0x8808
Opcode = 0x0101 (used to distinguish PFC from PAUSE)
Class Enable Vector = indicates for which priorities the frame carries valid pause information
Time (Class 0) through Time (Class 7) = eight Time fields, one per priority
Pad = 26 bytes (fills the frame to the 64-byte Ethernet minimum)
CRC
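The following Python sketch builds such a frame up to, but not including, the CRC. It is an illustration only: the function name and the dictionary argument are hypothetical, and the padding is computed so that the frame reaches 60 bytes before the 4-byte CRC.

```python
import struct

PFC_DEST_MAC = bytes.fromhex("0180C2000001")
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101
MIN_FRAME_WITHOUT_CRC = 60


def build_pfc_frame(src_mac: bytes, quanta_by_priority: dict) -> bytes:
    """Build a PFC frame (no CRC) pausing the given priorities (illustrative)."""
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    enable_vector = 0
    times = [0] * 8
    for prio, quanta in quanta_by_priority.items():
        if not 0 <= prio <= 7 or not 0 <= quanta <= 0xFFFF:
            raise ValueError("priority must be 0..7 and time must fit in 16 bits")
        enable_vector |= 1 << prio   # mark this Time field as valid
        times[prio] = quanta
    body = struct.pack("!HHH8H", MAC_CONTROL_ETHERTYPE, PFC_OPCODE,
                       enable_vector, *times)
    frame = PFC_DEST_MAC + src_mac + body
    return frame + bytes(MIN_FRAME_WITHOUT_CRC - len(frame))


# Pause only priority 3 (for example, the storage class) for the maximum time.
pfc = build_pfc_frame(bytes.fromhex("020000000001"), {3: 0xFFFF})
```

Priorities whose bit is clear in the enable vector are unaffected by the frame, which is what lets IP traffic continue while storage traffic is paused, or vice versa.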


Is Anything Else Required?


In Order to Build a Deployable I/O Consolidation Solution, the Following Additional Components Are Required:

Discovery protocol (DCBX)
Bandwidth manager

DCBX
Hop-by-hop negotiation for:
Priority Flow Control (PFC)
Bandwidth management
Applications
Logical link-down

Based on LLDP (Link Level Discovery Protocol)
Added reliable transport

Allows either full configuration or configuration checking
Link partners can choose supported features and willingness to accept configuration from peer

Bandwidth Management
IEEE 802.1Q defines priorities, but not a simple, effective, and consistent scheduling mechanism

Products typically implement some form of Deficit Weighted Round Robin (DWRR)
Configuration and interworking is problematic

Proposal for HW-efficient, two-level DWRR with strict priority support
Consistent behavior and configuration across network elements
Standard track in IEEE 802.1Qaz
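For readers unfamiliar with DWRR, here is a minimal single-level sketch in Python. It is not the two-level proposal with strict priority named above, only the basic deficit mechanism; the function name, queue names, and quantum values are assumptions chosen for the example.

```python
from collections import deque


def dwrr(queues: dict, quanta: dict, rounds: int) -> list:
    """Serve frame sizes (bytes) from each queue using deficit counters."""
    deficits = {name: 0 for name in queues}
    sent = []
    for _ in range(rounds):
        for name, queue in queues.items():
            if not queue:
                deficits[name] = 0          # idle queues do not bank credit
                continue
            deficits[name] += quanta[name]  # add this round's quantum
            while queue and queue[0] <= deficits[name]:
                size = queue.popleft()
                deficits[name] -= size
                sent.append((name, size))
            if not queue:
                deficits[name] = 0
    return sent


queues = {"lan": deque([1000] * 6), "san": deque([1000] * 6)}
quanta = {"lan": 2000, "san": 1000}         # 2:1 weighting in bytes per round
order = dwrr(queues, quanta, rounds=2)
bytes_sent = {name: sum(size for q, size in order if q == name) for name in quanta}
```

With equal frame sizes the 2:1 quanta translate directly into a 2:1 share of the link, which is the behavior a bandwidth manager configures per traffic class.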


Priority Groups
Priorities are assigned to individual traffic classes (LAN, SAN, IPC)
First level of scheduling inside each Priority Group
Priority Groups are then scheduled
The result is the final link behavior

Example of Link Bandwidth Allocation


Traffic classes on the 10 GE link:
HPC traffic: priority class High, 20% guaranteed bandwidth
Storage traffic: priority class Medium-High, 30% default bandwidth
LAN traffic: priority class Medium, 50% guaranteed bandwidth

Offered traffic (Gb/s)       T1    T2    T3
HPC traffic                   3     3     2
Storage traffic               3     3     3
LAN traffic                   3     4     6

Realized link utilization    T1    T2    T3
HPC traffic                  30%   30%   20%
Storage traffic              30%   30%   30%
LAN traffic                  30%   40%   50%
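The numbers in this example can be reproduced with a small work-conserving allocation sketch in Python. The algorithm is an assumption made for illustration (each class first receives up to its share, then spare capacity is handed to classes that still have demand, in proportion to their shares); the slides do not specify the scheduler at this level of detail, and the function name is hypothetical.

```python
def allocate(link_gbps: float, share_percent: dict, offered_gbps: dict) -> dict:
    """Split link bandwidth among classes, lending out unused shares."""
    alloc = {c: min(offered_gbps[c], link_gbps * share_percent[c] / 100)
             for c in offered_gbps}
    spare = link_gbps - sum(alloc.values())
    while spare > 1e-9:
        hungry = [c for c in offered_gbps if offered_gbps[c] - alloc[c] > 1e-9]
        if not hungry:
            break                            # all demand is satisfied
        total_weight = sum(share_percent[c] for c in hungry)
        given = 0.0
        for c in hungry:
            extra = min(offered_gbps[c] - alloc[c],
                        spare * share_percent[c] / total_weight)
            alloc[c] += extra
            given += extra
        spare -= given
    return alloc


shares = {"HPC": 20, "Storage": 30, "LAN": 50}
t1 = allocate(10, shares, {"HPC": 3, "Storage": 3, "LAN": 3})
t2 = allocate(10, shares, {"HPC": 3, "Storage": 3, "LAN": 4})
t3 = allocate(10, shares, {"HPC": 2, "Storage": 3, "LAN": 6})
```

At T1 and T2 the link is not oversubscribed, so HPC can borrow beyond its 20%; at T3 the offered load is 11 Gb/s, and LAN is held to its 50% share (5 Gb/s) while the other classes are served in full.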



FCoE: Fibre Channel over Ethernet


What is Fibre Channel over Ethernet?


From a Fibre Channel standpoint, it's FC connectivity over a new type of cable called an Ethernet cloud
From an Ethernet standpoint, it's yet another ULP (Upper Layer Protocol) to be transported, but a challenging one!

And technically, FCoE is an extension of Fibre Channel onto a Lossless Ethernet fabric

FCoE Enablers and Encapsulation


Enablers
10 Gb/s Ethernet
Lossless Ethernet: matches the lossless behavior guaranteed in FC by B2B credits
Ethernet jumbo frames: max FC frame payload = 2112 bytes, total max frame size = 2180 bytes

Encapsulation (in frame order)
Ethernet Header: normal Ethernet frame, EtherType = FCoE
FCoE Header: control information (version, ordered sets SOF and EOF)
FC Header, FC Payload, CRC: same as a physical FC frame
EOF
FCS
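The 2180-byte figure can be checked by adding up the fields. The per-field sizes in the Python sketch below are not given on the slide; they are the commonly cited FC-BB-5 values and should be treated as assumptions, including the presence of an IEEE 802.1Q tag.

```python
# (field name, size in bytes); sizes are assumptions, not taken from the slide
FCOE_MAX_FRAME_FIELDS = [
    ("Ethernet header (DA, SA, EtherType)", 14),
    ("IEEE 802.1Q tag", 4),
    ("FCoE header (version, reserved, SOF)", 14),
    ("FC header", 24),
    ("FC payload (maximum)", 2112),
    ("FC CRC", 4),
    ("EOF plus reserved bytes", 4),
    ("Ethernet FCS", 4),
]


def max_fcoe_frame_size(fields=FCOE_MAX_FRAME_FIELDS) -> int:
    """Sum the field sizes of a maximum-size FCoE frame."""
    return sum(size for _, size in fields)


total = max_fcoe_frame_size()
needs_jumbo = total > 1518  # larger than a standard untagged Ethernet frame
```

Because the total exceeds the standard Ethernet maximum, every link on the FCoE path has to accept these larger frames, which is why jumbo frame support is listed as an enabler.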



FCoE Is Fibre Channel


FCoE Is Fibre Channel at the Host and Switch Level

Easy to understand
Completely based on the FC model
Same host-to-switch and switch-to-switch behavior of FC, e.g., in-order delivery or FSPF load balancing

Same operational model
Same techniques of traffic management
Same management and security models
WWNs, FC-IDs, hard/soft zoning, DNS, RSCN

Protocol Organization
FCoE itself
Is the data plane protocol
It is used to carry most of the FC frames and all the SCSI traffic
Uses Fabric Assigned MAC address (dynamic)

FIP (FCoE Initialization Protocol)
It is the control plane protocol
It is used to discover the FC entities connected to an Ethernet cloud
It is also used to login to and logout from the FC fabric
Uses unique BIA on CNA for MAC

The two protocols have:
Two different Ethertypes
Two different frame formats
Both are defined in FC-BB-5
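To make the dynamic, fabric-assigned address concrete, the Python sketch below derives one from an FC-ID. The construction (a 24-bit FC-MAP prefix followed by the 24-bit FC-ID) and the default FC-MAP value 0E:FC:00 are not stated on the slide; they are assumptions based on FC-BB-5 as commonly described, and the function name is hypothetical.

```python
DEFAULT_FC_MAP = 0x0EFC00  # assumed default FC-MAP prefix


def fabric_assigned_mac(fc_id: int, fc_map: int = DEFAULT_FC_MAP) -> str:
    """Concatenate a 24-bit FC-MAP and a 24-bit FC-ID into a MAC address."""
    if not 0 <= fc_id <= 0xFFFFFF or not 0 <= fc_map <= 0xFFFFFF:
        raise ValueError("FC-MAP and FC-ID are 24-bit values")
    value = (fc_map << 24) | fc_id
    return ":".join(f"{byte:02x}" for byte in value.to_bytes(6, "big"))


mac = fabric_assigned_mac(0x010203)
```

Since the FC-ID is only known after fabric login, FIP has to run from the adapter's burned-in address first; the data plane then switches to the derived address.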

http://www.cisco.biz/en/US/prod/collateral/switches/ps9441/ps9670/white_paper_c11-560403.html

Status of the Standards


All Standards for FCoE Are Technically Stable

FC-BB-5: technically stable in October 2008, completed in June 2009, published in May 2010
PFC: completed in July 2010, awaiting publication
ETS: completed in July 2010 (completing Approval Phase 3)
DCBX: completed in July 2010 (completing Approval Phase 3)
DCB: technically stable
[Figure: progress of each standard through the Inv, Dev, Appr, and Pub phases]

Myths and Misunderstandings


Myth: "You Can't Do End-to-End FCoE."


The FC-BB-5 standard fully supports end-to-end FCoE

Anyone who says it does not has not read the standard

However, current implementations may be behind the standard and may not fully support it yet

Source: StorageNewsletter.com, Exclusive Interview with Darren Thomas, Head of Dell Storage. June 29, 2010

Myth: "FC-BB-6 Means the FCoE Standard Isn't Done Yet."


Standards are like operating systems: they add features to previous versions
Different versions (e.g., FC-BB-4, FC-BB-5, FC-BB-6) have different features

FC-BB-5 fully defined the way to transport Fibre Channel over Ethernet
FC-BB-6 is working on adding features and functionality

Myth: "You Need QCN (802.1Qau) for End-To-End FCoE."


QCN Operates at a Different Level Than FCoE

QCN is a core-to-edge protocol to deal with persistent congestion situations in a Layer 2 network

When congestion is detected, the core switch samples some frames, swaps their MAC addresses, and sends notifications backward

[Figure: H1 and H2 send traffic to H3 (DA: H3 SA: H1 and DA: H3 SA: H2) through a congested core switch, which returns QCN messages to the senders (DA: H1 SA: H3 and DA: H2 SA: H3)]

FCoE Crosses Layer 2 Domains


Therefore, QCN is Useless for FCoE

[Figure: H2 sends to target T2 across FCF A, FCF B, and FCF C. The Ethernet addresses are rewritten at every FCF, while the encapsulated FC frame keeps S_ID = FC-ID(H2) and D_ID = FC-ID(T2) on every hop.]
H2 to FCF A: DA: FCF-MAC(A) SA: FPMA(H2)
FCF A to FCF B: DA: FCF-MAC(B) SA: FCF-MAC(A)
FCF B to FCF C: DA: FCF-MAC(C) SA: FCF-MAC(B)
FCF C to T2: DA: FPMA(T2) SA: FCF-MAC(C)
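The hop-by-hop rewriting can be sketched in a few lines of Python. The function name and the string labels are illustrative only; they mirror the notation of the figure rather than any real API.

```python
def hop_headers(host: str, target: str, fcfs: list) -> list:
    """Return the Ethernet and FC addressing seen on each hop of the path."""
    nodes = ([f"FPMA({host})"]
             + [f"FCF-MAC({name})" for name in fcfs]
             + [f"FPMA({target})"])
    # The encapsulated FC frame is identical on every hop.
    fc_frame = {"S_ID": f"FC-ID({host})", "D_ID": f"FC-ID({target})"}
    return [{"DA": nodes[i + 1], "SA": nodes[i], **fc_frame}
            for i in range(len(nodes) - 1)]


hops = hop_headers("H2", "T2", ["A", "B", "C"])
source_macs = [hop["SA"] for hop in hops]
```

Only the first hop carries the host's MAC as the source address, so a congestion notification generated beyond the first FCF and sent to the swapped source address would reach an FCF rather than the originating host.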


FCoE = Layer 3 From Ethernet Perspective

What does this mean? Layer 2 of the Fibre Channel model maps to Layer 3 of the Ethernet model

[Figure: the FC model (FC-4, FC-3, FC-2 split into FC-2V, FC-2M, and FC-2P, FC-1, FC-0) beside the FCoE stack in the Ethernet (OSI) model: FC-4, FC-3, and FC-2V (FC levels, unchanged) at Layer 3, then the FCoE Entity over the IEEE 802.3 layers, Layer 2 (MAC) and Layer 1 (PHY). FCoE, including multihop, happens at Layer 3; QCN happens at Layer 2.]

Source: FC-BB-5 rev 2.00, June 4, 2009

Myth: "You Need TRILL To Run FCoE"


Response

TRILL defines an alternative to Spanning Tree for forwarding Ethernet frames in an Ethernet network
Also supports multipathing
Has nothing to do with congestion

Source: Mellor, Chris. DCB is Not Enough. The Register, August 3, 2010

FCoE = Layer 3 From Ethernet Perspective

What does this mean? Layer 2 of the Fibre Channel model maps to Layer 3 of the Ethernet model

[Figure: the FC model (FC-4, FC-3, FC-2 split into FC-2V, FC-2M, and FC-2P, FC-1, FC-0) beside the FCoE stack in the Ethernet (OSI) model: FC-4, FC-3, and FC-2V (FC levels, unchanged) at Layer 3, then the FCoE Entity over the IEEE 802.3 layers, Layer 2 (MAC) and Layer 1 (PHY). FCoE, including multihop, happens at Layer 3; TRILL happens at Layer 2.]

Source: FC-BB-5 rev 2.00, June 4, 2009

Nexus 5000 and 2000 family products for Unified Fabric


Cisco Nexus 5010/5020

1st generation of Nexus 5xxx family

Industry's First I/O Consolidation Virtualization Fabric for Enterprise Data Center

Nexus 5000 Switch Family
Nexus 5010 - 28-Port L2 Switch: 20 fixed ports 10GE/FCoE/DCE, 1 expansion module
Nexus 5020 - 56-Port L2 Switch: 40 fixed ports 10GE/FCoE/DCE, 2 expansion modules

Expansion Modules
Fibre Channel: 8 ports 1/2/4G FC
Fibre Channel: 6 ports 2/4/8G FC
FC + Ethernet: 4 ports 10GbE/FCoE/DCE and 4 ports 1/2/4G FC
Ethernet: 6 ports 10GE/FCoE/DCE

Partners: 2x10GE/DCE/FCoE; SW FCoE/DCE + 2x10GE; 2x10GE
OS: Cisco NX-OS
Mgmt: Cisco Fabric Manager and Cisco Data Center Network Manager

Cisco Nexus 5548

2nd generation of Nexus 5xxx family

32 fixed SFP+ ports
Line rate
Hardware capable of 1/10 Gigabit Ethernet
Traditional Ethernet or Fibre Channel over Ethernet
L3 capable (post FCS)
FabricPath and TRILL capable (post FCS)
40 GE ready

Expansion modules (GEM2): 16p SFP+ Ethernet ports, or 8p Eth + 8p native FC
Mgmt 0, Console, USB
Redundant fan modules
Redundant 750W AC power supplies

Nexus 2000 family extension

10GE FCoE-capable Fabric Extender - Nexus 2232

32 10GE/FCoE SFP+ downlinks
8 10GE/FCoE SFP+ uplinks

32x 1/10GE host interfaces; 8x 10GE network interfaces
10GE interfaces support FCoE
HW supports 1G, but SW support comes in a post-FCS release
Can mix and match with existing GE and next-gen GE FEX in network topologies
Host port-channel support
ACL classification
SPAN source/destination support
Only FIP-enabled CNAs supported (Gen 2)

Unified Fabric

Initial Deployments

Direct Attached Topology


Servers and FCoE targets are directly connected to the Nexus 5000 over 10Gig FCoE
Native Ethernet LAN network and native Fibre Channel network break off at the Nexus 5000 access layer
Nexus 5000 operates as the FCF

[Figure: servers with FIP-enabled or pre-FIP CNAs and FCoE targets attach to pairs of Nexus 5000 FCFs (with vPC), which uplink to the Ethernet/LAN and to SAN A and SAN B. Legend: Native Fibre Channel; Ethernet LAN; Enhanced Ethernet and FCoE.]

Unified Fabric

Multihop FCoE Deployment with Nexus 4000I

Blade servers connect to Nexus 4000 over 10Gig FCoE
Nexus 4000 is a FIP Snooping Bridge
Nexus 4000 connects to Nexus 5000 over 10Gig FCoE
Native Ethernet LAN network and native Fibre Channel network break off at the Nexus 5000
Nexus 5000 operates as the FCF

[Figure: a blade chassis with CNA mezzanine cards connects through Nexus 4000 FIP Snooping Bridges to pairs of Nexus 5000 FCFs, which uplink to the Ethernet/LAN, to FCoE targets, and to SAN A and SAN B. Legend: Native Fibre Channel; Ethernet LAN; Enhanced Ethernet and FCoE.]

Unified Fabric

Multihop FCoE Deployment with Nexus 2232PP

Servers connect to FEX 2232 over 10Gig FCoE
FEX 2232 is single-homed to the upstream Nexus 5000
FEX 2232 can be connected with individual links or a port-channel
Server connections to the FEX can be Active/Standby or over a vPC

[Figure: servers with FIP-enabled CNAs attach to FEX-2232 units, each single-homed to a Nexus 5000 FCF (with vPC), which uplinks to the Ethernet/LAN core and to SAN A and SAN B. Legend: Native Fibre Channel; Ethernet LAN; Enhanced Ethernet and FCoE.]

Unified Fabric

Multihop FCoE Deployment using VE ports

With NX-OS 5.0(2)N2, VE_Ports are supported on/between the Nexus 5000 and Nexus 5500 Series Switches

VE_Ports are run between switches acting as Fibre Channel Forwarders (FCFs)
VE_Ports are bound to the underlying 10G infrastructure:
VE_Ports can be bound to a single 10GE port
VE_Ports can be bound to a port-channel interface consisting of multiple 10GE links

[Figure: end devices (VN ports) attach to VF ports; the switches interconnect through VE-to-VE links over FC and FCoE. All switches shown are Nexus 5X00 Series acting as an FCF.]

FCoE Adapters


CNA: Converged Network Adapter

[Figure: three adapters compared. A 10GbE LAN adapter (link, Ethernet, PCIe) is served by Ethernet drivers; an FC HBA (link, Fibre Channel, PCIe) is served by Fibre Channel drivers; a CNA (10GbEE links, Fibre Channel and Ethernet, one PCIe connection) still presents separate Ethernet drivers and Fibre Channel drivers to the operating system.]

View from Operating System


Standard drivers
Same management
Operating system independent interfaces

QLogic 8042 Gen1 CNA


PCI Express Gen1 x8
Dual port 10GbE: optical SR, passive copper
Multi-chip solution: QLogic 4Gb FC controller and drivers, Intel Ethernet controller and drivers
Power = 27W
Full height, full length
Windows, Linux, and VMware (ESX 3.5U4 and 4.0) support

QLogic 8100 Series Gen 2 CNA


PCI Express x8 slot
Single and dual port 10GbE: active and passive copper, optical SR and LR (operates with QLogic optics only)
Fully integrated ASIC
Power ~7.4W (dual port with optical SR)
Low profile form factor
No heat sink required

FCOE vs. FC performance test


FCOE vs. FC test - results


Open-FCoE Software
[Figure: two Linux servers compared. Fibre Channel server: HBA management plane; in the Linux kernel, file system layers, SCSI layer, and HBA driver over the HBA. Ethernet server: FCoE management plane; in the Linux kernel, file system layers, SCSI layer, OpenFC layer, FCoE layer, and Ethernet driver over an Ethernet net device.]

More information


More Information

Standards sites: http://ieee.org, http://t11.org/fcoe
Case studies: http://www.cisco.com/en/US/products/ps9670/prod_case_studies_list.html
Book: I/O Consolidation in the Data Center, by Silvano Gai & Claudio DeSanti
http://fcoe.com

Thank You
