InfiniBand: An Overview

Introduction
Everything changes. In the early 1990s the microprocessor was a prized possession. By the year 2000, PCs were running microprocessors at clock speeds in the GHz range. But the way in which I/O was carried out remained more or less the same. The processor can now deliver data at blistering speeds, but the I/O subsystem that is supposed to accept that data cannot keep up. The bottleneck is the shared bus architecture.

Problems with PCI

In a shared bus architecture, the various components connected to the bus vie for control of it. A prime example of this is the familiar Peripheral Component Interconnect (PCI) bus [4].

Fig 1. The shared PCI bus in the system architecture

Source: How PCI works? by Jeff Tyson (http://www.howstuffworks.com/pci.htm)

As shown in Fig 1, the PCI devices are all attached to a single parallel PCI bus, for which they all contend. In such a scenario, contention is inevitable. The bandwidth of the various PCI standards is shown below.

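PCI standard        Max BW     Architecture           Issues
PCI (66 MHz)        4 Gbps     Shared parallel bus    Bus contention
PCI-X (133 MHz)     8 Gbps     Shared parallel bus    Bus contention
PCI-X DDR*          16 Gbps    Shared parallel bus    Bus contention
PCI-X QDR**         32 Gbps    Shared parallel bus    Bus contention

Fig 2. Table of PCI standards.

*Double Data Rate  **Quad Data Rate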

Though the maximum bandwidths shown in the table look enormous, the bandwidth actually at hand turns out to be about 533 Mbps for the 66 MHz version of PCI. Also, because the PCI bus is shared, the fanout has to be lowered as the operating frequency is increased, which means that fewer devices can be attached to the bus. So PCI does not look like a viable option for next-generation I/O systems, though it looks poised to exist for quite some time owing to its wide market acceptance. What could be the solution to the bus contention issue? It is the use of serial switched architectures, and InfiniBand is a technology that employs one.
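The peak figures in the table above follow from simple arithmetic: bus width multiplied by the number of transfers per second. The sketch below reproduces them under the assumption of a 64-bit bus, which the table does not state; the function and variable names are illustrative only.

```python
# Peak bandwidth of a parallel bus = bus width (bits) x transfers per second.
# ASSUMPTION: 64-bit bus width; the table in the text does not give the width.
BUS_WIDTH_BITS = 64

# (name, clock in MHz, transfers per clock cycle)
BUSES = [
    ("PCI (66 MHz)", 66, 1),
    ("PCI-X (133 MHz)", 133, 1),
    ("PCI-X DDR", 133, 2),   # double data rate: two transfers per clock
    ("PCI-X QDR", 133, 4),   # quad data rate: four transfers per clock
]


def peak_gbps(clock_mhz, transfers_per_clock, width_bits=BUS_WIDTH_BITS):
    """Theoretical peak in gigabits per second, ignoring all overhead."""
    return clock_mhz * 1e6 * transfers_per_clock * width_bits / 1e9


for name, clock, per_clock in BUSES:
    gbps = peak_gbps(clock, per_clock)
    # Divide by 8 to convert gigabits to gigabytes, then scale to megabytes.
    print(f"{name:16s} {gbps:5.1f} Gbps  ({gbps * 1000 / 8:6.0f} MB/s)")
```

Under these assumptions the 66 MHz row comes to about 4.2 Gbps, or roughly 528 MB/s. That is a theoretical peak, and every device on the shared bus has to divide it with the others.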

InfiniBand to the Rescue

Only a technology that is implemented very close to the processor memory bus can be seen as a replacement for PCI. InfiniBand (IB) breaks through the bandwidth and fanout constraints of the PCI bus by moving to a serial switched fabric architecture. But established networking technologies such as Fibre Channel (FC) and Gigabit Ethernet (GigE) already provide a serial switched architecture, so why is a new one needed? The answer can be summarized in three keywords [6]: data storage, networking, and server clustering.

FC is a proven technology in the field of data storage. GigE is also coming up in a big way, and networking is its unique selling point. But what about server clustering? Server clustering needs a low-overhead, quick messaging service that is very reliable. This is where InfiniBand scores. Unlike other networking technologies, InfiniBand is designed to bypass the overhead of multi-layered protocol processing. The comparison in other areas is shown in Fig 3.

Application focus
    InfiniBand: Server I/O and clustering
    Ethernet: Local area networks
    Fibre Channel: Storage area networks

Data transport / reliability
    InfiniBand: High reliability
    Ethernet: Data packets dropped during congestion; no failover capability
    Fibre Channel: High reliability

Systems management
    InfiniBand: Built-in, in-band fabric and hardware management
    Ethernet: No form factors or built-in management systems
    Fibre Channel: No form factors or built-in management systems

Fig 3. Differences between technologies.

Source: Understanding InfiniBand by Gene Risi & Philip Bender

Components of InfiniBand
System Area Layout

Fig 4. InfiniBand topology

Fig 4 shows the InfiniBand topology in its most basic form. A node could be a server, a PC, or an I/O device such as a RAID subsystem. The fabric may be a single switch or an interconnection of switches and routers. All connections in this topology are switched, i.e. they are point to point, which eliminates the contention seen on a shared bus. Also, because the links are serial, they require only four wires instead of the wide parallel connection of the PCI bus.

Fig 5. A system-level view of the basic topology

In the system-level view (Fig 5) certain elements need explanation. The leftmost part of the figure depicts the internals of a node. The memory controller is connected to a Host Channel Adapter (HCA), which is the node's entry point into the fabric. The HCA provides the interface through which InfiniBand integrates with the operating system. The HCA links the node to the switch, which in turn is connected to a number of Target Channel Adapters (TCAs). A TCA interfaces a target I/O device, such as a RAID or JBOD subsystem, with the InfiniBand fabric. Each TCA serves a specific kind of target, though multi-utility TCAs are also possible. Channel adapters contain ports, and a single HCA or TCA can contain more than one. These ports connect the node to the fabric and vice versa.

InfiniBand Architecture
As is evident from Fig 7, InfiniBand operates via a network protocol stack. This protocol stack is compared with the layers of the OSI model for convenience.

Fig 7. InfiniBand Protocol Stack compared with the OSI network Model
Source: InfiniBand Architecture Tutorial Hot Chips by Daniel Cassiday (InfiniBand Trade Association)

At the top, client layers communicate in the form of transactions. These transactions are composed of messages that are moved through the transport layer. The messages are further divided into packets at the network layer, as shown in the graphic. IB routers can route these packets across network domains, using a global identifier called the GID [3] for this purpose. For routing within a subnet at the data-link layer, an identifier local to the subnet, known as the LID [3], is used. An IB switch generally does this.

Fig 6. IB PDUs at various layers.
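The division of a message into packets, and the split between LID-based and GID-based forwarding, can be sketched roughly as follows. This is a simplified illustration and not the IB wire format: the packet fields, the 256-byte payload size, and the rule that the presence of a GID means "crosses subnets" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

MTU = 256  # ASSUMPTION: payload bytes per packet, chosen only for illustration


@dataclass
class Packet:
    dst_lid: int             # local identifier, used within a subnet
    dst_gid: Optional[str]   # global identifier, set only when crossing subnets
    seq: int                 # position of this packet within its message
    payload: bytes


def segment(message: bytes, dst_lid: int,
            dst_gid: Optional[str] = None, mtu: int = MTU) -> List[Packet]:
    """Split one transport-layer message into packets of at most mtu bytes."""
    return [
        Packet(dst_lid, dst_gid, seq, message[offset:offset + mtu])
        for seq, offset in enumerate(range(0, len(message), mtu))
    ]


def reassemble(packets: List[Packet]) -> bytes:
    """Rebuild the message, tolerating packets that arrive out of order."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))


def forwarded_by(packet: Packet) -> str:
    """Simplified rule: a GID implies a router, otherwise a switch uses the LID."""
    return "router" if packet.dst_gid is not None else "switch"


message = b"x" * 600
packets = segment(message, dst_lid=7)
print(len(packets), [len(p.payload) for p in packets], forwarded_by(packets[0]))
```

A 600-byte message is used here only because it does not divide evenly by the assumed payload size, so the last packet is shorter than the others.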

At the lowest layers of the stack (which correspond to the physical and data-link layers of the OSI model) the standards are more or less similar to those of FC. InfiniBand uses both optical fibre and copper cables. The IB bit error rate is 10^-12, and 8B/10B encoding is used: for every 8 bits of data to be sent, 10 bits are actually sent over the physical cabling. IB also supports aggregating links into groups of 4 or 12 physical lanes [6], known as 4X and 12X respectively. Moreover, IB cabling is full duplex, i.e. a 4X channel contains 4 send and 4 receive lanes. This combination gives higher throughput. Though there are 4 lanes, they are treated as a single entity for management purposes.

IB incorporates a concept of segmenting bandwidth using virtual lanes (VLs) [6]. VLs are formed by a multiplexing arrangement in which unrelated data flows share the same link. IB has configurations of 1, 2, 4, 8 and 15 virtual lanes. VL15 is used only for network management; the rest are data lanes. By implementing this, IB allows multipoint communication among nodes and provides better utilization of the fabric.

IB also provides a method to logically group together nodes that are otherwise physically distant. This is known as partitioning [6], and it is analogous to VLANs in Ethernet data networks.
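The effect of 8B/10B encoding and lane aggregation on usable bandwidth can be worked out with a few lines of arithmetic. The sketch below assumes a signalling rate of 2.5 Gbps per lane, the figure for original single-data-rate InfiniBand; that rate is not given in the text, and the names are illustrative.

```python
# ASSUMPTION: 2.5 Gbps signalling rate per lane (original single-data-rate
# InfiniBand); the text does not state a per-lane rate.
LANE_SIGNAL_GBPS = 2.5

# 8B/10B sends 10 line bits for every 8 data bits, so 80% of the line rate is data.
ENCODING_EFFICIENCY = 8 / 10


def data_rate_gbps(lanes: int, signal_gbps: float = LANE_SIGNAL_GBPS) -> float:
    """Usable data rate in one direction for a link made of `lanes` lanes."""
    return lanes * signal_gbps * ENCODING_EFFICIENCY


for width in (1, 4, 12):
    print(f"{width}X: {width * LANE_SIGNAL_GBPS:.1f} Gbps signalled, "
          f"{data_rate_gbps(width):.1f} Gbps of data per direction")
```

Because the links are full duplex, the same data rate is available in each direction at the same time.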

Virtual Interface Protocol

The Virtual Interface (VI) protocol [6] is used at the IB transport layer and is what makes IB different. As mentioned earlier, the main area where IB scores over FC and GigE is clustering. A cluster heartbeat requires a very low latency network, and the VI protocol's main purpose is to reduce the latency between communicating servers. Using a conventional network protocol architecture for the cluster heartbeat adds latency, both because of the overhead of executing the network protocol code and because of the context switches needed to accept data in the privileged mode of the OS. The privileged mode comes into the picture because the network adapter that receives the data has to hand it over to the OS.

The VI protocol reduces latency by allowing the network adapter to bypass the OS and perform its functions in non-privileged mode. VI uses memory-like operations to directly access buffers on the receiver. This process is known as Remote Direct Memory Access (RDMA) [3]. In order to bypass the privileged mode of the OS, the various I/O and process-related management functions have to be taken up by the VI protocol.

Each application that wants to send or receive creates a Queue Pair (QP) [3]. A QP is a combination of a send queue and a receive queue at each port. An application that wants to communicate places a Work Queue Element (WQE) [3] in the send queue. From the send queue of the sender, the data is sent to the receive queue of the receiver. When a WQE is executed, a Completion Queue Element (CQE) [3] is generated and placed in a completion queue. The completion queue informs the application that posted the WQE of its completion and also reduces the number of interrupts generated.

Certain functions are defined for both the send and the receive queues. The send queue can perform basic message sending and three RDMA-related functions, known as RDMA-Read, RDMA-Write and RDMA-Atomic.

For the receive queue the only type of operation is Post Receive Buffer, which identifies a buffer into which a client may send data, or from which it may receive data, through a Send, RDMA-Write or RDMA-Read operation.

Fig 8. VI protocol communication mechanism

Source: An introduction to InfiniBand Architecture by Odysseas Pentakalos (http://www.oreillynet.com/pub/a/network/2002/02/04/windows.html)
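The queue pair mechanism described above can be modelled as a toy in-memory simulation. This is a sketch for intuition only and does not use any real InfiniBand API: the class names mirror the terms in the text, while `run_adapter` and the status strings are hypothetical stand-ins for work that the channel adapter hardware would do. Only the basic Send operation is modelled.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WQE:
    """Work Queue Element: one unit of work posted by the application."""
    opcode: str   # only "SEND" is modelled in this sketch
    data: bytes


@dataclass
class CQE:
    """Completion Queue Element: reports the outcome of one WQE."""
    opcode: str
    status: str   # hypothetical status strings, not taken from any real API


@dataclass
class QueuePair:
    send_queue: deque = field(default_factory=deque)
    recv_queue: deque = field(default_factory=deque)        # posted receive buffers
    completion_queue: deque = field(default_factory=deque)

    def post_send(self, data: bytes) -> None:
        self.send_queue.append(WQE("SEND", data))

    def post_receive(self, buffer: bytearray) -> None:
        self.recv_queue.append(buffer)


def run_adapter(sender: QueuePair, receiver: QueuePair) -> None:
    """Hypothetical stand-in for the adapter: execute each WQE on the send queue."""
    while sender.send_queue:
        wqe = sender.send_queue.popleft()
        if not receiver.recv_queue:
            sender.completion_queue.append(CQE(wqe.opcode, "NO_RECEIVE_BUFFER"))
            continue
        buf = receiver.recv_queue.popleft()
        if len(wqe.data) > len(buf):
            sender.completion_queue.append(CQE(wqe.opcode, "BUFFER_TOO_SMALL"))
            continue
        # Copy straight into the buffer the receiver posted, with no OS involvement.
        buf[:len(wqe.data)] = wqe.data
        sender.completion_queue.append(CQE(wqe.opcode, "OK"))
        receiver.completion_queue.append(CQE("RECV", "OK"))


node_a, node_b = QueuePair(), QueuePair()
inbox = bytearray(16)
node_b.post_receive(inbox)
node_a.post_send(b"heartbeat")
run_adapter(node_a, node_b)
print(bytes(inbox[:9]), node_a.completion_queue[0].status)
```

The applications in this model never call into an operating system to move data. They only post work and later read the completion queue, which is the property the VI protocol relies on to cut latency.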

Types of services:
IB provides five different types of transport services [6]: Reliable Connection, Unreliable Connection, Reliable Datagram, Unreliable Datagram, and Raw Datagram.
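The five service names combine two properties, reliability and connection orientation, and a small lookup table makes that explicit. The flags below are read off the service names; treating Raw Datagram as unreliable and connectionless is an assumption, and the helper `matching` is purely illustrative.

```python
from typing import List, NamedTuple


class TransportService(NamedTuple):
    name: str
    reliable: bool    # read from the service name
    connected: bool   # True for connection services, False for datagram services


SERVICES = [
    TransportService("Reliable Connection", True, True),
    TransportService("Unreliable Connection", False, True),
    TransportService("Reliable Datagram", True, False),
    TransportService("Unreliable Datagram", False, False),
    # ASSUMPTION: raw datagrams are treated as unreliable and connectionless.
    TransportService("Raw Datagram", False, False),
]


def matching(reliable: bool, connected: bool) -> List[str]:
    """Names of the services that have the requested pair of properties."""
    return [s.name for s in SERVICES
            if s.reliable == reliable and s.connected == connected]


print(matching(reliable=True, connected=True))
print(matching(reliable=False, connected=False))
```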

Scope as a PCI replacement

When IB came into the market it was immediately touted as the PCI replacement. But any technology takes a while to become popular. PCI is an established technology, and many IT professionals are at ease with it. In this scenario, the chances of IB displacing PCI seem very slim. IB is making inroads into the market not as a competitor to PCI but as a complementary technology. In fact, adapters that support both IB and PCI-X are already on the market [2]. A comparative chart is shown in the figure:

PCI, PCI-X, DDR, QDR
    Advantages: Lower cost; simpler for chip to chip
    Disadvantages: Bus contention

InfiniBand
    Advantages: Clustering; scalability; quality of service; security; fault tolerance; multicast; fabric convergence; PCB, copper and fiber
    Disadvantages: Low market acceptance

Fig 9. Table showing comparison between PCI and IB

Source: Introduction to the value proposition of InfiniBand by Marc Staimer (Dragon Slayer Consulting)

Conclusion
The response to IB has been positive. According to analysts, a large percentage of servers will soon be IB enabled. This growth will take place when IB becomes native to the server motherboard. It is predicted that IB will soon be used as a technology for clustering, storage and networking alike. The predictions may be positive, but the IT world is such that what is hot property today may be obsolete tomorrow. What lies in store for InfiniBand is for time to tell.

Glossary
1. AGP - Accelerated Graphics Port
2. BW - Bandwidth
3. CPU - Central Processing Unit
4. CQE - Completion Queue Element
5. DDR - Double Data Rate
6. FC - Fibre Channel
7. GID - Global Identifier
8. GigE - Gigabit Ethernet
9. HCA - Host Channel Adapter
10. IB - InfiniBand
11. IBTA - InfiniBand Trade Association
12. ISA - Industry Standard Architecture
13. LID - Local Identifier
14. PCI - Peripheral Component Interconnect
15. QDR - Quadruple Data Rate
16. QP - Queue Pair
17. RAM - Random Access Memory
18. RDMA - Remote Direct Memory Access
19. SNIA - Storage Networking Industry Association
20. TCA - Target Channel Adapter
21. VI - Virtual Interface
22. VL - Virtual Lane
23. WQE - Work Queue Element

References
1. InfiniBand Architecture Tutorial, Hot Chips, by Daniel Cassiday (InfiniBand Trade Association)
2. Introduction to the value proposition of InfiniBand by Marc Staimer (Dragon Slayer Consulting)
3. An introduction to InfiniBand Architecture by Odysseas Pentakalos (http://www.oreillynet.com/pub/a/network/2002/02/04/windows.html)
4. How PCI works? by Jeff Tyson (http://www.howstuffworks.com/pci.htm)
5. Understanding InfiniBand by Gene Risi & Philip Bender
6. Building Storage Networks, 2nd Edition, by Marc Farley (Storage Networking Industry Association)
