Overview
Goal: Measure how the three fabrics supported by Troy compare to each other on POWER/AIX.
How well does the CF scale going from 1 to 2 ports?
System Configuration
CF + 4 member configuration: five P730-IOC CECs, each with 16 cores and 128 GB of memory.

Networks and adapters:
● DDR IB: Galaxy2 adapter
● QDR IB: Travis-IB QDR IB adapter
● RoCE: Travis-EN RoCE adapter (PRQ)

Only the CF used both ports in the adapters for the 2-port measurements.
Latency (usec)

[Chart: latency in usec by operation type for DDR IB, QDR IB, and RoCE]

Operation type   Galaxy2 DDR IB   QDR IB   RoCE
DIAGPTEST        6.91             9.11     11.08
RARPTEST         18.27            19.36    25.82
RARPTESTND       7.76             9.93     12.02
REGPTEST         23.22            25.65    30.82
SLSPTEST         8.77             10.68    12.95
WARMOPTEST       34.10            34.37    48.90

Findings comparing DDR IB to RoCE ctraces:
● Instruction counts for SLS, RAR, and WARM are higher by 3.99%, 4.80%, and 3.79% respectively.
● All increases in instruction counts are in RoCE-specific code, specifically mxibQpPostSend.
● Of the increased instruction counts, 30% are from the stamp_wqe routine, which places an end-of-list pattern at the end of the WQE list for the adapter read-ahead function. Working with RoCE development on improving this routine.
● RoCE currently uses up to 3 lwsync() calls per mxibQpPostSend call, which may be unnecessary: at the end of the routine the doorbell is rung, which does a full sync() anyway (see the sketch after this list).
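To make the barrier discussion concrete, below is a minimal sketch of a simplified post-send path: build the WQE, stamp the next entry with the end-of-list pattern for the adapter read-ahead, then issue one full sync() before the MMIO doorbell store. The structure, field names, and constants are illustrative assumptions, not the actual mxibQpPostSend or stamp_wqe source; __sync() is the IBM XL C built-in for the PowerPC sync instruction.

    /* Illustrative only: simplified WQE layout and doorbell sequence. */
    #include <stdint.h>

    typedef struct {
        uint64_t ctrl;     /* opcode/flags; all-ones marks end of list */
        uint64_t addr;     /* DMA address of the payload               */
        uint32_t length;   /* payload length in bytes                  */
        uint32_t pad;
    } wqe_t;

    /* Write the end-of-list stamp so the adapter's WQE read-ahead stops
     * at 'next' instead of running past the valid entries. */
    static void stamp_wqe(wqe_t *next)
    {
        next->ctrl = ~0ULL;
    }

    /* Post one WQE and ring the doorbell.  The single full sync() before
     * the MMIO doorbell store orders all of the cacheable WQE stores
     * ahead of it, so lwsync() calls issued earlier purely for that
     * ordering would be redundant. */
    void post_send(wqe_t *wqe, wqe_t *next,
                   volatile uint64_t *doorbell, uint64_t db_value)
    {
        wqe->addr   = 0x1000;      /* example payload DMA address       */
        wqe->length = 256;         /* example payload length            */
        wqe->ctrl   = 0x1;         /* example "valid send" opcode       */

        stamp_wqe(next);           /* terminate the list for read-ahead */

        __sync();                  /* XL C built-in: full PowerPC sync  */
        *doorbell = db_value;      /* MMIO store that notifies adapter  */
    }

Whether the extra lwsync() calls can actually be dropped depends on what other ordering they provide in the real routine, which is why this remains an open item with RoCE development.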
Microb single port operation rates

[Chart: Troy single port small message rates, build 1146A_61ps111; small-message operations per second by operation type]

[Chart: Troy single port data pages per second rates, build 1146A_61ps111; Read and Write page rates with QDR IB and DDR IB link-limit reference lines]

● DDR IB and RoCE write multiple page rates are link limited.
● Read rates for DDR IB and RoCE are lower than writes because of additional per-page overhead.

1 WARMO connection, interoperation gap 175 usec (a paced-loop sketch of this follows).
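For context on what the interoperation gap means here, the sketch below paces a single-connection loop with a fixed 175 usec delay between successive operations. issue_warmo_op() is a hypothetical placeholder, not a real benchmark API.

    /* Illustrative only: paced microbenchmark loop with a 175 usec gap. */
    #include <stdio.h>
    #include <time.h>

    #define GAP_USEC   175        /* delay between successive operations */
    #define ITERATIONS 10000

    static void issue_warmo_op(void)
    {
        /* Placeholder: the real benchmark would post one WARMO operation
         * here and wait for its completion. */
    }

    int main(void)
    {
        struct timespec gap = { 0, GAP_USEC * 1000L };   /* 175 usec in ns */

        for (int i = 0; i < ITERATIONS; i++) {
            issue_warmo_op();
            nanosleep(&gap, NULL);    /* enforce the interoperation gap */
        }
        printf("issued %d operations with a %d usec gap\n",
               ITERATIONS, GAP_USEC);
        return 0;
    }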
Microb peak profile performance latency
Build 1146A_61ps111

[Charts: peak profile operations per second and latency in usec, 1 port vs. 2 ports, for DDR, QDR, and RoCE]

● For 2 ports, RoCE is 46.74% lower than DDR IB: 1202565 vs. 640531 operations per second (worked out below).
● Average latency for RoCE with 1 port is 43.73% higher than DDR IB: 78.51 vs. 53.60 usec.
● For 2 ports, RoCE latency is 78.51% higher than DDR IB: 138.65 vs. 74.28 usec.
● 2 port scaling is only 45% for DDR IB at peak traffic.
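As a quick check on how the rate comparison is expressed, the 46.74% figure follows directly from the two measured rates:

\[
\frac{1202565 - 640531}{1202565} \approx 0.4674 \quad\Rightarrow\quad \text{RoCE is } 46.74\% \text{ lower than DDR IB for 2 ports.}
\]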
Summary
Current assessment:
● Currently, DDR IB using the Galaxy2 adapter on POWER performs best.
● RoCE is the worst performing option due to much higher latency and limited link bandwidth.

Yet to be assessed:
● In-depth analysis of the RoCE driver vs. the GX++ Galaxy driver.
● Scaling up to 4 ports on the CF.
● Comparison to Intel hardware for RoCE.
Backup charts