
Performance Study

Performance Evaluation of VMXNET3 Virtual Network Device


VMware vSphere 4 build 164009

Introduction
With more and more mission-critical, networking-intensive workloads being virtualized and consolidated, virtual network performance has never been more important and relevant than today. VMware has been continuously improving the performance of its virtual network devices. VMXNET Generation 3 (VMXNET3) is the most recent virtual network device from VMware, and was designed from scratch for high performance and to support new features.

This paper compares the networking performance of VMXNET3 to that of enhanced VMXNET2 (the previous generation of high performance virtual network device) on VMware vSphere 4 to help users understand the performance benefits of migrating to this next generation device. We conducted the performance evaluation on an RVI-enabled AMD Shanghai system using the netperf benchmark. Our study covers both Windows Server 2008 and Red Hat Enterprise Linux 5 (RHEL 5) guest operating systems, and results show that overall VMXNET3 is on par with or better than enhanced VMXNET2 for both 1 Gig and 10 Gig workloads. Furthermore, VMXNET3 has laid the groundwork for further improvement by being able to take advantage of new advances in both hardware and software.

The overhead of network virtualization includes the virtualization of the CPU, the MMU (Memory Management Unit), and the I/O devices. Therefore, advances in any of these components will affect the overall networking I/O performance. In this paper, we will focus on device- and driver-specific features.

In the next section, we describe the new features introduced by VMXNET3. We then describe our experimental methodologies, benchmarks, and detailed results in subsequent sections. Both IPv4 and IPv6 TCP performance results are presented. Finally, we conclude by providing a summary of our performance experience with VMXNET3.

Background
Both enhanced VMXNET2 and VMXNET3 devices are virtualization aware (paravirtualized): the device and the driver have been designed with virtualization in mind to minimize I/O virtualization overhead. To further close the performance gap between virtual network device and native device, a number of new enhancements have been introduced with VMXNET3.

With the advent of 10GbE NICs, networking throughput is often limited by the processor speed and its ability to handle high-volume network processing tasks. Multi-core processors offer opportunities for parallel processing that scale to more than one processor. Modern operating systems, including Windows Server 2008 and Linux, have incorporated support to leverage such parallelism by allowing network processing on different cores simultaneously. Receive Side Scaling (RSS) is one such technique, used in Windows Server 2008 and Vista to support parallel packet receive processing. VMXNET3 supports RSS on those operating systems.
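For reference, RSS is enabled explicitly in Windows Server 2008. The commands below are a hedged sketch of how the global TCP/IP setting is typically checked and enabled from an elevated command prompt; per-adapter RSS enablement is done in the adapter's Advanced driver properties, and the exact property name depends on the NIC driver.

    rem Check whether RSS is currently enabled in the TCP/IP stack
    netsh interface tcp show global

    rem Enable RSS globally (per-adapter settings are configured in the
    rem adapter's Advanced properties and vary by driver)
    netsh interface tcp set global rss=enabled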


The VMXNET3 driver is NAPI compliant on Linux guests. NAPI is an interrupt mitigation mechanism that improves high-speed networking performance on Linux by switching back and forth between interrupt mode and polling mode during packet receive. It is a proven technique to improve CPU efficiency and allows the guest to process higher packet loads. VMXNET3 also supports Large Receive Offload (LRO) on Linux guests. However, in ESX 4.0 the VMkernel backend supports large receive packets only if the packets originate from another virtual machine running on the same host.

VMXNET3 supports larger Tx/Rx ring buffer sizes compared to previous generations of virtual network devices. This feature benefits certain network workloads with bursty and high peak throughput. Having a larger ring size provides extra buffering to better cope with transient packet bursts. VMXNET3 supports three interrupt modes: MSI-X, MSI, and INTx. Normally the VMXNET3 guest driver will attempt to use the interrupt modes in the order given above, if the guest kernel supports them. With VMXNET3, TCP Segmentation Offload (TSO) for IPv6 is now supported for both Windows and Linux guests, and TSO support for IPv4 is added for Solaris guests in addition to Windows and Linux guests.

A summary of the new features is listed in Table 1. To use VMXNET3, the user must install VMware Tools on a virtual machine with hardware version 7.

Table 1. Feature comparison table
Features                                                                        Enhanced VMXNET2    VMXNET3
TSO, Jumbo Frames, TCP/IP Checksum Offload                                      Yes                 Yes
MSI/MSI-X support (subject to guest operating system kernel support)            No                  Yes
Receive Side Scaling (RSS, supported in Windows 2008 when explicitly enabled)   No                  Yes
IPv6 TCP Segmentation Offloading (TSO over IPv6)                                No                  Yes
NAPI (supported in Linux)                                                       No                  Yes
LRO (supported in Linux, VM-VM only)                                            No                  Yes
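From inside a Linux guest, most of the features listed above can be inspected with standard tools. The commands below are a sketch; the interface name eth0 is an assumption and should be replaced with the guest's actual VMXNET3 interface.

    # Confirm which driver is bound to the interface (should report vmxnet3)
    ethtool -i eth0

    # List the offload features (TSO, checksum offload, LRO) reported by the driver
    ethtool -k eth0

    # Show the current and maximum supported Tx/Rx ring sizes
    ethtool -g eth0

    # Check whether the device is receiving interrupts via MSI-X vectors
    grep eth0 /proc/interrupts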

Experimental Methodology
This section describes the experimental configuration and the benchmarks used in this study.

Benchmarking Methodology
We used the network benchmarking tool netperf 2.4.3 for IPv4 experiments and 2.4.4 for Windows IPv6 experiments (due to problems with netperf 2.4.3 in supporting IPv6 traffic on Windows), with varying socket buffer sizes and message sizes. Testing different socket buffer sizes and message sizes allows us to understand TCP/IP networking performance under various constraints imposed by different applications. Netperf is widely used for measuring unidirectional TCP and UDP performance. It uses a client-server model and comprises a client program (netperf) acting as a data sender, and a server program (netserver) acting as a receiver.

We focus on TCP performance in this paper, with the relevant performance metrics defined as follows:

- Throughput: This metric was measured using the netperf TCP_STREAM test. It reports the number of bytes transmitted per second (Gbps for 10G tests and Mbps for 1G tests). We ran five parallel netperf sessions and reported the aggregate throughput for 10G tests, and a single netperf session for 1G tests.

- Latency: This metric was measured using the netperf TCP Request-Response (TCP_RR) test. It reports the number of request-response transactions per second, which is inversely proportional to end-to-end latency.

- CPU usage: CPU usage is defined as the CPU resources consumed to run the netperf test on the testing server. It is measured after the test has entered a steady state.


Details of the different TCP tests mentioned above can be found in the netperf documentation.
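As an illustration of how such tests are typically invoked, the commands below sketch one TCP_STREAM and one TCP_RR run. The IP address, test length, and socket buffer/message sizes are assumptions chosen for the example, not the exact parameters used in this study.

    # On the receiver, start the netperf server daemon
    netserver

    # TCP_STREAM throughput test: 57344-byte socket buffers, 8192-byte messages
    netperf -H 192.168.0.2 -t TCP_STREAM -l 60 -- -s 57344 -S 57344 -m 8192

    # TCP_RR latency test: 1-byte request and 1-byte response
    netperf -H 192.168.0.2 -t TCP_RR -l 60 -- -r 1,1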

Experimental Configuration
Our testbed setup consisted of two physical hosts, with the server machine running vSphere 4 and the client machine running RHEL 5.1 natively. The 1GbE and 10GbE physical NICs on both hosts were connected using crossover cables. The testbed setup is illustrated in Figure 1.

Figure 1. Testbed setup

Configuration information is listed in Table 2. All virtual machines used are 64-bit with 1 vCPU. 2GB of RAM is used for both Windows and Linux virtual machines. The default driver configuration is used for VMXNET3 unless mentioned otherwise.

Table 2. Configuration


Server
  CPUs:                     Two Quad-Core AMD Opteron 2384 processors (Shanghai) @ 2.7GHz
  Memory:                   8GB
  Networking:               Intel 82572EI GbE NIC; Intel 82598EB 10GbE AF Dual Port NIC
  Virtualization Software:  VMware vSphere 4 (build 164009); hardware virtualization, RVI enabled

Virtual Machine
  CPUs:                     1 virtual CPU
  Memory:                   2GB
  Operating Systems:        Windows Server 2008 Enterprise; Red Hat Enterprise Linux 5.2

Client
  CPUs:                     Two Quad-Core Intel Xeon X5355
  Memory:                   16GB
  Networking:               Intel 82572EI GbE NIC; Intel 82598EB 10GbE AF Dual Port NIC
  Operating System:         Red Hat Enterprise Linux 5.1 (64-bit)


Windows Server 2008 VM Experiments


In this section, we describe the performance of VMXNET3 as compared to enhanced VMXNET2 on a Windows Server 2008 Enterprise virtual machine, using both 1GbE and 10GbE uplink NICs. All reported data points are the average of three runs.

10G Test Results


Aggregated 10Gbps network load will be common in the near future due to high server consolidation ratios and the high bandwidth requirements of today's applications. Efficiently handling throughput at this rate is very demanding. We used five concurrent netperf sessions for the 1 vCPU virtual machine 10G tests, as sketched below.

The results on the left side of each graph are for the transmit workload, and the results on the right side are for the receive workload. To ensure a fair comparison, CPU usage is normalized by throughput, and the ratio of VMXNET3 to enhanced VMXNET2 is presented. Any data point with a CPU ratio higher than 1 means that VMXNET3 uses more CPU resources to drive the same amount of throughput for the given test, while any data point with a CPU ratio lower than 1 means that VMXNET3 is more efficient in driving the same amount of throughput for the given test. The X axis on the graphs represents the different network configuration parameters (socket buffer size and message size pairs) used for netperf.

Figure 2 illustrates that VMXNET3 achieves higher throughput than enhanced VMXNET2 for the majority of the tests, with the receive workload achieving more significant gains. The most noticeable improvement is for the large socket size/message size receive tests, with an improvement of up to 92%.

Figure 2. Windows 10G throughput results (higher is better)
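A minimal sketch of how five concurrent sessions might be launched from a shell is shown below; the host address, duration, and buffer/message sizes are assumptions, and the aggregate throughput is the sum of the per-session results.

    # Launch five concurrent netperf TCP_STREAM sessions against the receiver
    for i in $(seq 1 5); do
        netperf -H 192.168.0.2 -t TCP_STREAM -l 60 -- -s 65536 -S 65536 -m 16384 &
    done
    wait    # wait for all five sessions to finish before reading the results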

Figure 3 shows the CPU usage ratio of VMXNET3 to enhanced VMXNET2; VMXNET3 is more CPU efficient than enhanced VMXNET2 across the board, which leaves more CPU resources available for other useful work.
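To make the normalization concrete: the ratio for a given test is (CPU used by VMXNET3 / its throughput) divided by (CPU used by enhanced VMXNET2 / its throughput). The numbers below are invented purely to illustrate the arithmetic and are not measured results from this study.

    # Hypothetical example: VMXNET3 drives 9.2 Gbps at 35% CPU,
    # enhanced VMXNET2 drives 8.8 Gbps at 38% CPU
    echo "scale=3; (35 / 9.2) / (38 / 8.8)" | bc
    # Prints roughly 0.88: a ratio below 1 means VMXNET3 needs
    # less CPU per unit of throughput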


Figure 3. Windows 10G CPU usage ratio comparison (lower than 1 means VMXNET3 is better)

1G Test Results
Figure 4 shows the throughput results using a 1GbE uplink NIC. We use one netperf session for this test, as a single session is able to reach line rate for both transmit and receive. Overall, VMXNET3 throughput is on par with enhanced VMXNET2 and is noticeably better for small socket/message size transmit tests. There is one exception, though, for the 8k/512 receive test. However, the gap is quite small and falls within the range of noise.

Figure 4. Windows 1G throughput results (higher is better)

Figure 5 shows that the CPU usage ratio of VMXNET3 to enhanced VMXNET2 is consistently lower than 1, which means that VMXNET3 is more efficient than enhanced VMXNET2 at driving the same amount of throughput, with up to 25% savings.


Figure 5. Windows 1G CPU Usage Ratio Comparison (lower than 1 means VMXNET3 is better)

Latency Results
In addition to throughput, end-to-end latency is another metric used to measure networking performance, and it is particularly important for latency-sensitive applications such as Web servers, database applications, and interactive desktop workloads.

Figure 6 shows the number of request-response transactions per second for a Windows Server 2008 virtual machine for both 10G and 1G traffic. A message size of 1 byte was used for the latency experiments. VMXNET3 is as good as enhanced VMXNET2 using a 10G uplink and achieves a much higher RR rate with a 1G uplink. Because request-response transactions per second are inversely proportional to end-to-end message latency (with one outstanding transaction, a rate of 10,000 transactions per second corresponds to roughly 100 microseconds per round trip), this shows that VMXNET3 achieves lower latency than enhanced VMXNET2 in general.

Figure 6. Windows netperf latency (TCP_RR) results (higher is better)

IPv6 Test Results


With VMXNET3, IPv6 support has been further enhanced with TSO6 (TCP Segmentation Offload over IPv6), which significantly helps to reduce the transmit processing performed by the vCPUs and improves both transmit efficiency and throughput. If the uplink NIC supports TSO6, the segmentation work will be offloaded to the network hardware; otherwise, software segmentation will be conducted inside the VMkernel before passing packets to the uplink. Therefore, TSO6 can be enabled for VMXNET3 whether or not the hardware NIC supports it.


Next, we will look at IPv6 TCP performance inside a Windows Server 2008 virtual machine on a 10G setup. IPv6 performance on a Linux virtual machine shows similar improvements in throughput and CPU efficiency for transmit.

Figure 7 shows that transmit throughput is almost double that of enhanced VMXNET2 for large socket/message tests. Receive throughput shows an improvement similar to that observed with IPv4. Note that the difference between IPv4 and IPv6 transmit throughput for the same test is due to the lack of LRO support for IPv6 packets on the client host (native Linux with the ixgbe driver).

Figure 7. Windows 10G IPv6 throughput results (higher is better)

Figure 8 shows that VMXNET3 is as good as or better than enhanced VMXNET2 in terms of CPU efficiency for IPv6.

Figure 8. Windows 10G IPv6 CPU usage ratio comparison (lower than 1 means VMXNET3 is better)

Red Hat Enterprise Linux Virtual Machine Experiments


In this section, we present the performance of VMXNET3 as compared to enhanced VMXNET2 on a Red Hat Enterprise Linux 5.2 (RHEL 5.2) virtual machine.

10G Test Results


Figures 9 and 10 show the throughput and CPU efficiency, respectively, with a 10G uplink NIC. VMXNET3 is on par with enhanced VMXNET2 for most of the tests and slightly better for most receive tests.


Figure 9. Linux 10G throughput results (higher is better)

Figure 10. Linux 10G CPU Usage Ratio Comparison (lower than 1 means VMXNET3 is better)

1G Test Results
For 1G tests, VMXNET3 is in general as good as enhanced VMXNET2. There are a few exceptions, though, where VMXNET3 spends more CPU resources to drive a similar amount of throughput. This can be explained by the larger ring size used by VMXNET3 by default and the working set of the Linux virtual machine.

Ring buffer size can be tuned to ensure enough queuing at the device level to handle the peak loads generated by the system and the network. With a larger ring size, the device can handle higher peak loads without dropping any packets. However, this also increases the amount of memory used by the device and, as a result, its cache footprint, leaving a smaller cache for other useful data and code and causing a slight performance degradation.

The default receive (Rx) ring size is 150 for enhanced VMXNET2 and 256 for VMXNET3. We used the default ring size for the results presented in both Figures 11 and 12. After reducing the receive ring size to the same value used by enhanced VMXNET2, VMXNET3's receive CPU efficiency is on par with enhanced VMXNET2 again, with no loss in throughput. So, depending on the workload, users may opt to tune the ring size to best fit their requirements. (Users can change the default receive ring size for VMXNET3 from within the Linux guest operating system: ethtool -G eth<x> rx <new_value>, where <x> refers to the Ethernet interface ID in the guest and <new_value> refers to the new value for the Rx ring size.) A sketch of this tuning is shown below.
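For example, checking and then lowering the VMXNET3 Rx ring inside the Linux guest might look like the following. The interface name eth0 is an assumption, and the exact ring values accepted are driver dependent; 150 is used here only to match the enhanced VMXNET2 default discussed above.

    # Show the current and maximum supported ring sizes
    ethtool -g eth0

    # Reduce the Rx ring from the VMXNET3 default of 256 to 150
    ethtool -G eth0 rx 150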


Figure 11. Linux 1G throughput results (higher is better)

Figure 12. Linux 1G CPU usage ratio comparison (lower means VMXNET3 is better)

Latency Results
Figure 13 shows netperf latency results for both the 10G and 1G setups with an RHEL 5.2 virtual machine. VMXNET3 is again as good as enhanced VMXNET2.


Figure 13. Linux netperf latency (TCP_RR) results (higher is better)

Conclusion
VMXNET3, the newest generation of virtual network adapter from VMware, offers performance on par with or better than its previous generations in both Windows and Linux guests. Both the driver and the device have been highly tuned to perform better on modern systems. Furthermore, VMXNET3 introduces new features and enhancements, such as TSO6 and RSS. TSO6 makes it especially useful for users deploying applications that deal with IPv6 traffic, while RSS is helpful for deployments requiring high scalability. All these features give VMXNET3 advantages that are not possible with previous generations of virtual network adapters. Moving forward, to keep pace with an ever-increasing demand for network bandwidth, we recommend customers migrate to VMXNET3 if performance is of top concern to their deployments.

References

The netperf Web site (http://www.netperf.org/netperf)

Receive Side Scaling Enhancements in Windows Server 2008 (http://www.microsoft.com/whdc/device/network/ndis_rss.mspx)

NAPI (http://www.linuxfoundation.org/en/Net:NAPI)

VMware, Inc. white paper. Performance Comparison of Virtual Network Devices (http://www.vmware.com/files/pdf/perf_comparison_virtual_network_devices_wp.pdf)

VMware, Inc. white paper. Performance Evaluation of AMD RVI Hardware Assist (http://www.vmware.com/pdf/RVI_performance.pdf)

