
Performance Study

Performance Evaluation of VMXNET3 Virtual Network Device


VMware vSphere 4 build 164009

Introduction
With more and more mission-critical, networking-intensive workloads being virtualized and consolidated, virtual network performance has never been more important and relevant than today. VMware has been continuously improving the performance of its virtual network devices. VMXNET Generation 3 (VMXNET3) is the most recent virtual network device from VMware, and was designed from scratch for high performance and to support new features.

This paper compares the networking performance of VMXNET3 to that of enhanced VMXNET2 (the previous generation of high performance virtual network device) on VMware vSphere 4 to help users understand the performance benefits of migrating to this next generation device. We conducted the performance evaluation on an RVI-enabled AMD Shanghai system using the netperf benchmark. Our study covers both Windows Server 2008 and Red Hat Enterprise Linux 5 (RHEL 5) guest operating systems, and results show that overall VMXNET3 is on par with or better than enhanced VMXNET2 for both 1 Gig and 10 Gig workloads. Furthermore, VMXNET3 has laid the groundwork for further improvement by being able to take advantage of new advances in both hardware and software.

The overhead of network virtualization includes the virtualization of the CPU, the MMU (Memory Management Unit), and the I/O devices. Therefore, advances in any of these components will affect the overall networking I/O performance. In this paper, we will focus on device- and driver-specific features.

In the next section, we describe the new features introduced by VMXNET3. We then describe our experimental methodologies, benchmarks, and detailed results in subsequent sections. Both IPv4 and IPv6 TCP performance results are presented. Finally, we conclude by providing a summary of our performance experience with VMXNET3.

Background
Both enhanced VMXNET2 and VMXNET3 devices are virtualization aware (paravirtualized): the device and the driver have been designed with virtualization in mind to minimize I/O virtualization overhead. To further close the performance gap between virtual network device and native device, a number of new enhancements have been introduced with VMXNET3.

With the advent of 10GbE NICs, networking throughput is often limited by the processor speed and its ability to handle high-volume network processing tasks. Multi-core processors offer opportunities for parallel processing that scale to more than one processor. Modern operating systems, including Windows Server 2008 and Linux, have incorporated support to leverage such parallelism by allowing network processing on different cores simultaneously. Receive Side Scaling (RSS) is one such technique, used in Windows Server 2008 and Vista to support parallel packet receive processing. VMXNET3 supports RSS on those operating systems.
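For reference, RSS is enabled explicitly in Windows Server 2008. The commands below are a hedged sketch of how the global TCP/IP setting is typically checked and enabled from an elevated command prompt; per-adapter RSS enablement is done in the adapter's Advanced driver properties, and the exact property name depends on the NIC driver.

    rem Check whether RSS is currently enabled in the TCP/IP stack
    netsh interface tcp show global

    rem Enable RSS globally (per-adapter settings are configured in the
    rem adapter's Advanced properties and vary by driver)
    netsh interface tcp set global rss=enabled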


The VMXNET3 driver is NAPI compliant on Linux guests. NAPI is an interrupt mitigation mechanism that improves high-speed networking performance on Linux by switching back and forth between interrupt mode and polling mode during packet receive. It is a proven technique to improve CPU efficiency and allows the guest to process higher packet loads. VMXNET3 also supports Large Receive Offload (LRO) on Linux guests. However, in ESX 4.0 the VMkernel backend supports large receive packets only if the packets originate from another virtual machine running on the same host.

VMXNET3 supports larger Tx/Rx ring buffer sizes compared to previous generations of virtual network devices. This feature benefits certain network workloads with bursty and high peak throughput. Having a larger ring size provides extra buffering to better cope with transient packet bursts. VMXNET3 supports three interrupt modes: MSI-X, MSI, and INTx. Normally the VMXNET3 guest driver will attempt to use the interrupt modes in the order given above, if the guest kernel supports them. With VMXNET3, TCP Segmentation Offload (TSO) for IPv6 is now supported for both Windows and Linux guests, and TSO support for IPv4 is added for Solaris guests in addition to Windows and Linux guests.

A summary of the new features is listed in Table 1. To use VMXNET3, the user must install VMware Tools on a virtual machine with hardware version 7.

Table 1. Feature comparison table
Features                                                                        Enhanced VMXNET2    VMXNET3
TSO, Jumbo Frames, TCP/IP Checksum Offload                                      Yes                 Yes
MSI/MSI-X support (subject to guest operating system kernel support)            No                  Yes
Receive Side Scaling (RSS, supported in Windows 2008 when explicitly enabled)   No                  Yes
IPv6 TCP Segmentation Offloading (TSO over IPv6)                                No                  Yes
NAPI (supported in Linux)                                                       No                  Yes
LRO (supported in Linux, VM-VM only)                                            No                  Yes
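From inside a Linux guest, most of the features listed above can be inspected with standard tools. The commands below are a sketch; the interface name eth0 is an assumption and should be replaced with the guest's actual VMXNET3 interface.

    # Confirm which driver is bound to the interface (should report vmxnet3)
    ethtool -i eth0

    # List the offload features (TSO, checksum offload, LRO) reported by the driver
    ethtool -k eth0

    # Show the current and maximum supported Tx/Rx ring sizes
    ethtool -g eth0

    # Check whether the device is receiving interrupts via MSI-X vectors
    grep eth0 /proc/interrupts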

Experimental Methodology
This section describes the experimental configuration and the benchmarks used in this study.

Benchmarking Methodology
We used the network benchmarking tool netperf 2.4.3 for IPv4 experiments and 2.4.4 for Windows IPv6 experiments (due to problems with netperf 2.4.3 in supporting IPv6 traffic on Windows), with varying socket buffer sizes and message sizes. Testing different socket buffer sizes and message sizes allows us to understand TCP/IP networking performance under various constraints imposed by different applications. Netperf is widely used for measuring unidirectional TCP and UDP performance. It uses a client-server model and comprises a client program (netperf) acting as a data sender, and a server program (netserver) acting as a receiver.

We focus on TCP performance in this paper, with the relevant performance metrics defined as follows:

- Throughput: This metric was measured using the netperf TCP_STREAM test. It reports the number of bytes transmitted per second (Gbps for 10G tests and Mbps for 1G tests). We ran five parallel netperf sessions and reported the aggregate throughput for 10G tests, and a single netperf session for 1G tests.

- Latency: This metric was measured using the netperf TCP Request-Response (TCP_RR) test. It reports the number of request-response transactions per second, which is inversely proportional to end-to-end latency.

- CPU usage: CPU usage is defined as the CPU resources consumed to run the netperf test on the testing server. It is measured after the test has entered a steady state.


Details of the different TCP tests mentioned above can be found in the netperf documentation.
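As an illustration of how such tests are typically invoked, the commands below sketch one TCP_STREAM and one TCP_RR run. The IP address, test length, and socket buffer/message sizes are assumptions chosen for the example, not the exact parameters used in this study.

    # On the receiver, start the netperf server daemon
    netserver

    # TCP_STREAM throughput test: 57344-byte socket buffers, 8192-byte messages
    netperf -H 192.168.0.2 -t TCP_STREAM -l 60 -- -s 57344 -S 57344 -m 8192

    # TCP_RR latency test: 1-byte request and 1-byte response
    netperf -H 192.168.0.2 -t TCP_RR -l 60 -- -r 1,1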

Experimental Configuration
Our testbed setup consisted of two physical hosts, with the server machine running vSphere 4 and the client machine running RHEL 5.1 natively. The 1GbE and 10GbE physical NICs on both hosts were connected using crossover cables. The testbed setup is illustrated in Figure 1.

Figure 1. Testbed setup

Configuration information is listed in Table 2. All virtual machines used are 64-bit with 1 vCPU. 2GB of RAM is used for both Windows and Linux virtual machines. The default driver configuration is used for VMXNET3 unless mentioned otherwise.

Table 2. Configuration


Server
  CPUs:                     Two Quad-Core AMD Opteron 2384 processors (Shanghai) @ 2.7GHz
  Memory:                   8GB
  Networking:               Intel 82572EI GbE NIC; Intel 82598EB 10GbE AF Dual Port NIC
  Virtualization Software:  VMware vSphere 4 (build 164009); hardware virtualization, RVI enabled

Virtual Machine
  CPUs:                     1 virtual CPU
  Memory:                   2GB
  Operating Systems:        Windows Server 2008 Enterprise; Red Hat Enterprise Linux 5.2

Client
  CPUs:                     Two Quad-Core Intel Xeon X5355
  Memory:                   16GB
  Networking:               Intel 82572EI GbE NIC; Intel 82598EB 10GbE AF Dual Port NIC
  Operating System:         Red Hat Enterprise Linux 5.1 (64-bit)


Windows Server 2008 VM Experiments


In this section, we describe the performance of VMXNET3 as compared to enhanced VMXNET2 on a Windows Server 2008 Enterprise virtual machine, using both 1GbE and 10GbE uplink NICs. All reported data points are the average of three runs.

10G Test Results


Aggregated 10Gbps network load will be common in the near future due to high server consolidation ratios and the high bandwidth requirements of today's applications. Efficiently handling throughput at this rate is very demanding. We used five concurrent netperf sessions for the 1 vCPU virtual machine 10G tests, as sketched below.

The results on the left side of each graph are for the transmit workload, and the results on the right side are for the receive workload. To ensure a fair comparison, CPU usage is normalized by throughput, and the ratio of VMXNET3 to enhanced VMXNET2 is presented. Any data point with a CPU ratio higher than 1 means that VMXNET3 uses more CPU resources to drive the same amount of throughput for the given test, while any data point with a CPU ratio lower than 1 means that VMXNET3 is more efficient in driving the same amount of throughput for the given test. The X axis on the graphs represents the different network configuration parameters (socket buffer size and message size pairs) used for netperf.

Figure 2 illustrates that VMXNET3 achieves higher throughput than enhanced VMXNET2 for the majority of the tests, with the receive workload achieving more significant gains. The most noticeable improvement is for the large socket size/message size receive tests, with an improvement of up to 92%.

Figure 2. Windows 10G throughput results (higher is better)
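A minimal sketch of how five concurrent sessions might be launched from a shell is shown below; the host address, duration, and buffer/message sizes are assumptions, and the aggregate throughput is the sum of the per-session results.

    # Launch five concurrent netperf TCP_STREAM sessions against the receiver
    for i in $(seq 1 5); do
        netperf -H 192.168.0.2 -t TCP_STREAM -l 60 -- -s 65536 -S 65536 -m 16384 &
    done
    wait    # wait for all five sessions to finish before reading the results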

Figure 3 shows the CPU usage ratio of VMXNET3 to enhanced VMXNET2; VMXNET3 is more CPU efficient than enhanced VMXNET2 across the board, which leaves more CPU resources available for other useful work.
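To make the normalization concrete: the ratio for a given test is (CPU used by VMXNET3 / its throughput) divided by (CPU used by enhanced VMXNET2 / its throughput). The numbers below are invented purely to illustrate the arithmetic and are not measured results from this study.

    # Hypothetical example: VMXNET3 drives 9.2 Gbps at 35% CPU,
    # enhanced VMXNET2 drives 8.8 Gbps at 38% CPU
    echo "scale=3; (35 / 9.2) / (38 / 8.8)" | bc
    # Prints roughly 0.88: a ratio below 1 means VMXNET3 needs
    # less CPU per unit of throughput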


Figure 3. Windows 10G CPU usage ratio comparison (lower than 1 means VMXNET3 is better)

1G Test Results
Figure 4 shows the throughput results using a 1GbE uplink NIC. We use one netperf session for this test, as a single session is able to reach line rate for both transmit and receive. Overall, VMXNET3 throughput is on par with enhanced VMXNET2 and is noticeably better for small socket/message size transmit tests. There is one exception, though, for the 8k/512 receive test. However, the gap is quite small and falls within the range of noise.

Figure 4. Windows 1G throughput results (higher is better)

Figure 5 shows that the CPU usage ratio of VMXNET3 to enhanced VMXNET2 is consistently lower than 1, which means that VMXNET3 is more efficient than enhanced VMXNET2 at driving the same amount of throughput, with up to 25% savings.


Figure 5. Windows 1G CPU Usage Ratio Comparison (lower than 1 means VMXNET3 is better)

Latency Results
In addition to throughput, end-to-end latency is another metric used to measure networking performance, and it is particularly important for latency-sensitive applications such as Web servers, database applications, and interactive desktop workloads.

Figure 6 shows the number of request-response transactions per second for a Windows Server 2008 virtual machine for both 10G and 1G traffic. A message size of 1 byte was used for the latency experiments. VMXNET3 is as good as enhanced VMXNET2 using a 10G uplink and achieves a much higher RR rate with a 1G uplink. Because request-response transactions per second are inversely proportional to end-to-end message latency (with one outstanding transaction, a rate of 10,000 transactions per second corresponds to roughly 100 microseconds per round trip), this shows that VMXNET3 achieves lower latency than enhanced VMXNET2 in general.

Figure 6. Windows netperf latency (TCP_RR) results (higher is better)

IPv6 Test Results


With VMXNET3, IPv6 support has been further enhanced with TSO6 (TCP Segmentation Offload over IPv6), which significantly helps to reduce the transmit processing performed by the vCPUs and improves both transmit efficiency and throughput. If the uplink NIC supports TSO6, the segmentation work will be offloaded to the network hardware; otherwise, software segmentation will be conducted inside the VMkernel before passing packets to the uplink. Therefore, TSO6 can be enabled for VMXNET3 whether or not the hardware NIC supports it.


Next, we will look at IPv6 TCP performance inside a Windows Server 2008 virtual machine on a 10G setup. IPv6 performance on a Linux virtual machine shows similar improvements in throughput and CPU efficiency for transmit.

Figure 7 shows that transmit throughput is almost double that of enhanced VMXNET2 for large socket/message tests. Receive throughput shows an improvement similar to that observed with IPv4. Note that the difference between IPv4 and IPv6 transmit throughput for the same test is due to the lack of LRO support for IPv6 packets on the client host (native Linux with the ixgbe driver).

Figure 7. Windows 10G IPv6 throughput results (higher is better)

Figure 8 shows that VMXNET3 is as good as or better than enhanced VMXNET2 in terms of CPU efficiency for IPv6.

Figure 8. Windows 10G IPv6 CPU usage ratio comparison (lower than 1 means VMXNET3 is better)

Red Hat Enterprise Linux Virtual Machine Experiments


In this section, we present the performance of VMXNET3 as compared to enhanced VMXNET2 on a Red Hat Enterprise Linux 5.2 (RHEL 5.2) virtual machine.

10G Test Results


Figures 9 and 10 show the throughput and CPU efficiency, respectively, with a 10G uplink NIC. VMXNET3 is on par with enhanced VMXNET2 for most of the tests and slightly better for most receive tests.


Figure 9. Linux 10G throughput results (higher is better)

Figure 10. Linux 10G CPU Usage Ratio Comparison (lower than 1 means VMXNET3 is better)

1G Test Results
For 1G tests, VMXNET3 is in general as good as enhanced VMXNET2. There are a few exceptions, though, where VMXNET3 spends more CPU resources to drive a similar amount of throughput. This can be explained by the larger ring size used by VMXNET3 by default and the working set of the Linux virtual machine.

Ring buffer size can be tuned to ensure enough queuing at the device level to handle the peak loads generated by the system and the network. With a larger ring size, the device can handle higher peak loads without dropping any packets. However, this also increases the amount of memory used by the device and, as a result, its cache footprint, leaving a smaller cache for other useful data and code and causing a slight performance degradation.

The default receive (Rx) ring size is 150 for enhanced VMXNET2 and 256 for VMXNET3. We used the default ring size for the results presented in both Figures 11 and 12. After reducing the receive ring size to the same value used by enhanced VMXNET2, VMXNET3's receive CPU efficiency is on par with enhanced VMXNET2 again, with no loss in throughput. So, depending on the workload, users may opt to tune the ring size to best fit their requirements. (Users can change the default receive ring size for VMXNET3 from within the Linux guest operating system: ethtool -G eth<x> rx <new_value>, where <x> refers to the Ethernet interface ID in the guest and <new_value> refers to the new value for the Rx ring size.) A sketch of this tuning is shown below.
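For example, checking and then lowering the VMXNET3 Rx ring inside the Linux guest might look like the following. The interface name eth0 is an assumption, and the exact ring values accepted are driver dependent; 150 is used here only to match the enhanced VMXNET2 default discussed above.

    # Show the current and maximum supported ring sizes
    ethtool -g eth0

    # Reduce the Rx ring from the VMXNET3 default of 256 to 150
    ethtool -G eth0 rx 150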


Figure 11. Linux 1G throughput results (higher is better)

Figure 12. Linux 1G CPU usage ratio comparison (lower means VMXNET3 is better)

Latency Results
Figure 13 shows netperf latency results for both the 10G and 1G setups with an RHEL 5.2 virtual machine. VMXNET3 is again as good as enhanced VMXNET2.


Figure 13. Linux netperf latency (TCP_RR) results (higher is better)

Conclusion
VMXNET3, the newest generation of virtual network adapter from VMware, offers performance on par with or better than its previous generations in both Windows and Linux guests. Both the driver and the device have been highly tuned to perform better on modern systems. Furthermore, VMXNET3 introduces new features and enhancements, such as TSO6 and RSS. TSO6 makes it especially useful for users deploying applications that deal with IPv6 traffic, while RSS is helpful for deployments requiring high scalability. All these features give VMXNET3 advantages that are not possible with previous generations of virtual network adapters. Moving forward, to keep pace with an ever-increasing demand for network bandwidth, we recommend customers migrate to VMXNET3 if performance is of top concern to their deployments.

References

The netperf Web site (http://www.netperf.org/netperf)

Receive Side Scaling Enhancements in Windows Server 2008 (http://www.microsoft.com/whdc/device/network/ndis_rss.mspx)

NAPI (http://www.linuxfoundation.org/en/Net:NAPI)

VMware, Inc. white paper. Performance Comparison of Virtual Network Devices (http://www.vmware.com/files/pdf/perf_comparison_virtual_network_devices_wp.pdf)

VMware, Inc. white paper. Performance Evaluation of AMD RVI Hardware Assist (http://www.vmware.com/pdf/RVI_performance.pdf)

