Introduction
With more and more mission-critical, networking-intensive workloads being virtualized and consolidated, virtual network performance has never been more important and relevant than today. VMware has been continuously improving the performance of its virtual network devices. VMXNET Generation 3 (VMXNET3) is the most recent virtual network device from VMware, and was designed from scratch for high performance and to support new features.

This paper compares the networking performance of VMXNET3 to that of enhanced VMXNET2 (the previous generation of high-performance virtual network device) on VMware vSphere 4 to help users understand the performance benefits of migrating to this next-generation device. We conducted the performance evaluation on an RVI-enabled AMD Shanghai system using the netperf benchmark. Our study covers both Windows Server 2008 and Red Hat Enterprise Linux 5 (RHEL 5) guest operating systems, and the results show that overall VMXNET3 is on par with or better than enhanced VMXNET2 for both 1 Gig and 10 Gig workloads. Furthermore, VMXNET3 has laid the groundwork for further improvement by being able to take advantage of new advances in both hardware and software.

The overhead of network virtualization includes the virtualization of the CPU, the MMU (Memory Management Unit), and the I/O devices. Therefore, advances in any of these components will affect the overall networking I/O performance. In this paper, we will focus on device- and driver-specific features.

In the next section, we describe the new features introduced by VMXNET3. We then describe our experimental methodologies, benchmarks, and detailed results in subsequent sections. Both IPv4 and IPv6 TCP performance results are presented. Finally, we conclude by providing a summary of our performance experience with VMXNET3.
Background
Both enhanced VMXNET2 and VMXNET3 devices are virtualization aware (paravirtualized): the device and the driver have been designed with virtualization in mind to minimize I/O virtualization overhead. To further close the performance gap between virtual network devices and native devices, a number of new enhancements have been introduced with VMXNET3.

With the advent of 10GbE NICs, networking throughput is often limited by the processor speed and its ability to handle high-volume network processing tasks. Multicore processors offer opportunities for parallel processing that scale to more than one processor. Modern operating systems, including Windows Server 2008 and Linux, have incorporated support to leverage such parallelism by allowing network processing on different cores simultaneously. Receive Side Scaling (RSS) is one such technique used in Windows Server 2008 and Vista to support parallel packet receive processing. VMXNET3 supports RSS on those operating systems.
VMware, Inc.
The VMXNET3 driver is NAPI compliant on Linux guests. NAPI is an interrupt mitigation mechanism that improves high-speed networking performance on Linux by switching back and forth between interrupt mode and polling mode during packet receive. It is a proven technique to improve CPU efficiency and allows the guest to process higher packet loads. VMXNET3 also supports Large Receive Offload (LRO) on Linux guests. However, in ESX 4.0 the VMkernel backend supports large receive packets only if the packets originate from another virtual machine running on the same host.

VMXNET3 supports larger Tx/Rx ring buffer sizes compared to previous generations of virtual network devices. This feature benefits certain network workloads with bursty and high peak throughput. Having a larger ring size provides extra buffering to better cope with transient packet bursts. VMXNET3 supports three interrupt modes: MSI-X, MSI, and INTx. Normally the VMXNET3 guest driver will attempt to use the interrupt modes in the order given above, if the guest kernel supports them. With VMXNET3, TCP Segmentation Offload (TSO) for IPv6 is now supported for both Windows and Linux guests, and TSO support for IPv4 is added for Solaris guests in addition to Windows and Linux guests.

A summary of the new features is listed in Table 1. To use VMXNET3, the user must install VMware Tools on a virtual machine with hardware version 7.

Table 1. Feature comparison table
Features                                                                      | Enhanced VMXNET2 | VMXNET3
------------------------------------------------------------------------------|------------------|--------
TSO, Jumbo Frames, TCP/IP Checksum Offload                                    | Yes              | Yes
MSI/MSI-X support (subject to guest operating system kernel support)          | No               | Yes
Receive Side Scaling (RSS, supported in Windows 2008 when explicitly enabled) | No               | Yes
IPv6 TCP Segmentation Offloading (TSO over IPv6)                              | No               | Yes
NAPI (supported in Linux)                                                     | No               | Yes
LRO (supported in Linux, VM-VM only)                                          | No               | Yes
Experimental Methodology
This section describes the experimental configuration and the benchmarks used in this study.
Benchmarking Methodology
We used the network benchmarking tool netperf 2.4.3 for IPv4 experiments and 2.4.4 for Windows IPv6 experiments (due to problems with netperf 2.4.3 in supporting IPv6 traffic on Windows), with varying socket buffer sizes and message sizes. Testing different socket buffer sizes and message sizes allows us to understand TCP/IP networking performance under various constraints imposed by different applications. Netperf is widely used for measuring unidirectional TCP and UDP performance. It uses a client-server model and comprises a client program (netperf) acting as a data sender, and a server program (netserver) acting as a receiver.

We focus on TCP performance in this paper, with the relevant performance metrics being throughput, CPU usage ratio, and end-to-end latency (measured as the request/response rate of the TCP_RR test).
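To make the client-server methodology concrete, the following is a minimal, illustrative sketch of a netperf-style unidirectional TCP throughput test. The function names, defaults, and loopback addresses are our own for illustration and are not part of netperf itself; the configurable socket buffer and message sizes mirror the parameters varied in this study.

```python
# Minimal sketch of a netperf-style unidirectional TCP throughput test.
# Illustrative only: names and defaults are ours, not netperf's.
import socket
import threading
import time

def run_receiver(sock_buf: int, ready: threading.Event, port_box: list) -> None:
    """Accept one connection and drain data until the sender closes (netserver role)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, sock_buf)
    srv.bind(("127.0.0.1", 0))      # ephemeral port
    srv.listen(1)
    port_box.append(srv.getsockname()[1])
    ready.set()
    conn, _ = srv.accept()
    while conn.recv(65536):         # drain until the peer closes
        pass
    conn.close()
    srv.close()

def measure_throughput(total_bytes: int, msg_size: int, sock_buf: int) -> float:
    """Send total_bytes in msg_size chunks; return throughput in Mbit/s."""
    ready, port_box = threading.Event(), []
    t = threading.Thread(target=run_receiver, args=(sock_buf, ready, port_box))
    t.start()
    ready.wait()
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sock_buf)
    cli.connect(("127.0.0.1", port_box[0]))
    payload = b"x" * msg_size
    start = time.monotonic()
    sent = 0
    while sent < total_bytes:
        cli.sendall(payload)
        sent += msg_size
    cli.close()                     # lets the receiver drain and exit
    t.join()
    elapsed = time.monotonic() - start
    return sent * 8 / elapsed / 1e6

if __name__ == "__main__":
    mbps = measure_throughput(total_bytes=16 * 1024 * 1024,
                              msg_size=8192, sock_buf=65536)
    print(f"{mbps:.1f} Mbit/s over loopback")
```

Varying `msg_size` and `sock_buf` here corresponds to the socket/message size combinations (for example, 8k/512) reported in the result figures.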
Details of the different TCP tests mentioned above can be found in the netperf documentation.
Experimental Configuration
Our testbed setup consisted of two physical hosts, with the server machine running vSphere 4 and the client machine running RHEL 5.1 natively. The 1GbE and 10GbE physical NICs on both hosts were connected using crossover cables. The testbed setup is illustrated in Figure 1.

Figure 1. Testbed setup
Figure 3. Windows 10G CPU usage ratio comparison (lower than 1 means VMXNET3 is better)
1G Test Results
Figure 4 shows the throughput results using a 1GbE uplink NIC. We use one netperf session for this test, as a single session will be able to reach line rate for both transmit and receive. Overall, VMXNET3 throughput is on par with enhanced VMXNET2 and is noticeably better for small socket/message size transmit tests. There is one exception, though, for the 8k/512 receive test. However, the gap is quite small and falls into the range of noise.

Figure 4. Windows 1G throughput results (higher is better)
Figure 5. Windows 1G CPU usage ratio comparison (lower than 1 means VMXNET3 is better)
Latency Results
In addition to throughput, end-to-end latency is another metric to measure networking performance, which is particularly important for latency-sensitive applications such as Web servers, database applications, and interactive desktop workloads.

Figure 6 shows the number of request-responses per second for a Windows Server 2008 virtual machine for both 10G and 1G traffic. A message size of 1 byte was used for the latency experiments. VMXNET3 is as good as enhanced VMXNET2 using a 10G uplink and achieves a much higher RR rate with a 1G uplink. As requests/responses per second are inversely proportional to end-to-end message latency, this shows that VMXNET3 achieves a lower latency than enhanced VMXNET2 in general.

Figure 6. Windows netperf latency (TCP_RR) results (higher is better)
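The TCP_RR measurement described above can be sketched as a simple ping-pong over a single connection: a 1-byte request earns a 1-byte response, and the transaction rate is the inverse of the round-trip latency. The sketch below is illustrative (the helper names and loopback setup are ours, not netperf's); `TCP_NODELAY` is set so Nagle's algorithm does not batch the tiny messages.

```python
# Minimal sketch of a netperf TCP_RR-style latency test (1-byte request,
# 1-byte response).  Illustrative only; not netperf's implementation.
import socket
import threading
import time

def echo_server(ready: threading.Event, port_box: list, n_txns: int) -> None:
    """Echo one byte back per request (the netserver role)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port_box.append(srv.getsockname()[1])
    ready.set()
    conn, _ = srv.accept()
    for _ in range(n_txns):
        req = conn.recv(1)
        conn.sendall(req)
    conn.close()
    srv.close()

def measure_rr_rate(n_txns: int) -> float:
    """Run n_txns 1-byte request/response transactions; return transactions/sec."""
    ready, port_box = threading.Event(), []
    t = threading.Thread(target=echo_server, args=(ready, port_box, n_txns))
    t.start()
    ready.wait()
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # no batching
    cli.connect(("127.0.0.1", port_box[0]))
    start = time.monotonic()
    for _ in range(n_txns):
        cli.sendall(b"r")    # 1-byte request
        cli.recv(1)          # 1-byte response
    elapsed = time.monotonic() - start
    cli.close()
    t.join()
    return n_txns / elapsed

if __name__ == "__main__":
    print(f"{measure_rr_rate(2000):.0f} transactions/sec over loopback")
```

A higher transaction rate means lower end-to-end latency, which is why the TCP_RR figures are labeled "higher is better."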
Next, we look at IPv6 TCP performance inside a Windows Server 2008 virtual machine on a 10G setup. IPv6 performance on a Linux virtual machine shows a similar improvement in throughput and CPU efficiency for transmit.

Figure 7 shows that transmit throughput almost doubles that of enhanced VMXNET2 for large socket/message tests. Receive throughput shows an improvement similar to that observed with IPv4. Note that the difference between IPv4 and IPv6 transmit throughput for the same test is due to the lack of LRO support for IPv6 packets on the client host (native Linux with the ixgbe driver).

Figure 7. Windows 10G IPv6 throughput results (higher is better)
Figure 8 shows that VMXNET3 is as good as or better than enhanced VMXNET2 in terms of CPU efficiency for IPv6.

Figure 8. Windows 10G IPv6 CPU usage ratio comparison (lower than 1 means VMXNET3 is better)
Figure 10. Linux 10G CPU usage ratio comparison (lower than 1 means VMXNET3 is better)
1G Test Results
For 1G tests, VMXNET3 is in general as good as enhanced VMXNET2. There are a few exceptions, though, where VMXNET3 spends more CPU resources to drive a similar amount of throughput. This can be explained by the larger ring size used by VMXNET3 by default and the working set of the Linux virtual machine.

Ring buffer size can be tuned to ensure enough queuing at the device level to handle the peak loads generated by the system and the network. With a large ring size, the device can handle higher peak loads without dropping any packets. However, this also increases the memory used by the device and, as a result, its cache footprint, leaving a smaller cache for other useful data and code and causing a slight performance degradation.

The default receive (Rx) ring size is 150 for enhanced VMXNET2 and 256 for VMXNET3. We used the default ring size for the results presented in both Figures 11 and 12. After reducing the receive ring size to the same value used by enhanced VMXNET2, VMXNET3's receive CPU efficiency is on par with enhanced VMXNET2 again, with no loss in throughput. So, depending on the workload, users may opt to tune the ring size to best fit their requirements. (Users can change the default receive ring size for VMXNET3 from within the Linux guest operating system: ethtool -G eth<x> rx <new_value>, where <x> refers to the Ethernet interface ID in the guest, and <new_value> refers to the new value for the Rx ring size.)
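The ring-size tuning above can be sketched as the following shell session inside the Linux guest. This is an illustrative config fragment: eth0 stands in for the guest's actual VMXNET3 interface, and the commands must be run as root.

```sh
# Illustrative only: eth0 is an assumed interface name for the VMXNET3 NIC.

# Show the current and maximum Rx/Tx ring sizes for the device
ethtool -g eth0

# Match the enhanced VMXNET2 default of 150 Rx descriptors
ethtool -G eth0 rx 150

# Restore the VMXNET3 default of 256 Rx descriptors
ethtool -G eth0 rx 256
```

Checking with ethtool -g before and after the change confirms the new ring size took effect; the setting does not persist across reboots unless placed in the distribution's network configuration.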
Figure 12. Linux 1G CPU usage ratio comparison (lower means VMXNET3 is better)
Latency Results
Figure 13 shows netperf latency results for both 10G and 1G setups with an RHEL 5.2 virtual machine. VMXNET3 is again as good as enhanced VMXNET2.
Conclusion
VMXNET3, the newest generation of virtual network adapter from VMware, offers performance on par with or better than its previous generations in both Windows and Linux guests. Both the driver and the device have been highly tuned to perform better on modern systems. Furthermore, VMXNET3 introduces new features and enhancements, such as TSO6 and RSS. TSO6 makes it especially useful for users deploying applications that deal with IPv6 traffic, while RSS is helpful for deployments requiring high scalability. All these features give VMXNET3 advantages that are not possible with previous generations of virtual network adapters. Moving forward, to keep pace with an ever-increasing demand for network bandwidth, we recommend customers migrate to VMXNET3 if performance is of top concern to their deployments.
All information in this paper regarding future directions and intent is subject to change or withdrawal without notice and should not be relied on in making a purchasing decision concerning VMware products. The information in this paper is not a legal obligation for VMware to deliver any material, code, or functionality. The release and timing of VMware products remains at VMware's sole discretion.

VMware, Inc. 3401 Hillview Ave., Palo Alto, CA 94304
www.vmware.com

Copyright 2009 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. VMware products are covered by one or more patents listed at http://www.vmware.com/go/patents. VMware is a registered trademark or trademark of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies. Item: EN-000295-00

If you have comments about this documentation, submit your feedback to: docfeedback@vmware.com