(19) United States
(12) Patent Application Publication    (10) Pub. No.: US 2009/0285219 A1
     ROMRELL et al.                    (43) Pub. Date: Nov. 19, 2009

(54) DEFICIT AND GROUP ROUND ROBIN SCHEDULING FOR EFFICIENT NETWORK TRAFFIC MANAGEMENT

(75) Inventors: DAVID ROMRELL, Hillsboro, OR (US); Christopher Charles Peek, Beaverton, OR (US)

Correspondence Address: PATENTRY, P.O. BOX 151616, SAN RAFAEL, CA 94915-1616 (US)

(73) Assignee: Barracuda Networks, Inc., Campbell, CA (US)

(21) Appl. No.: 12/466,387

(22) Filed: May 15, 2009

Related U.S. Application Data

(62) Division of application No. 11/404,049, filed on Apr. 13, 2006.

Publication Classification

(51) Int. Cl. H04L 12/56 (2006.01)
(52) U.S. Cl. 370/398

(57) ABSTRACT

Data traffic is scheduled by, in a first scheduler, selecting a source of traffic from a plurality of sources of traffic, each source being associated with a second scheduler; in a second scheduler associated with the selected source of traffic, selecting a type of traffic from a plurality of types of traffic within the source selected by the first scheduler; and transmitting data of the selected type and source. Scheduling data traffic apparatus and method using deficit and group ratio round robin budgeting.

[Drawing sheets 1 through 5 of 7: FIGS. 1 and 2, network block diagrams; FIG. 3, scheduling algorithm, including "Sort into Weight Groups" and "Compute Ratios" steps; FIGS. 4 and 5, scheduler block diagrams; reference numerals only otherwise.]
[Drawing sheets 6 and 7 of 7: FIG. 6, block diagram including a flow classifier, class and link schedulers, tunnel scheduler, and tunnel queues; FIG. 7, block diagram of tunnel processing (receive packet, remove header, process for tunnel, generate tunnel header, add header).]

DEFICIT AND GROUP ROUND ROBIN SCHEDULING FOR EFFICIENT NETWORK TRAFFIC MANAGEMENT

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This is a division of application Ser. No. 11/404,049, filed 13 Apr. 2006, now Pat. No. ______, issued.

BACKGROUND

[0002] In operating a network, it is sometimes necessary to control the flow of data from one point to another. This is especially true in complex network topologies, such as a tiered structure as shown in FIG. 1, with a central site 102d and several layers of sub-networks 102a, b, 112a, b, each going through one or more links to reach the central site 102d. Previous systems for managing network traffic have relied on class based queuing (CBQ) or other scheduling systems to implement link level scheduling, that is, scheduling which of several links can send network traffic over an uplink to another tier of the network. Other systems have used data compression, requiring modifications to the systems at either end of a compressed link.
Issues in scheduling network traffic include link oversubscription, where the various links into a node have a higher total traffic than the link out of the node to other parts of the network; guaranteeing bandwidth amounts to various links and various classes of data traffic; and compensating for the effects of compression on allocation of bandwidth.

SUMMARY

[0003] In general, in one aspect, data traffic is scheduled by, in a first scheduler, selecting a source of traffic from a plurality of sources of traffic, each source being associated with a second scheduler; in a second scheduler associated with the selected source of traffic, selecting a type of traffic from a plurality of types of traffic within the source selected by the first scheduler; and transmitting data of the selected type and source.

[0004] Implementations include one or more of the following. Repeating the selecting and transmitting. The traffic is traffic for passing over a communications link. The selecting includes scheduling the selection of sources and types according to characteristics of the communications link. Selecting a source of traffic includes selecting a source from which packets should be delivered according to a rule. Delivering packets according to the rule includes one or more of guaranteeing a minimum bandwidth for a source of the plurality of sources, guaranteeing a maximum burst limit for a source of the plurality of sources, and guaranteeing a service interval to a source of the plurality of sources. Choosing a source of traffic includes allowing a user to configure a preemptive priority for a type of traffic. In the first scheduler, accounting for bandwidth used by each source of traffic. Selecting a type of traffic includes selecting a type from which packets should be delivered according to a rule.
Delivering packets according to the rule includes one or more of guaranteeing a minimum bandwidth to a type, within an amount of bandwidth allocated by the first scheduler; guaranteeing a maximum burst limit to a type, within a burst limit allocated by the first scheduler; and guaranteeing a service interval to a type. The types of traffic include overlapping classifications of traffic. Before the selecting, filtering the traffic based on routes the traffic will use. The filtering includes applying a radix tree algorithm. Determining that a packet from the selected type is to be transmitted through a tunnel, and selecting a type includes charging the type for bandwidth usage based on an average efficiency of the tunnel.

[0005] In general, in one aspect, data traffic is scheduled by selecting a source of traffic from a plurality of sources of traffic using a group ratio round robin scheduling algorithm.

[0006] Implementations may include one or more of the following features. Using a group ratio round robin scheduling algorithm includes defining an ordered set of groups of sources of traffic having similar weights; computing ratios between total weights of the groups; and repeatedly, choosing one of the groups, within the chosen group, using a second algorithm to choose a source of traffic, and transmitting an amount of traffic from the chosen source.
The second algorithm is a deficit round robin scheduling algorithm. Computing a deficit credit and a quantum credit for each group based on the ratios, and after the transmitting, updating a deficit counter and a quantum counter for the chosen group based on the amount of traffic transmitted and the credits. Choosing one of the groups by: if the deficit counter and the quantum counter of the last-chosen group are above zero, choosing the last-chosen group; if the deficit counter of the last-chosen group is at or below zero, adding the deficit credit to the deficit counter, adding a quantum credit to the quantum counter, and choosing the next group of the ordered set of groups; and if the deficit counter of the last-chosen group is above zero and the quantum counter is at or below zero, adding a quantum credit to the quantum counter for that group, and choosing the first group in the ordered set of groups.

[0007] Implementations may include a centralized server performing the identifying and recording.

[0008] Advantages include the following. Bandwidth can be guaranteed to each branch in an oversubscribed network with thousands of links. The bandwidth that a particular application or class of applications uses can be controlled to be within a specified range.

[0009] The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.

DESCRIPTION OF DRAWINGS

[0010] FIGS. 1 and 2 are block diagrams of a network.

[0011] FIG. 3 is a block diagram of a scheduling algorithm.

[0012] FIGS. 4, 5, and 6 are block diagrams of schedulers.

DETAILED DESCRIPTION

[0013] In a central site network, such as that shown in FIG. 1, multiple remote sites 102a, b and a central site 102d each have a single connection 114a, b, d, referred to as a link, through a network 104, such as the Internet or a private IP network.
Each site has network hardware 108a, b, d, which facilitates connections between devices 110 and the network links 114a, b, d, respectively. The remote sites 102a, b may also have links 116a, b to additional remote sites 112a, b, connected through another network 104. In such a case, the link to the local network hardware is shown as another link 116c, sharing the link 114b back to the central site 102d with the other remote links 116a, b. Connections between endpoints on the network are referred to as links, which may differ from actual network connections. Link 114d connecting the central site to the network may be a larger capacity link than the remote site links 114a, b which feed in to it, or it may be the same or even smaller capacity. Similarly, link 114b could have a higher or lower capacity than the sum of remote links 116a-c.

[0014] Another depiction of a network is shown in FIG. 2. Viewed this way, central site link 114d is at the top of the hierarchy. The two remote site links 114a, b are represented by the first-level rectangular boxes, while local systems 110 at each remote site are represented by rounded boxes. Second-level links 116a, b to the more remote sites 112a, b are connected through remote site link 114b. Classes of data traffic originating from the various systems 110 are represented by ovals, e.g., classes 212 for VoIP traffic, 214 for Citrix traffic, and 216 for all other network traffic. Classes are sometimes shown directly feeding into a link, rather than coming through a system 110, e.g., classes 212a, 214a, 216a connected to link 114a. At each level of the hierarchy, a link that represents several links at the next level down is referred to as a link group.
For example, the link 114b is a link group that carries traffic from the links 116a and 116b from the remote sites 112a and 112b to the central site 102d via link 114d, as well as traffic on link 116c from the system 110 local to site 102b.

[0015] Each link may have a minimum guaranteed bandwidth, that is, the network is configured to assure that the capacity on link 114d associated with traffic for other links 114a, b, 116a, b, c is allocated at least to a minimum configured rate for that link. Links may also be configured with an allowable burst limit, that is, a maximum rate of traffic that the link can generate at any one time. Link oversubscription occurs when the total bandwidth available for or used by a set of links into a system or site exceeds the bandwidth available on that site's link to the next level of the network hierarchy. For example, if each of links 116a, c could allow 1 Mb/s, but the outgoing link 114b could only provide 1.5 Mb/s, the link 114b would be oversubscribed. With inadequate scheduling, one link may use too great a portion of the available uplink bandwidth, preventing another link from achieving its guaranteed minimum rate. Conversely, if the upstream link has a larger capacity than all the downstream links, e.g., if link 114d had a capacity of 10 Mb/s in the previous example, it could carry too much traffic and overwhelm the downstream links 114a, b to the remote sites 102a, b. The same problems are present in routing traffic on remote site link 114b to and from second-level links 116a, b. A link scheduler manages the traffic over each link to prevent oversubscription or overflowing of links. Such a scheduler determines which downstream link's traffic shall be carried by the upstream link (in either direction) at any particular time according to a link scheduling algorithm. A single central site scheduler, e.g., at device 108d, may operate at the top level of the network, modeling bottlenecks at all levels of the network to assure that link scheduling at each level is compatible with each other level.
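The oversubscription condition just described reduces to a simple comparison; a minimal sketch (the function name and the use of bit-per-second figures are illustrative, not from the patent):

```python
def is_oversubscribed(uplink_capacity_bps, downstream_capacities_bps):
    """A set of links oversubscribes its uplink when the links feeding in
    can jointly offer more traffic than the uplink can carry onward."""
    return sum(downstream_capacities_bps) > uplink_capacity_bps

# The example from the text: two 1 Mb/s remote links feeding a 1.5 Mb/s uplink.
print(is_oversubscribed(1_500_000, [1_000_000, 1_000_000]))  # True
```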
For example, a central site scheduler will not send more traffic over link 114d that is ultimately destined for links 116a and 116b than those links can handle, even if intermediate link 114b could handle that much incoming traffic.

[0016] In addition to actual connections between devices, different classes of network traffic may have different guaranteed minimum rates and burst limits. For example, VoIP traffic 212 may have a higher minimum and a lower burst rate than Citrix traffic 214 or regular network traffic 216. For traffic within a link, a class scheduler determines which data packets to transmit, based on their class and a class scheduling algorithm. A single scheduler or set of schedulers could be implemented at a high level of the hierarchy, and the scheduling determinations cascaded down to the classes of traffic at each remote site. As with link scheduling, class schedulers operate on traffic flowing in either direction. In some examples, certain classes may have preemptive priority, in which case they not only take priority within their link, but the link itself is temporarily given priority over other links to assure packets for that class are quickly transmitted. As the preemptive class and link are satisfied, the scheduler updates normal usage counters for the class and link scheduling algorithms.

[0017] In some examples, shown in FIG. 3, a link scheduler uses a group ratio round robin (GRRR) algorithm to determine what order to schedule the links in. Link group 310 represents lower-level links 308, each with a different weight based on a guaranteed or actual rate. The GRRR algorithm uses "weight groups" 302, 304, 306 to group together links 308a-f that have similar weights; a list is maintained of a small number of groups 302, 304, 306 of links 308a-f having similar weights.
For example, the link 308a is in a group 302 of its own because it has a weight of nine. The links 308b-d are in a second group 304 because they have the same weight, two. The links 308e-f likewise form a group 306 of links with weight one. The groups 302, 304, 306 then have total weights of 9, 6, and 2, for relative ratios of 1.5 (9:6), 3 (6:2), and 1, respectively.

[0018] Groups are selected to transmit packets based on the ratio of bandwidth needed by one group to the bandwidth needed by each other group. Each group transmits an amount of data determined by the algorithm in turn. Within each group, individual links are selected to transmit packets based on the deficit round robin (DRR) algorithm, in which individual links are selected based on the ratio of their traffic volume to that of the other links in the group.

[0019] The GRRR algorithm, as adapted to link scheduling, proceeds as follows, within the example of FIG. 3. A simple weighted round robin algorithm would schedule links 308a-f as AAAAAAAAABBCCDDEF (where letters A-F correspond to links 308a-f). While this provides overall fair bandwidth sharing, it does not provide optimal service latency. Traffic respective to link 308a will get bursts of congestion that cause queuing and possibly sustained packet loss. The other links get no traffic at all while they wait for 308a to exhaust its weighted portion of link group 310.

[0020] The GRRR approach resolves this by spreading weights over an entire service frame. In the example above it will schedule the series AABAACAADEAABAACAADF. To achieve this, each link or link group is sorted into weight groups 302, 304, 306 with other peers that have weights within the same factor of 2 (e.g., rates between 2^k and 2^(k+1)-1). The weight groups are then sorted based on their total weight (i.e., the sum of the weights of the included links and link groups).
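The bucketing and sorting just described can be sketched as follows, assuming links are given as a mapping from name to weight (the function and variable names are hypothetical, not from the patent):

```python
import math

def build_weight_groups(weights):
    """Bucket links into weight groups by power-of-2 weight ranges, then
    sort the groups by total weight, as in the FIG. 3 example."""
    buckets = {}
    for link, w in weights.items():
        k = int(math.floor(math.log2(w)))  # group holds weights in [2^k, 2^(k+1)-1]
        buckets.setdefault(k, []).append(link)
    groups = sorted(buckets.values(),
                    key=lambda links: sum(weights[l] for l in links),
                    reverse=True)
    totals = [sum(weights[l] for l in g) for g in groups]
    # Ratio of each group's total weight to the next group's total weight;
    # the last group's ratio is 1.
    ratios = [totals[i] / totals[i + 1] for i in range(len(totals) - 1)] + [1.0]
    return groups, totals, ratios
```

For the FIG. 3 example (A with weight 9; B, C, D with weight 2; E, F with weight 1) this yields group totals 9, 6, and 2 and ratios 1.5, 3, and 1, matching the text.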
Then the ratio of the weight from one group to the next is calculated, and a credit is assigned based on the ratios.

[0021] A pointer is kept on the current weight group being serviced. Credits and counters are used to maintain the ratios between the weight groups. A deficit counter is used to determine when to move to the next weight group. A quantum counter is used to determine when the current weight group ratio is satisfied and to move the process back to the start. On the next invocation of the scheduler, that weight group is serviced and the counters are decreased by the amount of data sent. In some examples, the quantum is defined as a power of 2 (e.g., 1024 bytes) to simplify ratios by using a shift operation.

[0022] The scheduler moves the pointer between weight groups using the following ordered rules after servicing the current weight group:

[0023] 1. If the deficit and quantum counters are both above zero, then the pointer stays with the current weight group.

[0024] 2. If the deficit is at or below zero, then the deficit credit is added to the deficit counter, the quantum credit is added to the quantum counter, and the pointer moves to the next weight group.

[0025] 3. If only the quantum is at or below zero, then the quantum credit is added to the quantum counter and the pointer moves to (or remains at) the first weight group.

[0026] Because items within a group have weights within a power of 2 of each other, the scheduler can use simple deficit round robin within each weight group and still maintain good fair service latency. Table 1 demonstrates each step of the above process (for simplicity, a quantum size of 500 is used with simple packet sizes). In each step the highlighted group sends a variable number of bytes, and the new deficit is shown in the Def+ column. The current quantum amount and quantum deficit are shown in the Qu and Qu+ columns, respectively. The rules above are repeatedly followed as the process moves from one row to the next. This results in the schedule identified above.
TABLE 1

        Weight Group 302    Weight Group 304          Weight Group 306
        Client: A=9         Clients: B=2, C=2, D=2    Clients: E=1, F=1
        Weight: 9           Weight: 6                 Weight: 2
        Ratio: 1.5          Ratio: 3                  Ratio: 1
        Credit: 750         Credit: 1500              Credit: 500

slot | Def  Def+  Sent | Def  Def+  Qu  Qu+  Sent | Def  Def+  Qu  Qu+  Sent
[The per-slot rows of the trace are not legible in the source and are omitted.]

[0027] In some examples, a network includes thousands of links, but there will generally only be 3-8 weight groups. In most cases, most of the links will have similar rates (e.g., a typical network may have 400 links at 256 kb/s, 500 links at 512 kb/s, 5 links at 1.5 Mb/s, and 2 links at 3 Mb/s). Since weight groups are defined as weights within a power of 2, there are a maximum of 32 groups possible to cover all link types between 1 b/s (2^0) and 2 Gb/s (2^31). In other words, adding the GRRR mechanism to a link scheduler requires minimal memory overhead, and it requires minimal processing time while providing a very good ability to guarantee rates and fair service latency. Such a link scheduler is referred to as order O(1), meaning that the amount of computation necessary to operate it is substantially insensitive to the number of links being scheduled. The original GRRR algorithms were designed for process scheduling, and assume work units of fixed size, an assumption that is not necessarily true for a packet scheduler.

[0028] By adding deficits as described above, each group has the ability to exceed its ratio during one transmission
(e.g., the deficit of -50 in step 1), but this will decrease the volume of data that group can send by that amount the next time it comes to be serviced. This error is bounded by the maximum packet size or quantum size (whichever of the two is smaller) per service interval.

[0029] The variable size of packets is also the reason for the addition of the "quantum" measurements into the weighted group schedules. This ensures that groups sending small packets will still get their fair ratio of bandwidth. As a weight group is serviced, the scheduler maintains the quantum deficit to assure the previous weight group ratio is satisfied. When the algorithm is moved to a new weight group, it is recharged with a new quantum (e.g., another 500 bytes is added). Any excess or surplus is taken into account during the next quantum (i.e., it is slightly less than it would normally be). The size 500 was used for simplicity of illustration. A size of 1024 is often used and is significant because it allows for efficient multiplication and division (by a simple bitwise shift left or right). This calculation of transition credits when weight groups are created or adjusted accounts for links becoming active/idle. In some examples, a quantum of 1024 bytes provides a good point in tuning the performance of the scheduler between efficiency and precision. A smaller quantum (e.g., 512 bytes) may have less error because it finds the best group to service next. This can help ensure the best service interval for links with small packet sizes (e.g., mostly voice or MTU-limited traffic). However, this may come at the expense of efficiency in looping through groups until the quantum is large enough for one to send. Other O(1) packet schedulers use a quantum of the maximum packet size (e.g., 1600 bytes). The error introduced from the quantum is bounded to be less than a quantum difference per service interval.

[0030] In some examples, the GRRR algorithm assumes that all links within each weight group are active.
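A simplified model of the counter rules from paragraphs [0023] through [0025], including the bounded negative deficit of paragraph [0028], might look like this (a sketch, not the patented implementation; the class and function names are invented):

```python
QUANTUM = 500  # per the Table 1 example; 1024 is the size often used in practice

class WeightGroup:
    def __init__(self, name, credit):
        self.name = name
        self.credit = credit     # deficit credit derived from the group ratios
        self.deficit = credit    # deficit counter
        self.quantum = QUANTUM   # quantum counter

    def send(self, packet_bytes):
        # A packet may drive the deficit negative (e.g., the -50 in step 1);
        # the overshoot is repaid the next time the group is serviced.
        self.deficit -= packet_bytes
        self.quantum -= packet_bytes

def next_group(groups, current):
    """Apply the three ordered rules to pick the weight group to service next."""
    g = groups[current]
    if g.deficit > 0 and g.quantum > 0:   # rule 1: stay with the current group
        return current
    if g.deficit <= 0:                    # rule 2: recharge both, advance
        g.deficit += g.credit
        g.quantum += QUANTUM
        return (current + 1) % len(groups)
    g.quantum += QUANTUM                  # rule 3: recharge quantum, restart frame
    return 0
```

Run against the three groups of FIG. 3 (credits 750, 1500, and 500), this is meant to mirror the stay/advance/restart decisions traced in Table 1.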
If a group contains idle or over-limit links, then the unused bandwidth from each such link is given up by that link for the current service frame. It does this by assuming the link did its DRR credit amount of work. The work is updated in both the DRR and GRRR deficits, and the algorithm continues as if it was actual work done. This efficiently distributes the idle work done across all links and link groups. Each weight group maintains a list of active links and idle links. As a first packet is queued into an idle link, the link is moved to the tail of a DRR list and the weight is increased for the weight group. If not already set, then a pointer is set to the current location in the DRR list. Despite becoming active, the deficits are still tracked from before (i.e., they are not reset). This ensures that links that oscillate from active to inactive are not allowed to cheat at bandwidth. As the scheduler exhausts a link (i.e., removes the last packet), it continues to leave it in the DRR list until the next round. As the scheduler visits a link that is still exhausted on a second pass, it will then remove it from the DRR active list and put it into the idle list. It will also update the total weight of the weight group.

[0031] At the end of the service frame, the scheduler recalculates the ratios and credits for the affected groups. If the group doesn't shift in its sorted location, then only the credit for the current group and the one that has larger weight needs to be updated. In some cases the order of the list may change because the adjusted weights for this group cause it to exceed the total weight of a group in front of or behind it in the original list. In this case the credit needs to be updated on three groups (the two listed in the previous paragraph, plus the one above the previous location, since its ratio is now to the group that had been behind it).

[0032] In some examples, this algorithm is performed even less frequently (i.e., every N times through the DRR cycle, or for multiple GRRR frames).
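The active/idle list handling of paragraph [0030] might be modeled as follows (a sketch under assumed data structures; the class and method names are invented for illustration):

```python
from collections import deque

class DRRGroup:
    """Active links are served round-robin; a link keeps its old deficit when
    it reactivates, and is retired only after being found empty twice."""
    def __init__(self):
        self.active = deque()   # DRR service order
        self.idle = {}          # link name -> preserved deficit
        self.deficit = {}
        self.empty_passes = {}

    def activate(self, link):
        # First packet queued on an idle link: move it to the tail of the DRR
        # list with its prior deficit intact, so links oscillating between
        # active and inactive cannot cheat at bandwidth.
        self.deficit[link] = self.idle.pop(link, 0)
        self.empty_passes[link] = 0
        self.active.append(link)

    def visit_empty(self, link):
        # Leave an exhausted link in place for one round; on the second empty
        # pass, move it to the idle list (its deficit travels with it).
        self.empty_passes[link] += 1
        if self.empty_passes[link] >= 2:
            self.active.remove(link)
            self.idle[link] = self.deficit.pop(link)
```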
On average, the penalty of a delay before increasing the credit for a new activation should be balanced by a similar delay before decreasing credit for idle link removal. The effect of a slightly low credit is a missed frame for the group, but the per-item error is distributed between the items so they only lose a bit on the service latency. Conversely, when the credit is slightly high, the group may be given an extra slot in the frame, and this is also distributed as a slight boost to the service latency per link.

[0033] In some cases a weight group will only have one or two links within it. If those links go idle, then the entire weight group should not take any slots from the frame. To do this, the weight group is flagged as idle and will give up its slot by crediting either its quantum or the next weight group's credit (whichever is smaller). This maintains the ratio of work done with other weight groups. At the end of the frame this weight group is removed. The ratios and credits for the group in front of where it had resided are recalculated. When a link within the weight group becomes activated again, the weight group is moved back into the correct location and adjusts the ratios and credits for the group in front of it and for itself.

[0034] The above examples were of a single link group. This will be typical of most branch policies that will have a root level link group with one or two links (e.g., a link to headquarters and a link to the Internet). In some examples, like that shown in FIG.
2, there are policies that can have nested link groups in a hierarchy (below 114d). As shown in the above example, the link group 308 was given a fractional guarantee in the same way that links are given a fractional guarantee. Within this link group, the central scheduler adds another GRRR scheduler to manage each of its children. The schedulers are run independently of each other but ensure precise service intervals for all items within the overall scheduler.

[0035] The weight groups and DRR algorithms provide fair bandwidth sharing and fair service latency based on guarantees. However, they have no concept of rate limiting, which is a requirement, in some implementations, to represent the physical capacity of a link (so as not to exceed its rate, causing congestion). The rate limiting is done in a similar manner to class based queuing (CBQ). Each link and link group object is calculated to have an average time spent per byte for when it is running at its limit. Using this, the scheduler tracks the next time to send, and an "average idle time" variable tracks the variance with the actual next time data is sent. If the next time is past (or not set), then the link or link group is not rate limited and can send. Otherwise, it has exceeded its rate and is skipped.

[0036] Within a selected link, a class scheduler is used to determine which data packets actually get transmitted over the link. Packets may be classified based on the application that generated them, priorities assigned by an application, or other factors. CBQ is one algorithm for scheduling traffic based on class. In CBQ, packets are scheduled according to relative priorities based on the type of data represented. For example, VoIP data needs low latency, while regular IP traffic can tolerate reduced latency but may require higher accuracy. In such an example, VoIP packets would be scheduled to be transmitted frequently and promptly, but not in large clusters. This sort of scheduling is greater than order O(1), meaning
that the amount of computation necessary to operate a scheduler varies linearly with the number of classes, which may not be manageable for large networks.

[0037] Link-based scheduling and class-based scheduling can be combined as shown in FIG. 4 to achieve benefits of each without requiring burdensome amounts of computation resources. A link scheduler 402 is used to select which link to allocate capacity to, but does not actually queue traffic to be transmitted. Rather, it simply selects which class scheduler 404a, b (there being one for each link) to take traffic from. The selected class scheduler 404a or 404b then selects packets from classes 406a, b and delivers them to the link scheduler to be transmitted. The link scheduler transmits packets provided by the class schedulers into the network 410, for example, by sending them to a network interface of the machine on which the scheduler is operating. This process may be repeated at each stage of a hierarchical network like that shown in FIG. 2, or may be done centrally and communicated to the responsible hardware at each site.

[0038] The typical packet filter to determine the class queue for a packet can be based on many packet attributes (address, port, type of service, packet flags, etc.). However, mixing those filtering attributes allows filters to overlap, so they are sorted and searched in precedence order, which is O(N). On networks containing hundreds or thousands of links with many classes per link, this is not generally scalable. The packet filtering shown in 606 of FIG. 6 uses route-based pre-filtering based on the link subnet definition to determine the link a packet will use. This pre-filtering can use routing algorithms like a radix tree to allow an O(log(N)) search. A link only has a few classes, so within the link a normal O(N) precedence search can be done on rules to select the class a packet within the link should use. The class and link determination is then cached as part of a flow table, as disclosed in U.S. Pat. No.
7,010,611, which is incorporated here by reference, so when scheduling future packets the scheduler can do a quick hash to find the flow and the previous class and link determination. In some cases this allows O(1) class and link determination.

[0039] The techniques described herein can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. The techniques can be implemented as a computer program product, i.e., a computer program tangibly embodied in an information carrier, e.g., in a machine-readable storage device or in a propagated signal, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers. A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.

[0040] Method steps of the techniques described herein can be performed by one or more programmable processors executing a computer program to perform functions of the invention by operating on input data and generating output. Method steps can also be performed by, and apparatus of the invention can be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). Modules can refer to portions of the computer program and/or the processor/special circuitry that implements that functionality.

[0041] Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

[0042] A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, other network topologies may be used. Accordingly, other embodiments are within the scope of the following claims.

1. A method of scheduling data traffic comprising selecting a source of traffic from a plurality of sources of traffic using a group ratio round robin scheduling algorithm.

2. The method of claim 1 in which using a group ratio round robin scheduling algorithm comprises defining an ordered set of groups of sources of traffic having similar weights, computing ratios between total weights of the groups, and repeatedly, choosing one of the groups, within the chosen group, using a second algorithm to choose a source of traffic, and transmitting an amount of traffic from the chosen source.

3. The method of claim 2 in which the second algorithm is a deficit round robin scheduling algorithm.

4.
The method of claim 2 also comprising computing a deficit credit and a quantum credit for each group based on the ratios, and after the transmitting, updating a deficit counter and a quantum counter for the chosen group based on the amount of traffic transmitted and the credits.

5. The method of claim 2 in which choosing one of the groups comprises: if the deficit counter and the quantum counter of the last-chosen group are above zero, choosing the last-chosen group; if the deficit counter of the last-chosen group is at or below zero, adding the deficit credit to the deficit counter, adding a quantum credit to the quantum counter, and choosing the next group of the ordered set of groups; and if the deficit counter of the last-chosen group is above zero and the quantum counter is at or below zero, adding a quantum credit to the quantum counter for that group, and choosing the first group in the ordered set of groups.

6. A link scheduler apparatus to determine an order to schedule links, comprising a processor adapted by a software program to group together links that have similar weights and maintain a list of groups of links having similar weights based on a guaranteed or actual rate.

7. The apparatus of claim 6 further comprising a processor adapted by a program product to select a group to transmit packets based on the ratio of the bandwidth needed by each group to the bandwidth needed by every other group, whereby each group transmits an amount of data in turn, and within each group, select an individual link to transmit packets based on the ratio of its traffic volume to that of the other links in the selected group.

8. The apparatus of claim 6 further comprising a processor to spread weights over an entire service frame, the processor adapted by a program product to sort all groups on the sum of the weights of the included links and link groups, calculate a ratio of weight from one group to the next, and assign a credit based on the ratios.

9.
The apparatus of claim 6 further comprising a pointer store to contain the current weight group being serviced, and a quantum counter to determine when the current weight group ratio is satisfied and move the process back to the first weight group, wherein on the next invocation of the scheduler, that weight group is serviced and the counters are decreased by the amount of data sent.

10. The apparatus of claim 9 further comprising a deficit counter to determine when to move to the next weight group, whereby said deficit counter enables each group to exceed its ratio during one transmission, but this will decrease the volume of data that group can send by that amount the next time it comes to be serviced, bounded by the smaller of the maximum packet size or the quantum size per service interval.

11. The apparatus of claim 10 further comprising a processor to move the pointer between weight groups by application of the following ordered rules after servicing the current weight group:
1. if the deficit and quantum counters are both above zero, then the pointer stays with the current weight group;
2. if the deficit is at or below zero, add the deficit credit to the deficit counter, add the quantum credit to the quantum counter, and move the pointer to the next weight group;
3. if only the quantum is at or below zero, then add the quantum credit to the quantum counter and set the pointer to the first weight group.

12. The apparatus of claim 10 wherein the number of weight groups is between three and eight, wherein weight groups are defined as weights within a power of 2, thereby providing a maximum of 32 groups possible to cover all link types between 1 b/s (2.sup.0) and 2 Gb/s (2.sup.31), whereby the link scheduler apparatus requires minimal memory overhead and minimal processing time while providing a very good ability to guarantee rates and fair service latency, and the amount of computation necessary to operate it is substantially insensitive to the number of links being scheduled.

13.
The apparatus of claim 11 wherein the quantum credit is 512 and the processor maintains the quantum deficit to assure the previous weight group ratio is satisfied, whereby the best service interval for links with small packet sizes is ensured.

14. The apparatus of claim 11 wherein the quantum credit is 1024.

15. The apparatus of claim 11 wherein the quantum credit is 1000.

16. The apparatus of claim 10 further comprising a processor adapted by a program product to recalculate the ratios and credits for the affected groups periodically, wherein if the group does not shift in its sorted location then only the credit for the current group and the one that has larger weight needs to be updated, and if the adjusted weights for a group cause it to exceed the total weight of a group in front of or behind it in the original list, update the credit on three groups: the current group, the group that has a larger weight, and the group above the previous location.

17. The apparatus of claim 16 further comprising a processor adapted by a program product wherein periodically is at the end of each Nth link service frame, wherein N is an integer variable under algorithmic or manual control.

18. The apparatus of claim 10 further comprising a processor for idle link management, wherein the processor is adapted by a program product to: on the condition that a weight group is flagged as idle, give up its credit, the smaller of its quantum or the next weight group's credit, remove this weight group at the end of the frame, and recalculate the ratios and credits from the group in front of where it had resided; and on the condition that any link within a weight group flagged as idle becomes active, move the weight group back into the correct location, and adjust the ratios/credits for the group and its preceding group.

19.
The apparatus of claim 10, for nested link groups in a hierarchy, further comprising another scheduler to manage all of its children, wherein the schedulers are run independently of each other but ensure precise service intervals for all items within the overall scheduler, wherein a scheduler is a process of a processor.

20. The apparatus of claim 10 further comprising a processor for rate limitation, which is a requirement, in some implementations, to represent the physical capacity of a link, whereby congestion is prevented, wherein the processor is adapted by a program product to calculate an average time spent per byte for each link and link group object while it is running at its limit, track the next time to send and an average idle time variable, track the variance with the actual next time data is sent, send if the next time is past (or not set) because the link or link group is not rate limited, and skip if the link or link group has exceeded its rate.
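For illustration only (not part of the claimed subject matter), the ordered pointer-movement rules recited in claims 5 and 11 might be sketched in Python roughly as follows; the class name, group labels, and credit values are assumptions, not the ratios the claims derive from group weights:

```python
class GroupRatioRoundRobin:
    """Sketch of the pointer-movement rules of claims 5 and 11.

    Groups are ordered by weight; each carries a deficit counter and a
    quantum counter.  The credit values here are illustrative only.
    """

    def __init__(self, groups, deficit_credits, quantum_credit=512):
        self.groups = groups                      # ordered weight-group names
        self.deficit_credits = deficit_credits    # per-group deficit credit
        self.quantum_credit = quantum_credit
        self.deficit = dict(deficit_credits)      # deficit counter per group
        self.quantum = {g: quantum_credit for g in groups}
        self.pointer = 0                          # index of current weight group

    def current_group(self):
        return self.groups[self.pointer]

    def charge(self, nbytes):
        """Debit the serviced group, then apply the ordered rules."""
        g = self.current_group()
        self.deficit[g] -= nbytes
        self.quantum[g] -= nbytes

        # Rule 1: both counters above zero -- stay with the current group.
        if self.deficit[g] > 0 and self.quantum[g] > 0:
            return
        # Rule 2: deficit at or below zero -- recredit both counters and
        # move the pointer to the next weight group.
        if self.deficit[g] <= 0:
            self.deficit[g] += self.deficit_credits[g]
            self.quantum[g] += self.quantum_credit
            self.pointer = (self.pointer + 1) % len(self.groups)
            return
        # Rule 3: only the quantum is at or below zero -- recredit it and
        # set the pointer back to the first weight group.
        self.quantum[g] += self.quantum_credit
        self.pointer = 0
```

Because the recredit amounts are added to (rather than assigned over) the counters, a group that overdraws its share in one transmission sends correspondingly less the next time it is serviced, as claim 10 describes.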

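Likewise for illustration only, the rate-limitation bookkeeping of claim 20 (an average time per byte at the link's limit, a next time to send, send if that time is past or unset, skip if the rate is exceeded) might be sketched as follows; the class and its parameter names are assumptions:

```python
class RateLimiter:
    """Sketch of the per-link rate limitation described in claim 20.

    Tracks the average time a byte occupies at the configured rate and a
    "next time to send"; an unset next time means the link is not (yet)
    rate limited.  Names and structure are assumptions.
    """

    def __init__(self, rate_bps):
        self.time_per_byte = 8.0 / rate_bps   # seconds per byte at the limit
        self.next_send = None                 # None -> no limit applied yet

    def may_send(self, now):
        # Send if the next time is past, or not set because the link is
        # not rate limited; otherwise skip this link for now.
        return self.next_send is None or now >= self.next_send

    def sent(self, now, nbytes):
        # Charge the transmission: advance the next send time by the time
        # these bytes occupy at the configured rate.
        start = now if self.next_send is None or self.next_send < now else self.next_send
        self.next_send = start + nbytes * self.time_per_byte
```

A scheduler walking its links would consult `may_send` before dequeuing and call `sent` after transmitting, so a link that has exceeded its rate is simply skipped until its next send time arrives.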