
PRODUCED ON ACID-FREE PAPER

MEASURING, UNDERSTANDING
AND MODELLING
INTERNET TRAFFIC

Nicolas Hohn

SUBMITTED IN TOTAL FULFILLMENT OF THE
REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

JULY 2004

DEPARTMENT OF ELECTRICAL AND ELECTRONIC ENGINEERING
THE UNIVERSITY OF MELBOURNE
AUSTRALIA

To my parents, for their love, encouragement and constant support,
without whom nothing would be.


Abstract

This thesis concerns measuring, understanding and modelling Internet traffic. We first study the origins of the statistical properties of Internet traffic, in particular its scaling behaviour, and propose a constructive model of packet traffic with physically motivated parameters. We base our analysis on a large amount of empirical data measured on different networks, and use a so called semi-experimental approach to isolate certain features of traffic we seek to model. These results lead to the choice of a particular Poisson cluster process, known as Bartlett-Lewis point process, for a new packet traffic model. This model has a small number of parameters with simple networking meaning, and is mathematically tractable. It allows us to gain valuable insight on the underlying mechanisms creating the observed statistics.

In practice, Internet traffic measurements are limited by the very large amount of data generated by high bandwidth links. This leads us to also investigate traffic sampling strategies and their respective inversion methods. We argue that the packet sampling mechanism currently implemented in Internet routers is not practical when one wants to infer the statistics of the full traffic from partial measurements. We advocate the use of flow sampling for many purposes. We show that such sampling strategy is much easier to invert and can give reasonable estimates of higher order traffic statistics such as distribution of number of packets per flow and spectral density of the packet arrival process. This inversion technique can also be used to fit the Bartlett-Lewis point process model from sampled traffic.

We complete our understanding of Internet traffic by focusing on the small scale behaviour of packet traffic. To do so, we use data from a fully instrumented Tier-1 router and measure the delays experienced by all the packets crossing it. We present a simple router model capable of simply reproducing the measured packet delays, and propose a scheme to export router performance information based on busy periods statistics. We conclude this thesis by showing how the Bartlett-Lewis point process can model the splitting and merging of packet streams in a router.
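The difference between the two sampling strategies mentioned above can be illustrated with a minimal Python sketch. It is not taken from the thesis: the synthetic flow sizes, the sampling probability of 0.1 and the function names `packet_sample` and `flow_sample` are all illustrative assumptions. Under packet sampling each packet is kept independently, so the observed number of packets per flow is a thinned version of the true one and some flows disappear entirely. Under flow sampling each retained flow is seen with its true size.

```python
import random


def packet_sample(flow_sizes, p, rng):
    """Packet sampling: keep each packet independently with probability p.

    Returns the observed packet counts of the flows that still have at
    least one packet after sampling (flows with none are never seen).
    """
    observed = []
    for size in flow_sizes:
        kept = sum(1 for _ in range(size) if rng.random() < p)
        if kept > 0:
            observed.append(kept)
    return observed


def flow_sample(flow_sizes, p, rng):
    """Flow sampling: keep each flow, with all its packets, with probability p."""
    return [size for size in flow_sizes if rng.random() < p]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Synthetic flows (assumption): sizes drawn from a small illustrative set.
true_sizes = [rng.choice([1, 2, 5, 50]) for _ in range(10000)]
by_packet = packet_sample(true_sizes, 0.1, rng)
by_flow = flow_sample(true_sizes, 0.1, rng)

# Flow sampling only ever reports true flow sizes; packet sampling
# reports thinned counts that must be inverted statistically.
print(sorted(set(by_flow)))
print(sorted(set(by_packet))[:10])
```

In this toy setting the flow-sampled sizes are a subset of the true sizes, so their empirical distribution estimates the true one directly. The packet-sampled counts require an explicit inversion step.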


Declaration

This is to certify that:
(i) the thesis comprises only my original work,
(ii) due acknowledgement has been made in the text to all other material used, and
(iii) the thesis is less than 80000 words in length, exclusive of tables, maps, bibliographies, appendices and footnotes.

Nicolas Hohn


Veitch. Hohn and D.Preface The work presented in this thesis is the result of original research conducted by the author. (fast track submission). Hong Kong. in Proc. [79] N. Clermont Ferrand. pp. International Conference on Self-Similarity and Applications. ACM Internet Measurement Workshop. New York. D. October 2003. and C. in Proc. 222–233. April 2003. Hohn. pp. June 2004. Veitch. “Cluster Processes. Passive and Active Measurment Workshop. D. D. in Proc. IEEE ICASSP. Annales Mathématiques Blaise Pascal. [173] D. [3] P. Hohn. May 2003. in Proc. Nice. Colloque Mesure de l’Internet. France. Marseille. Abry. Abry. a Natural Langage for Network Traffic”. Abry. N. as follows: Chapters 3 and 4: [81] N. USA. Papagiannaki. France. IEEE/ACM Transactions on Net- working. (submitted). November 2002. “Invariance d’échelle dans l’Internet”. and P. “The impact of the flow arrival process in Internet traffic”. USA. Veitch. ix . Hohn. D. Papagiannaki. “Does fractal scaling at the IP level depend on TCP flow arrival processes ?”. April 2004. D. and P. Chapter 5: [78] N. N. K. in Proc. in Proc. IEEE Transactions on Signal Processing. “Investigating the scaling behaviour of Internet flow arrivals”. Abry. Abry. Hohn and D. Veitch. Chapter 6: [142] K. Veitch. Hohn. ACM SIGMETRICS conference. [80] N. France. [83] N. in Proc. D. and D. “Inverting sampled traffic”. Veitch. Hohn. P. Antibes. August 2003. France. 63–68. or submitted for publication. [84] N. May 2002. Hohn. Miami. [82] N. Abry. ACM Internet Measure- ment Conference. “Bridging router performance and queueing theory”. and P. “Origins of microcongestion in an access router”. Hohn. and P. Hohn. Special Issue on Signal Processing in Networking. “Inverting sampled traffic”. VI 37–40. Veitch. Parts of it have been published. Veitch. Diot. Veitch. and P. 51(8):2229–2244. “Multifractality in TCP/IP traffic : the case against”. pp. Flandrin. and N. Best student paper award. Veitch.

Chapter 7:
[85] N. Hohn, D. Veitch and T. Ye, “Splitting and merging of a traffic model: validation”, (submitted).

Acknowledgements

If you have an apple and I have an apple and we exchange these apples, then you and I still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas.
George Bernard Shaw

I would like to thank Darryl Veitch, my PhD advisor, for his support, guidance and availability. He made my PhD studies a great experience, as much scientifically as personally. I thoroughly enjoyed working and “exchanging ideas” with him. I will look back at our late night enlightening discussions and desperate moments before deadlines with fond memories.

Thanks go to Iven Mareels and Stephen Hanly, from the University of Melbourne, members of my PhD committee, for their assistance and suggestions over the course of my work and in the preparation of this thesis.

From a research perspective, I was very lucky to work at Ecole Normale Supérieure de Lyon (France) with Patrice Abry on multiple occasions. I would also like to give special thanks to the folks from the IP group at Sprint Advanced Technology Laboratories in San Francisco (USA) for making my stay there such a great experience. I am grateful to the people at the Cooperative Association for Internet Data Analysis in San Diego (USA), Ecole Normale Supérieure de Paris, Intel Research Cambridge and Laboratoire d’Informatique de Paris VI for their kind hospitality and financial support during my short visits.

The financial support from the Commonwealth government of Australia through an International Postgraduate Research Scholarship, Ericsson and the Australian Research Council Special Research Center for Ultra-Broadband Information Networks was crucial to the successful completion of this project and is gratefully acknowledged.

Studying in Australia for my MSc and my PhD has been an amazing journey, not because of all the miles flown, but because I met some great people along the way. The story that led me to leave the French Alps and complete a PhD in Australia is too long and too incredible to be fully recounted here. A couple of moments stand out: a job offer from the Bionic Ear Institute just days before I was due to reluctantly leave Australia to complete my military duties in France, and a fax from the Vice-Chancellor of the University of Melbourne to support my visa application when I was about to be deported. I cannot thank enough the persons involved in these life changing events.

On a more personal note, I would like to thank my friend Jean for taking me mountaineering on Makalu 2 in the Himalayas and thus showing me that one can still have a life during a PhD. I am also grateful to all the amazing people from the Melbourne University Mountaineering Club with whom I shared some wonderful adventures and epics.

Being so far from home means that I did not see my family as much as I would have wished. I thank them all for their support and understanding. Last but not least, I would like to thank Andrea for coping with my working hours and my long overseas trips, and for bringing so much to my life over the years.

Melbourne, Australia
May 2004
Nicolas Hohn

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 xiii . . . . . . .2. . . . . . . . . . . . . . . . . . . . . . . . . 35 2. . . . . . . . . 10 1. . . . . . .1 History and fundamentals .1 Black box traffic models . . . . . . . . . . . . . 16 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2. . . 5 1. . .1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4. . . . . . . . . . . . . . . . . . . . . . . .4 Wavelet analysis . . . . . . . . . . . . . . .4. 25 2. . . . . . . . . . . . . . . . 27 2. . . . . . . . . . . . . . . . . .2. . . .1. . .Contents List of Tables xvii List of Figures xix Principal Notations xxi 1 Introduction 1 1.6 How to read this thesis . . .1 Self-similarity . . . . . . . . . . . . . . . . .4. . . . .4 Internet traffic models . . . . . . . . . . . . . . 19 2. . . . . . . . . 3 1. . . . . . . .2 Definitions . . . . . . . . 17 2 Mathematical background 19 2. . . . . . . . . .1 Introduction . . . .3 Teletraffic engineering . . . . . . . .3. . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Physical models . . . . . . . . . . 6 1. . . . . . . . . . . . . . . . . 23 2. . . . . . . . . 13 1. . 1 1. . 22 2. . . . . .3. . . . . . . . . . . . . . . . . . 20 2. . . . . 15 1. . . .5 Contributions and thesis outline . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 2. . . . . . .3. . 22 2. . . . . . . . . 23 2. . . . .4. . . . .4 Density spectrum . . . .2 Long-Range Dependence . . . . . . . . . . .3 Estimation . . . . . . . . . . .2 Self-similarity and other scaling behaviours . . .2 Organization . . . . . .3 Moments . . .5 Operations on point processes . . . . . . . . . . . . . . . .5. . . . . . . . . . . . . . . . . . . . . . . . 1 1. . . . . . . . . . . . .1 Definition . 2 1. . . . . . . . . . . .2. .2 Philosophy and aims of this thesis . . . . . . . . . . . . . . .4 Infinitely Divisible Cascades . . . . . . . . . . . . . . . .2 Traffic modelling . . 34 2. . . . . . . . . 
. . . . . . . . . . . . . .1 Introduction . . . . . . . . . . . . . .3. . . . . . . . . . . . . 15 1. . 29 2. . . . . . .3. . . . . . . . . . .1 The Internet . . . . . . . . . . . . . . . . . . . . . . 6 1. . . . . . . . . . . . . . . . . . .4. . . . . . . . . . .2 Outline . . . . . . .3 Multifractals . .3 Point Processes . . . . .2 Properties . . . . . . . . . . . . . 31 2. 19 2. 19 2. . . . . . . . . . . . . . . . . . . . . . . 9 1. . .1 Contributions . . . . . . . . . . . .5.1. . . . . . . . . . .3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3. . . . . .

. . . . . . . . . . . . . . . . . . . . . . . .5. . . . . . . . . . . . . . . . . . . . . . .1. . . . .1 Introduction . . . . . . . . . . . . . .2. . .1. . . . . . .3. . . . . . . .1 Knee tracking algorithm . . . . . 96 4. . . . . . . . . . . 99 5. . . . 44 3. . . . . .3. . . . . . . . . . . . . . . . . . . mice. . .2. .1 Motivation . . . . . .1 Flow volumes manipulation . . . .5. . .4. . . . .2 Terminology . . . . . . . . . 102 5. . . . . . . . . .4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 5. . . . . . . . 46 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1 Packet sampling . . . . . . . . . . .2 A flow based model: Bartlett-Lewis point process . . . . . . . . . . . . . .1 Passive measurements . . . . . . . . . . . . . . . . . . . 103 5. . . . . . . . . . . . . . . . 109 5. .4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 4. . . . 43 3. . . . . . . . . . . . . . . . . . . 71 3. . . . . .4 Model validation . . . . . . . . . 59 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6 Conclusion . . . . . . .5. . . . . . 42 3. . . . . and a multiclass cluster model . . 69 3. . . . . . . .3 Cluster models . . . . . .3. . . . . . . . . . . . . . . .3 Flow arrival process . . . . .4 Central observations: biscaling and heavy tails . . . . . . . . . .1 Basic manipulations . . . . . . . . . . . . . . . . . . . . .1 Marginals . . . . . . . . . . . . . . . . . . .4 Packet arrival process and semi-experiments . . . . . . . 2. . . . .4 Making sense at small scales . . . . . . .2 Flow sampling . . . 37 2. .1. . . . . . . . . .2 The data and data processing . .4. . . . . . . . . . . . . . . . . . . . . . .1 Introduction . . . . . . . . .6. . . . . . . . . . . . . . . . . . 87 4. . . . . . . 99 5. . 64 3. . . . . . . . . . . . . . 42 3. .1 Model fit . . .2. . . . 38 3 Empirical observations and semi-experiments 41 3.2 Small scale behaviour: multifractal or not ? . . . . . . .4. . . . . . . . . . . . . . 
. . . . .3 IP flow decomposition . 87 4. . . . . 78 4. . . . . . . . 97 5 Inverting sampled traffic 99 5. . . . . . .5 Towards understanding traffic evolution . . .6. . . . . . . . . . . . . . . . . . 110 5. . . . . . . . . . 78 4. . . . . . . . . 90 4. . . . 102 5. . 53 3. . . . . . . . . . . . . .4. . .3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 4. . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Dependence on traffic characteristics . . . . . . .2 Inverting sampling: theory . . .2. .5 Impact on packet arrival process . .5 Conclusion . .1 Packet level . . . . . . . . . . . . .2 Knee position manipulation . . 53 3. . . . . 41 3. . . . . . . . . . . .2. 101 5. . . . . . . . . . . . .4. . . . . . . . . . . . . . . .5. . . . . . . . . . . . . . . . . . . . . .7 Conclusion . . . . . . . .1 Introduction . . . . . . .3 Flow subsets manipulation . . . . . . . . . . . . . . . . . . . . . . . . . 51 3. 68 3. . . . . . . . . . . . . . . . . . . . . . . . . . 92 4. . . 68 3. . . . . . . .3. . .1 A black box model: gamma renewal . . . . . . . . . . . . . . . . . . . . . . . . . .1. .3 Inverting sampling: practice . . . . . . . . . . 95 4. . . . . . . . 60 3. . . . . . . . 67 3. . . . . . . . . 77 4. . .3. . . . . . . . . . . . . . . . . . . . . . . . . .6 Higher order statistics . . . . 57 3. . .2 Empirical observations . . . . . . . . . . . . 100 5.4 Summary . . . . . . . . . . . . .2 Elephants. . . .2 Flow level . . 60 3. . . . . . . . . .3 Previous work . . . . .4 Outline and main contributions . . . . . . . . . . . . . . . . . .2 First observations . . . 95 4. . . . . . .2 Advanced manipulations . . .3 Reconstruction from subsets . . . 72 3. . . . . . .3. . . . .3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 xiv . . . . . . . . . . . . . . . . . . .2. . . . . . . . 73 4 Cluster processes 75 4. . . . . . . . . .3. . . .

5.4 The Bartlett-Lewis point process . . . . . . . . . . . . . . . . . . . . . . . 118
5.4.1 Thinning Bartlett-Lewis point processes . . . . . . . . . . . . . . . 119
5.4.2 Fitting from thinned data . . . . . . . . . . . . . . . . . . . . . . . 120
5.5 How to sample traffic ? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
5.5.1 Packet sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.5.2 Flow sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
5.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

6 Bridging router performance and queuing theory 125
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.2 Full router monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.2.1 Hardware considerations . . . . . . . . . . . . . . . . . . . . . . . 127
6.2.2 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . . . 129
6.2.3 Packet matching . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
6.3 Preliminary delay analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 132
6.3.1 System definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.3.2 Delay statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
6.4 Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
6.4.1 The fluid queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
6.4.2 A simple router model . . . . . . . . . . . . . . . . . . . . . . . . 138
6.4.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.4.4 Router model summary . . . . . . . . . . . . . . . . . . . . . . . . 145
6.5 Delay performance: understanding and reporting . . . . . . . . . . . . . . 145
6.5.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
6.5.2 Busy periods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
6.5.3 Modelling busy period shape . . . . . . . . . . . . . . . . . . . . . 150
6.5.4 Reporting busy period statistics . . . . . . . . . . . . . . . . . . . 152
6.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

7 Modelling Internet traffic 155
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.2 Empirical observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.2.1 Details of traffic streams . . . . . . . . . . . . . . . . . . . . . . . 155
7.2.2 Packet train through a router . . . . . . . . . . . . . . . . . . . . . 158
7.2.3 Modelling consequences . . . . . . . . . . . . . . . . . . . . . . . 160
7.3 Validation of the BLPP . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
7.3.1 Individual links . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
7.3.2 Splitting and merging of traffic through a router . . . . . . . . . . . 163
7.3.3 Model extension . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
7.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168

8 Conclusion 169
8.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
8.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170

A IP Packet structure 171

Index 173

Bibliography 175


List of Tables

3.1 Details of packet traces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

6.1 Full router trace details over 13 hours . . . . . . . . . . . . . . . . . . . . 129
6.2 Breakdown of packet matching for output link C2-out. . . . . . . . . . . . 132

7.1 Details of 2 hour long packet traces collected at the router . . . . . . . . . . 156
7.2 Router matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
7.3 Details of 2 hour long packet substreams crossing the router . . . . . . . . 157

A.1 IP Header . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
A.2 TCP Header . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
A.3 UDP Header . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172


. . . . . . . . . . . . . . . . . . . . .3 Illustration of ‘slowly’ decaying variance .4 Inversion of pj . . . . . 65 3. 115 5. . . . . . . . . . . .15 (continued) . . . . . . . . .10 Multiscaling comparison between model and data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3. 91 4. . . . . . . . . .5 Comparison of LDs of AUCK-d1 and BLPP model .2 Aims of this thesis . . . . . . . . . . . . . . . . . . . . . . 89 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 3. . . . . .12 LDs of the duration based subsets . . . . . 61 3. . . 48 3. . . . . . . . . . .1 Packet size distribution . .8 Flow and packet density in Abilene . . . . . 46 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6 Packet arrivals in TCP connections . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 xix . . . . . . . .5 Flows characteristics . . . . . . . . . . . . . . . . . . . . . . . 88 4. . . 95 4. . . .9 Periodicities at small scales . . .1 Sprint North American network . . . . . . . . . . . . . .2 Packet inter-arrival process . . . . . . . . . . . .1 Analytic continuation method . . . . . . . . . . . . . 58 3. . . . . . . . . . . . . . . . . . . . . . . .4 Schematic representation of a BLPP . . . . 36 3. 3 1. . . . . . . . . . . . . . . . . . . . .5 Inversion of pj . .7 Comparison of data and BLPP model . . .2 Examples of Logscale Diagrams . . . . 67 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 3. . . . . . . . . . . . . . . . . . .7 Analysis of the flow arrival process Y (t) . . . . . . . . . . . . . . . . . . 80 4. . . . . . . heavy thinning . . . . . . . . . . . . 77 4. . . .9 Knee dependence on traffic subsets . . . . . . . . . . . . . . . . . . .2 Spectrum reconstruction . . . . . . .8 Logscale Diagrams for different protocols . . . . . . . . . . . . . 49 3. . . . . . . . . . . . . . . . . . . . . 4 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 3. . . 
. . . . . . . 82 4. . . . . . . . . . . . . . . . 66 3. . . . .3 Pseudo scaling of a renewal process . . 86 4. . . . .11 Knee position as a function of RTT and rate .List of Figures 1. . . . . . . . . . . . . . . . . . . . . . . . . . .3 Spectrum reconstruction from flow thinned data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 5. . . . . . . . .10 Tracking the knee position in Y (t) .13 Schematic illustration of semi-experiments . . . . . . 96 5. . . . .14 Packet-in-flow manipulations . . . . . . . . . . . . . . . . 113 5. . . .15 Semi-experiments applied to AUCK-c1 . . . . . . . . . . . . . . . . 21 2. . . . . . . . . . . . . 50 3. . . . . . .1 Illustration of scale invariance . . . . . . . . . . . . . . . . . . . . . . 79 4.4 Ubiquity of biscaling behaviour . . . light thinning . . . . . . . . . . . . . . 51 3. . . 70 4. . . . 111 5.2 Flow decomposition . . . 57 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6 Packet process of AUCK-d1 . . .16 Impact of Y (t) on X(t) . . 52 3. . . . . .1 Examining flow variability for AUCK-d1 . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . 164 7. . . . . . 150 6. . . . . . . . . . . . .7 Comparisons of measured and predicted delays . . . . . . . . . . . . .3 Four snapshots of a packet crossing the router. . . . . . . . . . . . . . . . . . 143 6. . . 136 6. 137 6. . . . . . . . . . . . . . . . . . . . 168 xx . . . . . . . . . . . . . . . . .9 Error analysis . . . . . 160 7. . . . . . . . . . .12 Modelling of busy period shape with a triangle . . . .14 Joint probability distribution of busy period amplitudes and durations . . .1 Router diagram with traffic multiplexing to C2-out . . . . . . . . . . . . . . . . . . 141 6. . . . . . . . . 139 6. . .2 Link utilization . . . . . .5 Semi-experiments [A-Pois] and [A-Pois. . . . . . . . . . . . . .2 Second order properties of output stream and contributing inputs . . . 133 6. . 156 7. .5 Minimum router transit time . . . . . . . . . . . . . .5 (continued) . . . . . . . . . . . . . . . . P-Uni] on all traffic streams. . . . . . . . .11 Busy period construction . . . . . . . 130 6. . . . . . . . . . . . . . . . P-Uni] on output link C2-out. . . . . 162 7. . .5. . . . . .3 Packets on link C2-out . . . . . . . . . . . . . 152 6. . . . . . . . . . . . . . .4 Semi-experiments [A-Pois] and [A-Pois. . . . . . . . . .4 Packet delays . . . . . . . . . 159 7. . . . . . . . . . . . . .1 Experimental setup for full router monitoring . . . . . .10 Busy period statistics . . 149 6. . . . . . . . . . .6 Router mechanisms . .6 BLPP fitting from flow thinned traffic .13 Average duration of a congestion episodes versus link utilization . . . . . . . . . . . . . . . . . 144 6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154 7.6 Link utilization over 24 hours . . . . . . . . . . . . . . . . 121 6. . . . . . . . . . . . . 135 6.8 Measured delays and model predictions . . . . . 147 6. . . . 165 7. . .

independent and identically distributed .i. Abbreviations AS Autonomous System. page 79 CS Coarse Scale EM Expectation Maximization. page 36 LMD Linear Multiscale Diagram. page 25 iff if and only if IHL Internet Header Length. page 79 HDLC High Level Data Link Control. characteristic function. page 19 fGn fractional Gaussian noise .f. page 20 FIFO First In First Out FS Fine Scale Gbps Gigabit per second GR Gamma Renewal .d. page 37 LRD Long Range Dependence/Dependent. equation and page numbers indicate the first or significant use of the notation. page 128 HSS H Self Similar i. page 2 LD Logscale Diagram. page 171 IP Internet Protocol. page 26 c. page 42 PCI Peripheral Component Interconnect. page 20 Mbps Megabit per second PC Personnal Computer. page 42 xxi . page 117 fBm fractional Brownian motion .Principal Notations Where they are given. page 2 BLPP Bartlett Lewis Point Process.

page 5 SNMP Simple Network Management Protocol. x GP (z) Probability generating function of the discrete r. page 29 h(u) conditional intensity function. especially intensity of stationnary N.v. b] = N((a.v. page 126 WWW World Wide Web Mathematical symbols δ (. page 121 SONET Synchronous Optical NETwork.) Dirac delta function IE(x) Expected value of the r. page 26 λF Flow arrival rate. b] N(t) = N(0.27).36).15).PCP Poisson Cluster Process. page 83 λ rate of N. page 81 xxii . page 37 r. page 171 VOQ Virtual Output Queue. x]. see equation (2.v. see equation (2. page 2 TTL Time To Live. see equation (4. page 28 N(A) number of points in A. page 26 λX Mean arrival rate of X(t). page 24 U(x) average number of points in [0. P. see equation (2. b]) number of points in half open interval (a. x ζ (·. page 128 SRD Short Range Dependent. page 88 c(u) covariance density of counts.25). page 14 SLA Service Level Agreement. page 10 TCP Transmission Control Protocol. ·) generalised Riemann Zeta function.v.t) = N((0.t]). page 26 PoS Packet over SONET q-LD qth order Logscale Diagram. page 24 IR = IR1 real line IRd d-dimensionnal Euclidian space Var(x) Variance of the r. random variable RTT Round Trip time. page 28 Traffic modelling parameters λA Mean packet arrival rate within a flow. page 23 N(a.

page 65 [T-Pkt] Flow truncation after the first q packets. page 45 Y (t) Flow arrival process. flow thinning. see equation (6. page 82 P(i) Number of packets in flow i. page 65 [S-Pkt] Flow selection based on volumes. page 70 [A-Perm] Permute flows around the original arrival points. page 62 [P-ConstR] Rescale the packet inter-arrivals within each flow such that the average flow rates are moved to a common value. page 46 Semi-experiments [A-Clus] Flows are translated (without permutation) to begin at the points of a LRD Pois- son cluster process sample path. page 62 [A-Pois] Poisson arrival process with randomised flow re-assignments. see equation (6.2). page 63 [S-Thin] i.Gi (t) Arrival process of packets within flow i. n) System arrival time of packet n on link λi . page 134 g(T ) (T ) TL Estimate of TL from the reporting scheme. page 46 X(t) Packet arrival process. packet arrival times are replaced by a Poisson process of the same rate. page 26 µP Mean number of packets per flow. see equation (6.i. page 136 M (Λ j .d. page 62 [A-Pord] Retain original flow order. but re-position arrival times according to a Poisson process with the same rate.Λ j (L) Minimum excess system transit time for packets of size L from link λi to link Λ j . page 45 tP (k) Arrival time of packet k. page 134 θi Bandwidth of link i. page 62 [S-Dur] Flow selection based on durations. page 46 FP Distribution of number of packets per flow .17). page 45 D(i) Duration of flow i. see equation (6. page 131 τ(λi . page 153 xxiii . m) Packet matching function .1). page 64 [P-Pois] Within each flow separately. page 81 µA Mean packet inter-arrival time within a flow. page 26 tF (i) Arrival time of flow i.4). page 69 Router modelling parameters ∆λi . page 64 [P-ScaledR] Uniform rescale the packet inter-arrivals within each flow. page 64 [P-Uni] In each flow the first and last packet remain unchanged while the others are uni- formly distributed.

14). n) DAG timestamp of the nth packet on link λi . page 151 (T ) TL Approximation of TL with triangular busy periods. page 135 t(λi . page 133 TL Mean length of time during which packet delays are larger than L.Λ j (m) Through-system delay experienced by packet m.15). see equation (6. see equation (6.3). see equation (6.dλi . page 151 xxiv .
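Some of the semi-experiment manipulations listed in the notations above are simple enough to sketch in code. The following Python fragment is only an illustration under stated assumptions, not the procedure used in the thesis: a flow is represented as a sorted list of packet arrival times, [P-Uni] is read as redrawing the interior packets independently and uniformly at random between the first and last packet, and the function names `p_uni`, `t_pkt` and `s_thin` are hypothetical.

```python
import random


def p_uni(flow, rng):
    """[P-Uni]: keep the first and last packet, redraw the others uniformly.

    Assumption: interior arrival times are drawn independently and
    uniformly at random between the first and last packet of the flow.
    """
    if len(flow) <= 2:
        return list(flow)
    first, last = flow[0], flow[-1]
    inner = sorted(rng.uniform(first, last) for _ in range(len(flow) - 2))
    return [first] + inner + [last]


def t_pkt(flow, q):
    """[T-Pkt]: truncate a flow after its first q packets."""
    return list(flow[:q])


def s_thin(flows, keep_prob, rng):
    """[S-Thin]: i.i.d. flow thinning, each flow kept with probability keep_prob."""
    return [flow for flow in flows if rng.random() < keep_prob]


rng = random.Random(1)  # fixed seed so the sketch is repeatable
flows = [[0.0, 0.4, 1.3, 2.0], [5.0, 5.1], [7.2]]  # toy arrival times (assumed seconds)
reshuffled = [p_uni(f, rng) for f in flows]  # flow boundaries preserved
truncated = [t_pkt(f, 2) for f in flows]     # at most 2 packets per flow
thinned = s_thin(flows, 0.5, rng)            # random subset of whole flows
```

Each manipulation changes one aspect of the trace (in-flow packet positions, flow volume, or the set of flows) while leaving the others untouched, which is what allows a single feature of the traffic to be isolated.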

soon followed by the first email protocol.1 The Internet 1. a set of interconnected computers where one could quickly access data from any site. It has since been experiencing tremendous growth. its use was limited to universities and corporate research departments. with his “Galactic Network”. the first computer network.1 History and fundamentals The Internet refers to the global information system that is logically linked together by a globally unique address space based on the Internet Protocol (IP) or its subsequent ex- tensions [58].I. This was in sharp contrast to the tradi- tional circuit switching theory used in telephone networks for instance. the first host to host protocol [108]. Until the advent of the World Wide Web in 1990. He explained how information between computers could be exchanged by first breaking the data into packets. the Advanced Research Project Agency Network (ARPANET) was put in place in 1972. The networking concept is said to have been first envisioned in 1962 by J.T. it is quickly retransmitted from the source. After the development of the Network Control Protocol (NCP). R. Kleinrock published the first paper on packet switching theory in 1961 [96]. On the theoretical front. Lick- lider of M. and no internal changes are required before being connected to the Internet • Communications are on a best effort basis.1. It is based on the following four rules [108]: • Each distinct network has to stand on its own.Chapter 1 Introduction 1. C. If a packet does not reach its destination. L. and then transmitting packets independently of each other. The fundamental design principle of the Internet is the idea of open-architecture net- working. where a dedicated link with a given bandwidth is established between users for the entire duration of the con- nection. 1 .

• Black boxes (later called gateways and routers) are used to connect the networks. No information is retained by the gateways about individual packets passing through them, keeping them simple and avoiding complicated adaptation and recovery from various failure modes.

• There is no global control at the operations level.

These rules led to the development of the first communication protocol of the Internet: the Transmission Control Protocol/Internet Protocol (TCP/IP). It was later divided into two separate entities: the IP protocol, which provides packet addressing and forwarding, and the TCP protocol, which provides reliability and flow control on top of the best effort service of IP. Another transmission protocol, called the User Datagram Protocol (UDP), was added to provide direct access to IP services, without TCP reliability.

During the 1970s and 1980s, the Internet grew with funding agencies and private capital. Commercial Internet Service Providers (ISP) progressively replaced government bodies as the developers of new links, to reach 50000 networks by 1995. In the meantime, link speeds had increased from 56 Kbps to 45 Mbps. Today’s Internet no longer solely accommodates a small research community, but has become both a major economic tool supporting business critical applications and a widely used medium enjoyed by a growing portion of the population. The Internet has been growing exponentially for the past 10 years. In [145] the growth of Internet traffic from a single site between 1991 and 1994 was examined, and the total amount of traffic originating from that site was found to grow at a rate of 120% per year. It was also shown in [38] that the total amount of data carried across the Internet doubled every year for the period between 1998 and 2001.

1.1.2 Organization

At the highest level, the Internet is formed by the interconnection of different Autonomous Systems (AS), i.e. groups of routers administered with a single routing policy and single technical administration. These are usually owned by large ISPs or large corporations. Each level of interconnection is called a Tier. A Tier-1 ISP is typically an ISP with direct access to the global Internet routing tables and that does not purchase bandwidth from other providers [12]. Tier-1 networks are often referred to as backbone networks. The second Tier is composed of smaller ISPs with a national presence that often lease part of their network from Tier-1 providers. Tier-3 ISPs are local providers with no national backbone. Current backbone networks are made of IP routers connected together with optical fiber links, typically OC-48 (2.4 Gbps) and OC-192 (9.9 Gbps) as of 2003. Figure 1.1 illustrates

the IP North American backbone network of a Tier-1 ISP named Sprint [165]. This type of network is usually provisioned in such a way that no link is used at more than 50% of its capacity and virtually no packet is lost. This policy allows for instant re-routing of traffic in case of a link failure with minimum down time. This situation can only occur in a context where supplementary bandwidth is available and cheap. On the other hand, an Internet access point, for instance between a Tier-2 and a Tier-1 ISP, or between an end user and their local ISP, is often used to its full capacity since one tries to use all the bandwidth one pays for. Similarly, networks where the bandwidth is fundamentally limited, such as satellite or wireless networks, are often used to their full capacity. In such situations, packets can be lost when the buffers of switching elements fill up.

Figure 1.1: Sprint North American network as of 2003.

1.2 Philosophy and aims of this thesis

The Internet is a highly complex system which provides a wide scope of research topics, such as network topology, routing policies or link dimensioning. In this thesis we narrow the scope of our research to the characterization and understanding of packet streams through a single router. In essence, the following events, illustrated in figure 1.2(a), take place at each router:

packets enter the router, are then routed to the appropriate output, and exit the router. In parallel, the router can also generate traffic statistics about the packets that cross it.

Figure 1.2: (a) Schematic of router mechanisms. (b) Measuring, modelling and understanding Internet traffic.

In particular, we will focus on the following three questions:

(i) How to characterize the traffic entering a router?

(ii) How to sample packet traffic?

(iii) What happens to packets inside a router?

These are in a nutshell the problems this thesis addresses. We will refer to these as question (i), question (ii) and question (iii) throughout this thesis. Although question (i) has received a lot of attention in the field of teletraffic engineering, we believe that there is still a lot of work to be done to fully answer it. On the other hand question (ii) has only very recently become a research topic, while question (iii) has received very little empirical treatment due to the fact that it relies on technically challenging measurements.

We emphasize that this work is not overly concerned with traditional problems such as link dimensioning and the associated queuing models, and has therefore a rather different orientation than most teletraffic studies. In particular we do not seek yet another traffic model amenable to simple queuing analysis, nor do we place ourselves in theoretically ‘interesting’ situations where buffers of switching elements are almost always full. From section 1.1.2, we assume that bottlenecks are located at the network edges and that no packet loss occurs in the part of the network we focus on. In other words, we are not concerned with the details of the transport protocols TCP and UDP. Instead, for each of the above mentioned questions, our philosophy throughout this work is to start from empirical measurements of Internet traffic. This is a rather obvious

point from an experimental science perspective, but has proved to be overlooked by researchers in the field. In fact, a vast body of literature on Internet traffic studies is based entirely on simulations, with most of the time very little connection with the real world. For instance, numerous studies on TCP mechanisms have been carried out with very strong assumptions, such as infinite sources or very small networks. While such studies might bring some interesting mathematical problems, they do not describe the ‘real world’ Internet. Our view is that one cannot even simulate a network without a proper knowledge of realistic traffic behaviour. Such knowledge can only be obtained from traffic measurements.

This is why the first step of our analysis is to measure what really happens in a network. This gives us different time series that we then analyze and model. What makes Internet analysis so interesting is the fact that these measured time-series contain a lot of information, such as packet source and destination addresses, which makes a thorough understanding of the underlying mechanisms possible. Moreover, the very large range of time scales available, from micro seconds up to hours or days, is a very valuable asset when compared to other experimental fields where time series can have at most a few thousand points.

In other words, we use the methodology described in the title of this thesis: measure, model and understand, also illustrated in figure 1.2(b). In particular, the three actions measure, understand and model are very much linked together and will be used together throughout this thesis. From the measurements, one can follow an anti-clockwise path, and use a model based on empirical measurements to get more insight and understanding on the data (see section 4.5 for instance). A model can also potentially be used to refine the measurement stage and focus on a particular feature of the data. On the other hand, we might first start by getting a better understanding of our measurements, in order to select the part that we really want to model (see section 3.4 for instance), and then do the modelling work.

In the rest of this introductory chapter, we will give some background information on teletraffic engineering in section 1.3, and present in section 1.4 a summary of Internet traffic modelling work relevant to question (i). The bibliographies relevant to questions (ii) and (iii) will be presented in later chapters. We conclude this introductory chapter with a summary of the main contributions of this thesis and an overview of the chapters to follow in section 1.5.

1.3 Teletraffic engineering

In a fierce competition to attract new customers, ISPs have not only to face difficult investment decisions, but also technical challenges to provide the best possible service. The Quality of Service (QoS) is often specified in a Service Level Agreement (SLA), and includes packet delays through the ISP network, loss rate and availability. SLAs are most of

the time established with ‘rules of thumb’ but could benefit from a more scientific approach known as teletraffic engineering.

1.3.1 Definition

Teletraffic is concerned with the control and transport of information within telecommunications networks. Teletraffic engineering is a well established field that encompasses modelling of telecommunication systems, performance evaluation, resource dimensioning, forecasting and resource management. It was first developed for circuit switched networks, such as telephone networks, and then extended to encompass packet switched networks and the Internet.

Telephone networks are usually dimensioned by using the Erlang model [55, 116], where deriving dimensioning guidelines is quite straightforward once a reasonable estimate of the traffic level has been obtained. In such models, call arrivals are described by Markovian models well amenable to queuing analysis. This is why the first Internet traffic studies also used Markovian models, such as the Markov Modulated Poisson Process (MMPP) [76]. However, when applied to packet switched networks, these models always gave better performance prediction than what could be observed in practice and were therefore unsatisfactory [147].

In fact, telephone and data networks are fundamentally different. Different resource requirements and different user behaviours mean very different traffic characteristics. A telephone voice call requires a very strict QoS to ensure that the user will perceive a satisfactory voice quality. Such QoS is met by reserving a constant bandwidth of 64 kbps through the telephone network for the entire call length. This is why such networks are called circuit switched networks. On the other hand, data networks only need some minimal long term bandwidth to achieve a satisfactory perceptual quality for the user. Data applications do not send traffic with a constant bandwidth, but rather in bursts. Such data traffic is often referred to as “bursty”. If one were to use a circuit switched approach to dimension a network in this case, this would lead to enormous wastes of bandwidth. For such traffic, a packet switched network is much more appropriate since it allows resource requirements to vary over time by sending data in small packets.

1.3.2 Traffic modelling

An important aspect of teletraffic engineering is the mathematical modelling of the observed traffic statistics. Given the highly complex nature of the mechanisms involved in telecommunications networks and the random aspect of user behaviour, a stochastic analysis of the

data collected is often preferred to a dynamical systems approach. For modelling purposes, we will distinguish between two types of models that we call respectively black box models and physical models. On the one hand a black box approach will aim at blindly reproducing the statistics of the data, with a potentially very high number of parameters with no obvious meaning. On the other hand a physical model goes beyond the black box approach by trying to model the physical causes of the observed statistics. It will therefore aim at giving a physical interpretation to the parameters of the model in networking terms. In that case the mathematical tractability of the model is heavily sought after because it will bring some extra insight into the data and allow one to predict the evolution of its statistics as a function of meaningful parameters. When mathematical tractability cannot be achieved, numerical simulations can still be used to compare the model with the data. From there, either the model is judged acceptable or it is modified until it leads to satisfactory results. Once a model is found, it can be used to simulate the network under different conditions, evaluate the performance of the current network, or help dimension the network, to achieve a given quality of service.

A turning point in traffic engineering was the discovery that Internet traffic was richer than simple Markovian descriptions, and exhibited self-similarity and long-range dependence (LRD) properties¹ over time scales larger than roughly 1s. This scaling behaviour, or scale invariance, in packet data was first observed in the seminal work of Leland et al. [109] for Ethernet Local Area Network (LAN) traffic, and then for Variable Bit Rate (VBR) video [21], ATM cell traffic [93], and Wide Area Network (WAN) traffic [147]. This brought the ‘fractal’ buzz word, and led to a renewed interest in traffic modelling. In particular, it gave a plausible explanation to the discrepancy in queuing performance observed between real data and Markovian models [56, 135, 137].

On time scales smaller than 1s, Riedi and Véhel [150] [171] showed that wide area traffic was consistent with multifractal behaviour. Based on these findings, Feldmann et al. [60] speculated that IP networks appear to act as conservative cascades and are consistent with multifractal scaling. They observed that the packet arrivals patterns were consistent with a multiplicative structure due to Transmission Control Protocol (TCP) feedback control mechanisms [61, 71].

In order to answer the first question of this thesis, we seek to understand the physical reasons behind these observed traffic statistics at small and large time scales, and investigate whether the scaling behaviours described above are genuine or not. Throughout this thesis we will focus on rather overprovisioned links, such as the ones found in the Internet backbone. It turns out that the TCP control loop does not really play much of a role, and there are no significant TCP induced interactions between flows.

¹ Definitions of these mathematical notions are presented in chapter 2 page 19.

In the following,

we consider any measured data as a sample path of an underlying stochastic process. We also typically assume that the process is stationary and ergodic, so that its statistics can be captured by its sample path. From a modelling perspective, modelling data by a mathematical process sometimes reduces to a philosophical issue of modelling choice. In practice, there may always be some sense in which a given model is correct, or rather, useful and/or appropriate over some scale range. We illustrate this point with two examples corresponding respectively to large and small time scales.

Large time scales

In the physical world, many systems exhibit a property of slowly decaying correlation function referred to as Long Range Dependence (LRD). Some of the better known examples were presented in hydrology by Hurst [89]. This phenomenon has also been found for instance in astronomical and biological systems [19]. However, it is impossible to decide whether a timeseries exhibits random fluctuations at the time-scale over which the data is observed (consistent with LRD) or is simply non stationary. It is a sample size problem: one sample does not give enough information about the fluctuations at this time scale [68]. Another problem that arises when dealing with LRD data is that being a limiting behaviour over large time scales in a mathematical sense, LRD cannot be truly observed over a finite time interval. This means in fact that one can always create a multi-state Markov model that will have the same statistical behaviour as the observed data. This has led many researchers to argue on the LRD properties of empirical Internet traffic data [22, 53, 72, 98, 132]. For instance, when one is interested in the performance of a buffer with a fixed constant size, one only needs to model the temporal correlations of the traffic over the time scales corresponding to the buffer length [73, 97]. Therefore a short-range dependent (SRD) process might be a good description of the data for this purpose, even if the traffic is LRD.

Given that there exists a plausible physical explanation for such LRD behaviour in Internet traffic, as detailed in section 1.4.2, our view is that the observed traffic is genuinely LRD, and exhibits a scaling behaviour over large time scales, as described in [21, 109, 147]. However, a black box Markovian model will fail to reproduce the data statistics over any length of time larger than the time period it was fitted on. On the other hand, if a physical model is well built, its parameters could be fitted over a given time interval, and the model should be able to reproduce the data statistics over larger time intervals.

Small time scales

Long Range Dependence is a characteristic of large time scales, and does not specify any particular behaviour for Internet traffic at small time scales. Over small time scales the traffic has non Gaussian marginals and its description therefore requires more than simple second order statistics, for instance with multifractal models. However, the interpretation of traffic behaviour over small time scales is subject to discussion. First, the set of available statistical tools is not powerful enough to clarify all the related issues. Improvements are needed in their performance, the knowledge of their performance under different conditions, and important capabilities such as hypothesis tests are absent. Second, a widely accepted physical explanation for this potential multifractal behaviour has yet to be found. Indeed, as will become clear in this thesis, the change of scale means a change in the objects studied, from groups of packets transmitting a given file to individual packets. As will become clear in later chapters, one could therefore argue that Internet traffic is not truly multifractal but instead exhibits a pseudo scaling behaviour over small time scales. This shows again that the choice of a model is difficult and depends strongly on the modelling aims.

1.4 Internet traffic models

In this thesis, we do not attempt to give a semantic description of Internet traffic, which would be based on the actual content of transmitted files. Instead we are interested in modelling Internet traffic timeseries such as number of IP packets or bytes observed in time intervals of a given size. In this section we present a summary of some of the most significant modelling work in this area.

Most of the models for packet switched traffic tend to focus on the network layer, without any knowledge of the higher layers in the protocol stack. This means for instance that congestion control and flow control mechanisms that may be available at the transport layer are largely ignored. These models are therefore ‘open loop’ models, as opposed to ‘closed loop’ models that would take into account feedback mechanisms. Some practitioners have concerns with mathematical models that do not account for retransmission resulting from packet losses being detected by higher layers such as TCP, because TCP sources for instance transmit at rates that are dependent upon the level of congestion of the network among other things. Attempts at modelling the TCP protocol represent quite a large body of literature, including [65, 74, 139]. Although closed loop models are certainly closer to the real Internet traffic given that such feedback mechanisms exist in practice, we believe that one should

start by getting the best possible understanding from simple open loop models before taking into account any feedback mechanism. A lot of work remains to be done in this area.

In the following we are interested in modelling the ‘aggregate’ traffic observed on a link. We present black box models in section 1.4.1 and physical models in section 1.4.2. All the fundamental mathematical concepts used in this thesis are gathered in chapter 2. More specific concepts are briefly introduced in the text when needed.

1.4.1 Black box traffic models

Markov Modulated Models

The first traffic models proposed for packet switched networks were based on Markovian processes and largely inspired from traffic models used in telephone networks. A brief summary of the main models is presented here, while a detailed presentation can be found for instance in [160]. The models vary according to their continuous or discrete nature, and whether or not batch arrivals of packets are permitted. The most common are

• Markov Modulated Poisson Process (MMPP): continuous time with single arrivals [76, 133, 183, 184]

• Batch Markovian Arrival Process (BMAP): continuous time with batch arrivals [94, 134]

• Discrete-time Batch Markovian Arrival Process (D-BMAP): discrete-time Markovian process with batch arrivals [24]

• Markov Modulated Bernoulli Process (MMBP): discrete-time Markovian process with single arrivals [24]

Markov modulated processes are specified by the transition probabilities of the embedded Markov chain and the arrival rate at each state. Different correlation patterns can be obtained by setting different values for the above quantities. These models have been widely used due to their relative simplicity and mathematical tractability. However, their fundamental drawback is that these are all by definition Short Range Dependent (SRD) due to the finite number of states in the Markov chain. They have nonetheless been used to approximate LRD processes. It was shown in [15] that a mixture of N two state MMPPs can match the correlation structure of an LRD process across a range of time scales. By increasing N, this matching is possible for an arbitrarily large range of time scales, meaning that although this process is not strictly LRD, it can be used to model LRD for all practical purposes. This mixture of N MMPPs converges to a fractional Brownian motion in the limit of large N [161].
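As an illustration of how a Markov modulated process is specified by its per-state arrival rates and the transition rates of the modulating chain, the following is a minimal sketch of a two state MMPP simulator. The function name `simulate_mmpp`, its parameters and the numerical values are hypothetical and chosen only for this example; they are not taken from the models cited above.

```python
import random


def simulate_mmpp(arrival_rates, switch_rates, horizon, seed=0):
    """Return the sorted arrival times of a two state MMPP on [0, horizon).

    arrival_rates[s]: Poisson arrival rate while the chain is in state s.
    switch_rates[s]:  rate at which the chain leaves state s (must be > 0).
    All names and values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    t, state, arrivals = 0.0, 0, []
    while True:
        # Arrivals and state changes compete as independent exponentials.
        total = arrival_rates[state] + switch_rates[state]
        t += rng.expovariate(total)
        if t >= horizon:
            return arrivals
        if rng.random() < arrival_rates[state] / total:
            arrivals.append(t)   # the event is a packet arrival
        else:
            state = 1 - state    # the event is a switch to the other state


# A "bursty" source: high rate in state 0, low rate in state 1.
times = simulate_mmpp((50.0, 1.0), (0.5, 0.5), horizon=100.0)
```

With only two states the correlation of the resulting counts decays exponentially, which is the SRD limitation discussed above; mixing several such sources with different switching rates is the kind of construction used to mimic LRD over a finite range of scales.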

Autoregressive models

Autoregressive processes form another popular class of timeseries used in Internet traffic models. The Autoregressive Model of order $p$ is denoted AR($p$) and has the form

\[ X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \phi_3 X_{t-3} + \dots + \phi_p X_{t-p} + b\,\varepsilon_t, \tag{1.1} \]

where $\varepsilon_t$ is white noise, $\phi_i$ are real numbers and $X_t$ is the value of the process at the discrete time $t$. Defining a lag operator $B$ as $X_{t-1} = B X_t$ and the polynomial $\Phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p$, the AR($p$) can be written as

\[ \Phi(B) X_t = b\,\varepsilon_t. \tag{1.2} \]

The autocorrelation $\rho_k$ verifies

\[ \rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2} + \dots + \phi_p \rho_{k-p}, \tag{1.3} \]

and decays exponentially. This means that the AR($p$) model is unable to capture autocorrelation functions that decay at a rate slower than exponential. In particular, it cannot strictly model a LRD process. However, a pseudo self-similar process, i.e. a process that exhibits a self-similar behaviour over a finite range of scales only (see [154] for a definition), can be obtained by mixing AR processes with appropriate coefficients and has been used for traffic modelling [10, 112].

The Autoregressive Moving Average Model of order $(p, q)$ is denoted ARMA($p, q$) and has the form

\[ X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \phi_3 X_{t-3} + \dots + \phi_p X_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_q \varepsilon_{t-q}. \tag{1.4} \]

Defining $\Theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \dots - \theta_q B^q$, this model can be represented as

\[ \Phi(B) X_t = \Theta(B)\,\varepsilon_t. \tag{1.5} \]

The autocorrelation function of the ARMA($p, q$) model can be calculated for all lags $k$. For $k > q$, it is identical to the autocorrelation function of the AR($p$) model, and is therefore unable to represent a process with autocorrelation function decaying at a rate slower than exponential.

The Autoregressive Integrated Moving Average Model of order $(p, d, q)$, denoted ARIMA($p, d, q$), is an extension of the ARMA($p, q$) obtained by allowing the polynomial $\Phi(B)$ to have $d$ roots equal to unity, with the rest of the roots lying outside the unit circle. It has the form

\[ \Psi(B)\,\Delta^d X_t = \Theta(B)\,\varepsilon_t, \tag{1.6} \]

where $\Delta$ is the difference operator defined by $\Delta X_t = X_t - X_{t-1} = (1 - B) X_t$. When $d$ is an integer, the ARIMA($p, d, q$) is a strictly SRD process. It can be extended to the Fractional ARIMA (FARIMA) process by taking $0 < d < 0.5$. More specifically, the FARIMA($0, d, 0$) process is a stationary process with autocorrelation function given by [87]:

\[ \rho(k) = \frac{\Gamma(1-d)\,\Gamma(k+d)}{\Gamma(d)\,\Gamma(k+1-d)} \sim \frac{\Gamma(1-d)}{\Gamma(d)}\, k^{2d-1} \quad \text{as } k \to +\infty. \tag{1.7} \]

This is a long-range dependent process with Hurst parameter $H = d + 0.5$. FARIMA processes can therefore be used successfully to model LRD Internet traffic [21]. Techniques for fitting the parameters of a FARIMA process to measured traffic were proposed in [180]. Simulation results concerning the queuing performance of a FARIMA($1, d, 0$) process were given in [8]. A fast generation method for FARIMA processes can be found in [117]. FARIMA processes have also been used to compare different techniques of estimation of the Hurst parameter [168].

Fractional Brownian motion

Fractional Brownian motion (fBm) has proved very popular as a model of Internet traffic because it is a simple model that exhibits LRD and is amenable to analysis. It is the only Gaussian self-similar process with stationary increments². A large number of queuing results have been derived for fBm traffic: large deviation results in very large buffers [54], queue length asymptotics using the Fourier decomposition of fBm [129], or estimates for the queuing behaviour of fBm processes [122]. Norros gave an expression for the lower bound of queuing performance of a fractional Gaussian noise process [135], and presented techniques for the use of fBm in the modelling of telecommunications networks [136]. The fBm is also of special interest as a limiting case for many other models of LRD traffic. For instance Brichet et al. [26] showed that the superposition of N fluid ON/OFF sources with heavy tailed ON and/or OFF times converges to an LRD Gaussian process as N tends to infinity, and gave a relationship between the queuing behaviour of this limiting process and that of fBm.

² A formal definition of self-similarity and fBm can be found in section 2.2.1 page 19.

Point process models

A practical way to generate a LRD process is to use a doubly stochastic process, for instance a compound Poisson process [118, 119]. In this case, the LRD behaviour can be introduced in the intensity process $\lambda(t)$ by a power law shot noise. For instance, let us define a power

law shot noise $\lambda(t)$ by

\[ \lambda(t) = \sum_n h(K_n, t - u_n), \tag{1.8} \]

\[ h(K, t) = \begin{cases} K\,t^{-\beta} & 0 < A \le t < B, \\ 0 & \text{otherwise,} \end{cases} \tag{1.9} \]

where the $\{u_n\}$ stand for arrival times drawn from a homogeneous Poisson process and the $\{K_n\}$ for a set of independent and identically distributed random variables representing amplitude. The process $\lambda(t)$ will be long range dependent, with exponent $\alpha = 2(1 - \beta)$, when $B$ is infinite and $1/2 < \beta < 1$. More details about Long Range Dependent doubly stochastic point processes can be found in [158] and references therein.

Other models

There is a large variety of mathematical constructions that have been used to describe Internet traffic in addition to the above mentioned examples. For instance the authors in [144] suggested the M/G/∞ input process as a viable model for network traffic. The Ornstein Uhlenbeck process, inspired by physics theory and the Langevin equation, has also proved to be an interesting approach [99, 162].

Moreover, models have been used to improve on some of the major drawbacks of fBm. First, fBm models have Gaussian marginals and therefore cannot guarantee positive marginals, while the processes they seek to model have inherently positive marginals. Wavelet models have been developed [152] which have positive marginals and reproduce the scaling of Internet data. Second, fBm models only describe the traffic at large time scales. This is why more elaborate models based for instance on Infinitely Divisible Cascades (IDC) have been introduced to link large and small scale behaviours in a single model [156, 172].

1.4.2 Physical models

In this section we describe modelling work based on physical models, i.e. models for which the parameters can be related to a networking cause.

ON/OFF processes

An ON/OFF process is a process that can take only two values that define its ON and OFF state. ON and OFF periods are all mutually independent and identically distributed (i.i.d.). It was first proposed by Mandelbrot in an economics context [121]. This process has been very popular with the traffic modelling community because of its underlying physical meaning.
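To make the power law shot noise of equations (1.8) and (1.9) concrete, the following is a minimal sketch that evaluates the intensity $\lambda(t)$ for a given set of arrival times and amplitudes. The function names (`impulse_response`, `shot_noise_intensity`, `poisson_arrivals`) and all numerical values are assumptions made for illustration only.

```python
import math
import random


def impulse_response(K, t, beta, A, B=math.inf):
    """h(K, t) of equation (1.9): K * t**(-beta) on [A, B), zero elsewhere."""
    return K * t ** (-beta) if A <= t < B else 0.0


def shot_noise_intensity(t, arrivals, amplitudes, beta, A, B=math.inf):
    """lambda(t) of equation (1.8): sum of the responses to all past shots."""
    return sum(impulse_response(K, t - u, beta, A, B)
               for u, K in zip(arrivals, amplitudes))


def poisson_arrivals(rate, horizon, rng):
    """Arrival times of a homogeneous Poisson process on [0, horizon)."""
    times, t = [], rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


rng = random.Random(1)
u = poisson_arrivals(rate=2.0, horizon=50.0, rng=rng)
K = [1.0 for _ in u]  # unit amplitudes, an arbitrary choice for this sketch
lam = shot_noise_intensity(50.0, u, K, beta=0.75, A=0.01)
```

Here $B$ is left infinite and $\beta = 0.75$ lies in the range $1/2 < \beta < 1$ for which the text states that $\lambda(t)$ is long range dependent. Shots that occurred less than $A$ before $t$, or after $t$, contribute nothing, as required by (1.9).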

Assume that aggregate traffic is composed of many independent sources, and that each source is of the ON/OFF type. Each ON period corresponds to the transmission of a file, and during an ON period the source transmits with a certain ‘rate’ or ‘reward’ (one ON period could correspond to a flow³ of IP packets transmitting a given file). ON and OFF period durations can follow different distributions. If the distribution of the ON and/or OFF duration is heavy tailed, then the process is LRD [75]. The superposition of such processes is also LRD [115, 170, 178]. It was shown in [170] how the self-similar characteristics of the aggregate traffic are intrinsically linked to the heavy tailed nature of the ON/OFF duration distributions.

What can be said about the normalized limit process in the limit of a large number of sources and/or large time scales? Depending on the distribution of the ON and OFF period duration, the traffic can be SRD, with Markovian characteristics, or LRD, i.e. non Markovian. Depending on the nature of the rate distribution function, the aggregate traffic can have ‘mild’ marginals with finite variance or ‘wild’ marginals with infinite variance. This illustrates two different kinds of variability: marginals and time correlation, and defines different levels of traffic ‘burstiness’.

This superposition of ON/OFF processes with heavy tailed durations, or models that tend to it, is the most widely accepted mechanism to explain LRD in Internet traffic. This model is constructive because it can be related to an underlying network cause: the sizes of the files transmitted on the Internet have a heavy tailed distribution [43]. If the transmission rate is constant, the durations of the ON periods will also be heavy tailed [143].

³ An exact definition of a flow will be given in section 3.2.3.

Modelling TCP behaviour

The study of TCP performance can be done at three different levels: experimentation on a real network, simulation using emulating software such as ns [106], or analytical modelling. TCP modelling can be used for instance to determine the factors impacting the performance of the protocol or to devise new congestion control algorithms. In the case of a drop-tail buffer and synchronized flows, the throughput of a TCP connection is inversely proportional to the square of the average Round Trip Time (RTT) [102]. Models have also shown that dropping packets randomly in network routers as with active queues (e.g. Random Early Detection (RED) [66]) improves the fairness of TCP by making the throughput inversely proportional to the average RTT [13, 139]. However, recent measurement work on the topic has shown that RED might have a negative impact on the performance of a network [123]. Most studies on TCP modelling assume long transfers and focus on the congestion

avoidance mode of TCP, ignoring the short duration of the slow start mode [13, 102, 127, 128]. In [65] the authors first showed how a Markovian model could describe the correlation structure of both the exponential back-off and congestion avoidance phases of TCP, and showed that in the case of high losses, the TCP congestion control algorithm generates traffic with heavy-tailed OFF periods. In [74], the authors also modelled short TCP connections using a simple Markov chain. Veres and Boda [175] showed that under severe network conditions, the TCP congestion control protocol shows chaotic nature and generates self-similar behaviour. In [176] they showed that TCP preserves the LRD created at the application layer. Indeed, they showed that, when a TCP connection is mixed with self similar traffic in a bottleneck buffer, it takes on the second order properties of that traffic and can therefore propagate them to other parts of the network. They also noted that HTTP/1.1 should adapt better to changing traffic fluctuations and therefore improve the propagation of the self similar behaviour.

Other researchers have tried to give a physical explanation to the apparent scaling behaviour observed at small time scales. An obvious candidate for such statistics is the feedback dynamics of TCP. Baccelli and Huong [16] observed that the sharing of a bottleneck router by several long lived TCP connections could be reduced to products of random matrices and showed that the ‘Additive Increase, Multiplicative Decrease’ of the TCP congestion control could lead to self-similar behaviour, but could not (yet) account for a multifractal scaling behaviour. In [159] the authors introduced the notion of alpha and beta traffic: in short, on time scales around 50-500ms, a burst in IP traffic was usually dominated by a single high-rate flow. Such flows were called alpha and the remainder the beta traffic. The alpha traffic is extremely bursty and concerns only a very small proportion of flows.

1.5 Contributions and thesis outline

In this thesis, we concentrate exclusively on so called physical models because our aim is to gain a better understanding of how IP packets flow through a network. Contrary to most of the studies presented in section 1.4 where so called fluid models were used, we seek to model individual packets and will therefore use point process models.

1.5.1 Contributions

This thesis makes the following main contributions:

• We base all our analysis on empirical measurements of Internet traffic, so that no

assumption is made on the traffic characteristics.
• We use extensively a technique we call semi-experiments to make informed decisions on what aspects of the data have the most impact and should be modelled. In particular, we explain why the flow arrival process has little influence on the packet arrival process for current backbone Internet traffic.
• We present a physical model of packet arrival times based on our empirical observations. It is a Bartlett-Lewis point process which can reproduce packet statistics for time scales larger than 10ms.
• We show that current practices to report sampled traffic information in routers can be improved by using a flow sampling technique.
• We present a simple model of a router based on our empirical results and give a thorough understanding of single hop packet delays. We present the first empirical results of a fully instrumented router.
• We present a new technique to report packet delay information in routers based on busy period statistics.
• We show that the Bartlett-Lewis point process can model the splitting and merging of packet streams through a router.

1.5.2 Outline

The rest of the thesis is organized as follows:

Chapter 2 provides some mathematical background on scaling processes, point process theory and our primary statistical tool: the wavelet analysis. In particular we formally introduce the fundamental notions of Long-Range Dependence and wavelet based scaling estimators, and give results on point processes spectral theory used in chapters 4 and 5.

Chapter 3 presents the empirical evidence upon which our modelling effort is based. We first describe the passive measurements used in the thesis. We then identify the networking causes of the observed packet trace statistics by selectively modifying several of the components comprising the full packet stream. We call this way of investigating our empirical data the semi-experimental method. Our fundamental result is that for the purpose of modelling the overall process of IP packets, flows can be treated as statistically independent. Last we study in more detail how the flow arrival process could influence the second order properties of the packet arrival process should certain circumstances be met.

In chapter 4 we build a packet arrival model based on the empirical findings of chapter 3. We show how a particular type of Poisson cluster process known as Bartlett-Lewis point process can very accurately model the packet arrival process at time scales larger than back-to-back packet arrivals. It only uses a small number of physically meaningful parameters and provides insight on the role of each of these parameters in the overall traffic statistics.

In chapter 5 we study the problem of traffic sampling, a very timely problem given the ever increasing link speeds. We compare the performance of packet and flow sampling techniques, and advocate the use of flow sampling for many purposes. We also derive sampling results for our traffic model and show how the model parameters can be fitted from sampled data.

In chapter 6 we study Internet traffic on smaller time scales than what was done in chapter 4 by focusing on queuing mechanisms inside a router. We present the first empirical results on a fully monitored router. We use these results to build a mathematical model of a store and forward router, and show how packet delays through the router can be very accurately predicted by our model. We also propose a method to directly report router performance information based on busy period statistics.

In chapter 7 we use the results from all the previous chapters to present a global validation of our traffic model at a network node. We first validate the main assumptions of the model on a very large amount of empirical data and then show that the model can account for the splitting and merging of packet streams in a router.

Last, in chapter 8, we summarize our main contributions and propose possible topics for future research.

1.6 How to read this thesis

The mathematical background of chapter 2 is presented for completeness only. Its full understanding is not a prerequisite to the rest of the thesis. However the reader should at least become familiar with the Logscale Diagram statistical tool because it is used throughout. Chapters 3 and 4 should be read in succession since our modelling work of chapter 4 relies heavily on the empirical findings presented in chapter 3. Chapters 5 and 6 are fairly self contained and could potentially be read independently of the rest. Chapter 7 builds on all the previous chapters and should therefore be read last.

In order to ease the reading and understanding of the thesis, a summary of the principal notations is provided (page xxi), as well as an index (page 173). In the bibliography section, page numbers where each reference is cited are given.


Chapter 2
Mathematical background

2.1 Introduction

This chapter presents some mathematical background used throughout the thesis. In particular we give rigorous definitions of concepts mentioned in chapter 1, such as self-similar and long-range dependent processes. First the notion of scaling behaviour is formally introduced. Second, results on point process theory are given. Last, the discrete wavelet transform and its use for statistical estimation of scaling processes are discussed.

2.2 Self-similarity and other scaling behaviours

2.2.1 Self-similarity

Definition 2.2.1. A stochastic process $X(t)$ is said to be H Self Similar (HSS) with stationary increments if it has stationary increments and if the following equality in distributions holds for all scales:

$$X(at) \overset{d}{=} a^{H} X(t), \quad \text{for all } a > 0,\ H \in [0, 1]. \tag{2.1}$$

This definition means that one cannot distinguish between the statistics of the process and an affinely dilated version of the process. There is therefore no reference scale. The process is in fact characterized by the relation between scales governed by the parameter $H$, known as the Hurst parameter. A consequence of equation (2.1) is that the moments of a HSS process (if they exist) behave as power laws of time:

$$\mathbb{E}|X(t)|^{q} = \mathbb{E}|X(1)|^{q}\,|t|^{qH}. \tag{2.2}$$

A classic example of HSS process is the fractional Brownian motion:

Definition 2.2.2. The fractional Brownian motion (fBm) $B_H(t)$ is defined as the centered Gaussian process with variance $\sigma^2$ and covariance

$$\mathbb{E}\big(B_H(s)B_H(t)\big) = \frac{\sigma^{2}}{2}\left(s^{2H} + t^{2H} - |t-s|^{2H}\right). \tag{2.3}$$

It is the only Gaussian self-similar process with stationary increments. While a process $X(t)$ satisfying equation (2.1) cannot be stationary (this would involve $X(at) \overset{d}{=} X(t)$), it is assumed in the following to have stationary increments.

The property of self-similarity is extremely strong since it has the following two consequences:
1. The scaling behaviour applies at all the scales of a process.
2. The scaling of moments is governed by a single exponent $H$ (equation (2.2)).

This is therefore quite restrictive. In actual data, scaling is often found to hold in the limit of small or large scales, and the scaling of moments can be governed by a collection of exponents instead of a single one. These two restrictive consequences can be respectively alleviated with the properties of Long Range Dependence and Multifractal.

2.2.2 Long-Range Dependence

Long Range Dependence (LRD) models scaling behaviours observed in the limit of large scales [20]. It is defined in terms of second order statistics as follows.

Definition 2.2.3. A stationary stochastic process $X(t)$ is said to be Long Range Dependent (LRD) if its autocorrelation function $\gamma_X(k)$ is characterised by a power-law decrease at large lags:

$$\gamma_X(k) \underset{|k|\to+\infty}{\sim} c_\gamma |k|^{-(1-\alpha)}, \quad \text{with } \alpha \in (0, 1), \tag{2.4}$$

or, equivalently, if its power spectral density $\Gamma_X(\nu)$ has a power law behaviour at frequencies close to the origin:

$$\Gamma_X(\nu) \underset{|\nu|\to 0}{\sim} c_\Gamma |\nu|^{-\alpha}, \quad \text{with } \alpha \in (0, 1). \tag{2.5}$$

The power law decrease of the autocovariance function implies that its sum diverges because the past values are so heavily weighted. LRD models are also called ‘long memory’ processes for this reason. By contrast, the autocovariance function of a ‘short memory’ process, such as the ARMA processes introduced in section 1.4, has an asymptotic exponential decrease and its sum converges.

An example of LRD process is the fractional Gaussian noise (fGn), defined as the increments of fBm, and illustrated in figure 2.1. In fact, for any HSS process $X(t)$ with stationary increments, its increments $Y_\delta(t) = X(t+\delta) - X(t)$ verify

$$\mathbb{E}|Y_\delta(t)|^{q} = \mathbb{E}|X(1)|^{q}\,|\delta|^{qH}. \tag{2.6}$$
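The link between fBm and long memory can be checked numerically. The sketch below is an illustration only, not part of the thesis: it uses the standard closed form of the fGn autocovariance at integer lags with unit increment, which follows from the fBm covariance, and compares it with a power law in the lag. The function names `fgn_autocov` and `power_law_ratio` are hypothetical.

```python
def fgn_autocov(k, hurst, sigma2=1.0):
    """Autocovariance of unit-step fGn at integer lag k.

    Obtained by expanding E[(B(t+1)-B(t)) (B(t+k+1)-B(t+k))] with the
    fBm covariance; this is the standard closed form.
    """
    k = abs(k)
    return 0.5 * sigma2 * (
        abs(k + 1) ** (2 * hurst)
        - 2 * abs(k) ** (2 * hurst)
        + abs(k - 1) ** (2 * hurst)
    )


def power_law_ratio(k, hurst):
    """Ratio of the exact autocovariance to H(2H-1) k^(2H-2).

    A ratio close to 1 at large k indicates a power-law decay with
    exponent 2H-2, i.e. alpha = 2H-1 in the LRD definition.
    """
    return fgn_autocov(k, hurst) / (hurst * (2 * hurst - 1) * k ** (2 * hurst - 2))


if __name__ == "__main__":
    for lag in (10, 100, 1000):
        print(lag, power_law_ratio(lag, 0.8))
```

For H = 0.5 the autocovariance vanishes at every non-zero lag, which is the memoryless case; for 0.5 < H < 1 the ratio tends to 1, so the autocovariance is not summable.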

Figure 2.1: Illustration of the scale invariance phenomenon for fGn with H=0.7.

It can be shown that the autocovariance of the increments has the following asymptotic behaviour

$$\gamma_{Y_\delta}(s) \underset{s \gg \delta}{\sim} \mathbb{E}|X(1)|^{2}\, H(2H-1)\, s^{2(H-1)}. \tag{2.7}$$

From equation (2.4), this means that when $0.5 < H < 1$, $Y_\delta(t)$ is a LRD process with

$$\alpha = 2H - 1. \tag{2.8}$$

Self-similarity and long-range dependence are often practically studied with an aggregation technique, more suited for time series analysis:

Definition 2.2.4. Let $Y = \{Y_i\}$ be a stationary sequence and

$$Y^{(m)}(k) = \frac{1}{m}\sum_{i=(k-1)m+1}^{km} Y(i), \quad k = 1, 2, \ldots, \tag{2.9}$$

be the corresponding m-aggregated sequence. If $Y$ is the increment process of a self-similar process defined in (2.1), then for all integer $m$,

$$Y \overset{d}{=} m^{(1-H)}\, Y^{(m)}. \tag{2.10}$$

A stationary sequence $Y = \{Y_i\}$ is said to be exactly self-similar if it satisfies equation (2.10) for all aggregation levels $m$. A stationary sequence $Y = \{Y_i\}$ is said to be asymptotically self-similar or Long Range Dependent if equation (2.10) holds as $m$ tends to infinity.

As already mentioned in section 1.4, there are many ways to generate LRD processes, such as sum of ON/OFF processes with heavy tailed ON and/or OFF periods [170], filtering of fractional ARIMA processes [20] or use of doubly stochastic point processes [118].
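The aggregation technique lends itself to a simple numerical illustration. The sketch below is an assumption-laden example rather than the estimator used in the thesis (which relies on wavelets): it builds the m-aggregated sequence and fits the slope of log Var(Y^(m)) against log m, which the scaling relation above predicts to be 2H - 2. The helper names are hypothetical.

```python
import math
import random
import statistics


def aggregate(y, m):
    """Return the m-aggregated sequence: the mean of each consecutive block of m values."""
    n_blocks = len(y) // m
    return [sum(y[k * m:(k + 1) * m]) / m for k in range(n_blocks)]


def variance_time_slope(y, levels):
    """Least-squares slope of log Var(Y^(m)) against log m."""
    xs = [math.log(m) for m in levels]
    ys = [math.log(statistics.pvariance(aggregate(y, m))) for m in levels]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (v - mean_y) for x, v in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def hurst_from_slope(slope):
    """Invert slope = 2H - 2."""
    return 1.0 + slope / 2.0


if __name__ == "__main__":
    rng = random.Random(1)
    # Independent Gaussian samples: no memory, so H should be close to 0.5.
    white = [rng.gauss(0.0, 1.0) for _ in range(2 ** 15)]
    slope = variance_time_slope(white, [1, 2, 4, 8, 16, 32])
    print("slope:", slope, "H estimate:", hurst_from_slope(slope))
```

This variance-time approach is known to be a crude estimator; it is shown here only because it mirrors the definition directly.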

2.2.3 Multifractals

We now focus on scale invariance over small time scales, and introduce the notion of multifractal. Since this concept will not be thoroughly used in this thesis, the presentation is not always perfectly rigorous. An in-depth definition can be found for instance in [151].

Definition 2.2.5. A process $X(t)$ with stationary increments $Y_\delta(t)$ is said to be of Hölder regularity $h(t_0)$ in $t_0$, $0 \le h \le 1$, if there exists a constant $K > 0$ such that

$$|Y_\delta(t_0)| \underset{\delta \to 0}{\sim} K|\delta|^{h(t_0)}. \tag{2.11}$$

Local Hölder regularity therefore compares a sample path at each point with a power law. The function $h(t)$ characterizes the smoothness or the sharpness of the graph of the function $X$ at time $t$. For instance, when $0 \le h < 1$, the function is not differentiable. From that respect, it is quite close to the definition of scaling. Indeed equation (2.11) is reminiscent of equation (2.6) and one can loosely relate a HSS process with a process having the same local Hölder regularity at every point [148].

One can get a geometrical interpretation of this concept by defining a Hausdorff spectrum $D(h)$, defined as the Hausdorff dimension of the set of points $t$ with Hölder regularity $h(t) = h$. One can also think of it as measuring ‘how often’ each value $h(t) = h$ is found and get a frequency representation of $h$. However, this geometrical approach is not applicable in practice since the Hausdorff dimension of such sets cannot be estimated from empirical data. This is why one often prefers a statistical description of the multifractal spectrum, linked to the above geometrical approach through the multifractal formalism [151]. In essence, one can show that the $q$th order moment of the stationary increment process $Y_\delta(t)$ verifies

$$\mathbb{E}|Y_\delta(t)|^{q} = c_q\,|\delta|^{H(q)}, \tag{2.12}$$

where $H(q)$ is not necessarily a linear $qH$ behaviour as found for HSS processes. The multifractal Legendre spectrum is obtained from the exponents $H(q)$ with a Legendre transform:

$$D(h) = \inf_{q \in \mathbb{R}}\big(qh - H(q)\big). \tag{2.13}$$

In this thesis, we do not use $D(h)$ to study multifractal properties but instead a wavelet-based approach described in section 2.4.

2.2.4 Infinitely Divisible Cascades

There is a very elegant way to describe all the above statistical behaviours and describe the different scaling regimes at small and large time scales with a single mathematical object called an Infinitely Divisible Cascade (IDC). In a nutshell, one can rewrite the scaling
behaviours of HSS and multifractals as follows:

$$\text{Self-similarity:}\quad \mathbb{E}|Y_\delta(t)|^{q} = c_q|\delta|^{qH} = c_q \exp(qH \ln \delta), \tag{2.14}$$

$$\text{Multifractal:}\quad \mathbb{E}|Y_\delta(t)|^{q} = c_q|\delta|^{H(q)} = c_q \exp(H(q) \ln \delta). \tag{2.15}$$

One can then introduce the even more general scaling behaviour

$$\text{Infinitely Divisible Cascade:}\quad \mathbb{E}|Y_\delta(t)|^{q} = c_q \exp\big(H(q)\,n(\delta)\big), \tag{2.16}$$

where $n(\delta)$ is a priori an arbitrary function of $\delta$. The two main features of IDCs are that their moments do not have to behave as power laws of the scales, and the scaling of moments is governed by a collection of exponents. The concept of IDCs was first introduced in [27] and has since been widely used to study intermittent phenomena in turbulence (see [29] and references therein for details). We will not thoroughly study this last topic in this thesis, but simply mention when it could be used, for further investigation.

2.3 Point Processes

2.3.1 Introduction

The IP traffic on a given link is fully characterized by the arrival times of IP packets and their respective size. The arrival times of IP packets can be modeled by a point process, while the full description of the traffic (i.e. packets arrival times plus packet sizes) necessitates a marked point process. In this section we give some basic properties of point processes that will be used in chapters 4 and 5. Our presentation follows Cox and Isham [42] and Daley and Vere-Jones [46]. In what follows we try to keep the definitions and properties of point processes as general as possible, but restrict ourselves to $\mathbb{R}$ for simplicity when necessary.

In mathematical terms, a point process is a random collection of points falling in some space. When modeling temporal events, the space in which points fall is a portion of the real line. On the other hand, when modelling fire spread or forest growth for instance, the space is $\mathbb{R}^2$. A point process can be defined as a random measure $N$ on a space $S$ taking non negative integer values in $\mathbb{Z}^+$. In this framework, $N(A)$ represents the number of points falling in the subset $A$ of $S$. Typically one restricts the definition to random measures that are finite on any compact subset of $S$, and to the case where $S$ is a completely separable metric space such as $\mathbb{R}^d$.

In the particular case of temporal point processes there are other possible definitions, perhaps more intuitive than the random measure. Consider the times of events falling between 0 and $T$. The point process $N$ can be defined by the ordered list of events times
$\{t_1, t_2, t_3, \ldots\}$. The equivalent information can be conveyed by the series of inter event times $\{\tau_1, \tau_2, \tau_3, \ldots\}$ where $t_0 = 0$ and $\tau_i = t_i - t_{i-1}$, $i = 1, 2, \ldots$. Alternatively, $N$ can be defined by the counting process $N(t)$ where for any $t$ between 0 and $T$, $N(t)$ is the number of events occurring at or before $t$. This process $N(t)$ must take non negative integer values, be non decreasing and right continuous.

A realization of a point process is often written as a sum of Dirac delta measures $\delta_{t_i}$ where for any measurable set $A$, $\delta_{t_i}(A) = 1$ if $A$ contains $t_i$ and $\delta_{t_i}(A) = 0$ otherwise. The integral with respect to $dN$, noted $\int_A dN$, is the number of points in the set $A$. For any function $f$:

$$\int_A f\,dN = \sum_{i:\,t_i \in A} f(t_i). \tag{2.17}$$

A point process is called simple if all its points $\{t_i\}$ are distinct, i.e. $t_i \neq t_j$ for $i \neq j$. A point process is orderly if for any $t$:

$$\frac{1}{\Delta t}\,P\big(N[t, t+\Delta t] > 1\big) \to 0 \quad \text{when } \Delta t \to 0. \tag{2.18}$$

Let $A_1, A_2, \ldots, A_k$ be disjoint Borel sets on the real line. The joint distributions

$$\Pr\{N(A_i) = n_i,\ i = 1, \ldots, k\}, \tag{2.19}$$

for $n_i = 0, 1, 2, \ldots$ and $k = 1, 2, \ldots$, fully characterize the point process.

The qualitative idea of stationarity is that the structure of the point process is unaffected by translation of the time axis. The process is strictly stationary if the joint probabilities defined by equation (2.19) are unchanged by translating the sets $A_1, A_2, \ldots, A_k$. If the distribution of $N(I)$ is invariant under translation of the arbitrary interval $I$ then the process is simply stationary. If the mean and the variance of $N(I)$ are invariant under translation of the arbitrary interval $I$, the process is weakly stationary.

A temporal point process $N$ is typically described by its conditional rate process $\lambda$, also referred to as conditional intensity. $\lambda(t)$ represents the expected instantaneous rate of events at time $t$, given the entire history up to time $t$. Formally, the conditional rate $\lambda$ associated with an orderly point process $N$ is defined via

$$\lambda(t) = \lim_{\Delta t \to 0} \frac{P\big(N[t, t+\Delta t] > 0 \mid H_t\big)}{\Delta t}, \tag{2.20}$$

where $H_t$ is the entire history of the point process up to time $t$ defined as $H_t = \{t_j \mid t_j \le t\}$. Since all the finite dimensional distributions of $N$ can be derived from the conditional rate [46], $\lambda(t)$ fully characterizes the point process $N$. In what follows we will be concerned with stationary point processes only.

2.3.2 Definitions

Poisson process

The most important type of point processes is the Poisson process. It is defined as a simple point process for which the number of points in any set follows a Poisson distribution and the numbers of points in disjoint sets are independent: $N$ is a Poisson process if for any disjoint measurable subsets $A_1, \ldots, A_n$ of $S$, $N(A_1), \ldots, N(A_n)$ are independent Poisson random variables.

Definition 2.3.1. [46] (p. 18) Let $N(a_i, b_i]$ denote the number of events of a process falling in the half-open interval $(a_i, b_i]$ with $a_i < b_i \le a_{i+1}$. The stationary Poisson process on the line with rate $\lambda$ is completely defined by the following equation:

$$\Pr\{N(a_i, b_i] = n_i,\ i = 1, \ldots, k\} = \prod_{i=1}^{k} \frac{[\lambda(b_i - a_i)]^{n_i}}{n_i!}\, e^{-\lambda(b_i - a_i)}. \tag{2.21}$$

Equation (2.21) states that (i) the number of points in each finite interval $(a_i, b_i]$ has a Poisson distribution, (ii) the numbers of points in disjoint intervals are independent random variables, and (iii) the distributions are stationary: they depend only on the lengths $b_i - a_i$ of the intervals.

Proposition 2.3.1. For a stationary Poisson process, the interarrival times are independently exponentially distributed.

Proposition 2.3.2. For any N-uple $(t_1, \ldots, t_N)$, $0 \le t_1 < \ldots < t_N \le T$, the conditional density of obtaining points at $(t_1, \ldots, t_N)$ given $N$ points in the interval $(0, T]$ is $N!/T^N$, which corresponds to a uniform distribution.

This result can also be thought of in the following way: there are in fact $N!$ ways of allocating the $N$ time points $(t_1, \ldots, t_N)$, with each time point being uniformly and independently distributed over $(0, T]$.

Renewal process

In the Poisson process, the intervals between events are independent and exponentially distributed. A generalization is to allow the intervals to be independent and identically distributed (i.i.d.) random variables with an arbitrary distribution. The resulting series of events is called a renewal process. Let $f_1(x)$ be the pdf of the first interval $X_1$ and $f(x)$ be the pdf of the i.i.d. intervals $X_2, X_3, \ldots$. Depending on the choice of the time origin, the following situations can occur:
• $f_1(x) = f(x)$, so that all random variables are identically distributed; the process is then an ordinary renewal process.

• $f_1(x)$ and $f(x)$ are not necessarily the same; the process is a modified renewal process.
• $f_1(x) = \frac{1 - F(x)}{\mu}$, where $\mu = \mathbb{E}X_i$, $i = 2, 3, \ldots$, and $F(x)$ is the distribution function corresponding to $f(x)$; the process is an equilibrium or stationary renewal process.

For an ordinary renewal process, the density function $f$ governing each inter event time is called the renewal density. The conditional rate is given by $\lambda(t) = s(t - \tilde{t})$, where $\tilde{t}$ is the time of the most recent event prior to time $t$, and $s$ is the hazard function corresponding to $f$ defined as

$$s(t) = \frac{f(t)}{1 - F(t)}, \tag{2.22}$$

with $F(t) = \int_0^t f(u)\,du$ the cumulative distribution function corresponding to $f$.

Cluster process

Cluster processes are constructed as follows: there is a point process of cluster centres and to each cluster is associated a random number of points forming a subsidiary process or cluster. These subsidiary points are distributed around the cluster center in some specified way. The cluster centers may or may not be included in the final process. The cluster process consists of the superposition of all the separate clusters. In what follows it is assumed that the number of points in different clusters is i.i.d. A case of particular interest is a Poisson process of cluster centers. The resulting cluster process is then called Poisson cluster process (PCP).

There are two main cluster processes widely studied in applications: the Neyman-Scott point process and the Bartlett-Lewis point process (BLPP). In the Neyman-Scott process the points of a cluster are i.i.d. around the cluster center with some probability density function $f$. In the Bartlett-Lewis process, the intervals between successive points in a cluster are i.i.d. with probability density function $f$ and each cluster therefore forms a finite ordinary renewal process.

Let us consider a Bartlett-Lewis process $X(t)$ with clusters arrival rate $\lambda_F$, a number $P$ of points in a cluster with discrete density $\Pr\{P = k\} = p_k$, and distribution of inter-arrivals within a cluster $F_A(x)$. Let $\mu_P$ be the mean number of events per cluster, $\mu_A$ the mean inter-arrival of events within a cluster and $\lambda_A = 1/\mu_A$ the average rate of arrivals within a cluster. A necessary condition to have a stationary Bartlett-Lewis process is to have both $\mu_P$ and $\mu_A$ finite [110]. In that case, the average rate of $X(t)$ reads

$$\lambda_X = \lambda_F\,\mu_P. \tag{2.23}$$

Equilibrium conditions and inter-arrival time distribution can also be derived [110]. More details on BLPPs will be given in chapter 4 where they will be used to model IP packet arrival times.
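The construction of a Bartlett-Lewis process can be illustrated by a short simulation. The sketch below is only an example under stated assumptions, not the model fitted in the thesis: the cluster size is taken to be geometric, the inter-arrivals within a cluster exponential, and the first point of each cluster is placed at the cluster centre. The function name and parameters are hypothetical. The empirical rate can then be compared with the product of the cluster rate and the mean cluster size.

```python
import random


def simulate_bartlett_lewis(rate_clusters, mean_cluster_size, rate_within, horizon, rng):
    """Return sorted event times in [0, horizon) of a Bartlett-Lewis process.

    Assumptions (illustrative choices):
    - cluster centres form a Poisson process of rate `rate_clusters`;
    - the number of points per cluster is geometric on {1, 2, ...}
      with mean `mean_cluster_size` (which must be >= 1);
    - inter-arrivals within a cluster are exponential with rate `rate_within`;
    - the first point of a cluster sits at the cluster centre.
    """
    stop_probability = 1.0 / mean_cluster_size
    events = []
    centre = rng.expovariate(rate_clusters)
    while centre < horizon:
        t = centre
        while t < horizon:  # points falling beyond the horizon are discarded
            events.append(t)
            if rng.random() < stop_probability:
                break
            t += rng.expovariate(rate_within)
        centre += rng.expovariate(rate_clusters)
    events.sort()
    return events


if __name__ == "__main__":
    rng = random.Random(42)
    horizon = 5000.0
    events = simulate_bartlett_lewis(2.0, 5.0, 10.0, horizon, rng)
    print("empirical rate:", len(events) / horizon, "expected:", 2.0 * 5.0)
```

Clusters that started before time 0 are ignored and clusters are truncated at the horizon, so the simulated process is only approximately stationary; the effect is small when the horizon is long compared with a cluster duration.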

Infinitely divisible point process

Definition 2.3.2. [46] (p. 255) A point process is said to be infinitely divisible if, for every $k$, it can be represented as the superposition of $k$ independent, identically distributed, point process components.

Lemma 2.3.3. [46] (p. 256) A point process is infinitely divisible if and only if its finite dimensional distributions are infinitely divisible.

A Poisson process provides a simple example of such process. In fact, if $X(t)$ is a Poisson process with rate $\lambda$ then, for each integer $k > 0$, $X$ can be represented as a sum $X = X_{k1} + \ldots + X_{kk}$ where the $X_{ki}$, $i = 1, \ldots, k$, are independent Poisson processes with common rate $\lambda/k$.

The Bartlett-Lewis process described in the previous section is also an infinitely divisible process since for any integer $k > 0$ it can be expressed as a sum of Bartlett-Lewis processes with cluster rate $\lambda_F/k$ and same characteristics for the internal structure of each cluster. This means that for any arbitrary set $A$, the distribution of $N(A)$ for a Bartlett-Lewis process is infinitely divisible (1). From [64], an infinitely divisible discrete distribution is a compound Poisson distribution, i.e. its probability generating function is of the form $\exp[-\mu(1 - G(z))]$ where $G(z)$ is itself a probability generating function. Therefore $N(A)$ can be seen as a random sum of independent random variables with common p.g.f. $G(z)$, where the number of terms in the sum is a Poisson random variable with mean $\mu$.

(1) More generally it can be shown that any Poisson cluster process is in fact an infinitely divisible point process.

There are many other interesting types of point processes, such as self exciting point processes and autoregressive processes (see [70] for an interesting study of infinitely divisible autoregressive processes). However, we will only be using Poisson, renewal and cluster processes in the modelling work presented in this thesis, so we focus exclusively on these models in the following.

2.3.3 Moments

Let us consider $\mathbb{E}\{N(A)\}$, $\mathrm{Var}\{N(A)\}$ and $\mathrm{Cov}\{N(A), N(B)\}$ for arbitrary sets $A$ and $B$.

First order moment

For a stationary orderly process of finite rate $\lambda$,

$$\mathbb{E}\{N(A)\} = \lambda|A|, \tag{2.24}$$
where $|A|$ is the measure of the set $A$. In particular if $A = (0, t)$, $\mathbb{E}\{N(t)\} = \lambda t$.

Define the conditional intensity function by

$$h(t) = \lim_{\delta_1, \delta_2 \to 0} \frac{\Pr\{N(t, t+\delta_2) > 0 \mid N(-\delta_1, 0) > 0\}}{\delta_2}. \tag{2.25}$$

Suppose that an event occurs at time $t = 0$. Let $U(x)$ be the expected number of points in the interval $[0, x]$. By definition

$$U(x) = \mathbb{E}[N(x)] \tag{2.26}$$

$$\phantom{U(x)} = \int_0^x h(u)\,du. \tag{2.27}$$

Therefore we have the formal relation

$$\dot{U}(x) = h(x). \tag{2.28}$$

$U(x)$ and $h(u)$ will be used in the remainder to derive basic properties of point processes.

Second order moment

For disjoint sets $A$ and $B$:

$$\mathrm{Var}\{N(A \cup B)\} = \mathrm{Var}\{N(A)\} + \mathrm{Var}\{N(B)\} + 2\,\mathrm{Cov}\{N(A), N(B)\}. \tag{2.29}$$

Let us divide the interval $(0, t)$ into $K$ intervals of length $\delta$ and consider $N(t)$ as a sum of counts in small intervals

$$N(t) = \sum_{k=0}^{K-1} N(k\delta, (k+1)\delta] \tag{2.30}$$

and

$$\mathrm{Var}\{N(t)\} = \sum_{k=0}^{K-1} \mathrm{Var}\{N(k\delta, (k+1)\delta]\} + 2\sum_{k=0}^{K-1}\ \sum_{l=k+1}^{K-1} \mathrm{Cov}\{N(k\delta, (k+1)\delta],\ N(l\delta, (l+1)\delta]\}. \tag{2.31}$$

In the limit of large $K$ this leads to

$$N(t) = \int_0^t dN(v), \tag{2.32}$$

and

$$\mathrm{Var}\{N(t)\} = \int_0^t \mathrm{Var}\{dN(v)\} + 2\int_0^t dz \int_0^{t-z} du\ \mathrm{Cov}\{dN(z), dN(z+u)\}. \tag{2.33}$$

For an orderly process the probability of at least two points in $(z, z+\delta_1]$ is $o(\delta_1)$, so in the limit $\delta_1 \to 0$, $N(z, z+\delta_1]$ takes values 0 or 1. Thus

$$\begin{aligned}
\mathrm{Var}\{N(z, z+\delta_1]\} &= \mathbb{E}\{(N(z, z+\delta_1])^2\} - \big(\mathbb{E}\{N(z, z+\delta_1]\}\big)^2 \\
&= \Pr\{N(z, z+\delta_1] = 1\} - \big(\Pr\{N(z, z+\delta_1] = 1\}\big)^2 + o(\delta_1) \\
&= \lambda\delta_1 + o(\delta_1).
\end{aligned} \tag{2.34}$$

For $u > 0$,

$$\begin{aligned}
\mathrm{Cov}\{N(z, z+\delta_1],\ N(z+u, z+u+\delta_2]\}
&= \mathbb{E}\big[N(z, z+\delta_1]\ \mathbb{E}\{N(z+u, z+u+\delta_2] \mid N(z, z+\delta_1]\}\big] \\
&\quad - \mathbb{E}\{N(z, z+\delta_1]\}\,\mathbb{E}\{N(z+u, z+u+\delta_2]\} \\
&= \Pr\{N(z, z+\delta_1] = 1\}\,\Pr\{N(z+u, z+u+\delta_2] = 1 \mid N(z, z+\delta_1] = 1\} \\
&\quad - \Pr\{N(z, z+\delta_1] = 1\}\,\Pr\{N(z+u, z+u+\delta_2] = 1\} + o(\delta_1\delta_2) \\
&= \lambda h(u)\,\delta_1\delta_2 - \lambda^2\delta_1\delta_2 + o(\delta_1\delta_2).
\end{aligned} \tag{2.35}$$

One can define the covariance density of counts by

$$c(u) = \lambda\delta(u) + \lambda h(u) - \lambda^2. \tag{2.36}$$

Because of stationarity $h(-u) = h(u)$ and therefore $c(-u) = c(u)$. Equation (2.33) becomes

$$\begin{aligned}
\mathrm{Var}\{N(t)\} &= \lambda\int_0^t dv + 2\int_0^t dz \int_0^{t-z} du\ \big(\lambda h(u) - \lambda^2\big) \\
&= \int_0^t dz \int_0^t du\ c(u - z).
\end{aligned} \tag{2.37}$$

2.3.4 Density spectrum

The spectral density of counts is defined as the Fourier transform of the covariance density and reads

$$\begin{aligned}
\psi(\omega) &= \int_{-\infty}^{+\infty} c(u)\,e^{-j\omega u}\,du \\
&= \lambda + \lambda\int_{-\infty}^{+\infty} \big(h(u) - \lambda\big)\,e^{-j\omega u}\,du.
\end{aligned} \tag{2.38}$$

Let $\tilde{h}(s)$ be the Laplace transform of $h(u)$. For $\omega > 0$ equation (2.38) can be written as

$$\psi(\omega) = \lambda\big(\tilde{h}(j\omega) + \tilde{h}(-j\omega) + 1\big). \tag{2.39}$$

Renewal process

Let us consider a sequence $X_1, X_2, \ldots$ of i.i.d. non negative random variables with probability distribution $F$. Define $S_0 = 0$ and $S_n = S_{n-1} + X_n = X_1 + \ldots + X_n$ for $n = 1, 2, \ldots$. $\{S_n\}$ represents the arrival times of an ordinary renewal process. In the case of an ordinary renewal process, the function $U(x)$ defined in section 2.3.3 is called the renewal function
and reads:

$$\begin{aligned}
U(x) &= \mathbb{E}[N(x)] \\
&= 1 + \sum_{k=1}^{\infty} \Pr(S_k \le x) \\
&= 1 + \sum_{k=1}^{\infty} F^{k*}(x).
\end{aligned} \tag{2.40}$$

Conditioning on $X_1$, time of the first renewal, one can write:

$$U(t) = \int_0^{\infty} \mathbb{E}[N(t) \mid X_1 = x]\,dF(x). \tag{2.41}$$

If $x > t$ the only point in $[0, t]$ is $S_0$ and $\mathbb{E}[N(t) \mid X_1 = x] = 1$. If $x \le t$, $\mathbb{E}[N(t) \mid X_1 = x] = 1 + U(t - x)$. $U(t)$ therefore reads

$$\begin{aligned}
U(t) &= \int_t^{\infty} dF(x) + \int_0^t \big(1 + U(t-x)\big)\,dF(x) \\
&= 1 + \int_0^t U(t-x)\,dF(x).
\end{aligned} \tag{2.42}$$

Equation (2.42) is called the renewal equation. Assuming that $F$ has a density $f$, one gets by differentiation

$$\dot{U}(t) = 0 + U(0)f(t) + \int_0^t \dot{U}(t-x)f(x)\,dx. \tag{2.43}$$

From equation (2.28) we have $h(t) = \dot{U}(t)$, and since $U(0) = 1$ equation (2.43) reads

$$h(t) = f(t) + \int_0^t h(t-x)f(x)\,dx. \tag{2.44}$$

In the Laplace domain:

$$\tilde{h}(s) = \tilde{f}(s) + \tilde{h}(s)\tilde{f}(s), \tag{2.45}$$

where $\tilde{h} = \mathcal{L}[h]$ and $\tilde{f} = \mathcal{L}[f]$. This leads to

$$\tilde{h}(s) = \frac{\tilde{f}(s)}{1 - \tilde{f}(s)}. \tag{2.46}$$

The spectrum of the renewal process with inter-arrival density $f$ therefore reads

$$\begin{aligned}
\psi(\omega) &= \lambda\left(\frac{\tilde{f}(j\omega)}{1 - \tilde{f}(j\omega)} + \frac{\tilde{f}(-j\omega)}{1 - \tilde{f}(-j\omega)} + 1\right) \\
&= \lambda\left(\frac{1}{1 - \tilde{f}(j\omega)} + \frac{1}{1 - \tilde{f}(-j\omega)} - 1\right).
\end{aligned} \tag{2.47}$$

For the special case of a Poisson process with rate $\lambda$: $f(x) = \lambda\exp(-\lambda x)$, the renewal density (or hazard function) is $h(t) = \lambda$ and the density spectrum is

$$\phi_n(\omega) = \lambda/(2\pi). \tag{2.48}$$
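As a consistency check, the Poisson case can be worked through the renewal spectrum formula by hand. The lines below are a sketch under the assumption that the unnormalised Fourier convention of the spectral density definition is used; under that convention the flat spectrum has level $\lambda$, and the value $\lambda/(2\pi)$ quoted above corresponds to a convention carrying an extra $1/(2\pi)$ factor.

$$\begin{aligned}
\tilde{f}(s) &= \int_0^{\infty} \lambda e^{-\lambda x} e^{-sx}\,dx = \frac{\lambda}{s + \lambda},
\qquad
\tilde{h}(s) = \frac{\tilde{f}(s)}{1 - \tilde{f}(s)} = \frac{\lambda}{s}, \\
\frac{1}{1 - \tilde{f}(j\omega)} &= \frac{j\omega + \lambda}{j\omega} = 1 + \frac{\lambda}{j\omega},
\qquad
\frac{1}{1 - \tilde{f}(-j\omega)} = 1 - \frac{\lambda}{j\omega}, \\
\psi(\omega) &= \lambda\left(1 + \frac{\lambda}{j\omega} + 1 - \frac{\lambda}{j\omega} - 1\right) = \lambda, \qquad \omega \neq 0.
\end{aligned}$$

The transform $\tilde{h}(s) = \lambda/s$ is the Laplace transform of the constant $h(t) = \lambda$, in agreement with the renewal density stated for the Poisson process.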

Bartlett-Lewis point process

Details on the spectral density of the BLPP will be presented in chapter 4.

2.3.5 Operations on point processes

Point processes are mathematical objects that lend themselves to a large range of operations.

Superposition

The superposition of point processes corresponds mathematically to addition: $N_3$ is the superposition of $N_1$ and $N_2$ if for any measurable set $A$ of $S$, $N_3(A) = N_1(A) + N_2(A)$. Let $X_1(t)$ and $X_2(t)$ be two independent point processes with respective spectra $\psi_1(\omega)$ and $\psi_2(\omega)$. Let $X_3(t) = X_1(t) + X_2(t)$. Its spectrum reads

$$\psi_3(\omega) = \psi_1(\omega) + \psi_2(\omega). \tag{2.49}$$

Thinning

In general terms, the thinning of a point process $X$ with rate $\lambda$ consists in keeping each point of $X$ with probability $q$ or rejecting it with probability $1 - q$ to form a new point process $X_q$ with rate $\lambda_q = q\lambda$. In what follows, we are only concerned with i.i.d. thinning. This notion will be used extensively in chapter 5.

A useful quantity when studying thinning is the average number of points in $[0, x]$ defined by equation (2.26) as

$$U(x) = \mathbb{E}\{N(x)\}. \tag{2.50}$$

Define $U_q(x)$ the expected number of points in $[0, x]$ for the thinned process $X_q$ and $h_q(u)$ its conditional intensity. One has the relation

$$U_q(x) - 1 = q\big(U(x) - 1\big), \tag{2.51}$$

and therefore

$$h_q(u) = q\,h(u) \quad \text{and} \quad \tilde{h}_q(s) = q\,\tilde{h}(s). \tag{2.52}$$

From equation (2.38) the spectrum of $X_q$ reads

$$\begin{aligned}
\psi_q(\omega) &= \lambda_q\big(\tilde{h}_q(j\omega) + \tilde{h}_q(-j\omega) + 1\big) \\
&= q\lambda\big(q\tilde{h}(j\omega) + q\tilde{h}(-j\omega) + 1\big)
\end{aligned} \tag{2.53}$$

and thus

$$\psi_q(\omega) = q^2\psi(\omega) + q(1-q)\lambda. \tag{2.54}$$
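The thinning relation for the spectrum can be verified numerically on a renewal process. The sketch below is an illustration under assumptions: it picks Erlang inter-arrivals (an arbitrary choice), evaluates the renewal spectrum from the Laplace transform of the inter-arrival density, and uses the thinned inter-arrival transform q f(s) / (1 - (1 - q) f(s)), which follows from combining the relation between the thinned and original conditional intensities with the renewal identity h = f / (1 - f) in the Laplace domain. All function names are hypothetical.

```python
def erlang_transform(shape, beta):
    """Laplace transform s -> (beta / (s + beta))**shape of an Erlang density."""
    return lambda s: (beta / (s + beta)) ** shape


def renewal_spectrum(f_transform, rate, omega):
    """Spectrum of a renewal process: rate * (1/(1-f(jw)) + 1/(1-f(-jw)) - 1), omega != 0."""
    a = f_transform(1j * omega)
    b = f_transform(-1j * omega)
    return (rate * (1.0 / (1.0 - a) + 1.0 / (1.0 - b) - 1.0)).real


def thinned_transform(f_transform, q):
    """Inter-arrival transform of the renewal process obtained by i.i.d. thinning."""
    return lambda s: q * f_transform(s) / (1.0 - (1.0 - q) * f_transform(s))


if __name__ == "__main__":
    shape, beta, q = 3, 6.0, 0.3
    rate = beta / shape  # mean inter-arrival is shape / beta
    f = erlang_transform(shape, beta)
    for omega in (0.3, 1.0, 5.0):
        direct = renewal_spectrum(thinned_transform(f, q), q * rate, omega)
        formula = q ** 2 * renewal_spectrum(f, rate, omega) + q * (1 - q) * rate
        print(omega, direct, formula)
```

Both columns should agree to numerical precision, and setting `shape = 1` recovers the flat Poisson spectrum.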

Equation (2.54) is valid for the i.i.d. thinning of any point process $X$ which is simple, locally finite and second order stationary.

• Poisson process
A fundamental result concerns the i.i.d. thinning of a Poisson process.

Theorem 2.3.4. Let the counting process $N(t)$ be a Poisson process with rate $\lambda$. Each time an event occurs it is classified as a type I event with probability $q$ or a type II event with probability $1 - q$ independently of all other events. Let $N_1(t)$ and $N_2(t)$ denote respectively the type I and type II events. Note that $N(t) = N_1(t) + N_2(t)$. $N_1(t)$ is a Poisson process with rate $\lambda q$ and $N_2(t)$ is a Poisson process with rate $\lambda(1 - q)$. The two Poisson processes are independent.

• Renewal process
Let us now consider a thinning process where each point $S_n$ of a renewal process $X$ is retained with probability $q$ or omitted with probability $1 - q$. The new process $X_q$ is a renewal process with interarrival density $f_q(x)$. From equation (2.51), its renewal function $U_q$ is given by

$$U_q(x) - 1 = q\big(U(x) - 1\big). \tag{2.55}$$

Thus

$$\tilde{h}_q(s) = \frac{\tilde{f}_q(s)}{1 - \tilde{f}_q(s)} = q\,\tilde{h}(s) = q\,\frac{\tilde{f}(s)}{1 - \tilde{f}(s)}, \tag{2.56}$$

and

$$\tilde{f}_q(s) = \frac{q\tilde{f}(s)}{1 - (1-q)\tilde{f}(s)}. \tag{2.57}$$

In the particular case of a Poisson process with rate $\lambda$ and $\tilde{f}(s) = \frac{\lambda}{s + \lambda}$, equation (2.57) gives $\tilde{f}_q(s) = q\lambda/(s + q\lambda)$. This is another way to prove that a thinned Poisson process with rate $\lambda$ is a Poisson process with rate $q\lambda$.

• Cluster process
The more complex case of thinning a BLPP will be detailed in chapter 5.

Random translation

In this operation, each point $t_i$ of a stationary orderly point process $X(t)$ with rate $\lambda$ is shifted by a random amount with p.d.f. $f$ to form a new point process $X_f(t)$. Obviously, $X_f(t)$ is also an orderly stationary process with rate $\lambda_f = \lambda$. The conditional intensity of $X_f(t)$ is given by

$$h_f(u) = \int_{-\infty}^{+\infty} h(u - v)\,f_D(v)\,dv, \tag{2.58}$$
where $f_D$ is the density of the difference $D$ between two independent translations each with density $f$. From equation (2.38), the spectrum of $X_f(t)$ satisfies:
\[
\begin{aligned}
\psi_f(\omega) &= \lambda + \lambda \int_{-\infty}^{+\infty} (h_f(u) - \lambda)\, e^{-j\omega u}\, du \\
&= \lambda + \lambda \int_{-\infty}^{+\infty} dv\, f_D(v)\, e^{-j\omega v} \int_{-\infty}^{+\infty} du\, (h(u) - \lambda)\, e^{-j\omega u} \\
&= \tilde{f}_D(\omega)\,\psi(\omega) + (1 - \tilde{f}_D(\omega))\,\lambda,
\end{aligned}
\tag{2.59}
\]
where
\[ \tilde{f}_D(\omega) = \int_{-\infty}^{+\infty} f_D(u)\, e^{-j\omega u}\, du. \tag{2.60} \]
Since $f_D$ is symmetric, $\tilde{f}_D(\omega)$ is real. In fact $\tilde{f}_D(\omega) = |\tilde{f}(\omega)|^2$ and the spectrum of the translated point process reads
\[ \psi_f(\omega) = |\tilde{f}(\omega)|^2\, \psi(\omega) + (1 - |\tilde{f}(\omega)|^2)\,\lambda. \tag{2.61} \]
We will use this result in chapter 5.

Time substitution

In this operation, a point process $M$ is transformed into a point process $N$ by writing
\[ N(t) = M[\Lambda(t)], \tag{2.62} \]
for some non-decreasing, possibly random, function $\Lambda$. Such an operation may be used to transform a general point process into a Poisson process [104].

Limits

The Poisson process frequently arises as a limiting process for the above operations. In fact, the operations of superposing, thinning or random translations on an initial point process are entropy increasing and tend in the limit to create a Poisson process (the process with maximum entropy) [46]. For instance, in the limit of small $q$, equation (2.54) becomes
\[ \psi_q(\omega) \sim q\lambda \quad \text{as } q \to 0, \tag{2.63} \]
which is the spectrum of a Poisson process.

2.4 Wavelet analysis

The last mathematical topic of interest in this chapter concerns the analysis of scaling processes. There exist many ways to study the scaling behaviour of a process based on aggregated time series [169]. However, in what follows, we focus on wavelet based methods because of their intrinsic scaling properties, which make them particularly suited for this purpose. For instance they allow one to obtain an unbiased estimate of the Hurst parameter of an

LRD process, whereas estimates based on a Fourier spectrum are biased [4, 5]. Moreover wavelet transforms can be calculated with a fast $O(n)$ algorithm. A thorough description of wavelet analysis can be found in [120], and see [6] for theoretical and practical details.

2.4.1 Definition

Discrete wavelet analysis consists in comparing a signal $X(t)$ with locally oscillating waveforms known as wavelets. It is in essence similar to a Fourier transform where one compares a signal $X(t)$ with a family of sinusoids. More precisely, a mother wavelet $\psi$ is a bandpass function localised both in time and frequency, with central frequency $f_0$. This mother wavelet can be shifted and scaled to give rise to a family of wavelets $\psi_{j,k}(t) = 2^{-j/2}\,\psi(2^{-j} t - k)$, shifted in time by $2^j k$ and with central frequency $2^{-j} f_0$. In a similar way to expressing $X(t)$ as a sum of weighted sinusoids in a Fourier transform, one can express $X(t)$ as a sum of weighted wavelets:
\[ X(t) = \sum_{k} c_X(j_0, k)\, \phi_{j_0,k}(t) + \sum_{j \ge j_0} \sum_{k=1}^{n_j} d_X(j, k)\, \psi_{j,k}(t), \tag{2.64} \]
where $\phi$ is a low pass function companion of the mother wavelet $\psi$. The $c_X(j_0, k)$ are known as scaling coefficients while the $d_X(j, k)$ are the wavelet coefficients. The first term in equation (2.64) constitutes a coarse approximation of the signal $X(t)$, while the second term adds details at different scales.

Wavelet and scaling functions can be constructed to be orthogonal, in which case the scaling and wavelet coefficients can be obtained by inner products:
\[ c_X(j_0, k) = \langle X, \phi_{j_0,k} \rangle, \tag{2.65} \]
\[ d_X(j, k) = \langle X, \psi_{j,k} \rangle. \tag{2.66} \]
A key practical advantage of wavelets is the fact that the coefficients can be computed from a fast recursive algorithm with computational complexity $O(n)$.

The mother wavelet $\psi$ is also characterized by an integer $N$, called the number of vanishing moments, defined as the largest integer $N$ such that
\[ \int t^k\, \psi(t)\, dt = 0, \quad k = 0, 1, \ldots, N - 1. \tag{2.67} \]
It can be shown that the number of vanishing moments plays a key role in the analysis of scaling [6]. In particular wavelets with higher $N$ are smoother and capable of analysing signals with higher order divergences.
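The fast recursive computation mentioned above can be illustrated with the simplest orthonormal wavelet. The following is a minimal sketch under stated assumptions: it uses the Haar wavelet (which has $N = 1$ vanishing moment and is not necessarily the wavelet used for the analyses in this thesis), and it treats the input samples directly as the finest-scale approximation coefficients, a simplification whose limits are discussed in section 2.4.4. The function name and the test signal are hypothetical.

```python
import math


def haar_dwt(samples, levels):
    """Haar pyramid: returns (coarse approximation, list of detail bands).

    details[j - 1][k] plays the role of d_X(j, k). Each level halves the
    number of coefficients, so the total cost is O(n).
    """
    approx = [float(x) for x in samples]
    details = []
    scale = 1.0 / math.sqrt(2.0)
    for _ in range(levels):
        pairs = len(approx) // 2
        if pairs == 0:
            break
        # Details are differences, approximations are sums, of neighbouring pairs.
        details.append([scale * (approx[2 * k] - approx[2 * k + 1]) for k in range(pairs)])
        approx = [scale * (approx[2 * k] + approx[2 * k + 1]) for k in range(pairs)]
    return approx, details


signal = [4, 6, 10, 12, 8, 6, 5, 5]
coarse, details = haar_dwt(signal, levels=3)

# Orthonormality: the energy of the signal equals the energy of all coefficients.
energy_signal = sum(x * x for x in signal)
energy_coeffs = sum(c * c for c in coarse) + sum(d * d for band in details for d in band)

# One vanishing moment (k = 0 in (2.67)): a constant signal has no detail at any scale.
_, flat_details = haar_dwt([3.0] * 8, levels=3)
print(energy_signal, energy_coeffs, flat_details)
```

The halving of the band length at each level is the property used later when confidence intervals are compared across scales.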

2.4.2 Properties

The wavelet basis is by definition scale invariant, which is why it is well suited to study scaling phenomena. More particularly, it can be shown that the discrete wavelet coefficients $d_X(j, k)$ of a HSS process $X(t)$ with stationary increments have the following properties (see [6] and references therein):

• P1: $\{d_X(j, k),\ k \in \mathbb{Z}\} \stackrel{d}{=} \{2^{j(H+1/2)}\, d_X(0, k),\ k \in \mathbb{Z}\}$.
• P2: $\{d_X(j, k),\ k \in \mathbb{Z}\}$ is stationary for each $j$ fixed, and short range dependent if $N \ge H + 1/2$.
• P3: for each $j$ fixed,
\[ \mathbb{E}|d_X(j, k)|^q = 2^{j(qH + q/2)}\, \mathbb{E}|d_X(0, k)|^q. \tag{2.68} \]

Property P1 can be compared with equation (2.1) and shows that wavelets form an 'ideal' basis to study HSS processes since the relation between wavelet coefficients at different scales mimics equation (2.1). Property P2 means that the long range correlation in the signal has been turned into a short range correlation in the wavelet domain. This applies for correlation between coefficients at a given scale or between scales. Finally property P3 means that there exists a power law relationship between the $q$th order moment $\mathbb{E}|d_X(j, k)|^q$ and the scale $j$.

Let $X(t)$ be a continuous time stationary process with power spectral density $\Gamma_X(\nu)$. It can be shown that the variance of its wavelet coefficients satisfies:
\[ \mathbb{E}|d_X(j, k)|^2 = \int \Gamma_X(\nu)\, 2^j\, |\Psi(2^j \nu)|^2\, d\nu, \tag{2.69} \]
where $\Psi(\nu)$ denotes the Fourier transform of $\psi$. Equation (2.69) can be viewed as defining a kind of wavelet energy spectrum, analogous to a Fourier spectrum, but much better suited to the study of fractal processes. In the case of an LRD process, the scaling behaviour of the wavelet coefficients reads
\[ \mathbb{E}|d_X(j, k)|^2 = 2^{j(2H-1)}\, \mathbb{E}|d_X(0, k)|^2 \quad \text{for large } j. \]

2.4.3 Estimation

The fact that the coefficients $d_X(j, k)$ are stationary for $j$ fixed (property P2) implies that one can use an ergodicity argument to efficiently estimate the statistical average $\mathbb{E}|d_X(j, k)|^q$ with a simple time average
\[ S_q(j) = \frac{1}{n_j} \sum_{k} |d_X(j, k)|^q, \tag{2.70} \]
where $n_j$ is the number of wavelet coefficients at scale $j$.
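The time average (2.70) is straightforward to compute once the coefficients are available. The following is a minimal sketch, assuming synthetic coefficients rather than a real wavelet transform: at each scale $j$ it draws independent Gaussian values whose variance follows a hypothetical power law $2^{j\alpha}$, with $n_j$ halving from one scale to the next, and checks that $\log_2 S_2(j)$ recovers $j\alpha$. The exponent, the seed and the coefficient counts are illustrative choices only.

```python
import math
import random


def moment_estimate(band, q):
    """S_q(j): time average of |d_X(j, k)|**q over the n_j coefficients of one scale."""
    return sum(abs(d) ** q for d in band) / len(band)


rng = random.Random(7)   # fixed seed, for reproducibility only
alpha = 0.6              # hypothetical second-order scaling exponent
n1 = 8192                # hypothetical number of coefficients at scale j = 1

log2_s2 = {}
for j in range(1, 5):
    n_j = n1 // 2 ** (j - 1)                 # n_j halves from one scale to the next
    sigma = math.sqrt(2.0 ** (j * alpha))    # so that E|d(j, k)|^2 = 2^(j * alpha)
    band = [rng.gauss(0.0, sigma) for _ in range(n_j)]
    log2_s2[j] = math.log2(moment_estimate(band, q=2))
    print(j, n_j, round(log2_s2[j], 3))
```

Because fewer coefficients are averaged at each coarser scale, the estimate fluctuates more as $j$ grows, which is why a regression over scales should weight small scales more heavily.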

Figure 2.2: Examples of LDs (log2 Variance(j) against j = log2(scale)). Top plot: fGn (H = 0.8, α = 0.6). Diamonds: Renewal process with gamma inter-arrivals (λ = 1, shape = 1/4). Stars (lower curve): Poisson process (λ = 1).

Abry and Veitch [7] showed that second order scaling can be efficiently studied by plotting $\log_2(S_2(j))$ against the scale $j$ in a so called Logscale Diagram (LD):
\[ \text{LD}: \quad \log_2(S_2(j)) \ \text{vs}\ j. \tag{2.71} \]
LDs will be our primary tool to analyze Internet traffic throughout this thesis. Figure 2.2 gives an example with synthetic data, while an example with traffic data can be found in figure 3.3(d) page 47.

In these diagrams, straight lines constitute experimental evidence for the presence of scaling within the analyzed data over a certain range of scales. For example, a straight line observed in the range of the largest scales with slope in $(0, 1)$ (see figure 2.2) betrays long memory. More generally, for a HSS process a linear relationship exists over the whole range of scales, whereas a LRD process is characterized by a straight line at large scales for $q = 2$. In practice, semi-parametric estimates of scaling exponents with excellent properties can be formed using weighted regression to measure the slope over the range of scales where the scaling exists.

The value of the Hurst parameter can be obtained from the estimated slope $\alpha_2$ of the line, at $q = 2$. For instance, if $0 < \alpha_2 < 1$, the process is LRD with $H = (\alpha_2 + 1)/2$. On the other hand, the case $\alpha_2 > 1$ corresponds to a non stationary (asymptotically) self-similar process with $H = (\alpha_2 - 1)/2$. In any case care must be taken in the interpretation of the logscale diagram. One has to check in particular that potentially LRD data are stationary for the value of $1/2 < H < 1$ to be valid.

Scaling of higher order moments can be studied in a similar fashion by plotting $\log_2(S_q(j))$

against the scale $j$ in a $q$th order Logscale Diagram:
\[ q\text{-LD}: \quad \log_2(S_q(j)) \ \text{vs}\ j. \tag{2.72} \]
For $q$ fixed, a behaviour $\mathbb{E}|d_X(j, \cdot)|^q = c\, 2^{\alpha_q j}$ over some scale range is seen as a straight line in the $q$-LD, and a measurement of its slope is an estimate of the corresponding $q$-specific scaling exponent $\alpha_q$. The estimation of the slope is made by a weighted linear regression of $S_q(j)$ on $j$. The question of alignment of points is intrinsically related to the confidence intervals put on the estimation of $S_q(j)$ at different scales $j$. By definition of the wavelet coefficients, the number of coefficients at scale $j$ is twice what it is at scale $j + 1$. This means that the values of $S_q(j)$ for small $j$ are estimated with a greater confidence than at large $j$. More details on, and an exact formulation of, the scaling exponent estimation can be found for instance in [6].

Another practical problem when estimating the scaling behaviour of a process is to know which values of $q$ to choose. By definition, Long Range Dependence is a second order property, so looking at $q = 2$ only is enough to detect it. On the other hand, when looking at the scaling for small values of $j$, one has to look over a range of $q$ values. Indeed, recall from property P3 that for a HSS process with stationary increments the local slope $\alpha_q$ relates to the order $q$ via $\alpha_q = qH - q/2$, whereas in the multifractal case, where a single exponent is insufficient to describe the scaling behaviour, one would have $\alpha_q = H(q) - q/2$. Therefore, if the plot of $H(q) = \alpha_q + q/2$ against $q$ is a straight line, the process is HSS, whereas any departure from a straight line indicates a multifractal behaviour. Following [6], rather than plotting $H(q)$ against $q$ and looking for linearity, which can be delicate in marginal cases, we plot $h_q \equiv H(q)/q$ against $q$ and check for horizontal alignment. This plot:
\[ \text{LMD}: \quad h_q \ \text{vs}\ q, \tag{2.73} \]
we call the Linear Multiscale Diagram. Using this approach also has the advantage that the confidence intervals are approximately of the same size, making it easier to assess alignment.

2.4.4 Making sense at small scales

The analysis at small scales is considerably more difficult than at large scales. We address three relevant issues which are typically ignored. (1) Confidence intervals often receive little attention, or are based strongly on Gaussian assumptions. Since at small time scales TCP/IP data is highly non-Gaussian, we use a

non parametric technique based on general wavelet properties to estimate them more directly from data. (2) The $O(n)$ algorithm which calculates the $d_X(j, k)$ requires initialisation. Typically this is either omitted, or only samples $X(k\tau)$ are available, resulting in initialisation errors which are very significant for $j = 1, 2$. This is important as 3/4 of the data is concentrated at these scales! However, in the case of a point process, such as $X(t)$, one can do an exact initialisation which alleviates this problem. (3) For intrinsically discrete data, such as packet inter-arrival times, standard wavelet analysis does not apply and we use the special initialisation step of [174], without which, again, significant errors are made for $j = 1, 2$.

As a guide to interpretation, in figure 2.2 Logscale Diagrams are given of two continuous time and one discrete time process. The horizontal axis is calibrated both in octave $j$ and time $t = \tau \cdot 2^j$. In the continuous cases the base resolution, $j = 0$, was set to $\tau = 1/4$ as an example. Note how the confidence intervals are smallest at small scale.

The lower curve is for a Poisson process with $\lambda = 1$, viewed as a continuous time process with delta functions at each arrival point, with spectrum $\Gamma(\nu) = \lambda^2 \delta(\nu) + \lambda$. Means are eliminated by the wavelet analysis, and multiplication of $X(t)$ by a constant $a$ translates as a level shift in the LD of $\log_2(a)$. Equation (2.69) predicts $\mathbb{E}|d_X(j, k)|^2 = \lambda$, a flat wavelet spectrum corresponding to trivial scaling ($\alpha = 0$). It is important to understand that this level corresponds to variance and not to rate, as $\log_2(S_2(j)) = \log_2(\mathrm{variance}(j)) = \log_2 \lambda = 0$.

A Poisson process is a simple model of flow or packet arrivals, however real inter-arrival times are not necessarily exponential. The middle curve shows a point process with i.i.d. gamma distributed inter-arrivals with shape parameter $c = 1/4$, also with $\lambda = 1$. The spectrum is no longer flat at small scales, but it is asymptotically flat at a level of $\log_2(\lambda/c) = 2$, which reflects the higher variance $1/(c\lambda^2)$ of the inter-arrivals. Note the apparent scaling at small scales with $\alpha > 0$. An approximate onset scale for this trivial scaling at large scale is $\log_2(16/(\lambda\tau)) = 6$, which agrees with the estimate in the figure.

The third plot is the familiar near-linear graph of fractional Gaussian noise (fGn), a discrete time series with an early onset of LRD at $j = 3$.

2.5 Conclusion

In this chapter we have presented the main mathematical tools that will be used in the rest of this thesis. The fundamental concepts of LRD point processes and Logscale Diagrams

were detailed. Although a thorough understanding of all these notions is not required to understand the following chapters, this mathematical background is given as a reference to which the reader can come back when needed.
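As a compact reminder of the Logscale Diagram recipe of section 2.4.3, the sketch below turns a set of $\log_2 S_2(j)$ values into a slope $\alpha_2$ and then into a Hurst parameter, using the two cases quoted in that section. It is an illustration under stated assumptions, not the estimator of [6]: the weights are simply taken proportional to $2^{-j}$ (that is, to the number of coefficients $n_j$), the input points lie exactly on a line of slope 0.6, and all function names are hypothetical.

```python
def weighted_slope(scales, values, weights):
    """Slope of the weighted least-squares line through the points (scales, values)."""
    total = sum(weights)
    mean_j = sum(w * j for w, j in zip(weights, scales)) / total
    mean_y = sum(w * y for w, y in zip(weights, values)) / total
    num = sum(w * (j - mean_j) * (y - mean_y) for w, j, y in zip(weights, scales, values))
    den = sum(w * (j - mean_j) ** 2 for w, j in zip(weights, scales))
    return num / den


def hurst_from_alpha2(alpha2):
    """Map the second-order LD slope to H, following the two cases of section 2.4.3."""
    if 0.0 < alpha2 < 1.0:
        return (alpha2 + 1.0) / 2.0      # LRD process
    if alpha2 > 1.0:
        return (alpha2 - 1.0) / 2.0      # non stationary self-similar process
    raise ValueError("slope outside the two cases covered by this sketch")


scales = [3, 4, 5, 6, 7]                      # octaves used for the regression
log2_s2 = [0.6 * j + 2.0 for j in scales]     # idealised LD points, slope 0.6
weights = [2.0 ** (-j) for j in scales]       # assumed weights, proportional to n_j

alpha2 = weighted_slope(scales, log2_s2, weights)
hurst = hurst_from_alpha2(alpha2)
print(alpha2, hurst)
```

With a slope of 0.6 the mapping returns the value $H = 0.8$ used for the fGn example of figure 2.2; on real data the points are noisy and the choice of the regression range matters at least as much as the weights.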


Chapter 3

Empirical observations and semi-experiments

3.1 Introduction

In this chapter we first describe the empirical measurements used throughout this thesis, and give some general statistics about Internet traffic. We then present a very thorough analysis of the origins of these traffic statistics for both flow and packet arrival processes. The insight we get about Internet traffic is of fundamental importance for the rest of the thesis, and will directly inspire the choice of traffic model we make in chapter 4.

At the packet level, our approach is based on the idea of 'shuffling', first proposed in [56], i.e. the random re-ordering of blocks of a time series, such as packet counts. This is a way of modifying the correlations of the data whilst preserving the original structure within blocks. We extend this idea and selectively modify several of the components comprising the full packet stream, and note the resulting effect on the scaling structure. For example, details of the arrival process of flows can be altered while preserving in full the packet patterns within each flow. We call this way of virtually investigating 'what if' scenarios the semi-experimental method, and we employ it extensively as a tool to track down the connections and origins of scaling behaviour, a difficult task for such complex data. It can also be used to selectively test models for portions of the traffic structure, without having to postulate a full model from the outset.

At the flow level, our starting point is the somewhat surprising observation that the scaling seen at the IP level, namely clear LRD at large scales, a second, though less clear, scaling regime at small scales, and a transition scale at around 1 second separating them, is roughly similar to that found in the arrival process of TCP flows. This is surprising in that the prevailing view on the origins of LRD at the IP level, heavy tailed file sizes [179], cannot explain LRD in the flow arrival process. This similarity

immediately raises the question of the link between the two. Are the twin scaling regimes at the IP level, or aspects of them, due to or influenced by the corresponding features at the flow level, or are they both the result of some common mechanism, or even two independent mechanisms? Answers to such questions will tell us if the fractal structure of flow arrivals is important to model accurately or not. This is important for hierarchical traffic models where an arrival process of sessions, and then flows, forms the backbone of the final packet level model. Although the IP level is of great importance for router throughput, another motivation for pursuing an understanding of the flow level arrival processes is the direct role they play for flow level performance, for example processor load in web servers and proxies.

We first introduce the data in section 3.2 and then present some statistics on the flow arrival process in section 3.3. We then apply the semi-experimental method on the packet arrival process in section 3.4 and report our findings. We analyze the link between packet and flow arrival processes in more detail in section 3.5, and present our conclusions in section 3.6.

3.2 The data and data processing

3.2.1 Passive measurements

Packet collection tools have been developed since the inception of packet switched networks as a means to debug protocol stacks or network interfaces. There are in fact many ways of collecting packets on a link, based either on a software or hardware solution, for both online and offline analysis. For instance one can use hardware equipment such as a line tester or a protocol analyser to generate real time counts of link layer faults or packet arrivals. One can also use software tools such as tcpdump to investigate IP packets on a LAN. These measurement techniques are non intrusive, in the sense that they do not modify the traffic, and are often referred to as passive measurements. They differ from active measurement techniques where artificial traffic is injected in the network, for instance to estimate link bandwidth.

In this thesis, we focus solely on passive measurements, and use packet traces collected with high precision hardware known as DAG cards [44]. These cards are plugged into the Peripheral Component Interconnect (PCI) bus of a standard Personal Computer (PC) running the Linux operating system. They provide loss-less measurements of the link with GPS synchronized timestamps [125]. The resulting traces gather the timestamp, the physical layer header and the first 40 bytes of the physical layer payload, which is sufficient in most cases to extract IP and TCP header information. For privacy reasons, packet addresses

on publicly available traces are systematically anonymized, and TCP and UDP payloads removed. Further technical details on DAG cards and physical layer overhead will be given in section 6.2, when we present the data used in chapters 6 and 7, for which we were slightly more involved in the collection. We do not give further details on hardware considerations here since we did not collect any of the traces presented in this section and used in chapters 3, 4 and 5.

The Internet traces we analyze have a range of link speeds and geographic locations, details of which are summarized in table 3.1. We mainly study traces recorded by the WAND group at the University of Waikato in New Zealand, the Auckland II and Auckland IV data sets. These traces were collected on the Internet access link of the University of Auckland, and are freely available on the web [177]. In fact we analyze subsets of these datasets. We focus on two three hour periods during week days, 2:00 to 5:00 and 13:00 to 16:00 local time, corresponding to an apparently stationary traffic rate for a 'low' and a 'high' activity period respectively. We also study traces recorded by the Distributed Real Time Systems group [47] at the University of North Carolina (UNC-a0 and UNC-a1), from the NLANR repository [130] (NLANR-SDC and NLANR-TXS), from the Cooperative Association for Internet Data Analysis [39] (CAIDA-b1) and from the Abilene Internet II [2]. These traces are used to make sanity checks on our main results as they are from different geographical regions and have different bit rates. The last three traces included in table 3.1 are from a small Internet provider based in Melbourne, renamed MelbISP, and provide diversity in the packet rate within individual flows, owing to the speed limitations of modems.

The raw traces are processed with the freely available CAIDA Coralreef tool suite [40] and C programs, allowing the extraction of each IP packet header together with an accurate timestamp. We first give a brief overview of simple traffic statistics obtained from IP, TCP and UDP headers in section 3.2.2, before presenting the concept of IP flow in section 3.2.3 and the central observations of our empirical work in section 3.2.4. Specifications of IP, TCP and UDP headers are provided for reference in the appendix.

3.2.2 First observations

For most of the traces studied, TCP represents 90% of the packets and up to 97% of the bytes carried on the link. This shows that TCP is by far the dominant transport protocol on the Internet. Moreover, when the measurements were taken, World Wide Web (WWW) traffic (defined here as traffic on port 80, as well as ports 8080 and other web proxies)

represented roughly 70% of all the TCP packets. These numbers might fluctuate, depending on the 'killer' application at the time of measurements. For instance, one would expect that peer-to-peer traffic would, as of 2004, represent a large portion of the TCP traffic, whereas it was not very significant in 2001. Finally, the proportion of UDP packets might also be different in 2004 traffic due to the advent of streaming media and voice over IP technology.

Table 3.1: Description of the traces: name, date of the recordings, time of the day analyzed, utilization, link speed.

Trace       Date        Time (local time)  Rate (Mbps)  Link
AUCK-a0     1999/12/01  13:00 to 16:00     1.4          OC3 (155 Mbps)
AUCK-b0     2001/03/30  13:00 to 16:00     3.5          OC3
AUCK-c0     2001/04/02  02:00 to 05:00     0.3          OC3
AUCK-c1     2001/04/02  02:00 to 05:00     0.5          OC3
AUCK-d0     2001/04/02  13:00 to 16:00     3.6          OC3
AUCK-d1     2001/04/02  13:00 to 16:00     2.4          OC3
UNC-a0      2000/09/27  19:30 to 20:30     179.8        OC12 (622 Mbps)
UNC-a1      2000/09/27  19:30 to 20:30     44.8         OC12
NLANR-SDC   1998/11/26  90s peak period    11.0         OC3c (155 Mbps)
NLANR-TXS   2002/01/10  90s peak period    22.5         OC3c
CAIDA-b1    2002/08/14  10:00 to 10:10     638          OC48 (2.5 Gbps)
Abilene     2002/08/14  10:00 to 10:10     418          OC48c
MelbISP-1   2000/04/25  19:00 to 22:00     0.03         Unknown
MelbISP-2   2000/04/27  19:00 to 22:00     0.03         Unknown
MelbISP-3   2000/04/28  19:00 to 22:00     0.03         Unknown

Another interesting observation concerns the IP packet sizes. Figure 3.1 shows the packet size distribution for the UNC-a1 trace. The striking feature is that there are virtually only three packet sizes on the link: 40 bytes, which corresponds to the minimum IP packet size and is often an acknowledgment packet sent by the TCP receiver, another one around 600 bytes, and 1500 bytes, the maximum IP packet size for Ethernet traffic. Packet size distributions for other traces are very similar. This simple empirical observation shows that the usual assumption made in the field of queuing theory, where one often takes an exponential distribution to describe packet sizes, has no empirical backing.

3.2.3 IP flow decomposition

The information contained in IP, TCP and UDP headers allows IP packets to be categorized into different flows, a notion central to our work already introduced in the previous chapter and that we now develop further. In the research community the generally accepted flow definition is a set of packets with the same 5-tuple {IP protocol, source IP address, destination IP address, source port, destination port}, and with a maximum nearest neighbour packet inter-arrival time $T_0$ [35]. On the other hand, IP flows are defined slightly differently in a router where a flow can

3. SYN-ACK. it was found that the above definition gave a very similar classification to that provided by tracking TCP connections by monitoring SYN.e. The UNC-a0 trace for example. 5-tuple with static timeout. the key quantity is the set of arrival times tP (k) of packets indexed in arrival order: k = 1. with the additional advantage of keeping track of late packets transmitted after connection closure.1: Packet size distribution for UNC-a1. statistics of individual . be terminated due to: (i) timeout. representing less than 1% of all connections.6 F(p) 0.4 0. Another definition worth mentioning is found in [157] where an adaptive timeout based on flow characteristics is used. all individually tracked.4Mhz workstation running Linux with 2 Gigabytes of fast memory. i.2 0 0 500 1000 1500 packet size p Figure 3. we used a dedicated file server delivering compressed data off a RAID over Gigabit Ethernet to a 2. This time series defines the continuous time point process X(t) = ∑ δ (t − tP (k)) of packet arrivals we wish to model. Considerable computation is required to perform the packet and flow level analyses. At the packet level. To run our C and Matlab programs. · · · K. In the case of TCP. but also (ii) protocol (FIN or RST packet sent by TCP) or (iii) memory management (the flow is terminated by the router software exporting flow statistics in order to free resources for new flows).2.8 0. 000 flows and 77 million packets. It therefore gives a general framework to compare TCP and UDP flows. consists of 2 Gigabytes compressed. This technique also captures the many connections that do not terminate correctly. FIN and RST packets. contains 800. THE DATA AND DATA PROCESSING 45 1 0. In the rest of the thesis we adopt the first definition. 2. where flows are not individually tracked. This classification only uses IP level information and port information common to TCP and UDP. The actual value of the timeout T0 will be taken to be 64 seconds [40] for all the traces. 
A discussion on the value of T0 will be provided in chapter 5. or equivalently the inter-arrival sequence A(k) = tP (k) − tP (k − 1). At the flow level . From the raw data many different time series can be constructed.

flows are collected, which requires extensive computation and storage space. In addition to the set of arrival times $t_F(i)$ of flows defining the flow arrival process $Y(t) = \sum_i \delta(t - t_F(i))$, $i = 1, 2, \cdots, I$, the intrinsically discrete series $P(i)$ and $D(i)$ give the number of packets and durations in seconds respectively of successive flows ($D(i)$ is only defined if $P(i) > 1$). We also located and stored, for each flow, a complete list of packet inter-arrival times.

Figure 3.2 illustrates the decomposition of Internet traffic into flows. The bottom axis represents the arrival times of IP packets on a link, i.e. the packet arrival process $X(t)$. Once the flow analysis is done, packets with the same five tuple are grouped together as represented by the coloured rectangles. The flow arrival process $Y(t)$ illustrated on the top axis is formed by the first packet of each flow.

Figure 3.2: Flow decomposition of the packet arrival process X(t): the bottom axis shows the arrival times of packets. Packets with the same 5 tuple have the same color and are grouped together to form an IP flow. The flow arrival process Y(t) is constructed by taking the arrival time of the first packet of each flow.

3.2.4 Central observations: biscaling and heavy tails

Scaling behaviour

We first illustrate with empirical data the scaling phenomenon introduced in chapter 2. In figure 3.3(a), the packet count for trace AUCK-d0 is plotted for different levels of aggregation. An aggregated Poisson process with the same arrival rate as the data is plotted for comparison in figure 3.3(b). The striking observation is that the Poisson process becomes very smooth as the aggregation level increases, while the empirical data shows more variations. Figure 3.3(c) shows the variance of the packet count $X^{(m)}$ in bins of size $m$, as defined in equation (2.9), as a function of the aggregation level $m$ on a log log plot. For large time scales, the variance decays as a power law:
\[ \mathrm{Var}(X^{(m)}) = O(m^{-\beta}), \quad \beta \simeq 0.4. \tag{3.1} \]
This is consistent with a long range dependent phenomenon with Hurst parameter which

can be roughly estimated as $H = 0.8$. The variance of $X^{(m)}$ is said to decay 'slowly' by comparison with the faster, $m^{-1}$, decay of the variance of the corresponding Poisson process.

Figure 3.3: Packet arrival rate with aggregation level m for (a) measured IP packet arrivals and (b) Poisson process, shown for log2(m) = 2, 6, 10 and 14 against time in seconds. (c) Corresponding variance-time plot, log2(Var(X^(m))) against log2(m), with 'slowly' decaying variance. (d) Wavelet energy spectrum, log2 Var(d_j) against j = log2(a). Panels (c) and (d) compare the data with a Poisson process.

Although this aggregation technique gives a clear illustration of the LRD phenomenon, and shows convincingly that the variance of the packet counts decays 'slowly' for large $m$, it does not provide any confidence intervals on the estimation of $H$. Estimating the Hurst parameter of a time series is by itself a research topic. A review of different techniques can be found for instance in [168]. In the rest of this thesis we use wavelet based estimates called Logscale Diagrams (LD), which were introduced in section 2.4. Figure 3.3(d) shows the LD for the packet arrival process of trace AUCK-d0. The

48) and (2. (b) byte arrival process.86]. The vertical axis gives the log energy at a given time scale. The founding observation underlying our approach is the prevalence of this biscaling in all the traces we studied. This has also been reported by other researchers[62].48 CHAPTER 3.1. The wavelet spectral density of the Poisson process with same rate as the data is the horizontal line.08 0. The data LD exhibits a ‘biscaling’ behaviour. and lead to values of the local scaling parameters with confidence intervals. EMPIRICAL OBSERVATIONS AND SEMI-EXPERIMENTS 30. At large scales the LRD is clearly seen in each trace.48 81. For instance.12 20.5mus 977mus 0. At smaller scales evidence for scaling is also present which.92 327.02 0. 0. and (c) flow arrival process Y (t) across all traces horizontal axis is labeled both in logarithmic scales (bottom) and in seconds (top).68 AUCK−a0 AUCK−a0 10 AUCK−b0 AUCK−b0 AUCK−c0 10 AUCK−c0 AUCK−c1 AUCK−c1 8 AUCK−d0 AUCK−d0 AUCK−d1 8 AUCK−d1 log2 Variance ( j ) UNC−a0 UNC−a0 6 UNC−a1 Var ( dj ) UNC−a1 Abilene 6 NLANR−SDC Mel ISP−1 NLANR−TXS 4 Mel ISP−2 Mel ISP−3 4 2 2 0 (a) 0 (b) −15 −10 −5 0 5 10 0 2 4 6 8 10 12 14 16 18 j = log2(a) j = log2 (scale) 12 0. For ease of comparison the plot ordinates have been normalised.28 5. Each vertical line gives the 95% confidence interval on the variance estimation at the corresponding time scale. in accordance with equations (2. Figures 3.02 0.32 1.4(a) and (b) show the LDs of packet and byte counts for most of the traces described in table 3.68 AUCK−a0 AUCK−b0 10 AUCK−c0 AUCK−c1 AUCK−d0 8 AUCK−d1 log2 Variance ( j ) UNC−a0 UNC−a1 6 NLANR−SDC NLANR−TXS 4 2 0 (c) 0 2 4 6 8 10 12 14 16 18 j = log2 (scale) Figure 3.78. and the ‘knees’ in the curves are distinctive and all located in a narrow band at about 1s. that is dual scaling regimes separated by a distinct knee . with confidence interval [0.69).28 5.82.12 20.08 0.32 1.031 1 32 1024 12 0.4: Biscaling in (a) packet arrival process X(t). 
a linear regression over the octaves 3 to 7 lead an estimate of the Hurst parameter of H = 0.92 327. Slopes are estimated by a weighted linear regression. although much noisier. for both packet and byte counts.48 81. recurs .

This biscaling behaviour is also found in the flow arrival process, as illustrated in figure 3.4(c), but with different parameters. The fact that LDs of packet and byte counts have a similar shape intuitively means that packet sizes have little impact on the correlation of the byte count process. This explains why in the following we will focus more on the timing of packets rather than on their size.

Flow characteristics

We now make use of the information contained in each packet header to do a flow decomposition of the traces, and study flows in more detail. We start with general characteristics, such as flow size P and flow duration D, and then illustrate packet dynamics inside a TCP flow.

Figure 3.5(a) shows the remarkable power-law form of the distribution of P across traces, and similarly for D in plot (b). In all cases results from the same group (AUCK, UNC, MelbISP) are very consistent. This heavy tail behaviour is consistent with the physical explanations of LRD given in section 1.4.2. In chapter 4 we will discuss the consequences of the fact that P, consistently across traces, also has a distribution body which is close to power-law, in addition to a power-law tail that contains only around 1% (depending on the exact definition of ‘tail’) of the mass.

[Figure 3.5 plots: (a) log( Pr[ P > k ] ) against log( k ); (b) log( Pr[ D > x ] ) against log( x sec ); traces AUCK, UNC, Abilene and Mel ISP.]
Figure 3.5: Flows characteristics: (a) Heavy tailed body and tail of P (# packets in flows). (b) Heavy tailed flow durations D.

Let us now illustrate the packet arrival process within a TCP connection obtained from measurements. Theoretical details of TCP mechanisms are not presented in this thesis since they are not of primary importance for our work. A presentation of key concepts, such as three way handshake, slow start phase or retransmission mechanisms can be found for instance in [166]. Very briefly, a TCP server sends a certain number of data packets, corresponding to a window size, to a TCP client that will then send back acknowledgment packets to the host. When the acknowledgment has been received, the server sends another group of data packets.

Figure 3.6(a) illustrates a rare ‘textbook’ TCP connection taken from an Auckland trace. Packets sent by the server are plotted with their respective arrival time and sequence number, and joined by a solid line. Packets sent by the client are plotted with their respective arrival time and acknowledgment number, and joined by a dotted line. After a packet loss detection, indicated by a sudden drop in the sequence number of the packet being transmitted, the TCP connection goes back to a slow start phase with exponential increase of its window size, followed by a linear increase of its window size in a second phase.

[Figure 3.6 plots: (a) Sequence Number against Time in s; (b) packet arrival times of several TCP flows, one axis per flow.]
Figure 3.6: Packet arrivals in a TCP connection: (a) Close up on TCP mechanism in a long connection. (b) Packet arrivals patterns vary wildly between connections.

Figure 3.6(b) shows the packet arrival patterns for various measured TCP connections. Since sequence numbers are ignored, packet arrival times from the same TCP flow are represented by vertical marks on a given axis. Time scales are different for different flows. The main observation is that any pattern of packet arrivals can be found in ‘real life’ TCP connections: periodic, large periods of inactivity, successive bursts... Given the different link speeds, paths, bottlenecks and window sizes, there is no obvious universal pattern of packet arrivals within a measured TCP connection. This is an important point to keep in mind when doing traffic modelling.

Most TCP connections are in fact short and do not transmit enough packets to exhibit the ‘textbook’ behaviour illustrated in figure 3.6(a), while a few are very long, as seen in figure 3.5(a). Also, the notion of ‘infinite’ source often used to model TCP behaviour proves to be a mathematical concept with little empirical backing. Many factors influence TCP connections, such as the type of application using the connection, the cross traffic encountered through the network or the bandwidth of the access

and then examine X(t) in section 3. Keeping in mind that we want to understand the structure of the packet arrival process in order to model it and answer question (i).7: Analysing the Flow Arrival Process Y (t): Logscale Diagrams for the Auck.3. and at small scales evidence for another scaling regime. point. interesting structure for Y is consistently found.II (lower set) and Auck.7 superimposes LDs of Y across many of the Auckland traces: they are very similar. The prominent features are the LRD at large scales. 3.3 Flow arrival process In all the traces studied in this thesis. with no apparent dominant feature. The exact impact of the flow arrival process on the packet arrival process is studied in section 3.25 1 4 16 64 256 1024 16 AUCK2 AUCK4 14 12 log Var( d ) j 10 2 8 6 4 −6 −4 −2 0 2 4 6 8 10 j = log ( a ) 2 Figure 3.062 0.016 0.IV traces. Each has LRD and a similar knee position j∗ .4(c). • The structure of packet arrivals within a TCP flow is highly complex. as shown in figure 3. we start with the underlying flow arrival process Y (t) in section 3. • Flows have a heavy tailed distributed number of packets. This biscaling behaviour is also seen . our aim is to get a better understanding of these observed statistics. The precise value of the LRD exponent varies but is typically around α = 0.4.6.5. and has an onset scale or ‘knee’ where the LRD begins which is very pronounced. The picture that emerges from the observations made in this section is as follows: • Both the packet arrival process X(t) and the flow arrival process Y (t) have a ‘biscaling’ structure. a clear knee at a characteristic scale around 1s (top edge shows seconds).3. in particular the position of the knee as a function of network parameters. In this section we examine Y more closely. and will be discussed further in the next section. Figure 3. Specifically it has LRD. In the rest of this chapter.3. FLOW ARRIVAL PROCESS 51 0.

consistently through subsets of IP flows. Figure 3.8 compares the LDs of Y and of different subsets: TCP flow arrivals, HTTP flow arrivals and UDP flow arrivals. The scaling for HTTP is extremely similar to the global scaling. The other LDs look almost identical. The plot for UDP flows shows different behaviour at large scales, but there is still a transition at roughly the same scale.

[Figure 3.8 plots: four panels (All Connections, TCP, UDP, HTTP), each showing log2 (energy) against log2 (scale).]
Figure 3.8: Logscale Diagram for arrival times, for: All flows, TCP only, UDP only, and HTTP.

The origin of the LRD in Y, in contrast to that in X, is at present unknown, and we do not attempt to fully explain it here. A key issue is the lack of a visible mechanism which could lend a rich structure to such a sequence of arrivals. The dynamics of TCP connection generation in WWW browsing sessions however is an obvious candidate. The advent of persistent TCP connections and connection pipelining allowed by HTTP version 1.1 [100] suggests that such dynamics could be in the process of changing. To investigate this, we plotted together in figure 3.7 the LDs for all AUCK2 traces (lower group), and AUCK4 traces, which were collected approximately one year later at a time when HTTP 1.1 was being deployed. The knee position is unchanged, however the slope at small scales is smaller for AUCK4. Unfortunately it was not possible to check via the packet level logs whether HTTP 1.1, persistent connections, and/or pipelining was being extensively employed (see however [164] for an interesting method of inferring HTTP details from packet header measurements). The

question of the reason for the change in slope therefore remains open. We do not attempt to understand the detailed characterisation of the small scale regime of Y either, and for simplicity we model it by a trivial flat spectrum, which is very accurate in the case of Auckland IV. Instead, in this section we focus on the onset scale of LRD, which we also call the knee position j∗, both for its intrinsic importance as a characteristic scale whose origin is also not understood, and because it has received little attention in the literature. Our aim is to find networking parameters that will in some way be responsible for this onset scale. We start our analysis by presenting a simple algorithm to determine the onset scale, before using it to detect knee movements as a function of networking parameters.

3.3.1 Knee tracking algorithm

On most of the empirical time series studied, we found LDs consisting of two asymptotic straight lines, separated by a knee. When looking at different time series, different values of the slopes are found: the behaviours at large and small time scales show significant variation. In contrast, the existence of the change point, i.e. the knee, is very persistent. To detect this knee robustly and automatically, we designed an algorithm based on detecting a consistent departure from a straight line fitted over the smallest scales. Since the data is noisy, the local slope is estimated by a 3 point moving average. A threshold is preset to determine if the slope difference is large enough to be the start of a new slope. To check that the new slope is meaningful, it was required to be close to constant over three different octaves.

Since we use a discrete wavelet transform, we only get a small number of points on the logscale diagram and the estimated cutoff scale is an integer. One way around this problem is to use wavelets interleaving to get discrete values at other scales by changing the sampling period. However, this is computationally intensive. Instead, we calculate the intersection of the local slopes on each side of the estimated integer cutoff scales. This allows a spread of the resulting cutoff scale over the real axis without any supplementary calculations. Examples of cutoff scale detection are provided on figure 3.10 (a) and (b).

3.3.2 Dependence on traffic characteristics

Our main approach is to study subsets of flows according to various criteria, in an attempt to observe and quantify the parameters affecting j∗. Indeed, it is difficult to find time series whose knee position varies from others of the same trace, or indeed of different traces. For example, figure 3.9 shows the logscale

diagrams for two different subsets of an AUCK4 trace. The left plot shows the behaviour for mail connection, which although different from HTTP, still shows biscaling. The right plot selects 10% of HTTP flow arrivals randomly, and shows the same behaviour as the full set displayed in figure 3.8.

[Figure 3.9 plots: two panels (SMTP; Random HTTP, prob = 0.1), each showing log2 (energy) against log2 (scale).]
Figure 3.9: Searching for knee variation: subsets based on (a) protocol (SMTP), (b) random thinning (10% of HTTP flow arrivals).

It has been suggested by Feldmann et al. [61] that the knee position at the IP level is related to the round trip time of TCP connections. However, given the probable role of HTTP sessions, where groups of connections are launched by a single download action, and further groups may have to wait for the completion of the first, connection durations may have a greater influence. We also investigate a related question, of the dependence on (average) packet rate. Accordingly, in the next subsections we investigate these dependencies in detail, although we concentrate primarily at the flow arrival level.

Duration dependence

In this section we investigate the relation between knee position in the LD of aggregated arrival times, and connection duration. As the clearest biscaling was seen in HTTP connections, which is also the dominant traffic type, we focus on an analysis of HTTP connection arrivals. We include both alpha and beta traffic, as defined in chapter 1, but exclude connections with less than three packets, which constitute a surprisingly high (≈ 20%) proportion of arrivals. We attribute these to failed connection attempts.

Since the same distribution of durations can be seen over all the Auckland traces, we cannot get a single knee value for each trace and hope to see a lot of variability. It is therefore necessary to divide the arrival times of each trace into subgroups according to the durations of connections, and find the knee for each using our previously described algorithm. By plotting these values against (for instance) the median

duration of the corresponding subgroup, we can look to see what the relationship is. More precisely, we group flows (in fact TCP flows carrying HTTP), into 10 equal sized subsets based on percentiles: the shortest 10% and so on up to the longest 10%. Using the algorithm previously described, we find the knee position for the LD of the flow arrival times in each of the subsets. This operation is illustrated in figures 3.10(a) and (b) for two of the subsets. The resulting automatically measured knee values are plotted in figure 3.10(c) against the median flow duration D̄ of the corresponding subset.

[Figure 3.10 plots: (a), (b) log2(S2(j)) against j; (c) j* against durations (sec) for AUCK 2, AUCK 4 and NLANR.]
Figure 3.10: Tracking the knee position in Y(t): Examples of knee tracking for (a) first and (b) 7th deci-quantile range of durations. The star marks the cutoff scale found by the knee tracking algorithm. (c) Knee position j∗ as a function of median flow duration for the subsets.

A clear and robust dependency based on flow duration D is found:

\[ t^{*} \equiv 2^{j^{*}} \simeq 3\bar{D}, \tag{3.2} \]

where t∗ is the timescale associated to j∗. The knee position is clearly shifted to larger values with increasing duration. The dependence is linear, albeit noisy. The straight line with slope 1 on the logarithmic scale is equivalent to equation (3.2).

We now comment on the way this interesting result was obtained. From figure 3.10(c), the cutoff scales of the AUCK2 and AUCK4 datasets line up reasonably well for small durations, while for larger durations the results are a bit more widely spread. In fact, recall from chapter 2 that the confidence intervals on the estimation of S2(j) increase with the scale j, which means that the points in the logscale diagram are less reliable at large scales. Therefore, any measurement based solely on those points, such as the local slope used by the detection algorithm, is also bound to be less accurate. This is why the values of the cutoff scales for large durations are widely spread. The vertical dotted line on figure 3.10(c) tries to quantify this phenomenon by separating ‘noisy’ from ‘less noisy’ measurement data points. The LD is sometimes so noisy that the detection algorithm fails entirely, as indicated

by the points at the bottom of the plot at scale −5 for large durations. Moreover, the cutoff values corresponding to durations larger than 25 seconds are the ones obtained for the 10th quantile durations and therefore include the ‘heavy tail’ of the durations distribution, which makes their estimation even more problematic. On the other hand, the values of S2(j) at small to medium scales are very well estimated, and the cutoff scale is found with great precision. In fact, in the Auckland traces the knees obtained for small durations are also the ones that line up the best.

In order to check the sanity of our results, we conducted similar calculations with the NLANR traces. Due to their relatively short durations (90 seconds compared to 3 hours for Auckland traces), a slightly different method had to be used to measure the knees. More precisely, we simply changed the number of subsets from 10 to 3 to obtain enough data in each subset. Another aspect to consider is that the value of S2(j) can only be estimated up to scale 12 due to the limited duration. Given the location of the cutoff scale around 9 for the durations considered here, with the same argument based on confidence intervals, this means that the results are inherently noisy. It is therefore quite striking that the cutoff scales obtained gather around the line previously obtained with AUCK2 and AUCK4 traces.

Round trip time dependence

The motivation to study the influence of the Round Trip Time (RTT) on the knee of the flow arrivals LDs is based on the hypothesis, evoked in [61], that the cutoff scale in the IP biscaling is related to the RTT. Using a similar method to that of the previous section, by simply changing the criteria of connections selection from duration to RTT we performed a cutoff frequency analysis on the AUCK2 and AUCK4 data sets. The RTTs were calculated for all AUCK2 traces and one AUCK4 trace only. The results are shown in figure 3.11(a) and indicate that there is in fact no obvious relationship.

These results seem to contradict the phenomenon described in [61] where the authors showed using ns [106] that the RTT could influence the cutoff scale of the IP level traffic (bytes per bin). They performed a simulation on a small network topology consisting of a single webserver and 420 clients, modelling a small ISP environment. By changing the delay of the access link, and therefore the RTT, they obtained a different scaling behaviour depending on the RTT. More precisely, they obtained a pronounced dip at the scale corresponding to the RTT. However, the limited complexity of such a network makes its conclusions difficult to apply given the extreme richness we observe in real traces. This could explain why figure 3.11(a) does not show the dependency found in [61]. Another reason could be an estimation issue: it is notoriously difficult to estimate RTTs from passive

measurements since it involves reconstructing the TCP stack at the end host from measurements taken at an unknown point in the network [164]. To alleviate this difficulty, we simply evaluated the RTT of TCP connections by measuring the time delay between packets in the three-way handshake during the connection establishment. However, we believe that it is unlikely that a more sophisticated RTT estimation would lead to significantly different results because most connections are so short that network conditions can be considered constant over the connection duration.

[Figure 3.11 plots: (a) cutoff scale against RTT (sec); (b) cutoff scale against rate (packets/second); traces AUCK 2 and AUCK 4.]
Figure 3.11: (a) Knee position as a function of the Round Trip Time. (b) Knee position as a function of the rate. In each figure the rectangle marks the core of the scatter plot (points lying between the 0.25 and 0.75 quantiles in both dimensions).

Rate dependence

The average rate of connections is another measure that we can use to group connections. Repeating the same procedure, we plotted the cutoff scale as a function of the connection rates on figure 3.11(b). Again, no clear dependency could be found between connection rates and cutoff scale. This is in itself an interesting result.

3.3.3 Reconstruction from subsets

We showed in the previous section a clear dependence of the knee position of Y on the flow duration. We now analyze this dependency further by looking at how one can reconstruct Y from the duration subsets. Figure 3.12 shows the 10 subset LDs, the sum of those 10 LDs, as well as the original LD of Y, obtained by averaging over the AUCK4 traces. The fact that the data is very close to the sum of the subsets indicates that the subsets are roughly independent of each other.

Let us justify the averaging done on the LDs. For all the AUCK4 traces, we observed such regularity in the LDs that we can consider that the arrival times of HTTP connections

recorded for each trace are in fact different realizations of the same stationary stochastic process. It therefore makes sense to average the LDs obtained for each trace to obtain a less variable LD. Moreover, if we assume that the separation of arrival times is made with roughly the same quantile durations in each trace, we can apply the same reasoning to the LDs obtained for a given subset across all the traces.

[Figure 3.12 plot: log2(Var( dj )) against j = log2( a ) (top edge in seconds); curves: all, sum quantiles, and the 0.1 to 1.0 duration quantiles.]
Figure 3.12: The LDs of the duration based subsets of Y, and the LD of their superposition compared with data.

The resulting expected LDs for each quantile duration have some nice features. First, they all line up at small scales. This is due to the fact that at small scales the flow arrival process of each subset tends to a Poisson process, and that the Poisson limit is the same for all subsets given that the subsets have the same number of points. Second, the LDs all have roughly the same slope at large scales, meaning that the LRD behaviour is essentially the same for all subgroups. More precisely, the estimation of the Hurst parameter for each subset gave results in the range [0.68, 0.74], averaged over 8 traces, to be compared with the Hurst parameter of 0.7 for the total average, plotted with a thick gray line on the figure. Note that confidence intervals in the estimated wavelet spectra grow with scale (not shown on figure 3.12 for clarity) and are such that the differences between the LDs for j > 8 are not significant. The only difference is really the knee position. From this we learn that the knee in the data can be understood as a smoothed ‘mixture’ of sharper knees corresponding to independent subsets of flows which, in an idealised limit, would each have constant flow duration.

We now formally detail the quantity labeled ‘sum quantile’ in figure 3.12. Consider the decomposition of the arrival times {tF(i)} in N subsets {tF(i)}(l), 1 ≤ l ≤ N, according to a given criterion. The point process of arrival times Y(t) = ∑i δ(t − tF(i)) can therefore be written as

\[ Y(t) = \sum_{l=1}^{N} Y^{(l)}(t) = \sum_{l=1}^{N} \sum_{i} \delta\big(t - t_F(i)^{(l)}\big). \tag{3.3} \]

Since the discrete wavelet transform is a linear operator, the wavelet coefficients $d_Y(j,k)$ can be written as

\[ d_Y(j,k) = \sum_{l=1}^{N} d_Y^{(l)}(j,k), \tag{3.4} \]

where $d_Y^{(l)}(j,k)$ is the wavelet coefficient at scale j and time k of the timeseries $Y^{(l)}(t)$. Recall that by definition we have

\[ S_2(j) = \frac{1}{n_j} \sum_{k=1}^{n_j} \big| d_Y(j,k) \big|^2. \tag{3.5} \]

Therefore

\[ S_2(j) = \frac{1}{n_j} \sum_{k=1}^{n_j} \Big| \sum_{l=1}^{N} d_Y^{(l)}(j,k) \Big|^2. \tag{3.6} \]

The sum of the subset LDs corresponds to a zeroth order approximation of $S_2(j)$ defined as

\[ S_2^{(0)}(j) = \frac{1}{n_j} \sum_{k=1}^{n_j} \sum_{l=1}^{N} \big| d_Y^{(l)}(j,k) \big|^2. \tag{3.7} \]

From further empirical studies of the correlation of arrival times between different subsets, we found that subsets were mostly independent, with the strongest correlation found between adjacent subsets. Based on these considerations, we propose a first order approximation $S_2^{(1)}(j)$ of $S_2(j)$ defined as

\[ S_2^{(1)}(j) = \frac{1}{n_j} \sum_{k=1}^{n_j} \Big[ d_Y^{(1)}(j,k)^2 + d_Y^{(1)}(j,k)\, d_Y^{(2)}(j,k) + \sum_{l=2}^{N-1} \sum_{m=l-1}^{l+1} d_Y^{(l)}(j,k)\, d_Y^{(m)}(j,k) + d_Y^{(N)}(j,k)^2 + d_Y^{(N-1)}(j,k)\, d_Y^{(N)}(j,k) \Big]. \tag{3.8} \]
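The relations (3.4) to (3.8) can be checked numerically. The following is a minimal sketch, assuming a plain Haar transform on binned counts as a stand-in for the wavelet used in the thesis (which is not specified in this section); the helper names `haar_details`, `s2_exact`, `s2_zeroth` and `s2_first` and the synthetic data are hypothetical.

```python
import math
import random

def haar_details(x):
    """Haar detail coefficients of x, one list per octave (finest first)."""
    approx = [float(v) for v in x]
    details = []
    while len(approx) >= 2:
        if len(approx) % 2:              # drop a trailing sample so pairs line up
            approx = approx[:-1]
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2.0) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2.0) for a, b in pairs]
    return details

def s2_exact(subsets):
    """Equation (3.6); subsets[l][j][k] holds the coefficient of subset l."""
    out = []
    for j in range(len(subsets[0])):
        n_j = len(subsets[0][j])
        out.append(sum(sum(d[j][k] for d in subsets) ** 2 for k in range(n_j)) / n_j)
    return out

def s2_zeroth(subsets):
    """Equation (3.7): sum of the subset energies, cross terms ignored."""
    out = []
    for j in range(len(subsets[0])):
        n_j = len(subsets[0][j])
        out.append(sum(sum(d[j][k] ** 2 for d in subsets) for k in range(n_j)) / n_j)
    return out

def s2_first(subsets):
    """Equation (3.8): squares plus products between adjacent subsets only."""
    n_sub = len(subsets)
    out = []
    for j in range(len(subsets[0])):
        n_j = len(subsets[0][j])
        total = 0.0
        for k in range(n_j):
            for l in range(n_sub):
                for m in range(max(0, l - 1), min(n_sub, l + 2)):
                    total += subsets[l][j][k] * subsets[m][j][k]
        out.append(total / n_j)
    return out

# Synthetic example: three subsets of binned flow arrival counts (made-up data).
rng = random.Random(0)
counts = [[rng.randint(0, 5) for _ in range(64)] for _ in range(3)]
subsets = [haar_details(c) for c in counts]
# By linearity (3.4), transforming the summed counts gives the summed coefficients.
total = haar_details([sum(col) for col in zip(*counts)])
```

With only two subsets every pair is adjacent, so the first order approximation coincides with the exact expression; with three subsets it differs from (3.6) only by the cross terms between subsets 1 and 3.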

3.3.4 Summary

In this section we have found that the flow arrival process Y(t) is LRD and that the onset scale at which LRD ‘begins’ is linked to the flow durations through equation (3.2). A physical explanation for this phenomenon is the topic of current research. We found no obvious relationship with any other physical parameter. We also showed that the flow arrival process Y(t) has a complex structure which can be decomposed as the sum of elementary subsets based on flow durations. However, the structure of these elementary subsets has not been explained yet. While the analysis of Y(t) is interesting in its own right, we now turn to the analysis of the packet arrival process X(t) in section 3.4 since it is what we really seek to understand to answer question (i).
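As a recap of the knee tracking procedure of section 3.3.1, the steps (reference line over the smallest scales, 3 point moving average of the local slope, preset threshold, slope stable over three octaves, intersection of the two lines) can be sketched as follows. This is an illustrative sketch only: the function names, the default values of `n_fit`, `threshold` and `tol`, and the choice of points used for the large scale fit are assumptions, not values taken from the thesis.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def track_knee(j, y, n_fit=3, threshold=0.3, tol=0.2):
    """Return a real-valued knee position, or None if no knee is detected.

    j, y      : octaves and logscale diagram values, finest scale first
    n_fit     : number of smallest scales used for the reference line (assumed)
    threshold : minimum slope difference that counts as a new slope (assumed)
    tol       : allowed spread of the new slope over three octaves (assumed)
    """
    ref_slope, ref_icpt = fit_line(j[:n_fit], y[:n_fit])
    # Local slopes between consecutive octaves, then a 3 point moving average.
    raw = [(y[i + 1] - y[i]) / (j[i + 1] - j[i]) for i in range(len(j) - 1)]
    smooth = [sum(raw[i:i + 3]) / 3.0 for i in range(len(raw) - 2)]
    for i in range(len(smooth) - 2):
        window = smooth[i:i + 3]
        departs = all(abs(s - ref_slope) > threshold for s in window)
        steady = max(window) - min(window) < tol
        if departs and steady:
            # Fit the large scale line, then intersect it with the reference line
            # so that the knee is no longer restricted to integer octaves.
            new_slope, new_icpt = fit_line(j[i + 1:i + 6], y[i + 1:i + 6])
            if new_slope == ref_slope:
                return None
            return (new_icpt - ref_icpt) / (ref_slope - new_slope)
    return None

# Synthetic logscale diagram: flat up to octave 5, slope 0.8 beyond.
octaves = list(range(1, 13))
values = [0.0 if o <= 5 else 0.8 * (o - 5) for o in octaves]
knee = track_knee(octaves, values)
```

Returning `None` mirrors the case mentioned in the text where the LD is so noisy that the detection fails entirely.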

3.4 Packet arrival process and semi-experiments

In the previous section we transformed Y(t) in selective ways in order to better understand its internal structure. Here we will use and refine this technique, which we call the semi-experimental method, to study X(t). It was first proposed in [56], and proves invaluable as a means to track down the origins of, the connections between, and to selectively test models of, portions of the traffic structure, without having to postulate a full model from the outset. Our approach is to begin at the IP level, and progressively modify aspects of it to determine the links to the arrival level and the source(s) of the scaling behaviour.

There is a very large number of manipulations with a ‘physical’ sense one can perform on the packet arrival process. Note that we are only interested in transformations which have a physical interpretation in terms of flow arrivals or packet structure within flows. We do not consider ‘black box’ modifications based solely on bins, such as random shuffling of blocks of a given size as was done in [56]. We will illustrate the semi-experimental method on these manipulations and draw conclusions on the structure of the packet arrival process. We start with some basic manipulations in section 3.4.1, and then present some more advanced ones in section 3.4.2. For convenience a complete list of all the semi-experiments used in this thesis can be found on page xxiii.

3.4.1 Basic manipulations

The results presented in this section give the fundamental empirical backing of the modelling work developed in the next chapter. In this section we restrict ourselves to the following three categories of manipulations:

A: Flow Arrival manipulation.
P: Packet-in-flow manipulation.
S: Flow Selection manipulation.

They are illustrated in figure 3.13, along side some schematics corresponding to each manipulation class. Figure 3.13(a) shows the principles of the flow decomposition introduced in section 3.2 and gives the LD of the original trace AUCK-c1. A presentation of each manipulation class follows.

Flow arrival manipulation

The results of flow arrival manipulations are described in figure 3.13(b). The arrival process of flows is modified in three separate ways of increasing severity, whilst maintaining in full the integrity of the packet arrival patterns within each flow. Specifically:

[Figure 3.13 plots: four panels (a) Data, (b) [A-Pois], (c) [A-Pois; P-Uni], (d) [A-Pois; P-Uni; S-Pkt], each with a schematic of flows against Time and a Logscale Diagram of log2 Variance( j ) against scale j (top edge in seconds). Curves: (b) Data, [A-Perm], [A-Pord], [A-Pois]; (c) Data, [A-Pois], [A-Pois; P-Uni]; (d) Data, [A-Pois], [A-Pois; P-Uni], [A-Pois; P-Uni; S-Pkt].]
Figure 3.13: Illustration of semi-experimental manipulations and results for trace AUCK-c1. (a) Flow decomposition of the original data. (b) [A-Pois]: Flow arrivals follow a Poisson process with randomized re-assignments. (c) [A-Pois; P-Uni]: [A-Pois] combined with uniform packet arrivals within flows. (d) [A-Pois; P-Uni; S-Pkt]: [A-Pois; P-Uni] combined with selection of ‘short’ flows only.


[A-Perm]: Permute flows around the original arrival points.

[A-Pord]: Retain original flow order, but re-position arrival times according to a
Poisson process with the same rate.

[A-Pois]: Combine the previous two: a Poisson arrival process with randomised flow
re-assignments. In other words, the flow arrival times are replaced by a sample path
of a homogeneous Poisson process (conditional on the observed number of flows),
the flow order is randomly permuted, and the flows themselves are then translated to
the corresponding new arrival times.
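The three flow arrival manipulations can be sketched as follows. This is a minimal illustration, assuming each flow is stored as a sorted list of packet timestamps whose first element is the flow arrival time; the function names are hypothetical. Conditional on the number of points, a homogeneous Poisson process on an interval is a set of sorted uniform draws, which is the construction used here. Taking the interval between the first and last observed flow arrival as the observation window, and using the same conditional construction for [A-Pord], are both assumptions.

```python
import random

def shift_flow(packets, new_start):
    """Translate a flow so that its first packet lands on new_start."""
    first = packets[0]
    return [t - first + new_start for t in packets]

def poisson_points(n, t0, t1, rng):
    """n points of a homogeneous Poisson process on [t0, t1], given n."""
    return sorted(rng.uniform(t0, t1) for _ in range(n))

def a_perm(flows, rng):
    """[A-Perm]: keep the original arrival points, permute the flows."""
    starts = sorted(f[0] for f in flows)
    shuffled = list(flows)
    rng.shuffle(shuffled)
    return [shift_flow(f, s) for f, s in zip(shuffled, starts)]

def a_pord(flows, rng):
    """[A-Pord]: keep the flow order, draw new Poisson arrival times."""
    ordered = sorted(flows, key=lambda f: f[0])
    new_starts = poisson_points(len(ordered), ordered[0][0], ordered[-1][0], rng)
    return [shift_flow(f, s) for f, s in zip(ordered, new_starts)]

def a_pois(flows, rng):
    """[A-Pois]: Poisson arrival times and a random permutation of the flows."""
    starts = [f[0] for f in flows]
    new_starts = poisson_points(len(flows), min(starts), max(starts), rng)
    shuffled = list(flows)
    rng.shuffle(shuffled)
    return [shift_flow(f, s) for f, s in zip(shuffled, new_starts)]

# Example with four made-up flows (timestamps in seconds).
rng = random.Random(42)
flows = [[0.0, 0.5, 2.0], [1.0], [3.0, 3.25], [7.0, 7.5, 9.0, 9.5]]
surrogate = a_pois(flows, rng)
```

In all three cases a flow is only translated in time, so the packet arrival pattern within each flow is left intact, as required by the text.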

Figure 3.13(b) shows that none of these manipulations has any significant effect on the IP
level scaling, even [A-Pois], which completely erases the original flow arrival structure and
inter-flow dependencies. Two important inferences follow from this result:

• The biscaling structure in the flow arrival process is not responsible for the
biscaling structure at the IP level, and in fact does not influence it at either small
or large scales.

• Dependencies between packet processes across different flows are very weak.

The above inferences have important consequences. The first indicates that, at least in
terms of second order statistics, it is pointless to include properties of the arrival process
beyond the average rate in models of IP level traffic. This is significant as there is
considerable interest in hierarchical modeling approaches where packet level traffic
characteristics are derived beginning from a model of web session arrivals, leading to
correlated launching of TCP connections and so on. The second point indicates strongly that
there is no synchronisation (driven by TCP dynamics or anything else) between packet level
processes across flows.
Thus far, in terms of relevance for IP packets, we have an image of traffic as a collection
of entirely independent flows which are laid down in some independent way.

Packet-in-flow manipulation

After having ‘randomized’ the flow arrival process, we now show how in-flow packets can
also be ‘randomized’ with the following semi-experiment:

[P-Uni]: In each flow the first and last packet remain unchanged while the others are
uniformly distributed. In other words, if P(i) = 1 for flow i then the sole packet is
simply placed at its surrogate arrival point t′F(i). If P(i) = 2 then the second point is
placed at t = t′F(i) + D(i). If P(i) ≥ 3 then the P(i) − 2 internal points are
independently placed according to a uniform distribution over the duration of the flow. In
this way, the flow lengths are left unchanged while the packet dynamics inside flows are
totally randomized.
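The [P-Uni] rule above can be sketched for a single flow as follows, again assuming a flow is a sorted list of packet timestamps; the function name `p_uni` is hypothetical.

```python
import random

def p_uni(packets, rng):
    """[P-Uni]: keep first and last packet, redraw the internal ones uniformly."""
    if len(packets) <= 2:
        # P = 1: the sole packet stays at the arrival point.
        # P = 2: the second packet stays at arrival time + duration.
        return list(packets)
    start, end = packets[0], packets[-1]
    inner = sorted(rng.uniform(start, end) for _ in range(len(packets) - 2))
    return [start] + inner + [end]

# Example: a made-up bursty flow of six packets lasting four seconds.
rng = random.Random(7)
flow = [10.0, 10.01, 10.02, 10.03, 13.9, 14.0]
surrogate = p_uni(flow, rng)
```

The packet count and the duration of the flow are preserved by construction; only the internal timing is replaced.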

As seen in figure 3.13(c), the effect of randomising the packet patterns within flows is clearly
visible, although not overwhelming, and restricted to small scales. It is significant however
that the spectrum has become flat. From figure 2.2 we know that this does not necessarily
indicate that the process has become Poisson at small scales; however, as the level is equal
to the arrival rate, this is the case here. On the other hand the large scale behaviour seems
unaffected. Two tentative conclusions of note emerge from these observations:

• The scaling structure at small scales has its origin in the packet patterns within
flows.

• The LRD structure at large scales is not influenced by the packet level structure
within flows.

Flow selection manipulation

Through exploring the effects of both arrival and packet structure, we have been able to
isolate a source of small scale scaling in IP; however, the large scale behaviour has remained
unaffected thus far.
After performing [A-Pois; P-Uni], the only original features of the traffic left, where
the origin of the LRD must lie, are the flow durations D(i) and the flow packet counts P(i).
To narrow down this statistical origin more precisely, we select flow subsets according to
the number of packets per flow. Figure 3.13(d) reports on the following manipulations:

[A-Pois; S-Pkt]: Selecting flows with packet volumes below the 70th percentile,
combined with randomised arrival times.

[A-Pois; P-Uni; S-Pkt]: Randomising packet arrivals in flows in addition to [A-Pois;
S-Pkt].
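The flow selection step [S-Pkt] can be sketched as below. The thesis text only says "below the 70th percentile"; the nearest-rank percentile definition and the inclusive comparison used here are assumptions, and the function name `s_pkt` is hypothetical.

```python
import math

def s_pkt(flows, percent=70):
    """[S-Pkt]: keep the flows whose packet count is at or below the given percentile."""
    if not flows:
        return []
    counts = sorted(len(f) for f in flows)
    # Nearest-rank percentile of the packet counts (assumed definition).
    rank = max(1, math.ceil(percent * len(counts) / 100))
    cutoff = counts[rank - 1]
    return [f for f in flows if len(f) <= cutoff]

# Example: ten made-up flows with 1, 2, ..., 10 packets each.
flows = [[float(t) for t in range(n)] for n in range(1, 11)]
short_flows = s_pkt(flows)
```

Applying `s_pkt` before one of the arrival manipulations gives a surrogate in the spirit of [A-Pois; S-Pkt], since the heaviest flows are removed before the flows are laid down again.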

In [A-Pois; S-Pkt] we select only those flows with volume below the 70th percentile. The
result is the removal of the LRD, in keeping with the findings of [179] that show how the
LRD at the IP level can be explained by the heavy tailed distribution of file sizes, as already
mentioned in section 1.4.2.
The main conclusions we can draw from these basic semi-experiments are:


• The LRD in X(t) has origins in the heavy tailed nature of flow durations (a
known result), and does not have a component due to packet processes within
flows.

• When the concern is IP level modeling only, flows can be viewed as arriving as a
Poisson process, with no dependence on other flows.

3.4.2 Advanced manipulations

We now refine the observations made in the previous section by performing more advanced
packet-in-flow and flow selection manipulations.

Packet-in-flow manipulation

Although duration is a natural descriptor of a flow, it is a highly derivative one in that it
depends on both the traffic source and the effect of the network. On the other
hand P(i) acts like an independent variable describing the source, and the average rate
R(i) = P(i)/D(i), defined for flows with P(i) ≥ 2, combines source and link characteristics,
since the average (and peak) rate of a flow is conditioned by the bandwidths of links it
traversed before reaching the measurement point. We investigate the role of rate with a new
experiment:

P-ConstR: Rescale the packet inter-arrivals within each flow i by a factor s(i) such
that the average flow rates are moved to a common value: R∗ = s(i)R(i), chosen here
to be the median rate.
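A minimal sketch of the P-ConstR rescaling follows, assuming each flow is stored as packet offsets from its first packet (so the last offset is D(i)) and has at least two packets. The function name and data layout are hypothetical. Moving the rate from R(i) to R∗ = s(i)R(i) is implemented here by dividing all in-flow times by s(i).

```python
import numpy as np

def p_const_r(flow_offsets, target_rate=None):
    """Hypothetical sketch of P-ConstR.

    flow_offsets : list of arrays of packet offsets within each flow,
                   first offset 0, last offset equal to the duration D(i) > 0
    target_rate  : common rate R*; defaults to the median of the flow rates
    Returns the rescaled offsets and the per-flow factors s(i).
    """
    flows = [np.asarray(o, dtype=float) for o in flow_offsets]
    # R(i) = P(i) / D(i)
    rates = np.array([len(o) / o[-1] for o in flows])
    if target_rate is None:
        target_rate = np.median(rates)
    s = target_rate / rates            # R* = s(i) R(i)
    # Multiplying the rate by s(i) means compressing time by s(i).
    rescaled = [o / si for o, si in zip(flows, s)]
    return rescaled, s
```

The packet count P(i) and the relative spacing of packets within each flow are unchanged; only the time axis of each flow is stretched or compressed.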

The result of this manipulation is illustrated in figure 3.14. Despite preserving P(i) as well
as the individuality of packet structures within flows, the impact is notable: the entire large
scale behaviour is translated by a significant amount. In a similar way, one could define
a manipulation [P-ScaledR] where the packet inter-arrivals within flow i are rescaled by
a constant factor s such that the average flow rate of flow i becomes R′(i) = sR(i). The
corresponding LD is a simple time translation of the original LD by − log2 (s). These flow
rate manipulations lead to the following observation:

• The packet rate within flows acts as a scale parameter.

This suggests that the focus should be on rate rather than duration. One can
then extend the in-flow packet randomization so that D(i) is no longer preserved, but made a
linear function of R(i). A simple way to do this (in an average sense) is the following
manipulation:

[P-Pois]: Within each flow separately, packet arrival times are replaced by a Poisson
process of the same rate. Flow arrival times, durations and sizes are retained in full.


[Figure 3.14: Logscale Diagrams, log2 Var(dj) against j = log2(a), for Original, [A-Pois; P-Uni], [A-Pois; P-Pois], [A-Pois; P-ConstR] and [A-Pois; P-Pois; P-ConstR].]

Figure 3.14: Small scales determined by in-flow structure, D can be taken as proportional
to 1/P (note: [A-Pois; P-Uni] and [A-Pois; P-Pois] are almost indistinguishable).
Flow rate changes translate large scale behaviour.

The two LDs corresponding to [A-Pois; P-Uni] and [A-Pois; P-Pois] are plotted in fig-
ure 3.14 and are almost indistinguishable. This shows that flows for which it would not
be appropriate to slave D(i) to rate (effectively to 1/P(i)), such as those with very large
gaps, have a negligible impact. This is also intuitively in accordance with a result given in
chapter 2 proposition 2.3.2 linking Poisson processes and uniform distributions.
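The closeness of the two in-flow randomisations can be explored with a small sketch. It assumes that P-Uni places the packets of a flow uniformly over its duration, and that P-Pois draws exponential gaps at rate P(i)/D(i) so that the duration is only matched on average; both function names and these exact conventions are assumptions for illustration. The link between the two is the standard property that, given the number of points in an interval, the points of a Poisson process are uniformly distributed over it.

```python
import numpy as np

def p_uni(n_packets, duration, rng):
    """Packets placed uniformly at random over [0, duration] (assumed form of P-Uni)."""
    return np.sort(rng.uniform(0.0, duration, size=n_packets))

def p_pois(n_packets, duration, rng):
    """Packets of a Poisson process of rate n_packets / duration, starting at 0.

    The packet count is preserved; the resulting duration is random,
    so D(i) is tied to the rate only in an average sense.
    """
    rate = n_packets / duration
    gaps = rng.exponential(1.0 / rate, size=n_packets - 1)
    return np.concatenate(([0.0], np.cumsum(gaps)))
```

With P packets and duration D, the mean duration produced by `p_pois` is D(P − 1)/P, which approaches D for flows with many packets.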

Flow selection manipulation

We now report on three new flow selection manipulations:

[S-Thin]: Flow and packet structures are fully retained; flows are thinned by rejecting
each one independently with probability 0.3.

[A-Pois; S-Dur]: Combining flows with durations below the 70% percentile with
randomised arrival times.

[A-Pois; P-Pois; S-Dur]: Randomising packet arrivals in flows in addition to [A-
Pois; S-Dur].
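The random thinning [S-Thin] is the simplest of these manipulations to sketch. The function below is a hypothetical helper, with each flow treated as an opaque object that is either kept whole or dropped. If flows were independent, keeping a fraction 0.7 of them would scale the variance at every scale by about 0.7, which in a Logscale Diagram is a uniform vertical shift of log2(0.7) ≈ −0.51.

```python
import numpy as np

def s_thin(flows, reject_prob=0.3, seed=0):
    """Hypothetical sketch of [S-Thin]: drop each flow independently.

    flows       : sequence of flow records (any type); kept flows are unchanged
    reject_prob : probability that a given flow is rejected
    """
    rng = np.random.default_rng(seed)
    keep = rng.random(len(flows)) >= reject_prob
    return [f for f, k in zip(flows, keep) if k]
```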

The LDs resulting from these new manipulations, as well as from some of the previous
semi-experiments, are presented in figure 3.15(a,b,c) for the trace AUCK-c1. Figure 3.15(a)
shows the results of the flow arrival manipulations described in the previous section, while
figure 3.15(b) illustrates the effect of [P-Pois] and [A-Pois; P-Pois]. The fact that these
two manipulations give such similar results simply reinforces the earlier conclusion that the
flow arrival process does not impact on the IP level. The effects of the new flow selection
manipulations presented here can be seen in figure 3.15(c). The random thinning [S-Thin]

leads to a LD with the same shape as the original, with a variance which is approximately 70% of it, consistent with an i.i.d. superposition model, where variances simply add. In contrast, in [A-Pois; S-Pkt] we select only those flows with number of packets below the 70% percentile. The result is the removal of the LRD, as already shown in figure 3.13. A similar result is obtained with [A-Pois; S-Dur], when a selection is made based on the 70% percentile of D. The LRD of [A-Pois; S-Dur] is a simple consequence of the observations of [A-Pois; S-Pkt] since we made D(i) a dependent variable.
Figure 3.15(d,e,f) shows the same manipulations as figure 3.15(a,b,c) for a higher rate trace, AUCK-b0. The results are very similar, although we observed two systematic differences in the outbound Auckland traffic during the peak period of the day: (i) a small flow arrival dependence at the smallest scales (note the drop on the left in graph (d)), and (ii) a smaller LRD exponent for flow arrivals (figure 2c). We speculate that this could indicate some traffic shaping at small scales. The third column in figure 3.15 shows the results of the same manipulations for trace UNC-a0 which was recorded in a different location and has a rate 3 orders of magnitude higher than AUCK-c1. The fact that they are again very similar indicates that the findings presented in this chapter are of wide applicability.

[Figure 3.15, panels (a) to (i): Logscale Diagrams, log2 Variance(j) against j = log2(scale). First row of each column: Original, A-Perm, A-Pord, A-Pois. Second row: Original, P-Pois, [A-Pois; P-Pois]. Third row: Original, S-Thin, [A-Pois; S-Dur], [A-Pois; P-Pois; S-Dur], [A-Pois; S-Pkt], [A-Pois; P-Pois; S-Pkt].]

Figure 3.15: Semi-experimental method applied to AUCK-c1 (a,b,c), AUCK-b0 (d,e,f) and UNC-a0 (g,h,i).

3.4.3 Summary

The main conclusions we can draw are:

• The LRD in X(t) has origins in the heavy tailed nature of flow durations (a known result), and does not have a component due to packet processes within flows.

• When the concern is IP level modeling only, flows can be viewed as arriving as a Poisson process, with no dependence on other flows.

Although further validation from an even wider range of processes is desirable³, we can tentatively answer the question in the introduction as follows: the fractal scaling at the IP level does not depend to any significant extent on the TCP arrival process. These empirical findings, summarised in figure 3.13, constitute the corner stone of the modelling work presented in chapter 4. They also have a great impact by themselves since they justify for the first time a very common assumption of traffic modelling which consists in modelling the flow arrival process by a Poisson process.

³ Results from semi-experiments on other traces will be presented in chapter 7.

3.5 Impact on packet arrival process

In the previous section we examined empirically the impact of the structure of Y on X using Internet traces from a number of sources. Surprisingly, we found that the influence of Y was negligible, aside from the first order statistic, the stationary arrival intensity. Because of the divergent growth of low frequency power characteristic of LRD, it will not necessarily be the case that Y never has an impact on X. In this section, we use the knowledge on the scaling behaviour of Y gained from section 3.3 to understand its link with X, with a focus on the potential dependencies between their scale invariance properties. More specifically, we use the same methodology as in section 3.4 and develop new semi-experiments. What we find enables us to clearly explain when and how Y might impact on X.

3.5.1 Flow volumes manipulation

We start by studying the influence of flow volumes. In each plot in figure 3.16 the upper grey curve is the LD of X for our chosen trace, whereas the lower dashed grey curve is for Y. Although the points of Y have an important structural significance, at another level Y is simply a subset of X comprising, for Auckland data, around 6% of its points. It is natural that the LD for Y lies below that of X. In the LD a uniform unit drop of 1 corresponds to halving the variance, and can be understood very roughly as a global reduction by 2 in the total number of packets.
We now introduce a new kind of manipulation to complement the three categories A

(Flow Arrival manipulation), P (Packet-in-flow manipulation) and S (Flow Selection manipulation) already presented in section 3.4:

T: Flow Truncation manipulation. This manipulation consists in truncating flows after a number q of packets. If the original flow has less than q packets, it remains unchanged. This is different from the flow selection semi-experiment since here the flow arrival process Y is preserved.

Figure 3.16(a) shows the result of the semi-experiment [A-Pois; T-Pkt], where we have, in addition to A-Pois, truncated flows after the first q packets, in this case at q = 6, the 60% percentile. The resulting dramatic elimination of the LRD is consistent with what we observed in the previous section. The considerable drop in level in the LD follows from the fact that the heavy tailed nature of P results in a very small proportion of flows containing a notable percentage of total packets. Thus far the structure of X seems very unproblematic, and the correlation structure of Y irrelevant to it, however things are not as simple as they would appear.
In the third experiment, [T-Pkt], the flow arrivals are not altered in any way, but the same packet volume truncation is made. In apparent contradiction to our previous conclusion, the difference between [T-Pkt] and [A-Pois; T-Pkt] is dramatic. Furthermore, the LRD has ‘returned’ despite the absence of the heavy tail of P, apparently contradicting our first conclusion that Y has no influence.
To explain this apparent paradox, the first observation is that since Y is LRD, then so must be [T-Pkt], as for any truncation level it includes Y as a subset. This LRD was obscured previously through the ‘noise’ of the dominant LRD generated by the heavy tail of P. To explore this in more detail, observe that with a truncation level of 100% (q = ∞), the truncated process [T-Pkt] is simply X, and when q = 1, it is Y. Thus as the truncation level q drops, the truncated process [T-Pkt] passes from X(t) to Y(t). The evolution toward Y is particularly easy to see when q is small, and takes an especially simple form at large scale. There, the packets of a given flow appear co-located compared to the scale of observation, so that [T-Pkt] is approximately just Y(t) scaled up by some factor, corresponding to a vertical shift in the LD. This is seen in figure 3.16(a) at scales beyond j = 1, corresponding to the scale of average duration after truncation.

3.5.2 Knee position manipulation

We have seen how Y, although of negligible influence on X over scales up to j = 11 or 1 hour, is present just behind the scenes with a potentially influential LRD. To examine the

[Figure 3.16, panels (a), (b), (c): Logscale Diagrams, log2 Var(dj) against j = log2(a). Panel (a): X(t) Data, Y(t) Data, [A-Pois], [A-Pois; T-Pkt], [T-Pkt]. Panel (b): X(t) Data, Y(t) Data, [A-Clus1], Y1(t), [A-Clus2], Y2(t). Panel (c): X(t) Data, Y(t) Data, [S-Dur1], Y1(t), [S-Dur2], Y2(t).]

Figure 3.16: Semi-experiments: impact of Y(t) on X(t). (a) Manipulating arrivals: [A-Pois], flow volumes: [T-Pkt], and both: [A-Pois; T-Pkt]. (b) Manipulating the knee j∗ of Y: [A-Clus]; the effect on X is large for small j∗. (c) Looking at different j∗ using flow subsets: [S-Dur]; the results are weighted by their ‘packet impact’, long durations dominate.

question of when, if ever, this LRD can rise to prominence at the packet level, we consider the impact on X of the knee movement in Y found in section 3.3.2. In a new type of semi-experiment, [A-Clus], the original flows are translated (without permutation) to begin at

the points of a LRD Poisson cluster process sample path with matched average intensity. Poisson cluster processes were introduced in section 2.3, and will be used extensively in chapter 4 to model X. Here they serve simply as a convenient parametric class to model Y which allows us to easily reproduce, in a black box fashion, a flat spectrum at small scales and LRD at large scales, with a controllable knee position. A stationary Poisson cluster process consists of a Poisson process of rate λS defining the locations of ‘seeds’, about which a group of points are placed according to i.i.d. copies of another process, chosen here to be a finite Poisson process of rate λA beginning at the seed. In fact 1/λA is a scale parameter for the process. Increasing λA simply translates the spectrum, and hence the entire LD, toward smaller scales, a simple way to adjust j∗.
Figure 3.16(b) shows two different [A-Clus] experiments in addition to the data. The Y processes are also plotted to show the very different knee positions chosen for the two experiments. The knee for Y2 was put at a larger scale than j∗ for Y. Not surprisingly, the corresponding X process, [A-Clus2], shows little change, as the LD for Y2 is below that of Y and so contains even less energy. In contrast, the knee for Y1, around j = 0, is at a scale which is small enough so that its LRD in fact does have a significant impact on the overall packet process [A-Clus1], both in terms of the knee position and the spectrum at scales beyond it.

3.5.3 Flow subsets manipulation

The last observation above illustrates a principle which is in contradiction to the original [A-Pois] conclusion, that the finer structure of Y plays no role. We therefore also performed experiments using the Selection of flow subsets method of section 3.4, in order to induce a change in j∗ without imposing it across all flows in such a uniform manner. Two subsets, each containing 10% of flows, were selected based on duration ranges designed to give a wide contrast in j∗.
To obtain a j∗ value at large scale a subset consisting of the longest 10% of flows was selected, yielding Y2 as seen in figure 3.16(c). Despite a knee around j = 6, the reconstructed packet level process [S-Dur2] is very similar to the original X. This result is in agreement with the corresponding one from figure 3.16(b), but it also contains an additional important element. In this case Y2 only contains 10% of flows, yet [S-Dur2] accounts for about half (48%) of the spectrum of X. This is a clear indication that the tail of P, which strongly influences the flows with the longest durations, is disproportionately responsible for the form of the LD of X.
To obtain a contrasting Y1 with a j∗ value at small scale we do not select the very shortest flows, as flows with just a single packet have somewhat different properties which would

complicate the analysis. Instead, a subset totalling 10% of flows is selected by choosing the shortest flows which have at least 2 packets. In figure 3.16(c) the smaller j∗ = −1 of Y1 translates to an earlier knee in the packet level process [S-Dur1] which looks quite different from X, again in agreement with the corresponding experiment from figure 3.16(b). A key difference however, is that instead of [S-Dur1] being well above [S-Dur2], it is in fact well below it. The subset corresponding to shorter durations has considerably less energy than that of the longer durations despite the delayed entry of the former’s LRD.
We can now give a coherent picture explaining the above observations. From section 3.3.2 we know that flows of different durations have different knee positions, and from the experiments of figure 3.16(b) we know that, for a small enough flow duration the LRD of the corresponding subset of Y can indeed impact on the spectrum of X. Flows of small duration have onset scales at small enough scales to allow their LRD to impact the spectrum of (the corresponding subset of) X despite the fact that the packets marking the beginning of flows constitute only a small proportion of total packets. However, it is essential to consider the impact at the packet level of any given subset of flows, and to recognise that the overall behaviour of the wavelet spectrum is therefore strongly influenced by this ‘packet-level impact’ weighting. Although average packet rates within flows vary widely, broadly speaking the flows with a very large number of packets are naturally also very long. Thus, the subset of X corresponding to the flow subset with the longest durations contains the strong LRD due to the heavy tailed packet size distribution, and simultaneously the weakest portion of the LRD from Y. Conversely, in the case of short durations the LRD from Y is stronger, but the number of packets corresponding to it is far lower, resulting in a small subset of X with low energy.
Thus far we have not discussed the role of the comparative values of the LRD exponents of X and Y. Clearly, if the exponent for Y were much greater than that of X, then its impact would always show up for sufficiently large scale, and in practice would make itself felt more often and at a smaller scale. However, in the traces we have studied, the exponents for the two are roughly similar, which leaves the knee position as the key feature to understand. The findings here will be complemented by further analysis in chapter 4, where we will show that the body and the tail of the distribution of P have a strong influence on both the LRD and the knee position of X.

3.5.4 Summary

In this section we showed that the flow arrival process Y could impact on the second order properties of the overall packet process X should certain circumstances be met. We were able to

explain why the LRD of Y has little impact despite this observation, by showing that the heavy tailed nature of the number of packets in flows means that the spectrum of X is very heavily weighted towards the flows with the most packets, which also have the longest durations. These flows have the longest onset scales for Y, and so the impact of the LRD of Y is the weakest precisely for the most important flows. The current balance between the two sources of LRD, and their impact, could change if flows of smaller duration increased in importance in terms of their proportion of overall packets.

3.6 Conclusion

In this chapter, we presented the first set of traffic measurements used in this thesis and we carried out a detailed analysis of the physical causes of the observed statistics. We studied both the flow arrival process Y and the packet arrival process X. Using mixtures of real data and models we call ‘semi-experiments’, we showed using a second order wavelet analysis that in current traces the process Y does not impact on the second order properties of the overall packet process X. One can in fact replace Y by a Poisson process and consider that flows are independent as far as the study of X is concerned. This fact has important implications for traffic modelling and performance analysis. In particular, these empirical findings contradict modelling approaches which postulate the need for ‘session level’ structure linking flows, at least for the lightly loaded links studied here. These results constitute the cornerstone of the thesis since they provide the starting point of our modelling work.
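The second order wavelet analysis referred to throughout this chapter can be approximated with a very small Haar-based sketch. It assumes packet timestamps are first binned into counts, and it reports log2 of the mean squared Haar detail coefficient per octave. The function name, the L2-normalised Haar filters and the absence of any bias correction or confidence intervals are simplifications chosen for illustration, so the vertical normalisation may differ from that of the Logscale Diagrams shown in the figures.

```python
import numpy as np

def haar_logscale_diagram(timestamps, bin_width, n_octaves):
    """Hypothetical sketch: log2 variance of Haar details, octave by octave.

    timestamps : packet (or flow) arrival times
    bin_width  : width of the finest counting bin
    n_octaves  : maximum number of octaves to compute
    """
    t = np.asarray(timestamps, dtype=float)
    start = t.min()
    n_bins = max(int(np.ceil((t.max() - start) / bin_width)), 1)
    counts, _ = np.histogram(t, bins=n_bins,
                             range=(start, start + n_bins * bin_width))
    approx = counts.astype(float)
    log2_var = []
    for _ in range(n_octaves):
        m = (len(approx) // 2) * 2
        if m < 4:                       # too few coefficients to continue
            break
        even, odd = approx[0:m:2], approx[1:m:2]
        detail = (even - odd) / np.sqrt(2.0)   # Haar wavelet coefficients
        approx = (even + odd) / np.sqrt(2.0)   # Haar approximation
        log2_var.append(np.log2(np.mean(detail ** 2)))
    return np.array(log2_var)
```

For a Poisson process of rate λ binned at width Δ, every octave has detail variance λΔ under this normalisation, so the diagram is flat; long range dependence would instead appear as a sustained increase at large octaves.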


Chapter 4

Cluster processes, a natural language for network traffic

4.1 Introduction

In this chapter we propose the use of a particular class of point processes, Poisson cluster models, to model the IP packet arrival process X. Cluster processes have been used to model various phenomena, ranging from computer failure patterns [110] to forest fire spreading [17] and rainfall events [41]. Very recent applications of cluster processes in networking have concerned HTTP request arrivals [103] and TCP packet losses [181]. We are not aware of prior applications to IP packet traffic modeling. These models are relatively simple, yet strongly motivated by empirical features of traffic, in particular the role of flows. They are also easily synthesized, and have marginals which are intrinsically positive, and their tractability allows the quantitative investigation of key properties as a function of meaningful network parameters. Through these models we are able to give strong answers to several outstanding questions, and clarify many issues.
This chapter builds on the empirical findings presented in chapter 3. The starting point is the surprising observation that the scaling seen in the point process of packet arrivals X is broadly similar to that found in the arrival process of flow arrival points Y. Namely, clear LRD at large scales, evidence for a second, though less clear, scaling regime at small scales, and a transition scale at around 1 second separating them. This similarity led to the question, in what way are the twin scaling regimes at the IP level due to or influenced by the corresponding features at the flow level? Of the conclusions, based on a second order wavelet analysis, the following directly inspire the models we investigate here:

• The scaling in the flow arrival process is not responsible for that at the IP level, and further, it does not influence it significantly at either small or large scales.

• Dependencies between packet arrival processes across different flows are very weak.

• The LRD has its origins in the heavy tailed nature of flow volumes (a known result), and does not have a component due to packet processes within flows (new result).

• The structure at small scales has its origin in the packet patterns within flows.

• The packet rate within flows is a scale variable.

These findings are consistent with recent work of [182] and have two very strong implications for traffic modelling. They suggest that, for the purpose of modelling the overall process of IP packets, flows can be treated as statistically independent. Thus, the point process of packet arrivals is seen as the superposition of independent point processes, one for each flow. Finally, the lack of impact of the detailed nature of the flow arrival statistics suggests that they can be effectively modelled as a Poisson process. Second, the isolation of the LRD as a property of the number of packets per flow allows them to be modelled using simple and intuitive heavy tailed ingredients. Cluster models are ideally suited to modelling the above features.
One of the main goals of this thesis is to explain all forms of scaling present in both statistical and networking terms in order to answer question (i). We contribute substantially to this issue in this chapter. Through a model with a firm physical basis, we show that there are good reasons to believe that there is in fact no true scaling behaviour at second order over small scales, which in turn implies no true multifractal behaviour over those scales. We also provide explicit formulae capable of predicting the onset scale of LRD as a function of meaningful parameters.
Another goal of this chapter is to contribute to a clarification of the meaning and role of the elephant (large but rare) and mice (small but numerous) flow concept which has become popular in describing packet traffic. Rather than proposing fixed definitions of these categories, we let the data speak for itself and point out the orthogonal roles of ‘volume’ versus ‘rate’ based approaches, and the importance of time-scale.
The chapter is structured as follows. In section 4.2 we present the data analysis underlying the choice of the models, based on the findings of chapter 3. Section 4.3 is the main part of the chapter, where the cluster models are introduced, their properties given, and the fit to the data examined. Further analyses on the data are then performed, leading to suggested refinements to the model in section 4.4. Section 4.5 uses the model to examine in a well defined context the question, “Does traffic become more bursty or more Poisson as link rates increase?”, and related issues. In section 4.6 we investigate higher order statistics of both the data and the model. We conclude in section 4.7.
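The modelling picture outlined in this introduction (Poisson flow arrivals, independent flows, a heavy tailed number of packets per flow, and a simple in-flow packet pattern) can be turned into a toy simulator. Everything specific in the sketch below is an assumption made for illustration: the function name, the shifted Lomax draw used for packets per flow, the Gamma in-flow gaps and all default parameter values. It is not the fitted model developed later in the chapter.

```python
import numpy as np

def simulate_cluster_traffic(flow_rate, horizon, tail_index=1.5,
                             in_flow_rate=10.0, shape=0.6, seed=0):
    """Toy Poisson cluster traffic: independent flows superposed.

    flow_rate    : intensity of the Poisson flow arrival process
    horizon      : length of the observation window for flow starts
    tail_index   : tail exponent of the packets-per-flow distribution
    in_flow_rate : mean packet rate inside a flow
    shape        : Gamma shape of in-flow inter-arrivals
    """
    rng = np.random.default_rng(seed)
    # Poisson flow arrivals: Poisson count, then uniform positions.
    n_flows = rng.poisson(flow_rate * horizon)
    starts = np.sort(rng.uniform(0.0, horizon, size=n_flows))
    # Heavy tailed packets per flow, at least one packet each.
    sizes = 1 + np.floor(rng.pareto(tail_index, size=n_flows)).astype(int)
    packets = []
    for t0, p in zip(starts, sizes):
        # Gamma gaps with mean 1 / in_flow_rate, drawn independently per flow.
        gaps = rng.gamma(shape, 1.0 / (in_flow_rate * shape), size=p - 1)
        packets.append(t0 + np.concatenate(([0.0], np.cumsum(gaps))))
    all_packets = np.sort(np.concatenate(packets)) if packets else np.empty(0)
    return all_packets, starts, sizes
```

In this sketch `starts` plays the role of the flow arrival process Y and `all_packets` that of the packet arrival process X.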

[Figure 4.1, panels (a), (b), (c): density plots over the plane log(R) against log(P).]

Figure 4.1: Examining flow variability (AUCK-d1). (a) Flow density plot over (R(i), P(i)), showing high mass over a distribution of rates. (b) Packet density plot (flow density weighted by number of packets). (c) Coefficient of variation per flow. In the main high mass region flows are overdispersed.

4.2 Empirical observations

We start this chapter by presenting more empirical justification for the choice of our traffic model. We recalled in section 4.1 the main physical reasons behind this choice. Here we further examine flow variability to find a simple model for in-flow dynamics.
We first consider flow behaviour as a function of the ‘quasi independent’ variables: average rate and flow volume. This is a direct consequence of section 3.4.2 where we made D(i) a dependent variable. Because P is discrete, a scatter plot of (R(i), P(i)) hides mass along discrete lines and is very misleading. We therefore discretise the scatter plot to form the density plot, figure 4.1(a), where each square in the (R, P) plane is shaded according to the number of points within it. The mass is highly concentrated (most flows have a small number of packets), so a logarithmic scale is used to greatly enhance the outer regions. For a fixed packet volume, the average rates cover a wide range, and similarly a flow with a given rate may contain many packets, or as few as the minimum of 2. Furthermore, although the spread of values indicates high variability across flows, we do not see any bimodality

which would suggest a need to classify flows into two or more classes. Simplifying things somewhat, the picture that emerges is that, in the range of rate values where the density is highest, the packet volume distribution is approximately independent of rate (and is heavy tailed).
In figure 4.1(b) we give packet density rather than flow density, in effect weighting plot (a) by the packet impact of each underlying flow. The dark elements at large P(i) correspond to volume-elephant flows, which have an appreciable packet impact despite arising from a very small percentage of flows (they were invisible in plot (a)). Our conclusions are not altered however, the epicentre of activity is still located at the dark region of plot (a), which is of the same trace and shares the same scale¹. We return to the question of elephants in section 4.4.
Figure 4.1(c) gives the value of the index of dispersion σ/µ of the inter-arrivals within a flow, calculated individually for each flow with at least 3 packets, then averaged over squares in a log-log plot. We see that they cover a wide variety of values, but the most extreme of these are not in the main region of high mass as revealed in figure 4.1(b). On the contrary, the values in the main high mass region are reasonably uniform, with a weighted average value around 1.4: over-dispersed compared to Poisson, but not extremely so.
We now disregard flows, and examine the inter-arrival series A(k). Figure 4.2 shows its histogram for AUCK-d0, which fits well to a Gamma random variable with σ/µ = 1.29. We will examine its utility as a direct model for the inter-packet times. The autocorrelation in plot (b) is negligible over small lags (small scale), but it should be remembered that the time scale corresponding to a lag varies inversely as the packet rate. Similar results apply for other traces. Whilst these results are true as such, they are in fact misleading. This can be revealed using a multiscale analysis and explained using a cluster model.

¹ At very small rate, we have a small number of very regular flows, due to TCP keepalive packets. These have little impact.

4.3 Cluster models

In this section we define and evaluate two models for the point process X(t) of packet arrivals, inspired by the observations of section 4.2.

4.3.1 A black box model: gamma renewal

As already detailed in section 2.3, a renewal process is a simple point process where the inter-arrival variables {A(k)}, k ∈ Z, are i.i.d. Although we seek meaningful constructive models rather than those of black box type, there are good reasons to first examine a renewal model.

[Figure 4.2, panels (a), (b): (a) Pr[A ≤ x] against log(x), Data and Gamma distribution; (b) autocorrelation against lag.]

Figure 4.2: Examining the Inter-Arrival Process (AUCK-d0). (a) The inter-arrival distribution, with fitted Gamma distribution (shape = 0.6, mean = 1.2ms). (b) The autocorrelation of (detrended) inter-arrivals.

First, figure 4.2(b) directly suggests it. The second reason is that a renewal process has the potential to generate scaling (or apparent scaling) behaviour at small scales, as will be shown in figure 4.3. The possibility of gaining a statistical understanding of this effect in a very simple context is worth pursuing. Finally, the spectrum of a renewal process plays a direct role in the cluster models introduced in section 4.3.
From section 2.3, the spectrum of the continuous time renewal process X(t) is

\[ \Gamma_X(\nu) = \lambda_A \left[ (1 - \Phi_A(\omega))^{-1} + (1 - \Phi_A(-\omega))^{-1} - 1 \right] \tag{4.1} \]

where \Phi_A(\omega) = E[\exp(i\omega A)] is the characteristic function of the inter-arrival distribution, the arrival intensity is \lambda_A = 1/\mu_A, and \omega = 2\pi\nu is the unnormalised frequency.
Figure 4.2(a) justifies a Gamma distribution for A, with characteristic function (c.f.) \Phi_A(\omega; b, c) = (1 - ib\omega)^{-c}, where c > 0 is the shape parameter. The mean and standard deviation are given by \mu_A = bc, \sigma_A = b\sqrt{c}, the coefficient of variation by \sigma_A/\mu_A = 1/\sqrt{c}. The exponential case is c = 1, corresponding to the Poisson process. As b is a scale parameter, \Phi_A(\omega; b, c) = \Phi_A(b\omega; 1, c). The following properties of the Gamma Renewal (GR) spectrum hold:

\[ \Gamma_{GR}(\nu) = \lambda_A \left[ \frac{1}{c} + \frac{c^2 - 1}{12c} (b\omega)^2 + O(\omega^4) \right] \xrightarrow{\nu \to 0} \frac{\lambda_A}{c} \tag{4.2} \]

\[ \Gamma_{GR}(\nu) = \lambda_A \left[ 1 + \frac{2\cos(c\pi/2)}{(b\omega)^c} + o(\omega^{-c}) \right] \xrightarrow{\nu \to \infty} \lambda_A \tag{4.3} \]

The small scale asymptotic level is that of a Poisson process, also of rate \lambda_A. However, this limit is not specific to Poisson but is due to the general point process property that points do not coincide. One can show that, in the over-dispersed case (c < 1) of interest here, \Re(\Phi_A(\omega)) is monotonic decreasing, from which it follows that the spectrum is also. Since a monotonic spectrum implies a monotonic wavelet spectrum, the Logscale Diagram of GR with c < 1 monotonically increases from the asymptotic level \log_2(\lambda_A) up to \log_2(\lambda_A/c), as in figure 4.3.

mean=1. To quantify this. we define a lower cutoff frequency ν ∗ where the spectrum can be said to ‘first’ deviate from its asymptotic value. Fix a deviation parameter ε ∈ (0.4) 2πb c + 1 ∗ = − log ν . The reasons for this become clear when one moves to the cluster model and result in useful insights. CLUSTER PROCESSES 30.5mus 977mus 0.80 CHAPTER 4.1).5 −15 −10 −5 0 5 10 j = log2 ( a ) Figure 4. The result. Define ν ∗ as the smallest ν such that the second term of equation (4. We have verified that if one does so.2 1.6.6 0.7 0. Approximate expressions for c ∈ (0. Comparing the resulting GR wavelet spectrum against the AUCK-c1 trace in figure 4. jGR as µA . This is standard practice in traffic analysis. (4. and α (c) ≈ (1 − c)/4. as it seems inefficient to study time series which are mostly zeros.7(a).2) Figure 4. c > 0.3: Pseudo scaling of a gamma renewal process (shape=0. Our final but important comment relates to the pitfalls in interpretation that ‘pseudo ∗ (b. as we presently show. for a range of scales close to the upper asymptotic level.2) deviates from the first by ε times the distance λA |(1/c − 1)| between the asymptotic levels. for both practical and physical reasons one is led to focus analysis on scales above it.3 (ε = 0. In general however the predictive ability of the GR model fails badly. pseudo-slopes exist not only at second order but also more generally.9 0. and its slope can also be derived. Consequently if one performs for example a wavelet .8 0. is marked by asterisks in figure 4. we see reasonable agreement at low scales and up to the onset of LRD. 1] c (b.031 1 32 1024 1. the LD of a GR process can appear to follow a straight line.2. for realistic values of c. allowing predictive tests of the model. c) ≈ 1/b · 1. are given by jGR GR The model is easily calibrated through the sample mean and variance of the inter- arrivals. Since.1 log2 Var( d j ) 1 0. 
Expres- The LD equivalent: jGR 2 ∗ sions for the centre of the zone where such a pseudo scaling exists. a ‘pseudo scaling’.3 illustrates how. is 1  12ε 1/2 ν∗ = .3 1. c) is the same order of magnitude slopes’ can cause. which respects the role of the scale parameter b. 1).

multiscaling analysis of the type described in chapter 2 over a range of moment orders, one finds
empirical indications of multiscaling (possibly multifractal) behaviour. This can lead to an
erroneous belief that the data is much richer than a mere renewal process when in fact in this
respect it is entirely consistent with it. Although we have not yet presented results beyond
second order, it is clear that if scaling (over some given scale range) is only apparent at second
order, then the process cannot be multifractal, as multifractality would imply true scaling over a
range of orders including second order. Indeed, it is likely that in many cases the evidence for
multifractal behaviour over time scales below 1s has been misinterpreted. More details on this
issue will be given in section 4.6 (see also [182]).

4.3.2 A flow based model: Bartlett-Lewis point process

The main observations of chapter 3, that flows can be seen as independent entities arriving
according to a Poisson process, fit naturally into a cluster process framework. Recall from
chapter 2 that a stationary Poisson cluster process on the real line consists of a Poisson process
defining the locations of ‘seeds’, about which a group of points are placed according to i.i.d.
copies of another process. In a harmless abuse of notation, symbols already defined for the data
will be reused. For convenience, a list of the traffic model parameters is given page xxii.

Let the arrival times $\{t_F(i)\}$ of flows (the seeds) follow a Poisson process of rate
$\lambda_F$. The packet arrival process can be written as

$$X(t) = \sum_i G_i(t - t_F(i)), \qquad (4.5)$$

where $G_i(t)$ represents the arrival process of packets within flow $i$. Let the $\{G_i\}$ be
i.i.d., and consider a representative $G(t)$. It is a point process containing a finite number
$P \ge 1$ of points (packets), with the first located at $t = 0$.

Determining an appropriate process for $G_i(t)$, given the complexity of TCP dynamics (see
figure 3.6(b) page 50) and network heterogeneity, is a challenge. An interesting fluid model
approach can be found in [18], but we focus on point processes here. Recall however from chapter 3
that the manipulations [P-Uni] and [P-Pois] showed that simple ‘constant rate’ models accounted
for most of the second order properties seen at the packet level. A (finite) renewal process model
is a simple way to obey this finding which has the advantage of falling within the theoretical
framework of Bartlett-Lewis cluster processes (BLPP) already introduced in chapter 2.

We choose the inter-arrival random variable $A$ to be Gamma distributed (with characteristic
function $\Phi_A(\omega, b, c)$), for several reasons. First, it has a scale parameter, making

it consistent (see below) with the observations on rate dependence of figure 3.14 page 65. Second,
we have seen that [P-Pois] failed to reproduce important qualitative behaviour at small scales. We
will see that incorporating burstiness through the variance to mean ratio is, in many cases,
sufficient to reinstate this structure. This is easily and naturally achieved in the Gamma family,
as the second parameter $c$ is equivalent to this ratio, and $c = 1$ corresponds to [P-Pois].
Finally, although the parameters $\lambda_A$, $c$ of Gamma are not derived from network ‘first
principles’, they do have physical meaning taken directly from data, and two is clearly the
minimum number necessary.

The number of packets in a flow is a random variable $P$ with density $p_j = \Pr(P = j)$,
probability generating function $G_P(z) = \sum_{j=0}^{\infty} p_j z^j$, $|z| \le 1$, and
distribution function $F_P$ (we take $p_0 = 0$). From figure 3.4(b) it is taken to be heavy
tailed, that is $1 - F_P(j) \sim L j^{-\beta}$ as $j \to \infty$, $\beta \in (1, 2)$, implying
finite mean $\mu_P$, but infinite variance.

Assembling these components, the flow model can be written as

$$G_i(t) = \sum_{j=1}^{P(i)} \delta\Bigl(t - \sum_{l=1}^{j-1} A_i(l)\Bigr), \qquad (4.6)$$

where $\delta(t)$ is a delta function centered at $t = 0$, $A_i(l)$ denotes the $l$-th
inter-arrival for flow $i$, and the inner sum is defined to be zero if $j = 1$. A schematic
representation of the BLPP is given in figure 4.4.

Figure 4.4: Schematic representation of a BLPP. The solid vertical lines mark the P(i) packets
belonging to flow i, starting at tF(i). The vertical dotted lines mark the arrival times of other
packets. [Diagram omitted.]

The parameters of the model are $\lambda_F$, $\lambda_A$, $c$, $\beta$, and $\mu_P$. This is the
smallest number allowing a packet level description of traffic with physical meaning: one
parameter for flow arrivals, two for in-flow packet arrivals, and two for flow volume. The average
arrival intensity is given by $\lambda_X = \lambda_F \mu_P$.

Apart from its physical motivation, one of the main advantages of the model is that its second
order properties are tractable. The expressions for the spectral density of the BLPP (found for
instance in [46, p.417] and [42, p.79]) can be coerced into the following instructive real form:

$$\Gamma_X(\nu) = \lambda_F\Bigl(\frac{\mu_P}{\lambda_A}\,\Gamma_G(\nu) + S_G(\omega) + S_G(-\omega)\Bigr), \qquad (4.7)$$

b. Taking also into account the Poisson contribution of points from different clusters and summing over all l and k.3. (4.12) λA where ΓG (ν) is the spectrum of the stationary renewal process with the same parameters as the finite renewal process of each cluster: h i ΓG (ν) = λA (1 − ΦA (ω))−1 + (1 − ΦA (−ω))−1 − 1 from equation (2. z + δ1 ) = N(z + u.10) µP k=1 l=k+1 Let ΦA (ω) = E[exp(iωA)] be the characteristic function of the inter-arrival distribution. one gets: Pr(N(z. z + u + δ2 ) = 1) ∞ ∞  k−l  = (λF µP )2 δ1 δ2 + λF ∑ ∑ pk ∑ fA∗ j (u) δ1 δ2 + o(δ1 δ2 ). CLUSTER MODELS 83 where ΓG (ν) is the spectrum of the stationary renewal process with the same parameters as the finite flow renewal process. (4. . z + δ1 ) = N(z + u. (4. |z| ≤ 1.47). From equation (2. (4. z + u + δ2 ) = 1) h(u) = λF µP δ1 δ2 ∞ ∞  k−l 1  = λF µP + ∑ ∑ pk ∑ fA∗ j (u) . the spectrum of the Bartlett-Lewis process reads: ΓX (ν) = λF µP (h̃( jω) + h̃(− jω) + 1)  1 ∞ ∞  = λF µP 1 + ∑ ∑ (l − k)pl (ΦA (ω)k + ΦA (−ω)k ) (4. Assume that there is a point at t = z being the l th point of a cluster of size k. (4.4. c).13) SG (ω) is such that ΦA (ω)   SG (ω) = GP (ΦA (ω)) − 1 .3. (4. (4. A point a t = z + u can either be from the same cluster or from a different one. and ΦA (ω)   SG (ω) = G P (Φ A (ω)) − 1 .39) page 29.9) l=1 k=l+1 j=1 and Pr(N(z.11) µP k=1 l=k+1 and after re-ordering of the terms. SG (ω) is real.15) j=0 Since SG (ω) and SG (−ω) are complex conjugates. here ΓG (ν) = ΓGR (ν. The conditional probability that there is a point ∗j from the same cluster in (z + u. this leads µ  P ΓX (ν) = λF ΓG (ν) + SG (ω) + SG (−ω) .14) (1 − ΦA (ω))2 and GP (z) is the probability generating function of P defined as ∞ GP (z) = ∑ p jz j.8) (1 − ΦA (ω))2 Proof. z + u + δ2 ) is δ2 pk ∑k−l j=1 f A (u). µP l=1 k=l+1 j=1 ∞  ∞ 1 ∗k  = λF µP + ∑ A f (u) ∑ (l − k)p l . We use the notations and concepts introduced in section 2.4.

) is a slowly varying function (F(k) has a finite mean µP and infinite variance). (4. c. for a discrete r. (4. this leads s→0 f1 (s) ∼ −sβ Lψ(1 − β ). and is in agreement with the empirical results presented in section 3. 1 < β < 2. L > 0.23) which leads f1 (s) = F̃(s) − 1 + µP s = GP (z) − 1 − µP log(z). one gets: F̃(s) = GP (exp(−s)). is familiar from section 4. From equation (2. ψ(·) denoting Euler’s Gamma function.3.v. (4. 1. c. Let the distribution of number of packets per flow F(k) be such that: k→∞ L(k) 1 − F(k) ∼ . From Tauberian theorem [23. this simple variance prefactor has the interpretation that one independently superposes ‘λF ’ of the same thing. λA ΓGR (ν). λF .24) and therefore from equation (4. FP ).7) only through λF .16) This is a direct consequence of chosing ΦA with a scale parameter obeying b ∝ λA−1 .19) kβ where L(.20) where f1 (s) = F̃(s) − 1 + µP s. (4. we note that: LB(β )(2πλA )2−β ω −(2−β ) → ∞ ω→0 SG (ω) ∼ (4. FP ) = ΓX (ω/λA . (4.49). (4.333] . (4. the parameter 1/λA acts as a scale parame- ter: ΓX (ω. CLUSTER PROCESSES One sees immediately that the flow arrivals enter equation (4.20) z→1 GP (z) − 1 − µP log(z) ∼ −(− log(z))β Lψ(1 − β ). (4.25) .21) and F̃ is the Laplace-Stieltjes transform of F given by Z +∞ F̃(s) = IE{exp(−sX)} = exp(−sx)dF(x) 0 ∞ = ∑ exp(−sk)pk . λA .17) ω→∞ cos(cπ/2) ∼ − →0 (4. The third striking feature λX is that the expression consists of two terms of which the first. In what follows it is assumed that L(k) = L.4. λF .84 CHAPTER 4. To understand the second. p. Furthermore.1.22) k=0 With the change of variable z = exp(−s).18) (bω)c where B(β ) = ψ(1 − β ) cos(πβ /2)/(2π)(2−β ) > 0. Proof.

and at low frequency by the divergent second term.28) where B(β ) = ψ(1 − β ) cos(πβ /2)/(2π)(2−β ) > 0. α) = (2λF LB(β )λA . the generic shape of the LD for the model is similar to that of the dashed curve in figure 2.29) Thus. 2 − β ). ap- pears in figure 4. saturating at medium scales before crossing over to a LRD behaviour at large scale.3.27) The asymptotic behaviour of SG (ω) when ω → 0 can therefore be shown to be LB(β )(2πλA )2−β ω −(2−β ) → ∞ ω→0 SG (ω) ∼ (4. An example of a wavelet spectrum for the model.17) dominates.17) depends only on the intensity λA of the GR flow processes.2(c): a monotonic function with the form of a (scaled) GR process.26) λA 2 λA one gets: β  GP (ΦA (ω)) − 1 = µP log(ΦA (ω)) − − log(ΦA (ω)) Lψ(1 − β ) + o (− log(ΦA (ω)))β . Comparing with equation (2. accounts for .5 where the magnitude of the (scaled) GR and LRD components are also plotted. and not on the second order statistics: at large scale the finer details of the flows cease to matter. CLUSTER MODELS 85 Replacing z by the characteristic function of packets inter-arrivals within a flow ω 1 2 1  ΦA (ω) = IE[exp(iωA)] = 1 + j − σA + 2 ω 2 + o(ω 2 ) (4. We accordingly give two dif- ferent definitions of transition scale.4.5). The first is the largest scale at which the small scale effects. at high frequency the spectrum is dominated by the scaled GR term. The second term shares this property as ν → 0. and was observed to obey it for all ν where it is non-negligible. To cap- ture its position as a function of parameters in a practically useful way. in which case ν→0 ΓX (ν) → λF (σP2 + µP2 ). represented by the saturation level log2 λX /c of the GR component.30) Recall that the GR term is monotonic decreasing when c < 1. It is significant that equa- tion (4. The knee in the LD is now seen as the zone where these two compete.69). evaluated using equation (2. we see that the 2−β model is LRD with parameters (cf . (4. 
it is essential to realise that the scale at which the ‘road to LRD’ begins may be very different from where the asymptotic LRD behaviour of equation (4. This remains true if the standard deviation σP of P exists. (4. Carrying over these observations to the wavelet spectrum. (4.

(4. and so the plateau is visible. although just barely. simulation of the model is trivial and fast. however jGR PGR departure from small scales leading to LRD can only take effect at scales where there are many packets in a flow. as the parameter dependencies of the two definitions are very similar.062 0. which can be rewritten as jPGR GR ∗ jGR = − log2 λA + log2 (π 2 (c + 1)/(3εc2 )). In order to see whether the GR component saturates before the LRD domi- nates.2. (4. (resp. see section 4.5: Comparison of LDs of AUCK-d1 and fitted BLPP model. yielding ∗∗ 1   jPGR = − log2 λA + log2 µP − log2 (2LB(β )) − log2 c .016 0.5.31) 2−β Its greater tractability encourages its use. If j ∗ ≈ j ∗ In figure 4.86 CHAPTER 4. as it includes the important medium scale effects. in describing the qualitative pa- rameter dependence of the knee. since the then the plateau will have negligible width. one can compare ∗ against j ∗ . The asterisk ∗ (resp. denoted by jPGR against data. The second definition looks for equality between the large scale asymptotic behaviours of the two spectral components ΓX (ν) = λX /c and ΓX (ν) = cf ν −α . This scale. Another advantage of the model is that the packet inter-arrival time distribution can be calculated analytically [46].32) ∗ < j ∗ . CLUSTER PROCESSES 0. However one can avoid the transient regime by starting the simulation in equilibrium conditions at time t. creating a plateau at medium scales as schematised in figure 2. Finally.7(a) jGR PGR GR PGR ∗  j ∗ is not possible.25 1 4 16 64 256 1024 20 Data Model GR component Cluster component 18 Poisson limit log2 Var( dj ) 16 14 λ µ F P 12 c 10 λ µ F P −8 −6 −4 −2 0 2 4 6 8 10 12 j = log2 ( a ) Figure 4. enabling comparisons against data and fitted Gamma inter- arrivals. which are determined by (i) the distribution of the time from t to the next flow arrival. (ii) the distribution of the number Z of active flows at time . j ∗ ).0039 0. intuitively the same criterion defining GR saturation. 
is the one we use for comparison half of the wavelet spectrum. square) marks the transition scale jGR PGR ∗ . apart from the long transient induced by the LRD.

t, (iii) the distribution of the forward recurrence time from t to the first event in each flow,
and (iv) the distribution of the number Q of packets remaining in each of the Z active flows.
Analytical expressions for the equilibrium BLPP can in fact be derived [111]. They can be written
as:

(i) the time from t to the next flow arrival follows an exponential distribution with parameter
$\lambda_F$,
(ii) the number Z of active flows at time t follows a Poisson distribution with parameter
$\lambda_F \mu_A (\mu_P - 1)$,
(iii) the forward recurrence time from t to the first event in each flow has the survivor function
$1 - \lambda_A \int_0^t (1 - F_A(u))\,du$,
(iv) the probability that there remains Q = j packets in an active flow is [105]:

$$\Pr\{Q = j\} = \frac{1}{\mu_P - 1} \sum_{i=j+1}^{\infty} p_i. \qquad (4.33)$$

4.4 Model validation

4.4.1 Marginals

The model works well when fitted to the packet process for the AUCK-d1 trace, as seen in
figure 4.5. Here $j^{*}_{GR} \approx j^{*}_{PGR}$, predicting that the plateau is not visible.
This is the case, and $j^{*}_{PGR}$ agrees visually with the onset of LRD. The use of GR flows,
here with $\sigma_A/\mu_A = 1.58$ ($c = 0.4$), succeeds in modelling most of the burstiness which
was not reproduced using [P-Pois] in figure 3.14. Furthermore, the visual agreement in the process
$X(t)$ itself was found to be excellent, not only over the scales shown in figure 4.6, but over
all scales. This agreement, essentially in the marginals, goes beyond second order even though the
experiments were judged through the eyes of the wavelet spectrum, and indicates that the ‘physics’
has been captured.

We can now explain the failure of the black box GR model. Simply, a scaled renewal process is not
a renewal process. Thus, the ‘GR term’ $\lambda_F \mu_P \Gamma_{GR}/\lambda_A$ of equation (4.7),
although sharing the general form of a GR process, and the possibility of pseudo scaling with the
same $j^{*}_{GR}$ value, is not equivalent to one. The cluster model and the black box GR model
can therefore never coincide at small scales unless
$\lambda_F \mu_P/\lambda_A = \lambda_X/\lambda_A = 1$. Fortuitously, this is almost the case in
figure 4.7(a) where $\lambda_X/\lambda_A \approx 2$, but not in general. For the Abilene trace
$\lambda_X/\lambda_A = 278$. This result is significant since, looking solely at the results in
figure 4.2, a GR process seems reasonable at small scales, but such measures cannot resolve
important dependencies in the data, which are captured by the cluster model.

This is because. β ). Unfortu- nately IE[H] can fail to match µP by a large factor. perform poorly. (b) Corresponding original process X(t). it is essential to capture the impact of each value of rate in terms of packets. 2. (a) Synthetic X(t) data binned with τ = 50ms. which agree well with semi-experimental comparisons. or the mean. A measurement from the far tail only would not be consistent with (cf . Determining an appropriate λA for in-flow packet arrivals is not trivial. 1/a) for β > 1 (the generalised Riemann Zeta function ζ (·. We now describe the parameter fitting in detail. the number of inter-arrivals in each flow. and in addition is important in the present context where the distribution body is also power-law like over a range of scales. a. k = 1.6: The packet process (AUCK-d1). α) estimates except at extremely large scales beyond the usual observable range. a. has mean IE[H] = a−β ζ (β .8 quantile) values. The exponents of heavy tails are notoriously difficult to estimate. we define the mixture distribution FP (k. as we are modelling the packet arrival process. β ) = . The above procedure includes more data and thus stabilises the estimation. rather than just the far tail. p.14). We accordingly weight the average rate by P(i) − 1. L. ·) can be evaluated to chosen precision. and the factor L is even more so. β ) = 1 − (ak + 1)−β ∼ 1 − Lk−β . The fit is at logarithmically separated k and begins at small (k = 6) or medium (0. The resulting behaviour in the LD is thus a mix of effects which must be appropriately captured when measuring (L. To broaden the family whilst respect- ing the power-law tail and/or bodies. The discrete Pareto-like variable H(k. β ) are measured via a least squares fit in a log-log plot of 1 − FP (k). This results in values that are generally considerably above a simple mean. The tail parameters (L. Simple choices such as the median of R(i) (see figure 3. 
CLUSTER PROCESSES 80 80 number of packets (a) (b) number of packets 60 60 40 40 20 20 0 0 0 2 4 6 8 10 0 2 4 6 8 10 time (sec) time (sec) Figure 4. An entire distribution FP is required for P to link its physical parameters µP . β ): FH (k.34) where a = L−1/β > 0 is a scale parameter.88 CHAPTER 4. (4. β . a. The flow arrival parameter λF was estimated directly from the sample mean of flow inter-arrivals. · · · .

a2 .0236. a2 .062 0. (b) The fit to UNC-a1 shows distortion not present when the empirical P histogram is used. c can be tuned to fit the LD over scales below the LRD.25 1 4 16 64 256 1024 Data Data 19 Model 25 Model GR component Model with empirical P(i) Black box GR 23 [ A−Pois.005. a.0039 0. c) = (63. β )). γ)) + (1 − p)FH (k. the transition scale jGR PGR pFH (k. 3. For .2711.062 0. 1] allows the mean µP to be independently matched . This illustrates a more general point. thus completing the physical validation of the model.25 1 4 16 64 256 1024 977mus 0. act disproportionately to decrease the effective c value.3510) (4.4.15) (4. (15) lower limit 15 21 log Var( d ) log2 Var( dj ) j 19 13 2 17 11 15 9 13 7 11 5 −8 −6 −4 −2 0 2 4 6 8 10 12 −10 −8 −6 −4 −2 0 2 4 6 8 10 j = log2 ( a ) j = log2 ( a ) (c) 61mus 244mus977mus 0. square) marks ∗ (resp. Alternatively. λA . 0. (a) The fit to AUCK-c1 is good.062 0.35) (p.0039 0.016 0. a mean- ingful value of c can be obtained by packet weighting as for λA above. For AUCK-d1 the fitting procedure yields: (λF . MODEL VALIDATION 89 (a) (b) 0. The detailed parameter fitting procedures above show that meaningful values can be given to the (meaningful) parameters.7: Comparison of data and BLPP model.016 0. The asterisk (resp. whereas the quality of the black box GR model is fortuitous. 1. j ∗ ). For fixed γ > 2 (finite variance) and a2 > 0. The flows with the largest packet volume.0039 0. A model using truncated empirical P agrees with the predicted level.25 1 4 16 64 256 Data 27 Model with empirical P(i) GR component Cluster component 25 log2 Var( dj ) 23 21 19 17 −14 −12 −10 −8 −6 −4 −2 0 2 4 6 8 j = log2 ( a ) Figure 4. as they also have higher average dispersion (see figure 4. 8.016 0.4.1(c)). (c) With Abilene deviations remain even with the empirical P.36) Finally. a. β ) = (0. 0. γ. S−Pkt ] 17 upper limit from eq. 0. the mixture parameter p ∈ [0.

some parameters however, notably c and λA, this can be computationally intensive. Instead, faster
methods more akin to ‘fitting’ could be used for more routine application of the model.

In figure 4.7(b) the fit to UNC-a1 is not quite as good, although the main features are
reproduced, in particular the knee position prediction is satisfactory. Much of the discrepancy is
due to the more complex form of FP. To see this, we also plot a ‘semi-model’ fit where the
empirical distribution of P is used instead of the fitted model distribution. The improvement
reveals that the body of the distribution of P plays an important role in the shape of the
‘approach’ to the LRD asymptote. Indeed, we have observed that in many cases the observed ‘LRD’
can be dominated by the shape of FP at ‘medium’ scales, resulting in estimates of the LRD exponent
α which are very misleading. To illustrate the relevance of equation (4.30), in the lower part of
the figure we show a semi-experimental LD where the empirical distribution has been truncated at
the 90th percentile, rendering the data short-range dependent. The LD then saturates at a value
(dashed line) which agrees well with equation (4.30).

Finally figure 4.7(c) shows the result for the high rate Abilene trace. As the fit is poorer, we
show only the semi-model fit using the empirical distribution for P. We see that despite
eliminating mismatches in the shape of P, the model fails to account for some of the variability
at medium scales (also reported in [182] for other OC48 traces). Understanding the reasons for
this requires a return to the data as well as an enhancement to the model.

4.4.2 Elephants, mice, and a multiclass cluster model

The term ‘elephants and mice’ has become common parlance. It refers to the fact that often a small
proportion of flows, the ‘elephants’, have a disproportionate impact over the more numerous
‘mice’. Typically this distinction is made in terms of flow volume (bytes), but it can also be
applied to other quantities. The heavy tailed modelling for P respects this idea, and the results
for the Auckland and UNC traces show that the BLPP model is capable of naturally modelling both
elephants and mice within a single model class. However, the concept can, and should, also be
applied to the orthogonal dimension of traffic rate (see [159]). An important reason for this is
that what constitutes a ‘large impact’ is scale dependent. Only a small number of packets from
volume-elephant flows intersect a given small interval, so their contribution will be negligible
compared to that of volume-mice. However, flows with very high rate, rate-elephants, would make
themselves felt at such small scales. On the other hand at large scales localised high rates are
irrelevant, and the contribution of volume-elephants is significant.

Although we noted in section 4.2 that flow rates vary widely, in the BLPP model they

share a deterministic value λA. This was acceptable as a single value of λA could be found which
represented well the range seen in the high density portions of figures 4.1(a) and (b). This would
not be the case if rate-elephants and rate-mice were present. A cluster model incorporating two
distinct classes would then be needed in order to successfully describe behaviour at all scales.

Figure 4.8: Flow and packet density in Abilene. (a) Flow density plot over (R(i), P(i)). (b) Packet
Density plot (flow density weighted by number of packets). (c) Coefficient of variation per flow.
[Plots omitted: log(P) against log(R) in each panel.]

To calculate the spectrum of a cluster model like BLPP but where the parameters can fall into two
distinct classes: ‘E’ with rate $\lambda_{AE}$, shape $c_E$ and flow volume distribution
$F_{PE}$, and ‘M’ with parameters $\lambda_{AM}$, $c_M$ and $F_{PM}$, we proceed as follows. Let
B be a Bernoulli random variable (independent of P etc.) taking value ‘E’ with probability q, else
‘M’. Consider a cluster process where for each flow an independent copy of B determines its class.
By a well known splitting property of Poisson processes (see theorem 2.4 page 32), the set of
seeds of clusters of type ‘E’ (resp. ‘M’) is also a Poisson process with rate
$\lambda_{FE} = q\lambda_F$ (resp. $\lambda_{FM} = (1 - q)\lambda_F$). These two new processes,
which each have constant rate, shape and flow volume distribution, are independent BLPP processes.
Thus the spectrum $\Gamma_X$ of the ‘multiclass’ cluster model is just the weighted sum of two
spectra of BLPP type. This construction can easily be extended to a countable number of classes.

With these additional tools at our disposal, we return to the Abilene trace with the flow
density plot of figure 4.8(a). It tells a similar story to that of figure 4.1(a), albeit with a
shift to higher rate (note that the diagonal boundary across the top is an edge effect due
to the short duration of the trace). However, when we move to the packet density plot
of figure 4.8(b) we see a striking change in the centre of mass which is not found in the
AUCK traces, where the epicentres of ‘packet’ density and flow density coincide (compare
figures 4.1(a) and (b)). The location in (R, P) space of this high density region represents
an empirical definition of ‘elephant’ which is not tied to rate or packet volume alone. It
is characterised by a very small proportion of flows containing a high proportion of total
packets, with a higher average rate and higher average dispersion (lower c values), as
seen from figure 4.8(c). Thus the Abilene trace contains very strong, bursty, and high rate
volume-elephants, and yet by the argument above, the volume-mice must still be important
for small enough scale, suggesting that a multiclass model may be essential for a full
description of this data.
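
The multiclass construction described earlier in this section amounts to thinning the Poisson stream of flow arrivals with an independent Bernoulli label per flow, which gives two Poisson streams of rates qλF and (1 − q)λF. A minimal sketch of that step is given below. The function names and all numerical values are assumptions chosen for illustration; packets within each labelled flow would then be generated with the class parameters (λAE, cE, FPE) or (λAM, cM, FPM).

```python
import random


def poisson_arrivals(rate, T, rng):
    """Arrival times of a homogeneous Poisson process of the given rate on [0, T)."""
    times = []
    t = rng.expovariate(rate)
    while t < T:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def split_flow_arrivals(flow_times, q, rng):
    """Label each flow 'E' with probability q, else 'M' (independent Bernoulli thinning)."""
    elephants, mice = [], []
    for t in flow_times:
        if rng.random() < q:
            elephants.append(t)
        else:
            mice.append(t)
    return elephants, mice


if __name__ == "__main__":
    rng = random.Random(7)           # fixed seed, illustrative only
    T, lambda_F, q = 200.0, 100.0, 0.1
    flows = poisson_arrivals(lambda_F, T, rng)
    E, M = split_flow_arrivals(flows, q, rng)
    # Empirical class rates, to be compared with q*lambda_F and (1-q)*lambda_F
    print(len(E) / T, len(M) / T)
```

The empirical class rates printed by the usage example should lie close to qλF and (1 − q)λF, up to sampling fluctuations.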
In future work we will examine the usefulness of the dual class cluster model to explain
the form of the wavelet spectrum shown in figure 4.7(c) (similar spectra have been observed
in OC-48 commercial backbone links [182]). Alternatives to Gamma renewal models will
also be investigated to model more extreme in-flow burstiness. Although the number of
parameters increases when moving to multiclass models, this increase may be necessary to capture
important network features. Network traffic is complex, and cannot be reproduced accurately,
nor meaningfully understood, with just 3 or 4 parameters. As the Abilene trace
is a very recent one and is from a large backbone link, these complexities are exciting to
explore since in many ways they constitute a taste of the future of traffic. However, as
networks evolve, models may have to adapt and a BLPP model might not be universally
applicable.

4.5 Towards understanding traffic evolution

In this section we examine in more detail the nature of the BLPP model as a function of
parameters, and illustrate its use as a tool to speculate on the future shape of traffic. For
convenience we recall that for large j the LD tends to $\log_2(c_f C) + \alpha j$, or

$$\log_2\bigl(2\lambda_F\, L B(\beta)\, C(2-\beta)\bigr) + (2-\beta)\,(j + \log_2 \lambda_A). \qquad (4.37)$$
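
As a rough numerical companion to (4.37), the sketch below evaluates the LRD parameters (cf, α) = (2λF L B(β) λA^(2−β), 2 − β) of equation (4.29), the plateau level λX/c of the GR component, and the transition scale of equation (4.31), taking the octave equivalent of a frequency ν to be j = −log2 ν. The function name, the dictionary keys, and the parameter values in the usage example are illustrative assumptions, not values fitted to any trace. The wavelet-dependent constant C is deliberately left out.

```python
import math


def blpp_large_scale_summary(lambda_F, lambda_A, c, mu_P, L, beta):
    """Large scale quantities of the BLPP model (illustrative helper).

    Requires 1 < beta < 2, i.e. a heavy tail with finite mean and infinite variance.
    """
    if not 1.0 < beta < 2.0:
        raise ValueError("beta must lie in (1, 2)")
    alpha = 2.0 - beta
    # B(beta) = Gamma(1 - beta) * cos(pi*beta/2) / (2*pi)^(2 - beta); positive on (1, 2)
    B = math.gamma(1.0 - beta) * math.cos(math.pi * beta / 2.0) / (2.0 * math.pi) ** alpha
    c_f = 2.0 * lambda_F * L * B * lambda_A ** alpha   # LRD prefactor, equation (4.29)
    lambda_X = lambda_F * mu_P                         # mean packet rate
    plateau = lambda_X / c                             # saturation level of the GR term
    # Equation (4.31): octave at which lambda_X / c equals c_f * nu^(-alpha)
    j_pgr2 = -math.log2(lambda_A) + (
        math.log2(mu_P) - math.log2(2.0 * L * B) - math.log2(c)
    ) / alpha
    return {"B": B, "alpha": alpha, "c_f": c_f,
            "lambda_X": lambda_X, "plateau": plateau, "j_PGR2": j_pgr2}


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only
    base = blpp_large_scale_summary(50.0, 500.0, 0.5, 20.0, 1.0, 1.5)
    fast = blpp_large_scale_summary(50.0, 1000.0, 0.5, 20.0, 1.0, 1.5)
    # Doubling lambda_A should move the transition scale down by one octave
    print(base["j_PGR2"] - fast["j_PGR2"])
```

The usage example doubles λA and prints the resulting shift of the transition scale, which illustrates the role of 1/λA as a scale parameter.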

The flow arrival parameter λF .

The role of λF is to vary the number of flows, which, through equation (4.7), can be seen
as an i.i.d. superposition leaving the form of the second order structure invariant. The
magnitude of second order dependencies relative to the mean decreases as $(\lambda_F \mu_P)^{-1/2}$,
so this result is not in contradiction with the well known weak convergence of such a
superposition to a Poisson process [46, p.285]. In traffic engineering this relative decrease of
variability is known as statistical multiplexing gain and is a standard yet powerful argument for
using links with higher capacity to enable more flows to mix together, effectively lowering
variability, even for LRD traffic. This argument follows ‘open loop’ model reasoning, where
network feedback is weak. Such reasoning is nevertheless valid for current backbone links, where
network utilisations are low, and are likely to remain so.
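
The multiplexing gain can be illustrated with a small Monte Carlo sketch of the BLPP. It makes several simplifying assumptions that are not taken from the fitting procedure of this chapter: the number of packets per flow follows the discrete Pareto-like law of equation (4.34), Pr(P > k) = (ak + 1)^(−β); flows that started before time 0 are ignored, so the simulation is not started in equilibrium; and all parameter values are arbitrary.

```python
import math
import random
import statistics


def simulate_blpp_counts(lambda_F, lambda_A, c, a, beta, T, tau, rng):
    """Packet counts per bin of width tau for a BLPP observed on [0, T).

    Illustrative sketch: flows starting before time 0 are ignored.
    """
    n_bins = round(T / tau)
    counts = [0] * n_bins
    scale = 1.0 / (lambda_A * c)        # Gamma scale b, so that the mean b*c equals 1/lambda_A
    t_flow = rng.expovariate(lambda_F)  # Poisson flow arrivals
    while t_flow < T:
        # Inverse-CDF sampling of P with Pr(P > k) = (a*k + 1)^(-beta), P >= 1
        u = rng.random()
        n_packets = max(1, math.ceil(((1.0 - u) ** (-1.0 / beta) - 1.0) / a))
        t = t_flow                      # first packet of the flow sits at the flow arrival time
        for _ in range(n_packets):
            if t >= T:
                break
            counts[min(int(t / tau), n_bins - 1)] += 1
            t += rng.gammavariate(c, scale)   # Gamma inter-arrival with shape c
        t_flow += rng.expovariate(lambda_F)
    return counts


def coefficient_of_variation(counts):
    """Standard deviation of the bin counts divided by their mean."""
    return statistics.pstdev(counts) / statistics.fmean(counts)


if __name__ == "__main__":
    rng = random.Random(1)              # fixed seed, illustrative only
    low = simulate_blpp_counts(20.0, 200.0, 0.5, 1.0, 1.5, 100.0, 0.1, rng)
    high = simulate_blpp_counts(320.0, 200.0, 0.5, 1.0, 1.5, 100.0, 0.1, rng)
    print(coefficient_of_variation(low), coefficient_of_variation(high))
```

With λF multiplied by 16, the relative variability of the binned counts is expected to fall by a factor of roughly 4. With a heavy-tailed P, individual runs can deviate noticeably from that figure.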

The flow structure parameters λA and c.

Since $1/\lambda_A$ is a scale parameter, increasing $\lambda_A$ results simply in translating the
wavelet spectrum toward smaller scales. This can be seen explicitly in the expressions for the
transition scales $j^{**}_{PGR}$ and $j^{*}_{GR}$, and in (4.37) above. Increasing $\lambda_A$ also
obviously scales back flow durations proportionally. At a fixed scale of observation, say at the
sampling rate of a particular measurement infrastructure, one would see the traffic burstiness
increase and become decidedly less Poisson as both the in-flow burstiness and scaling behaviour
translate to smaller scale. In network terms, increased $\lambda_A$ could correspond to the same
traffic passing through faster access networks before reaching the measured link. This is in
agreement with figure 3.4 (page 48) which shows that the onset of LRD happens at a larger time
scale for the Melbourne ISP traces than for the Abilene trace. One can indeed infer that the
customers of the Melbourne ISP access the Internet through 56 kbps modem connections, resulting in
a small $\lambda_A$, while the Abilene network is accessed from high bandwidth university links,
with larger $\lambda_A$.
Equation (4.37) is independent of $c$. Decreasing $c$ results mainly in an increase in burstiness
at scales below LRD through the plateau height $\lambda_X/c$, and an increase in the pseudo slope
at octaves below $j^{*}_{GR}$. It also results in a monotonic movement, of approximately the same
speed, of both $j^{*}_{GR}$ and $j^{**}_{PGR}$ to higher scales. Increased flow burstiness could
arise through lower utilisations on network links, resulting in less queueing and therefore less
traffic smoothing, and also through more aggressive TCP flow control.

The flow volume parameters µP , and (β , L).

We assume that these three can be varied independently, although this can never be entirely
realised in a parametric family. At scales below $j^{**}_{PGR}$ the tail parameters (β , L) have no
impact. The plateau onset scale $j^{*}_{GR}$ is entirely independent of P, and µP enters only as a
variance factor magnifying the burstiness at scales below $j^{**}_{PGR}$ (thus scaling up the pseudo-
slope). At the other extreme, the LRD is unaffected by µP but strongly influenced by the tail
parameters: the asymptotic line moves up when the tail is made heavier either by increasing
L or by decreasing β . The onset scale $j^{**}_{PGR}$ is the result of competing effects. It is pushed up
when increased µP increases short-range burstiness, grows to a limiting value with increas-
ing β , but decreases with increasing L. In terms of networks, a smaller β corresponds to an
increased spread of file sizes, whereas L and µP trade off the proportion of ‘small’ versus
‘large’ files.

The parameter dependencies above can be combined according to possible future traffic
scenarios. For example, assume that increased access link rates promote a proportional
increase in network usage according to: λF ↦ ΛλF , λA ↦ ΛλA , and consider the question,
will traffic become more or less bursty? Clearly the answer must be time scale dependent. If
observing at a scale which is in the range [$j^{*}_{GR}$, $j^{**}_{PGR}$] both before and after the increase, then
the multiplexing effect due to λF will apply, reducing (relative) burstiness. At scales above
$j^{**}_{PGR}$ however the increase in λA largely cancels this out, and in addition the LRD invades
lower scales. If the more generous access rates also encourage greater transfer volumes:
µP ↦ ΛµP , then λX ↦ Λ² λX (since λX = λF µP ) and the multiplexing effect will win out.

Care must be taken when one moves the scale of observation as parameters vary, such as
when studying packet inter-arrivals. There the characteristic timescale, 1/λX = 1/(λF µP ),
shrinks with increased flow rate or volume. Since $j^{*}_{GR}$ is invariant with respect to each
of these, as Λ increases the point of observation in fact moves towards the point process
limit of λX , regardless of the actual change(s) in traffic structure. Indeed, if smaller inter-
arrivals occur purely because of greater µP , then absolute burstiness has in fact increased
at scales below $j^{**}_{PGR}$, whereas the change in perspective might suggest that the traffic had
become more Poisson-like. At such small scales one should also be aware of the physical
limitations of the point process model, which breaks down when packet sizes are reached.
At [OC48,OC3] speeds (assuming a large 1500 byte packet), the model breaks down at
around [5, 77]µs, or j = [−15, −11].
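
The time scales quoted above correspond to the time needed to transmit one full-size packet at line rate. The following is a minimal sketch of that arithmetic; the nominal line rates (155.52 Mbit/s for OC3 and 2488.32 Mbit/s for OC48) and the function name are assumptions made here for illustration and are not stated in the text.

```python
# Nominal SONET line rates in bits per second (assumed values, not from the text).
LINE_RATES_BPS = {"OC3": 155.52e6, "OC48": 2488.32e6}


def serialization_time_us(packet_bytes: int, rate_bps: float) -> float:
    """Time, in microseconds, needed to put one packet on the wire."""
    return packet_bytes * 8 / rate_bps * 1e6


# A large 1500 byte packet, as assumed in the text.
times_us = {
    name: serialization_time_us(1500, rate) for name, rate in LINE_RATES_BPS.items()
}

for name, t in times_us.items():
    print(f"{name}: {t:.1f} us")
# prints "OC3: 77.2 us" and "OC48: 4.8 us"
```

Below these durations two packets cannot physically follow each other on the link, which is why a point process description that ignores packet size stops being meaningful there.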

To illustrate this point further, figure 4.9 shows the LD and an average periodogram of
the very fine scale regime of the CAIDA-b1 trace. The two can be linked by reinterpreting
equation (2.69) as a spectral estimator and setting ν = 1/a. The Fourier analysis reveals pe-
riodicities in the packet arrival process at scales j ≤ 10 due to physical network effects, such
as back to back packets on upstream bottleneck links, which translate to shaped, roughly
periodic traffic on the observed link. The wavelet analysis averages these out and leads
to a roughly flat spectrum consistent with a Poisson process. The model presented in this
chapter cannot reproduce this behaviour since it does not include the notion of packet size.

Figure 4.9: Periodicities at small scales. (Plot of log2 Variance(dj) against j = log2(a), for j
from −15 to 10, showing the averaged periodogram, the LD and a Poisson spectrum; the top
axis marks the corresponding time scales, from 30.5 µs to 32 s.)

Further details on this topic will be given in chapter 7.

4.6 Higher order statistics

Up to this point we have only considered second order moments when comparing the data
and the BLPP model. This is due to the fact that we were mainly concerned with the LRD
character of the data, which is a second order property. In this section we study higher order
statistics with the following two aims: first check that the model satisfactorily captures
higher order statistics of the data, and then investigate the small scale behaviour of Internet
traffic.

4.6.1 Model fit

Recall from chapter 2 that we use q-LDs defined in equation (2.72) to study higher order
statistics. Similarly to what was observed in the case q = 2, the q-LDs exhibit a biscaling
behaviour, i.e. two straight lines separated by a knee. In order to compare the statistics
of the data and the fitted BLPP, we measure the local slopes αq in the q-LDs, at both fine
scales (FS), i.e. below the knee, and coarse scales (CS), i.e. above the knee. We then form
a Linear Multiscale Diagram (LMD), defined in equation (2.73), for each range of scales.
Figure 4.10 shows the LMDs for both data and fitted model at coarse scales and fine
scales. Given the size of the confidence intervals, there is no statistically significant difference
between the LMDs of the model and the data at CS. At FS, the difference between the LMDs
tends to increase with q, but the absolute difference remains small. Moreover, it is notoriously
difficult to estimate higher order statistics in empirical data due to local non-stationarities.

Figure 4.10: Multiscaling comparison between AUCK-d1 (grey) and the fitted BLPP
model (black). Dotted lines represent coarse scale behaviour while solid lines
represent fine scale behaviour. (Plot of hq against q, for q from 0 to 6.)

We conclude that the BLPP captures the higher order statistics of the empirical packet arrival
process. Given the fact that we only used the autocorrelation of X to fit the parameters, this
means that the BLPP model really captures the ‘physics’ of the data.

4.6.2 Small scale behaviour: multifractal or not?

While we simply remarked that the LMDs of the data and the BLPP model were fairly
similar in the previous section, we now investigate their actual values and their statistical
meaning. Our aim is to determine whether a multifractal description of the traffic makes
sense. There is in fact a fair body of literature on multifractal modelling of Internet traffic,
which we do not attempt to summarize here. Useful references can be found for instance
in [150] and [167]. We point out that studying the multifractal nature of a process is an
arduous task, and we will show how our wavelet estimator, arguably the best of its class,
can be fooled.
Recall from section 2.4.3 that we basically have to check whether the points of the LMD
are horizontally aligned (monoscaling behaviour) or not (multiscaling behaviour). The size
of the confidence intervals in figure 4.10 leads us to the conclusion that the data exhibits a
monoscaling behaviour at CS, consistent with LRD. At FS, one could conclude a very weak
multiscaling, or a monoscaling behaviour.
Going beyond the close agreement between the data and the model, which is very sat-
isfying in itself, the point we wish to make here is something quite different. The BLPP
model is not multifractal. Nonetheless it reproduces a non-trivial multiscaling behaviour

(at least to the same extent as the data). It therefore provides another example of pseudo scaling, in the same spirit as the transition effect between the two asymptotic values of the Gamma renewal LD illustrated earlier in this chapter. The model offers the possibility of a new, and very simple, alternative explanation for empirical evidence of multiscaling behaviour at small scales. From the above, we see that pseudo scaling is responsible for the empirical scaling within a model which has a strong physical foundation. If we accept this model as preferable on such grounds, which we do here, then we are led to conclude that the evidence for the multiscaling itself (whether it be monofractal or not) is misleading. In fact, this shows effectively a lack of power on the part of the statistical procedure since it cannot distinguish between the signatures of a true multifractal scaling and a pseudo scaling process. In the hypothesis test language this corresponds to an ‘error of type II’ (more details on the topic can be found in [173]). As noted in chapter 2, the BLPP is an infinitely divisible point process. However, the exact consequences of this property are still unclear.

4.7 Conclusion

Our analysis of the structure of TCP packet arrivals in Internet traffic led to several significant conclusions. Using wavelet analysis, the second order statistics of packet arrivals were shown to be determined by in-flow packet arrival burstiness at small scales, and heavy tailed flow volume at large scale. The key element was found to be the concept of independence between flows. It is based on empirical findings detailed in chapter 3 where we showed (at least in the context of lightly loaded links) that both the flow arrival process and dependencies between flows have negligible impact, as do higher layer mechanisms grouping flows such as web browsing sessions.

In this chapter, a stationary Poisson cluster process class was proposed as an ideal model capturing these features. Poisson arrival instants with rate λF denote the arrival of flows. Packets within flows follow finite Gamma Renewal processes with rate λA and shape c, flow volume being given by a heavy tailed variable P with infinite variance. The model has many advantages including a known spectrum, positive marginals, simple synthesis, and a minimum number of parameters each with direct physical interpretation in terms of network traffic. Its spectrum can be written as a sum of a scaled spectrum of a renewal process controlling small scale behaviour, and a term controlling asymptotic large scale behaviour. A detailed description was given of the behaviour of the spectrum, and the wavelet spectrum, as a function of parameters, and the corresponding interpretation for networks. The scaling-like behaviour at small scales was clearly linked to the burstiness within flows, as a transitional effect over a narrow range of scales of simple in-flow burstiness, suggesting that such traffic is not truly multifractal over these time scales. An expression for the onset scale of LRD was given, analysed as a function of network parameters, and found to be accurate.

The model was verified against large quantities of accurate Internet data, and was found to reproduce the second order statistics well. The parameter fitting was described in detail. It led to meaningful parameter values, and visually convincing model sample paths, confirming that the model actually captures much of the network ‘physics’. Some departures from the model were found for a recent, very high bit rate traffic trace. Further data analysis revealed some of the underlying reasons, and a multi-class version of the model was described as a possible means to account for them. It was shown how the model can naturally incorporate the notion of elephant and mice flows without the need to explicitly define them and treat them separately. It was also used to illustrate how a packet volume based definition of elephants is not sufficient, and how ‘rate-elephants’, should they exist, could be accounted for in the model. The model is highly structural, rather than black box, enabling its use as an investigative tool for the evolution of traffic properties.

Chapter 5

Inverting sampled traffic

5.1 Introduction

The findings presented in this thesis up to this point concern mainly question (i), and can be summarized by saying that the Bartlett-Lewis point process (BLPP) is a very good model of Internet packet arrivals (chapter 4), with strong empirical backing (chapter 3). We now turn to question (ii) and study in this chapter how to sample packet traffic at a router interface.

5.1.1 Motivation

Network traffic measurement is essential for traffic engineering (e.g. link upgrades or traffic re-routing) and traffic accounting (e.g. usage based pricing). Routers offer tools such as Cisco’s Netflow [33] or Inmon’s sFlow [91] that give information about the flows of packets that traverse them. However the generation of detailed traffic statistics does not scale well with link speed. This is why packet sampling techniques are increasingly being used in routers [34] to export the statistics of a portion of the traffic only. The problem that then immediately arises is how to deal with such partial measurements. One can think of this as a two step process: first, recover the statistics of the full traffic (such as the actual number of packets in flows on the measured link) from the retained sampled data through some inversion procedure, and second, take appropriate decisions based on the characteristics of the full traffic. While the second step is left to traffic engineers and managers, the first corresponds to an interesting and important task which has only recently been attracting attention.

Our aim in this chapter is to provide theoretical results for the problem of recovering statistics beyond first order from sampled traffic, and to see how successfully such results can be applied in practice with real traffic. We focus mainly on two statistics: the spectral density of the packet arrival process, and the distribution of the number of packets per flow. This implies that we limit ourselves to portions of traffic that can be considered stationary. It also means that we do not try to recover sample values, but rather the distribution from

which these samples were drawn.

Traffic statistics commonly considered vary widely depending on user requirements and the capabilities of the collection mechanism. In this chapter we first place ourselves in a general framework whereby any raw statistics of the sampled data that we may need are considered to be available, as we focus primarily on the feasibility of the inversion problem. In some cases these statistics may not be readily available in today’s routers, or they may be close to impossible to provide because of real-time constraints. For example few routers can export packet level statistics such as sizes and timestamps of individual packets. In addition, currently high-end routers use switched instead of shared backplanes, and therefore not all packets are seen at any single point of the backplane [90]. Purpose built link monitoring boxes however, or dedicated passive measurement infrastructures supporting offline studies based on sampled traffic, will be capable of much finer grained storage and processing.

5.1.2 Terminology

In accordance with the rest of this thesis, we are mainly interested in the point process of packet arrival times and will not be concerned with packet sizes. In point process theory, the action of ‘sampling’ points along the real line is called thinning, and we will use these two terms interchangeably. We will use ‘full’ or ‘original’ to refer to the non-sampled packet traffic, and ‘sampled’ or ‘thinned’ to refer to the sampled traffic. The process of thinning the packet process is to be understood in general terms as the action of only recording part of the total traffic according to a certain rule. From a theoretical perspective, one is interested in recovering as much information as possible about the original point process by observing a thinned version of it.

In this chapter we will study two different sampling rules: packet thinning, which acts directly on individual packets and is ignorant of flows, and flow thinning, where entire flows of packets are retained or discarded at once. Independent and identically distributed (i.i.d.) packet thinning consists of, for each packet in an independent manner, retaining the packet with probability q or discarding it with probability 1 − q. Similarly, i.i.d. flow thinning consists of, for each flow independently, leaving the flow untouched with probability q or removing it entirely with probability 1 − q.

We use a hierarchy of descriptors to study the statistics of packet traffic, based on concepts introduced in chapter 3. We refer to packet level when describing statistics which do not use or refer to any imposed structure or detailed modelling assumptions. Examples of packet level statistics are the mean packet arrival rate or the spectral density of the packet arrival process. Flow level is concerned with statistics arising from the grouping of packets

into flows, such as the distribution of the number of packets per flow. Finally, although of less importance here, we use in-flow level to refer to statistics describing the placement of packets within a flow, such as the mean arrival rate of packets belonging to a given flow.

5.1.3 Previous work

In the early 1990’s, data collection on the T1 NSFNET backbone showed that information was lost during peak periods. Sampling methods were therefore advocated in [36] to reduce the load on the measurement infrastructure. The aim of this work was to estimate the packet size distribution from the sizes of sampled packets. Different sampling strategies were compared: deterministically taking one in every N packets (systematic sampling), taking on average one in N packets (simple random sampling) or taking one packet in every bucket of size N (stratified random sampling). Another study of sampling techniques can be found in [31] where the mean number of packets and the packet size distribution are estimated from a sampling where the number of skipped packets is a Poisson random variable. In [50] an adaptive sampling rate was proposed to optimize the resource allocation. An adaptive sampling technique was also used in [32] where a bound on the sampling error for traffic load measurement was studied. Sampling strategies were also used in [88] for the detection of denial of service attacks. In [52], Duffield et al. provided estimates from sampled traffic of the mean number of bytes or packets of a set of packets with common properties (e.g. IP addresses, protocol, Autonomous System, ...). Because of the heavy tailed distribution of file sizes, a particular kind of sampling, known as stratified sampling [37], is used to reduce the variance of the estimators. It basically consists in sampling ‘more’ in the heavy tail of the distribution and gives different weights to different samples.

Each of the aforementioned studies was concerned with a packet level description of network traffic in the sense described above. Much closer in spirit to this chapter is the work presented in [131], where it is shown how certain first-order IP flow level statistics can be recovered from sampled traffic. In particular, an estimator (and its all important variance) of the mean number of packets per flow is given. The estimation is not blind and makes strong use of additional information contained in the TCP packet header. More specifically the recovery scheme requires the knowledge of the number of original flows, and as it is assumed that this is not measured directly, it must be inferred separately. It is shown how this can be achieved by looking for TCP SYN packets in the case of ‘ideal’ TCP flows which all begin with a SYN packet and have infinite timeouts.

Recently a technique to approximate the full distribution of number of packets per flow was proposed in [51]. This

method is based on an expectation maximization technique and gives a smooth estimate of the original distribution. These previous studies are only concerned with the distribution of number of packets per flow. We are not aware of any previous study on recovering the spectral density of the packet arrival process from sampled traffic. These results will be discussed further later in this chapter.

5.1.4 Outline and main contributions

We are interested in inverting sampled traffic in a statistical sense, focusing mainly on two quantities: the spectral density (packet level), and the distribution of the number of packets per flow (flow level). In section 5.2 we address this problem from a theoretical perspective. We first consider the case of packet sampling since it is the method currently implemented in routers. We show in particular how the theory of point processes can help recover the original spectral density from the thinned data. We also propose a theoretical scheme to recover the full distribution of the number of packets per flow. In this respect we extend the work of [131] where only the mean number of packets was recovered. We then present an alternative sampling technique named flow sampling which is (almost) as computationally feasible as packet sampling, but has a more straightforward inversion mechanism both at the packet and flow level. The inversion methods require different assumptions on the original traffic depending on the sampling method, which are carefully detailed and justified. The practical application of the two methods to real traffic and the limitations of their numerical evaluation are given in section 5.3. Section 5.4 is concerned with the application of the sampling techniques to the BLPP model introduced in chapter 4. It is shown how the parameters of the model can be fitted from the thinned data obtained from both sampling techniques in theory, though not always in practice. In section 5.5 we summarize our findings and discuss the use of different sampling techniques for different tasks. We conclude in section 5.6.

Our main contribution is the demonstration of the fact that inversion is essentially impossible in practice in the case of packet thinning for any useful thinning probability, whereas flow thinning can be usefully inverted no matter how high this probability becomes, provided enough traffic is sampled.

5.2 Inverting sampling: theory

In this section we study two different sampling techniques, which we call i.i.d. packet sampling and i.i.d. flow sampling. We present theoretical inversion methods to recover the spectral density and the distribution of the number of packets per flow from the observed thinned traffic. All quantities corresponding to thinned traffic will be written with the superscript (q),

where q is the retention probability defined below.

5.2.1 Packet sampling

In general terms, the i.i.d. packet thinning of a stationary point process $X$ with rate $\lambda$ consists in independently keeping each point of $X$ with probability $q$ or rejecting it with probability $1-q$ to form a new point process $X^{(q)}$ with rate $\lambda^{(q)} = q\lambda$.

Packet level

The original rate can be recovered from that of the thinned process in a straightforward way via

\[ \lambda = \frac{1}{q}\,\lambda^{(q)}. \tag{5.1} \]

A much less intuitive result links the spectral densities of $X$ and $X^{(q)}$. From [25, 46], for any simple, locally finite and second order stationary point process $X$ with spectral density $\Gamma_X(\omega)$, the spectral density of $X^{(q)}$ reads

\[ \Gamma_X^{(q)}(\omega) = q^2\,\Gamma_X(\omega) + q(1-q)\,\lambda. \tag{5.2} \]

Proof. See chapter 2, page 31.

From equations (5.2) and (5.1) the spectrum $\Gamma_X(\omega)$ of the original process can therefore be recovered and reads

\[ \Gamma_X(\omega) = \frac{1}{q^2}\left( \Gamma_X^{(q)}(\omega) - (1-q)\,\lambda^{(q)} \right). \tag{5.3} \]

This powerful result gives readily accessible and very useful information about the original process without making any assumptions on its detailed structure. In particular no modelling assumptions are required beyond stationarity.

Flow level

Let us assume that the original process is in fact the superposition of identically distributed groups of points called clusters. In the traffic context these are packets grouped into flows. Recall the notations of chapter 3: $P$ is the discrete random variable describing the number of points per cluster, with density $p_k = \Pr(P = k)$, distribution $F_P$, and finite mean $\mu_P$. In practice no flow of length 0 is observed and therefore $p_0 = 0$. Let $P^{(q)}$ be the discrete random variable describing the number of packets per flow after packet thinning, with density $p_k^{(q)} = \Pr(P^{(q)} = k)$, and distribution $F_P^{(q)}$. In this subsection our aim is to recover the properties of the marginal $F_P$ of the original flows from $F_P^{(q)}$. Since we look at the marginal only, there is no need to assume independence between flows.
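
The packet level relations (5.1) to (5.3) above are simple enough to be checked numerically. The following is a minimal sketch, not an estimator: it assumes the thinned spectral density is already available at a few frequencies, and the function names and the numerical values are hypothetical, chosen here only for illustration.

```python
def thinned_spectrum(gamma, lam, q):
    """Forward relation (5.2): spectral density after i.i.d. packet thinning."""
    return q**2 * gamma + q * (1 - q) * lam


def invert_thinned_spectrum(gamma_q, lam_q, q):
    """Inverse relations (5.1) and (5.3): original rate and spectral density."""
    lam = lam_q / q
    gamma = (gamma_q - (1 - q) * lam_q) / q**2
    return lam, gamma


# Hypothetical original rate and spectrum values at four frequencies.
lam, q = 1000.0, 0.1
gamma_original = [5000.0, 2500.0, 1200.0, 1000.0]

# Thin, then invert using only thinned quantities and q.
lam_q = q * lam
gamma_thinned = [thinned_spectrum(g, lam, q) for g in gamma_original]
recovered = [invert_thinned_spectrum(g, lam_q, q)[1] for g in gamma_thinned]
lam_recovered = invert_thinned_spectrum(gamma_thinned[0], lam_q, q)[0]
```

As a sanity check, a flat spectrum equal to the rate is mapped by (5.2) to a flat spectrum equal to the thinned rate, which is consistent with a thinned Poisson process remaining Poisson. With real data the subtraction in (5.3) is followed by a division by $q^2$, so any estimation error in the thinned spectrum is amplified accordingly when q is small.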

Conditioning on the number of packets in a given original flow, the probability $p_k^{(q)}$ of having a flow of size $k \ge 0$ after thinning reads

\[ p_k^{(q)} = \sum_{j=k}^{\infty} \Pr\{k \text{ packets after thinning} \mid j \text{ packets before thinning}\}\, p_j = \sum_{j=k}^{\infty} \binom{j}{k} q^k (1-q)^{j-k}\, p_j. \tag{5.4} \]

Equation (5.4) gives the densities of the thinned flows as a function of the densities of the original flows. To invert this relation we use results on probability generating functions and complex analysis.

Let us first introduce some notation. In the following $\mathcal{C}(z, r)$, $D(z, r)$ and $\bar{D}(z, r) = D(z, r) \cup \mathcal{C}(z, r)$ will denote respectively the circle, the open disk and the full disk with center $z$ and radius $r$. Denote by $B$ the binomial (Bernoulli) random variable such that $\Pr(B = 0) = 1 - q$ and $\Pr(B = 1) = q$. By definition of i.i.d. packet thinning, $P^{(q)}$ can be expressed as a sum of $P$ i.i.d. binomial random variables. Let $G_P(z)$, $G_P^{(q)}(z)$ and $G_B(z)$ be the probability generating functions of $P$, $P^{(q)}$ and $B$ defined respectively as

\[ G_P(z) = \sum_{j=0}^{\infty} p_j z^j, \qquad G_P^{(q)}(z) = \sum_{j=0}^{\infty} p_j^{(q)} z^j, \qquad \text{and} \qquad G_B(z) = 1 - q + qz. \]

$G_B(z)$ is an entire function (analytic for all $z \in \mathbb{C}$). $G_P(z)$ and $G_P^{(q)}(z)$ are defined on the closed unit disk $\bar{D}(0, 1)$, but if $F_P$ is heavy tailed they are only analytic on the open unit disk $D(0, 1)$ due to a singularity at $z = 1$.

From results on the generating function of a compound distribution the following relation holds:

\[ G_P^{(q)}(z) = G_P(G_B(z)) \quad \text{for } z \in \bar{D}(0, 1). \tag{5.5} \]

This equation is the transform domain version of equation (5.4). Since $G_B$ maps $\bar{D}(0, 1)$ onto $\bar{D}(1-q, q)$, one can obtain $G_P$ from equation (5.5) as

\[ G_P(z) = G_P^{(q)}\!\left( \frac{z - (1-q)}{q} \right) \quad \text{for } z \in \bar{D}(1-q, q). \tag{5.6} \]

Now, the probabilities $p_j$ that we wish to calculate can be obtained by picking out the coefficients of a power series expansion of $G_P$ about the origin. However, as we see from equation (5.5), equation (5.6) only gives an inversion formula for $G_P$ for $z \in \bar{D}(1-q, q)$, a closed disk which lies within the unit circle and is centered at $z_0 = 1 - q$ (see the thick circle in figure 5.1(a)). It does not give us $G_P$ over the full unit disk, nor an expansion about the origin. We consider how to circumvent these difficulties in a moment.

Using standard results on generating functions, the mean number of packets per flow can be recovered via

\[ \mu_P = \left.\frac{dG_P}{dz}\right|_{z=1} = \frac{1}{q} \left.\frac{dG_P^{(q)}}{dz}\right|_{z=1} = \frac{\mu_P^{(q)}}{q}. \tag{5.7} \]
Let $F_P$ be a heavy tailed distribution such that

\[ 1 - F_P(x) \sim \frac{L}{x^{\alpha}} \quad \text{as } x \to +\infty, \tag{5.8} \]

where $L > 0$ and $1 < \alpha < 2$. From equation (5.6) one can show by using a Tauberian theorem [23, p.333] that $F_P^{(q)}$ has tail behaviour

\[ 1 - F_P^{(q)}(x) \sim \frac{L^{(q)}}{x^{\alpha}} \quad \text{as } x \to +\infty, \tag{5.9} \]

where

\[ L^{(q)} = q^{\alpha} L. \tag{5.10} \]

The thinned distribution for the number of packets per flow is therefore also heavy tailed with the same index but reduced tail mass. In fact the Tauberian theorem used above is even stronger and gives an equivalence between equations (5.8) and (5.9). This means that if a heavy tail is observed in the thinned traffic, it must come from the original traffic, and cannot have been created by the thinning process itself. From equation (5.10) one can trivially invert the tail prefactor:

\[ L = \frac{1}{q^{\alpha}}\, L^{(q)}. \tag{5.11} \]

We now present two different theoretical schemes to recover the original probability densities.

Scheme 1: analytic continuation

Our aim is to construct a power series expansion of $G_P$ about the origin in order to recover the $p_j$ via the expansion on the left in equation (5.5). In principle, since $G_P$ is analytic in $D(0, 1)$ and from equation (5.6) its values are known on $D(1-q, q)$ which lies inside $D(0, 1)$, $G_P$ is known on $D(0, 1)$ through analytic continuation. The required expansion about the origin can therefore be found. Carrying this through in practice however is not straightforward.

We denote by $z_0 = 1 - q$ the origin of the original analytic domain $D_0 = D(z_0, q)$. Within $D_0$ it is easy to show from equation (5.6) that the following power series expansion holds:

\[ G_P(z) = \sum_{n=0}^{\infty} a_n^0 (z - z_0)^n, \quad z \in D_0, \tag{5.12} \]

where the coefficients obey

\[ a_n^0 = \frac{p_n^{(q)}}{q^n} \tag{5.13} \]

and the radius of convergence is $r_0 = q$.
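
Since $D_0$ is centred at $z_0 = 1 - q$ with radius $q$, the origin lies inside it whenever $q > 1/2$, and the expansion (5.12) can then be re-centred at $z = 0$ with the binomial theorem. The following is a minimal sketch of that re-centring for a hypothetical flow size distribution with finite support, where all sums are finite and the recovery is exact; it therefore says nothing about the convergence difficulties of the heavy tailed case, nor about the effect of estimation noise. All names and values are assumptions made for illustration.

```python
from math import comb


def thin_flow_size_pmf(p, q):
    """Forward relation (5.4) for a distribution with finite support."""
    n = len(p)
    return [
        sum(comb(j, k) * q**k * (1 - q) ** (j - k) * p[j] for j in range(k, n))
        for k in range(n)
    ]


def coefficients_about_z0(p_q, q):
    """Equation (5.13): coefficients of G_P expanded about z0 = 1 - q."""
    return [pn / q**n for n, pn in enumerate(p_q)]


def recentre_at_origin(a0, q):
    """Re-expand sum_n a0[n] (z - z0)^n in powers of z, with z0 = 1 - q."""
    z0 = 1 - q
    n_max = len(a0)
    return [
        sum(comb(n, j) * a0[n] * (-z0) ** (n - j) for n in range(j, n_max))
        for j in range(n_max)
    ]


# Hypothetical original distribution on {1, ..., 5} and a mild thinning q > 1/2.
p = [0.0, 0.5, 0.2, 0.15, 0.1, 0.05]
q = 0.6

p_q = thin_flow_size_pmf(p, q)
p_recovered = recentre_at_origin(coefficients_about_z0(p_q, q), q)
```

The division by $q^n$ in (5.13) already hints at the practical difficulty: for small q the high order coefficients are obtained by multiplying small, noisy thinned densities by very large factors.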

The basic principle we employ is to choose a point $z_1 \in D_0$ and to expand $G_P$ as a power series about it. The coefficients of this new series can then be obtained by comparing with the series of equation (5.12) evaluated at $z = z_1$, and are

\[ a_j^1 = \sum_{n=j}^{\infty} \binom{n}{j} a_n^0 (z_1 - z_0)^{n-j}. \tag{5.14} \]

Consider how this works for the simple case of $q \in\, ]0.5, 1]$ where we are able to choose $z_1$ to be the origin, as illustrated in figure 5.1(a) for $q = 0.6$. Substituting $a_n^0 = p_n^{(q)}/q^n$ into equation (5.14) and noting from equation (5.5) that $a_j^1 = p_j$ in this case, we have

\[ p_j = \sum_{n=j}^{\infty} \binom{n}{j} \frac{(-1)^{n-j}}{q^n} (1-q)^{n-j}\, p_n^{(q)}, \tag{5.15} \]

which only converges for $q \in\, ]0.5, 1]$ when $F_P$ is heavy tailed. An alternative way to derive this inversion formula is to directly apply a combinatorial identity to invert equation (5.4). The identity in question states [153, p.49], with no convergence criteria given, that $B_k = \sum_{j=k}^{\infty} \binom{j}{k} A_j$ and $A_j = \sum_{k=j}^{\infty} (-1)^{k+j} \binom{k}{j} B_k$ are inverses. In the present context this identity can only help us for $q \in\, ]0.5, 1]$, a very mild degree of thinning.

Figure 5.1: Analytic continuation method for (a) q = 0.6, and (b) q = 0.1. The thick solid dark circle represents $\mathcal{C}_0 = \mathcal{C}(z_0, q)$ and the thick dotted grey circle is the unit circle $\mathcal{C}(0, 1)$. For q = 0.6 $z_1$ can be chosen as the origin and an expansion made there whereas for q = 0.1 a series of analytic continuations are required, before a point, $z_5$, can be chosen as the origin.

For $q \in [0, 0.5]$ $z_1$ cannot be chosen at the origin, and we adopt a recursive procedure involving a sequence $\{z_k\}$, $k = 1, 2, \cdots l$, of points along the real axis obeying $1 > z_0 > z_1 > \cdots > z_l = 0$, $z_l$ being the origin itself (figure 5.1(b) illustrates the case where $q = 0.1$ and $l = 5$). At the $k$th stage, $z_k$ will be chosen to lie inside the circle of convergence $\mathcal{C}_{k-1}$ from the previous stage, and $G_P$ will be expanded in a power series centered about $z_k$, whose

coefficients $a_j^k$ will be obtained through those of the previous stage:

\[ a_j^k = \sum_{n=j}^{\infty} \binom{n}{j} a_n^{k-1} (z_k - z_{k-1})^{n-j}. \tag{5.16} \]

Since $z_k$ lies inside the unit circle where we know $G_P$ is analytic, its circle of convergence $\mathcal{C}_k$ will first encounter a singularity at $z = 1$, and so the corresponding radius of convergence will be $r_k = 1 - z_k$. In this way, as the sequence $\{z_k\}$ marches towards the origin, the radii of convergence increase monotonically to 1. As before, the coefficients of the final power series will be the desired densities, that is $p_j = a_j^l$. In fact the $z_k$ can be chosen so that the origin is approached geometrically: a minimum of $\lceil -\log_2(q) \rceil$ iterations is required.

Scheme 2: Cauchy integral

A second theoretical scheme to recover the original $p_j$ is based on another important result of complex analysis: the Cauchy integral formula, which for our particular problem reads

\[ p_j = \frac{1}{2\pi i} \oint_S \frac{G_P(z)}{z^{j+1}}\, dz, \tag{5.17} \]

where $S$ can be any closed contour containing the origin. Inversion methods based on equation (5.17), including methods using inverse Fourier transforms and damping techniques, are summarized in [1]. Methods have also been developed to remove the aliasing terms caused by the unavoidable discretization of the integral in the numerical evaluation of equation (5.17), both when $F_P$ is light tailed [45] and heavy tailed [155]. They work well when one can directly evaluate $G_P$ on a contour including the origin (in some queuing problems for instance one has an explicit expression for the generating function to be inverted for the corresponding probability densities).

Note that for $q > 0.5$, we can choose $S = \mathcal{C}_0$ and the Cauchy integral can be directly evaluated along this contour. However for $q < 0.5$ we first have to infer the values of $G_P$ on some suitable contour $S$ and then use equation (5.17) to recover the $p_j$. A common method to do so is to use Padé approximants. It consists in approximating $G_P$ at the point $z = z_0$ by a quotient of two polynomials $P(z)$ and $Q(z)$, of degree $L$ and $M$ respectively, and then evaluating $P(z)/Q(z)$ at the desired values of $z$. Details on the determination of these polynomials and convergence issues can be found for instance in [14]. In our case we evaluate the Padé approximants on a contour $S$ chosen to be the unit circle. The main drawback of the Padé approximation is that there are no general bounds on the error. The natural bound in this case, that $|G_P(z)| \le 1$ on the unit circle, is not of much use.

While complex analysis provides elegant theoretical results for the recovery of $F_P$ from $F_P^{(q)}$, the procedures are quite involved. Moreover, it is an ill-posed problem in the sense that

small errors in the evaluation of $G_P$ at points in the original domain $\bar{D}_0$ become magnified in the extrapolation [126]. To put this in perspective, in the case of significant thinning, say $q = 0.001$, we are trying to extrapolate values from a tiny circle of radius $q$ close to $z = 1$ up to the entire unit circle. Note that equations (5.7) and (5.11) do not suffer from this problem as $z = 1$ is on the circle for all $q$. As we will see in section 5.3.2, the practical limitations of the two schemes described above are so severe that only a few values for the first iteration step can be obtained numerically. Given these fundamental difficulties at the flow level, we do not attempt to investigate the inversion of in-flow statistics for packet thinning.

We now turn to a very different kind of sampling, flow thinning, which has quite different properties.

5.2.2 Flow sampling

As stated in the introduction, i.i.d. flow sampling consists in selecting flows with probability $q$.

Flow level

Since the flows that are kept by the thinning procedure are identically the same as the original flows, all the marginal flow properties, and in particular the distribution of the number $P$ of packets per flow, can be readily estimated from the observed thinned traffic. There is no inversion problem as such (beyond estimation issues), and the value of $q$ plays no theoretical role. The same holds true for in-flow statistics, which are uncorrupted by flow thinning. This is in marked contrast with the packet thinning scenario and its problematic inversion requirements. Flows will be taken to be identically distributed through an assumption of stationarity. As for packet thinning, no assumption of flow independence is needed at this point.

Packet level

We now explain how, under reasonable assumptions on the underlying process, packet level information such as the spectral density of $X$ can be recovered from flow sampled traffic. We place ourselves in the modelling framework detailed in chapter 4 where we consider that the flow arrival process $Y(t)$ is a Poisson process and we take flows to be mutually independent. The packet arrival process $X(t)$ is therefore a Poisson cluster process (PCP), as defined in section 2.3.2:

$$X(t) = \sum_i G_i(t - t_F(i)), \qquad (5.18)$$

where the flow arrival times $\{t_F(i)\}$ follow a Poisson process and $G_i(t)$ represents the arrival process of packets within flow $i$. It is assumed that the subsidiary process $G_i(t)$ has a

finite mean number $\mu_P$ of packets per flow and a finite intensity. These two conditions are necessary for $X(t)$ to be stationary [110]. Let $\Gamma_G(\omega)$ be the 'spectrum' of $G_i(t)$, more precisely the expectation of the modulus squared of the Fourier transform of $G_i(t)$. The spectral density of $X(t)$ can be shown to be simply [46]

$$\Gamma_X(\omega) = \lambda_F\, \Gamma_G(\omega). \qquad (5.19)$$

From [110] the rate of the stationary process $X(t)$ reads

$$\lambda = \lambda_F\, \mu_P. \qquad (5.20)$$

Let us now consider the effect of flow thinning a PCP. Using the well known independent splitting property of a Poisson process (theorem 2.3.4 page 32), one can show that the i.i.d. sampling with probability $q$ of the Poisson flow arrival process $Y(t)$ with rate $\lambda_F$ is a Poisson process $Y^{(q)}(t)$ with rate $\lambda_F^{(q)} = q\lambda_F$. This means that flow sampling transforms $X(t)$ into a PCP $X^{(q)}(t)$ with flow rate $\lambda_F^{(q)}$ and the same $G_i(t)$. The rate of $X^{(q)}(t)$ reads $\lambda^{(q)} = \lambda_F^{(q)} \mu_P^{(q)} = q\lambda_F \mu_P = q\lambda$ and the original rate can be recovered via

$$\lambda = \frac{1}{q}\, \lambda^{(q)}. \qquad (5.21)$$

The spectral density of $X^{(q)}(t)$ reads

$$\Gamma_X^{(q)}(\omega) = \lambda_F^{(q)}\, \Gamma_G(\omega) = q\, \Gamma_X(\omega), \qquad (5.22)$$

from which the original spectral density can be expressed as

$$\Gamma_X(\omega) = \frac{1}{q}\, \Gamma_X^{(q)}(\omega). \qquad (5.23)$$

5.3 Inverting sampling: practice

The previous section was concerned with theoretical inversion methods for two different kinds of thinning. In this section we present a numerical evaluation of these inversion techniques. Results concerning the estimates of first order quantities and their confidence intervals for packet sampled traffic can be found in [131] and will not be detailed here.

The passive measurements used to illustrate the thinning methods are presented in table 3.1 page 44. They come from the Auckland-IV [177] and Abilene NLANR [130] trace repositories. The traffic can be considered stationary for the period of time covered by the traces. We begin with the packet level statistics in 5.3.1 before tackling the flow level statistics in 5.3.2.
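As a concrete illustration of the flow thinning relations (5.21) and (5.23) of section 5.2.2, the following minimal sketch simulates Poisson flow arrivals, keeps each flow with probability $q$, and rescales the thinned packet rate by $1/q$. The function names, the geometric packet count per flow and all numerical values are assumptions chosen for the example; they are not taken from the traces used in this chapter.

```python
import random

def simulate_flows(lam_f, mean_pkts, duration, rng):
    """Return (start_time, packet_count) pairs for Poisson flow arrivals.

    Packet counts are geometric on {1, 2, ...} with the given mean; this is an
    illustrative choice, not a distribution taken from the text.
    """
    flows = []
    t = rng.expovariate(lam_f)
    while t < duration:
        count = 1
        while rng.random() > 1.0 / mean_pkts:
            count += 1
        flows.append((t, count))
        t += rng.expovariate(lam_f)
    return flows

def flow_thin(flows, q, rng):
    """Keep each flow independently with probability q (i.i.d. flow thinning)."""
    return [flow for flow in flows if rng.random() < q]

def packet_rate(flows, duration):
    """Packets per unit time, attributing every packet of a flow to its start time."""
    return sum(count for _, count in flows) / duration

def invert_rate(thinned_rate, q):
    """Equation (5.21): lambda = lambda^(q) / q."""
    return thinned_rate / q

def invert_spectrum(thinned_spectrum, q):
    """Equation (5.23): Gamma_X(w) = Gamma_X^(q)(w) / q, applied pointwise."""
    return [value / q for value in thinned_spectrum]

rng = random.Random(1)
duration, q = 200.0, 0.1
flows = simulate_flows(lam_f=50.0, mean_pkts=5.0, duration=duration, rng=rng)
kept = flow_thin(flows, q, rng)
full_rate = packet_rate(flows, duration)
estimate = invert_rate(packet_rate(kept, duration), q)
print(f"full rate {full_rate:.1f}, estimate from thinned flows {estimate:.1f}")
```

The inversion itself is a single division; the only source of error in this sketch is the statistical fluctuation due to the reduced number of flows.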

5.3.1 Packet level

From equations (5.3) and (5.23), the spectrum of the full traffic can be recovered from the spectrum and the rate of the thinned traffic for both sampling techniques. When estimating from data however, because of the scaling properties of network traffic we use a wavelet based estimate of the spectral density. Because of the linearity of the relationship between the Fourier and wavelet spectra detailed in equation (2.69) page 35, essentially the same inversion formulae can be used.

Figure 5.2 illustrates the inversion methods in the (log) wavelet domain. The thick gray line corresponds to the wavelet spectrum of the original traffic, while the vertical lines mark confidence intervals on the spectrum estimate at the different scales. The straight line observed over large scales betrays long memory. When $q$ is relatively large ($q = 0.1$), as illustrated in figure 5.2(a), the spectrum inferred from the packet thinned traffic is remarkably close to the 'true' spectrum estimated directly from the full traffic. The fact that fine details of the spectrum can be reproduced is due to the fact that equation (5.2) is valid for any second order stationary point process. On the other hand, the spectrum reconstructed from the flow thinned traffic does not match the true spectrum quite as well. While very good at large scales, the reconstruction fails to precisely match the small scale behaviour. This is a direct consequence of our assumption underlying the inversion formula that flows are uncorrelated. The inversion is incapable of re-inserting the flow dependencies which were weakened by the thinning. Despite this strong assumption however, the inverted spectrum clearly reproduces the main features of the true spectrum.

When one moves to much smaller values of $q$ however, as figure 5.2(b) shows for $q = 0.001$, the flow based thinning still gives a qualitatively accurate estimate while the inversion technique based on packet thinning is highly inaccurate. This is significant since as link rates increase a trend to ever more aggressive thinning seems likely. In fact, from the form of equation (5.3) one can see that the original spectrum $\Gamma_X(\omega)$ is recovered by measuring the difference between $\Gamma_X^{(q)}(\omega)$ and a Poisson noise. Since for small $q$ the confidence intervals on the estimation become so large that $\Gamma_X^{(q)}(\omega)$ cannot be reliably distinguished from this noise, the inversion procedure must fail. The problem clearly becomes steadily worse as $q$ drops.

In contrast to the above, recovery of the spectrum in the case of flow thinning does not suffer from the same drawback as it simply involves multiplying $\Gamma_X^{(q)}(\omega)$ by a scale factor (an upward translation on the logarithmic scale of the LD). In fact, the quality of the

estimation through the flow thinning inversion method depends mainly on the number of flows $N$ remaining after thinning, the value of $q$ being largely irrelevant. At constant $N$, irrespective of the thinning probability, the inversion method will therefore lead to an approximately 'constant' error.

[Figure 5.2, two panels: (a) q = 0.1; (b) Packet: q = 0.001, Flow: q = 0.0001. Each panel plots $\log_2 \mathrm{Var}(d_j)$ against $j = \log_2(a)$ for the curves: Original, Packet Thinned, Inferred from Packet Thinned, Flow Thinned, Inferred from Flow Thinned.]

Figure 5.2: Spectrum reconstruction: (a) AUCK-d1: Logscale diagrams of the original traffic, packet and flow thinned traffic each with q = 0.1, and the two corresponding inverted estimates for the full traffic. (b) IPLS: Logscale diagrams of the original traffic, packet thinned traffic with q = 0.001, flow thinned traffic with q = 0.0001, and the two corresponding inverted estimates for the full traffic ($T_0$ = 64s). Despite the fact that the flow thinned traffic is ten times thinner, the estimate recovered from it is far better. The top axis marks the timescale in seconds. Similar experiments on other traces led to the same conclusions.

This point is illustrated in figure 5.3 where inversion based estimates

are given for two values of $q$ at constant $N$. The near independence of the inversion method with respect to $q$ is a strong argument in favour of flow based sampling for spectral estimation (see footnote 2). In practice however, non-stationarities and edge effects make it difficult to accurately estimate the spectrum when the number of remaining flows drops too low (in the case of the traffic used in figure 5.3, although there were 3 million flows, the trace was only 10 minutes long resulting in quite strong edge effects).

2 This LD reconstruction from i.i.d. flow thinning also gives a nice semi-experimental result further justifying the flow independence of the BLPP packet traffic model.

Another important practical issue with packet thinning concerns the consistency of the flow definition before and after thinning. For example, in order to prevent the breakup of flows due to their sparsity after thinning, one should at least replace the timeout value $T_0$ with $T_0/q$. However this does not eliminate all problems and extra flows can still be created for some types of applications [131]. It is another advantage of flow thinning that problems of this type do not arise. The flow definition and timeout value adopted for the full traffic applies without change after sampling.

5.3.2 Flow level

In general, estimating the distribution of the number of packets per flow consists of an estimation of the densities $p_j^{(q)}$ for the thinned process, followed by an inversion phase. For flow thinning we have already seen that the inversion is trivial, and the $p_j^{(q)}$ can be estimated from a histogram. For packet thinning however even the first phase is potentially problematic, as a knowledge of $p_0^{(q)} > 0$ is needed. Since the proportion of discarded flows is not automatically observed as it is in flow thinning, the quantity $p_0^{(q)}$ cannot be estimated without extra information. The simplest solution is to supply the total number $N_F$ of flows with the measured sampled traffic, in the spirit of InMon's sFlow [91]. Another solution proposed by [131] was already mentioned in the introduction. Assume that each original TCP flow has only one packet with a SYN flag and that it is the first. Consider the set of such SYN packets. It is clear that the probability that a given SYN packet is retained is also $q$, and that this is therefore the probability that a flow has been retained. Let $N_F^{(q)}$ be the total number of observed flows, and $N_1$ the number of observed packets with a SYN flag. An estimate of the number of flows $N_F$ before thinning is $N_F = N_1/q$. Since $N_F^{(q)}/N_F$ estimates the fraction of flows that keep at least one packet, one can construct an estimate of $p_0^{(q)}$ via $p_0^{(q)} = 1 - N_F^{(q)}/N_F$.

To clearly evaluate the performance of the thinning inversion techniques in isolation from other issues such as those above, we assume in what follows that $p_0^{(q)}$ is known. In addition, we will first assume that we know the distribution of $P$ and can therefore evaluate

$p_k^{(q)}$ numerically from equation (5.4). For this purpose we use a simple discrete Pareto-like variable $H$ with distribution

$$F_H(k; a, \beta) = 1 - (ak + 1)^{-\beta} \sim 1 - L k^{-\beta}, \quad k = 1, 2, \cdots, \qquad (5.24)$$

where $a = L^{-1/\beta} > 0$ is a scale parameter; the distribution is therefore fully determined by the tail behaviour. The mean of $H$ is $\mathbb{E}[H] = a^{-\beta} \zeta(\beta, 1/a)$ for $\beta > 1$. The variance is infinite.

[Figure 5.3: plot of $\log_2 \mathrm{Var}(d_j)$ against $j = \log_2(a)$ for the curves: Original, Inferred from Flow Thinned with q=0.1, Inferred from Flow Thinned with q=0.01.]

Figure 5.3: Reconstruction of the (log wavelet) spectral density from flow thinning when the number of flows after thinning is constant (N=3000). The quality of the estimation remains roughly unchanged as q varies. The confidence intervals for q = 0.1 (omitted for clarity) are similar to those of q = 0.01.

Scheme 1: analytic continuation

We first consider inversion scheme 1 (using analytic continuation) in the case where $q > 0.5$ for which we can calculate $p_j$ from equation (5.15), which we repeat here:

$$p_j = \sum_{n=j}^{\infty} (-1)^{n-j}\, c(n, j, q) \qquad (5.25)$$

with

$$c(n, j, q) = \binom{n}{j} \frac{(1-q)^{n-j}}{q^n}\, p_n^{(q)}. \qquad (5.26)$$

There are three main issues with the numerical evaluation of this sum.

(i) First the evaluation of the coefficients $c(n, j, q)$ is not entirely straightforward since their magnitudes can become enormous due to the $q^{-n}$ factor. In fact, one can show, using equation (5.24), that for fixed $j$ and $q$ the function $c(n, j, q)$ is unimodal with a maximum $c_{\max}(j, q)$ reached for $n = n_{\max}(j, q)$, and decays exponentially fast to zero for large $n$. The functions $n_{\max}(j, q)$ and $c_{\max}(j, q)$ can be respectively approximated by

$$\tilde{n}_{\max}(j, q) = \frac{qj}{2q - 1} \qquad (5.27)$$

28). a simple dichotomy program will locate jmax very efficiently. Moreover. q) > M. q) the sequence c(n. j.28) where γ = β a−β q−1 (2q − 1)−(β +1) . q) = (2q − 1) − j cg + γ. j.114 CHAPTER 5. jmax = 22 is the smallest value for which cmax ( j. (5. q) are plotted in figure 5. Using typical double precision with 32 bits used for the mantissa of a floating point number (Matlab was used). q) ≤ ε. as q tends to 0. (5.4(a). the absolute error on the partial sum of the alternating series is bounded by the first neglected term. where n0 is the smallest integer larger than nmax ( j.5 from above.30) log 2q−1 An exact analytic expression of jmax can also be found from equation (5. This is a direct consequence of (i): the calculator lacks precision to accurately cancel out the very large coefficients appearing in the sum. (ii) Second. the estimation issue does not apply as exact values are used.29) Values of cmax ( j) and cg max ( j. j. The probability p j can therefore be evaluated with precision ε by summing the first n0 terms. assuming that all the c(n.28) yields the following approximation for jmax : & ' log(c max ) jmax = g 1 . (q) (iii) Finally. the convergence of the series is very fast. where M is the largest floating point number integer jmax such that cg that can be stored by the calculator. Taking the leading term in equation (5. decreasing and tends to zero. Given that the terms c(n. q) can be accurately evaluated. They show that the numerical evaluation of p j fails for j ≥ 22. q) such that c(n0 + 1. q) is positive. This can be done by finding the smallest max ( j. . in practice there are the additional errors from the need to estimate the pn . other computational problems due to the truncation 3 The Lambert W-function is defined as the inverse of the function f (W ) = W exp(W ). Given the exponential increase of cmax ( j). INVERTING SAMPLED TRAFFIC and −j β +1 max ( j. In practice. 
it is important to understand how many values can be accurately calculated before a loss of precision occurs. Since for n > nmax ( j. but it is rather cumbersome since it involves the Lambert W-function3 . but where nonetheless precision limitations create serious problems.4(b) where the truncation issue has been carefully addressed. q) decay exponentially for large n. j. the sum must be truncated. Numerical results for scheme 1 are presented in figure 5. (5. q) > 232 .

issue (ii) will occur since $n_{\max}$ will become very large.

[Figure 5.4, two panels. (a) plots $c_{\max}(j)$, the level $2^{32}$, and $\tilde{c}_{\max}(j, q)$ from (5.28) against j (number of packets per flow). (b) plots Pr(P=j) against j (number of packets per flow) for the curves: Theoretical original density, Flow thinning, Packet thinning: scheme 1, Packet thinning: scheme 2.]

Figure 5.4: Inversion of the $p_j$, light thinning: Numerical evaluation of the different inversion schemes for q = 0.6 using $F_P$ given by equation (5.24) with a = 1 and β = 1.5. (a) $c_{\max}(j, q)$. (b) Packet thinning inversion: Scheme 1: even with no estimation, the inversion becomes unstable as soon as $c_{\max}(j)$ cannot be accurately calculated. Scheme 2: Some improvement at high computational cost (L = M = 200 and $2^{15}$ discretization steps for the evaluation of the Cauchy integral). Flow thinning inversion and estimation: starting with $10^6$ flows, estimates are reliable, extend to much greater j, and degrade gracefully.

Finally, for $q < 0.5$, the recursion introduced in equation (5.16) makes these numerical problems dramatically worse. It is important to note that analytic continuation can be successfully applied in practice [28], and that there are ways of controlling the truncation error at each iteration [77]. This inversion scheme only fails in this particular case due to the form of the coefficients of the power series.
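The precision issue of scheme 1 can be reproduced with a short sketch. It builds the Pareto-like densities of equation (5.24), thins them with the binomial thinning relation (which we take to be what equation (5.4) expresses), and inverts with equations (5.25) and (5.26). Truncating the distribution to a finite support, the helper names (`pareto_like`, `thin_density`, `coeff`, `invert_density`) and the use of Python's exact `Fraction` type as a stand-in for unlimited precision are all assumptions of this example, not the procedure used for figure 5.4.

```python
from fractions import Fraction
from math import comb

def pareto_like(J, a=1.0, beta=1.5):
    """Densities of the law (5.24) truncated to 1..J and renormalised (assumption)."""
    tail = [(a * k + 1) ** -beta for k in range(J + 1)]          # Pr(H > k)
    p = [0.0] + [tail[k - 1] - tail[k] for k in range(1, J + 1)]
    total = sum(p)
    return [x / total for x in p]

def thin_density(p, q):
    """Densities after i.i.d. packet thinning: binomial mixing of the p_m."""
    J = len(p) - 1
    return [sum(comb(m, n) * q ** n * (1 - q) ** (m - n) * p[m]
                for m in range(n, J + 1))
            for n in range(J + 1)]

def coeff(n, j, q, pq):
    """Coefficient c(n, j, q) of equation (5.26)."""
    return comb(n, j) * (1 - q) ** (n - j) / q ** n * pq[n]

def invert_density(pq, q):
    """Alternating sum of equation (5.25), truncated at the support size."""
    J = len(pq) - 1
    return [sum((-1) ** (n - j) * coeff(n, j, q, pq) for n in range(j, J + 1))
            for j in range(J + 1)]

# Exact rational arithmetic: the inversion recovers the original densities.
q_exact = Fraction(3, 5)
p_small = [Fraction(x) for x in pareto_like(40)]
recovered_exact = invert_density(thin_density(p_small, q_exact), q_exact)

# Floating point: the coefficients that must cancel become very large.
p = pareto_like(200)
pq = thin_density(p, 0.6)
largest_c30 = max(coeff(n, 30, 0.6, pq) for n in range(30, 201))
recovered_float = invert_density(pq, 0.6)
worst_error = max(abs(r - x) for r, x in zip(recovered_float, p))
print(f"largest c(n, 30, 0.6): {largest_c30:.3g}, worst float error: {worst_error:.3g}")
```

With exact rationals the alternating sum cancels perfectly, whereas in double precision the rounding error of the largest terms is what limits the range of $j$ that can be recovered.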

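Scheme 2, evaluated next, rests on the Cauchy integral formula (5.17). When the generating function can be evaluated directly on the unit circle, discretizing the contour integral with equally spaced points amounts to a discrete Fourier transform. The sketch below is only meant to show this mechanism on a case where it works well: the geometric generating function, the number of points and the helper name `pgf_to_density` are assumptions of the example, and no Padé step is involved.

```python
import cmath

def pgf_to_density(G, num_terms, N=256):
    """Approximate p_m = (1 / 2 pi i) * contour integral of G(z) / z^(m+1).

    The contour is the unit circle, sampled at N equally spaced points, so the
    integral becomes a plain DFT of the samples of G. The discretization adds
    the aliasing terms p_(m+N) + p_(m+2N) + ... to each estimate.
    """
    samples = [G(cmath.exp(2j * cmath.pi * k / N)) for k in range(N)]
    density = []
    for m in range(num_terms):
        acc = sum(s * cmath.exp(-2j * cmath.pi * m * k / N)
                  for k, s in enumerate(samples))
        density.append((acc / N).real)
    return density

# Illustrative generating function: Pr(P = m) = (1 - r) * r^(m-1), m >= 1.
r = 0.5
geometric_pgf = lambda z: (1 - r) * z / (1 - r * z)
estimated = pgf_to_density(geometric_pgf, num_terms=12)
exact = [0.0] + [(1 - r) * r ** (m - 1) for m in range(1, 12)]
print(max(abs(e - x) for e, x in zip(estimated, exact)))
```

For a light tailed distribution such as this one the aliasing terms are negligible; for heavy tails they are not, which is why dedicated correction methods are cited in section 5.2.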
Scheme 2: Cauchy integral

The numerical evaluation of scheme 2 (Padé approximants followed by the Cauchy integral formula) takes us a little further, but at the price of a fairly intensive numerical evaluation. This is shown in figure 5.4. It was found that increasing the degree of the Padé approximants did not significantly improve the accuracy of the calculations.

In the general case where $q \in (0, 1]$, algorithms to recursively compute the coefficients $a_j^{(k)}$ based on the analytic continuation idea of scheme 1 can be found in [77] and [28]. However since the particular form of the coefficients (equation (5.13)) prevents a precise numerical evaluation of even the first step of the recursion, the method is not applicable here.

However in the useful range $q \ll 0.5$, the inversion method for packet thinning cannot be used in practice unless 'infinite precision' arithmetic is employed. This is unlikely to be computationally feasible in a router context.

As expected, flow based thinning avoids the above problems. In contrast to these packet thinning based inversion schemes, the 'inversion' from flow thinning, including the numerical estimation of the $p_n^{(q)}$, provided a low cost and reliable estimation of $p_j$, whose accuracy dropped gracefully as $j$ increased as seen in figure 5.4. The estimates of the $p_n^{(q)}$ were made according to the following formula, where $p_0^{(q)}$ was assumed known:

$$p_j^{(q)} = (1 - p_0^{(q)})\, o_j^{(q)} \quad \text{for } j \ge 1, \qquad (5.31)$$

where $o_1^{(q)}, o_2^{(q)}, \ldots$ are the normalized histogram estimates of the number of packets per flow after flow thinning.

Again, and just as for the spectrum, the quality of the estimation depends essentially on the number of flows $N$ after thinning, and not on the value of $q$. This is shown in figure 5.5 where the complementary cumulative distribution function (CCDF) of the number of packets per flow on an OC-48 link, and the estimated CCDFs for three different values of $q$, are plotted at constant $N = 3000$. The quality of the estimation of the CCDF is roughly independent of $q$. However, since the estimation of heavy tails is a notoriously difficult problem [11], one should make sure that $N$ will be large enough to allow a sufficiently precise estimation of the distribution tail.

In summary, despite the fact that both sampling types can theoretically be inverted, the numerical study carried out in this section reveals the following:

• The packet sampling technique leads to an excellent reconstruction of the spectrum and a fair estimate of the $p_j$ for $j$ up to the order of 50 for $q > 0.5$. When the thinning procedure removes more than half of the original packets, the quality of the spectrum estimate is poor and deteriorates

steadily as $q$ gets smaller and it becomes impossible to evaluate the $p_j$ even for small $j$ as soon as $q$ drops below 0.5 without extended precision arithmetic.

• The flow sampling technique gives a reasonable estimate of the spectrum and an excellent estimate of the $p_j$ for a large range of thinning probabilities.

In particular, for thinning probabilities used in practice, of the order of 1% or less, flow thinning is by far superior to packet thinning if one is interested in recovering detailed characteristics of the original traffic such as its spectrum or the distribution of flow size.

[Figure 5.5: plot of Pr(P>j) against j (number of packets per flow) for the curves: Empirical original CCDF, CCDF for q=0.1, CCDF for q=0.01, CCDF for q=0.001.]

Figure 5.5: Inversion of $p_j$, heavy thinning: Empirical CCDF of the number of packets per flow for the IPLS trace, and estimated CCDFs obtained from flow thinned traffic for $q \in \{0.1, 0.01, 0.001\}$ while the number of flows after thinning remains constant (N = 3000). The quality of the estimation remains unchanged despite the wide variation in q.

It is worth noting at this point that if a parametric family was chosen for $F_P$ one could try to estimate its parameters, a much easier task than trying to numerically recover each $p_j$. Since no family has been identified as valid for all Internet flows, we have not pursued this here.

We now compare our inversion technique to recover the distribution of flow size from packet sampled traffic with recently published results [51]. The method proposed in [51] gives a smooth estimate of the histogram of flow sizes for a given traffic sample by using an expectation maximization (EM) technique. This means that this method is fundamentally different from the direct inversion of equation (5.4) presented above for the following two reasons:

(i) First, we consider the observed flow sizes as i.i.d. copies of the random variable $P$

and aim to recover the distribution $F_P$. On the other hand, the aim of [51] is to recover the original frequencies of each observed flow size for a given data set.

(ii) Second, the EM method gives a smooth estimate of flow size frequencies whereas we aim to recover the exact flow size densities. In other words, the EM does not aim to recover the exact values of the $p_j$, but gives instead a histogram that is optimal for a specific metric. This means in particular that the EM can give wrong estimates for even the first values $p_1$ and $p_2$.

The EM method therefore gives results for $q$ one order of magnitude smaller than for the direct inversion, and the value $q = 0.5$ plays no theoretical role in this approach. This apparent discrepancy between the results of the two methods comes from the fact that the direct inversion is much more demanding than the EM method. Indeed the EM method averages through the values of $p_j$ whereas the direct inversion does not. As seen in [51, fig. 4], the EM method performs relatively well for 'large' values of $q$ such as $q = 0.1$, but the results are already less satisfactory for smaller $q$ such as $q = 0.01$. In particular, the EM algorithm fails to capture the heavy tailed distribution of the original traffic. We expect the results for practical values of $q$, such as $q = 0.001$, to be even worse.

5.4 The Bartlett-Lewis point process

In this section we apply both packet and flow thinning to the Bartlett-Lewis point process (BLPP) model presented in detail in chapter 4. We simply recall from equation (4.7) page 82 the form of its spectral density $\Gamma_X(\nu)$ since it is of particular interest here:

$$\Gamma_X(\nu) = \lambda_F \left( \frac{\mu_P}{\lambda_A}\, \Gamma_G(\nu) + S_G(\omega) + S_G(-\omega) \right), \qquad (5.32)$$

where $\Gamma_G(\nu)$ is the spectral density of the stationary renewal process with the same parameters as the finite flow renewal process, namely

$$\Gamma_G(\nu) = \lambda_A \left[ (1 - \Phi_A(\omega))^{-1} + (1 - \Phi_A(-\omega))^{-1} - 1 \right], \qquad (5.33)$$

and

$$S_G(\omega) = \frac{\Phi_A(\omega)}{(1 - \Phi_A(\omega))^2} \left( G_P(\Phi_A(\omega)) - 1 \right). \qquad (5.34)$$

As expected equation (5.32) is consistent with the general form for the spectral density of a PCP given in equation (5.19). We derive the thinning results of the BLPP in section 5.4.1, and investigate the viability of fitting the model via measurements on thinned data in section 5.4.2.
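The spectral density recalled in equations (5.32) to (5.34) is straightforward to evaluate numerically once $\Phi_A$ and $G_P$ are specified. The sketch below is a hedged illustration: the Gamma inter-arrival law, the geometric packet count, the sign convention used for $\Phi_A$ and the helper names `blpp_spectrum`, `gamma_cf` and `geometric_pgf` are assumptions made for the example, and the angular frequency $\omega$ is passed in directly without committing to a convention relating it to $\nu$.

```python
def blpp_spectrum(omega, lam_F, lam_A, mu_P, phi_A, G_P):
    """Evaluate equations (5.32)-(5.34) at a nonzero angular frequency omega."""
    def inverse_gap(w):
        return 1.0 / (1.0 - phi_A(w))

    def S_G(w):                                                  # equation (5.34)
        phi = phi_A(w)
        return phi / (1.0 - phi) ** 2 * (G_P(phi) - 1.0)

    gamma_G = lam_A * (inverse_gap(omega) + inverse_gap(-omega) - 1.0)   # (5.33)
    total = lam_F * (mu_P / lam_A * gamma_G + S_G(omega) + S_G(-omega))  # (5.32)
    return total.real   # the two conjugate contributions leave a real value

def gamma_cf(lam_A, shape):
    """Transform of a Gamma inter-arrival density with mean 1 / lam_A.

    The convention E[exp(-i w A)] is an assumption; the spectrum only uses
    the symmetric combination of +w and -w, so the sign does not matter.
    """
    return lambda w: (1.0 + 1j * w / (shape * lam_A)) ** (-shape)

def geometric_pgf(r):
    """Generating function of Pr(P = j) = (1 - r) r^(j-1), j >= 1; mean 1 / (1 - r)."""
    return lambda z: (1.0 - r) * z / (1.0 - r * z)

lam_F, lam_A, r = 20.0, 100.0, 0.8
phi = gamma_cf(lam_A, shape=2.0)
for omega in (0.1, 10.0, 1000.0):
    value = blpp_spectrum(omega, lam_F, lam_A, 1.0 / (1.0 - r), phi, geometric_pgf(r))
    print(omega, value)
```

Two sanity checks follow from the formulas themselves: with exactly one packet per flow ($G_P(z) = z$, $\mu_P = 1$) the expression collapses to the flat Poisson level $\lambda_F$, and at high frequency it tends to $\lambda_F \mu_P$. Under flow thinning, replacing `lam_F` by `q * lam_F` scales the whole spectrum by $q$, in line with equation (5.22).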

5.4.1 Thinning Bartlett-Lewis point processes

If the BLPP model is to be useful in practice, for example for the dimensioning of backbone links, one needs to be able to measure its parameters from data. It is therefore of interest to see if it is compatible with either or both of the thinning procedures. In this subsection we derive the properties of thinned BLPPs.

Let $X$ be a BLPP and $X^{(q)}$ the process resulting from its i.i.d. packet thinning with probability $q$.

Theorem 5.4.1. An i.i.d. packet thinned Bartlett-Lewis process $X^{(q)}$ is also a Bartlett-Lewis process with parameters:

• flow rate: $\lambda_F^{(q)} = \lambda_F (1 - p_0^{(q)})$,

• density of $P^{(q)}$: $x_j^{(q)} = \dfrac{p_j^{(q)}}{1 - p_0^{(q)}}$, $j > 0$, and $x_0^{(q)} = 0$,

• density of in-flow packet inter-arrivals: $f^{(q)}(x) = \mathcal{L}^{-1}\!\left[ \dfrac{q \tilde{f}(s)}{1 - (1 - q) \tilde{f}(s)} \right]$.

Proof. From section 2.3.5 page 31 we know that if $X$ is a renewal process with inter-arrival density $f(x)$, then the i.i.d. thinned process $X^{(q)}$ is another renewal process, with inter-arrival density $f_q(x)$ whose Laplace transform $\tilde{f}_q(s)$ reads

$$\tilde{f}_q(s) = \frac{q \tilde{f}(s)}{1 - (1 - q) \tilde{f}(s)}. \qquad (5.35)$$

It follows that each finite ordinary renewal process that constitutes a flow of $X$ will become another ordinary renewal process with the inter-arrival density above provided it has at least 2 points. The thinned flows are clearly i.i.d., with marginal distribution $F_P^{(q)}$ given by equation (5.4). Since $p_0^{(q)} = \sum_{j=1}^{\infty} (1 - q)^j p_j > 0$, in this picture $\lambda_F$ is unchanged but some flows may be empty. To conform to a convention where a BLPP has zero probability of an empty flow, we must renormalise the $p_j^{(q)}$ from equation (5.4) to obtain a $F_P^{(q)}$ with densities $x_j^{(q)} = p_j^{(q)} / (1 - p_0^{(q)})$, $j > 0$, and $x_0^{(q)} = 0$. The average flow arrival rate is then reduced to $\lambda_F^{(q)} = \lambda_F (1 - p_0^{(q)})$.

The remaining property of $X^{(q)}$ to specify is the arrival process of the non-empty thinned flows. We now show that this is in fact a Poisson process with rate $\lambda_F^{(q)}$. Since the flow evaporation probability $p_0^{(q)}$ acts independently on flows, by theorem 2.3.4 on Poisson splitting the original flow starting points (which may have themselves been thinned) of flows which do not evaporate form a Poisson process $O$ of rate $\lambda_F^{(q)}$. Consider such a flow which has survived thinning. There exists a random variable $T \ge 0$ giving the time interval between the

original starting point and the first non-thinned point after thinning in that flow. As this can be viewed as an i.i.d. translation by $T$ of the points of $O$, which by a well known theorem [46] is another Poisson process of the same rate, the result follows.

This pleasant closure property of a BLPP, which is worth mentioning in its own right, also helps to make the inversion of its parameters analytically tractable.

Theorem 5.4.2. An i.i.d. flow thinned Bartlett-Lewis process $X^{(q)}$ is also a Bartlett-Lewis process with flow rate $\lambda_F^{(q)} = q\lambda_F$, $x_j^{(q)} = p_j$, and $f^{(q)}(x) = f(x)$.

Proof. The result follows from the discussion at the end of section 5.2.2.

We see that the BLPP model has almost ideal theoretical properties with respect to the interpretation of thinned forms of itself, and the parameter inversion problem. In the next section we briefly consider the practical side of the question.

5.4.2 Fitting from thinned data

With respect to i.i.d. packet thinning, despite the attractive theoretical properties described above, most of theorem 5.4.1 cannot be exploited in practice if $q < 0.5$. The reasons are the same as those stated in section 5.3.2 concerning the recovery of the $p_j$ from the $x_j^{(q)}$, with similar limitations due, ultimately, to the very small values of all the $x_j^{(q)}$ except at $j = 1$ if $q$ is small. Moreover, even if one could numerically evaluate these, there would be another inversion problem to recover the in-flow packet inter-arrival density from its Laplace transform. For completeness, we note that the relevant inversion techniques are also based on Cauchy's integral formula and are similar to the one presented in section 5.2.1. They can be implemented using the Fast Fourier Transform [1].

We turn then to fitting from i.i.d. flow thinned data, where the simple inversion of theorem 5.4.2 presents no difficulties. One merely fits the model on the thinned data as one would normally, and then scales up the value of $\lambda_F$. Figure 5.6 illustrates the procedure for $q = 0.001$. The remarkable thing about this approach is that we do not need to explicitly invert the more complex in-flow or even flow level statistics. Figure 5.6 shows that the results can be good even for $q = 0.001$, and as before, it is to an excellent approximation only the total number of flows which determines the size of the confidence intervals, not $q$.

5.5 How to sample traffic?

Both the IETF working groups IPFIX (Internet Protocol Flow Information Export) [92] and PSAMP (Packet Sampling) [138] advocate the use of packet sampling. However, the

results presented in this chapter indicate that in certain circumstances flow sampling is a much more efficient option. In this section we summarize our main findings and give some indication on how and when each sampling technique should be used.

[Figure 5.6: plot of $\log_2 \mathrm{Var}(d_j)$ against $j = \log_2(a)$ for the curves: Original, BLPP matched to Original, Flow Thinned, BLPP matched to Flow Thinned, BLPP reconstructed from Thinned.]

Figure 5.6: BLPP parameter fitting from flow thinned traffic. AUCK-d1 data, thinned with q = 0.001, is matched to the BLPP, the theoretical spectrum calculated, and then inverted by simply shifting it vertically. The same fitting procedure applied to the full traffic is also shown. The inversion compares well with the original data showing that the model can be successfully fitted from thinned data.

5.5.1 Packet sampling

Implementation

Packet sampling is widely implemented in today's routers, where the sampling method usually keeps one out of $N$ sequential packets. Some line card engines can also sample consecutive packets. These sampling techniques are not equivalent to the mathematically friendly i.i.d. packet thinning presented in section 5.2.1, but have the advantage of requiring very little computation.

Usage

Packet sampling is very useful when one wants to get traffic information at a higher resolution than that obtained from the Simple Network Management Protocol (SNMP) coarse-grained counters or active probe data. For instance, packet sampling will perform very well to estimate the average packet rate [131]. Packet sampling is also well suited for collecting other basic statistics, such as source and destination IP addresses, route prefixes and autonomous system numbers. However we showed in section 5.3 that its performance when recovering higher level statistics, such as the spectral density of the packet arrival process or distribution of flow length, is very poor.
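The difference between the deterministic one-out-of-$N$ selection found in routers and the i.i.d. packet thinning analysed in this chapter can be made concrete with a small sketch. The packet representation (pairs of flow identifier and sequence number), the helper names and the toy traffic are assumptions of this example only.

```python
import random

def periodic_sample(packets, N):
    """Keep every N-th packet of the stream (deterministic 1-in-N sampling)."""
    return packets[N - 1::N]

def iid_thin(packets, q, rng):
    """Keep each packet independently with probability q (i.i.d. thinning)."""
    return [pkt for pkt in packets if rng.random() < q]

def flows_seen(sample):
    """Number of distinct flows that still have at least one packet."""
    return len({flow_id for flow_id, _ in sample})

# Toy stream: 200 flows of 5 packets each, interleaved at random.
rng = random.Random(7)
packets = [(flow_id, seq) for flow_id in range(200) for seq in range(5)]
rng.shuffle(packets)

periodic = periodic_sample(packets, N=10)
thinned = iid_thin(packets, q=0.1, rng=rng)
print(len(periodic), flows_seen(periodic))
print(len(thinned), flows_seen(thinned))
```

Both samplers keep about one packet in ten, but the periodic one always keeps exactly `len(packets) // N` packets, while the number kept by i.i.d. thinning is random. In both cases many flows lose all their packets, which is the flow evaporation effect that makes the inversion at the flow level difficult.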

5.5.2 Flow sampling

Implementation

Today's routers do not implement any flow sampling strategy. This is because flow sampling does require more overhead than packet sampling. In fact, all packets have to be grouped into flows before they can be processed or discarded. This involves more computation and more memory if one uses the traditional hash table approach with one entry per flow. However this might not be such a drawback if new flow classification techniques, such as bitmap algorithms [57] or Bloom filters [101], can be applied instead. In other ways, flow thinning has ideal scaling with respect to sampling rate. The total number of packets stored, for a given q, is essentially the same for both packet and flow sampling. Flow sampling also has the combined advantage of avoiding the problem of flow splitting and having a precision that depends only on the number of remaining flows after thinning, not on the thinning probability q.

Usage

The fundamental point of this chapter is that when one is interested in recovering more detailed information about packet traffic beyond first order statistics, such as the distribution of number of packets per flow, then packet sampling can be fundamentally unsuitable, whereas flow sampling can perform very well. For instance, knowing the flow length is very valuable when deploying web proxies [63], setting up connection thresholds in flow-switched networks [59], or characterizing certain network attacks [51]. For all these applications, flow sampling should be the preferred sampling method whenever possible. The higher cost of implementing flow sampling has to be compared with the very high cost or near impossibility of recovering certain statistics from packet sampling.

5.6 Conclusion

We have explored in detail the question of recovering, from sampled data, the spectrum of the packet arrival process and the distribution of the number of packets per flow. Two kinds of sampling were used, i.i.d. packet sampling and i.i.d. flow sampling, with a given probability q of retaining a packet or flow respectively. In each case, exact theoretical inversion techniques were derived. However, in the case of packet thinning, we showed how the inversion methods were of little to no use in practice for q small enough to be truly useful, such as q = 0.01 or smaller, and become much worse as q becomes smaller still. An exception to this is the asymptotic tail which can be recovered by a different technique

(although in practice it remains a very difficult problem). In sharp contrast, as flow thinning preserves flows intact but simply reduces their number, it avoids these problems entirely and the inversion is trivial. The performance of inversion methods based on flow thinning does not deteriorate with q but depends essentially on the number of retained flows, which could be set in practice depending on memory and computational limitations.

Although this chapter concludes that packet sampling is not appropriate to recover the flow length distribution in practice, we have not proven that different inversion techniques could not lead to better results. One path worth exploring would be to incorporate constraints such as a heavy tail flow length distribution in the inversion step. This could be considered in the maximum likelihood context, as in [51], or in analytic continuation approaches [49].

We also investigated the fitting of a useful type of cluster model describing packet arrivals. It was shown that the model class is closed under both kinds of thinning and that exact inversion is theoretically possible. In practice however, inversion based on packet thinned traffic is again not feasible for realistic values of q, whereas inversion based on fitting from flow based thinning performs well. However, the inversion step does assume flow independence, and so cannot capture all aspects of the traffic, whereas packet thinning based methods can, provided q is large enough. For backbone links, where there is strong evidence that dependence between flows is weak, this may not be important.
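The contrast between the two thinning schemes summarised in this conclusion can be seen on a toy example. The sketch below is an illustration under simple assumptions (flows are represented only by their packet counts, and the helper names are hypothetical): it applies i.i.d. packet thinning and i.i.d. flow thinning to the same set of flows and prints the resulting flow size distributions.

```python
import random
from collections import Counter


def packet_thinning(flow_sizes, q, rng):
    """Keep each packet independently with probability q.

    Flows are shortened, and flows with no surviving packet disappear.
    """
    thinned = []
    for size in flow_sizes:
        kept = sum(1 for _ in range(size) if rng.random() < q)
        if kept > 0:
            thinned.append(kept)
    return thinned


def flow_thinning(flow_sizes, q, rng):
    """Keep each flow, intact, independently with probability q."""
    return [size for size in flow_sizes if rng.random() < q]


def size_distribution(flow_sizes):
    """Empirical distribution of the number of packets per flow."""
    counts = Counter(flow_sizes)
    total = len(flow_sizes)
    return {size: count / total for size, count in sorted(counts.items())}


if __name__ == "__main__":
    rng = random.Random(1)
    flows = [1] * 500 + [10] * 300 + [100] * 200   # hypothetical flow sizes
    q = 0.01
    print("original:      ", size_distribution(flows))
    print("flow thinned:  ", size_distribution(flow_thinning(flows, q, rng)))
    print("packet thinned:", size_distribution(packet_thinning(flows, q, rng)))
```

After flow thinning every retained flow keeps its true size, so the empirical distribution is a noisy but unbiased version of the original. After packet thinning the observed sizes no longer match the original ones, which is why an inversion step is needed at all.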


Chapter 6

Bridging router performance and queuing theory

6.1 Introduction

After having investigated questions (i) and (ii) in the previous chapters, we now examine 'through-router' delays and investigate the detailed mechanisms of a router to answer question (iii).

End-to-end packet delay is an important metric to measure in networks, both from the network operator and application performance points of view. An important component of this delay is the time for packets to traverse the different forwarding elements along the path. This is particularly important for network providers, who may have Service Level Agreements (SLAs) specifying allowable values of delay statistics across the domains they control. A fundamental building block of the path delay experienced by packets in IP networks is the delay incurred when passing through a single IP router.

Although there have been many studies examining delay statistics measured at the edges of the network, very few have been able to report with any degree of authority on what actually occurs at switching elements. In [141] an analysis of single hop delay on an IP backbone network was presented, and different delay components were isolated. However, since the measurements were limited to a subset of the router interfaces, only samples of the delays experienced by packets, on some links, were identified. In [177] single hop delays were also obtained for a router. However, since the router only had one input and one output link, which were of the same speed, the internal queueing was extremely limited. This is not a typical operating scenario, and in particular it led to the through-router delays being extremely low.

In this chapter we work from a data set recording all IP packets traversing a Tier-1 access router over a 13 hour period. All input and output links (with one negligible exception) were monitored,

allowing a complete picture of through-router delays to be obtained.

The first aim of this chapter is to exploit the unique certainty provided by the data set by reporting in detail on the actual magnitudes, and temporal structure, of delays on a subset of links which experienced significant congestion: mean utilisation levels on the target output link ranged from ρ = 0.3 to ρ = 0.7. High utilisation scenarios with significant delays are of the most interest, and yet are rare in today's backbone IP networks. From a measurement point of view, this chapter provides the most comprehensive picture of end-to-end router delay performance that we are aware of. We base all our analysis on empirical results and do not make any assumptions on traffic statistics or router functionalities.

Our second aim is to use the completeness of the data as a tool to investigate how packet delays occur inside the router, in other words to provide a physical model of the router delay performance. For this purpose we first position ourselves in the context of the popular store & forward router architectures with Virtual Output Queues (VOQs) at the input links [124]. We are able to confirm in a detailed way the prevailing assumption that the bottleneck of such an architecture is in the output queues, and justify the commonly used fluid output queue model for the router. We go further to provide two refinements to the simple queue idea which lead to a model with excellent accuracy, close to the limits of timestamping precision. We explain why the model should be robust to many details of the architecture. The model focuses on datapath functions, performed at the hardware level for every IP datagram. It only imperfectly takes account of the much rarer control functions, performed in software on a very small subset of packets.

The third contribution of the chapter is to combine the insights from the data, and simplifications from the model, to address the question of how delay statistics can be most effectively summarised and reported. Currently, the existing Simple Network Management Protocol (SNMP) focuses on reporting utilisation statistics rather than delay. Although it is possible to gain insight into the duration and amplitude of congestion episodes through a multi-scale approach to utilisation reporting [140], the connection between the two is complex and strongly dependent on the structure of traffic arriving to the router. We explain why trying to infer delay from utilisation is in fact fundamentally flawed, and propose a new approach based on direct reporting of queue level statistics. This is practically feasible as buffer levels are already made available to active queue management schemes implemented in modern routers (note however that active management was switched off in the router under study). We propose a computationally feasible way of recording the structure of congestion episodes, and reporting them back via SNMP. The statistics we select are rich enough to allow detailed metrics of congestion behaviour to be estimated with reasonable

accuracy. A key advantage is that a generically rich description is reported, without the need for any traffic assumptions.

The chapter is organized as follows. The router measurements are presented in section 6.2, and analyzed in section 6.3, where the methodology and sources of error are described in detail. In section 6.4 we construct and justify the router model, measure its accuracy and discuss the nature of residual errors. In section 6.5 we define congestion episodes and show how important details of their structure can be captured in a simple way. We then describe how to report the statistics with low bandwidth requirements, and illustrate how such measurements can be exploited.

6.2 Full router monitoring

In this section we describe the hardware involved in the passive measurements, present our experiment setup to monitor a full router, and detail how packets from different traces are matched.

6.2.1 Hardware considerations

We first give the most pertinent features of the architecture of the router we monitor, and then recall relevant physical considerations of the SONET link layer, before describing our passive measurement infrastructure.

Router architecture

As mentioned in the introduction, our router is of the store & forward type, and implements Virtual Output Queues (VOQ). Details of such an architecture can be found in [124]. The router is essentially composed of a switching fabric controlled by a centralized scheduler, and interfaces or linecards. Each linecard controls two links: one input and one output.

A typical datapath followed by a packet crossing the router is as follows. When a packet arrives at the input link of a linecard, its destination address is looked up in the forwarding table. This does not occur however until the packet completely leaves the input link and fully arrives in the linecard's memory, i.e. the 'store' part of store & forward. Virtual Output Queuing means that each input interface has a separate First In First Out (FIFO) queue dedicated to each output interface. The packet is stored in the appropriate queue of the input interface where it is decomposed into fixed length cells. When the packet reaches the head of line it is transmitted through the switching fabric cell by cell (possibly interleaved with competing cells from VOQs at other input interfaces dedicated to the same output interface) to its output interface, and reassembled before being handed to the output link scheduler,

i.e. the 'forward' part of store & forward. The packet might then experience queuing before being serialised without interruption onto the output link. In queuing terminology it is 'served' at a rate equal to the bandwidth of the output link, and the output process is of fluid type because the packet flows out gradually instead of leaving in an instant.

In the above description the packet might be queued both at the input interface and the output link scheduler. However in practice the switch fabric is overprovisioned and therefore very little queueing should be expected at the input queues.

Layer overheads

Each interface on the router uses the High Level Data Link Control (HDLC) protocol as a transport layer to carry IP datagrams over a Synchronous Optical NETwork (SONET) physical layer. Packet over SONET (PoS) is a popular choice to carry IP packets in high speed networks because it provides a more efficient link layer than IP over ATM, and faster failure detection than broadcast technologies.

We now detail the calculation of the bandwidth available to IP datagrams encapsulated with HDLC over SONET. The first level of encapsulation is the SONET framing mechanism. A basic SONET OC-1 frame contains 810 bytes and is repeated with an 8 kHz frequency. This yields a nominal bandwidth of 51.84 Mbps. Since each SONET frame is divided into a transport overhead of 27 bytes, a path overhead of 3 bytes and an effective payload of 780 bytes, the bandwidth accessible to the transport protocol, also called the IP bandwidth, is in fact 49.92 Mbps. OC-n bandwidth (with n ∈ {3, 12, 48, 192}) is achieved by merging n basic frames into a single larger frame, and sending it at the same 8 kHz rate. In this case the IP bandwidth is (49.92 ∗ n) Mbps. For instance the IP bandwidth of an OC-3 link is exactly 149.76 Mbps.

The second level of encapsulation is the HDLC transport layer. This protocol adds 5 bytes before and 4 bytes after each IP datagram, irrespective of the SONET interface speed [163].

These layer overheads mean that in terms of queuing behaviour, an IP datagram of size b bytes carried over an OC-3 link should be considered as a b + 9 byte packet transmitted at 149.76 Mbps. The importance of these seemingly technical points will be demonstrated in section 6.4.

Timestamping of PoS packets

As already mentioned in chapter 3, all measurements presented in this thesis are made using high performance passive monitoring 'DAG' cards [44]. We use DAG 3.2 cards to monitor OC-3c and OC-12c links, and DAG 4.11 cards to monitor OC-48 links on the router.
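The layer overhead arithmetic described in this section can be checked with a short sketch. It only restates the figures given in the text (780 payload bytes per OC-1 frame, 8000 frames per second, 9 bytes of HDLC overhead); the constant and function names are assumptions introduced for illustration.

```python
SONET_FRAME_RATE_HZ = 8000   # frames per second, identical for every OC-n
OC1_PAYLOAD_BYTES = 780      # 810 bytes minus 27 (transport) and 3 (path) overhead
HDLC_OVERHEAD_BYTES = 9      # 5 bytes before and 4 bytes after each IP datagram


def ip_bandwidth_bps(n):
    """Bandwidth available above SONET on an OC-n link, in bits per second."""
    return n * OC1_PAYLOAD_BYTES * 8 * SONET_FRAME_RATE_HZ


def service_time_us(ip_bytes, n):
    """Time to serialise an IP datagram plus HDLC overhead on an OC-n link, in microseconds."""
    return (ip_bytes + HDLC_OVERHEAD_BYTES) * 8 / ip_bandwidth_bps(n) * 1e6


if __name__ == "__main__":
    for n in (1, 3, 12, 48):
        print(f"OC-{n}: {ip_bandwidth_bps(n) / 1e6:.2f} Mbps")
    print(f"1500 byte datagram on OC-3: {service_time_us(1500, 3):.3f} us")
```

For a 1500 byte datagram on an OC-3 link this gives roughly 80.6 µs, which is large compared with the microsecond scale timestamping errors discussed next.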

Table 6.1: Trace details: Each was collected on Aug. 14 2003, between 03:30 and 16:30 UTC. For every link with measurable traffic, matched packets account for between 99.61% and 99.98% of the total traffic.

Link       # packets     Average rate (Mbps)
BB1-in     817883374     83
BB1-out    808319378     53
BB2-in     1143729157    80
BB2-out    882107803     69
C1-in      133293630     15
C1-out     103211197     3
C2-in      1479788404    70
C2-out     735717147     77
C3-in      16263         0.003
C3-out     382732458     64
C4-in      342414216     36
C4-out     480635952     20

The cards use different technologies to timestamp PoS packets. DAG 3.2 cards are based on a design dedicated to ATM measurement and therefore operate with 53 byte chunks corresponding to the length of an ATM cell. The PoS timestamping functionality was added at a later stage without altering the original 53 byte processing scheme. However, since PoS frames are not aligned with the 53 byte divisions of the PoS stream operated by the DAG card, significant timestamping errors occur. In fact, a timestamp is generated when a new SONET frame is detected within a 53 byte chunk. This mechanism can cause errors of up to 2.2µs on an OC-3 link [48].

DAG 4.11 cards are dedicated to PoS measurement and do not suffer from the above limitations. They look past the PoS encapsulation (in this case HDLC) to consistently timestamp each IP datagram after the first (32 bit) word has arrived.

As a direct consequence of the characteristics of the measurement cards, timestamps on OC-3 links have a worst case precision of 2.2µs. Adding errors due to potential GPS synchronization problems between different DAG cards leads to a worst case error of 6µs [67]. This number should be kept in mind when we assess our router model performance.

6.2.2 Experimental setup

The data analyzed in this chapter was collected in August 2003 at a gateway router of the Sprint IP backbone network, and constitutes the second set of empirical data used in this thesis, in complement of the traces presented in table 3.1. Six interfaces of the router were monitored, accounting for more than 99.95% of all traffic flowing through it.

Figure 6.1: Experimental setup: gateway router with 12 synchronized DAG cards. [Diagram: a monitor on the input and on the output link of each linecard (BB1 and BB2 on OC-48, C1, C2 and C3 on OC-3, C4 on OC-12), all driven by a common GPS clock signal.]

Two of the interfaces are OC-48 linecards connecting to two backbone routers (BB1 and BB2), while the other four connect customer links: two trans-pacific OC-3 linecards to Asia (C2 and C3), one OC-3 (C1) and one OC-12 (C4) linecard to domestic customers. A small link carrying less than 5 packets per second was not monitored for technical reasons.

The experimental setup is illustrated in figure 6.1. Each DAG card is synchronized with the same GPS signal and outputs a fixed length 64 byte record for each packet on the monitored link. The details of the record depend on the link type (ATM, SONET or Ethernet). In our case all the IP packets are PoS packets, and each 64 byte record consists of 8 bytes for the timestamp, 12 bytes for control and PoS headers, 20 bytes for the IP header and the first 24 bytes of the IP payload. We captured 13 hours of mutually synchronized traces, representing more than 7.3 billion IP packets or 3 Tera Bytes of traffic. The DAG cards are located physically close enough to the router so that the time taken by packets to go between them can be neglected.

6.2.3 Packet matching

The next step after the trace collection is the packet matching procedure. It consists in identifying, across all the traces, the records corresponding to the same packet appearing at different interfaces at different times. In our case the records all relate to a single router, but the packet matching program can also accommodate multi-hop situations. We describe

below the matching procedure, and illustrate it in the specific case of the customer link C2-out. Our methodology follows [141].

We match identical packets coming in and out of the router by using a hash table. The hash function is based on the CRC algorithm and uses the IP source and destination addresses, the IP header identification number, and in most cases the full 24 byte IP header data part. In fact when a packet size is less than 44 bytes, the DAG card uses a padding technique to extend the record length to 64 bytes. Since different models of DAG cards use different padding content, the padded bytes are not included in the hash function. Our matching algorithm uses a sliding window over all the synchronized traces in parallel to match packets hashing to the same key. When two packets from two different links are matched, a record of the input and output timestamps as well as the 44 byte PoS payload is produced. Sometimes two packets from the same link hash to the same key because they are identical: these packets are duplicate packets generated by the physical layer [146]. They can create ambiguities in the matching process and are therefore discarded, however their frequency is monitored.

Matching packets is computationally intensive and demanding in terms of storage: the total size of the result files rivals that of the raw data. (The packet matching code was designed and written by Konstantina Papagiannaki, Gianluca Iannacone and Tao Ye from Sprint Advanced Technology Laboratories.) For each output link of the router, the packet matching program creates one file of matched packets per contributing input link. For instance, for output link C2-out four files are created, corresponding to the packets coming respectively from BB1-in, BB2-in, C1-in and C4-in (the input link C3-in has virtually no traffic and is discarded by the matching algorithm).

All the packets on a link for which no match could be found were carefully analyzed. Apart from duplicate packets, unmatched packets comprise packets going to or coming from the small unmonitored link, or with source or destination at the router interfaces themselves. There could also be unmatched packets due to packet drops at the router. Since the router did not drop a single packet over the 13 hours, no such packets were found.

Assume that the matching algorithm has determined that the mth packet of output link Λj corresponds to the nth packet of input link λi. This can be formalized by a matching function M, obeying

$$M(\Lambda_j, m) = (\lambda_i, n). \qquad (6.1)$$

The matching procedure effectively defines this function for all packets over all output links. Packets that cannot be matched are not considered part of the domain of definition of M.
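The matching procedure can be sketched in a few lines. This is a simplified illustration and not the program used to produce the results of this chapter: the `Record` layout, the function names and the choice of `zlib.crc32` as the CRC are assumptions, whole traces are held in memory instead of being processed through a sliding window, and indices are 0-based.

```python
import zlib
from collections import Counter, namedtuple

# Hypothetical trace record: timestamp plus the fields that enter the hash.
Record = namedtuple("Record", "timestamp src dst ip_id payload")


def record_key(rec):
    """CRC-based key over addresses, IP identification number and payload bytes."""
    data = f"{rec.src}|{rec.dst}|{rec.ip_id}|".encode() + rec.payload
    return zlib.crc32(data)


def unique_by_key(trace):
    """Index a trace by key, discarding records whose key occurs more than once
    on the same link (duplicates, or in this sketch rare CRC collisions)."""
    counts = Counter(record_key(r) for r in trace)
    return {record_key(r): (idx, r) for idx, r in enumerate(trace)
            if counts[record_key(r)] == 1}


def match_packets(input_traces, output_trace):
    """Return {m: (input link name, n)}, i.e. the matching function M for one output link."""
    inputs = {name: unique_by_key(trace) for name, trace in input_traces.items()}
    matches = {}
    for key, (m, out_rec) in unique_by_key(output_trace).items():
        for name, table in inputs.items():
            if key not in table:
                continue
            n, in_rec = table[key]
            # Same content, and the packet must enter before it leaves.
            if in_rec[1:] == out_rec[1:] and in_rec.timestamp <= out_rec.timestamp:
                matches[m] = (name, n)
                break
    return matches
```

Packets that appear only on the output link, such as router generated packets, simply do not appear in the returned dictionary, which mirrors the statement that unmatched packets lie outside the domain of M.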

Table 6.2: Breakdown of packet matching for output link C2-out.

Set    Link    # Matched packets    % traffic on C2-out
C4     in      215987               0.03%
C1     in      70376                0.01%
BB1    in      345796622            47.00%
BB2    in      389153772            52.89%
C2     out     735236757            99.93%

Table 6.1 summarizes the results of the matching procedure. The percentage of matched packets is at least 99.6% on each link, and as high as 99.98%, showing convincingly that almost all packets are matched. In fact, even if there were no duplicate packets and if absolutely all packets were monitored, 100% could not be attained because of router generated packets, which represent roughly 0.01% of all traffic.

The packet matching results for the customer link C2-out are detailed in table 6.2. For this link, 99.93% of the packets can be successfully traced back to packets entering the router. In fact, C2-out receives most of its packets from the two OC-48 backbone links BB1-in and BB2-in. This is illustrated in figure 6.2 where the utilization of C2-out across the full 13 hours is plotted. The breakdown of traffic according to packet origin shows that the contributions of the two incoming backbone links are roughly similar. This is the result of the Equal Cost Multi Path [86] policy deployed in the network when packets may follow more than one path to the same destination. While the utilization in Mbps in figure 6.2(a) gives an idea of how congested the link might be, the utilization in packets per second is important from a packet tracking perspective. Since the matching procedure is a per packet mechanism, figure 6.2(b) illustrates the fact that roughly all packets are matched: the sum of the input traffic is almost indistinguishable from the output packet count.

In the remainder of the chapter we focus on link C2-out because it is the most highly utilized link, and is fed by two higher capacity links. It is therefore the best candidate for observing queuing behaviour within the router.

6.3 Preliminary delay analysis

In this section we analyze the data obtained from the packet matching procedure. We start by carefully defining the system under study, and then present the statistics of the delays experienced by packets crossing it. The point of view is that of looking from the outside of the router, seen largely as a 'black box', and we concentrate on simple statistics. In the next section we begin to look inside the router, and examine delays in greater detail.

Figure 6.2: Utilization for link C2-out in (a): Megabit per second (Mbps) and (b): kilo packet per second (kpps). [Plots: link utilization against time of day (HH:MM UTC); curves: Total output C2-out, input BB1-in to C2-out, input BB2-in to C2-out, Total input.]

6.3.1 System definition

Recall the notation from equation (6.1): the mth packet of output link Λj corresponds to the nth packet of input link λi. The DAG timestamps an IP packet on the incoming interface side as t(λi, n), and later on the outgoing interface at time t(Λj, m). As the DAG cards are physically close to the router, one might think to define the through-router delay as t(Λj, m) − t(λi, n). However, this would amount to defining the router 'system' in a somewhat arbitrary way, because, as we showed in section 6.2.1, packets are timestamped differently depending on the measurement hardware involved. Furthermore there are several other disadvantages to such a definition, leading us to suggest the following alternative.

For self-consistency and extensibility to a multi-hop scenario, where we would like individual router delays to add, arrival and departure times of a packet should be measured

consistently using the same bit. It is natural to focus on the end of the (IP) packet for two reasons: (1) as a store & forward router, the output queue is the most important component to describe. It is therefore appropriate to consider that the packet has left the router when it completes its service at the output queue, that is when it has completely exited the router. (2) Again as a store and forward router, no action (for example the forwarding decision) is performed until the packet has fully entered the router. Thus the input buffer can be considered as part of the input link, and not part of the system.

The arrival and departure instants in fact define the 'system', which is the part of the router which we study, and is not exactly the same as the physical router as it excises the input buffer. This buffer is the place where the packets are stored when they reach the input linecard. This is a component which is already understood since its service rate is the same as the bandwidth of the incoming link. Therefore it does not have to be modelled or measured. Defining the system in this way can be compared with choosing the most practical coordinate system to solve a given problem.

We now establish the precise relationships between the DAG timestamps defined earlier and the time instants τ(λi, n) of arrival and τ(Λj, m) of departure of a given packet to the system as just defined. Denote by ln = Lm the size of the packet in bytes when indexed on links λi and Λj respectively, and let θi and Θj be the corresponding link bandwidths in bits per second. We denote by H the function giving the depth of bytes into the IP packet where the DAG timestamps it. With this convention, H is a function of the link speed, but not the link direction. For a given link λi, H is defined as

$$H(\lambda_i) = \begin{cases} 4 & \text{if } \lambda_i \text{ is an OC-48 link,} \\ b & \text{if } \lambda_i \text{ is an OC-3 or OC-12 link,} \end{cases}$$

where we take b to be a uniformly distributed integer between 0 and min(ln, 53) to account for the ATM based discretisation described earlier. We can now derive the desired system arrival and departure event times as:

$$\tau(\lambda_i, n) = t(\lambda_i, n) + 8\,(l_n - H(\lambda_i))/\theta_i, \qquad \tau(\Lambda_j, m) = t(\Lambda_j, m) + 8\,(L_m - H(\Lambda_j))/\Theta_j. \qquad (6.2)$$

These definitions are displayed schematically in figure 6.3. The snapshots are: (a): the packet is timestamped by the DAG card monitoring the input interface at time t(λi, n), at which point it has already entered the router, but not yet the system, (b): it has finished entering the router (arrives at the system) at time τ(λi, n), and (c): is timestamped by the

DAG at the output interface at time t(Λj, m). Finally (d): it fully exits the router (and system) at time τ(Λj, m).

Figure 6.3: Four snapshots of a packet crossing the router. [Diagram: time lines for input link λi (bandwidth θi) and output link Λj (bandwidth Θj) at the instants (a) t(λi, n), (b) τ(λi, n), (c) t(Λj, m) and (d) τ(Λj, m).]

With the above notations, the through-system delay experienced by packet m on link Λj is defined as

$$d_{\lambda_i,\Lambda_j}(m) = \tau(\Lambda_j, m) - \tau(\lambda_i, n). \qquad (6.3)$$

To simplify notations we shorten this to d(m) in what follows.

6.3.2 Delay statistics

A thorough analysis of single hop delays was presented in [141]. Here we follow a similar methodology and obtain comparable results, but with the added certainty gained from not needing to address the sampling issues caused by unobservable packets on the input side.
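The conversion from DAG timestamps to system arrival and departure times in equation (6.2), and the delay of equation (6.3), can be sketched as follows. The function names and the string labels for link types are assumptions made for illustration; timestamps are in seconds, sizes in bytes and bandwidths in bits per second.

```python
import random


def timestamp_depth_bytes(link_type, packet_bytes, rng=random):
    """H: number of bytes into the IP packet at which the DAG card timestamps it."""
    if link_type == "OC-48":
        return 4
    # OC-3 and OC-12: uniformly distributed integer between 0 and min(l, 53).
    return rng.randint(0, min(packet_bytes, 53))


def system_time(dag_timestamp, packet_bytes, depth_bytes, bandwidth_bps):
    """Equation (6.2): instant at which the last bit of the packet passes the link."""
    return dag_timestamp + 8 * (packet_bytes - depth_bytes) / bandwidth_bps


def through_system_delay(t_in, t_out, packet_bytes, h_in, h_out, theta_in, theta_out):
    """Equation (6.3): departure time from the system minus arrival time to it."""
    tau_in = system_time(t_in, packet_bytes, h_in, theta_in)
    tau_out = system_time(t_out, packet_bytes, h_out, theta_out)
    return tau_out - tau_in
```

Because b is random on OC-3 and OC-12 links, the delay computed for an individual packet carries an uncertainty of the order of the timestamping errors discussed in section 6.2.1.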

Figure 6.4: Packet delays from BB1-in to C2-out. All delays above 10ms are due to option packets. [Plot: log10 delay (µs) against time of day (HH:MM UTC); curves: min, mean, max.]

Figure 6.4 shows the minimum, mean and maximum delay experienced by packets going from input link BB1-in to output link C2-out over consecutive 1 minute intervals. As observed in [141], there is a constant minimum delay across time, up to timestamping precision. The fluctuations in the mean delay follow roughly the changes in the link utilization presented in figure 6.2. The maximum delay value has a noisy component with similar variations to the mean, as well as a spiky component. All the spikes above 10 ms have been individually studied. The analysis revealed that they are caused by IP packets carrying options, representing less than 0.0001% of all packets. Option packets take different paths through the router since they are processed through software, while all other packets are processed with dedicated hardware on the so-called 'fast path'. This explains why they take significantly longer to cross the router.

In any router architecture it is likely that many components of delay will be proportional to packet size. This is certainly the case for store & forward routers, as discussed in [95]. To investigate this here we compute the 'excess' minimum delay experienced by packets of different sizes, that is not including their transmission time on the output link, a packet size dependent component which is already understood. Formally, for every packet size L we compute

$$\Delta_{\lambda_i,\Lambda_j}(L) = \min_m \{\, d_{\lambda_i,\Lambda_j}(m) - 8\,l_m/\Theta_j \mid l_m = L \,\}. \qquad (6.4)$$

Note that our definition of arrival time to the system conveniently excludes another packet size dependent component, namely the time interval between beginning and completing entry to the router at the input interface.

Figure 6.5 shows the values of ∆λi,Λj(L) for packets going from BB1-in to C2-out. The

IP packet sizes observed varied between 28 and 1500 bytes. We assume (for each size) that the minimum value found across 13 hours corresponds to the true minimum, i.e. that at least one packet encountered no contention on its way to the output queue and no packet in the output queue when it arrived there. In other words, we assume that the system was empty from the point of view of this input-output pair. This means that the excess minimum delay corresponds to the time taken to make a forwarding decision (not packet size dependent), to divide the packet into cells, transmit it across the switch fabric and reassemble it (each being packet size dependent operations), and finally to deliver it to the appropriate output queue.

Figure 6.5: Measured minimum excess system transit times from BB1-in to C2-out. [Plot: minimum router transit time (µs) against packet size (bytes).]

The step like curve means that there exist ranges of packet sizes with the same minimum transit time. This is consistent with the fact that each packet is divided into fixed length cells, transmitted through the backplane cell by cell, and reassembled. A given number of cells can therefore correspond to a contiguous range of packet sizes with the same minimum transit time. Visually, it appears that each step has a downward slope, which means that a full cell is transmitted faster than a nearly empty one. This could be due to the time it takes for a non full cell to be padded with random bytes before being sent through the switching fabric.

6.4 Modelling

We are now in a position to exploit the completeness of the data set to look inside the system. This enables us to find a physically meaningful model which can be used both to understand and predict the end-to-end system delay very accurately.
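The excess minimum delay of equation (6.4) is straightforward to compute from matched packets. The sketch below assumes the through-system delays and packet sizes are available as two parallel sequences for one input-output pair; the function name is hypothetical.

```python
def excess_min_delay(delays, sizes, out_bandwidth_bps):
    """Equation (6.4): for each packet size L, the minimum over packets of that size
    of the through-system delay minus the transmission time on the output link.

    delays are in seconds, sizes in bytes, out_bandwidth_bps in bits per second.
    """
    excess = {}
    for delay, size in zip(delays, sizes):
        value = delay - 8 * size / out_bandwidth_bps
        if size not in excess or value < excess[size]:
            excess[size] = value
    return excess


if __name__ == "__main__":
    # Toy numbers chosen for readability, not measured values.
    print(excess_min_delay([10.0, 12.0, 30.0, 25.0], [100, 100, 200, 200], 800))
```

Plotting the returned values against packet size gives a curve of the kind described for figure 6.5, provided enough packets of each size crossed an empty system during the measurement.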

6.4.1 The fluid queue

We first recall some basic properties of FIFO queues that will be central in what follows. Consider a FIFO queue with a single server of deterministic service rate $\mu$, and let $t_i$ be the arrival time to the system of packet $i$ of size $l_i$ bytes. We assume that the entire packet arrives instantaneously (which models a fast transfer across the switch), but it leaves progressively as it is served (modelling the output serialisation). Thus it is a fluid queue at the output but not at the input. Nonetheless we will for convenience refer to it as the ‘fluid queue’.

Let $W_i$ be the length of time packet $i$ waits before being served. The service time of packet $i$ is simply $l_i/\mu$, so the system time, that is the total amount of time spent in the system, is

$$S_i = W_i + \frac{l_i}{\mu}. \tag{6.5}$$

The waiting time of the next packet ($i+1$) to enter the system can be expressed by the following recursion:

$$W_{i+1} = \left[ W_i + \frac{l_i}{\mu} - (t_{i+1} - t_i) \right]^+, \tag{6.6}$$

where $[x]^+ = \max(x, 0)$. The system time of packet $i+1$ reads

$$S_{i+1} = \left[ S_i - (t_{i+1} - t_i) \right]^+ + \frac{l_{i+1}}{\mu}. \tag{6.7}$$

We denote by $U(t)$ the amount of unfinished work at time $t$, that is the time it would take, with no further inputs, for the system to completely drain. Note that it is defined at all real times $t$. The unfinished work at the instant following the arrival of packet $i$ is nothing other than the end-to-end delay that that packet will experience across the queuing system. It is therefore the natural mathematical quantity to consider when studying delay.

6.4.2 A simple router model

The delay analysis of section 6.3 revealed two main features of the system delay which should be taken into account in a model: the minimum delay experienced by a packet, which is size, interface, and architecture dependent, and the delay corresponding to the time spent in the output buffer, which is a function of the rate of the output interface and the occupancy of the queue. The delay across the output buffer could by itself be modelled by the fluid queue as described above, however it is not immediately obvious how to incorporate the minimum delay property in a sensible way.

Assume for instance that the router has $N$ input links $\lambda_1, \ldots, \lambda_N$ contributing to a given output link $\Lambda_j$ and that a packet of size $l$ arriving on link $\lambda_i$ experiences at least the minimum possible delay $\Delta_{\lambda_i,\Lambda_j}(l)$ before being transferred to the output buffer. A representation of this situation is given in figure 6.6(a). Our first problem is that given different technologies on different interfaces, the functions $\Delta_{\lambda_1,\Lambda_j}, \ldots, \Delta_{\lambda_N,\Lambda_j}$ are not necessarily identical. The second is that we do not know how to measure, nor to take into account, the potentially complex interactions between packets which do not experience the minimum excess delay but some larger value due to contention in the router arising from cross traffic.

We address this by in fact simplifying the picture still further, in two ways. First we assume that the minimum delays are identical across all input interfaces: a packet of size $l$ arriving on link $\lambda_i$ and leaving the router on link $\Lambda_j$ now experiences an excess minimum delay

$$\Delta_{\Lambda_j}(l) = \min_i \{ \Delta_{\lambda_i,\Lambda_j}(l) \}. \tag{6.8}$$

In the following we drop the subscript $\Lambda_j$ to ease the notation. Second, we assume that the multiplexing of the different input streams takes place before the packets experience their minimum delay. By this we mean that we preserve the order of their arrival times and consider them to enter a single FIFO input buffer. In doing so, we effectively ignore all complex interactions between the input streams. Our highly simplified picture, which is in fact the model we propose, is shown in figure 6.6(b). We will justify these simplifications a posteriori in section 6.4.3 where the comparison with measurement shows that the model is remarkably accurate. We now explain why we can expect this accuracy to be robust.

[Figure 6.6: Router mechanisms: (a) Simple conceptual picture including VOQs. (b) Actual model with a single common minimum delay.]

Suppose that a packet of size $l$ enters the system at time $t^+$ and that the amount of unfinished work in the system at time $t^-$ was $U(t^-) > \Delta(l)$. The following two scenarios produce the same total delay:

(i) the packet experiences a delay $\Delta(l)$, then reaches the output queue and waits $U(t) - \Delta(l) > 0$ before being served, or

(ii) the packet reaches the output queue straight away and has to wait $U(t)$ before being served.

In other words, as long as there is more than an amount $\Delta(l)$ of work in the queue when a packet of size $l$ enters the system, the fact that the packet should wait $\Delta(l)$ before reaching the output queue can be neglected. Once the system is busy, it behaves exactly like a simple fluid queue. This implies that no matter how complicated the front end of the router is, one can simply neglect it when the output queue is sufficiently busy. The errors made through this approximation will be strongly concentrated on packets with very small delays, whereas the more important medium to large delays will be faithfully reproduced. Apart from its simplicity, this robustness is the main motivation for the model.

A system equation for our two stage model can be derived as follows. Assume that the system is empty at time $t_0^-$ and that packet $k_0$ of size $l_0$ enters the system at time $t_0^+$. It waits $\Delta(l_0)$ before reaching the empty output queue where it immediately starts being served. Its service time is $l_0/\mu$ and therefore its total system time is

$$S_0 = \Delta(l_0) + \frac{l_0}{\mu}. \tag{6.9}$$

Suppose a second packet enters the system at time $t_1$ and reaches the output queue before the first packet has finished being served, i.e. $t_1 + \Delta(l_1) < t_0 + S_0$. It will start being served when packet $k_0$ leaves the system, i.e. at $t_0 + S_0$. Its system time will therefore be:

$$S_1 = S_0 - (t_1 - t_0) + \frac{l_1}{\mu}.$$

The same recursion holds for successive packets $k$ and $k+1$ as long as the amount of unfinished work in the queue remains above $\Delta(l_{k+1})$ when packet $k+1$ enters the system:

$$t_{k+1} + \Delta(l_{k+1}) < t_k + S_k. \tag{6.10}$$

Therefore, as long as equation (6.10) is verified, the system times of successive packets are obtained by the same recursion as for the case of a busy fluid queue:

$$S_{k+1} = S_k - (t_{k+1} - t_k) + \frac{l_{k+1}}{\mu}. \tag{6.11}$$

Suppose now that packet $k+1$ of size $l_{k+1}$ enters the system at time $t_{k+1}^+$ and that the amount of unfinished work in the system at time $t_{k+1}^-$ is such that $0 < U(t_{k+1}^-) < \Delta(l_{k+1})$. In this case, the output buffer will be empty by the time packet $k+1$ reaches it after having waited $\Delta(l_{k+1})$ in the first stage of the model. The system time of packet $k+1$ therefore reads

$$S_{k+1} = \Delta(l_{k+1}) + \frac{l_{k+1}}{\mu}. \tag{6.12}$$
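The system equations (6.9) to (6.12) translate almost directly into a trace driven loop. The following is a minimal sketch rather than the code used for the thesis: the function name `two_stage_system_times`, the choice of seconds and bytes as units, and the representation of $\Delta(l)$ as a Python callable are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def two_stage_system_times(
    arrivals: Sequence[float],
    sizes: Sequence[float],
    mu: float,
    min_delay: Callable[[float], float],
) -> List[float]:
    """System time S_k of each packet in the two stage model.

    arrivals:  arrival times t_k in seconds, in non-decreasing order
    sizes:     packet sizes l_k in bytes
    mu:        output service rate in bytes per second (assumed unit)
    min_delay: the minimum delay function Delta(l), in seconds
    """
    system_times: List[float] = []
    prev_t = prev_s = None
    for t, size in zip(arrivals, sizes):
        service = size / mu
        delta = min_delay(size)
        if prev_t is not None and t + delta < prev_t + prev_s:
            # Condition (6.10) holds: the packet reaches a busy output
            # queue, so the fluid queue recursion (6.11) applies.
            s = prev_s - (t - prev_t) + service
        else:
            # The output queue is empty when the packet reaches it:
            # equations (6.9) and (6.12).
            s = delta + service
        system_times.append(s)
        prev_t, prev_s = t, s
    return system_times
```

Feeding the function the multiplexed arrival times and sizes of all input streams destined to one output link, together with a tabulated $\Delta(l)$, gives the predicted per-packet delays for that link.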

A crucial point to note here is that in this situation, the output queue can be empty but the system still busy with a packet waiting in the front end. This is also true of the actual router. Once the queue has drained, the system is idle until the arrival of the next packet. The time between the arrival of a packet to the empty system and the time when the system becomes empty again defines a system busy period. In this brief analysis, we have assumed an infinite buffer size. It is a reasonable assumption since it is quite common for a line card to be able to accommodate up to 500 ms worth of traffic.

6.4.3 Evaluation

We now evaluate our model and compare its results with empirical delay measurements. The model delays are obtained by multiplexing the traffic streams BB1-in to C2-out and BB2-in to C2-out and feeding the resulting packet train to the model in an exact trace driven ‘simulation’. Figure 6.7 shows two sample paths of the unfinished work U(t) corresponding to two fragments of real traffic destined to C2-out. The process U(t) is a right continuous jump process where each jump marks the arrival time of a new packet. The resultant new local maximum is the time taken by the newly arrived packet to cross the system, that is its delay. The black dots represent the actual measured delays for the corresponding input packets. In practice the queue state can only be measured when a packet enters the system. Thus the black dots can be thought of as samples of U(t) obtained from measurements, and agreement between the two seems very good.

[Figure 6.7: Comparisons of measured and predicted delays on link C2-out. Grey line: unfinished work U(t) in the system according to the model. Black dots: measured delay value for each packet. Axes: time (ms) against queue size (µs).]

In order to see the limitations of our model, we focus on a set of busy periods on link C2-out involving 510 packets all together. The top plot of figure 6.8 shows the system times experienced by incoming packets, both from the model and from measurements. The largest busy period on the figure has a duration of roughly 16 ms and an amplitude of more than 5 ms. Once again, the model reproduces the measured delays very well. The lower plot in figure 6.8 shows the error of our model, that is the difference between measured and modeled delays at each packet arrival time, plotted on the same time axis as the upper plot.

There are three main points one can make about the model accuracy. First, the absolute error is within 30µs of the measured delays for almost all packets. Second, the error is much larger for a few packets, as shown by the spiky behaviour of the error plot. These spikes are due to a local reordering of packets inside the router that is not captured by our model. Recall from figure 6.6(b) that we made the simplifying assumption that the multiplexing of the input streams takes place before the packets experience their minimum delay. This means that packets exit our system in the exact same order as they entered it. However in practice local reordering can happen when a large packet arrives at the system on one interface just before a small packet on another interface. Given that the minimum transit time of a packet depends linearly on its size (see figure 6.5), the small packet can overtake the large one and reach the output buffer first. Once the two packets have reached the output buffer, the amount of work in the system is the same, irrespective of their arrival order. Thus these local errors do not accumulate. Intuitively, local reordering requires that two packets arrive almost at the same time on two different interfaces. This is much more likely to happen when the links are busy. This is in agreement with figure 6.8 which shows that spikes always happen when the queuing delays are increasing, a sign of high local link utilization.

The last point worth noticing is the systematic linear drift of the error across a busy period duration. This is due to the fact that our queuing model drains slightly faster than the real queue. We could not confirm any physical reason why the IP bandwidth of the link C2-out is smaller than what was predicted in section 6.2.1. However, the important observation is that this phenomenon is only noticeable for very large busy periods, and is lost in measurement noise for most busy periods.

The model presented above has some limitations. First it does not take into account the fact that a small number of option packets will take a ‘slow’ software path through the router instead of being entirely processed at the hardware level. As a result, option packets experience a much larger delay before reaching the output buffer, but as far as the model is concerned, transit times through the router only depend on packet sizes. Second, the output queue stores not only the packets crossing the router, but also the ‘unmatched’ packets generated by the router itself, as well as control PoS packets. These packets are not accounted for in the model.
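The system busy period defined at the start of this discussion (from the arrival of a packet to the empty system until the system is empty again) can be recovered from per-packet arrival times and system times. The sketch below is one possible way to do it and is not taken from the thesis: the function name `busy_periods` is hypothetical, and it assumes the system times come from the model, so that a packet finding the system empty is recognised by its arrival time being no earlier than the latest departure seen so far.

```python
from typing import List, Sequence, Tuple


def busy_periods(
    arrivals: Sequence[float], system_times: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return one (amplitude, duration) pair per system busy period.

    amplitude: largest system time of any packet in the busy period
    duration:  time from the first arrival until the system empties
    """
    periods: List[Tuple[float, float]] = []
    start = end = amplitude = None
    for t, s in zip(arrivals, system_times):
        if start is None or t >= end:
            # The system was empty on arrival: close the previous busy
            # period, if any, and open a new one.
            if start is not None:
                periods.append((amplitude, end - start))
            start, end, amplitude = t, t + s, s
        else:
            # Same busy period: extend it to this packet's departure.
            end = max(end, t + s)
            amplitude = max(amplitude, s)
    if start is not None:
        periods.append((amplitude, end - start))
    return periods
```

With measured rather than modelled delays, timestamping noise would make the `t >= end` test fragile, so a tolerance on that comparison would probably be needed.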

[Figure 6.8: Measured delays and model predictions (top). Absolute error between data and model (bottom). Axes: time (ms) against delay (µs) and error (µs).]

Despite its simplicity, our model is considerably more accurate than other single-hop delay models. Figure 6.9(a) compares the errors made on the packet delays from the OC-3 link C2-out presented in figure 6.8 with three different models: our two stage model, a fluid queue with OC-3 nominal bandwidth, and a fluid queue with OC-3 IP bandwidth. As expected, with a simple fluid model, i.e. when one does not take into account the minimum transit time, all the delays are systematically underestimated. If moreover one chooses the nominal link bandwidth (155.52 Mbps) for the queue instead of a carefully justified IP bandwidth (149.76 Mbps), the errors inside a busy period build up very quickly because the queue drains too fast. There is in fact only a 4% difference between the nominal and effective bandwidths, but this is enough to create errors of up to 800µs inside a moderately large busy period.

Figure 6.9(b) shows the cumulative distribution function of the delay error for a 5 minute window of C2-out traffic. Of the delays inferred by our model, 90% are within 20µs of the measured ones. Given the timestamping precision issues described in section 6.2.1, these results are very satisfactory.
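To see why a bandwidth gap of about 4% matters, it helps to convert a quantity of queued work into service time at both rates. The snippet below is only back-of-the-envelope arithmetic using the two bandwidths quoted above; the helper name `drain_error` and the idea of summarising a busy period by its total byte count are assumptions, and the error actually accumulated in a real busy period depends on the arrival pattern.

```python
NOMINAL_BPS = 155.52e6  # OC-3 nominal bandwidth quoted in the text
IP_BPS = 149.76e6       # IP bandwidth quoted in the text


def drain_error(work_bytes: float) -> float:
    """Seconds by which a queue served at the nominal rate underestimates
    the time needed to serve `work_bytes` at the IP rate."""
    bits = 8.0 * work_bytes
    return bits / IP_BPS - bits / NOMINAL_BPS


# Relative gap between the two rates, as a fraction of the IP bandwidth.
relative_gap = (NOMINAL_BPS - IP_BPS) / IP_BPS
```

Because the error grows linearly with the amount of work served, it is negligible for the short busy periods that dominate the trace and only becomes visible inside the largest ones.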

[Figure 6.9: (a) Comparison of error in delay predictions from different models of the sample path from figure 6.8. (b) Cumulative distribution function of model error over a 5 minute window on link C2-out. (c) Relative mean error between delay measurements and model on link C2-out vs link utilization.]

We now evaluate the performance of our model over the entire 13 hours of traffic on C2-out as follows. We divide the period into 156 intervals of 5 minutes. For each interval, we plot the average relative delay error against the average link utilization. The results are presented in figure 6.9(c). The absolute relative error is less than 1.5% for the whole trace, which confirms the excellent match between the model and the measurements. For large utilisation levels, the relative error grows due to the fact that large busy periods are more frequent. The packet delays therefore tend to be underestimated more often due to the unexplained bandwidth mismatch occurring inside large busy periods. Overall, our model performs very well for a large range of link utilizations.
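The per-interval evaluation described above can be sketched as follows. This is an illustration under stated assumptions, not the thesis procedure: the function name `relative_error_by_interval` is hypothetical, and the relative error is taken here as (mean predicted delay minus mean measured delay) divided by mean measured delay, so that negative values mean the model underestimates.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def relative_error_by_interval(
    arrivals: Sequence[float],
    sizes: Sequence[float],
    measured: Sequence[float],
    predicted: Sequence[float],
    width: float = 300.0,
) -> Dict[int, Tuple[float, float]]:
    """Map interval index -> (utilization in Mbps, relative error in %).

    arrivals are in seconds, sizes in bytes, delays in seconds, and
    `width` is the interval length in seconds (5 minutes by default).
    """
    # Per interval: [total bytes, sum of measured, sum of predicted].
    acc = defaultdict(lambda: [0.0, 0.0, 0.0])
    for t, size, m, p in zip(arrivals, sizes, measured, predicted):
        slot = acc[int(t // width)]
        slot[0] += size
        slot[1] += m
        slot[2] += p
    result: Dict[int, Tuple[float, float]] = {}
    for index, (nbytes, sum_m, sum_p) in sorted(acc.items()):
        utilization = 8.0 * nbytes / width / 1e6
        # The packet counts cancel, so sums can be compared directly.
        error = 100.0 * (sum_p - sum_m) / sum_m if sum_m > 0 else 0.0
        result[index] = (utilization, error)
    return result
```

With the default width, a 13 hour trace yields 13 × 3600 / 300 = 156 intervals, which matches the number quoted above.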

6.4.4 Router model summary

Based on the observations and analysis presented above, we propose the following simple approach for modeling store and forward routers. For each output link $\Lambda_j$:

(i) measure the minimum excess (i.e. excluding service time) packet transit time $\Delta_{\lambda_i,\Lambda_j}$ between each input $\lambda_i$ and the given output $\Lambda_j$, as defined in equation (6.4). These depend only on the hardware involved, not the type of traffic, and could potentially be tabulated. Define the overall minimum packet transit time $\Delta_{\Lambda_j}$ as the minimum over all input links $\lambda_i$, as described in equation (6.8).

(ii) calculate the IP bandwidth of the output link by taking into account the different levels of packet encapsulation, as described in section 6.2.1.

(iii) obtain packet delays by aggregating the input traffic corresponding to the given output link, and feeding it to a simple two stage model, illustrated in figure 6.6(b), where packets are first delayed by an amount $\Delta_{\Lambda_j}$ before entering a FIFO queue. System equations are given in section 6.4.2.

A model of a full router can be obtained by putting together the models obtained for each output link $\Lambda_j$.

Although very simple, this model performed remarkably well for our data set, where the router was lightly loaded and the output buffer was clearly the bottleneck. As explained above, we expect the model to continue to perform well even under heavier load where interactions in the front end become more pronounced, but not dominant. The accuracy would drop off under loads heavy enough to shift the bottleneck to the switching fabric, when details of the scheduling algorithm could no longer be neglected.

6.5 Delay performance: understanding and reporting

6.5.1 Motivation

From the previous section, our router model can accurately predict delays when the input traffic is fully characterized. However in practice the traffic is unknown, which is why network operators rely on available simple statistics, such as curves giving upper bounds on delay as a function of link utilization, when they want to infer packet delays through their networks. The problem is that these curves are not unique since packet delays depend not only on the mean traffic rate, but also on more detailed traffic statistics.

In fact, link utilization alone can be very misleading as a way of inferring packet delays. Suppose for instance that there is a group of back to back packets on link C2-out. This means that packets follow each other on the link without gaps, i.e. the local link utilization is 100%. However this does not imply that these packets have experienced large delays inside the router. They could very well be coming back to back from the input link C1-in with the same bandwidth as C2-out. In this case they would actually cross the router with minimum delay in the absence of cross traffic. Inferring average packet delays from link utilization only is therefore fundamentally flawed.

Instead, we propose to study performance related questions by going back to the source of large delays: queue build-ups in the output buffer. In this section we use our understanding of the router mechanisms obtained from our measurements and modelling work of the previous sections to first describe the statistics and causes of busy periods, and second to propose a simple mechanism that could be used to report useful delay information about a router.

6.5.2 Busy periods

Definition

Recall from section 6.4 that we defined busy periods as the time between the arrival of a packet in the empty system and the time when the system goes back to its empty state. The equivalent definition in terms of measurements is as follows: a busy period starts when a packet of size $l$ bytes crosses the system with a delay $\Delta(l) + l/\mu$, and it ends with the last packet before the start of another busy period. This definition, which makes full use of our measurements, is a lot more robust than an alternate definition based solely on packet inter-arrival times at the output link. For instance, if one were to detect busy periods by using timestamps and packet sizes to group together back-to-back packets, the following two problems would occur. First, timestamping errors could lead to wrong busy period separations. Second and more importantly, according to our system definition from section 6.4.2, packets belonging to the same busy period are not necessarily back to back on the output link (see equation 6.12).

Statistics

To describe busy periods, we begin by collecting per busy period statistics, such as duration, number of packets and bytes, and amplitude (maximum delay experienced by a packet inside the busy period). The cumulative distribution functions (CDF) of busy period amplitudes and durations are plotted in figures 6.10(a) and 6.10(b) for a 5 minute traffic window. For this traffic window, 90% of busy periods have an amplitude smaller than 200µs, and 80% last less than 500µs. Figure 6.10(c) shows a scatter plot of busy period amplitudes against busy period durations for amplitudes larger than 2ms on link C2-out (busy periods containing option packets are not shown). There does not seem to be any clear pattern linking amplitude and duration of a busy period in this data set, although roughly speaking the longer the busy period the larger its amplitude.

A scatter plot of busy period amplitudes against the median delay experienced by packets inside the busy period is presented in figure 6.10(d). One can see a linear, albeit noisy, relationship between maximum and median delay experienced by packets inside a busy period. This means intuitively that busy periods have a ‘regular’ shape, i.e. busy periods where most of the packets experience small delays and only a few packets experience much larger delays are unlikely.

[Figure 6.10: (a) CDF of busy period amplitudes. (b) CDF of busy period durations. (c) Busy period amplitudes as a function of busy period durations. (d) Busy period amplitudes as a function of median packet delay.]

Origins

Our full router measurements allow us to go further in the characterization of busy periods. In particular, we can use our knowledge about the input packet streams on each interface to understand the mechanisms that create the busy periods observed for our router output links. It is clear that, by definition, busy periods are created by a local aggregate arrival rate which exceeds the output link service rate. This can be achieved by a single input stream, the multiplexing of different input streams, or a combination of both phenomena. A detailed analysis can be found in [142]. We restrict ourselves in this section to an illustration of these different mechanisms.

To create the busy periods shown in figure 6.11, we store the individual packet streams BB1-in to C2-out and BB2-in to C2-out, feed them individually to our model and obtain virtual busy periods. The delays obtained are plotted on figure 6.11(a), together with the true delays measured on link C2-out for the same time window as in figure 6.8. In the absence of cross traffic, the maximum delay experienced by packets from each individual input stream is around 1ms. However, the largest delay for the multiplexed inputs is around 5ms. The large busy period is therefore due to the fact that the delays of the two individual packet streams peak at the same time. This non linear phenomenon is the cause of all the large busy periods observed in our traces. A more surprising example is illustrated in figure 6.11(b) that shows one input stream creating at most a 1ms packet delay by itself and the other a succession of 200µs delays. The resulting congestion episode for the multiplexed inputs is again much larger than the individual episodes. A different situation is shown on figure 6.11(c), where one link contributes almost all the traffic of the output link for a short time period. In this case, the measured delays are almost the same as the virtual ones caused by the busy input link.

It is interesting to notice that the three large busy periods plotted in figures 6.11(a), 6.11(b) and 6.11(c) all have a roughly triangular shape. Figures 6.11(d), 6.11(e) and 6.11(f) show that this is not due to a particular choice of busy periods. They were obtained as follows. For each 5 min interval, we detect the largest packet delay, store the corresponding packet arrival time $t_0$, and plot the delays experienced by packets in a window 10ms before and 15ms after $t_0$. The resulting sets of busy periods are grouped according to the largest packet delay observed: figure 6.11(d) when the largest amplitude is between 5ms and 6ms, figure 6.11(e) between 4ms and 5ms, and figure 6.11(f) between 2ms and 3ms. For each of the plots 6.11(d), (e) and (f), the black line highlights the busy period detailed in the plot directly above it. The striking point is that most busy periods have a roughly triangular shape. The largest busy periods have slightly less regular shapes, but a triangular assumption can still hold.

These results are reminiscent of the theory of large deviations, which states that rare events happen in the most likely way. Some hints on the shape of large busy periods in (Gaussian) queues can be found in [9] where it is shown that, in the limit of large amplitude,

[Figure 6.11: (a) (b) (c) Illustration of the multiplexing effect leading to a busy period on the output link C2-out. (d) (e) (f) Collection of largest busy periods in each 5 min interval on the output link C2-out. Axes: time (ms) against delay (ms). Curves in (a) to (c): C2-out, BB1-in to C2-out, BB2-in to C2-out.]

busy periods tend to be antisymmetric about their midway point, in agreement with what we see here.

6.5.3 Modelling busy period shape

Although a triangular approximation may seem very crude at first, we now study how useful such a model could be. To do so, we first illustrate in figure 6.12 a basic principle: any busy period of duration $D$ seconds is bounded above by the busy period obtained in the case where the $D$ seconds worth of work arrive in the system at maximum input link speed. The amount of work then decreases with slope $-1$ if no more packets enter the system. In the case of the OC-3 link C2-out fed by the two OC-48 links BB1 and BB2 (each link being 16 times faster than C2-out), it takes at least $D/32$ seconds for the load to enter the system. From our measurements, busy periods are quite different from their theoretical bound. The busy period shown in figures 6.8 and 6.11(a) is again plotted in figure 6.12 for comparison. One can see that its amplitude $A$ is much lower than the theoretical maximum, in agreement with the scatter plot of figure 6.10(c).

[Figure 6.12: Modelling of busy period shape with a triangle. Curves: measured busy period, theoretical bound, modelled busy period. Axes: time against delay, with the duration $D$, amplitude $A$ and level $L$ marked.]

In the rest of the chapter we model the shape of a busy period of duration $D$ and amplitude $A$ by a triangle with base $D$, height $A$ and same apex position as the busy period. This is illustrated in figure 6.12 by the triangle superposed over the measured busy period. This very rough approximation can give surprisingly valuable insight into packet delays. We define our performance metric as follows. Let $L$ be the delay experienced by a packet crossing the router. A network operator might be interested in knowing how long a congestion level larger than $L$ will last, because this gives a direct indication of the performance of the router.

Let $d_{L,A,D}$ be the length of time the workload of the system remains above $L$ during a busy period of duration $D$ and amplitude $A$, as obtained from our delay analysis. Let $d^{(T)}_{L,A,D}$ be the approximated duration obtained from the shape model. Both $d_{L,A,D}$ and $d^{(T)}_{L,A,D}$ are plotted with a dashed line in figure 6.12. From basic geometry one can show that

$$d^{(T)}_{L,A,D} = \begin{cases} D \left( 1 - \dfrac{L}{A} \right) & \text{if } A \geq L, \\ 0 & \text{otherwise.} \end{cases} \tag{6.13}$$

In other words, $d^{(T)}_{L,A,D}$ is a function of $L$, $A$ and $D$ only. For the metric considered, the two parameters $(A, D)$ are therefore enough to describe busy periods: the knowledge of the apex position does not improve our estimate of $d_{L,A,D}$.

Denote by $\Pi_{A,D}$ the random process governing $\{A, D\}$ pairs for successive busy periods over time. The mean length of time during which packet delays are larger than $L$ reads

$$T_L = \int d_{L,A,D} \, d\Pi_{A,D}. \tag{6.14}$$

$T_L$ can be approximated by our busy period model with

$$T_L^{(T)} = \int d^{(T)}_{L,A,D} \, d\Pi_{A,D}. \tag{6.15}$$

We use equation (6.15) to approximate $T_L$ on the link C2-out. The results are plotted on figure 6.13 for two 5 minute windows of traffic with different average utilizations. For both utilization levels, the measured durations (solid line) and the results from the triangular approximation (dashed line) are fairly similar. This shows that our very simple triangular shape approximation captures enough information about busy periods to answer questions about duration of congestion episodes of a certain level. The small discrepancy between data and model can be considered insignificant in the context of Internet applications because a service provider will be realistically only interested in the order of magnitude (1ms, 10ms, 100ms) of a congestion episode greater than $L$. Our simple approach therefore fulfills that role very well.

Let us now qualitatively describe the behaviours observed on figure 6.13. For a small congestion level $L$, the mean duration of the congestion episode is also small. This is due to the fact that, although a large number of busy periods have an amplitude larger than $L$, as seen for instance from the amplitude CDF in figure 6.10(a), most busy periods do not exceed $L$ by a large amount, so the mean duration is small. It is also worth noticing that the results are very similar for the two different link utilizations. This means that busy periods with small amplitude are roughly similar at this time scale, and do not depend on average utilization.
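Equation (6.13) and an empirical counterpart of equation (6.15) are short enough to write out. The sketch below is illustrative only: the function names are hypothetical, busy periods are assumed to be given as (amplitude, duration) pairs in consistent time units, and the mean is taken over the busy periods whose amplitude exceeds $L$, which is one reading of how the integral is normalised.

```python
from typing import Iterable, Tuple


def triangle_duration(level: float, amplitude: float, duration: float) -> float:
    """Equation (6.13): time a triangular busy period spends above `level`."""
    if amplitude <= 0.0 or amplitude < level:
        return 0.0
    return duration * (1.0 - level / amplitude)


def mean_duration_above(
    level: float, periods: Iterable[Tuple[float, float]]
) -> float:
    """Empirical estimate of T_L under the triangular model.

    periods: (amplitude, duration) pairs, one per busy period.
    Only busy periods with amplitude above `level` contribute
    (assumed normalisation).
    """
    durations = [
        triangle_duration(level, a, d) for a, d in periods if a > level
    ]
    if not durations:
        return 0.0
    return sum(durations) / len(durations)
```

Sweeping `level` over a grid of delay thresholds and plotting the result gives a curve of the same kind as the dashed lines the text describes for the triangular approximation.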

As the threshold L increases, the (conditional on L) mean duration first increases as there are still a large number of busy periods with amplitude greater than L on the link, and of these, most are considerably larger than L. With even larger values of L however, fewer and fewer busy periods qualify. The ones that do cross the threshold L do so for a smaller and smaller amount of time, up to the point where there are no busy periods larger than L in the trace.

[Figure 6.13: Average duration of a congestion episode above L ms defined by equation (6.15), for two different utilization levels (0.3 and 0.7) on link C2-out. Solid lines: data, dashed lines: equation (6.15), dots: equation (6.17). Axes: L (ms), mean duration (ms).]

6.5.4 Reporting busy period statistics

The study presented above shows that one can get useful information about delays by jointly using the amplitude and duration of busy periods. Now we look into ways in which such statistics could be concisely reported using SNMP.

We start by forming busy periods from the queue size values and collecting (A, D) pairs during 5 minute intervals. This is feasible in practice since the queue size is already accessed by other software such as active queue management schemes. Measuring A and D is easily performed on-line. In principle we need to report the pair (A, D) for each busy period in order to recreate the process Π_{A,D} and evaluate equation (6.15). Since this represents a very large amount of data in practice, we instead assume that busy periods are independent and therefore that the full process Π_{A,D} can be described by the joint marginal distribution F_{A,D} of A and D. Thus, for each busy period we simply need to update a sparse 2-D histogram. The bin sizes should be as fine as possible consistent with available computing power and memory. We do not consider these details here. They are not critical since at the end of the 5 minute interval a much coarser discretisation is performed in order to limit the volume of data finally exported via SNMP. We control this directly by choosing N bins for each of the amplitude and the duration dimensions.

As we do not know a priori what delay values are common, the discretisation scheme must adapt to the traffic to be useful. A simple and natural way to do this is to select bin boundaries for D and A separately based on quantiles, i.e. on bin populations. For example a simple equal population scheme for D would define bins such that each contained (100/N)% of the measured values. Denote by M the N × N matrix representing the quantized version of F_{A,D}. The element p(i, j) of M is defined as the probability of observing a busy period with duration between the (i − 1)th and ith duration quantile, and amplitude between the (j − 1)th and jth amplitude quantile. Given that for every busy period A < D, the matrix is triangular, as shown in figure 6.14. Every 5 minutes, 2N bin boundary values for amplitude and duration, and N^2/2 joint probability values, are exported.

The 2-D histogram stored in M contains the 1-D marginals for amplitude and duration, characterizing respectively packet delays and link utilization. In addition however, from the 2-D histogram we can see at a glance the relative frequencies of different busy period shapes. Using this richer information, together with a shape model, M can be used to answer performance related questions. Applying this to the measurement of T_L introduced in section 6.5.3, and assuming independent busy periods, equation (6.15) becomes

\[ T_L^{(T)} = \int_{A>L} d_{L,A,D}^{(T)} \, dF_{A,D} = \int_{A>L} D \left( 1 - \frac{L}{A} \right) dF_{A,D}. \tag{6.16} \]

To evaluate this, we need to determine a single representative amplitude A_i and average duration D_j for each quantized probability density value p(i, j), (i, j) ∈ {1, ..., N}^2, from M. One can for instance choose the center of gravity of each of the tiles plotted in figure 6.14. For a given level L, the average duration T_L can then be estimated by

\[ \widetilde{T}_L^{(T)} = \frac{1}{n_L} \sum_{j=1}^{N} \sum_{\substack{i=1 \\ A_i > L}}^{j} d_{L,A_i,D_j}^{(T)} \, p(i,j), \tag{6.17} \]

where n_L is the number of pairs (A_i, D_j) such that A_i > L. Estimates obtained from equation (6.17) are plotted in figure 6.13. They are fairly close to the measured durations despite the strong assumption of independence.

Although very simple and based on a rough approximation of busy period shapes, this reporting scheme can give some interesting information about the delay performance of a router. In this preliminary study we have only illustrated how T_L could be approximated with the reported busy period information, but other performance related questions could be answered in the same way. In any case, our reporting scheme provides much more valuable insight about packet delays than presently available statistics based on average link utilization. Moreover, it is only based on measurements and is therefore traffic independent.
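The reporting scheme of section 6.5.4 can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names are hypothetical, the tile representatives are taken as per-tile means (one possible reading of the "center of gravity"), and the estimate is normalised by the probability mass of the tiles with amplitude above L, which is an assumption made here to obtain a conditional mean and is not necessarily the normalisation of equation (6.17).

```python
from bisect import bisect_left


def quantile_edges(values, n_bins):
    """Upper bin boundaries giving roughly equal bin populations."""
    ordered = sorted(values)
    n = len(ordered)
    # ceil(k * n / n_bins) - 1 is the index of the k-th quantile boundary.
    return [ordered[-(-k * n // n_bins) - 1] for k in range(1, n_bins + 1)]


def build_report(pairs, n_bins):
    """Quantise (amplitude, duration) pairs into a sparse 2-D histogram.

    Returns one tuple (probability, mean amplitude, mean duration) per
    non-empty tile; empty tiles are simply not stored.
    """
    amp_edges = quantile_edges([a for a, _ in pairs], n_bins)
    dur_edges = quantile_edges([d for _, d in pairs], n_bins)
    tiles = {}
    for a, d in pairs:
        key = (bisect_left(amp_edges, a), bisect_left(dur_edges, d))
        count, sum_a, sum_d = tiles.get(key, (0, 0.0, 0.0))
        tiles[key] = (count + 1, sum_a + a, sum_d + d)
    total = len(pairs)
    return [(c / total, sa / c, sd / c) for c, sa, sd in tiles.values()]


def mean_time_above(report, level):
    """Triangular shape model: a busy period (A, D) with A > L spends
    D * (1 - L / A) above L. Assumption: conditional mean over tiles with A > L."""
    above = [(p, a, d) for p, a, d in report if a > level]
    mass = sum(p for p, _, _ in above)
    if mass == 0:
        return 0.0
    return sum(p * d * (1 - level / a) for p, a, d in above) / mass


# Hypothetical (A, D) pairs in ms, for illustration only.
pairs = [(0.2, 0.5), (0.4, 1.0), (1.0, 2.0), (2.0, 5.0)]
print(mean_time_above(build_report(pairs, 2), 0.5))
```

Only the tile probabilities and the bin boundaries would need to be exported in such a scheme; the per-tile means are used here for convenience in place of representatives derived from the boundaries.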

[Figure 6.14: Histogram of the quantized joint probability distribution of busy period amplitudes and durations with N = 10 equally spaced quantiles along each dimension for a 5 minute window on link C2-out. Axes: duration (µs), amplitude (µs).]

6.6 Conclusion

In this chapter we have explored in detail ‘through-router’ delays. We first described a unique experimental setup where we captured all IP packets crossing a Tier-1 access router and presented authoritative empirical results about packet delays. Second, we used our dataset to provide a physical model of router delay performance, and showed that our model could very accurately infer packet delays. Our third contribution concerns a fundamental understanding of delay performance. We gave the first measured statistics of router busy periods that we are aware of, and presented a simple triangular shape model that can capture useful delay information. We then proposed a scheme to export router delay performance in a compact way.

Chapter 7

Modelling Internet traffic

7.1 Introduction

In this chapter, we use the knowledge gained by answering the three questions detailed in section 1.2, and present a complete validation of our traffic model at a network node. We will in fact put together the empirical findings from chapter 3, the modelling work from chapter 4, some mathematical results from chapter 5 and the router mechanisms from chapter 6 to gain a thorough understanding of the problem.

We start by presenting some empirical results showing how a packet train is modified by the router in section 7.2. Based on these findings, we define the ‘packet’ time scale as the smallest time scale at which the BLPP can be applied. In section 7.3 we validate the BLPP model on the data collected in chapter 6 and show how it can be used to model the splitting and merging of packet streams through a router. We conclude in section 7.4.

7.2 Empirical observations

We now have all the elements in place to understand how a router modifies a packet train. We first, in section 7.2.1, give the details of the traffic streams on which we base our analysis, and then illustrate how a router modifies a packet train through a queuing mechanism in section 7.2.2. We present some consequences for traffic modelling in section 7.2.3.

7.2.1 Details of traffic streams

In this section, we give a full description of the virtual paths linking input and output linecards in our fully instrumented router. Recall that general characteristics of the traffic over the entire 13 hours of the data collection were already given in chapter 6. Here we focus on a two hour period where the traffic is roughly stationary on all the links, and over which we aim to validate our modelling results. Figure 7.1 shows a schematic of the router.

[Figure 7.1: Router diagram illustrating the multiplexing of input streams contributing to link C2-out. Linecards shown: BB1 and BB2 (OC48), C1, C2 and C3 (OC3), C4 (OC12).]

The details of the traces collected at each linecard are given in table 7.1. From table 6.1 page 129 we know that there is virtually no traffic on input link C3-in, and we therefore discard this link. Over the 2 hour period considered, the router routed over 1 billion packets, corresponding to roughly 100 million IP flows, at an average rate of 575 Mbps. There is a wide range of link utilizations: backbone links are utilized less than 4%, while utilization on customer links ranges from 2% on C1-in to 52% on C2-out.

Trace      # Packets    # Flows     Bandwidth (Mbps)
C1-in      21363721     1845783     16.2
C1-out     17384671     2643529     3.2
C2-in      216140434    27320806    71.7
C2-out     108637851    7857864     79.7
C3-out     52998594     3945802     57.6
C4-in      49801794     3655830     39.5
C4-out     67797464     6848361     20.4
BB1-in     119808388    9502484     81.2
BB1-out    120286864    15742387    53.6
BB2-in     126566855    11761474    78.9
BB2-out    166385423    16874143    73.7

Table 7.1: Details of traces collected over a two hour period: trace name, number of packets, number of flows, average bandwidth.

We use the results of the packet matching analysis presented in section 6.2.3 page 130 to decompose each input packet trace into groups of packets, or substreams, flowing from a given input to a given output linecard where they exit the router. Similarly, an output trace is decomposed into substreams corresponding to the different input line cards.

Table 7.2 shows the logical paths inside the router. The router is not ‘fully meshed’, i.e. not every input contributes to every output. First, it makes sense from a routing perspective that there is no traffic on the matrix diagonal: there is no point sending traffic to the router if the traffic is then sent back to where it comes from. Second, because this router is a gateway router linking clients to a Tier-1 backbone network, most of the traffic goes from the backbone to the clients or from the clients to the backbone. There is little traffic between clients, and none between the backbone links. A typical situation for an output link is illustrated in figure 7.1: packets exiting on link C2-out come from input links C1-in, C4-in, BB1-in and BB2-in.

           C1-in   C2-in   C3-in   C4-in   BB1-in  BB2-in
C1-out     -       -       -       ✔       ✔       ✔
C2-out     ✔       -       -       ✔       ✔       ✔
C3-out     -       -       -       ✔       ✔       ✔
C4-out     -       ✔       -       -       ✔       ✔
BB1-out    ✔       ✔       -       ✔       -       -
BB2-out    ✔       ✔       -       ✔       -       -

Table 7.2: Router ‘matrix’ showing the packet streams through the router. Empty boxes (shown as -) mean that there is no traffic flowing between the specified input and output linecards.

From table 7.2 we can form 19 substreams between router linecards, details of which can be found in table 7.3.

Substream           # Packets    # Flows     Bandwidth (Mbps)
C1-in to C2-out     12445        1052        0.004
C1-in to BB1-out    9664976      853932      7.0
C1-in to BB2-out    11669672     988326      9.2
C2-in to C4-out     300495       35445       0.05
C2-in to BB1-out    87430709     13294988    28.6
C2-in to BB2-out    127968526    13814708    43.0
C4-in to C1-out     29419        2308        0.003
C4-in to C2-out     39039        4768        0.02
C4-in to C3-out     98359        7087        0.09
C4-in to BB1-out    22955506     1573749     17.9
C4-in to BB2-out    26577591     2056484     21.5
BB1-in to C1-out    9634095      1414227     1.88
BB1-in to C2-out    50210170     3403399     36.6
BB1-in to C3-out    28653178     1966237     32.4
BB1-in to C4-out    31184234     2705399     11.0
BB2-in to C1-out    7709435      1226433     1.27
BB2-in to C2-out    58258428     4423855     43.1
BB2-in to C3-out    24224258     1969708     25.2
BB2-in to C4-out    36207136     4107360     9.3

Table 7.3: Details of each substream obtained with the packet matching procedure: name, number of packets, number of flows, average bandwidth.
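As a quick consistency check on the decomposition into substreams, the packet counts of the four substreams destined to C2-out can be compared with the packet count of the C2-out trace. The sketch below uses only the packet counts quoted in tables 7.1 and 7.3; the function name is hypothetical, and the interpretation of the small remainder (packets that the matching procedure could not attribute to an input) is an assumption, not something stated in the text.

```python
# Packet counts over the two hour period, copied from tables 7.1 and 7.3.
C2_OUT_PACKETS = 108_637_851
SUBSTREAMS_TO_C2_OUT = {
    "C1-in": 12_445,
    "C4-in": 39_039,
    "BB1-in": 50_210_170,
    "BB2-in": 58_258_428,
}


def matched_fraction(substreams, output_packets):
    """Fraction of the output trace accounted for by the listed substreams."""
    return sum(substreams.values()) / output_packets


fraction = matched_fraction(SUBSTREAMS_TO_C2_OUT, C2_OUT_PACKETS)
print(f"{fraction:.4f}")  # prints 0.9989
```

The same check can be repeated for any other row or column of table 7.2 by swapping in the corresponding counts.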

7.2.2 Packet train through a router

From the understanding of packet delay mechanisms in chapter 6, we know that a packet train is modified by a router since not all packets experience the same delay. In this section, we seek to quantify the extent of the changes. We could simply study the differences of packet arrival times for each substream of table 7.3 timestamped both before and after the router. However this would not take into account the cross traffic and would make any physical interpretation of the results difficult. Instead, we compare the timing of all the packets on an output link with the arrival times of these same packets taken together on the different input interfaces. In this way, we capture all the packets in the output buffer and are more likely to give a physical explanation of our results.

We place ourselves in a worst case scenario by studying the most congested link, C2-out, and study the modifications incurred by substreams contributing to this link. From table 7.3 these packets entered the router on links BB1-in and BB2-in, with a small fraction also coming from C1-in and C4-in. We can form a virtual input packet train X_in, timestamped on the input linecards, by multiplexing the packets coming from these four input links and destined to C2-out. In order to simplify the notation, we call X_out the packet train observed on C2-out.

Figure 7.2 illustrates the second order properties of X_in and X_out, and shows how the point process of packet arrival times is modified through the router. The thick black line and the thin grey line represent respectively the LDs of X_in and X_out. The LDs are identical at scales larger than 20ms, while they are noticeably different at smaller time scales. Recall that at small scales the confidence intervals on the estimation are very small, which means that the observed difference is significant. This result can be intuitively understood from the delay analysis carried out in chapter 6: most packets experience less than 1ms delay in the router, therefore the behaviour of X_in over time scales larger than 1ms will remain unchanged through the router. In very rough terms, one can think of the point process X_out as a point by point translation of X_in. This is of course an approximation since the translation operation moves each point by an i.i.d. amount whereas in practice the delay of a packet is conditioned both by its size and the history of the queuing process.

At very small scales, where periodicities due to back-to-back packets might occur, wavelets are not necessarily the best analysis tool to use since they average out the spectral estimation on a certain frequency range. We therefore use periodograms to estimate the power spectral density at small scales, as introduced in section 2.5 page 31. The periodograms are represented by the ‘noisy’ signals in figure 7.2. Although wavelet and periodogram estimates agree for time scales of 200µs and above, the Fourier spectral density shows strong periodicities in X_out that were averaged out by the wavelet analysis. No such periodicities could be observed in the incoming packet train X_in. A quantitative analysis of the largest peak exhibited by the power spectral density of X_out reveals that it corresponds to the transmission time of a 1500 byte packet at the nominal bandwidth of link C2-out, i.e. 1500 byte packets placed back-to-back on the output link. Other peaks are not as easily identifiable, since they mix other packet sizes and harmonics of lower frequencies.

Finally, we illustrate the micro behaviour of X_in and X_out on a 2ms time window. The jumps in the top plot of figure 7.3 mark the packet arrival times of X_in while the bottom plot shows X_out. The packet sizes are represented respectively by the height of the jumps in the top plot and by the grey rectangles in the bottom one. The router busy periods explain how X_in has been locally modified by the router to become X_out.

[Figure 7.2: Fourier spectral density and LD for output traffic C2-out and sum of contributing input streams. Curves: input Fourier spectrum, output Fourier spectrum, input LD, output LD. Annotations: ‘1500 bytes at OC-3 speed’, ‘Poisson level’; time scale bands: micro time scales, fine time scales, knee transition, coarse time scales; the packet time scale marks the lower limit of the BLPP model.]
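The way busy periods turn X_in into X_out, and the origin of the spectral peak at the transmission time of a 1500 byte packet, can be illustrated with a minimal sketch of a single FIFO output queue. This is a simplification made for illustration: it assumes a nominal OC-3 rate of 155.52 Mbps, and it ignores any minimum transit time through the router and any link-layer overhead, so it is not the full delay model of chapter 6. The function name and the sample arrivals are hypothetical.

```python
OC3_RATE_BPS = 155.52e6  # assumed nominal OC-3 line rate, in bits per second


def fifo_departures(arrivals, sizes_bytes, link_rate_bps):
    """Departure times of packets served in order by one output link.

    A packet starts transmission when it has arrived and the link is free;
    packets queued in the same busy period therefore leave back-to-back.
    """
    departures = []
    link_free_at = 0.0
    for arrival, size in zip(arrivals, sizes_bytes):
        start = max(arrival, link_free_at)
        link_free_at = start + size * 8 / link_rate_bps
        departures.append(link_free_at)
    return departures


# Two 1500 byte packets arriving 10 microseconds apart, then a small packet
# arriving long after the queue has emptied (hypothetical values, in seconds).
arrivals = [0.0, 1e-5, 1e-3]
sizes = [1500, 1500, 40]
departures = fifo_departures(arrivals, sizes, OC3_RATE_BPS)
spacing = departures[1] - departures[0]
print(f"{spacing * 1e6:.1f} microseconds between back-to-back departures")
```

Under these assumptions the two large packets, which arrived 10µs apart, leave separated by exactly one 1500 byte transmission time (about 77µs), while the isolated third packet is only delayed by its own transmission time.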

[Figure 7.3: Top: Unfinished work U(t) in the queue. Bottom: Corresponding packets on link C2-out. Time axis: 0 to 2 ms.]

Definition of the ‘packet’ time scale. We call packet time scale the scale below which the BLPP does not apply. One can think of it as the scale below which the size of a packet ‘matters’. We do not give a quantitative value to this scale, but characterize it instead in a qualitative manner by saying that it is a few orders of magnitude larger than the transmission time of a 1500 byte packet on the link considered. This corresponds to roughly 1ms for an OC-3 link. As already defined in chapter 4, we call coarse scales (CS) the time scales larger than the knee transition area, and fine scales (FS) the time scales smaller than the knee transition but larger than the packet time scale. Figure 7.2 illustrates these different time scales. We will solely focus on scales larger than the packet time scale in the following.

7.2.3 Modelling consequences

Such router measurements highlight some interesting problems of traffic modelling. On the one hand, they show that traffic characteristics at scales larger than 10ms were not altered by the router we monitored. Keeping in mind that we chose the worst case queuing scenario in section 7.2.2, we can assume that traffic characteristics at scales larger than 10ms remain unchanged through a backbone network where all the links are over dimensioned. Modelling traffic at scales larger than 10ms from measurements taken at one point of a network is therefore worth doing. By this we mean that one can draw general conclusions on traffic characteristics by studying a single link. For instance, if traffic statistics were different over all time scales at every point of a network, every link would need to be analyzed.

On the other hand, these measurements also show that the results of very small scale traffic analysis, below 1 or 10ms, will strongly depend on the point of the network where they were obtained and should therefore be generalized only with the greatest care.

7.3 Validation of the BLPP

The first aim of this section is to validate the BLPP as a versatile traffic model for time scales larger than the packet time scale. We will show, by using key semi-experiments, that the underlying assumptions of our model are in fact verified over a large range of link speeds and link utilizations. We thus complement the preliminary findings presented in chapters 3 and 4 and obtained on the lightly loaded links described in table 3.1. The second aim is to use the data from table 7.3 to extend the validation of our traffic model from a single link to a node and then to a network.

We start with the traces detailed in table 7.1 and show in section 7.3.1 that our previous empirical findings are valid for most of them. In section 7.3.2, we study the splitting and merging of traffic with the substreams detailed in table 7.3. We extend the model to the non stationary case in section 7.3.3.

We emphasize the fact that the results presented here constitute by far the most thorough traffic model validation we are aware of. It is an intensive computing task that involves the individual manipulation of more than 1.5 billion packets comprised in thirty 2 hour long traces. This represents at least a hundred times more data than most other traffic modelling studies where one or two relatively short traces are usually deemed a sufficient empirical check.

7.3.1 Individual links

Recall from chapter 4 that the fact that the BLPP model works is a direct consequence of the results of selected semi-experiments. For instance, the choice of i.i.d. flows following a Poisson process comes from the fact that the manipulation [A-Pois] has virtually no impact on the structure of X(t). Moreover, the results of [P-Uni] show that most of the energy at small time scales comes from in-flow dynamics. In the following we will check the results of the two key semi-experiments [A-Pois] and [A-Pois, P-Uni] on all the traffic crossing the router.

We first focus on link C2-out since it has the highest load of all the traces detailed in table 7.1 and is therefore the link where, intuitively, the flow independence assumption is the most likely to fail. Figure 7.4 shows the results of the semi-experiments on link C2-out. The thick grey line represents the LD of the original traffic, while the solid black line and the line with circles represent the LDs of the reconstructed packet arrival process after applying respectively [A-Pois] and [A-Pois, P-Uni]. At most time scales, the differences due to the [A-Pois] manipulation are not significant. In agreement with the findings of chapter 3, we conclude that our flow independence hypothesis is also verified for this fairly loaded link. The manipulation [A-Pois, P-Uni] drastically changes the small scale behaviour of the LD below 1s (flat spectrum up to the knee), while larger time scales remain mostly unchanged, as found in chapter 3.
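To make the two manipulations concrete, here is a minimal Python sketch. The precise definitions of the semi-experiments are given in chapter 3 and are not restated in this section, so the sketch rests on an assumed reading: [A-Pois] moves each flow, as a block, to a new arrival time drawn from a Poisson process while keeping its internal structure, and [P-Uni] redistributes the packets of each flow uniformly over the flow duration while keeping the packet count and the first and last packet times. Flows are represented as sorted lists of packet timestamps; all names are hypothetical.

```python
import random


def a_pois(flows, rng):
    """Assumed [A-Pois]: reposition flows at Poisson arrival times.

    n sorted i.i.d. uniform points are distributed like the points of a
    homogeneous Poisson process conditioned on n points in the interval.
    """
    ordered = sorted(flows, key=lambda flow: flow[0])
    start, end = ordered[0][0], ordered[-1][0]
    new_starts = sorted(rng.uniform(start, end) for _ in ordered)
    # Shift every packet of a flow by the same amount: in-flow structure is kept.
    return [[t - flow[0] + s for t in flow] for flow, s in zip(ordered, new_starts)]


def p_uni(flows, rng):
    """Assumed [P-Uni]: spread the packets of each flow uniformly in time."""
    result = []
    for flow in flows:
        first, last = flow[0], flow[-1]
        inner = sorted(rng.uniform(first, last) for _ in flow[1:-1])
        result.append([first] + inner + ([last] if len(flow) > 1 else []))
    return result


rng = random.Random(0)
flows = [[0.0, 0.1, 0.4], [2.0, 2.5], [5.0]]  # hypothetical packet times (s)
manipulated = p_uni(a_pois(flows, rng), rng)   # [A-Pois, P-Uni] applied together
packet_train = sorted(t for flow in manipulated for t in flow)
```

The LD of the resulting packet train would then be compared with that of the original trace; the wavelet estimation itself is outside the scope of this sketch.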

[Figure 7.4: Semi-experiments [A-Pois] and [A-Pois, P-Uni] on output link C2-out. Curves: Orig, A-Pois, A-Pois P-Uni. Axes: j = log2(a), log2 Var(d_j).]

We now present similar results for the traces collected at each router interface. Figure 7.5, stretched over pages 164 and 165, shows the LDs of the two semi-experiments [A-Pois] and [A-Pois, P-Uni] on the 30 traces detailed in both tables 7.1 and 7.3. The plots are presented in a ‘matrix’ organisation that matches the traffic matrix presented in table 7.2. The legend is the same as in figure 7.4. In this section, we focus on the traffic seen on each linecard, for which semi-experiment results are illustrated in the top row and left column of figure 7.5. In most cases, the manipulations [A-Pois] and [A-Pois, P-Uni] have similar effects to the ones observed in chapter 3. This shows that most traces satisfy the requirements for a BLPP model to apply, and gives very convincing empirical evidence of the wide applicability of the model.

Two notable exceptions, where [A-Pois] significantly changes the form of the LD, are C2-in and C4-out, which we now detail. First, C2-in carries traffic from Asia on a transpacific link using up to 50% of its capacity, and one could think that this relatively high utilization is enough to explain the changes created by [A-Pois]. However, high utilization is not by itself a sufficient reason, since the same manipulation on link C2-out with similar utilization did not create significant changes. Instead, this has more to do with the way the link is being used, and how packets were put on it. We can only infer that traffic coming from the end-users is heavily shaped by access routers, where packets certainly experience large queuing delays resulting in high correlation between the flows, before the traffic gets to link C2-in. On the other hand, most packets on C2-out would be generated by web servers in the US with very fast network connections. Our edge router, which is probably the first bottleneck encountered by the packets, does not shape the traffic enough for [A-Pois] to have a significant impact on flow correlations.

The second exception concerns the link C4-out, which is less than 15% utilized. This would intuitively tend to indicate that flows on the link are only weakly correlated. Again, the fact that [A-Pois] causes significant changes means that this reasoning is flawed. What this shows is that a large proportion of these flows share a common bottleneck upstream, probably only a few hops away from the sources of these packets. Given the very low queuing delays in the core, these highly correlated flows have been left virtually unchanged when they reach our measurement point. We see again that one cannot accurately predict the results of the semi-experiments based solely on link utilization, which emphasizes the fact that utilization tells very little about the burstiness of traffic. As shown in chapter 6, a description of traffic in terms of busy period statistics is a lot more informative.

7.3.2 Splitting and merging of traffic through a router

The second aim of this section is to check that the BLPP can model the splitting and merging properties of traffic substreams through a router. We know from chapters 4 and 5 that these operations are easily applied to the BLPP: because flow arrival times in a BLPP follow a Poisson process, one can use results on the splitting and merging of Poisson processes to study the splitting and merging of BLPPs. We first briefly recall some results on the splitting of BLPPs and then present our empirical validation.

Theory

Recall from chapter 4 that the BLPP is a single class traffic model where all the flows are considered to have the same dynamics with constant µ_A and c chosen to be representative of the measured flow statistics. When this single class approach is not possible because very different kinds of traffic are mixed [159], one can use a multi class model, where all the flow characteristics have an extra level of randomization: µ_A and c are then random variables while P becomes a doubly stochastic random variable. Technically, this multi class model is not a BLPP, but we will refer to it as multi class BLPP for convenience.

In essence, the splitting and merging of a BLPP is done on a flow by flow basis since all the packets in a flow belong to the same substream. An i.i.d. flow based splitting of a single class BLPP will lead to two independent single class BLPPs. The merging of two independent single class BLPPs will lead respectively to a multi class or single class BLPP depending on whether the BLPPs being merged have different or similar flow characteristics. This means that in the general case where substreams have different flow characteristics, the BLPP does not quite lend itself to the same linear operations as a Poisson process. However, if flows are considered to follow the same statistics on every substream, one can split and merge BLPPs in exactly the same way as Poisson processes.
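The flow by flow nature of splitting and merging can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used on the traces: each flow is routed independently to one substream with fixed probabilities (the i.i.d. flow based splitting mentioned above), flows are represented as lists of packet timestamps, and the function names and probabilities are hypothetical.

```python
import random


def split_flows(flows, probabilities, rng):
    """Send each flow, with all of its packets, to one substream chosen i.i.d."""
    substreams = [[] for _ in probabilities]
    for flow in flows:
        k = rng.choices(range(len(probabilities)), weights=probabilities)[0]
        substreams[k].append(flow)
    return substreams


def merge_packets(substreams):
    """Superpose substreams into a single packet arrival process."""
    return sorted(t for sub in substreams for flow in sub for t in flow)


rng = random.Random(0)
flows = [[0.0, 0.2], [1.0], [1.5, 1.6, 1.9], [3.0, 3.1]]  # hypothetical times (s)
substreams = split_flows(flows, [0.3, 0.7], rng)
assert merge_packets(substreams) == sorted(t for f in flows for t in f)
```

Because the routing decision is taken once per flow, the flow arrival times of each substream are an independent thinning of the original Poisson flow arrival process, which is what allows the Poisson splitting and merging results to carry over.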

[Figure 7.5: Semi-experiments [A-Pois] and [A-Pois, P-Uni] on all traffic streams. See text for details. The panels are arranged as in table 7.2: one panel per input trace (top row), one per output trace (left column) and one per substream; input and output pairs without traffic are marked ‘No traffic’. Axes and legend as in figure 7.4.]

while [A-Pois. 7.3.2.3 are shown in figure 7. a single class BLPP is preferable since it is does not require the knowledge of routing tables and has a smaller number of parameters. in exactly the same way as what is done with Poisson processes.5. Although slightly less precise.166 CHAPTER 7. but they both have the same empirical backing.3 Model extension Apart from the above mentioned improvement of a multi class approach to take into account the fact that flows do not all have the same rate [159]. MODELLING INTERNET TRAFFIC not quite lend itself to the same linear operations as a Poisson process. The later is simply richer than the former. P-Uni] only modifies the small scale behaviour. one can split and merge BLPPs in exactly the same way as Poisson processes. if flows are considered to follow the same statistics on every substream. can therefore be modelled as a sum of independent single class BLPPs. However. There is no contradiction between a single class and a multi class approach to the mod- elling of a packet trace. and a single class BLPP when all the substreams have the same flow characteristics. Empirical validation We know from section 7.3 will give similar results whether the substreams are timestamped before or after the router. [A-Pois] does not have a significant impact on the form of the LD. and then through a network. We did not pursue the multi class approach further since it would lead to an even larger number of parameters to be fitted. A packet train on a given link. Again. seen as the superposition of independent substreams. This means that semi-experiments on the substreams defined in table 7. We therefore only study one set of semi-experimental results per substream.2 that the router does not introduce any significant non linearities: the correlation of a packet train is not modified above the packet time scale. whereas our aim has been from the beginning to understand the ‘physics’ and the networking causes of the observed statistics. 
corresponding to the packets being timestamped before they enter the router. with the added benefit of having a traffic model with strong empirical backing. This means that each individual substream can be modelled by a single class BLPP process as a first approximation. an other obvious improvement to the . Our simple single class BLPP model is sufficient for this purpose. One can thus study the splitting and merging properties of traffic substreams through a node. in most cases. We know that this sum is in fact a multi-class BLPP when different substreams have different flow characteristics. The results of the semi-experiments for the substreams presented in table 7.

6 the arrival rates of packets. . one can use the following algorithm [113]: (1) Generate the arrival times {tF (i)∗ } of a homogeneous Poisson process with rate λF∗ . (2) Reject {tF (i)∗ } with probability 1 − λF (tF (i)∗ )/λF∗ . In fact recall that in the stationary case the packet arrival rate is given by equation (2. the most interesting point is that the packet arrival rate follows the flow arrival rate. VALIDATION OF THE BLPP 167 model concerns the relaxation of the stationarity hypothesis.5. On the one hand. an empirical observation. the byte rate mimics the packet rate. The resulting process packet arrival process X(t) is non stationary as well. However this technique might have poor efficiency if the inversion of the rate function has to be computed numerically [107]. with a 24 hour periodicity to account for daily cycles. one can write λX (t) = λF (t)µP . bytes and flows on link C2-out over a 24 hour period. In agreement with one of our earliest observations made in section 3. We quantitatively compare in figure 7. it could be useful for simulation purposes. this means that flow characteristics remain roughly unchanged over time. Our starting point is. again. From the above observations.2. It is no longer stationary but is instead cyclo- stationary [69]. In this context. Once the flow arrival times have been determined. for our purpose.3.7. The remaining points follow an inhomogeneous Poisson process with rate λF (t). there are at least two ways of simulating the arrival times of an inhomogeneous Poisson process. However. Although such model does not lend itself to the same analytic treatment as its stationary counterpart. (7.1) where the constant flow arrival rate λF has been replaced by a time dependent function λF (t). and that the packet arrival rate can be approximated by a scaled ver- sion of the time dependent flow arrival rate. the flow arrival process Y is an inhomogeneous Poisson process with periodic rate λF (t). 
In the context of our BLPP model. Assuming that λF (t) has a finite maximum λF∗ . From a practical perspective. On the other hand there exists in fact a very simple and elegant way of simulating an inhomogeneous Poisson process with rate λF (t) based on a thinning operation.3. one simply has to lay down the packets corre- sponding to each flow.23) λX = λF µP . one can apply a time substitution operation to a Poisson process. as mentioned in section 2.2. while the flow characteris- tics are time invariant.
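The two steps of the thinning algorithm can be sketched in a few lines of Python. The function name, the sinusoidal daily rate profile and its numerical values are illustrative assumptions, not quantities fitted to the traces; the only requirement is that the rate function never exceeds the bound passed as rate_max.

```python
import math
import random


def inhomogeneous_poisson(rate, rate_max, horizon, rng):
    """Arrival times on (0, horizon] of a Poisson process with intensity rate(t).

    Step (1): candidate points from a homogeneous process of rate rate_max.
    Step (2): each candidate is kept with probability rate(t) / rate_max,
    i.e. rejected with probability 1 - rate(t) / rate_max.
    """
    times = []
    t = 0.0
    while True:
        t += rng.expovariate(rate_max)  # exponential gap of the candidate process
        if t > horizon:
            return times
        if rng.random() < rate(t) / rate_max:
            times.append(t)


DAY = 86400.0  # seconds


def daily_flow_rate(t, base=10.0):
    """Hypothetical flow arrival rate (flows/s) with a 24 hour periodicity."""
    return base * (1.0 + 0.5 * math.sin(2.0 * math.pi * t / DAY))


rng = random.Random(0)
flow_arrivals = inhomogeneous_poisson(daily_flow_rate, 15.0, 3600.0, rng)
```

The efficiency of the method depends on how close the rate stays to its maximum: the expected fraction of candidates kept is the time average of λ_F(t) divided by λ_F^*.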

[Figure 7.6: Utilization over 24 hours of link C2-out in packets, bytes and flows per second. The flow arrival rate ‘shapes’ the traffic. Panels: link utilization (kpps), link utilization (Mbps), flows (1000/s), each against time of day (HH:MM UTC).]

7.4 Conclusion

In this chapter we presented a validation of our BLPP traffic model over a large number of traces with a wide range of utilizations and link speeds. Using our knowledge of Internet traffic gained by answering the three questions detailed in section 1.2, we used results from all the previous chapters to show that the BLPP can model the splitting and merging of packet streams in a router. This shows that the model is versatile and could potentially be used to model packet streams through a network. We also used results from a 24 hour link monitoring to show how the model could be extended to take into account daily variations of traffic loads.

Chapter 8

Conclusion

This last chapter presents a brief summary of our main findings, without repeating the detailed comments made in the previous chapters, and gives a few directions for future work.

8.1 Contributions

In this thesis we studied how IP packets cross a router and answered the three questions itemized on page 4:

(i) How to characterize the traffic entering a router? (Chapters 3 and 4) Starting from empirical measurements, we used extensively a technique we called semi-experiments to determine the impact of different traffic characteristics on the packet arrival process. We showed in particular that packet flows can be considered as independent entities with Poisson arrival times. These findings led to a new physically motivated packet model called a Bartlett-Lewis point process (BLPP), with strong empirical backing and useful analytic properties.

(ii) How to sample packet traffic? (Chapter 5) Having characterized the traffic entering a router, we then studied how packets are accounted for in today's routers using packet sampling methods. We advocated the use of a new flow sampling technique whenever possible in order to recover detailed statistics about the traffic crossing the router. We also showed that the BLPP model has a nice closure property under both forms of thinning.

(iii) What happens to packets inside a router? (Chapter 6) In order to get a better understanding of Internet traffic at very small scales, we analysed router mechanisms thanks to a unique experimental setup where all the packets crossing a router could be captured. We proposed a simple model explaining the

packet delays through the router and presented a proof of concept for a new way of exporting traffic information based on busy period statistics.

Answering these three questions gave us the deep understanding of network traffic over all time scales that we summarized in chapter 7. We also showed how the BLPP could capture the splitting and merging of traffic substreams through a router and could therefore be extended to a network wide traffic model.

8.2 Future work

Although this thesis brings a lot of insight on Internet traffic, there is still a large amount of work that could be done.

In terms of our BLPP model, one could work on a method to do a blind fitting of the parameters in order to ease the use of the model. One could always add minor improvements to the model, for instance by using a Markovian description of in-flow dynamics. One could also try to improve the model by including the LRD flow arrival process empirically observed. A related unsolved problem concerns the physical explanation of the empirical findings concerning the dependency of the knee position on flow durations described in section 3.3.2. Another interesting problem would be to link recent results on infinitely divisible cascades [30] with BLPPs expressed as compound Poisson processes [149].

From a queuing theory perspective, the empirical data presented in chapter 6 provides a unique opportunity to investigate the empirical evidence of some large deviations principles often used to solve buffer occupancy problems [68]. This would provide a very interesting study, whether these large deviation principles are in fact verified or not. Another natural extension of our full router monitoring work concerns the further testing and implementation of the router performance reporting scheme based on busy period statistics.

Finally, on a more philosophical level, this work pushes the open-loop approach of physical traffic models a very long way. However, in order to really make a step forward in our understanding of Internet traffic, we will probably need a closed-loop system where feedback mechanisms and traffic interactions will be taken into account. A first step in this direction is to understand the Internet's router-level topology [114] in order to add a spatial dimension to the usual temporal analysis of packet traffic. This spatiotemporal analysis represents the next challenge of Internet traffic modelling.

Appendix A

IP Packet structure

All IP packets are structured the same way: an IP header followed by a variable length data field, as shown in table A.1. The two most common transport protocols are TCP and UDP, described respectively in tables A.2 and A.3.

IP Header

Bit positions: 0, 4, 8, 16, 19, 32

  Version | IHL | Type of Service | Total Length
  Identification | Flags | Fragment Offset
  Time To Live | Protocol | Header Checksum
  Source IP Address
  Destination IP Address
  Options (+ padding)

- Version: IP version number.
- IHL: Internet header length is the length of the Internet header in 32-bit words.
- Type of service: Indicates the quality of service desired.
- Total length: Length of the IP datagram in bytes.
- Identification: Value assigned by the sender to aid in assembling the fragments of a datagram.
- Flags: 3 control flags.
- Fragment offset: Indicates where this fragment belongs in the datagram.
- Time To Live (TTL): Number of hops.
- Protocol: Indicates the next level protocol used in the data portion of the Internet datagram.
- Header Checksum: Checksum on the IP header.
- Options: IP options.

Table A.1: IP Header
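As an illustration of the layout in table A.1, the fixed 20-byte part of an IP (version 4) header can be decoded with Python's standard `struct` module. This is a minimal sketch: the function name, the dictionary keys and the sample bytes are made up for the example, the checksum is read but not verified, and options are ignored.

```python
import struct


def parse_ipv4_header(raw):
    """Decode the fixed 20-byte part of an IPv4 header into a dict."""
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,                 # upper 4 bits of the first byte
        "ihl": ver_ihl & 0x0F,                   # header length in 32-bit words
        "type_of_service": tos,
        "total_length": total_length,            # datagram length in bytes
        "identification": ident,
        "flags": flags_frag >> 13,               # 3 control flags
        "fragment_offset": flags_frag & 0x1FFF,  # remaining 13 bits
        "ttl": ttl,
        "protocol": protocol,                    # 6 for TCP, 17 for UDP
        "header_checksum": checksum,
        "source": ".".join(str(b) for b in src),
        "destination": ".".join(str(b) for b in dst),
    }


# Hypothetical header bytes, written by hand for this example.
sample = bytes.fromhex("4500003c1c4640004006b1e6c0a80001c0a800c7")
fields = parse_ipv4_header(sample)
```

The `!` prefix in the format string selects network (big-endian) byte order, which is the order in which the header fields appear on the wire.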

TCP Header

Bit positions: 0, 4, 10, 16, 32

  Source Port | Destination Port
  Sequence Number
  Acknowledgement Number
  Offset | Reserved | Flags | Window
  Checksum | Urgent Pointer
  Options (+ padding)
  Data (variable)

- Source Port: Source Port number.
- Destination Port: Destination Port number.
- Sequence Number: Sequence number of the first data byte in this segment.
- Acknowledgment Number: Value of the next sequence number which the sender of the segment is expecting to receive.
- Offset: Number of 32 bit words in the TCP header, which indicates where the data begins.
- Reserved: 6 bits reserved for future use.
- Flags: 6 bits
  - URG: Urgent pointer field.
  - ACK: Acknowledgment field.
  - PSH: Push function.
  - RST: Reset the connection.
  - SYN: Synchronize sequence numbers.
  - FIN: No more data from sender.
- Window: Number of data bytes which the sender of this segment is willing to accept.
- Data: TCP data.

Table A.2: TCP Header

UDP Header

Bit positions: 0, 16, 32

  Source Port | Destination Port
  Length | Checksum
  Data

- Source Port: Source Port number.
- Destination Port: Destination Port number.
- Length: Length of the datagram in bytes.
- Data: UDP data.

Table A.3: UDP Header
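The TCP and UDP layouts of tables A.2 and A.3 can be decoded in the same spirit. The sketch below follows the field widths given in the tables (4-bit offset, 6 reserved bits, 6 flag bits); the function names, dictionary keys and sample bytes are hypothetical, and TCP options and checksum verification are left out.

```python
import struct

# Flag bits in the order listed in table A.2, most significant first.
TCP_FLAGS = (("URG", 0x20), ("ACK", 0x10), ("PSH", 0x08),
             ("RST", 0x04), ("SYN", 0x02), ("FIN", 0x01))


def parse_tcp_header(raw):
    """Decode the fixed 20-byte part of a TCP header into a dict."""
    (src_port, dst_port, seq, ack, off_res_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", raw[:20])
    flag_bits = off_res_flags & 0x3F            # lowest 6 bits
    return {
        "source_port": src_port,
        "destination_port": dst_port,
        "sequence_number": seq,
        "acknowledgment_number": ack,
        "offset": off_res_flags >> 12,          # header length in 32-bit words
        "flags": {name for name, bit in TCP_FLAGS if flag_bits & bit},
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }


def parse_udp_header(raw):
    """Decode the 8-byte UDP header into a dict."""
    src_port, dst_port, length, checksum = struct.unpack("!HHHH", raw[:8])
    return {"source_port": src_port, "destination_port": dst_port,
            "length": length, "checksum": checksum}


# Hypothetical header bytes, written by hand for this example.
tcp = parse_tcp_header(bytes.fromhex("01bbc93a00000001000000005012ffff00000000"))
udp = parse_udp_header(bytes.fromhex("0035d43100200000"))
```

Returning the flags as a set of names keeps a test such as `"SYN" in tcp["flags"]` readable, which is convenient when grouping packets into flows by their SYN and FIN markers.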


K. Norros. Neame. 779–787. 1995. mod- elling and performance evaluation”. Amindavar and J. Abry. 16th International Teletraffic Congress. in Proc. lecture notes in statistics edition. Mukherjee. and D. Wiley. “Wavelet-based spectral analysis of 1/ f processes”. p. Flandrin. Springer-Verlag. M. Adas and A. Gonçalvès. 34. Hohn. Net- work Magazine. Taqqu. of ACM SIGCOMM. Birkhäuser. “Invariance d’échelle dans l’Internet”. 34 [5] P. 977–984.internet2. Flandrin.”. http://abilene. chapter Wavelets for the analysis. 1993. 34 [6] P. editors. Ritchey. Colloque Mesure de lInternet. IEEE ICASSP. and P. Taqqu. Avranchenkov. 10:5–88. D. New York. chapter Wavelets. 1995. 43 [3] P. 1995. “The impact of peering on ISP performance: what’s best for you ?”. Flandrin. P. P. III 237–240. 1992. “A stochastic model for TCP/IP with stationary random losses”. P. and C. “Padé approximations of probability density func- tions”. pp. IEEE Infocom’95. K. Adler. May 2003. Abry. S. R. 116 [12] D. Allen. and T. Nice. 37 [7] P. Abry. Gonçalvès. J. Abry. Veitch. spectrum estimation and 1/ f processes. Abry and D. Altman. 1999. “The Fourier-series method for inverting transforms of prob- ability distributions. 2 [13] E. Wavelets and Statistics. in Proc. 12 [9] R. E. and M.Bibliography [1] J. G. Addie. Park and W. 1998. 30(2):416–424. 120 [2] Abilene Network. 1998. 148 [10] R. estimation. in Proc. “Performance formulae for queues with Gaussian input”. Mannersalo. “On resource management and QoS quarantees for long- range dependent traffic”. and D. M. IEEE Infocom ’95. Self-Similar Network Traffic and Performance Evaluation. P. in Proc. Flandrin. and I. N. 44(1):2–15. Veitch. Veitch. Barakat. Zukerman. 2000. 107 175 . 14. in Proc. Addie. and P. 103. France. 15 [14] H. ix [4] P. November 2001. in Proc. S. 35.edu. and synthesis of scaling data. Feldmann. 2000. Abate and W. 107. pp. Whitt. 1994. Queueing Systems. “Wavelet analysis of long-range dependent traffic”. 11 [11] R. pp. 
IEEE Transactions on Information Theory. P. Willinger. “Fractal traffic: measurements. A practical guide to heavy tails. 36 [8] A. IEEE Transactions on Aerospace and Electronic Systems.

Veitch. Chen. V. 1994. 158–163. 84. and C. 1990. Diot. R. “A discrete-time batch Markovian arrival process as B-ISDN traffic model”. 2(32):3–23. C. Gupta. Simonian. “Statistical methods for data with long range dependence”. 2004. 23 [30] P. II France. and W. Riedi. “Traffic behavior analysis with Poisson sampling on high- speed network”. IEEE Journal on Selected Areas of Communi- cation. Chainais. ACM International symposium on symbolic and algebraic com- putation. and P. Sherman. “The Hurst effect under trends”. P. 229 –238. 23 [28] C. “Heavy traffic analysis of a storage model with long range dependent On/Off sources”. Appl. and J. Beran. Nielsen. Chapman and Hall/CRC. 7. Chaffy. 115. “A Markovian approach for modelling packet traffic with long-range dependence”. Cheng and J. J. Massoullié. 6:105–114. Blondia. 1995. Statistics for Long-Memory Processes. K. 15 [17] P. Cambridge University Press. Bingham. 101 . Cambridge England. Adv. Castaing. F. Belgian Journal of Operation Research. “The analytic continuation process: from computer algebra to numerical analysis”. “On non scale invariant infinitely divisible cas- cades”. Taqqu. 12 [22] R. 5(16):719–732. Abry. Journal of Applied Probability. LIMOS UMR CNRS 6158. Thiran. in Proc. N. G. 8 [23] N. 103 [26] F. Baccelli and D. application à l’étude des intermittences en turbulence. A. 75 [18] C. Bak. 2002. 1996. Regular Variation. Nov 6–8 2002. 10 [25] P. and D. Andersen and B. pp. PhD thesis. in ACM SIGCOMM Internet Measurement Workshop (IMW-2002). 147:297–300. Beran. A. Technical Report RR-04-06. pp. ENS Lyon.. 2002. IEEE Transactions on Communications. 7(4):404–427. Brémaud and L. C. 216–222. 34:205–222. “The temperature of turbulent flows”. 170 [31] G. Beran. Gong. and E. in Proc. Goldie. Teugels. Proba. T. Cascades infiniment divisibles et analyse multirésolution. R.176 BIBLIOGRAPHY [15] A. pp. Iannaccone. (20):649–662. Tang. 1993. Phys. “AIMD. Phys. Roberts. 2001. 23:197–225. Queueing Systems. Waymire. M. 
43:1566– 1579. IEEE Infocom ’02. 1996. ICII 2001. Chainais. pp. “A flow-based model for Internet backbone traffic”. 35–48. Owezarski. “Variable-bit-rate video traf- fic and long range dependence”. 8. Barakat. Statistics and Computer Science. Statistical Sci- ence. 2001. 12 [27] B. 81 [19] J. 1998. fairness and fractal scaling of TCP traffic”. Lett. J. in Proc. 1992. Willinger. 116 [29] P. 1994. 1987. 10 [16] F. 21 [21] J. 1983. Brichet. 20. 8 [20] J. K. Bhattacharya. “Power spectra of general shot noises and hawkes processes with a random excitation”. Huong. “A forest-fire model and some thoughts on turbulence”. Marseille. and P. 105 [24] C.

BIBLIOGRAPHY 177 [32] B. and Z. 24:4705–4714. 86. ACM SIGCOMM. Daigle.cs. 109. 2002. unc. An Introduction to the Theory of Point Processes. in Proc. 1998.edu/Research/dirt/. 1997. Point Processes. University of Waikato.-W.com.com. A: Math.-L. in Proc. 101 [37] W. pp. Donnelly. 14 [44] DAG network measurement card. 93. Springer-Verlag. Drobisz and K.cisco. Jones. “Self-similarity in world wide web traffic: Evidence and possible causes”.caida. “Queue length distributions from probability generating functions via Fourier transforms”.org. Drabold and J. http://www. (8):229–236..cs. J.nz/. Braun. G. Claffy.-Y. 47–73. Cowpertwait. 1980.ac. Chapman & Hall.. “Adaptive sampling methods to determine network traffic statistics including the Hurst parameter”. 136(8): 1481–1494. Phys. Coffman and A. 129 [49] D. IEEE Annual Conference on Local Computer Networks. J. 101 . 45 [41] P. 2000. 123 [50] J. “Maximum-entropy approach to series extrapolation and analytic continuation”. High Precision Timing in Passive Measurements of Data Networks. 23. 44 [36] K.waikato. 21:49–61. 1552– 1556. Operations Research Letters. 27. pp. 99 [35] K. and H. 1995. IEEE/ACM Transactions on Networking. pp. Int. PhD thesis. “Application of sampling methodologies to network traffic characterization”. http://www. pp. A.. 23. 1988. Kluwer Academic Publishing. 2003. Daley and D. 82. 43 [40] Coralreef software. L.cisco. H. 42. Sampling Techniques. G. http://www.caida. Handbook of Massive Data Sets. 120 [47] DIstributed Real Time Systems . chapter Internet growth: is there a Moore’s Law for data traffic ?. IEEE Journal on Selected Areas in Communications. 99 [34] Cisco Sampled NetFlow. Christensen. http://www. 33. Gen. Claffy. “Parameterizable methodology for Internet traffic flow profiling”. Park. Cochran.-W. 2002. Choi. “Adaptive random sampling for total load es- timation”. J. in Proc. Braun. Polyzos. 1989. 
IEEE International Conference on Communications.University of North Carolina.org/tools/measurement/ coralreef/. Isham. Zhang. 25. 128 [45] J. 82 [43] M. 101 [38] K. 103. http://dag. 13–17. J. Odlyzko. 1987. “A renewal cluster model for the inter-arrival times of rainfall events”.. 2 [39] Cooperative Association for Internet Data Analysis. 75 [42] D. Vere-Jones. M. Cox and V. 24. 101 [33] Cisco Netflow. 43 [48] S. 107 [46] D. and G. 5(6):835–846. Crovella and A. Polyzos. 1993. http://www. Wiley. Climatol. 238–247. 43. Bestavros. 1991.

301–313. Jacobson. ACM Press. Rexford. Willinger. Misra. J. pp. Feller. 7 [61] A. Russel. 1999. Narayan. 8 [54] N. Lund. submitted. in Proc. 473– 477. G. Gilbert. and R. Lewis. R. Willinger. “Estimating flow distributions from sampled flow statistics”. 1(4):397–413. Varghese. C. 6 [56] A. 1998. 2003. pp. and M. Gilbert. Willinger. Duffield and N. Thorup. in Proc. C. Canada. Proc. 14 . Rabinovich. IEEE/ACM Transactions on Networking. Post Office Electrical Engineers’ Journal. 123 [52] N. Feldmann. 1999. ACM/SIGCOMM conference. “Efficient policies for carrying web traffic over flow-switched networks”. April 1998. 117. T. 1 [59] A. N. IEEE ICC. 6(6):673– 685. A. K. Duffield. O’Connel. 122 [64] W. “Large deviations and overflow probabilities for the general single-server queue. in Proc. “The NewReno modification to TCP’s fast recovery algo- rithm”. Figueiredo. “Experimental queueing analysis with long-range dependent packet traffic”. and W. 56 [62] A. 48 [63] A. 54. Huang. 107–116 vol. “Bitmap algorithms for counting active flows on high speed links”. Computer Networks Journal Special Issue on Advances in Modeling and Engineering of Long-Range Dependent Traffic. 1. 122. V. Mathematical Proceedings of the Cambridge Philosophical Society. Gilbert. 1971. sample less: control of volume and variance in network measurement”. and W. G. 1998. 2003. ACM SIGCOMM Internet Measurement Conference. J. April 1996. in Proc. 122 [58] Federal Networking Council. B. 4(2): 209–223. Lund. and W. C.178 BIBLIOGRAPHY [51] N. 15 [66] S. Cáceres. Internet monthly reports. 2003. An Introduction to Probability Theory and Its Applications. 7. Willinger. 325–336. Estan. ACM SIGCOMM. 101 [53] N. Thorup. 7. IEEE/ACM Transactions on Networking. 122 [60] A. 1993. G. O’Connel. 2002. “The changing nature of network traffic: Scaling phenomena”. Cáceres. and M. “On the autocorrelation structure of TCP traffic”. Towsley. Vancouver. G. Duffield. 
“Predicting quality of service for traffic with long-range dependence”. pp. 60 [57] C. Erlang. Douglis. Feldmann. Floyd and V. “Performance of web proxy caching in heterogeneous bandwidth environments”. 1918. 118. volume 2. “Dynamics of IP traffic: A study of the role of variability and the impact of control”. Feldmann. Computer Communication Review. pp. O. R. Fisk. A. 9. Feldmann. and F. 118:363–374. Toomey. with applications”. C. 101. A. R. John Wiley and Sons. “Learn more. Feldmann. October 1995. “Solution of some problems in the theory of probabilities of some significance in automatic telephone exchanges”. ACM SIGCOMM. and M. 2nd edition. “Data networks as cascades: Investi- gating the multifractal nature of Internet WAN traffic”. and D. Erramilli. 27 [65] D. in Proc. W. 12 [55] A. Liu. F. IEEE INFOCOM’99. IEEE/ACM Transactions on Networking. 1995. 1995. Glass. and T. and M. P. Duffield. 41. 10:189–197. Kurtz.

Special Issue on Signal Process- ing in Networking. “A Markov modulated characterization of packetized voice and data traffic and related statistical multiplexer performance”. Probab. Miami. T. “Investigating the scaling behaviour of Internet flow arrivals”. D. a Natural Langage for Network Traffic”. and P. O’Connel. and A. A. Appl. Adv. Veitch. 1986. Diot. 12:727–745. 1995. D. IEEE Trans. Cotton. Wiley and Sons. (fast track submission). “Packet-level traffic measurements from the Sprint IP backbone”. Heath. University of New South Wales. and I. R. 9. 63–68. Vol 1. Abry. Gaver and P. Lucantoni. S. W. N. Fraleigh. 1974. in Proc. 629 – 640. in Proc. Matta. October 2003. Wischik. ix [81] N. 1(23):145–164. Applied and Computational Complex Analysis. Moon.. IEEE Journal on Selected Areas in Communications.BIBLIOGRAPHY 179 [67] C. Moll. “Inverting sampled traffic”. Grossglauser and J. 170 [69] W. Veitch. Lectures Notes in Mathemat- ics. Big Queues. Springer. ix [80] N. Veitch. Bolot. M. 116 [78] N. 2003. and C. ix . F. “How does TCP generate self-similarity ?”. Gardner. “Cluster Processes. Ganesh. IEEE Transactions on Signal Processing. Lyles. Seely. 1980. p. Feldmann. New York. Veitch. pp. 6(4):856–867. “Inverting sampled traffic”. May 2002. Hohn. November 2002. 7 [72] M. Information Theory. and D. ACM Internet Measure- ment Conference. 215. ix [79] N. Grasse. Gilbert. Hohn. 222–233. 1999. 8. 2001. Rockell. “First-order autoregressive gamma sequences and point processes”. “Scaling analysis of conservative cascades. 27 [71] A. 1998. 17(6):6–16. Abry. 2003. Hohn and D. and G. Cyclostationarity in Communication and Signal Processing. M. 2004. USA. Annales Mathématiques Blaise Pascal. 6. Technical Report COST 242. C. To be published. 14 [76] H. IEEE Press. Willinger. Lewis. pp. Henrici. B. in Proc. C. Veitch. W. Arnold. Heffes and D. Crovella. with application to network traffic”. 129 [68] A. International Conference on Self-Similarity and Applications. 
IEEE/ACM Transactions on Net- working. 8 [74] L. and P. France. “On the non-stationarity of MPEG-2 video traffic”. Khan. IEEE/ACM Transactions on Networking. R. S. Mathematics of Opera- tions Research. France. Guo. Hohn. and P. ix [82] N. and J. Analysis and Simu- lation of Computer and Telecommunication Systems (MASCOTS’01). Abry. M. 51(8):2229–2244. 167 [70] D. Clermont Ferrand. 10 [77] P. Frater. M. 115. D. ACM Internet Measurement Workshop. Resnick. A. Samorodnitsky. 1994. 1999. 15 [75] D. “Does fractal scaling at the IP level depend on TCP flow arrival processes ?”. D. “On the relevance of long-range dependence in network traffic”. in Proceedings of the Ninth International Symposium in Modeling. P. “Heavy tails and long-range depen- dence in ON/OFF processes and associated fluid models”. Marseille. pp. 45 (3):971–991. 8 [73] M.-C. IEEE Networks. Hohn and D.

120 [93] J. 1997. 112 [92] Internet Protocol Flow Information eXport (IPFIX) . VI 37–40. 1981. in Proc. Analysis of an Equal-Cost Multi-Path Algorithm. “Information flow in large communication nets”. Klivansky.ietf. L. 12 [88] Y. 101 [89] H. ACM SIGCOMM Internet Measurement Workshop. Veitch. ix [85] N. 7 [94] S. 1 [97] L.pdf. “Bridging router performance and queueing theory”. 122 . and P. Xu. USA. M. “Issues and trends in router design”. 68(1):165–176. Mogul. Queuing Systems. Diot.charters/ipfix-charter. “Heavy-traffic analysis of a data handling system with many sources”. NYC. 8 [98] S. Hong Kong. Diot. 1975. J. H. Keshav and S. H.IETF Working Group. Pullen. 1999. “On long-range dependence in NSFNET traffic”. Technical report. RFC 2992. and C. R. Li. Jerkins and J. Veitch. M. Y. in Proc. “Countering denial-of-service attacks using congestion triggered packet sampling and filtering”. Georgia Institute of Technology. Proc. Song. “The impact of the flow arrival process in Internet traffic”.inmon. “Long-term storage capacity in reservoirs”. Kumar. A. Hosking. in Proc. Wang. Wang. Toronto. Rosen. “A measurement analysis of ATM cell-level aggregate traffic”. 1995. D. Hohn. Hurst. C. 490–494. K. M. I. 132 [87] J. Mukherjee. John Wiley and Sons. C. 1991. x [86] C. Kleinrock. 99. IEEE ICASSP. in Proc. April 2003. 2001. D. 136 [96] L. Choi. IEEE Globecom ’97. D. M. A.com/ PDF/sFlowBilling. Papagiannaki. Abry. Keown. D. Knessel and J. in Proc. L. “Fractional differencing”. IEEE Communication Magazine. 1961. (submitted). and C. 1589–1595.0 and HTTP/1. and T. 76(11). and J. in Proc. RLE Quarterly Progress Report.180 BIBLIOGRAPHY [83] N. Morrison.1”. 2004. Biometrika. 13 [100] B. in Proceedings of the WWW-8 Conference. 10 [95] S. 4(50):633–642.. Iannaccone. 36(5):144–151. pp. http: //www. Graham. and D. ACM SIGMETRICS conference. 1950. E. “Splitting and merging of a traffic model: validation”. Hohn. American Society of Civil Eng. Sung. 52 [101] A. 
“Monitoring very high speed links”. “An application of Markovian ar- rival process (MAP) to modeling superposed ATM cell streams”. Hopps. 8 [90] G. Hohn. June 2004. and B. International Conference on Com- puter Communications and Networks. Volume 1: Theory. 2001.org/html. and N. SIAM Journal of Applied Mathematics. 2000. sFlow accuracy and billing. K. Kristol. 2003. 51(1):187–213. ACM SIGCOMM Internet Measurement Conference. Kang. Veitch. 2002. D. 100 [91] Inmon Corporation. 8 [99] C. http://www. Kleinrock. ix [84] N. Krishnamurthy. IEEE Transactions on Communications. 1998. “Space-code bloom filter for efficient traffic flow measurement”.html. “Key differences between HTTP/1. Kim. J. Huang and J. pp. Ye. pp. L.

A. W. 14. Alderson. Logistics Quart. Fernandez-Veiga. Clark. S. Madhow. 1997.isi. Roberts. Lindley. Lewis and G. Latouche and M. 2000. Satist. 20(2):777–809. Rodriguez-Rubio. 75 [104] A. “Modeling and simulation of a nonhomogeneous Poisson process having cycle behaviour”. 14 [116] D. C. (92):881–893. R. 1991. Lewis. Khan. 15 [103] G. W. Suarez-Gonzales. B. A. “The theory of queues with a single server”. Simula. Appl. volume 48. C. IEEE/ACM Transactions on Networking. Tsybakov.. 33 [105] A. R. 1979. Naval Res. and N. 12 [118] S. “On the self-similar nature of ethernet traffic”. Lakshman and U. Taqqu. 2002. “The past and future history of the Internet”. Remiche. 167 [114] L. Physical Review A. V. L. Industrial Engineering and Management Science. Teich. D. McGraw Hill. J. Lewis.. http://www. 1964. and S. IEEE Infocom ’95. Ray. IEEE/ACM Transactions on Networking. W. 1991. Lee and J. Lopez-Garcia. “Doubly stochastic point process driven by fractal shot noise”. ACM SIGCOMM. D. “A branching Poisson process model for the analysis of computer failure patterns”. Cerf. M. 1972. 1994. Kleinrock. A. Series B. 2nd edition. Lowen and M. 6 [117] J. D. “On the use of self-similar processes in network simulation”. Doyle. 1952. K. 43(8):4192–4213.. Journal of the American Statistical Association. L. Series B. Postel. 26(3):398–456. 1997. “Simulation of nonhomogeneous Poisson processes by thinning”. Wilson. and R. A. J. 87 [112] P. 21 . Journal of the Royal Statistical Society. 56 [107] S. Kelton. C. 167 [108] B. “Arbitrary event initial conditions for branching Poisson processes”. J. Cambridge Phil. Shedler. 12. 49(1-4):359–370.-A. pp. 109 [111] P. Likhanov. Wolff. non linearity and periodic phenomenon in sea surface temperatures using TSMARS”. Wilson. W. V. soc. in Proc. Journal of the Royal Statistical Society. Communications of the ACM. Prob.BIBLIOGRAPHY 181 [102] T. in Proc. 277–289. 75. 1997. A. 5(3):336–350. Lewis and B.. 1969. in Proc. and D. 
“An MAP-based Poisson cluster model for web traffic”. 1995. 34(1):114–123. 1 [109] W. W. 7. Leland. “Performance analysis of TCP/IP for networks with high bandwidth-delay products and random loss”. 1991. ACM Transactions on Modelling and Computer Simulation. Georganas. 2:1–15. C. 14. Willinger. Simulation Modeling and Analysis. V. “Modeling long-range dependent. “Asymptotic properties and equilibrium conditions for branching Poisson processes”. Li. 2004 (to be published). Leiner. Willinger. and J. W. 6:355–371. 170 [115] N. 8 [110] P. Lynch. 26. B. Performance Evaluation. Commun. Lawrence. 87 [106] LBNL Network Simulator. “A first-principles approach to understanding the Internet’s router-level topology”. 10(2):125–151. M. “Analysis of an ATM buffer with self-similar (fractal) input traffic”. 40(2):102–108. D. Law and W. 11 [113] P. 26(3):403–413. Lopez-Ardao.edu/nsnam/ns/.

Gong. 43. “The concept of relevant time scales and its appli- cation to queuing analysis of self-similar traffic (or is Hurst naughty or nice?)”. 1998. “Large buffer asymptotics for the queue with frac- tional Brownian input”. Diot. in Proc. Norros. 1(1):39–63. 14th ITC Specialist Seminar. Aug. “Reasons not to deploy RED”. and D. 15 [129] O. Queueing Syst. 1999. 1970. 10:82–113. Misra. May. http://www. ACM SIGCOMM. “iSLIP: A scheduling algorithm for input-queued switches”. and D. London. Mandelbrot. “Properties and prediction of flow statis- tics from sampled packet streams”. “Exact asymptotic queue length distribution for fractional Brownian motion”. 12 [130] National Laboratory for Applied Network Research. 12 [120] S. 1992. John Hopkins Uni- versity Press. Micheel. B. 183(2):346–363.-C. 47(2):992–1001. 7th IEEE/IFIP International Workshop on Quality of Service (IWQoS’99). Teich. 10 [135] I. 108 [127] V. 126. C. 2000. in Proc. Academic Press. Lund. “A storage model with self-similar input”. and M. L. 36:894–906. 102. W. pp. 2000. 121 [132] A. 12 [136] I. Advances in Performance Analysis. Lyles.182 BIBLIOGRAPHY [119] S. Graham. Towsley. 12 [123] M. Neidhardt and J. IEEE/ACM Transactions on Networking. 42 [126] K. J. Narayan.Duffield. Mallat. pp. SIAM Journal on Applied Mathematics. 2002. 112. “On the use of fractional Brownian motion in the theory of connectionless networks. Gong. 34 [121] B. IEEE Journal on Selected Areas in Communications. 101. 1998. Wang. F. 1998. L. Performance. “Stochastic differential equation modeling and analysis of TCP-windowsize behaviour”.nlanr. Misra. W. “Fractal renewal processes generate 1/f noise”. I. “Long-run linearity. Simonian. C. June 1999. 222 – 232. 1999. 109 [131] N. 15 [128] V. (75):1255–1265. Norros. Brownlee. 16:387–396. Lowen and M. “The Auckland data set: an access link observed”. 13 [122] L. Phys. 7. 12 . “Stabilized numerical analytic prolongation with poles”. Neuts. “Models based on the Markovian arrival process”. 
Neuts. 14 [124] N. and N. in Proc. C. H-spectra and infi- nite variance”. ACM SIGMETRICS. 109. 1993. Rev. 8 [133] M. 127 [125] J.net/. Bolot. Miller. 7(2):188–201. Massoulié and A. and B. 1994. 1999. Matrix-Geometric Solutions in Stochastic Models. in Proc. 10 [134] M. ACM SIGCOMM Internet Measurment Workshop. 953–962. F. B. B. 1981. Journal of Applied Probabilities. “Fluid-based analysis of a network of AQM routers supporting TCP flows with application to RED”. E.”.. 1995. A Wavelet Tour of Signal Processing. McKeown. 1969. International Economic Review. locally Gaussian processes. Thorup. Towsley. in Proc. IEEE Transactions on Communications.
