Adavit

Herewith I , Peter Dornger, declare that I have written the present master thesis fully on my own and that I have not used any other sources apart from those given.

First Name Surname

0910581003 Registration Number

ii

Acknowledgement
The greatest mistake you can make in life is to be continually fearing you wi l l make one. Elbert G. Hubbard (1856-1915)

First, I want to thank my supervisor Peter Kritzer, who gave me valuable advices to make a high quality thesis. He managed it to challenge me throughout the thesis to perform my best. Great thanks go to my colleagues for spending their time for fruitful discussions and thus signicantly improving my work. While it is not possible to provide an exhaustive list here, I want to thank two people for their input. First Georg Panholzer who helped me to make working code out of my thoughts. Second Wolfgang John who provided me with his set of ground truth traces for my evaluation. To family and friends I say thanks for taking me away from work, and make my life so enjoyable. Renate, Tobias and Verena managed that it does not take longer than ve minutes to forget the work and enjoy life, which was a prerequisite to nish my thesis.

Details
First Name, Surname: University: Degree Program: Title of Thesis: Academic Supervisor: DI (FH) Peter Dornger Salzburg University of Applied Sciences Information Technology & Systems Management Real-Time Detection of Encrypted Trac based on Entropy Estimation Mag. Dr. Peter Kritzer

Keywords
1s t Keyword: 2n d Keyword: 3r d Keyword: 4t h Keyword: 5t h Keyword: trac classication entropy estimation user privacy real-time detection t r a c ltering

Abstract
This thesis investigates the topic of using entropy estimation for t r a c classication. A real-time encrypted t r a c detector (RT-ETD) which is able to classify t r a c in encrypted and unencrypted t r a c is proposed. The performance of the RT-ETD is evaluated on ground truth and real network traces. This thesis is opened by some introductory chapters on entropy, pattern recognition, user privacy and trac classication. A real-time encrypted t r a c detector which is targeted to operate in a privacy preserving environment is presented. The RT-ETD consists of several modules that can be used to customize the approach for specic needs. A customization for two dierent tasks is performed, where unencrypted t r a c is dropped and only encrypted t r a c is forwarded. The classication of the RT-ETD is solely based on information gathered from the rst packet of a ow. Header elds as well as the payload are taken into account. The core concept of the RT-ETD is based on the estimation of the entropy of the payload, and a comparison of the retrieved value to the entropy of a uniform distributed payload. Based on ground truth traces with encrypted t r a c and real network traces it is shown that the RT-ETD is able to lter out a large fraction of unencrypted t r a c , whereas a large fraction of encrypted ows is forwarded. The optimal parameterisation of the RTETD depends on the trade-obetween detection performance and privacy preservation.

...................................................................................2 2.16 Entropy-based Test for Uniformity ...........4 Entropy Estimation ..................12 Undersampled Estimation ..............................1 Pre-Classication .....................................20 3 Pattern Recognition 22 ix 1 4 ii iii iv iv iv v 3....................................................................2 Classication ......................................................................................................4 2........................23 3.........17 Alternative Tests for Uniformity .....................3 2........................................24 ...........................................................................................5 Contents Adavit Acknowledgement Details Keywords Abstract Table of Contents List of Figures List of Tables 1 Introduction 2 Entropy and Entropy Estimation Entropy .......................

.........5 Content-based Classication ...............................................................................................2 SPID Customized Classication .........................................50 6........59 6...........................................................1 Architecture ...............................1 Packet-based Classication ...............2 Development of a Privacy Preserving Application .1 Skype Customized Classication ..............57 6.................................................2 7.................................................35 4..5 4...4 IP-address-based Classication ......39 Online vs Real-Time Classication ..............................................................................6 Full Classication Chain ...51 6..................63 7 Evaluation Inuence of the Symbol Length .......................................74 SPID Trac Pre-Filter ............2 Content Providers and Privacy ..........................................7 1 4..............................6.................41 Trac Classication using Entropy .....................................................2 4.....3 Coding-based Classication .............................68 Appropriate Size of Condence Bound ....41 5 User Privacy i n Computer Networks 43 28 5............................1................................58 6.......................................................4..........................................................................................2 Host/Social Information based Classication .............1 Port-based Classication ...............................................................3 7......5 4 T r a c Classication Port-based Classication ....4 7.................................4 7.............................................................1........6 4...........74 Skype Trac Pre-Filter ........................................71 Evaluation of Coding-based Classier .....1 Privacy Preserving Network Monitoring ...................47 6 Real-Time Detection 49 6...........................................32 Classication in Dynamic Port Environments ........................................................4.75 66 .............6......2 Entropy-based Classication ........33 Classication when Trac is Encrypted .....56 6..37 Recent Trac Classication Approaches .....................................................................................................................................4..30 Challenges in Trac Classication ............................................3 4............58 6...........................................47 5..............................................................44 5.34 4..45 5..................................................

..............................................................................79 Bibliography List of Abbreviations 80 87 .........8 Conclusion 78 8.......1 Future Work ..

.6 6.......................................................................1 Example of handwritten text .......................................................................................................1 Decomposition of the choice for horse race example ..59 Skype pre-ltering ow chart ................................................................ p..............................................................1 PRISM architecture ....2 Evaluation of a based on ground truth trace ........54 CDF for UDP/TCP packets ...........................10 2...........69 7.......... see [70] .........2 Long Skype UDP probe (based on [1]) ..64 7......................................3 Skype TCP handshake (based on [1]) ....................................55 Block diagram for classication ..........................4 6...2 H(p) vs...............................................................5 6..52 Normalized length of the condence intervals ...........................................3 CDFs of HN(U) HM L E SD ~ .......25 4.............. ~ 72 .........................................................................................................................................7 List of Figures 2.........................................3 MLE distributions for m = 256 and dierent N ................36 4..................................................6 2...................15 3......................................37 4..38 5...................................................................................1 TCP / IP Header eld usage for t r a c classication ...............................................61 SPID pre-ltering ow chart .67 7.................................1 Evaluation process ...46 Inuence of a on entropy for given word length based on Monte-Carlo .............................................................................................6..........53 Standard deviation ...........

..........1 Important well-known port numbers .............. 68 Evaluation of a based on real network trace .........12 Probabilities and information content for a sequence of horse races ........2.............................5 7.......................................13 Undersampled Entropy Estimation for a sequence of horse races ..................... 51 Evaluation of a based on ground truth trace ...........4 2.. 70 Evaluation of condence bound based on ground truth trace ......4 7.5 .......................6 List of Tables Probabilities and Shannon information content for an English text........................1 Encrypted well-known port numbers ...3 2......................... 31 6.....3 7............17 4... see [44] 8 Probability-based coding scheme 9 Number of transmitted bits for 64 horse races ........................................................................

.................................................................. 73 Results for Skype pre-lter ......................... 76 ......... 75 Results for SPID pre-lter ............................................................................................................... 73 Evaluation of condence bound based on real network trace ..................

Applications like P2P le-sharing. To provide QoS to these applications the t r a c belonging to these applications has to be identied within a multiplexed t r a c aggregate. VoIP communication. Legacy network . which is an essential prerequisite for network operation. A major trend in the Internet is that legislation evolves into digital life. During the last years the legislation has enacted a few laws which extend the right for privacy to the world of computer networks. Video on Demand or videoconferencing are of growing interest. This identication of t r a c is known as trac classication. as trac from real users is collected in passive monitoring. these methods are summarized under the term Quality of Service (QoS) for the Internet. Consequently. Several methods have been proposed to better support such applications. new architectures and algorithms have to be developed to support network operation in accordance with user privacy. Based on an improved understanding. Trac classication is further needed to understand the network behaviour. Content providers and Internet Service Providers have to respect and protect the privacy of their customers. for example network protocols are enhanced or a better network management is applied. Due to limited network resources such applications are raising new needs within the Inter-net.1 Introduction During the last years there has been an enormous increase in applications which are using the Internet for communication. Especially the research eld of passive network monitoring is inuenced by these laws. QoS supports services that have specic qualitative needs by providing them with the appropriate quality.

Major challenges in t r a c classication are discussed. where the RT-ETD can be . As only the rst packet of a ow is processed the algorithm is capable to operate in a real-time environment. This thesis focuses on a trac classier that is capable of classifying t r a c into encrypted and unencrypted t r a c . Chapter 5 presents potential conicts between network operation and user privacy. A framework for privacy preserving network monitoring. The usage of entropy estimation as a basis for a test on uniformity is presented as well as alternatives to entropy for tests on uniformity are provided. This approach supports network operators in a privacy preserving network operation. Chapter 4 considers approaches for t r a c classication. The challenges in pattern classication are discussed based on an example. The chapter further presents dierent methods for estimating the entropy. A focus is on algorithms that are able to classify encrypted t r a c and on the usage of entropy in t r a c classication. It is shown that the proposed approach is able to identify encrypted ows and operate in an online/live/real-time environment as a pre-lter for advanced classication approaches. The integration of a priori knowledge into classication is shown by the usage of the Bayes formula. The classier is used to drop unencrypted t r a c which ensures that only encrypted t r a c can be further processed. The core concept of the RT-ETD is based on estimating the entropy of the payload. The focus of the chapter is on entropy estimation in an undersampled environment as this is needed as an essential prerequisite for the RTETD. Starting from historical portbased classication methods and evolving towards recent algorithms. The pre-ltering has to be applied as closeto the collecting point as possible. The remainder of this thesis is organized as follows: Chapter 2 gives a detailed overview on entropy. These applications have to be modied to perform theirs tasks on a pre-ltered subset of the original t r a c . Chapter 3 briey introduces pattern recognition/classication. The main steps needed before classication can be performed are presented.monitoring applications are reading all customer t r a c to perform their algorithms. The approach is called real-time encrypted t r a c detector (RT-ETD).

where the focus is on the usage of entropy on the payload of the rst packet. Several modules that can be used in a real-time detector are described. Chapter 6 investigates a new approach on real-time detection of encrypted t r a c . The algorithm is further customized as a pre-lter for two dierent t r a c classiers. The applicability of the approach as a pre-lter for two dierent types of trac classiers is demonstrated. at the end of the thesis.used. It further provides information about the conicting situation of content providers and user privacy. Further all abbreviations that are used within this work can be found in the List of Abbreviations. The inuence of setting two parameters is evaluated. Chapter 8 concludes the presented work and indicates directions for future work. Chapter 7 reports evaluation results for the approach using ground truth traces and real network traces. is described. .

This is ensured by a proper encoding of the information. Entropy is used as a kind of measure to represent the information content (number of bits for transmission). it is a goal to transmit information with as few resources as possible. The chapter is based on [11. 44. I t is favourable to transmit a message with 4 bits rather than 5 bits if the same information can be represented by 4 bits. it is obvious that we are transmitting the sequence 000 (of the rst horse) more often than those of the other . The main issue is to use statistical knowledge about the data to be transmitted to reduce the required capacity of the channel for transmission.2 Entropy and Entropy Estimation This chapter provides a description of entropy. Transmitting the winner of the horse race would need 3 bits per race. 1 64. The simplest way would be to give each horse a unique number starting from 000 to 111 in binary representation. The chapter will further evolve into entropy estimation and entropy estimation in an undersampled environment as these are core concepts of the presented work. 24. At this point entropy comes into play. 1 16. but if we assume that the probabilities of winning for the eight horses are 1 2. 1 64. based on statistical knowledge of the transmission content. 70]. Entropy is a quantity that can be used to evaluate the information content of a message. 1 8. 1 64. 2.1 Entropy I n information theory. Suppose that we would like to transmit the winner of a horse race with eight horses. This is a primitive variant. 1 4. 1 64.

1.1. when there are more possible events. With equally likely events there is more choice. We will use this example repeatedly within this section. Based on the probabilities. 10]. This is all we know about which event will occur.1) . 10]. . pi = 1 n . The meaning of this for the horse race example.. we would like to dene how uncertain we are about the outcome. It is obvious that using a shorter sequence for the rst horse and longer numbers for Horses 5 to 8 will potentially have the eect that for a number of races fewer bits have to be transmitted than in the simple case.H should be continuous in the pi [70. The nal probabilities are the same in the left part of the gure as in the right part. P. is illustrated in Figure 2. Shannon considers the case where we have a xed number m of possible events p p A1mA whose probabilities of occurrence 1 2m p are known.. Shannon denes a measure H(p1 p2 pm) of how uncertain we are about the result of an experiment. 2. This measure is known as entropy in information theory.horses. P. 1 i m. H 21 41 8 1 161 641 641 64121 + 1 = 6 H 4 1 ~1 2 2H ~ 21 +1 2 4H 21 2 ~ + ~1 +1 +8H 2 1 16H 4 1 4 1 4 1 2 4 (2. the sequence 000 will be transmitted for 50 percent of the horse races. In 1948 Shannon [70] developed a measure for the uncertainty of a message. in our example ~ ~1 ~ ~1 ~1 ~ ~ ~1 H(p 1p2pm) that fu ll s this property we require. the original H should be the weighted sum of the individual values of H [ 7 0 . Our notation for n is m. More precisely..If a choice be broken down into two successive choices. the events are Horse 1 wins.If all the pi are equal. In the horse race example the uncertainty would be higher if all horses had the same probability of winning the race. Horse 8 w i n s w i t h the probabilities given above. then H should be a monotonic increasing function of n. or uncertainty. 10]. P. Shannon states three desirable properties for such a measure.[70. 3. The uncertainty depends on how likely it is that an event occurs. For the horse race example we have m = 8. To have a measure that. having ve successive choices.

1: Decomposition of the choice for horse race example In our example the choice is broken down into ve successive choices. m According to [70] the only H that satises the three assumptions above is of the form: X H=K p i= 1 i log(pi) where K is a positive constant and log is the logarithm to the base 2. entropy provides us a measure of how many yes/no questions we need on average to identify the actual value of a discrete random variable. consequently there is the coecient of Figure 2.1) before the second term. In half of all the races we would only need one question to get a yesand we are done. only occurs in 1 1 6 1 2 in Equation (2.Figure 2. this would mean how many yes/no questions we need on average to nd out the winner of the race. For the other horses we will . Decision number two only occurs half of the time.1). The last term in Equation (2. For entropy K = 1 is used.1. Based on a top down view. For the given probabilities it is intuitively clear that we rst ask Has horse number one won the race?. For 75 percent of the races we will get a yes with at most two questions. which will be used throughout the whole thesis. Based on the horse race example. choice number ve in 1 1 1 =1 2 2 2 2 of the cases. If we get a nothe second question would be Has horse number two won the race?.

Horse 4 will be dealt with in the same way. 01.3) bits. Consequently we will only use one bit that represents the number of Horse 1 where the value of the bit is 1. By changing the representation format we now have an average length of 348 = 325 bit. in other words. the entropy H is equal to two: ~ ~ ~ ~ ~ 1 ~1 H = 2 log 2 +1 4 log ~1 +1 8 log ~1 +1 16 log ~1 +4 64 log ~1 =2(2. Or. If we would like to transmit the winner of the horse race we will have to transmit 1 bit in 50 percent of the races.000001.4) 64 4 8 16 . but if we do not have the bit representation of a more complex scenario how should we get the average number of bits that have to be transmitted? This is where entropy is used. for given probabilities. For this example this is simple. if the answer is yesa 1 will be used. For the Horses 5 to 8 the probability of winning is equal thus we cannot get any advantage from asking questions in an appropriate order. Consequently. for all the remaining horses the same number of bits is the minimal choice. Horses 5 to 8 will have the bit streams 000000.continue in the same way. The probabilities are used to ask the questions in an appropriate order to get the answer with the minimum of yes/no questions. if n o a 0 will be used. Entropy provides. For Horse 3 we ask three yes/no questions so the representation would be 001. So the expectation of the number of bits that we have to transmit will be 051+0252+01253+006254+00156256+00156256+00156256+00156256 = 2 (2. 2 bits in 25 percent of the races and so on.000010 and 000011. So we do no longer use the numbers 000 to 111 for Horse 1 to 8 for representing the number of the winning horse. each yes/no question is seen as one bit. If Horse 1 wins the race we will only use one yes/no question. If Horse 2 wins the race we will need two yes/no questions so two bits will be used. For the simple case the average message length was 3 bits (000 to 111). The answers of the yes/no questions can be directly related to sending the winner of the race as a bit stream. For the horse race example. 0001. the number of bits that are needed to transmit the information (in our case the winner of the race).

0334 4.z a n d a space character -. For the representation of 27 characters 5 bits are needed.0575 4.1 0.Ai a b c d e f g h(pi) pi 0.0006 10. Suppose the task is to transmit an English text with signs a . Suppose that each horse wins with an equal probability of 1 8. We still use a simplied scenario so the focus can be put on the real important aspects.0313 5.1 0.0913 3.0263 5.9 0.2 Ai h i j k l m n h(pi) pi 0.1928 h(pi) 7.1 0.9 10. the entropy H will be three.0 0. transmitting an English text with 1000 signs would lead to the transmission of 5000 bits. Table 2. According to [44] the information content h of an event Ai can be dened as !1 h(Ai) = log P (Ai) where P(Ai) is the probability for event Ai. In this case.1 5.0073 0.7 0.1 Ai o p q r s t u h(pi) pi 0. The entropy gives us the minimum number of bits that are needed on average to transmit information.0069 0.9 Ai v w x y z - pi 0.4 7. Consequently.1 shows the probabilities and the information content based on an average English text.0689 3.5 0.4 0. if we use more bits than specied by the entropy we are wasting resources as we do not represent the information with the minimum number of bits that can represent the same information.0285 5. We have a total of twenty-seven letters.0508 4.9 0. The entropy based on Table 2.8 0.4 2.3 0.0084 6.0007 0. So in a simple mapping each sign would have a 5 bit representation. It is clear that e w i l l be more likely to occur than z .1 leads to a value of 4. So in theory there exists a representation of the 27 signs ensuring that for 1000 signs only 4110 bits have to be transmitted on average. We now look at a scenario that is more likely to be used in the real world than just transmitting the winner of a horse race.7 0. .3 0.4 Table 2.0008 10.2 0.0706 3.1: Probabilities and Shannon information content for an English text.0355 4.1 0.0133 6. In other words. So transmitting the winner of the horse race with equal probabilities will be more costly.11 bits.0235 5. than transmitting the winner in the example above.0173 5.0567 4.2 6.9 0.0596 4.0119 0. see [44] which means that the winner of the horse race can be represented by using two bits on average.0192 5.3 0.0599 4. 1 i m. in the sense of transmitted bits.9 0.0164 0.0128 6.

1. Nevertheless. which is one. 1 m. 2.1.2: Probability-based coding scheme a reduction of almost 20 percent. Entropy has a few interesting properties that make it reasonable to be used as a measure of information. If we reduce the probability for Horse 1 to increase from 2 to 216. for reduction of transmitted data the entropy gives us valuable information about how close we are to the optimal transmission. Compared to the simple 5 bit encoding the number of bits for transmission of an English text can be reduced by about 18 percent.2 shows a straightforward approach to encoding the 27 signs based on the probabilities in Table 2. Changing the probabilities toward a uniform distribution increases H . The average number of transmitted bits per sign is 4. Based on a given m. Table 2. The entropy gives us a lower bound for the length of the representation which can be achieved. H will . H = 0 if all the pi except one. For more properties of H. The gure clearly shows the properties of H 1 4 and increase for Horse 3 by 1 4. H attains its maximum (log m) if all the p i are equal who will win.Ai a b c d e f g Code 0111 11111101 11100 11011 0100 111101 11111100 Ai h i j k l m n Code 11010 1000 111111111111 1111111100 11000 11101 1001 Ai o p q r s t u Code 0110 111100 11111111110 1011 1010 0101 11001 Ai v w x y z - Code 1111111110 11111110 1111111101 111110 111111111110 00 Table 2. The most important ones for this work are listed here.12 in this case.11. are zero. Figure 2. so we are most uncertain 3.2 is a plot of the entropy for an example with 2 cases occurring with probabilities p and q = 1 p as a function of p. This is the case if all horses win with the same probability. see [70]. We are sure who will win. so that for the straightforward approach we do almost reach the ideal representation which would lead to 4. In the horse race example this would be if all races are won by the same horse.

There exists no representation of the same information that can make use of less than this average number of bits. . p . see [ 70] as stated above. Consequently we are now located at the receiving end of a connection. 2: H ( p ) vs. Consequently entropy can also be used as a measure of data compression. we do not longer try to encode data for transmission.F i g u r e 2. Therefore we now change the view. Increasing p further reduces H again. Besides ideal transmission / data compression there is one further major domain where entropy is of great importance: encryption. Data transmission such that the expected number of bits equals the entropy can be seen as ideal data compression. but we try to decode a transmitted bit stream. To encode text such that only a minimum number of bits have to be transmitted can be seen as data compression. The entropy is zero if p = 0 or p = 1 which means we are sure about the outcome. H increases to one at the point where p = 05 which means that both cases are equally likely.

Thus entropy can be used as a measure whether dierent probabilities for dierent bit combinations can be identied. .. The receiver will receive a bit sequence where the number of zeros and ones is equal (64 in our example). A major goal for a good encryption algorithm is to remove any predictable behaviour present in the source data. A large value for the entropy means that the used sign to bit mapping is close to the ideal mapping.. Let us focus on compressed t r a c again. As already argued above. because such information can be used to fully or partially decrypt the content.. Horse 1 will win 32 races. This is a method to decrypt a message by only using well known statistical information. The second highest probability is assumed to be an e . Proceed in the same way for a few further signs. consequently we have to transmit 16 times 0 and 16 times 1 (01). consequently we will transmit 32 times 1. Horse 8 winsequals 000011). . Consequently if we are receiving a bit stream where the entropy is high it can be assumed that the data has gone through a data compression or an encryption process.For the example of the English text.1. q a n d z . Horse 2 will win 16 races. Then decrypt the text based on your assumptions. Find the bit representation with the highest probability and assume this is a -. The winners are encoded by our compressed representation (Horse 1 winsequals 1 . As an example of receiving a compressed bit stream. A data source having equal probabilities leads to high/maximum entropy. A full list of the number of transmitted bits per winning horse can be found in Table 2. compare the words to English text and adjust your assumptions slightly to t to English words. Removing predictable behaviour would mean to have equal probabilities for 0 a n d 1 b i t s and even for all bit combinations like 010and 1 1 1 . The approach to decrypt the text is straightforward. we assume that each letter is encrypted by using a unique 5 bit representation. We now look at the probabilities for 0 a n d 1 b i t s . we assume that we are receiving the winner of 64 horse races with the probabilities for winning given in Figure 2. Someone who would like to decrypt this text being aware that the text is English has now the possibility to decrypt the text simply by using probabilities of signs in an English text.3. equal probabilities lead to maximum entropy. Consequently a good encryption algorithm would have equal probabilities for any bit strings within the data. The lowest probabilities correspond to the letters j .

Consequently the probabilities in Table 2. the probabilities would be slightly dierent. which would have to be calculated based on all documents that exist in the world.3: Number of transmitted bits for 64 horse races because these two processes have the goal to encode data such that the entropy is high. So entropy can be used as a measure whether the data is encrypted/compressed or not. Furthermore there is often only a sample available. which is used as a basis for calculating the entropy. For data compression the goal is to reduce data to a minimum. The probabilities in Table 2. 1 i m are known. 2. I f we would use a dierent document instead.2 Entropy Estimation The Shannon entropy can only be calculated if the appropriate values for p i. Clearly it is not feasible to calculate the true entropy based on all documents in the world. This is only true if the probabilities are based on an i n n i t e amount of input data. whereas in encryption the goal is to remove any predictable behaviour that is present in the source data. This would not be i n n i t e but the closed set as stated above. I f we would assume that this is the only document that exists in the world.1 are based on The Frequently Asked Questions Manual for L i n u x . Consequently entropy estimation plays an important role. in the above case only a specic document. The reasons to do so for the two processes are dierent.1 are estimations of the true probabilities. the probabilities and the entropy would be accurate.Horse sequence Horse1 1 Horse 2 01 Horse 3 001 Horse 4 0001 Horse 5 000000 Horse 6 000001 Horse 7 000010 Horse 8 000011 wins 32 16 8 4 1 1 1 2 #1 32 16 8 4 0 1 1 2 64 #0 0 16 16 12 6 5 5 4 64 Table 2. or a closed set of data. This is the same for a wide range of application areas where entropy is used. . depending on the document.

I t is obvious that the longer the sequence of races is.i. The sequence was produced by a random number generator using the probabilities from Section 2.i.XN) drawn from P.4: Probabilities and information content for a sequence of horse races the results of a sequence of horse races.91 4.07 0. .91 Table 2.Horse Horse 1 Horse 2 Horse 3 Horse 4 Horse 5 Horse 6 Horse 7 Horse 8 wins 13 6 4 2 1 1 1 2 pi 0. .32 2.91 4.03 0.91 4. sample (X1 . . As shown above in the horse race example there is a relevant dierence between the entropy and the estimated entropy.03 0. see [57] for more details.43 0. Given an i. The probabilities and the information content of each event based on these results are shown in Table 2. As this is also true in general there has been lot of research on entropy estimation. we consider the problem of estimating the entropy H(P) or some other functional F = F(P) of the unknown distribution P.[2. P.21 2. Mathematically formulated based on [2] the estimation of the entropy is: Suppose P is an arbitrary discrete distribution on a countable alphabet A. .d.03 0.39 instead of 2 for the real probabilities.1] This is exactly what we have done above for the horse race example.91 3. the more accurate the entropy estimation will be.. The entropy based on the estimated probabilities is 2. in our example the number of horse races. It is obvious that for dierent sets of races we will get dierent probabilities as long as the sequence is not i n n i t e .d sample is an independent and identically distributed sample.4. Dierent estimators were suggested and the most important ones for our work are now presented here. Producing a further random sequence of horse races with the same length will lead to dierent probabilities and thus to a dierent value for entropy.1. N is the length of the sample. and an i.20 0. Suppose we have the winners H1-H2-H1-H2-H1H8-H7-H3-H6-H1-H5-H1-H4-H3-H3-H1-H2-H8-H1-H1-H2-H1-H4-H1-H1-H1-H2-H1-H2H3 in a sequence of horse races.13 0.91 3.07 h(A i) 1.

=1

The goal of entropy estimation is to make the dierence between the true entropy and the estimated entropy small independently of the distribution p. The simplest estimator is the maximum likelihood (ML) estimator [2, 72] which is also called the plug-in or naiveestimator. The approach is straightforward: we count the occurrences ni , 1 i m of each event Ai, 1 i m. The frequency fi of Ai in the sample is then niN. The entropy of the sample based on the ML estimator is:

m

HM L E

= X fi log(fi)
i= 1

This leads to a systematic underestimation of the entropy [57],

Ep( HM L E )

H(p)

where Ep() denotes the conditional expectation given p. Based on the above estimator several variants have been proposed with the goal to reduce this underestimation. Miller-Madow [48] introduced a bias correction to the maximum likelihood estimator (MLE) M1 +___ 2N

HM M =

HM L E

where M is the number of bins with nonzero probability. A third version of the MLE is the jackknifed version [19], N 1 HJ K
=NH M L E
N X

HM L E j

Nj where HMLEj is the maximum likelihood estimation based on all but the j-th sample. Suppose we have a uniform distribution with m = 256 and use the MLE for dierent N. We performed a Monte-Carlo experiment producing 100.000 sequences of length N. The entropy was estimated using the MLE. Figure 2.3 plots the histograms of the Monte-Carlo experiments. On the x axis the estimated entropy is plotted, on the y axis
P( HM L E )

which is the probability of HMLE. The true value of H is eight. Figure 2.3

(b) N=100 (a) N=10

(c) N=256

(d) N=500

Figure 2.3: MLE distributions for m = 256 and dierent N

shows the most basic fact about entropy estimation: the variance is small and the bias is large until N m. This statement holds even for the above estimators that are tailored to bias reduction [57]. Given a xed m the estimation of the entropy is closer to the true value the greater N is. In [57] Paninski performs a detailed analysis of the bias. He proposes an estimator based on Bernstein approximating polynomials, which are dened as linear combination of binomial polynomials. This estimator allows the experimenter to decide whether bias reduction or variance reduction is of greater importance. The Paninski estimator still leaves open problems, which prevent the estimator from being the ultimate estimator. Schuermann gives the following statement about the Paninski estimator: Unfortunately, the good approximation properties of this estimator are a result of a delicate balancing of large, oscillating coecients, and the variance of the corresponding estimator turns

out to be very large [57]. Thus,

even when N m is small. the practical usage of the Paninski estimator is limited. We are using the Matlab [76] random number generator (rand) to generate uniformly distributed numbers in the half open unit interval ([01[). Consequently. The smaller the ratio. The higher the value of N m .1. which means N = m. We performed a run of length 8 with the random number generator and got the values [0718001889043390045606555 078100862907854]. the easier is the estimation problem. The latter also depends on whether the experimenter is more interested i n reducing bias than variance.to nd a good estimator. where the winners are based on the probabilities given in Section 2. It is even possible to estimate the entropy on m bins when you have less than m samples available.3 Undersampled Estimation The diculty of estimating entropy directly depends on the quotient of the number of samples available and the number of events N m . The result is a regularized least-squares problem. or vice versa. [05075[ to Horse 2 winsand so on. Based on this sample we now estimate the entropy using the algorithms presented above. The Miller-Madow estimator is in the middle. Paninski [58] shows that it is possible to accurately estimate the entropy on m bins. For all three estimators the bias is negative. 2. The Maximum Likelihood estimator has the greatest negative bias and the jackknifed the smallest. one has to minimize bounds on bias and variance simultaneously. Suppose we have a sequence of eight horse races of eight horses.5. L296] Consequently. But as stated above. the usage of the Paninski estimator is limited due to its complexity. For the Paninski estimator the code provided . given N samples. P. the winners for the races are H2-H1-H1-H1-H2H3-H3-H3. whose closed-form solution is well known. For the Maximum Likelihood. Especially entropy estimation in the range where N < m is not a trivial task. where values [005[ are mapped to Horse 1 wins. the Miller-Madow and the jackknifed estimator the results are as expected. The results are given in Table 2. However. one can only hope that the solution of the regularized problem implies a good polynomial approximation of the entropy function. the harder it is.[68.

To optimize the parameter set is not a trivial task and.1. 2.9409 1. so we focus on two uniformity tests that are inuenced by entropy but do not need the estimation of the entropy. . The major nding of [59] is that for a uniformity test N m samples are needed.5613 1. In the range where N is much grater than m.6863 1. The entropy in this case is two. For an entropy-based test for uniformity we would need an appropriate estimator for the entropy. the major reason being is that the parameter set for the estimator is not optimized for our use case. as not a major need for our work. it was not performed. For horse races where all horses win with the same probability the entropy is three. Suppose we have horse races with the above given probabilities of winning. see [59] for details. Paninski [59] analyses how many samples N are needed from a distribution p to decide whether p is uniform or not. The bias is greater than expected. which states that for uniformly distributed birthdays coincident birthdays are least likely than for any other distribution.Estimator Maximum Likelihood Miller-Madow jackknifed Paninski H 1.5: Undersampled Entropy Estimation for a sequence of horse races in [56] was used.486 1 Table 2. For m N Paninski relates this problem to the birthday problem [15] and uses the so-called birthday inequality. fewer samples will have the problem that for at least one distribution the non uniformity will not be detected. the answer regarding uniformity can be given by the chi-square theory. Consequently entropy can be used as a measure of uniformity. see Section 2.4 Entropy-based Test for Uniform ity A key characteristic of entropy is that if events are more equally likely the entropy will be higher. As stated above this is hard to retrieve.

e. the N-truncated entropy is used instead. Generate all possible words w of length N according to p. The N-truncated entropy is then the average of the MLE estimates. Checking for uniformity is then straightforward. motivated by the problems of estimating the entropy based on a sample of length N accordingly.11 ) Given that pi = 1m for all i . Based on a sample of length N estimate the entropy using MLE and compare the result to HN(U). HN(U) is equal . The Kullback-Leibler distance is used to compute the divergence from one distribution to another one. equation 2. " N HN(p)= X 1 + + nm n ! !# n1++nm=N pn m 1 m X ni N log(2. This can be seen as special case of the Kullback-Leibler distance [11]. The N-truncated entropy HN(p) is dened as follows. sum up all MLE estimates and divide by the number of words to retrieve HN(p).10) ni N nm 1 p i=1 where (2. According to [55] HN(p) can be calculated by direct summation.A second approach is presented by Olivain and Goubault-Larrecq in [55] where.10 can be simplied to " 1 Xm X ni HN(U) =___________________________________ _________________________________________(2. The closer the estimated value is to HN(U) the more likely is it that the underlying distribution is uniform. which means that p is the uniform distribution U. In our case the reference distribution is the uniform distribution. Based on the example in Section 2. Estimate the entropy based on maximum likelihood for all words w.12) mN n 1 + + n mN n1++ nm=N i=1 N ! !# Nlogni The maximum likelihood estimator can then be used as an unbiased estimator of HN.3 where we have a sample of size 8. i.

to 2. The random number generator produces the values [071800188904339 .246.

The second open point is the calculation of HN(U) which is getting unmanageable.004 bits. The N-truncated entropy of a uniform distribution can be calculated by + cj1 (j 1)! logj + o ( 1 ) ( 2 .1556 which is closer to the truncated entropy of a uniform distribution which is 2.25[ is mapped to Horse 2 wins and so on. We conclude that it is more likely that the sequence H6-H2H4-H1-H6-H7-H7-H7 is based on a uniform distribution than the sequence H2-H1-H1H1-H2-H3-H3-H3. for m m = 256. What is still an open question is: With which probability is one of these two sequences based on a uniform distribution? To answer this question we have to take a look on the condence intervals. Details about the estimation of the condence intervals are presented in Chapter 6.0.125[ is mapped to Horse 1 wins.246. The estimated entropy using MLE is 2. An example plot of this error term. can be found in [55]. Consequently. [0. The entropy estimated by MLE is 1. [0.14 ) is a good approximation of HN(U). based on the probabilities given in Section 2. 1 3 ) X HN(U) = logm + log c ec j=1 where c = N and o(1) is an error term. the sequence is H6-H2-H4-H1-H6-H7-H7-H7.125. for this example.5613. For the same random numbers. .0.0045606555078100862907854]. For m = 256 the error term is never more than 0. The authors of [55] present solutions for an approximation of HN(U) and for estimating the condence intervals. given equal probabilities.1 the winners for the races are H2-H1-H1-H1-H2-H3-H3-H3. consequently Olivain and Goubault-Larrecq state that + j1 c X HN( U ) l o g m + l o g c e c j=1 (j 1)! logj (2.

5 Alternative Tests for Uniformity Beside entropy there exist several tests for uniformity.16 ) Based on the above example of the English text E(I c) is equal to 00783. one would take two signs out of a text sequence and determine how likely it is that they are identical.2. For a uniform distribution the value of Ic is lower than for any other distribution. .1). The chi-square test can be used to determine whether an observed frequency distribution diers from a theoretical distribution or not. If we assume a uniform distribution with the same length of the alphabet m = 27. if we take two elements out of a sequence. Most of these tests are used in cryptography [47]. For a selected subset of them. More details about the Maurer test can be found in [47]. Based on the above example of the English text (see Section 2. a . More details about the chi-squ are test can be found in [47]. This test is for example used for Skype detection [5]. details will be presented in this subsection. that they are identical. As compressing the sequence is to time consuming. Maurers test uses the fact that it is not possible to signicantly compress a uniformly distributed sequence without loss of information. The index of coincidence Ic of Friedman [22] determines how likely it is. The index of coincidence can be calculated by the following equation: f (fi_____1) Ic =____________N ( N 1 ) Pm ________i=1 i (2.15 ) The expectation of the index of coincidence E ( I c) can be calculated by the following equation: E ( I c) = p2 i X m i=1 (2. Maurers test instead computes a quantity which is related to the length of the compressed sequence. If the theoretical distribution is the uniform distribution. the test can be used as test for uniformity.z a n d -.

Note that the factor between the expectation of the index of coincidence of a uniform distribution and the expectation of a natural language is in the range of two. has a lower Kolmogorov complexity than the sequence a1rz6e2i8tqw03unmk79hs5owhere the only possibility to reproduce the sequence is to print the full sequence. The Kolmogorov complexity [11] of an object. the Kolmogorov complexity can be used as an indication for uniformity.E(I c) is 0037. is a kind of measure of the minimum length of a program that is needed to reproduce this piece of text. such as a piece of text. Since a uniform distribution produces sequences that are hard to be expressed by a shorter representation than printing the full sequence. . For example the sequence ababababababababababababwhich can be expressed by print 12 times a b .

All these aspects belong to pattern recognition and classication. Compared to humans. height divided by the width. as we have been trained for millions of years as this has been crucial for our survival. Therefore we present the core concepts of pattern recognition. encrypted t r a c and unencrypted t r a c . For humans pattern recognition and classication is an easy task. Features can for example be the height. read handwritten characters. width. I n the case of handwritten characters we would classify them according to a character which is stored in our brain and looks similar to the character under investigation. The core basics of t r a c classication are in pattern recognition. understand spoken language even with accent or decide whether a meal is tasty or not. I n the case where we see a new human face we would almost automatically classify it into b e a u t i f u l o r not beaut if ul. Humans are almost perfect engines for pattern recognition and classication. The ndings within this chapter how such tasks can be performed by computer systems are based on [18]. For us it is quite simple to recognize a face.3 Pattern Recognition The RT-ETD is a t r a c classier with two classes. computer systems are at the very beginning of pattern recognition and classication. The ability to take raw data and execute a fast action based on patterns extracted from the raw data is a key task for humans. I n pattern classication a main part is the categorization into pre-dened classes. A feature is a characteristic that can be extracted from a character. fraction of straight . A few terms of pattern recognition are used throughout this thesis and will now be dened based on the example of reading handwritten characters.

After the selection of the appropriate features. The rst one is the pre-classication process. Using many features leads to highly complex patterns and tends to be too specic. This description is so specic that dierent patterns can be distinguished from each other. Too few features would lead to a worse dierentiation between the patterns. Questions like How much features should be used?or Which features are the most promising ones? seem to be trivial but they are not.1 Pre-Classication Before a system can start to classify.Unsupervised Learning . Pattern classication can be seen as process including two major steps. I n the case of handwritten characters this would allow to uniquely describe each letter based on this set of features. a few steps have to be performed in advance. where the two most important ones are: 1.lines compared to curves or angle of the longest straight line. The major task in feature extraction is to nd the right set of features. As a rst step one has to extract features that allow to uniquely describing the patterns. which means that slight variations would not be detected. 3. Choosing the wrong features will result in increased complexity without increased performance. 2.Supervised Learning 2.Learning. There are three main methods for this: 1. The second step is the process of classication. a histogram or density function for each character and each feature has to be dened. A pattern would be the description of one character based on a set of features.Feature Extraction. A pattern is a description of a letter that allows dierentiating i t from any other letter.

The main input parameter is the number of clusters that the system should use. the classier is evaluated using a test data set. Consider the hand written text in Figure 3. This term is often used in t r a c classication. After the classication the classier will get the feedback whether the classication was appropriate or not. After the training phase. which character is present in the raw data. in which case the classier is not able to accurately classify slight dierences compared to the training data set which are present in the test data set and will probably be available in real data. A further challenge is to determine where one letter ends and the next one begins. The test data set ensures that the classier is able to generalize. where the character is classied based on raw data. First of all the text is not horizontal. Classication is the extraction of the above dened features from raw data followed by the decision.2 Classication The second step in pattern classication is the process of classication. Especially between Letters three and four this is hard. The ability to accurately classify the training set but failing with the test data set is known as overtting. a teacher provides the appropriate character for each handwritten character in a training set. which has to be taken into account. who assumes that there . I n unsupervised learning. A human reader.3. As in supervised learning the target character has to be known. there is no pre-classied data set. based on the values of the appropriate features. also called clustering. The best way for training a classier would be reinforcement learning. 3. Reinforcement Learning I n supervised learning. A set of appropriate values of all used features that represent a pattern is also called signature. Based on raw data a few problems can emerge. For the rst letter it is not possible to determine the feature height as parts of the line are missing. The system performs a grouping. For the last letter it is hard to determine if it is a c or an e .1.

A rule in pattern recognition is to use as much pre-dened knowledge as possible for classication. If the feature values extracted from the raw data of a character would have exactly the same signature as a pattern it would be easy to nd the appropriate letter. The a priori probability P(e) = 00913 and P(c) = 00263 (see Section 2. automatically recognizes it as an e . P(wjx) = p(xwj)P(wj) p(x) . based on our features. To combine a priori probabilities and feature based probabilities the Bayes formula is used. In case of the fact that some of the values are apart.1 without any usage of information gathered from previously read characters. What the human reader uses here is called context. The context is information gathered not from the raw data of the character itself but in this case the information of the characters before.Figure 3. We assume that.1) can be used for the decision. As the word automatc does not exist. But what does closestmean in this case? For each feature value a probability of each pattern has to be dened. that it is an e w i t h a value of 04. For example if the height to width ratio is two. it is much more likely that the character under investigation is a b t h a n an a . The decision should not be based on this information only. the last letter has to be an e . the main reason being that with pre-dened knowledge the number of features that have to be extracted can be reduced. the closest pattern should be chosen. What else can be used here is the fact that the letter e i s much more likely to occur in English than c s o also this fact can be used in such a case. we calculate the probability that this is a c w i t h a value of 06. Suppose we would like to classify the last character in Figure 3. This is where the Bayes decision theory comes into play. For classication we would like to use a priori knowledge of the probabilities of characters in English text.1: Example of handwritten text are no missing letters.

For the given example. Note that the evidence can be neglected for this decision. In general it should be chosen the character with the greater value for the above. So it is intuitive that it should be decided that the character is a n e . The Bayes decision rule combines prior probabilities and likelihoods to achieve a decision with the minimum probability of error.4 for an e a n d 0. . So in all scenarios where the amount of misclassication should be reduced to the minimum. For our given task to classify t r a c in encrypted and unencrypted t r a c the Bayes decision rule can be used for ports greater than 1023 where the fraction of encrypted t r a c on the port under investigation is used as a priori knowledge.6 for a c .1) and our assumed probabilities of 0. P(c) = 00263 (see Section 2. where P(e) = 00913. the Bayes formula is of importance. as the major goal in classication is to reduce the amount of misclassication to a minimum. P(wj) is the a priori probability of character wj and m p(x) = p(xwj)P(wj) j=1 X where m is the number of characters. This task will be investigated in future work. It is shown in [18] that this decision will minimize the probability of an error. we can calculate the probability for e a n d c . The probability p(xe)P(e) is 003652 and p(xc)P(c) is 001578. So choosing w1 if p(xw1)P(w1) > p(xw2)P(w2) and w2 otherwise minimizes the probability of error. the posterior probability for each class has to be calculated. The nal decision should then be based on choosing the character with the greatest posterior probability.likelihood posterior = prior evidence where wj is a character and x is the feature value.1 can also be expressed informally by saying In order to decide to which class the character belongs. p(xwj) is the probability of feature value x for character wj. Equation 3.

Details about Bayes decision. where e. The appropriate setting for this depends on the application that uses our pre-ltered data. . Also in our application domain. where the risk should be minimized instead of the error can be found in [18]. Risk is a function or value for the consequences of a misclassication. based on a scan of a ngerprint access should be granted or not. I n a dierent classication problem. as it violates user privacy. For the given example to classify a n e as a c has the same risk as to classify a c as an e .. it is obvious that risks are not equal.Above we assumed that the error for misclassication is equal costly. the risk of forwarding unencrypted t r a c can be higher.g. which means the risks for making the wrong decision are equal. The risk of granting access to someone who should not have access is much higher than the risk of rejecting someone that should have access. than as dropping encrypted t r a c .

bulk. peer-to-peer (P2P) communication such as e-Donkey. the Domain Name Service (DNS) [51] protocol for communication with the domain name servers which perform the mapping of domain names to Internet protocol (IP) addresses... Kazaa or Skype. for example POP3S. are well handled in t r a c classication. Some of the trac classication algorithms are able to identify even the encrypted versions of these protocols which are normally labelled by an S . . [52]). Protocols that are often targeted by classiers are the File Transfer Protocol (FTP) [63] which is used for le transfers. 36. And for sure the well known application protocol HTTP (Hyper Text Transfer Protocol) [21]. 30. where no server is involved in communication. is target of trac classication. interactive.g. the network management protocol SNMP (Simple Network Management Protocol) [7] or the Network Time Protocol (NTP) [49] used for time and date synchronization between computers. The goal of trac classication is to identify the type of trac carried on links [6. Classication is performed according to application protocols or to a class of applications (e. Additionally the most important mail protocols. used in web surng. the Secure Shell (SSH) [81] protocol used for secure remote management. and the two protocols used for receiving emails POP3 (Post Oce Protocol version 3) [53] and IMAP (Internet Message Access Protocol) [12]. are responsible for a large fraction of trac and are consequently target of trac classication. WWW. HTTPS or FTPS. In recent years.4 T r a c Classication Trac Classication is the decomposition of network data based information. . gathered from t r a c on network links. 54]. the Simple Mail Transfer Protocol (SMTP) [37] used for sending emails and transmission of emails between mail servers.

Signatures based on dierent features can be very specic or wide ranging but should always ensure that the following key requirements are fullled [25]: 1.e. 6.e. where the t r a c is handled dierently based on classication. The rst requirement ensures that the approach is useful at all and is. For t r a c classication.e. 4. 65] or weighted fair queuing [3]. Requirement 5 should .robustness against asymmetric routing.lightweight and fast. right at the beginning of a connection. 2. as an increased QoS level should be applied right from the beginning of a connection. such as network attacks and other security violations. Requirement 3 plays a central role. Furthermore the legal discussion about Internet usage in the eld of peer to peer (P2P) le sharing between P2P communities and intellectual property representatives is a motivation for t r a c classication [40]. i.The results of classication are needed for a sophisticated network management. Quality of Service (QoS) mechanisms. The ability to understand the type of trac on network links can be further used to detect illicit t r a c . t r a c classication should work regardless of which direction of the trac is used for classication.early identication of applications. all approaches use signatures (which is a subset of features. a low misclassication error rate. i. see Chapter 3).good accuracy properties over an extended period of time.e. low evaluation overhead and short evaluation time. able to identify a range of dierent applications. the key requirement within all approaches. i.Accuracy. 3. consequently. such as token bucket mechanisms [16.wide applicability. 5. are nowadays used in networks. Modern rewalls and Intrusion Detection Systems (IDSs) perform access policies based on trafc classication to protect networks from security violations. i. In cases where the result of trac classication is used to map ows to QoS classes. As network t r a c may be routed on dierent links. Requirement 2 is motivated by the demand for online-operation on high-bandwidth links.

also such approaches are of high interest. such as FTP. Further information about t r a c classication is given in [30]. SSH or HTTP were designed to use wellknown port numbers [27] for the communication between clients and servers. These applications are either using the connection oriented bi-directional communication protocol TCP (Transmission Control Protocol) [62] or the connectionless uni-directional communication protocol UDP (User Datagram Protocol) [61].1 shows a short list of the most important well known port numbers. relative proportions may change. Such highly specic classication approaches can be integrated into bigger classication environments. Both protocols provide the port numbers of the connection within their headers. Table 4. However. which should not prevent appropriate t r a c classication. especially Requirement 4 and 6 are often neglected.1 Port-based Classication Early network applications. During the 1990s. The major advantage of such a classication approach was a good performance to cost ratio. The task of trac classication was simply to classify t r a c based on port numbers in TCP/UDP headers. Nowadays specically targeted approaches exist that are only able to classify a single application (neglecting Requirement 6). 4. Consequently. signatures have to be changed for sure.ensure that the mechanisms work for a changing network t r a c mix on the monitored link. Later on we will investigate the aspect that not all t r a c classication approaches f u l l all the requirements. port-based t r a c classication was the state of the art in t r a c classication [67]. a full list is available at [27]. If the application itself changes. In situations with symmetric routing Requirement 4 becomes unnecessary. t r a c classication used to be straightforward. as protocols exist where it is complicated to classify them appropriately and a reasonable amount of eort is needed. Requirement 6 ensures that a major subset of the t r a c on the monitored link can be classied appropriately. or between peers. Trac classication based on port numbers is implemented . As network load changes.

port-based classication was the only method needed for a proper classication. In two other areas. During the rst years the presence of le sharing t r a c was limited. developers of upcoming le sharing applications (e. News and NTP port-based t r a c classication approaches still lead to a classication with more than 98. Up to the year 2000.g.Port Application FTP 20/21 SSH 22 Telnet 23 SNMP 25 DNS 53 HTTP 80 POP3 110 443 HTTPS IMAP 993 Table 4. during high load. such as passive FTP download or many P2P applications. Nowadays port-based t r a c classication algorithms can still be used to detect legacy applications with high precision and recall. Cisco NetFlow [10]). SNMP. Port-based t r a c classication on the full set of trac leads to worse results.1: Important well-known port numbers in commercially available products (e. namely if the application is using dynamic ports. but . based on port numbers (e. thus port-based classiers were still valid for a major part of the t r a c . The overall amount of such hidden t r a c and thus the inuence on classication is still limited [36]..g.. VoIP is short for Voice over IP where IP is the well known Internet Protocol. For DNS. the accuracy of port-based classication is limited [34.g. 69]. As le sharing applications are nowadays responsible for a major part of network t r a c . such as 80 or 443. Recently developed le sharing applications and the well known VoIP Software Skype hide t r a c behind well known port numbers. port-based classication fails nowadays. In the early 2000s. to evade detecting and blocking. Kazaa [39]) started to use dynamic port numbers.9 percent precision and recall [36]. 52. Mail. but using it on a subset of trac the results are still satisfying. The presence of hosts using Skype increased within the last years [77]. given well-known ports up to 1023 higher priority than ports greater than 1023). Such routers allow distinguishing the service quality.

they are implementing mechanisms that allow a widespread usage of their application. but maybe also during normal operation. Applications such as Skype are a security risk for company networks (as they open a communication path to the Internet). The latter is even more important than the Skype case. the motivation for application detection is often to restrict the usage of a specic application or a group of applications. ISPs often oer VoIP services and thus are interested in degrading the service quality of other VoIP services. 4. 33. and trac classication to detect the application t r a c . in order to allow proper operation of other services. This can be compared to the elds of Spam/Anti-Spam or Virus/Anti-Virus communities. So it is more or less a race between Skype developers to hide the t r a c . many company network administrators try to detect Skype applications and try to block the t r a c . at least at times when network load is high. This can be illustrated by a simple example such as the well known VoIP application Skype [71].2 Challenges in T r a c Classication The major question that arises in this context is: Why is trac classication getting more and more d i c u l t ? T h e answer can be given by looking at the motivation for t r a c classication or more precisely application detection. Findings from [36] might already be outdated.we are not aware of a recent work that evaluates the amount of trac hidden within port 80 or 443. degrading service of P2P le sharing applications during high network load is of interest to ISPs. The second major player in Skype detection is the group of Internet Service Providers (ISPs) who would like to detect the trac on their network. Consequently. application developers will try to make detection as hard as possible in order to allow a widespread use of their software. as long as trac classication is done for this reason. 50]. Consequently. Skype developers have the contrary objective. This potentially hurts the vision of network neutrality and on this topic there has been extensive discussion between ISPs and content providers [32. Moreover. . As stated above. as the share of P2P le sharing t r a c is higher than the one from Skype [29].

signatures have been identied that allow classifying t r a c accordingly. . 43]. The major problem is that signatures have to be updated with each update of the application. For detection of P2P applications the signatures are in general more complex. e-Donkey. Consequently. 52. With this approach the quality of t r a c classication based on port numbers degrades. Once one has identied the unique payload signature this approach enables very accurate classication. for the Gnutella protocol it is checked that the rst string following the TCP/IP header is GNUTELLA. In this case a new signature for the new application is needed and the signature of the existing application has to be rened. The challenging task for these classiers is to identify unique payload signatures. As applications hiding t r a c update their t r a c payload on a regular basis. see [69] for details.4. the time consuming identication of new signatures based on each change became inadequate. or even when a new application is developed which uses the same signature as an existing application. For example. Based on this analysis. [69] proposed a signature-based classication method to identify the major P2P applications present in the Internet in 2004. It has to be ensured that the signatures are as simple as possible to allow operation on high bandwidth links. One idea is the usage of payload-based classication [9. there has been a lot of research on developing new algorithms to classify t r a c . where the concept of machine learning is used to detect a range of applications within the t r a c [25. GETor HTTPand if the rst string is GETor HTTPthere must be a eld with a subset of predened strings. together with the fast increase of networked applications.3 Classi cati on in Dynamic Port Environments The rst measure of application developers trying to prevent their applications from being detected was the usage of dynamically assigned port numbers. For each application a signature has to be identied in the trac that is only used by this application. especially in the P2P eld. A new research eld in t r a c classication was introduced. Such signatures can be as simple as that the rst few payload bytes of a connection have a xed signature. BitTorrent and Kazaa as implemented in 2004. In 2004 Sen et al. providing a detailed analysis of the trac from Gnutella. 69].

Host/social information based classication. Signatures for encrypted ows are based on the plain section of connection setup of HTTPS and SSH. such an approach together with portbased classication can increase overall classication results.4 Classication when T r a c is Encrypted As payload-based approaches lead to good results once a signature for an application has been identied. payload-based signatures fail to identify the t r a c appropriately. For partially encrypted payload. Machine learning has been performed with preclassied t r a c . Trac is classied into seven dierent groups of applications which are FTP control. For encrypted t r a c there exist two major research directions: 1. [25] present results from three dierent machine learning approaches. The signatures are solely based on packet payload. Haner et al. HTTPS. For fully encrypted t r a c . Nevertheless.chapter play a central role. . AdaBoost [66] and Maximum Entropy [4]. 2. New algorithms have to be developed that are capable of dening signatures for encrypted t r a c . IMAP. which has already been demonstrated for HTTPS and SSH in [25]. HTTP and SSH. as signatures are generated based on an automatic process. SMTP. applications can be detected without any TCP/IP header information and even signatures for encrypted ows can be gathered. application developers have changed their strategy from changing t r a c proles upon each update to using encrypted payload or partially encrypted payload. 4. The results are interesting because of two aspects. The main weakness of this approach is that it has been only evaluated for applications that mostly use well known port numbers and thus can also be classied using port-based classication [36]. POP3.Packet-based classication. payload-based signatures may still be based on the plain section of connection setup. namely Naive Bayes [64].

inter-packet gaps. size of Packet 2 and 3 .or bi-directionalow are used to decide to which application a ow belongs [13. From the IP header mainly source/destination IP and the protocol eld are used. Encrypted payload packet-based t r a c classication algorithms can use the following features for the denition of signatures: 1. which is a ag evaluated at the receiver that indicates that this TCP segment should be immediately forwarded to the application and not to wait for further segments before forwarding it. Such classication algorithms do not use information that is based on packets from other ows. Packet-based t r a c classication uses a subset of the above features. Recent detection algorithms such as [1] also use the FLAGS of the TCP header for classication. 4. it is the behaviour of a host that is used for classication. for example [1. Beside header elds the packet size of one or several packets can be used.The dierence between the two approaches is mainly the question: What is the central entity used in classication?For the rst approach single ows are used for classication. 8. 20]. For the second. Many t r a c classication algorithms.1 Packet-based Classication For packet-based classication the packets of a single uni.number of transmitted packets. size of Packet 4 is 40 bytes. use information gathered from header elds. All other header elds are marginally used in t r a c classication approaches. Figure 4.1 shows which header elds are likely to be used. in this special case the PUSH (PSH) ag is used. 45. 4. 5.sequence of packets between hosts.header elds. 3. Together with source and destination port from the TCP header we have the classical 5-tuple for classication.packet size. 67]. 2.4. For example the size of Packet 1 is 10 bytes.

Alg orit hm s that eec tivel y use inte rpac ket gap s are rare bec aus e of the .1: TCP / IP Header eld usage for t r a c classication are var yin g.Figure 4.

fact that suc h app roa che s are pro ne to cha nge s in the rou nd trip tim e. as inte rpac ket gap s stro ngly dep end on the del .

ay bet wee n sen der/ rec eive r and rec eive r/se nde r. and . Suc h alg orit hm s hav e to be cust omi zed to spe cic con nec tion s.

Co unti ng the nu mb er of tran .. the ove rall qua lity is still neg ativ ely inu enc ed. due to the cha ngi ng del ay in net wor ks.

A seq uen ce of pac kets me ans that .smi tted pac ket s is mai nly use d for sho rt ows wh ere onl y a few pac ket s are tran smit ted.

the sec ond and thir d one fro m the .the sign atur e use s the info rma tion that the rst pac ket goe s fro m rec eiv er to sen der.

Eci ent ow info rma tion bas ed clas sica tion app roa che s use a co mbi nati on of a sub set .sen der to the rec eiv er and so on.

In the foll owi ng.of the abo ve feat ure s. two exa mpl es of clas sica tion bas ed on info rmat ion gat her ed fro m ows are pre .

whi ch use s sig nat ure s bas ed on info rma tion fro m sin gle ows to clas sify . Bot h exa mpl es are tak en fro m [1].sen ted.

Fig ure 4.2 sho ws the sign atur e use d to det ect a lon g Sky pe UD P pro be.Sky pe tra c. The sign atur e use s the seq uen .

ce of pac ket s bet we en the hos ts. Pro be 1 and 3 in one dire ctio n and .

2 Host/Social Information based Classication Host/Social information based classication algorithms are using information gathered from severalows to predict the presence of an application on a specic host and classify ows according to host behaviour [28.Figure 4. The paper proposes three levels of classication. The main dierence is that header information is used and that only for two out of six packets information about the size is known. Karagiannis et al. whereas for Probe 3 only the oset compared to Probe 1 is known and for Probe 4 we know that the size is one out of a set. On the functional level . The social level only classies hosts based on popularity. 4. From the header elds the information about the PSH ag is used. For Probe 2 the exact size is known.2: Long Skype U D P probe (based on [1]) Probe 2 and 4 in the opposite direction. Figure 4. Further information about the packet size is used in several variants. which is based on the number of hosts communicating with. [35] present a trac classication approach that uses host/social information about a host to classify ows from and to this host. 31.4. 35]. The number of transmitted packets is four.3 shows the signature of the Skype TCP handshake message.

the classication level that is the most relevant one for the work presented here is classication on application level.F i g u r e 4 .similar behaviour as other hosts (community).transport layer protocol. 3 : S k yp e T C P h a n d s h a k e ( b a s e d o n [1]) hosts are classied into provider and consumer hosts. 4.detection of a service on a specic . 5. Classication is based on six heuristics: 1. Finally.the relative cardinality of destination sets (IPs vs ports). 3.average packet size of the ow. 2.

Host or social based classication is often used in combination with other algorithms.4.1 can . 6. The detection of the Skype TCP handshake message as presented in Section 4.presence of payload.host is used for classication of other hosts (recursive).

Dening signatures without any machine learning support is almost impossible because of the frequent changes in applications and the introduction of new applications. or even hides itself behind HTTP or HTTPS connections. the classier can use this host behaviour to increase the overall classication quality. A major . Concerning the specialized approach. 4. Together with the widespread usage of Skype. 77]. e. 5. Skype uses randomly chosen ports. Skype detection is one eld that gained high interest in research. For example the authors of [77] only use statistics about ows for the detection of Skype hosts. Within the last years a huge number of papers on Skype detection has been published. A precondition for a Skype TCP handshake message is a Skype UDP probe where the port on the server side is the same for the UDP probe and the TCP handshake.a detailed and specialized algorithm for detecting a single application. The design of t r a c classication for a large set of applications is challenging. The features used for Skype detection in the aforementioned approaches are widely spread. Header elds. As Skype uses a proprietary protocol where almost all of the t r a c is encrypted. payload-based signatures. 73. statistics on the payload and signatures based on packet directions are as well used as information gathered not directly from the packets of the ow [73]. 2.a widespread combination of algorithms for detecting a large set of applications.g. encrypts the payload.be further enhanced by using host-based information. 41. because a large set of applications needs a large set of signatures for classication.. the detection of Skype ows is challenging. [1. the detection of Skype t r a c is of great interest to the research community. Such an approach allows operating in a privacy preserving environment on high speed links as only marginal information is processed. Consequently.5 Recent T r a c Classication Approaches Recent research activities show two trends for t r a c classication: 1.

dierences in access and backbone networks inuence the quality of trac classication [82]. Most of the proposed algorithms are only applicable to a small subset of network protocols [14. a training le with pre-classied t r a c for all the potential applications is needed which still needs manual classication of a subset in advance. The authors use thirty attributes to classify t r a c using the Kullback-Leibler divergence [38]. However. Simple questions like How much of my trac is encrypted?. 43. the resulting trace le would be of limited value as many classication algorithms would not be able to operate on such a trace. A novel approach that can potentially be used for a wide range of applications. The reasons for this are privacy concerns as well as the problem how to retrieve the ground truth for a trace le. The major problem in actual t r a c classication is the missing availability of trace les to allow a comparison or evaluation of the algorithms based on a mutual basis.research direction is to use approaches that generate their signatures by themselves [25. is presented in [26]. Even if the privacy problem and the ground truth problem can be solved. 26. What applications are present on the network?. To which extent is P2P trac used in my network?or Which application should be used for t r a c classication?cannot be generally answered accurately. . Szabo et al. 46]. To retrieve the ground truth. For supervised learning. To change the network driver on all hosts in the network seems to be challenging. These algorithms are based on supervised or non-supervised learning. To deal with privacy concerns. 26]. there is still the problem to have a trace le that represents a mixture of the trac that would be present in reality. one can remove the payload and anonymize the header elds. In particular. which is also known as relative entropy. called statistical protocol identication (SPID). [74] propose installing a network driver that marks all outgoing packets with an application-specic marking. Currently research is far away from having a trac classication approach that is applicable on a wide range of applications and a wide range of network links.

A real-time system has to satisfy explicit bounds on the response times. but need several packets to classify a ow. or is it able to classify a ow in real-time once all the needed information is received? Within this work we dene real-time such that an upper bound for the response time between receiving the rst packet and classication of the ow has to be specied. What is often meant by a real-timeclassier here is an online or live classier. Further algorithms that only need a few packets at the beginning of the connection are called real-time. Wagner and Platter [78] use entropy of trac parameters such as IP addresses to detect fast changes in network behaviour. For example. Many trac classication algorithms claim themselves to be real-time. such as round trip time or dropped and retransmitted packets. entropy-based classication algorithms have gained interest within the last years. this would mean that there exists an upper bound for the time between receiving the rst packet of a new ow and the time when the ow is classied. Algorithms that are capable of classifying t r a c on high-speed links are often called real-time classiers. it is not possible to give an upper bound for the time needed to classify the t r a c of a o w . In case of a trac classication approach. Entropy-based approaches are often used to detect malicious t r a c [42. 55. 78].4.7 T r a c Classication using Entropy As more and more network t r a c is encrypted. 80]. a worm can be detected when a large number of hosts start to contact a large number of other hosts where all of them are . as once you have received the specied number of packets an upper bound for the classication time can be given. In trac classication the term real-timeis not so strictly dened.6 Online vs Real-Time Classication In trac classication the term real-time is often used in a way that is not in conformance with the actual denition of real-time [17. The major question that arises here is: Is the approach real-time capable from receiving the rst packet of a connection. As interpacket times depend on a lot of external factors. 4.

Pescape [60] uses entropy to reduce the amount of data that has to be processed by t r a c classication tools. Lyda and Hamrock [42] use an entropy-based approach called bintropy that is able to quickly identify encrypted or packed malware. . it is assumed that the connection is subverted. Packets not needed for an appropriate model are dropped. Entropy is used as input for an advanced sampling approach that ensures that sensible information needed to get an appropriate model of the network t r a c is still present. for an HTTPS connection the byte entropy after 256 bytes of payload should be between six and seven. An entropy-based approach which inspired the present work is presented in [55]. I f the value for a specic connection is below this range. which has a vulnerability. For example.using the same destination port. The entropy is used to identify statistical variation of bytes seen on the network. The Ntruncated entropy for dierent encrypted protocols is determined.

only the information that is needed for the specic task should be used.IP addresses are privacy sensitive information even if dynamically allocated. 2. Based on these regulations. The most important regulations that inuence and motivate our work are: 1. Based on the above mentioned laws it is getting more and more d i c u l t for researchers and network operators to f u l l the needed tasks and comply with the law. and on the .the user has to be informed in advance for which purpose which monitored in-formation is used. 3. On the one hand the IP addresses. Network researchers need traces with network t r a c for development and evaluation of new algorithms. Due to these changes two areas in network operation have to change their operation to be in compliance with these new laws. Network providers need to monitor their network to ensure proper operation of the network. two major parts of a packet have to be handled with care in a privacy preserving environment.5 User Privacy i n Computer Networks During the last years the legislation has enacted a few laws which extend the right for privacy to the world of computer networks [23].

A simple approach would be to use a random value for each packet.other hand the payload of the packet. we will now present a framework for privacy preserving secure network monitoring (PRISM). where an IP address is replaced by another IP address. An example for such an IDS is an element of a rewall that inspects the payload of packets to detect email worms. an employee of an operator wants that a purpose is satised and not that a monitoring application is working properly even if thinking in an application focused manner. The more correlation is needed. Firstly.1 Privacy Preserving Network Monitoring The EU funded project PRISM [75] provides a framework for privacy preserving secure network monitoring. the payload is often simply stripped o . As the RT-ETD can be used to support operation of applications in a privacy preserving environment. 5. The core concept of the project is that each given purpose can be satised in a privacy preserving environment. The usage of the purpose as central entity is mainly motivated by two aspects. where a specic monitoring application can be used to satisfy these. An example for such a purpose can be to detect all Skype ows. Anonymization is often used for IP addresses. Consequently. someone would like to have the . For example. the central entities are no longer monitoring applications but are purposes. whereas for IP addresses a potential solution can be anonymization. Simply stripping the payload would end up with broken applications that need parts of the payload. The kind of correlation strictly depends on the intended usage of the data. Consequently. such as intrusion detection systems (IDSs). Also anonymization cannot neglect the needs of a specic application. but if the application that operates on the anonymized data needs to identify which packets belong to which connection a specic IP address has always to be replaced by the prior used IP address. Dierent strategies are applied to them. as we are used to it. there has to be a correlation between the original IP address and the anonymized IP address. the weaker the anonymization is and thus it may be broken by an attacker that gets the anonymized data.

3.1 Architecture The PRISM architecture. It is cut to the length that is needed in further . Following to this is the major step in the front-end and can have various subtasks.1. It consists of 4 major processing steps where for dierent purposes the chain looks dierent. A monitoring application is split into 3 main parts: 1. applications often satisfy a wide range of dierent purposes.front-end for short time. As TSTAT is a powerful tool and has a wide range of operations. targets a line speed of 1 Gbit/s for privacy preserving network monitoring. The rst analysis step is a lter based on packet header information.remaining part of monitoring application that stays outside the PRISM system. where the length of each individual packet can be cut. This information cannot be given to the user for such applications as the usage is not limited to a specic purpose. 2. low processing eort on-the-y analysis. shown in Figure 5. So the PRISM system satises a customer purpose by using a specic monitoring application for this. The front-end mainly provides on the y operations that need no or only little buering of information. To ensure privacy preserving monitoring the PRISM system provides a two-tier approach which will be presented in the next section. 5. In a privacy preserving environment it has to be clearly stated to the network user what the monitored t r a c is used for.back-end for long time analysis and storage. The rst step is copying the t r a c from the wire for further processing. What is possible is to have a privacy preserving Skype trac detection by using TSTAT. Secondly.1.TCP Statistic and Analysis Tool (TSTAT) from the University of Turin running in a privacy preserving environment to detect Skype t r a c . a full functional TSTAT would not be able to operate in a privacy preserving environment.

analysis steps. . Privacy sensitive information which is not needed in back-end analysis functions is already anonymized in the front-end. results are calculated or complex ltering approaches are applied. which is a standardized protocol for exchanging monitoring information. The last step before the trac is transmitted is to encrypt the trac to be protected from being tapped on the wire. The back-end decrypts the trac and performs long term analysis. In this step the t r a c is further reduced. After this the chain may be completely dierent for each purpose. The data transformation for export operation transforms the data into the format which the external monitoring application requires. Analysis results are then stored in an encrypted data base for later usage. Such analysis can for example be exponentially weighted moving average or payload inspection over a reasonable amount of packets. Trac between frontend and back-end is transported using the IPFIX (Internet Protocol Flow Information Export) protocol.

2 Development of a Privacy Preserving Application For the design of a privacy preserving application. the rst question is: What purpose should be satised?There has to be a closer look on what information has to be processed for this purpose. The whole PRISM system is surrounded by a privacy preserving access control system. The access control system is based on an ontology. Processing of privacy sensitive information has to be performed in the PRISM system. Within the PRISM system the processing is split into on the y tasks performed in the front-end and long term analysis in the back-end.1. it may be needed to split the application into 3 parts where the external part is only for visualization. For most applications. the solution is somewhere in between. where processing of the latter can be done outside of the PRISM system. the external part of the monitoring application will get the nal results for presentation purposes only. The processed information has to be split into privacy sensitive and non privacy sensitive information. which ensures that only dedicated users with a dedicated role are allowed to launch a privacy preserving monitoring task. Content providers are often .2 Content Providers and Privacy The RT-EDT supports network operators in privacy preserving network monitoring. where the RT-EDT is not able to support user privacy. The other extreme would be that the application can be fully operated outside of PRISM if privacy sensitive information can be removed by ltering. 5. When privacy sensitive information is processed. The second major area of privacy threats in computer networks are content providers. if all processing steps include privacy sensitive information. Or. 5. Based on the monitoring purpose there exists a wide range of possibilities for privacy preserving monitoring.The external part of the monitoring application can be a legacy application that only operates on a subset of the trac if the PRISM system is capable of ltering the t r a c in that way that user privacy is not violated.

As a user. and reduce the risk of being inl trated. based on the browser history of a customer. . The usage of entropy in this domain is left open for future work. facebook is able to see into ones social life.collecting as much information as possible about their customers. A recent paper [79] shows an approach where. Content providers are often hiding the collection of data behind laws of nations without any user privacy legislation. when all friends are interlinked within a social network webpage as facebook. to obfuscate the social network of their users. and make it more di c ul t to identify individual users. one should be aware of the fact that. the more likely it is to make revenue out of this knowledge. What users are often not aware of is that a content provider of a dierent webpage can do this as well. This even includes the ability to identify facebook groups the customer joined. as the more information about a customer they have. the content provider is able to identify which websites have been visited and thus to identify which social networks the customer is using. by using all methods that are available. For such obfuscation entropy can be used as a kind of measure to keep the level of obfuscation high. Such approaches include phishing of account details of other websites. Content providers should change their behaviour. Collecting privacy sensitive information is widespread and starts with simply logging each users behaviour on the website to allow sending personalized oers per email. An approach where user privacy is simply ignored is when content providers start to collect as much data of a user as possible. The main reason for this is business.

the approach is customized for SPID [26] which is a classier that detects multiple encrypted protocols. .Port-based classication.e. The goal of the proposed approach is to operate as a real-time encrypted t r a c detector (RT-ETD). Customizing of the algorithm will be dealt with in Section 6.6 Real. Furthermore. with the Adami Skype detector [1]. The algorithm is targeted at being used as a t r a c l t e r in a privacy preserving environment.Time Detection This chapter presents a novel approach for t r a c classication based on information solely gathered from content of the rst packet of a ow.Coding-based classication.6 where the proposed algorithm is used as a privacy preserving real-time lter for Skype detection. 1. remove ows with privacy sensitive payload). The term ow is dened as bi-directional ow based on the 5-tuple. The rst packet in this context is dened as the rst packet sent by a UDP connection and for TCP the rst packet is the one that follows the three-way handshake. 3. where the task is to lter t r a c and forward only ows which are encrypted (i. The proposed system is based on several modules that can be combined to meet the requirements of various purposes.Entropy-based classication. 2. The proposed algorithm can be divided into ve dierent modules that are used for classication of t r a c .

Port numbers greater than 1023 are not taken into account for the port-based classier as these ports may not be so strictly bound to a specic type of application. . this message has a partially xed content and does not have random payload. If a user or application developer does this.Content-based classication.IP(-address)-based classication. to circumvent rules or regulations set by an administrator. the communication is encrypted within an HTTPS connection. The main motivation why we are not using the entropy based algorithm here is the fact that port-based classication is faster. Therefore a solely entropy-based classication approach would always drop that t r a c . Each of them will now be explained in more detail. the reason is most properly that this t r a c should be hidden. 5.1 Port-based Classication Within the range of well known ports [27] a few ports exist that are solely targeted at applications that use encrypted communication. and that on these ports the rst packet is often an unencrypted initialization message.Real-Time Detection__________ 4. The most important ones are shown in Table 6.6. We assume that there is no application that transports unencrypted t r a c on these ports. an HTTPS connection starts with a Client Hello message.1. Still. For example. 50 6. The proposed algorithm oers the possibility to use a port-based classier which is based on the assumption that t r a c on these ports is encrypted and consequently can be forwarded without impacting privacy. We assume that forwarding such t r a c is not a violation of their privacy as they have to be aware of the fact that they are using well known encrypted ports for unencrypted t r a c .

In Chapter 2 we present an approximation of HN(U). Within this work the standard Monte-Carlo method for HN(U) and for an estimate of the condence intervals is used.000 words were generated. HN(U) and the condence intervals are based on this experiment. to determine whether the payload is encrypted or not. within this approach only information gathered from the rst packet of the ow is used. as an alternative approach Olivain and Goubault-Larrecq suggest using standard Monte-Carlo method where a large number of words w of length N are drawn at random and the average is taken as HN(U). over TLS/SSL 990 telnet protocol over TLS/SSL 992 IMAP4 protocol over TLS/SSL 993 IRC protocol over TLS/SSL 994 POP3 protocol over TLS/SSL 995 Table 6. over TLS/SSL 989 FTP protocol. While Olivain and Goubault-Larrecq evaluate the payload of several packets. .Name SSH NSIIOPS HTTPS DDM-SSL NNTPS sshell LDAPS CORBA-IIOP-SSL IEEE-MMS-SSL FTPS-data FTPS telnets IMAPS IRCS POP3s Application Port The Secure Shell (SSH) Protocol 22 IIOP Name Service over TLS/SSL 261 HTTP protocol over TLS/SSL 443 DDM-Remote DB Access Using Secure 448 NNTP protocol over TLS/SSL Sockets 563 SSLshell 614 LDAP protocol over TLS/SSL 636 684 CORBA IIOP SSL IEEE-MMS-SSL 695 FTP protocol.2 Entropy-based Classication This section presents the core algorithm of the RT-ETD. which is described in Chapter 2. Also for an estimation of the condence intervals the standard Monte-Carlo method is suggested to be used. using the Matlab [76] function rand. We use the approach of Olivain and Goubault-Larrecq [55]. Based on the dierence between these two values. control. For each N from 1 to 1472 100. The core concept is to estimate the N-truncated entropy of the payload of the rst packet and compare this value to the N-truncated entropy of uniformly distributed payload. it is decided whether the ow is encrypted or not.1: Encrypted well-known port numbers 6. data.

1: Inuence of a on entropy for given word length based on Monte-Carlo m=256without taking into account for example.1 shows the inuence of choosing a for given words of length N based on Monte-Carlo. We dene a to be the number of bits used for the alphabet which means m = 2a. We have performed a study on the inuence of the size of the alphabet m.Figure 6. The entropy values are normalized by dividing by the number of used bits. For a we are using 1 to 12 bit and compare the results for entropy and condence intervals. such that all values are between 0 and 1. As basis for an estimation of the condence intervals we use the same Monte-Carlo method as for HN(U). which calculates the borders of the condence intervals. All curves tend to one. We . As in an undersampled environment the MLE estimator underestimates the entropy it is obvious that for larger a longer words will be needed to get an estimate equal to one. bit entropy or nibble entropy. The results of the Monte-Carlo method are fed into the Mat-lab [76] function p r c t i l e . The major aspect is that the larger we choose a the smaller H N(U)a will be. Figure 6.

Based on Figure 6. The length of the condence intervals is normalized by the standard deviation (SD) to allow easy comparison and divided by two to show them in SD units.2 it is obvious that choosing a condence bound of plus/minus three times the standard deviation will lead to at least a 99% condence bound independent of a. The plots for the remaining eight a look similar. Values between HN(U) + 3 SD and H are very likely to be encrypted. a = 4.2 it can .2 plots the length of the condence intervals for a = 1.(b) a=4 (a) a=1 (d) a=12 (c) a=8 Figure 6.2: Normalized length of the condence intervals then calculate the length of the condence intervals by subtracting the lower border from the upper border. Figure 6. but we stay to the condence bound of 3 SD as based on Figure 6. Consequently we are 99% sure that entropy of uniform distributed payload is within this range. a = 8 and a = 12. An explicit look on short payloads will be given later. Only for short payloads this is not true.

5% .be seen that the 99.

As argued in [59] for a test for uniformity the word should at least have the length o f m . for details see [59].Figure 6. . within the of HN(U) + consequently at most 0. For packet size up to 71 bytes a = 12 has the smallest standard deviation whereas for packet size greater than 71 bytes a = 1. Figure 6.3 shows the standard deviation for dierent a. The evaluation in Chapter 7 will show that the dierences are marginal. which would in the case of a = 12 be 64 words of length 12 (or equivalent 96 bytes). The consequence of this would in our case be that unencrypted t r a c is forwarded. with shorter words there is at least one distribution that cannot be distinguished from the uniform distribution. As for the condence bound we are using three times the standard deviation.3: Standard deviation percentile is range 3 SD.5% are beyond HN(U) +3 SD. a detailed look on the standard deviation for dierent a is given next. But if we choose a = 12 for small packet sizes the 99% condence bound will be larger than three times the standard deviation. The reason for this is that.

(b) CDF UDP (a) CDF TCP

Figure 6.4: C DF for U D P / T C P packets

The condence bound of 3 SD ensures that we are 99% sure that entropy of uniform distributed payload is within the range of 3 SD, which is only one goal that should be retrieved, further unencrypted t r a c should be removed. As the standard deviation is small we are condent that only a small fraction of unencrypted t r a c will be in this range. The evaluation in Chapter 7 shows that in practice the decision bound works well. Further theoretical consideration is left open for future work. To determine whether the RT-ETD has to be operational below 96 bytes let us have a look at a real world network trace. We have collected 2GB of t r a c in a local cable network providers network. Figure 6.4 plots the CDFs (cumulative distribution function) of the packet size of the rst packet of UDP/TCP ows. I t is obvious that in case of UDP ows the algorithm has to be operational below 96 bytes, as more than 80% of the rst packets are shorter. According to Figure 6.4 it is obvious that our approach has to be operational for a packet sizes of eight or at least sixteen bytes. For a packet size of sixteen bytes, about 0.5% of the UDP ows cannot be evaluated. We assume that this low fraction of ows can be neglected. Consequently we need to use an a that is smaller than 9 as, according to [59], greater values need longer words than 16 bytes. To choose a = 1 may be not the optimal choice for a completely dierent

reason. I n this case it is only determined whether the number of zeros and ones is equal within

the word or not. For example, the hexadecimal (HEX) sequence 0F0F0F0F0F0Fhas the maximum entropy of 1 although it is clearly not random. To nd the optimal a is not trivial. We are going to investigate this in more detail by evaluating the inuence of a on the ltering performance in Chapter 7.

6.3 Coding-based Classication
The coding-based classier utilizes the fact that many protocols use ASCII encoded plain text. For example the payload of the rst packet of a POP3 message can look like +OK Dovecot ready. . .1 which is equal to 2b 4f 4b 20 44 6f 76 65 63 6f 74 20 72 65 61 64 79 2e 0d 0 a i n HEX. Based on the HEX values the above plain POP3 payload looks random. For this example, based on a = 8, HN(U) is 4.25, the entropy based on MLE is 4.02. The standard deviation is 0.082, as 4.02 is within the range of 425 3 008. Consequently, the entropy-based classier would assume that the message is encrypted. The rst packet partially consists of a free congurable text, in this example Dove cot r e a d y . . A dierent server is instead using Hello t h e r e . . Depending on this free congurable text, the entropy-based classier decides whether the ow is encrypted or not. For the packet with the payload+OK Hello there. . . H MLE entropy is 3.68, which is outside of 410 3 0083, consequently this ow is assumed to be plain. For a human being it is clear that both messages are unencrypted. It is obvious that there is a big text section in the payload. A shown by the example above our entropy-based classication for encrypted t r a c has a too high rate of false positives. To overcome this shortcoming of the entropy-based approach we add a further check. We assume that for plain text messages the payload is encoded in ASCII or ANSI where coding values 32 to 127 are used for printable characters. For our entropy estimation the entropy of the example above is within the given condence bound. Nevertheless it is obvious that it is very unlikely that a random source
1The
N

(U) is 4.10 and the

two dots at the end are non printable characters

The probability that a character from a random source will be in the range from 32 to 127 is about 37. which is the length of the range from 32 to 127.would produce such a word. An analysis based on real network traces has shown that the coding-based classier does not drop any UDP ows. The IP-address-based classication module allows specifying IP addresses that should be forwarded without further checking for port numbers. As the simple approach only taken into account the fraction. Consequently we added a check that if the fraction of bytes with values between 32 and 127 is greater than 75% we assume that the ow is unencrypted. works ne for our purpose we do not use n-grams for now. As we want to reduce the processing time we do not evaluate the full payload of the packet. entropy or coding. A more complex possibility would be to check for n-grams in the payload. An evaluation for dierent values for this is left for future work. We only evaluate the rst few bytes. Especially if at the beginning of the packet a large fraction of characters is in this range the payload is most likely unencrypted.5 and so on. in the range from 32 to 127.4 IP-address-based C la s s i c a t i o n What has not been taken into account until now is the knowledge that a specic IP address or a range of IP addresses only uses encrypted communication or is only oering information that is not privacy sensitive.5%. Only taking into account too few bytes would lead to a high rate of misinterpretations. 6. 75% means that the fraction of bytes in the range from 32 to 127 is twice as high as that of a random source. For example we can count the number of occurrences of sequences in the range from 32 to 127 of length 3. We exemplarily choose to use a maximum of 96 bytes. consequently the approach is only used for TCP ows.4. This approach would allow a more detailed analysis of the content. . With this aspect we would not only take into account the fraction of letters and numbers but also the grouping of those.

An example of such an application is Skype where the HTTP Update is unencrypted.5 Content-based C l a s si ca t i o n This classication module uses very detailed information about the payload of the rst packet of an application. This specic IP does not oer any other service. Such an approach can be used for any application where the other modules fail but enough detailed information about the rst packet is known. For example the Skype Login starts with a packet that is only 5 bytes long.5 until one block decides to forward it or it reaches the end of the chain. consequently our entropy-based approach fails. and the Skype HTTP Update messages do not transport privacy sensitive information. If the packet is forwarded. Skype uses a single host for the updates. which traverses the blocks shown in Figure 6. In general specic knowledge about the rst packet can be used to further enhance the lter.classication often has the goal to classify all the t r a c from a specic application. The decision whether to forward a ow is still based on the rst packet. 6. we add a classication module that uses this specic knowledge to identify a Skype Login ow. Often applications that are using encrypted t r a c do not encrypt a small fraction of t r a c . If our classication algorithm should be used as pre-lter for Skype t r a c . based on [1] we know that the rst Skype Login packet has a payload of HEX 22 03 01 00 0 0 a n d the PSH ag set. consequently the IP-addressbased classication module can be used to forward Skype HTTP Update messages.6 F u l l C l a s s i c a t i o n Chain The classication modules described above are combined to form a classication chain. Such knowledge can for example be areas of unencrypted t r a c that are not taken into account for the entropy classier or partially xed values within the payload. 6. the corresponding 5-tuple is added to the . However. consequently forwarding them is inline with our vision.

the 5-tuple will be deleted from the lists. Unfortunately.Figure 6. After receiving a TCP segment where the FIN ag is set. which is set in the last packet of a connection indicating that all data has been transmitted. otherwise it will be dropped and the 5-tuple will be added to the list of unencrypted ows. a lot of applications use encrypted as well as unencrypted t r a c . A major goal in t r a c classication is to identify all the t r a c from a specic application.5: Block diagram for classication list of encrypted ows. As there is no such signalling for UDP ows we use a timeout for UDP ows. Therefore we provide the possibility to customize our algorithm to forward application specic t r a c that can be identied based on the . Thus our approach removes a fraction of the applications t r a c and possibly degrades the detection quality. If for the duration of the timeout we do not receive a packet for a specic 5-tuple it is deleted from the lists.

6.rst packet and is unencrypted.1 Skype Customized Classication This section presents a lter chain for Skype classication with the Adami Skype detector [1]. Furthermore we provide a customization as pre-lter for SPID [26]. . Skype trac can be divided into 4 groups that will be detected by using dierent modules from above.6. We provide this exemplarily for Skype detection.

We use a few lists to store information. For the rst category of trac our entropy-based classier together with the coding-based classier will be used.1.) 3. The UDP 5-tuples are removed from the lists if there has not been a packet for this 5-tuple for 300s.9. we are only using the IP address of the update server u i . A TCP ow which was detected as encrypted will be stored in the ENCRList. Flows where rst packet is only 5 bytes long and has a xed payload (see example in Section 6. 2This server is solely used for updates of the Skype client via HTTP . Flows where the rst packet is longer t h a n m bytes and the entropy evaluation suggest that it is encrypted. Furthermore the 5-tuple will be removed from the SYNList if. there has not been any data packet. skype . The Skype Login ows are detected as presented in Section 6. which indicates a TCP connection establishment. HTTP ows to u i . Flows on port 443 (based on the rst packet it is not possible to distinguish Skype ows on port 443 from normal HTTPS ows. 163. 158. 2. This should prevent the SYNList from growing due to SYN ooding attacks. The SYNList stores all 5-tuples where we have received a packet with the SYN ag set.5. The 5tuple will be removed upon receiving a packet where the FIN ag is set. 158) here. The ow chart for pre-ltering of Skype trac is shown in Figure 6.9. The rst one is a list of known Skype IP addresses.6. com (204. 4.5). As either is encrypted there are no privacy concerns when forwarding them. A 5tuple is removed from the SYNList after receiving the rst packet containing pay-load or receiving a packet where the FIN ag is set. Encrypted UDP ows are stored in the ENCRList.com2. 163. skype. whereas unencrypted UDP ows are stored in the UNENCRList. 60s after the packet with the SYN ag set. Trac on port 443 will be detected by the port based classier whereas the Skype HTTP update will be detected by the IP address used for the update which is 204.

l t e r i n g ow chart .61 Figure 6.6: Skype p r e .

for any decision the ow will be removed from the SYNList.skype. First of all non IP trac is dropped. The core concept of the algorithm for TCP ows is in the First packet after handshake block.All horizontal arrows in the ow chart indicate that the answer at the decision point was yes. for which we perform the preltering. All other packets with empty payload are dropped. Packets shorter t h a n m will be dropped and added to the UNENCRList. not evaluate packets with empty payload we only forward empty packets where the ACK ag is set and which are in the SYNList. This behaviour is ensured by the Drop empty packetsblock.com is forwarded. The next check ensures that all t r a c to ui. The last check for TCP forward packets of ows in the ENCRList and drops all others. For TCP we used a port-based classier to forward all t r a c on port 443. if is in the UNENCRList it will be dropped. Longer packets will be checked with the entropy-based classier. If the ow is already in the ENCRList it will be forwarded. The following check drops all ows where the rst packet is shorter t h a n m as the entropy-based classier does not work reliably on such short packets. If decided that the packet is unencrypted it is dropped and the ow is added . The next check is whether the payload is equal to HEX 2203010000to ensure to forward Skype Login ows. it needs to be evaluated. For UDP the sequence is shorter as there is no three-way handshake to be taken into account. vertical arrows indicate a no. and that the 5-tuple is stored in the SYNList. If the packet does not belong to a ow in any list. If the ow should be forwarded the packet will be forwarded and the ow will be added to the ENCRList. Next we distinguish between TCP and UDP. does. The SYN+SYN/ACK forwardblock ensures that the rst two messages of the three-way handshake are forwarded. after the initial three-way handshake. It evaluates whether the ow should be forwarded or not. The FIN cleanupblock ensures that when a packet is received where the FIN ag is set the 5-tuple is removed from SYNList and ENCRList. As the Skype detector. The In SYNList check ensures that this is the rst data packet after the three-way handshake. The last check in this block is based on the results of the entropy and the coding-based classier.

.Spotify Server.eDonkey TCP. SPID is able to identify several unencrypted applications (DNS. 2. 9.. 3. FTP.Skype UDP.2 SPID Customized Classication In this subsection we present the usage of the RT-ETD as pre-lter for encrypted t r a c that should be classied by SPID [26].only processing a fraction of trac thus making SPID faster.only processing encrypted t r a c (privacy). There are two major reasons why our approach can be used as pre-lter for SPID: 1. .to the UNENCRList.) and the following encrypted applications: 1. 8.SSL. POP. 6.MSE. 7.Spotify P2P.Skype TCP. HTTP. a third possibility would be to integrate the entropy-based approach as a further metric directly into SPID. 2. 6.SSH. SMTP. 4. otherwise the packet is forwarded and the ow is added to the ENCRList. 5. As SPID uses metrics based on statistical information..6.eDonkey UDP.

7: SPID pre-ltering ow chart .Figure 6.

The Forward handshake ACKensures that the third packet of the 3 way handshake is forwarded. The SYN+SYN/ACK forwardand FIN cleanupblocks are the same as for the Skype pre-lter. Also the UDP ltering is exactly the same as for the Skype pre-lter.For this work we will concentrate on Skype. MSE (Message Stream Encryption) is used by BitTorrent implementations to encrypt the t r a c between peers.7 shows the ow chart of the pre-lter for SPID.6. The First packet after handshakeblock. independent of the size of the payload. It has to be noted. Figure 6. For the SPID pre-lter all packets belonging to an encrypted ow are forwarded. Consequently as pre-lter for SPID to classify Skype. beside the ones of the 3 way handshake. The HTTP Update of Skype is not classied as a Skype ow but as a normal HTTP ow. even if the ow was detected as encrypted. Likewise HTTPS ows on port 443 are classied as SSL ows. It is again distinguished between TCP and UDP trac. . MSE and eDonkey the algorithm will be slightly dierent. In Section 6. MSE and eDonkey.1 we dropped all packets with payload zero. All non IP trac is dropped. There is no need for the port-based and IP-based classiers. the forthcoming In ENCRListcheck and the corresponding actions are the same as before. that the denition of Skype ows within SPID is dierent to the one of the Adami Skype detector.

The evaluation was performed on ve main areas: 1. 2. traces where the ground truth is known are essential.Pre-ltering for SPID classier. a perfect identication solely based on the trac itself is very hard or even impossible. 5.Coding-based classier. In most cases the result does not reect the wide variety of dierent t r a c types and patterns and thus has limited usability. 2.Producing interesting t r a c in a controlled environment. 4.Classifying t r a c from representative traces. Furthermore. . For an evaluation of a trac classier.7 Evaluation This chapter presents evaluation results of the real-time encrypted t r a c detector (RTETD). There are two ways to create such traces: 1.Inuence of the symbol length a.Pre-ltering of Skype t r a c . The latter option does not require extensive preparation at the cost of a very complex and time-consuming subsequent analysis. 3.Inuence of setting the condence bound. The former option can be done relatively easy but the quality mainly depends on the composition and preparation of the environment.

1 shows a schematic representation of the evaluation process. A 1h/2GB trace. The authors of [26] provided us with some of their fully classied traces of encrypted t r a c .Figure 7. The traces in this network have been collected in October 2009. Additionally a trace from the network of a wireless provider used by about 1000 customers has been captured in May 2010. which will be referred to as ground truth traces for the rest of the thesis. 7. the Adami Skype detector and RTETD. These traces are used for the evaluation of the parameter settings of the entropy-based classier. Figure 7.1: Evaluation process For the evaluation we used real network traces and traces where the ground truth is known. Real network traces have been collected in a network of a small cable network provider in a segment used by about 100 customers. The trace covers about 1 hour and has a volume of about 13GB. The results of the Adami Skype detector and SPID are used as 100% within the individual t r a c categories. For collecting the traces a mirror port at a bre switch was used. the average t r a c volume is about 4Mbit/s. For the lesize the size of the original trace les is used as 100%. Several traces have been collected in this network for the evaluation we are using three of them. In the rst step the collected trace les are processed by SPID. The ltered output les from the RT-ETD are then processed by the Adami .5h/13GB trace and a 35h/48GB trace. which is an average rate of about 30Mbit/s.

9% 1. only the entropy-based classier is used.2% 98.0% 100% 99.2% 99.6% 99.5% 98. For example for a = 98. Especially for encrypted eDonkey UDP t r a c a > 10 lead to worse results as more than 98% of the rst packets are shorter than 52 bytes.4% 3. Flows where the rst packet is shorter than m are dropped.5% 100% 99.3% Skype UDP 1973 95. The evaluation is based on the ground truth traces.7% 97.as original 1 2 3 4 5 6 7 8 9 10 11 12 Skype TCP 91 100% 100% 100% 100% 100% 98.3% 13.3% 12.5% 65. an evaluation for a starting from one to twelve is given.3% 98. 7.9% 26.8% 98. An optimal result would be if still 100% of the encrypted ows are present in the ltered les.6% 98.4% 99. .0% 99.0% 15.5% 97.4% 98.4% 99. and consequently not evaluated.9% 6 in the Skype TCP category represents that 90 ows out of 91 Skype TCP ows in the trace le are still present in the ltered le.2% Table 7.5% 99.8% 98. and no unencrypted ows are present.1: Evaluation of a based on ground t ru t h trace Skype detector and SPID.0% 91.2% 99.4% 99.4% 72.8% 94.7% 99.2% 96.1 shows the number of ows in each category based on the original trace le which is used as 100%.5% 1.2% 99.2% 30.8% 97.4% 98.6% 98. The results are then compared to the results based on the original le.7% 83.6% 98.4% 55.5% 98.3% 99. The metric we are using here is the fraction of ows in each category that can still be detected in the ltered le and the fraction by which the size of trace le was reduced by the RT-ETD. For a > 8 the detection ratio decreases as the fraction of packets shorter than m increase.1 I n u e n c e of the Symbol Length As setting the symbol length (a) to an appropriate value is not trivial.4% 96.0% MSE 649 96.2% 60.0% eDonkey TCP eDonkey UDP 828 398 99% 96. Table 7.6% 94. For each a the fraction of detected ows within the ltered le in each category is given.9% 100% 97.6% 96. To avoid inuences by other classiers.

As we are only evaluating on encrypted ground truth traces the best results would be performed by an algorithm that simply forwards all packets. For MSE and eDonkey t r a c SPID [26] was used. for eDonkey UDP an a of seven lead to the best results.2: Evaluation of a based on ground t ru t h trace Figure 7. I f our algorithm should be targeted to a specic type of t r a c a dierent a would lead to better results. so this column is omitted. Table 7.2 plots the fraction of detected ows for a from one to eight.2 shows the results for a one hour / 2GB real network trace collected in the cable network. For the detection of Skype ows the Adami Skype detector [1] was used. I t seems obvious from this gure. We performed an evaluation on a real network trace to have results for the amount of data reduction. For an a of eight the size of the le is reduced by a factor of about fty. e. what has been neglected in this evaluation is the amount by which the approach removes unencrypted t r a c from a trace.g. that an a of six would be the optimal choice for the detection of this ve t r a c categories. SPID does not detect any eDonkey TCP ows. Whereas for an a of six the factor of data .Figure 7.

Based on manual deep packet inspection the missing MSE ow seems to be a false positive of SPID.7% 98.64% 4. Consequently.50% 1.2% 98.67% 9.04% 100% 0.4% 100% 99.4% 97. Thus we decided to use an a of eight for the rest of the thesis.1% Skype UDP probe UDP call 1445 1 97.7% 41.7% 100% 99. based on ground truth and real network traces the fraction of encrypted ows that is still present in the ltered le is greater than 94% for all categories.5% 100% 85.7% 100% 100% 100% 100% 100% 100% 100% 100% 91.5% 60.8% 99.11% 4. whereas from a viewpoint of data reduction six is not optimal.98% 2.28% 2.7% 8.7% 100% 99.9% 100% 98. even if a = 12 is the best symbol length concerning data reduction.6% 98.1% 66. there is no practical usage as for example almost all Skype UDP probes are removed. Based on the ground truth traces the optimal setting would have been an a of six.52% 100% 0. What is obvious from Table 7.04% 0% TCP 12 91.2: Evaluation of a based on real network trace reduction is in the range of twenty. Consequently.33% MSE eDonkey UDP 5 4 100% 50% 100% 75% 100% 75% 100% 75% 100% 75% 100% 75% 100% 75% 100% 75% 80% 75% 75% 80% 40% 75% 0% 75% Table 7.99% 6.0% 97.2 is the bad detection quality for MSE ows.9% 98. .85% 4.a original 1 2 3 4 5 6 7 8 9 10 11 12 size 2GB 7.55% 7.63% 2. For a > 9 again a lot of encrypted ows are dropped.1% 99.7% 100% 99.2% 100% 97.2% 55.83% Ping 2840 96. As the approach is targeted to operate in a privacy preserving environment.12% 1.6% 100% 99.1% 100% 1. the appropriate setting of a depends on whether the factor of data reduction or the fraction of forwarded encrypted ows is more important. as the remote IP/port pair is hosting a service for a SPAM lter software. the factor of data reduction plays a central role.

5% percentile are very likely to be encrypted. Table 7. as a signicant fraction of encrypted ows are between 3 SD and 4 SD. We will now investigate on this in more detail by evaluating how close ows to this border are. would have the consequence of forwarding about 35% of the ows including a large fraction of unencrypted t r a c .3 shows the CDFs of HN(U) ground truth traces a large fraction of ows is within 3 SD. all Skype TCP and eDonkey UDP ows are within 4 SD.1 are used. For the other three categories the fraction of ows detected as encrypted is slightly increased. ~ ~ HMLESD for the 6 dierent traces.5% percentile can be used as condence bound. Asows whereH MLE is greater than HN(U) + 3 SD are very likely to be encrypted the condence bound could be changed from symmetric to asymmetric.2 Appropriate Size of C o n d e n c e Bound Based on a Monte-Carlo method we decided to use a condence bound of 3 times the standard deviation (see Chapter 6). For the calculation of the length of the 99% condence bound we have been using the 0. Based on Figure 7. as increasing the condence bound this wide. For the encrypted Figure 7. If HMLE is within HN(U) 3 SD we assume that the ow is encrypted.3 it may be worth having a closer look on the inuence of increasing the condence bound to 4 SD. We thus present the same evaluation on our real network trace. The same traces as in Section 7. For example to forward all t r a c whereH MLE is greater than HN(U) 3 SD.5% percentile of HN(U). and consequently not only more encrypted t r a c . which cannot be detected by the proposed approach. Again ows where HM L E is greater than the 99. Consequently also the 0. but also unencrypted t r a c will be forwarded.3 shows a comparison between the four dierent condence bounds. Using the other two condence bounds lead to similar results as 3SD. Concerning the condence bound of 4 SD.7.5% and 99. Increasing the condence bound to 4 SD will have the consequence to forward more t r a c . . this would lead to the aspect that we are 99.5% sure that the encrypted t r a c is forwarded. and we forward all t r a c that is beyond this value. A few ows from encrypted applications are even beyond 18 SD.

3 : C D F s o f H N( U ) H M L E ~ SD (a) Skype TCP (b) Skype UDP (c) MSE (d) eDonkey TCP .~ F i g u r e 7 .

The lesize increases by 600kByte and 400kByte respectively. The size of the output le increases by only 1. bound original HN(U) 3 SD HN(U) 4 SD > HN(U) 3SD > 05% percentile Skype TCP 91 97.5% 99.1% eDonkey TCP eDonkey UDP 828 398 99.2% 98.7% 98. For the condence bounds o f > HN(U) 3 SD and the 0. For the ongoing work we decided to stay at a condence bound of 3 SD.5% 100% 99.1% 98.7% 100% TCP 12 100% 100% 100% 100% MSE eDonkey 5 4 100% 75% 100% 75% 75% 100% 100% 75% Table 7.01% Ping 2840 99.02% 2.4: Evaluation of condence bound based on real network trace Based on this trace increasing the condence bound to 4 SD would result in forwarding 36 additional ows.9% 98.2% 99.08% 2. Table 7.3: Evaluation of condence bound based on ground truth trace conf. so the fraction of forwarded unencrypted ows is not signicantly increased by increasing the condence bound to 4 SD. In case of our real network trace only 36 additional ows are forwarded.6% 94.8% 100% 98.5% 100% 99.9% Skype UDP 1973 98.5% 100% 94. when increasing the condence bound from 3 SD to 4 SD.9% 100% 99.8% Skype UDP probe UDP call 1445 1 99.3% 99.5% 99. as there are almost no practical implications when using> HN(U) 3 SD and the 0. For the condence bound of 4 SD a more detailed analysis based on our pre-lter for SPID is given in Section 7. where 16 are detected as Skype ows. .0% 99.4 shows the comparison for the condence bounds based on the real network trace.7% Table 7. Based on the Adami Skype detector 16 ows of them are Skype ows. The appropriate choice of the condence bound again depends on what is more important.conf.2% 99.0% 98.5.8MByte.4% 94.5% percentile instead. to forward less unencrypted t r a c or to forward a large fraction of encrypted t r a c . bound original HN(U) 3 SD HN(U) 4 SD > HN(U) 3SD > 05% percentile size 2GB 1.1% MSE 649 99.99% 2.6% 94.5% percentile the results are similar to 3 SD.7% 99.

we are able to drop a large fraction of non Skype ows.7 . For TCP t r a c the coding-based classier removes about 900kByte from the output le without changing the classication results of the Adami classier and SPID. Consequently.6% Skype ows are still present. we now continue to evaluate the code proposed in Subsection 6. Table 7. 900kByte seems to be a negligible amount of t r a c but deep packet inspection revealed that these are POP3 ows where the username and password are transported in plain text.5 shows the results of the evaluation. A 7.5 hour trace. For the three traces the lesize is reduced by a factor between seven and twenty-ve. Consequently the coding-based classier is not used for UDP t r a c .1 is evaluated. For all trace les the fraction of Skype ows that is still present is at least 97.ba sed C l a s s i e r Based on the real network trace we evaluated the usage for the coding-based classier for TCP and UDP t r a c . Using the coding-based classier for UDP t r a c does not inuence the classication at all. consequently the reduction of about 14% is close to optimal. Until now only the entropy-based and the coding-based classier have been evaluated. 7 . The evaluation is based on three dierent real network traces. The number of TCP ows is reduced to about 10%. thus the content-based classier is of great value in this case.F i l te r In this section the algorithm proposed in Section 6.4 Skyp e T r a c Pr e.5 hours and a 35 hours trace le collected in the cable network and a one hour trace le collected in the wireless network. where the factor for the wireless trace is the smallest.6.6%.3 Eva lu a ti on o f C od i ng .6. and ensure that at least 97. I t has to be noticed that using the Adami . I t has to be noticed that only 20% of the ows in the original le are detected to be non Skype ows. The percentage of UDP ows that is dropped is smallest for the 7.1 to have a Skype pre-lter based on information gathered from the rst packet of the ow. as privacy sensitive information is removed.

5 SPID T r a c Pre-Filter The evaluation of the algorithm proposed in Section 6. For encrypted protocols that we take into account. 7.0% 39394 98. Between 73% and 96% of the ows are dropped by our pre-lter. HTTP.Type Filesize [MB] TCP ows UDP ows Skype TCP Sykpe UDP 7.5: Results for Skype p r e .2 is based on the same trace les as the evaluation of the Skype t r a c pre-lter. in the . Skype TCP/UDP at least 76. whereas 0% indicates that the fraction is below 0. IMAP or SMTP are almost completely removed.3% 298856 9.5h/13GB 35h/48GB original ltered original ltered 13531 5.9% 494489 8. For UDP the major reason is the aspect that SPID does not use a timeout while Adami does.l t e r 1h/13GB original 13157 76199 80568 255 9512 ltered 13.9% 98.8% 28204 99.85% 249 98. which is a strong indication that only a small fraction of unencrypted t r a c is forwarded.7% (MSE) of the ows are still present in the ltered le. eDonkey TCP/UDP encrypted.0% 49.7% Table 7.42% 48527 3.6% Skype detector to process the original le takes signicantly longer than pre-ltering the original le and then processing the ltered result.6 shows the results of the evaluation. MSE. First we concentrate on ltering with the condence bound of 3 SD.8% 19. What is obvious is the dierence between the number of Skype ows detected by SPID and the Adami Skype detector. Table 7. this category is completely removed.98% 143867 11.99% 35076 85.4% 97.01%. For TCP it highlights the aspect that Skype detection is not trivial. as for Skype pre-ltering we forward all TCP trac on port 443. For eDonkey and Skype the fraction is above 93%. For example the signature of Adamis Skype Login message is HEX 2203010000. The rst major dierence is that the ltered les are about 25% smaller than the preltered les for Skype detection.6. The major part of the dierence is HTTPS trac on port 443. A zero indicates a 0 count.4% 243 98. and consequently two dierent algorithms lead to dierent results when executed on the same le. Unencrypted ows such as FTP.

.

. As stated above there is a trade-obetween detection performance and privacy preservation. the Adami Skype detector fails to detect these Skype ows. Increasing the condence bound to 4 SD slightly increase lesize as well as detection quality. Consequently.ground truth les from the authors of [26] the Skype Login messages have a payload of 1603010000.

The RT-ETD can be used as a privacy preserving pre-lter of network t r a c . Improved user privacy. 2. The pre-ltered t r a c is then used in more detailed classication steps. The task of the RT-ETD is to gather information from the rst packet of a ow and decide whether the ow is encrypted or not. The algorithm of the RT-ETD is enhanced by the usage of further classication modules. IP addresses. statistical information about the payload or the content of the payload. The core concept of the approach is based on estimating the entropy of the payload. and the need for appropriate t r a c classication to perform QoS. The work is motivated by the evolvement of privacy legislation into the digital life. due to the usage of the RT-ETD as pre-lter in privacy preserving network monitoring. and decides whether the ow is encrypted or not.8 Conclusion I n this thesis a real-time encrypted t r a c detector (RT-ETD) is proposed. Within these modules the decision is based on port numbers. Proposal of a new metric in t r a c classication based on entropy estimation of the payload of the rst packet of a ow. This estimated value is then compared to the estimated entropy of a uniformly distributed payload of the same length. The RT-ETD advances current state of the art in three main areas: 1. The RTETD gathers information from the rst packet of a ow. . Based on the 99% condence interval it is then decided whether the ow is encrypted or not.

3. that need layer seven approaches for classication. For protecting user privacy the RT-ETD is best used in a privacy preserving network monitoring architecture (e. For the practical evaluation of this a ground truth trace le with unencrypted t r a c will be collected. to be implemented within future rewall systems. to the layer seven classier module. This metric is able to enhance the quality of state of the art trac classiers like SPID [26]. There the concepts of the RD-EDT will be used to provide only t r a c . Real-Time t r a c pre-ltering as only processing information from the rst packet of a ow. which provides further modules for privacy protection such as anonymization or authorization. . 8. The RT-ETD is evaluated by the usage of ground truth traces and real network traces. Further the presented work will be enhanced such that the RT-EDT is able to operate in a high-speed multi-layer t r a c classication algorithm.g. and it can thus be decided whether data reduction or recall is more important. These two values are mainly inuenced by setting two parameters of the RT-ETD. PRISM [75]). Finally it seems promising to use entropy of the payload of the rst packet as a metric for the likeliness of encryption of a ow.1 Future Work In the future the work will be enhanced by a theoretical analysis to presume the amount of unencrypted t r a c that falls within our given condence interval. Based on the ground truth traces it is shown that more than 94% of the encrypted ows are detected as encrypted. Based on real network traces it is shown that more than 90% of the trac is dropped.

cisco.. M. Chung.. M. 2009.120.2010).: Convergence properties of functional estimates for discrete distributions. H.: RFC 1157: Simple Network Management Protocol (SNMP). Veciana. and Della Pietra. Springer-Verlag. D. Szabo.: WF2Q: Worst-case Fair Weighted Fair Queueing. A. Comput. D. 1996. A. NY. h t t p : //www. B. C.. I. and Tofanelli.. and Zhang. 3] Bennett..com/en/US/tech/ tk812/tsd_technology_support_protocol_home. Kim. 1991. Park. 10] Cisco Systems: Netow . Gustavo de.. Mellia. D. 2008. and Jeong. SIGCOMM Comput. 11] Cover. Heidelberg. A. Moltchanov. L. San Francisco. Berlin.. 6] Callado. T.. In Balandin. K. J. 11(3):3752. pages 511524. T. and Tassiulas. J. and Sadok. Esaki. 1996. Kelner. and Kontoyiannis. Pagano. 2007. Kim. Rossi. IEEE Communications Surveys & Tutorials. Meo. and Kato. 2009. D. T. Ross. J.. M. pages 112. S. Schostall.. H. H. J... H.cisco systems. M.. Algorithms.: A survey on internet t r a c identication .. 37(4):3748. Fedor. New York. ACM. In Procedings of the IEEE INFOCOM96. 8] Cho.08.. 4] Berger. Rev. 19(3-4): 163193. Y. WileyInterscience. S.. 5] Bonglio. J. Commun. IEEE Computer Society.: A Real-Time Algorithm for Skype Trac Detection and Classication. Yoon.. Callegari.. Fernandes. In Azcorra. and Davin.. New York. NY... In Proceedings of Network Operations and Management Symposium. A. Lee.: Revealing Skype t r a c : when randomness plays with you. A. S. 7] Case.. Random Struct. Linguist. P. Giordano..: Elements of information theory. B. and Pepe.: Content-aware internet application t r a c measurement and analysis. DC. 22(1):3971... V. D. T..Bibliography 1] Adami.. Washington. and Thomas. Internet Engineering Task Force. Gero.. J.html (20. 2] Antos. D. page pp. Della Pietra. C. . CA. C. S. Arturo. (editors): Smart Spaces and Next Generation Wired/Wireless Networking. 2001.: Observing slow crustal movement in residential user t r a c . 9] Choi.: A maximum entropy approach to natural language processing.. M. C. pages 168179. Fukuda. Keith W. K. Leandros (editors): CoNEXT 08: Proceedings of the 2008 ACM CoNEXT Conference.. 1990... and Koucheryavy. 2004.. M.

I. C. A. 19] Efron. eu/images/upload/ Deliverables/fp7-prism-wp2. New York.: D2.. pages 154 165. M. 1981. C. Geneva. H. and Laplante. G. Heidelberg. and Stork.Time Imaging.2010). D. D. Annals of Statistics. Fernando.. New York. M.stanford. F. 2005. and Orvalho. 1920.: The jackknife estimate of variance. and Mahanti. birthday and the strong birthday problem: a contemporary review.: The index of coincidence and its applications i n cryptology. h t t p : //fp 7 . Springer. D. L. I.12] Crispin.1. P. N. L. NY.. . Saltzman. h t t p : / / e e . W.VERSION 4rev1. Berlin. G. ACM. M. Leach. Pescape.. Donato. Internet Engineering Task Force.... Piscataway. and Salgarelli. Joao (editors): IDMS/PROMS. 20] Erman. IEEE Computer Society. 1-v1. 1995. F.: A rate controller for long-lived tcp ows. 14] Dainotti.. 2002. M. Riverbank Laboratories. Internet Engineering Task Force. SIGCOMM Comput....08. Mogul. C. Lebeau-Marianna. Hart. E. 2009. R. pages 281286. Lioudakis. Kaklamani. 2008.. IL.. U.1: Assessment of the legal and regulatory framework. 2007. Masinter. and Rossi. 22] Friedman. P. 16] Dornger. Wiley-Interscience.. Rev.. P.. 2006.: Pattern Classication (2nd Edition).. 37(1):516. Quoy. 1-d2. Arlitt.. P.: The matching. 2008.1. R. Rao.. de.. J. B.: Entropy and Information Theory. volume 2515 of Lecture Notes in Computer Science. Gettys. 1 . A. and Stein. Springer-Verlag. In Proceedings of the Global Communications Conference GLOBECOM. Edmundo.: RFC 2616: Hypertext Transfer Protocol HTTP/1. A.. E. Gringoli. Rittweger.. S. Boschi.. Dusi. Walden... C. 13] Crotti.. NJ. R. In Boavida.: Classication of network trac via packet-level hidden markov models. T. pages 21382 142. 18] Duda. 21] Fielding. Monteiro. 24] Gray.. 23] Gaudino. A. NY.. NY.. 130(1-2):377389. 15] DasGupta. Schmidl. Spring. O. In MineNet 06: Proceedings of the 2006 SIG COMM workshop on Mining network data. 17] Dougherty. J. 9(3):586596. Wiley-IEEE Press. M. S. and Berners-Lee. Commun. W. E. 1999. N.: RFC 3501: INTERNET MESSAGE ACCESS PROTOCOL ..2010). New York. P. J.08. and Hofmann. I.edu/~gray/it . Story. and Venieris. Passadelis.. M.: Introduction to Real. Washington. DC. P. 2003.: Trac classication through simple statistical ngerprinting.. F.: Trac classication using clustering algorithms. 2001. Frystyk.p ri sm . R.pdf (20. Journal of Statistical Planning and Inference. Brandauer. D.html(20. Charlot. A..

NJ. 30] John. NY. Veciana. In SNCNW09: Swedish National Computer Networking Workshop.: Statistical protocol identication with spid: Preliminary results. G. In I M 0 9 : Proceedings of the 11th IFIP/IEEE international conference on Symposium on Integrated Network Management.. In Guerin. G.: Characterization and Classication of Internet Backbone Trac . caveats. and McCloskey. 2008.: Heuristics to classify internet backbone t r a c based on connection patterns. ACM... 32] Jordan. IEEE Computer Society Press..2010). A.com/userfiles/file/ ipoquei n t e r n e t .. M. Chalmers Reproservice. Ross.iana.study-2007. 27] IANA: Port numbers. http://www. M. In Azcorra. Arturo. pages 197202. NJ. K.. P.. Leandros (editors): Proceedings of the 2008 ACM Conference on Emerging Network Experiment and Technology. ACM. S.. ACM. D.. and Protocols for Computer Communications. Spatscheck. In ICOIN 08: Proceedings of the 22nd International Conference on Information Networking.. pages 229240.: I n ternet t r a c classication demystied: myths. O.2010). Architectures. NY. Sen. 2007. Debanjan. and the best practices.abstract-en. 2010. Piscataway..: Blinc: multilevel t r a c classication i n the dark. Barman. Ji. In IMC 04: Proceedings of the 4th ACM SIGCOMM conference on Internet measurement.ipoque. New York.. pages 137140. M. 2009. Keith W. 2005.. S. 29] IPOQUE: Internet study 2007. 26] Hjelmvik. Mitzenmacher.. M. H. S. and Faloutsos. M. Saha.08. M.. NY. and Tafvelin. pages 3742. M. 2005. Faloutsos. P. IEEE Computer Society Press. kc: Transport layer identication of p2p t r a c . Gustavo de. IEEE Computer Society Press. Piscataway. (20. New York. New York.. Clay kc. Faloutsos. 2008.: Four questions that determine whether t r a c management is reason-able. Faloutsos. (editors): Proceedings of the ACM SIGCOMM 2005 Conference on Applications. R. NY. and John. Papagiannaki... ACM Trans. Internet Technol. K. Uppsala. pages 112. Pappu. and Lee. T. E. Chuanyi.. pages 121134. W. In I N FOCOM09: Proceedings of the 28th IEEE international conference on Computer Communications Workshops. CoNEXT. 35] Karagiannis. T. Technologies.org/assignments/port-numbers 28] Iliofotou. Piscataway. W. ACM. and Varghese. Subhabrata.25] Haner.: Graph-based p2p trac classication at the internet backbone. and Minshall. and Wang. NJ. 36] Kim.. Broido.pdf (20.. In Sen. and clay. 31] John. 2009. and Tassiulas. 2009. 9(2):128. Fomenkov. http://www. W. 2009.08. Kim.: Implications of internet architecture on net neutrality. S. New York. D. 34] Karagiannis. 2004. R. Joe (editors): MineNet.: Acas: automated construction of application signatures. . H. Goteburg. Govindan. 33] Jordan.

Cardozo J. 2009.. pages 742746.. L. In ICOIN07.: Handbook of Applied Cryptography. http : //www .. D. V.... and Wierzbicki. 42] Lyda. Cambridge. J. NY. 1951. D. 2002. In IWCMC 10: Proceedings of the 6th International Wireless Communications and Mobile Computing Conference. and Barford. A.. W. M..: On dominant characteristics of residential broadband internet t r a c .. S. Boca Raton.: Transport layer identication of Skype t r a c . NY. IPOQUE. A..: RFC 5321: Simple Mail Transfer Protocol. 38] Kullback. Horton. pdf (20.. volume 5200 of Lecture Notes in Computer Science. C. N. Washington. 2009. 2010. (editors): Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement 2006.: Note on the bias of information estimates. J. J. pages 90102.. de/download/Whitepaper_DPI . Annals of Mathematical Statistics. R. A. F. and Susilo.. and Kasch. csn-germany. S. M. II-B:95100. and Leibler.: On information and suciency . Internet Engineering Task Force. 2008. I n t l & Comp. R.: Deconstructing the Kazaa network.: Information Theory. Berlin. ACM. 50] Mochalski. In IMC 09: Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference.. M. 2008.. FL. pages 112120. J. IEEE Computer Society. J. Technical report. CRC Press. Heidelberg. 1st edition. 2010..: Unexpected means of protocol inference. 22(1) :7986. 44] Mackay. ACM.. W. L. V. In WIAPP 03: Proceedings of the Third IEEE Workshop on Internet Applications. Ripeanu. In Almeida.: Statistical classication of services tunneled into ssh connections by a k-means based learning algorithm. Levchenko. 1955. J. G. H. 2007. Feldmann. 2003. Burbank. 39] Leibowitz. M. P. Internet Engineering Task Force. NY. S. 47] Menezes. 1996. New York. New York. C. and Voelker.. 2006. and Oorschot. G.. 2008. . 45] Maier. 48] Miller. and Di Iollo. C.2010). and Schulze. A. Martin.08. Inc. Almeida. 40] Lewen. G. applications & net neutrality. Inference & Learning Algorithms. Cambridge University Press.: RFC 5905: Network Time Protocol Version 4: Protocol and Algorithms Specication.S. Paxson. pages 465481.. K. and Hamrock. J. 41] Lu. 46] Maiolini. R. A. 5(2):4045.: Deep packet inspection: Technology. Baiocchi. 43] Ma. Kreibich. Safavi-Naini. K.. A. M. P.: Using entropy analysis to nd encrypted and packed malware. Springer. 16(1):173205. Vanstone. pages 313326. J. 49] Mills. IEEE Security & Privacy. Information theory in psychology. G.37] Klensin.. Rizzi. Savage.: Internet File-Sharing: Swedish Pirates Challenge the U. ACM. DC. A... New York. U. and Allman.

pages 4154. 56] Paninski. 1981. 1980.: Detecting subverted cryptographic protocols by entropy checking. 65] Sahu. Laboratoire Specication et Verication. volume 2515 of Lecture Notes in Computer Science. 53] Myers. and Firoiu. IEEE Transactions on Information Theory. L. CA. 2000. 54(10):47504755. D. Springer.: On Achievable Service Dierentiation with Tocken Bucket Marking for TCP. 2008. C. P. 2008. Internet Engineering Task Force. pdf (20. Heidelberg. G.: RFC 793: Transmission Control Protocol. 59] Paninski. ENS Cachan.Version 3 . 2005. and Reynolds. 1987. 1996. J.2010). 58] Paninski. c o l u m b i a .: Entropy-based reduction of trac data . 50(9):22002203. J. 64] Rish. In Proc. 1985. Research report LSV-06-13.: A coincidence-based test for uniformity given very sparsely sampled discrete data. IEEE Communications Surveys and Tutorials. V. K. 2003. W. 62] Postel.fr/Publis/ RAPPORTS_LSV/PDF/rr-lsv. 57] Paninski. Towsley.: RFC 1034: Domain names . (editor): PAM: Passive and Active Network Measurement Workshop.html (20. T.: RFC 1939: Post Oce Protocol . A. Internet Engineering Task Force. Berlin. Diot. 10(1-4):5676. In Dovrolis.: RFC 959: File Transfer Protocol. Seattle.V. C. pages 4146.ens-cachan. In IJCAI-01 workshop on Empirical Methods in A I . 61] Postel. L.08. P. 11(2):191193. L.: Estimating entropy on m bins given fewer than m samples.concepts and facilities. 54] Nguyen.2006-13. Internet Engineering Task Force. of the ACM SIGMETRICS2000 Int. J.2010). J. Internet Engineering Task Force. J. IEEE Communications Letters. A. L. K. M. Internet Engineering Task Force. 60] Pescape. 2004.lsv. 15(6):11911253.: RFC 768: User Datagram Protocol. J.: Estimation of information-theoretic quantities..: An empirical study of the naive bayes classier. Nain. 2007. 63] Postel. http://www. WA.51] Mockapetris. and Armitage. and Papagiannaki.: Toward the accurate identication of network applications.. h t t p : //www . and Goubault-Larrecq. on Measurement and Modeling of Computer Systems. 55] Olivain. pages 2333. s t a t . T. J.: Estimation of entropy and mutual information. S.: A survey of techniques for internet t r a c classication using machine learning. T. Santa Clara. 2001.. and Rose. e d u / ~ l i a m / r e s e a r c h / i n f o _ e s t . 52] Moore. J. . Conf. 2006. Neural Computation. I. IEEE Transactions on Information Theory..08.

mathworks. Bell System Technical Journal. 70] Shannon. C. O. Berlin.. C. h t t p : //fp7-prism.: TCP/IP Trac Classication Based on Port Numbers. P.. and Schatzmann. In Claypool. h t t p : //www. IEEE Computer Society. Phys. R.make free calls and great value calls on the internet.: The boosting approach to machine learning: An overview. 75] Telscom AG: Privacy-aware secure monitoring. P. Boschi. Ricciato. (20. 37(27): L295L301. 1996. M... 80(1):197200. 78] Wagner. de Ruyter van. S. B.. pages 7281. M. P. In WETICE 05: Proceedings of the 14th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprise. 2010. Springer.. 2002.: Entropy and information i n neural spike trains.: On the validation of trac classication algorithms. Owezarski.. 2004. com (20.. Springer. and Uhlig. pages 172 177. (editors): TMA. and Wang. 77] Trammel. Berlin. I. Mallick B.. scalable in-network identication of p2p trac using application signatures. M. S. A. New York. T. Springer.: Bias analysis in entropy estimation. S. D. Hansen. http : / / www. E. In WWW 04: Proceedings of the 13th international conference on World Wide Web. Spatscheck. Koberle. volume 5537 of Lecture Notes in Computer Science. Lett. Submitted for Publication. M. and Karner..08. ACM. In Papadopouli.pdf (20. com/access/ helpdesk/help/techdoc/ (20.2010). D. Dornger. 2008.. Callegari. h t t p : //www . pages 93100.: A mathematical theory of communication.08. G.: A long history of skype: identifying and exploring skype t r a c in a large-scale. R. E. Journal of Physics A: Mathematical and General. Hyytia. P.: Entropy based worm and anomaly detection in fast IP networks.08. 67] Schneider.08.. DC. 68] Schurmann. M. Heidelberg. 2005. E. . S.: Accurate. Rev. and Bialek.. P. 74] Szabo. In D. and Plattner.ch/media/pdf/diploma_thesis . 73] Svoboda.2010).. long-term ow data repository. R. 2009. skype. 72] Strong.2010). NY. 27:379423. 1948. Orincsay.. R.66] Schapire. Denison. volume 4979 of Lecture Notes in Computer Science. and Szabo. Procissi. Malomsoky.. B. 62556.: Detection and tracking of Skype by exploiting cross layer information i n a live 3G network. W.. pages 512521. Steveninck.. 71] Skype Limited: Skype .2010). Rupp. 1998. eu 76] The MathWorks: Matlab documentation. 2004. A. (editors): PAM. schneider-grin. Heidelberg. Washington. E. F. and Pras. G. C. Yu (editor): MSRI Workshop on Nonlinear Estimation and Classication. D.. 69] Sen. New York. NY. Holmes B.

Student Workshop. In IEEE Symposium on Security and Privacy.: State of the art in t r a c classication: A research review. and Gerum. J. and Kruegel. John. E. Masters.. DC. pages 223238. Seoul.. 80] Yaghmour.: RFC 4251: The Secure Shell (SSH) Protocol Architecture. 2009. Washington. clay... N. PAM09: 10th International Conference on Passive and Active Measurement. C.79] Wondracek. W.. 2006.: A practical attack to de-anon ymize social network users. M. G. .: Building Embedded Linux Systems. OReilly. 81] Ylonen. 2010. and Lonvick.. G. Ben-Yossef. Holz. T. and Brownlee. CA.. T. 2008. P. Internet Engineering Task Force. C.. 82] Zhang. Kirda. Sebastopol. IEEE Computer Society. K. kc.

ACK Acknowledgement ANSI American National Standards Institute ASCII American Standard Code for Information Interchange CDF Cumulative Distribution Function CORBA Common Object Request Broker Architecture D D M Distributed Data Management DNS Domain Name Service F I N Final FTP File Transfer Protocol FTPS File Transfer Protocol Secure HEX Hexadecimal HTTP Hypertext Transfer Protocol HTTPS Hypertext Transfer Protocol Secure IDSs Intrusion Detection Systems IEEE Institute of Electrical and Electronics Engineers I I O P Internet Inter Object Request Protocol I M A P Internet Message Access Protocol I M A P S Internet Message Access Protocol Secure I P Internet Protocol I P F I X Internet Protocol Flow Information Export IRCS Internet Relay Chat Secure ISPs Internet Service Providers .

LDAPS Lightweight Directory Access Protocol Secure ML Maximum Likelihood MLE Maximum Likelihood Estimator MMS Media Management System MSE Message Stream Encryption NNTPS Network News Transfer Protocol Secure NSIIOPS Name Service Internet Inter Object Request Broker Protocol Secure NTP Network Time Protocol P2P Peer to Peer POP Post Oce Protocol POP3 Post Oce Protocol Version 3 POP3S Post Oce Protocol Version 3 Secure PRISM Privacy Preserving Secure Network Monitoring PSH Push QoS Quality of Service RT-ETD Real-Time Encrypted Trac Detector SD Standard Deviation SMTP Simple Mail Transfer Protocol SNMP Simple Network Management Protocol SPID Statistical Protocol Identication SSH Secure Shell SSL Secure Socket Layer SYN Synchronisation TCP Transmission Control Protocol TELNET S Telecommunication Network Secure TSTAT TCP Statistic and Analysis Tool TLS Transport Layer Security UDP User Datagram Protocol .

VoIP Voice over Interent Protocol WWW World Wide Web .

Sign up to vote on this title
UsefulNot useful

Master Your Semester with Scribd & The New York Times

Special offer: Get 4 months of Scribd and The New York Times for just $1.87 per week!

Master Your Semester with a Special Offer from Scribd & The New York Times