
MASTER'S THESIS

Testing as a Service for Machine to Machine Communications

Jorge Vizcaino
January, 2014

Master of Science (120 credits)
Computer Science and Engineering

Luleå University of Technology
Department of Computer Science, Electrical and Space Engineering

CONTENTS

Chapter 1  Introduction
    1.1  Background
    1.2  Problem statement
    1.3  Method
    1.4  Delimitations
    1.5  Outline

Chapter 2  Related work
    2.1  Communication protocols
        2.1.1  HTTP protocol
        2.1.2  IP
        2.1.3  Ethernet
        2.1.4  UDP protocol
        2.1.5  TCP protocol
    2.2  Network Performance Metrics
        2.2.1  RTT
        2.2.2  Jitter
        2.2.3  Latency
        2.2.4  Bandwidth
        2.2.5  Throughput
    2.3  Tools strongly associated with this thesis
        2.3.1  Network Tools
        2.3.2  Programming language
        2.3.3  Operating Systems

Chapter 3  Traffic Test
    3.1  Client Server Application
    3.2  Loading test with proxy
    3.3  Loading test with several clients
    3.4  Performance results

Chapter 4  Traffic Pattern Extraction
    4.1  Collecting packet data
    4.2  Replaying traffic pattern

Chapter 5  Multiplying Traffic Pattern
    5.1  Design of TaaS for M2M communications
    5.2  Reproduce testing
    5.3  Results of traffic recreation

Chapter 6  Summary and Conclusions
    6.1  Summary and Results
    6.2  Future Work

Chapter 7  Appendices

Acknowledgements
I would like to offer a word of thanks to my supervisor Laurynas Riliskis for helping me to carry out this project. Thanks to his deep knowledge of the subject, he could give me much useful advice and was able to resolve many questions I had during this thesis.

Abstract
In recent years, cloud computing and Software-as-a-Service (SaaS) have become increasingly important due to the many advantages they provide, and the demand for cloud testing infrastructures is increasing as well. Analysis and testing of cloud infrastructures are required for their effective functioning. This is where Test-as-a-Service (TaaS) comes in, providing an infrastructure along with tools for testing in the cloud and evaluating performance and scalability. TaaS can offer several kinds of cloud testing, such as regression testing, performance testing, security testing and scalability testing. In this thesis TaaS concerns network testing, with the main goal of determining the performance of a server; accordingly, the thesis involves mostly performance and scalability testing. We created a TaaS system that uses a different method to test networks. This method is based on recreating a traffic pattern extracted from simulations and multiplying this pattern to stress a server, all carried out in the Amazon Cloud. In this way we can find the server's limits, build a theoretical foundation and prove the feasibility of the approach. The recreated traffic must be as similar as possible to the traffic extracted from the simulations. To determine this similarity, we compared graphs of the number of bytes over time in a simulation against a session where the traffic was recreated; the more similar the graphs, the more accurate the results. With the results obtained from this method, we can compare the network traffic created by different numbers of data sources and carried out on different types of instances. Data such as packet loss, round-trip time and bytes per second are analyzed to determine the performance of the server. The work done in this thesis can be used to learn a server's limitations and to estimate how many clients could use the same server at once.

CHAPTER 1
Introduction

1.1 Background

Cloud computing [1] provides network access to different resources such as software, servers and storage in an efficient way. Clients can access these services on their own, without human interaction, since everything is done automatically. All applications are offered over the Internet, so users can access them from any location and with different electronic devices. The capacity of cloud computing can be easily adjusted so that its services are supplied properly regardless of the number of clients. Moreover, applications can be monitored and analyzed to give information about their condition to both user and provider.
The cloud structure can be divided into two parts: the front end, which is the part the user can see, and the back end, which comprises the computers, servers and networks that make up the cloud [2]. A main server manages the cloud structure, ensuring good service whatever the number of clients.
Nowadays TaaS [3] is very significant, as it implies cost sharing of computing resources, cost reduction, scalable test structures and testing-service availability at any time. Moreover, TaaS provides a pay-as-you-test model for customers. All these characteristics make TaaS an efficient way of testing in the cloud. The main reason clients are interested in TaaS is that such a system can report on several significant software and network features, such as functionality, reliability, performance and safety. In order to measure these characteristics, there are several types of tests for services in the cloud. This thesis focuses mainly on performance testing [4]. These tests are usually carried out to provide information about speed, scalability and stability. Performance testing is commonly used to assess software before it reaches the market, to ensure it will meet all the requirements to run efficiently. Performance testing can be further divided into several kinds of tests. The ones most relevant to this thesis are load testing, which examines the behaviour of the server under traffic loads, and scalability testing, which determines the performance and reliability of the server as the load increases. The process of developing a performance test involves the following steps [4].
1. Identify your testing environment: know the physical environment where the test will be run as well as the testing tools required.
2. Identify the performance criteria: this includes limits on response times and other values the simulations must meet for the performance to be considered good enough to offer a reliable service.
3. Design the performance tests: cover all the different cases that could occur for the service or application.
4. Configure the environment: prepare the environment and tools before starting the simulations.
5. Implement the test design: develop performance tests suitable for the test design.
6. Run the tests: start the simulations and record the test values.
7. Analyze the tests: examine the results to check the performance of the service.
Performance testing ensures that cloud services and applications will run properly. These are the most widely recommended steps for developing TaaS [3], and we have taken them into account in this work. This service provides features such as elasticity, safety, easy handling, a reliable environment and flexibility when choosing options regarding instance storage.

1.2 Problem statement

Nowadays TaaS [3] is very common due to the wide use of internet clouds and the large number of applications provided on them. We therefore found it interesting to use this concept to create a new approach for testing a particular scenario. In this thesis we focused on developing a different method to apply TaaS in an M2M framework. In addition, this project could be modified to test different scenarios in further research; for example, it would be possible to add more servers by increasing the number of instances in the scripts used in this project.
The acronym M2M [5] can have different meanings, such as Machine-to-Machine, Machine-to-Man or Machine-to-Mobile. However, M2M does have a clear goal, which is to allow the exchange of information over a communication network between two end points.
When it comes to testing networks, it is necessary to do so with the traffic that is expected to go through the network when it is in use. There are two ways to do this [6]. The first is to simulate the type of traffic that is supposed to go over the network. TaaS systems use this option, allowing TaaS users to configure the test simulations according to their needs. To configure these simulations, tools such as Selenium and JMeter are used in the cloud [3].
The second way to test networks is to replay recorded network traffic. The purpose of this thesis is to apply this second way to create a TaaS system for testing networks in the cloud. In order to replay recorded traffic, we followed a method based on a replay attack [7], which is explained in the next section.
In this way we created a TaaS system that can estimate network performance using a method different from existing systems. First we must still configure the simulations to test the server. The main difference in our method is that we then extract the traffic pattern from those simulations and multiply it from a black box in order to stress the server. This is an interesting method because, since we recreate a real exchange of traffic precisely, the results are realistic and accurate. Finally, we had to prove the feasibility of the method applied.
The TaaS system was developed for testing in the Amazon Cloud [8], which allowed us to set up the whole scenario easily and use different types of instances. These instances differ in features such as memory, storage and network performance [9]. Therefore, it was interesting to compare results across instance types.

1.3 Method

The method followed in this thesis is divided into three steps. First we set up a proxy between client and server to extract the traffic. Then we had to figure out a way to recreate this traffic, and finally to replay it M2M to test the server.
In the following paragraphs the method is described in detail. The first step consisted of setting up a client-proxy-server scenario in the cloud. We could then run simulations to examine the behaviour of the packets going over the network, and check how the network performance changed when we varied factors in the network (number of clients, type of instance, etc.). The packets were sniffed in the proxy with the tool tshark [10]. Once we had some knowledge about the simulated network, we could start developing a method to extract a traffic pattern from those simulations. We had to make sure the script obtained the traffic pattern properly, so that when the same traffic was recreated M2M, the behaviour of the packets was as similar as possible to the original simulation. To achieve this goal, we had to extract the data sent and the timestamps of the packets with high precision.
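The extraction step can be sketched roughly as follows. This is not the thesis's actual script: it assumes the capture was first exported with tshark as "tshark -r capture.pcap -T fields -e frame.time_epoch -e tcp.payload", which prints one "timestamp<TAB>hex-payload" line per packet, and it keeps only the offset from the first packet together with the payload bytes:

```python
def extract_pattern(field_dump: str):
    """Turn tshark field output (epoch timestamp, TAB, hex payload per line)
    into a list of (seconds_since_first_packet, payload_bytes) pairs."""
    pattern = []
    first_ts = None
    for line in field_dump.strip().splitlines():
        ts_field, payload_field = line.split("\t")
        ts = float(ts_field)
        if first_ts is None:
            first_ts = ts                    # time origin = first captured packet
        # tshark may separate hex bytes with ':'; normalize before decoding
        payload = bytes.fromhex(payload_field.replace(":", ""))
        pattern.append((ts - first_ts, payload))
    return pattern

# Two packets captured 0.25 s apart, carrying "Hello" and "World"
dump = "1389000000.000000\t48656c6c6f\n1389000000.250000\t576f726c64"
```

The resulting list preserves exactly the two things the replay needs: what was sent and when, relative to the start of the session.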
Once the pattern was extracted from the simulations, we moved on to the third and last step, where we multiplied the traffic pattern by scaling up the number of clients. In this way, large traffic-load recreations were carried out to test the server's limits and find out how the server handles heavy traffic loads. These data sources sent the extracted pattern directly to the server in an M2M framework. Finally, from the results obtained we could determine the server's performance and the feasibility of the approach developed.
The method is a kind of replay attack [7]: a man-in-the-middle (Wireshark sniffing in the proxy) intercepts the traffic, which is then replayed while pretending to be the original sender in order to create problems for the host server. In our thesis this traffic is scaled up by a multiplier to stress the server and find out the software limits.
Regarding the traffic pattern extraction, several tools were studied and two methods to extract and replay traffic were considered. The first method consisted of modifying the pcap files previously recorded in the proxy so that the packets could be sent straight to the server from the data sources. Tcprewrite [11] was the tool used to modify the pcap file. The next step was to use another tool to recreate the traffic contained in the new pcap. One of them was Scapy [12], which sent the packets, but not at the appropriate times. Another tool tried was Tcpreplay [13], but with it, it was not possible to simulate the server, since this tool does not work at the transport level; therefore, we could not establish a valid TCP connection [14]. In the end we used the second method, which is based on programming our own script to extract the traffic pattern and recreate it later on. This way was much trickier but much more suitable, as well as completely automatic. With this method nothing had to be done by hand (like modifying the pcap file from the console); we just needed to type a few options, such as the name of the file to replay or the server instance type. After programming the script to extract the pattern, we needed to replay it somehow. We tried Scapy [12] again, a very good tool for creating and configuring packets. However, there were problems receiving segments coming from the server, since it was necessary to work with sequence and acknowledgement numbers. This proved extremely hard, and finally sockets were used again to replay the pattern.
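A minimal sketch of this final socket-based approach (the names here are illustrative, not the thesis's actual script): given the extracted pattern as (delay, payload) pairs, a plain TCP socket replays each payload at its original offset from the first packet.

```python
import socket
import time

def replay_pattern(pattern, sock, clock=time.monotonic, sleep=time.sleep):
    """Send each payload over a connected socket at its recorded offset
    from the first packet, reproducing the captured inter-packet timing."""
    start = clock()
    total = 0
    for delay, payload in pattern:
        remaining = start + delay - clock()
        if remaining > 0:
            sleep(remaining)                 # wait for this packet's original slot
        sock.sendall(payload)
        total += len(payload)
    return total

# Usage sketch (server host and port are placeholders):
# sock = socket.create_connection(("server.example", 50007))
# replay_pattern(extracted_pattern, sock)
```

Because the operating system's TCP stack handles sequence and acknowledgement numbers, this avoids the problems encountered with Scapy; a multiplier is then just many concurrent connections each running this loop.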
A diagram of the whole system developed in this thesis is shown in Figure 1.1. The top of the figure (network traffic simulation) refers to the first part of the thesis, where we set up a client-proxy-server communication in the cloud. With this scenario we could run simulations to exchange packets. The traffic was recorded in the proxy for further analysis and pattern extraction. Finally, with the pattern obtained, we came to the last part of the thesis, shown at the bottom of the figure (traffic pattern recreation). In this part we set up a multiplier composed of many data sources, which recreated the traffic pattern towards the same server. In this way we could find out the server's performance when handling heavy traffic loads.

1.4 Delimitations

During this thesis we have made some delimitations; the project could be extended to cover most of these aspects in further research.
The TaaS system developed can only function in the Amazon Cloud, because the main library used to program it works only for this particular cloud. Since the goal of this project is testing M2M communications, we cannot test scenarios other than a client-server connection. Moreover, we have focused on finding out server performance; therefore, the scope of this TaaS system is to carry out performance and scalability tests. In addition, the TaaS system uses TCP sockets to test servers, so we cannot, for instance, use HTTP requests for testing.

1.5 Outline

The thesis is organized as follows. The introduction is in Chapter 1. Chapter 2 describes related work. Chapter 3 describes the simulations and analysis of the data-source, proxy and server scenario. Traffic pattern extraction is described in Chapter 4. The design of TaaS [3] for M2M communications and the results achieved with this system are in Chapter 5. A summary of the whole thesis, the main results obtained and future work are in Chapter 6. Finally, Chapter 7 contains the appendices.


Figure 1.1: Flow diagram of the developed system

CHAPTER 2
Related work
There are many different communication protocols, and they are required to establish connections within the network and transfer data among hosts. All of them have their place in the OSI model [15], where they are classified depending on their function. In this section, the main protocols are explained, since they are essential for analyzing a client-server communication. Moreover, some significant metrics needed to measure network performance in testing are described, as well as the tools used to carry out simulations and analyze segments.

2.1 Communication protocols

In order to develop this thesis, it was crucial to describe the most significant protocols and explain their functions. This made it easier to look into all the sniffed packets and check that everything was working properly. Detailed knowledge of the protocols is also useful when it comes to recreating the traffic pattern.
Protocols are the objects that use the different OSI model layers to establish communication within a network [15]. Each protocol provides two different interfaces. The first is a service interface offered to the other objects on the same machine that want to use the services of this protocol. The other, called the peer interface, is used to talk to the protocol's counterpart on another machine.
However, before explaining the different kinds of protocols, it is important to describe how they are organized depending on their function and to see the whole in which they all take place. To keep a system from becoming too complex, levels of abstraction are needed. Network systems apply this as well, creating layers, each with a distinct function. In this way the problem of building a network is divided into more manageable parts. Another advantage is the ease of adding new services, since it is not necessary to modify every part, only the one where the service will be introduced. In networks, the chosen architecture is the OSI model [15]. Networks follow this structure when connecting computers. The architecture is composed of seven levels with different functions, represented from top to bottom in Figure 2.1.

Figure 2.1: OSI model

First, the physical layer identifies the physical features of the network. These characteristics can relate to the hardware, such as the types of cables and connectors, or to the network topology (bus, ring, star, and so on). This layer also determines the voltage and frequency that signals will use. The data link layer transmits the data from the upper levels to the physical layer, but is also in charge of error detection and correction, and of hardware addressing. The main function of the network layer is to provide a mechanism for selecting routes within the network in order to exchange packets among different systems; this layer mainly uses the IP protocol. The transport layer takes charge of transporting data in the network. To ensure packets reach their destination properly, this layer can check for errors in the sending, make sure the data goes to the right service in the upper levels, and divide packets into more manageable ones (segmentation). The most significant protocols here are TCP and UDP, discussed later. The session layer sets up connections between two endpoints (normally applications), making sure the application on the other system has the proper settings to communicate with the source application. The next level contains the presentation layer, which transforms the data from the application into another format in order to send it through the network. Finally, the application layer takes requests and data from users in order to send them to the lower layers. The most common application protocol is HTTP.

2.1.1 HTTP protocol

The Hypertext Transfer Protocol (HTTP) [16] is an application-level protocol used for distributed, collaborative, hypermedia information systems. This network protocol is used for communication among users, proxies and gateways to other Internet systems. HTTP is used to deliver files, but another important function of this protocol is the transmission of resources. A resource is a network data object that can be identified by a URI. Normally, these resources are either files or the outputs of a script. Each HTTP message has the general form shown in Listing 2.1 [15].
Listing 2.1: HTTP message

START LINE <CRLF>
MESSAGE HEADER <CRLF>
<CRLF>
MESSAGE BODY <CRLF>
The first line shows whether this is a request or response message. The next lines provide parameters and options for the message. There are different kinds of header lines in HTTP, and there is no limit on the number of lines that can be sent. The last part is a body of data sent after the header lines.
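As an illustration of this general form (a sketch, not tied to any particular library), a request message can be assembled by joining the start line and the header lines with CRLF, followed by an empty line and the optional body:

```python
def build_http_message(start_line: str, headers: dict, body: bytes = b"") -> bytes:
    """Assemble an HTTP message: start line, header lines, blank line, body."""
    lines = [start_line] + [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + body

# A minimal GET request for the root resource of an (illustrative) host
request = build_http_message("GET / HTTP/1.1", {"Host": "www.example.com"})
```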
Overall operation
HTTP is a request/response protocol; the default port is 80, but other ports may be used. Two important HTTP request methods are GET and POST [16], used to request and retrieve data from a specified resource. However, there is another request method that is very significant for this thesis: CONNECT [16]. This method is used to send data through a proxy that acts as a tunnel. We needed this method to establish a connection between client and server through the proxy.
A few characteristics of HTTP communication must be pointed out. There is no permanent connection: when the request has been served, the client disconnects from the server, and the connection has to be established again for the next request. As a result, client and server know there is a connection between them only during a request, so they cannot keep information about previous requests. Any kind of data can be transmitted by HTTP as long as client and server know how to handle it. A typical example of an HTTP request is shown in Figure 2.2.
To set up a communication with this protocol, a client opens a connection by sending a request message to the server, which returns a response message. Afterwards, the

Figure 2.2: HTTP request

server closes the connection. First of all, we describe the initial line of the request and response messages. In a request message, the first line consists of three parts: the HTTP request method; the path of the requested resource, called the URI; and the version of HTTP being used. This can be clearly seen in Listing 2.2, an example extracted from the simulations made during the thesis.
Listing 2.2: HTTP request with CONNECT

CONNECT ec2-54-217-136-250.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1
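The CONNECT exchange can be sketched with plain sockets (the function names and hosts here are illustrative, not the thesis's code): the client sends a CONNECT line to the proxy and, on a 2xx status line, keeps using the same socket as a raw tunnel to the server.

```python
import socket

def connect_request(host: str, port: int) -> bytes:
    """Build the CONNECT request asking a proxy to tunnel to host:port."""
    target = f"{host}:{port}"
    return f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")

def tunnel_established(reply: bytes) -> bool:
    """True if the proxy's status line reports success (a 2xx code)."""
    status = reply.split(b"\r\n", 1)[0].split()
    return len(status) >= 2 and status[1].startswith(b"2")

def open_tunnel(proxy, target):
    """Connect to the proxy, issue CONNECT, and return the tunnelled socket."""
    sock = socket.create_connection(proxy)
    sock.sendall(connect_request(*target))
    reply = b""
    while b"\r\n\r\n" not in reply:          # read up to the end of proxy headers
        chunk = sock.recv(4096)
        if not chunk:
            break
        reply += chunk
    if not tunnel_established(reply):
        sock.close()
        raise OSError("proxy refused the tunnel")
    return sock                              # now a raw pipe to the target
```

After a successful reply such as "HTTP/1.1 200 Connection established", everything written to the socket goes end-to-end between client and server, which is what allowed the proxy in our scenario to sit between them.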
The initial line of the server's response is also divided into three parts. The first part is the version of HTTP used for the communication. Then comes a status code [15] for the computer to understand the result of the request; the first digit indicates the class of response. The codes are shown in Listing 2.3.
Listing 2.3: HTTP request result

1xx: informational message.
2xx: success in the connection.
3xx: redirects the client to another URL.
4xx: error linked to the client.
5xx: error linked to the server.
Finally, there is a word or sentence in English describing the status of the connection.
Header lines offer information about the request or response, or about any object sent in the message body, which will be explained later. There are many different header lines, but they can be classified into four main groups [17]. An entity header carries information about the request, the response or the content of the message body. A general header is used in both requests and responses. A request header is sent by a browser or client to a server. Finally, a response header is sent by a server in response to a request. The format of a header line is "Header-Name: value". Two examples of header lines are shown in Listing 2.4.
Listing 2.4: HTTP header lines

User-Agent: Mozilla/3.0
Host: www.amazon.com
Finally, an HTTP message may have a body with data after the header lines. In a response, the requested resource is sent in the body; there may also be text giving information or warning of errors. In a request, the body is where user-entered data or uploaded files are placed to be sent to the server. When an HTTP message contains a body, there are usually header lines describing it. One of these is Content-Type, which indicates the MIME type of the data in the body, for instance text/html or image/gif. Another very common header line is Content-Length, which gives the number of bytes in the body.

2.1.2 IP

The Internet Protocol (IP) is used to build and interconnect networks for the exchange of packets [15]. It is important to understand this layer to make sure the information goes to the expected points within the network created in the cloud.
IP occupies the network layer in the OSI model. IP runs on both hosts and routers, defining an infrastructure that allows these nodes and networks to operate as a single internetwork. Concerning delivery, IP offers a best-effort service model, which provides unreliable datagram delivery: it is not guaranteed that datagrams reach their destinations. In addition, packets can be delivered out of order, or arrive at the destination more than once.


IP Header
Figure 2.3 shows all the fields carried in the IP header.

Figure 2.3: Fields of the IP Header

The first field is the Version, which indicates the IP version used in the transmission. The Header Length gives the length of the header in 32-bit words; if there are no options, the header has 5 words (20 bytes). The next field, Type of Service, is used to indicate the quality of service. The Total Length field gives the length in bytes (unlike the Header Length, which is counted in words) of the whole datagram. As for the Identification field, the sender marks each IP datagram with an ID number before transmission. The goal is to identify datagrams uniquely, so that when several fragments arrive at the destination carrying the same ID value, the destination host can reassemble them. If some fragment does not arrive, all the fragments with the same ID will be discarded. The next field holds up to three flags. The first flag has no use for now and is set to 0. The D (Don't Fragment) flag, when set to 1, forbids fragmentation of the datagram into smaller pieces. The M flag indicates whether the datagram received is the last one of the stream (set to 0) or whether more fragments follow (set to 1).
The Fragment Offset is used by the sender to indicate the position of the fragment within the original datagram, so the receiver can put the fragments in order. The first byte of the third word of the header is the TTL field, which sets the maximum time that a datagram may remain on the network before being discarded. The main goal of this field is to discard datagrams that are in the network but never


reach the receiver. The next field, Protocol, indicates the protocol expected in the datagram. The IP header also uses a simple Checksum to verify the integrity of the header during transmission. To send a packet, the Source Address field must be filled with the IP address of the sender, and the Destination Address with the IP address of the receiver. There is also a field for options, if any are required, and Padding set to zeros to ensure that the length of the header is a multiple of 32 bits.
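The field layout above maps directly onto the fixed 20-byte header. As a sketch (option-less headers only, not part of the thesis's scripts), it can be decoded with Python's struct module:

```python
import struct

def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed 20-byte IPv4 header described above."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,        # IHL counts 32-bit words
        "total_len": total_len,
        "id": ident,
        "flags": flags_frag >> 13,                 # reserved, D, M bits
        "frag_offset": (flags_frag & 0x1FFF) * 8,  # offset is in 8-byte units
        "ttl": ttl,
        "protocol": proto,
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }
```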

Fragmentation and Reassembly


Since IP provides host-to-host service across many different networks with diverse technologies, datagrams must be managed so they can traverse all of them. There are two options for solving this problem [15]. The first is to ensure that every IP datagram is small enough to fit inside a packet on any type of network. The second is to use a technique to fragment and reassemble packets when they are too big to go through some network. This second option is the more suitable, since networks change continuously and it would be especially difficult to choose a specific packet size that fits every network. It is also the option used in the Amazon networks where we ran the tests. Knowing how segments are fragmented matters when examining each segment sent and its respective answer; with this knowledge, the exchange of packets was better organized and the recreation of the traffic pattern was easier.
This second option is based on the Maximum Transmission Unit (MTU), which is the
biggest IP datagram that can be carried in a frame. Normally, the host chooses the MTU
size to send IP datagrams. If by chance the packets go over some network with smaller
MTU, it will be required to use fragmentation. For instance, if a packet of 1420 bytes
(including 20 bytes of IP header) has to go through a network with 532 bytes of MTU,
the datagram will be fragmented in three packets. The first two packets will contain 512
bytes of data and another 20 bytes for the header. Therefore, there will be 376 bytes
left (1400 512*2), so that the last datagram will carry those 376 bytes of data plus 20
bytes for the header. The result would look like in the Figure 2.4.
It should be noted that the number of data bytes in each fragment (except the last) must
always be a multiple of 8. During this process, the router sets the M (more fragments) bit
in the Flags field of the first and second datagrams to indicate that more packets are
coming. As regards the Offset field, in the first packet it is set to 0 because this
datagram carries the first part of the original packet, while the second datagram has
its Offset set to 64, since its first byte of data is the 513th (512/8 = 64).
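The fragmentation arithmetic of this example can be checked with a short sketch (illustrative only, not an implementation of a real IP stack):

```python
def fragment(total_len, mtu, header=20):
    """Split an IP datagram into fragments for a link with the given MTU.

    Returns a list of (data_bytes, offset_in_8_byte_units, more_fragments_flag).
    """
    payload = total_len - header          # data carried by the original datagram
    max_data = (mtu - header) // 8 * 8    # data per fragment must be a multiple of 8
    fragments, offset = [], 0
    while payload > 0:
        data = min(max_data, payload)
        payload -= data
        fragments.append((data, offset // 8, 1 if payload > 0 else 0))
        offset += data
    return fragments

print(fragment(1420, 532))   # [(512, 0, 1), (512, 64, 1), (376, 128, 0)]
```

Running it on the 1420-byte datagram and 532-byte MTU of the example reproduces the three fragments described above, including the M flags and the offsets 0, 64 and 128.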


Figure 2.4: Datagram fragmentation

Ethernet address resolution protocol


Nowadays, Ethernet is the most widely used link layer network. To develop a mapping
between link layer addresses and IP addresses, the Address Resolution Protocol (ARP)
is used, so that the physical interface hardware on the node can understand the
addressing scheme.
The method to obtain the link layer address of a particular server through this technique
involves the following steps [15]. First of all, the sender checks its ARP cache to find
out whether it already has the link layer (MAC) address of the receiver. If it is not there,
a new ARP request message is sent, carrying the sender's own IP and link layer addresses
and the IP address of the desired server. Since this message is a broadcast, it is received
by every device within the local network. The receivers compare the queried IP address with
their own IP address. The servers with different IP addresses drop the packet, but
the receiver we are looking for sends an ARP reply message back to the client. This
server also updates its ARP cache with the link layer address of the client. When the
sender receives the ARP reply, the MAC address of the receiver is saved. The required
steps can be seen in Figure 2.5.

2.1.3 Ethernet

Ethernet occupies both the data link and the physical layer in the OSI model [18][19].
The data link layer is divided into two sublayers: the Media Access Control, known as
MAC (defined by IEEE 802.3), and the MAC client (defined by IEEE 802.2). The structure
is shown in Figure 2.6.
The MAC client must be one of two types of sublayer. The first is
the Logical Link Control (LLC), which supplies the interface from the MAC sublayer
to the upper layers. The other option is called a bridge entity, which provides an interface
between LANs using either the same protocol (for instance Ethernet to Ethernet) or different
ones.
Concerning the MAC sublayer [18], this level takes charge of data encapsulation, assembling the frames before sending them, as well as analyzing received frames and
detecting errors during the communication. Moreover, this sublayer is in charge of starting

Figure 2.5: ARP request

Figure 2.6: Ethernet layers in OSI model


frame transmission and recovering from communication errors.


The physical layer enables communication between the data link layer and the
corresponding physical layer of other systems. In addition, this layer defines significant
physical features of Ethernet, such as voltage levels and timing, but its most important
functions are related to data encoding and channel access. This layer can code and
decode bits between binary and phase-encoded form. Regarding channel access, this
level sends and receives the encoded data mentioned above and detects collisions
in the packet exchange.

2.1.4 UDP protocol

User Datagram Protocol (UDP) is an IP standard defined in RFC 768 [20]. It is used
as a transport protocol; therefore, its function is similar to that of TCP, but UDP is
sometimes preferred since it is faster, lighter and simpler than TCP. However, it is less
reliable. UDP provides a best-effort service to an end system, which means that it does
not guarantee the proper delivery of datagrams. Therefore, this protocol must not be
used when reliable communication is necessary.
UDP header
UDP messages are sent within a single IP packet, and the maximum size is
65527 bytes for IPv6 [21]. When a UDP datagram is sent, the data and the header go together
down to the IP network layer, and the computer has to fill in the fields of the UDP header
properly. The scheme of the UDP header is represented in Figure 2.7.
Among other things, UDP is normally used to serve Domain Name System (DNS)
requests on port number 53. DNS is a protocol that translates domain names into IP
addresses. This is important in this thesis, since the proxy between client and server
needs to work out the server's IP address.

Figure 2.7: UDP protocol header


The UDP header is composed of four fields [15], each of 2 bytes. The
Source Port indicates the port from which the packet was sent and is, by default, the
port to which the reply should be addressed unless changed. The Destination
Port is the destination port to which the packet will be delivered. The Length field
indicates the total number of bytes used in the header and the payload data.
Finally, the Checksum is a scheme to detect possible errors during transmission. Each
message is accompanied by a number calculated by the transmitter, and the receiving
station applies the same algorithm as the transmitter to calculate the checksum. Both
checksums must match to ensure that no error happened during the transmission.
UDP ports
UDP ports give a location to send and receive UDP messages. These ports are used to
separate different kinds of traffic, facilitating and ordering packet transmission.
Since the UDP port field is 16 bits long, there are 65536 available ports. Ports 0
to 1023 are well-known port numbers. The destination port is usually one of these
well-known ports, and normally each of them is used for one particular
application.
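As a minimal illustration of this connectionless exchange (a generic Python sketch on the loopback interface, unrelated to the thesis scripts), a UDP datagram can be sent and received without any prior connection setup:

```python
import socket

# Receiver: bind a UDP socket to an ephemeral local port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))           # port 0 lets the OS pick a free port
port = receiver.getsockname()[1]

# Sender: no handshake is needed -- each datagram is self-contained.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", ("127.0.0.1", port))

data, addr = receiver.recvfrom(65535)     # one datagram per recvfrom call
print(data)                               # b'hello'
receiver.close(); sender.close()
```

Note that, unlike the TCP examples later in this thesis, nothing guarantees the datagram's delivery; on the loopback interface it simply tends to arrive.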

2.1.5 TCP protocol

Transmission Control Protocol (TCP) is a transport layer protocol used when reliable
delivery is required [15]. TCP is by far the most important protocol in this thesis,
since our TaaS system is based on TCP sockets. With this protocol a communication
between two endpoints can be set up, where each endpoint is defined by two parameters:
the IP address and the TCP port number. The following are some of the main
characteristics of this protocol. In TCP, the window size decides the amount of
bytes that can be transferred before an acknowledgement from the receiver is required.
With TCP it is possible to place the datagrams coming from the IP protocol in order.
In addition, this protocol allows data to be split into fragments of different lengths
before forwarding them to the IP protocol. TCP can also multiplex data coming from
different sources onto the same line; this task is carried out by the ports.
The TCP header is more complex than the UDP header. Its scheme is shown in Figure 2.8.
The Source Port field identifies the sender port, just as the Destination Port identifies
the receiver port. The Sequence Number and Acknowledgement Number fields will be
explained in depth in the next section, since it is important to know how they work
during a connection. The Header Length field, also called Data Offset, sets the size of the
TCP header, keeping in mind that the length is always a multiple of 32 bits. The
next field (called Reserved in the picture) is currently unused and set to zero.
The Flags field carries additional information about the packet transmission. The SYN


Figure 2.8: TCP protocol header

flag is used to set up a TCP connection and the FIN flag to finish it. The ACK flag
indicates that the packet carries an acknowledgement. The URG flag informs that the
segment contains urgent data. The PSH flag is set by the sender to tell the receiver
to deliver the data to the application immediately instead of buffering it. Finally,
the RST flag is set to reset the connection.
Another important field is the window size. Through this field we know the number
of bytes that the receiver can accept without acknowledgement. The Checksum
field makes the packet transmission more reliable, since it is used to check the
integrity of the header. The next field in the TCP header is called the Urgent Pointer,
and its function is to indicate where the regular (non-urgent) data contained in the
packet begins. There can also be different options in the header; the length of this
field varies depending on which options are present. Finally, there is a space
between the options and the data called Padding. It is filled with zeros, and its goal
is to ensure that the length of the header is a multiple of 32 bits.
TCP connection
To set up a connection, TCP uses an algorithm called the three-way handshake [15], in
which three packets are sent. In TCP connections, sender and receiver must agree on
a number of parameters; when a connection is established, the main ones are the
starting sequence numbers. An example of how a TCP connection is set up is shown
in Figure 2.9.
First of all, the client sends a packet to start the communication; the SYN flag is set
to 1 and a number is carried in the sequence number field. When the server


Figure 2.9: Establishing a connection in TCP

responds, it sends a packet with the acknowledgement number equal to the sequence
number of the first packet plus one, and its own initial sequence number. Both the
ACK and the SYN flags are set to 1. Finally, the client responds with a packet
in which the acknowledgement number is one higher than the sequence number
received from the server. Obviously, the ACK flag must be set to 1 again.
Furthermore, the client can also request to finish a connection. The process to end
the communication starts with a packet sent by the client with the FIN flag activated.
Once the server receives the packet, it acknowledges it and finishes sending the
packets in progress. Afterwards, the server informs its application that a FIN segment
was received and sends its own packet with the FIN flag set to the client, closing the
communication in the other direction as well.
Reliable delivery
TCP provides ordered and reliable delivery, which is achieved through a method called
the sliding window [15], in which a number of segments may be outstanding without
acknowledgement. This window moves depending on the acknowledgements received, and


its size can be modified by the server changing the value in the window size field.

Figure 2.10: Sliding window method

The window moves to the right when the client receives an ACK, allowing the client
to send more packets. In the example represented in Figure 2.10, the window ends
up two positions to the right because the sender got two acknowledgements. The client
cannot send more than three packets in a row without receiving any ACK, since the size
of the window is three.

2.2 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and performance.

2.2.1 RTT

Round trip time (RTT) is the time interval from when a packet is sent until its
acknowledgement is received (ignoring retransmissions) [22]. This time is measured over
several samples in order to achieve a reliable result. It depends on several factors,
such as the data transfer rate of the connection, the medium the network is made of,
the distance between sender and receiver, the number of nodes the packets go through, the
amount of traffic in the network, and so on. The RTT has a lower bound,
since it cannot be less than the time the signals take to traverse the network. The


formula to estimate the RTT within a network is shown in equation 2.1.

EstimatedRTT = α × EstimatedRTT + (1 − α) × SampleRTT    (2.1)

where α is a value (0 < α < 1) that must be set. For TCP it is advisable to fix this
parameter between 0.8 and 0.9. An example of an exchange of packets and its direct
relation with the RTT is set out in Figure 2.11.

Figure 2.11: Example RTT interval
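The update rule of equation 2.1 can be sketched as follows; the sample values are invented for illustration:

```python
def estimate_rtt(samples, alpha=0.85, initial=None):
    """EWMA RTT estimator per equation 2.1:
    EstimatedRTT = alpha * EstimatedRTT + (1 - alpha) * SampleRTT."""
    est = samples[0] if initial is None else initial
    for s in samples:
        est = alpha * est + (1 - alpha) * s
    return est

# A single 300 ms outlier among 100 ms samples barely moves the estimate,
# which is exactly the damping behaviour wanted for TCP timers.
print(round(estimate_rtt([100, 100, 300, 100, 100]), 1))   # 121.7
```

With α near 0.85 (the 0.8–0.9 range recommended above), old measurements dominate and a single delayed packet does not destabilize the retransmission timeout.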

2.2.2 Jitter

Jitter is the variation in the delay of packets sent within a network [15]. A sender
transmits many packets one right after the other, with a certain spacing between
them. However, network congestion, queueing or configuration errors cause this spacing
to vary. The effect of jitter on the packet spacing can be seen in Figure 2.12.
Jitter is a significant problem, since these fluctuations happen randomly and change very
quickly over time. Therefore, it is crucial to correct this problem as much as possible.
One solution is to use a buffer which receives the packets at irregular
intervals. This buffer holds the packets for a short period of time in order to reorder
them if necessary and restore an even spacing between packets. The main drawback
of this method is that the buffer adds delay to the transmission. Buffers also have a
limited size, so when a buffer is full, newly arriving packets are dropped and never
reach their destination.
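The thesis does not prescribe a concrete jitter estimator; as one common illustration, the smoothed inter-arrival jitter standardized for RTP in RFC 3550 can be computed like this (the millisecond timestamps are invented):

```python
def interarrival_jitter(send_times, recv_times):
    """Smoothed inter-arrival jitter as defined for RTP (RFC 3550):
    J = J + (|D| - J) / 16, where D is the change in transit time
    between consecutive packets."""
    j = 0.0
    prev_transit = None
    for s, r in zip(send_times, recv_times):
        transit = r - s
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            j += (d - j) / 16
        prev_transit = transit
    return j

# Packets sent every 20 ms; the third one is delayed by 5 ms in the network.
send = [0, 20, 40, 60]
recv = [5, 25, 50, 65]
print(round(interarrival_jitter(send, recv), 3))   # 0.605
```

The divisor 16 plays the same smoothing role as α in equation 2.1: a single delayed packet raises the estimate only gradually.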


Figure 2.12: Jitter effect

2.2.3 Latency

Latency [15] indicates the time a message takes to go from one point of the network
to another. Several factors affect this parameter. The first contributor to network
latency is the propagation delay, which is basically the time a packet takes to get from
one point to another at the speed of light. The second factor to keep in mind is the time
it takes to transmit the data, which depends on the bandwidth and the size of the packet.
The last contributor is the queueing delay in switches and bridges, where
packets are usually stored for some time. These factors are captured in the following
three formulas:

Latency = Propagation + Transmit + Queue    (2.2)

Propagation = Distance / SpeedOfLight    (2.3)

Transmit = Size / Bandwidth    (2.4)

2.2.4 Bandwidth

This concept describes the number of bits that can be transmitted within the network
during one second [15]. There is an important relationship between bandwidth and latency.
To visualize it, it may help to think of a pipe through which the data passes: the
bandwidth would be the diameter of the pipe and the latency its length. A simple
drawing of the relation between latency and bandwidth is given in Figure 2.13.


Figure 2.13: Relation between Latency and Bandwidth

Mbps of bandwidth will be able to contain:

50 × 10⁻³ s × 45 × 10⁶ bits/s = 2.25 × 10⁶ bits    (2.5)

If more bandwidth is needed, the problem is solved simply by adding more pipes.
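The calculation of equation 2.5, the bandwidth-delay product, can be reproduced directly:

```python
# Bandwidth-delay product: how many bits fit "inside the pipe" at one instant,
# reproducing equation 2.5 (50 ms of latency times 45 Mbps of bandwidth).
latency_ms = 50
bandwidth_bps = 45_000_000
in_flight_bits = latency_ms * bandwidth_bps / 1000   # convert ms to s
print(in_flight_bits)                                # 2250000.0, i.e. 2.25e6 bits
```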

2.2.5 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to
another in a given time. This concept is used to measure the performance or efficiency
of hard drives, RAM and networks. The throughput can be calculated with the following
formulas:

Throughput = TransferSize / TransferTime    (2.6)

TransferTime = RTT + TransferSize / Bandwidth    (2.7)

where RTT is the round trip time.
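Equations 2.6 and 2.7 can be combined into a small sketch (the transfer size, RTT and bandwidth below are invented for illustration):

```python
def throughput(transfer_size_bits, rtt_s, bandwidth_bps):
    """Equations 2.6 and 2.7: one RTT of overhead plus the time needed
    to push the bits onto the link at the given bandwidth."""
    transfer_time = rtt_s + transfer_size_bits / bandwidth_bps
    return transfer_size_bits / transfer_time

# A 1 Mbit transfer over a 10 Mbps link with 100 ms of RTT: the RTT overhead
# halves the effective rate to roughly 5 Mbps.
rate = throughput(1e6, 0.1, 10e6)
```

This also makes the bandwidth/throughput distinction of the next paragraph concrete: the link's bandwidth is 10 Mbps, but the rate actually achieved is lower.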


Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to
the number of bits per second that can in theory be transmitted. However, due to
implementation inefficiencies or errors, a pair of nodes connected in the network
with a bandwidth of 10 Mbps will usually achieve a much lower throughput (for instance 2
Mbps), so that data can be sent at 2 Mbps at most.

2.3 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful to develop this project.

2.3.1 Network Tools

In this section we describe some tools and applications related to computer network
management.
SSLsplit:
SSLsplit [23] is a tool for performing man-in-the-middle attacks against SSL/TLS
network connections. Connections are intercepted and redirected to SSLsplit, which
terminates the SSL/TLS session and initiates a new SSL/TLS connection to the original
receiver address. The goal of this tool is to be helpful in testing and analyzing
networks. It can work with TCP, SSL, HTTP and HTTPS connections over IPv4 and IPv6.
Wireshark:
Wireshark [24] is a powerful network packet analyzer with a large number of functions.
It can capture datagrams and show in detail everything that a packet carries.
Overall, the aim of using Wireshark is to solve and manage network problems, examine
security issues, and remove errors from protocol implementations. The program displays
the characteristics of the packets in great detail, splitting them up into the different
layers. With it, users can easily see a list of captured packets updated in real time,
the details of a selected packet, and the packet content in hexadecimal and ASCII. In
addition, it is possible to filter the datagrams in order to ease the search for
particular packets, which makes Wireshark very manageable.
Tcpdump:
Tcpdump [25] is a tool to analyze the packets going over the network. Some
reasons to use tcpdump are to verify connectivity between hosts and to inspect the
network traffic. This tool also allows us to pick out particular kinds of
traffic depending on the header information. Moreover, it is possible to save all the
captured traffic in a file for later analysis; these tcpdump files can also be
opened with software like Wireshark. Tcpdump provides many options to
capture packets in different ways, which gives a broad range of possibilities to manage
the traffic.
Proxy:
A proxy [26] is a server used as a gateway between a local network and another, much
wider network. A proxy sits in the middle of the communication between sender
and receiver: it receives the incoming data on one port and forwards the
information to the rest of the network through another port. Proxies may also cache web sites.


Each time a user from the local network asks for some URL, the proxy that
receives the request stores a temporary copy of the response. The next time a user
asks for the same web site, the proxy can send the cached copy to the user instead of
forwarding the request to the network to fetch the URL again. An example of how
a proxy works and handles incoming requests is shown in Figure 2.14, where the proxy
asks for each web site only once.

Figure 2.14: Proxy operation

In this way, proxies can make the delivery of packets within the network much faster,
but this is not the only function they cover. They may also be used to prevent attackers
from obtaining internal addresses, since proxies can block direct access between two
networks. Proxies can thus take part as a component of a firewall.
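The caching behaviour just described can be sketched in a few lines; `fetch_from_origin` is a made-up stand-in for the real network fetch, not part of any proxy software:

```python
# Sketch of a caching proxy: a URL is fetched from the origin only on the
# first request; later requests are served from the local cache.
cache = {}

def fetch_from_origin(url):
    # Placeholder for the real request forwarded to the wider network.
    return f"<html>content of {url}</html>"

def proxy_request(url):
    if url not in cache:                # cache miss: forward to the network
        cache[url] = fetch_from_origin(url)
    return cache[url]                   # cache hit: serve the stored copy

proxy_request("http://example.com/")    # first request goes to the origin
proxy_request("http://example.com/")    # second request is served from cache
print(len(cache))                       # the origin was contacted only once: 1
```

A real proxy such as Squid additionally honours expiry and cache-control rules, which this sketch deliberately omits.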

2.3.2 Programming language

Several programming languages can be used for network programming. Python [27] is one
of the most important, and provides a library called Boto which could be very helpful
for this thesis.
Boto:
Boto [28] offers a Python interface to several services offered mainly by Amazon Web
Services (AWS). Using Boto requires providing the Access Key and Secret Key, which
can either be given manually on every connection or added to the boto configuration file.
In addition, it is necessary to create connection objects before creating a machine. These
machines provide a stable and secure execution environment to run applications. The main
fields in which Boto is involved are compute, database, deployment, application services,
monitoring, storage and so on.

2.3.3 Operating Systems

There are several sorts of operating systems, such as Microsoft Windows, Linux and Mac
OS. However, the opportunities and ease of managing network tools are not the same in
all of them. We believe that, for the development of this thesis, Linux is the most
suitable.
Linux:
Linux [29] is a computer operating system created by volunteers and by employees of
many companies and organizations from all over the world in order to make this
product free software. The main advantages of Linux are low cost, stability, performance,
network functionality, security and so on. This operating system very seldom freezes up
or slows down. It also provides high performance and support for networks: client and
server systems can be set up easily and quickly on a computer running Linux.
It is very secure as well, since Linux asks the user for permissions. Nowadays, this
operating system is used more and more in both homes and companies due to all its
functionalities. Linux offers many network applications, so it could be very useful for this
thesis.
We have described in this chapter many topics about networks which are crucial in
the next sections. It is important to have a deep knowledge of this matter, because
it is needed when it comes to analyzing and recreating network traffic later on.

Chapter 3
Traffic Test
In this chapter we created the first scenarios to carry out the required simulations. We
started with a simple M2M example and ended up adding a proxy in between and
simulating several clients. These scenarios were analyzed to acquire a deep knowledge
of this framework in order to extract the traffic pattern properly later on.

3.1 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop
a client-server application in Python [27]. The goal of this part was to analyze the
traffic in a very simple case, between a single client and server. This is an easy way to
start setting up a connection and to design a methodology and tooling for developing larger
scale testing later. The structure of this connection is shown in Figure 3.1.

Figure 3.1: Structure client server

The tool chosen for programming is Python, a high level language highly recommendable
for network programming due to its ease of use in this field.

To program the client for this application, it was necessary to set the
server's Inet address and a port for the exchange of data, then create a socket and
connect it to that address through the chosen port. To program the server, it is
necessary to set the hostname and the same port opened in the client, create a socket,
and bind both hostname and port to it. Finally, we made the socket wait for incoming
connections from the client and accept the connection.
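The steps above can be sketched with Python's standard socket module; this is a minimal loopback example in the spirit of the thesis scripts, not their actual code:

```python
import socket
import threading

HOST = "127.0.0.1"
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind((HOST, 0))                      # port 0: let the OS pick a free port
srv.listen(1)                            # wait for incoming connections
port = srv.getsockname()[1]

def serve_one():
    conn, addr = srv.accept()            # blocks until the client connects
    conn.sendall(conn.recv(1024))        # echo the client's data back
    conn.close()

t = threading.Thread(target=serve_one)
t.start()

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect((HOST, port))                # this triggers the three-way handshake
cli.sendall(b"ping")
reply = cli.recv(1024)
cli.close(); t.join(); srv.close()
print(reply)                             # b'ping'
```

Capturing this exchange with Wireshark shows exactly the SYN / SYN, ACK / ACK sequence discussed below.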
Listing 3.1 shows the packets required to establish a client-server connection.

Listing 3.1: Establish connection

1, 0.665317, 192.168.1.24, 192.168.1.33, TCP, 74, 49588 > EtherNet-IP-1 [SYN] Seq=0, Win=5840, Len=0, MSS=1460, SACK_PERM=1, TSval=4769150, TSecr=0, WS=64
2, 0.669736, 192.168.1.33, 192.168.1.24, TCP, 66, EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1
3, 0.669766, 192.168.1.24, 192.168.1.33, TCP, 54, 49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0
This exchange of packets is called the three-way handshake. Analyzing these segments,
we can clearly see how the flags and the sequence and acknowledgement numbers have
the expected values. The client sends a message with the SYN flag set to 1 in order to
connect to the server, with a random sequence number x (Wireshark by default displays
relative sequence numbers starting at zero). The answer from the server has the SYN
and ACK flags activated, with sequence number y and acknowledgement number x+1.
Finally, a third packet is sent from the client with only the ACK flag set to 1 and
acknowledgement number y+1.
To terminate the connection, a packet with the FIN and ACK flags activated is sent
from the endpoint that wants to close the connection. Then an ACK segment is sent
as a response. This exchange must happen in both directions to close the connection
at both ends; otherwise only one end would be closed and the other could still send
data. These two packets are set out in Listing 3.2.
Listing 3.2: Terminate connection

1, 0.671945, 192.168.1.33, 192.168.1.24, TCP, 60, EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0
2, 0.672251, 192.168.1.24, 192.168.1.33, TCP, 54, 49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0
In this section we made a simple test, establishing and terminating a connection between
client and server and checking the packets going through the network. It is a simple
example with which to start looking into the behavior of the segments exchanged between
client and server.

3.2 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of
the client-proxy-server scenario to run simulations with TCP sockets. The connection is
set up with a Squid proxy in between [30]; the structure is shown in Figure 3.2. After
setting up this connection, we sent traffic in order to analyze the segments sent, measure
the performance and extract a traffic pattern.

Figure 3.2: Structure client proxy server

A proxy has been set up in the client-server communication to capture and
analyze the traffic, so that we can recreate the pattern of communications and generate
realistic loads towards the server. At the beginning it was necessary to access the
instances. This is done through port 22, and there are two main reasons to do so. Firstly,
we had to configure the proxy to accept the incoming packets and forward them properly.
Secondly, to run scripts on an instance it was necessary to log in and install the
libraries required by each script. Moreover, programs such as Tcpdump and Wireshark
were installed on the proxy instance to sniff the traffic. When the proxy was
ready, we could move on to writing the script that creates the scenario and runs the
simulations.
Several simulations were carried out with different types of instances [9]. The sort of
EC2 instance matters regarding memory but also speed, which is important in these tests.
It is advisable to use a high performance instance for the proxy and the server in order
to handle all the packets quickly, especially in the later tests where there are several data


sources. In order to develop these simulations we programmed the script Simulation.py
with boto, so that the tests are run automatically. This script creates a scenario
comprised of, in the simplest case, three instances: data source, proxy and
server. The script also gives the possibility of picking the type of instance used for
the simulation. Moreover, after starting the instances, the script sets up and initializes
the required server, data sources and proxy. Both server and data source were also
programmed in Python due to its ease in developing anything related to networks.
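As an illustration of what the instance-creation step of such a script can look like with boto (a sketch against the boto 2 EC2 API; the AMI id and instance type below are placeholders, not the values used in the thesis):

```python
# Sketch of the instance-launching step of a script like Simulation.py.
# The `conn` object would come from boto, e.g.:
#   import boto.ec2
#   conn = boto.ec2.connect_to_region("eu-west-1",
#                                     aws_access_key_id="...",
#                                     aws_secret_access_key="...")
def launch_scenario(conn, ami="ami-xxxxxxxx", instance_type="m1.small", count=3):
    """Start one EC2 instance per role (data source, proxy, server)
    and return the launched instances."""
    reservations = [conn.run_instances(ami, instance_type=instance_type)
                    for _ in range(count)]
    return [r.instances[0] for r in reservations]
```

After launching, such a script polls each instance until its state is "running" and then configures it over SSH (port 22), as described above.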
The goal of the data source is to send TCP packets towards the server, always going
through the proxy. The server must answer those packets, creating a normal connection.
Obviously, before the exchange of data began, the data source established the
connection by sending a packet with the SYN flag set to 1. This is done just once in the
whole communication.
When the packets were analyzed in the proxy, it was possible to see how a TCP segment
with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the
data source: the response from the proxy, with the SYN and ACK flags set to 1. This
indicates that the connection is established and the system is ready to exchange
information. Finally, the data source answers with another packet acknowledging the
previous one. This is shown in Listing 3.3 and is the three-way handshake [31].
Listing 3.3: Establishing data source-proxy connection

1, 0.000000, 10.34.252.34, 10.235.11.67, TCP, 74, 45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16
2, 0.000054, 10.235.11.67, 10.34.252.34, TCP, 74, ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128
3, 0.000833, 10.34.252.34, 10.235.11.67, TCP, 66, 45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735
When this connection is established, the data source sends an HTTP CONNECT request
to the proxy, indicating the server's DNS name. The proxy then looks up the IP address
of that server by sending DNS queries. We can see this in Listing 3.4.
Listing 3.4: Searching server IP address

4, 0.000859, 10.34.252.34, 10.235.11.67, HTTP, 197, CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1
6, 0.001390, 10.235.11.67, 172.16.0.23, DNS, 108, Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com
7, 0.002600, 172.16.0.23, 10.235.11.67, DNS, 166, Standard query response 0xb33a
8, 0.002769, 10.235.11.67, 172.16.0.23, DNS, 108, Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com
9, 0.003708, 172.16.0.23, 10.235.11.67, DNS, 124, Standard query response 0xa3f9 A 10.224.83.21
Finally, the proxy also sends a packet with the SYN flag activated to the server, to set up the communication between the two. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 3.5.
Listing 3.5: Establishing proxy-server connection

10, 0.003785, 10.235.11.67, 10.224.83.21, TCP, 74, 33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128
11, 0.438963, 10.224.83.21, 10.235.11.67, TCP, 74, 50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16
12, 0.439029, 10.235.11.67, 10.224.83.21, TCP, 66, 33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381
Then an HTTP/1.0 200 OK connection established response reaches the data source, and
the connection is ready to start carrying data. In these simulations it was decided to
send data from time to time, with random periods in between. This makes the simulations
more realistic, since it is normally difficult to know when a client is going to communicate
with a server.
The eight packets which compose the exchange of data between data source and server
are shown in Listing 3.6.
Listing 3.6: Exchange of data source-proxy-server

15, 0.466800, 10.34.252.34, 10.235.11.67, TCP, 71, 45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845
16, 0.466813, 10.235.11.67, 10.34.252.34, TCP, 66, ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280
17, 0.466975, 10.235.11.67, 10.224.83.21, TCP, 71, 33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381
18, 0.467901, 10.224.83.21, 10.235.11.67, TCP, 66, 50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852
19, 0.468018, 10.224.83.21, 10.235.11.67, TCP, 71, 50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852
20, 0.468029, 10.235.11.67, 10.224.83.21, TCP, 66, 33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389
21, 0.468083, 10.235.11.67, 10.34.252.34, TCP, 71, ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280
22, 0.508799, 10.34.252.34, 10.235.11.67, TCP, 66, 45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852
In Listing 3.6, the packets with the PSH flag set to 1 denote that there is data being sent in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in Listing 3.6 with the PSH flag activated is carrying data: first from data source to the proxy, which forwards everything to the server, and then all the way back, from server to data source.
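As an illustration of how the PSH flag can be read, the sketch below parses the flags byte of a raw 20-byte TCP header with Python's struct module. The port and window values mirror packet 17 of the listing; the helper name is ours, not from the thesis:

```python
import struct

TH_PUSH = 0x08  # the PSH bit in the TCP flags byte

def carries_data(tcp_header: bytes, payload_len: int) -> bool:
    # Byte 13 of a TCP header holds the flags; a segment transfers
    # application data when its payload length is non-zero, and the
    # sender normally sets PSH on such segments.
    flags = tcp_header[13]
    return payload_len > 0 and bool(flags & TH_PUSH)

# A minimal 20-byte header mirroring packet 17 of Listing 3.6:
hdr = struct.pack("!HHIIBBHHH",
                  33271, 50007,   # source / destination port
                  1, 1,           # sequence / acknowledgement number
                  5 << 4,         # data offset: 5 words, no options
                  0x18,           # flags: PSH + ACK
                  14720, 0, 0)    # window, checksum, urgent pointer
print(carries_data(hdr, 5))  # True: Len=5 with PSH set
```

A pure ACK segment (flags 0x10, Len=0), such as packet 18, would return False.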
To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source was placed in a separate instance, and their number was scaled up from one to ten. The network was first tested with a traffic load based on sending 1980 bytes of data, and later with a heavier load of 5940 bytes. These loads were sent up to
200 times with a random waiting time between them of either 1 or 2 seconds.
The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time in the proxy in the simulation with 1980 bytes of data, while Figure 3.4 represents the other simulation with a heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3: Bytes through the proxy with data burst of 1980 bytes

Figure 3.4: Bytes through the proxy with data burst of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times bigger than in Figure 3.3. This makes sense since the data sent is three times bigger as well, and therefore around triple the number of packets is needed. Another issue to point out is that Figure 3.4 is smoother than the other one. This is not only due to the effect of the scale in the graph, but also because the frequency and amount of segments being sent in the second case is bigger.

3.3 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. To do so, we created a similar environment, but in this case with a variable number of data sources. The whole scenario is created with a Python script, like the environment used previously for one client. At this point, the server was tested with up to ten data sources. The scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to use several instances, setting one client in each instance. Therefore, the proxy receives packets from different IP addresses, as would happen in a real case.

Figure 3.5: Structure for simulation


The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going through the proxy with data bursts of 1980 bytes, and Figure 3.7 does the same with data bursts of 5940 bytes.

Figure 3.6: Bytes through the proxy with data burst of 1980 bytes

Figure 3.7: Bytes through the proxy with data burst of 5940 bytes


Figure 3.7 shows a much bigger amount of bytes exchanged compared to Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth since, with ten data sources, the frequency of packets being sent is high.

3.4 Performance results

In this section, the performance of the network is analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is highly scaled up.
First of all, we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8 packets were being sent to the server from three different instances, whereas in Figure 3.9 there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not represent a big problem for the network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. In the other graph there are many more high peaks, and therefore the RTT in this case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.
For every simulation the average RTT was calculated to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.
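The average RTT of a capture can be approximated by pairing each data segment with the ACK that answers it; a minimal sketch, with the two sample pairs taken from Listing 3.6 (packets 17/18 and 19/20):

```python
def average_rtt(samples):
    # samples: (send_time, ack_time) pairs, one per data segment;
    # the RTT of each segment is the gap until its ACK arrives.
    return sum(ack - send for send, ack in samples) / len(samples)

# Packet 17 is acknowledged by packet 18, packet 19 by packet 20.
samples = [(0.466975, 0.467901), (0.468018, 0.468029)]
print(average_rtt(samples))
```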


Figure 3.9: Average RTT with 10 data sources

As was mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2 we do not see large differences. Moreover, the lower an instance type appears in the table, the shorter the RTT should be. However, this does not apply in every case, so the type of instance is not very significant here. Speaking about RTT values, the simplest instance seems to be enough for this exchange of data. Concerning the number of clients there is a slight difference, especially when comparing the RTT with 5 or 10 data sources against only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.
Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0031     0.0046      0.0033      0.0039
m1.large               0.0037     0.0035      0.0038      0.0032
c1.medium              0.0031     0.0035      0.0051      0.0048
c1.xlarge              0.0039     0.0043      0.0037      0.0042

Table 3.1: RTT with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

Server instance type   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               0.0026     0.0022      0.0021      0.0029
m1.large               0.0026     0.0024      0.0028      0.0024
c1.medium              0.0028     0.0031      0.0025      0.0030
c1.xlarge              0.0026     0.0029      0.0029      0.0024

Table 3.2: RTT with data bursts of 5940 bytes

In Table 3.3 we have the average number of TCP packets which were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest quality instance (t1.micro) seems to have more difficulties in the communication: with this instance the number of retransmissions is bigger. Either way, there is no simulation that stands out concerning resent packets.
Table 3.4 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the simulation with the most difficulties in the communication was the one with 10 data sources, the heaviest data burst and the worst instance. Here there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst starts to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one; nevertheless, there is not a very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.
Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the higher difficulty of sending traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.
Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          0           0           0
t1.micro               5940 bytes   1.5        0           2           2
m1.large               1980 bytes   0          0           0           0
m1.large               5940 bytes   0          2.5         0           0
c1.medium              1980 bytes   0          0           0           0
c1.medium              5940 bytes   0          0           0           0
c1.xlarge              1980 bytes   0          0           0           0
c1.xlarge              5940 bytes   0          0           0           2.5

Table 3.3: Number of TCP retransmissions

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          0           0           0
t1.micro               5940 bytes   2          6           15          67
m1.large               1980 bytes   0          0           0           0
m1.large               5940 bytes   0          5.5         1           36
c1.medium              1980 bytes   0          0           0           2
c1.medium              5940 bytes   0          7           13.5        50.5
c1.xlarge              1980 bytes   0          0           0           5
c1.xlarge              5940 bytes   0.5        5           9           54.5

Table 3.4: Number of lost packets

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          1           0           6.5
t1.micro               5940 bytes   3          1           7.5         2.5
m1.large               1980 bytes   0          0           0           0
m1.large               5940 bytes   0          0           0           0
c1.medium              1980 bytes   0          0           0           4.5
c1.medium              5940 bytes   0          0           2.5         2.5
c1.xlarge              1980 bytes   0.5        0           0           0.5
c1.xlarge              5940 bytes   0.5        0           0           0

Table 3.5: Number of duplicate ACK

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen how the RTT varies just a little across the different tests, most probably because the data sent does not stress the server enough. However, in the last analysis the values measured related to network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important in improving the performance. This was more noticeable when the number of data sources was high: the c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.
After achieving these results, we can move on to the next step, where we extracted the traffic pattern from these simulations. The whole process is explained in the following chapter.
After achieving these results, we can move on to the next step where we extracted
the traffic pattern from these simulations. All the process is explained in the following
chapter.

Chapter 4
Traffic Pattern Extraction
In this chapter, a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server. A method had to be found to perform a proper extraction and generate the traffic again. To do so, we looked into several publications which explained how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]; this required packets to be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamp and packet time distribution [34][35]: the outbound packets had to have a similar frequency, i.e. the length of time from one packet to the next must be as close as possible to the capture we wanted to replay. Finally, to create realistic network traffic, it was important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain the most significant characteristics of the packets from a pcap file. We used the files recorded in the proxy instance during the previous simulations. The next step was to write a Python script made especially to obtain the features needed from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.
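Extractpattern.py relies on dpkt for this step; for illustration, the classic pcap record layout that such a script reads can also be parsed with nothing but the standard library. The sketch below skips link-layer decoding and only yields timestamps and raw packet bytes:

```python
import struct

def read_pcap(data: bytes):
    # Yield (timestamp, raw_packet) pairs from a classic pcap buffer.
    magic = struct.unpack("<I", data[:4])[0]
    endian = "<" if magic == 0xA1B2C3D4 else ">"   # byte order of the capture
    offset = 24                                    # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Each record header: ts_sec, ts_usec, captured length, original length.
        sec, usec, incl, _orig = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        offset += 16
        yield sec + usec / 1e6, data[offset:offset + incl]
        offset += incl
```

In practice dpkt's `dpkt.pcap.Reader` does the same iteration and additionally decodes the Ethernet/IP/TCP layers.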
To recreate the traffic, the script at first extracted the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it was found that it was much more accurate to gather the data from all the packets involved in one data burst and put that data back together. This way, when it comes to replaying the traffic, all the data contained in one burst is sent at once instead of packet by packet. This is exactly the same way the data source sent packets in the simulations, and therefore this method was much better for recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source started to replay the data at the same time. The script also extracted the timestamp at which every data burst was sent. Therefore, it was easier to compare graphs and the traffic recreation was highly precise.
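The burst-grouping step can be sketched as follows; the 0.5-second gap threshold is an assumption for illustration, not a value stated in the thesis:

```python
def group_bursts(packets, gap=0.5):
    # packets: (timestamp, payload) pairs in capture order. Payloads
    # arriving closer together than `gap` seconds are merged into one
    # burst, which keeps the timestamp of its first packet.
    bursts = []
    for ts, payload in packets:
        if bursts and ts - bursts[-1][2] <= gap:
            bursts[-1][1] += payload      # extend the current burst
            bursts[-1][2] = ts            # remember the last packet time
        else:
            bursts.append([ts, payload, ts])
    return [(start, data) for start, data, _last in bursts]
```

For example, packets at 0.0 s and 0.1 s merge into one burst, while a packet 0.9 s later starts a new one.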
In the original simulations where the proxy was set up, a few extra segments were needed to establish the communication over this proxy. These were a couple of HTTP and a few DNS segments, which were meaningless for recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script saved the information gathered from each data burst, as well as its timestamp, in a file. Extractpattern.py is the script that obtains all this information and saves it in the file mentioned. Then, the second script accesses the information contained in this file. Knowing this information, this program is ready to replay the data in the most accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple in order to run as fast as possible and send the packets in an accurate manner. It should be pointed out that, in order to test the server, it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
When replaying traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach, we could compare them to draw important conclusions and check the accuracy of the method. To achieve this, we had to filter out the data sent from data source to proxy. The data sniffed was then replayed twice M2M with the second script, so that over the whole network we send the same amount of data, but in this case directly from client to server. This strategy also lets us receive the same data from the server, and therefore the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1. The figure on the left shows the traffic in the simulations with data source, proxy and server, while the figure on the right is the result of implementing the strategy mentioned before M2M. As we can see, the amount of data and number of packets is the same.
The results of following this strategy are shown in the Figure 4.2. In the graphs we


Figure 4.1: Structure of traffic replayed M2M

compare the number of bytes sent over time in the simulation with one client, and in the
recreation M2M.

Figure 4.2: Comparison between simulation and replayed traffic

As can be appreciated in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a slightly different point, and from there on the graphs are not exactly the same. However, we must point out that they still keep a very similar trajectory until the end, and the difference does not keep growing. Therefore, this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly; this is due to the fact that the server response is something we could not control. Another important issue is related to the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in amount of bytes sent and duration, and therefore this approach to extract the traffic pattern is very accurate.
In this chapter, we explained the method to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was proven in Figure 4.2, comparing the graph obtained in the simulation with another achieved by recreating the extracted pattern. Therefore, we can move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

Chapter 5
Multiplying Traffic Pattern

In this chapter, some features of the TaaS system developed are explained. Then, remarkable results obtained from recreating the pattern at large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved, we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances with which to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic in the server and finally downloaded the recording automatically for further analysis.
This TaaS infrastructure was very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified; for instance, we could choose the amount of data to send, the frequency of data bursts and the number of repetitions. Therefore, it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.
Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important characteristics when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of the steps we must follow to use the TaaS system created. First of all, we configure the script Client.py, setting the amount of data, the repetitions and the random waiting time with the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and automatically downloaded as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.

5.3 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes, with one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.
Once we had a capture, we were ready to multiply the traffic. First, we performed several simulations scaling up the number of data sources. We started with one data source and increased up to 80, which was considered a sufficient number of clients to create heavy traffic loads. Then we can compare the different results with the original simulation to extract interesting conclusions. The number of data sources was increased one at a time every five seconds.
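Adding one replay client every five seconds, as described above, can be sketched with a simple staggered launcher (names are illustrative, not from the thesis's scripts):

```python
import threading
import time

def start_clients(run_client, n, stagger=5.0):
    # Launch n replay clients, adding a new one every `stagger`
    # seconds, then wait for all of them to finish. `run_client`
    # receives the client index and performs one replay session.
    threads = []
    for i in range(n):
        t = threading.Thread(target=run_client, args=(i,))
        t.start()
        threads.append(t)
        if i < n - 1:
            time.sleep(stagger)
    for t in threads:
        t.join()
```

With n=80 and the default stagger, the last client starts after 395 seconds, which matches the roughly 400-second ramp-up expected in the text below.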
Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is about twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is due to the fact that at those moments several clients were not sending data, so there is a big difference in the number of bytes


exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the red graph is the same but with up to 80 sources sending data. The black graph appears to have double the amount of bytes of the blue one in Figure 5.1. This is an expected result, since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation, problems sending data appear very soon. After 150 seconds the graph goes up very little and slowly. In this test with 80 clients a new client was added every 5 seconds, so the graph should go up until approximately 400 seconds. Nevertheless, it stops going up after about 225 seconds. Therefore, in this case the communication seems to reach a limit in the exchange of packets where bytes cannot be sent any faster; this happens when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using a m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference there can be between types of instances. The black graph represents ten clients and has similar peaks and number of bytes sent as in Figure 5.2. However, the recreation with 80 clients is much higher using the better instance: the gap between graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data faster after about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server had no problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, although no peaks stand out, the average is considerably higher than in the other tests. This graph also appears to be smoother because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore, the main characteristics when recreating many clients are the bigger RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases, the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases with one or two clients, not even one segment is lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the quality of the instance increases.
Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0.011      0.044       0.091        0.128        —
m1.large               0.027      0.053       0.128        0.154        —
c1.medium              0.007      0.039       0.076        0.085        —
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown the results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. Afterwards, we obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot; with many clients, the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets: it shows the risk, regarding packet delivery, of connecting many clients to the server at once. The number of lost segments varies with the sort of instance used, but especially with the number of clients.

Chapter 6
Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results
obtained. We also describe the possible future work related with this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.
Therefore, in the first part we developed in the Amazon cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up in different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be highly scaled up to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.
The results of the simulations show how the number of bytes increased correctly as the number of clients grew. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicated ACKs had more diverse values. These results showed a performance improvement in the network when using high quality instances, and a deterioration when the number of clients was rising.

When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not degrade over time; the similarity between both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation, we have shown how the server reached a maximum exchange of bytes. This happened when we highly increased the number of data sources. It has been proven as well that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. More interesting were the packet loss results: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when comparing different types of server instances.
Overall, there are good results about the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory. It must be said that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.
These results have demonstrated the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options such as the number of clients, the type of instances used, and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

The TaaS system created in this thesis is based on the TCP protocol and focuses mainly on performance and scalability testing. As future work we propose to:
1. Develop other cloud tests, such as connectivity, security, and compatibility testing.
2. Test different types of servers, for instance an HTTP server.
3. Estimate test costs before starting.
4. Modify the scripts to test different scenarios.
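A rough cost bound for a test run, as suggested in point 3, can be computed before launching any instances by multiplying hourly price, instance count, and duration. The prices below are hypothetical placeholders, not current EC2 rates, which would be taken from the Amazon pricing page:

```python
# Hypothetical on-demand prices in USD per hour (placeholders, not real rates)
HOURLY_PRICE_USD = {"m1.large": 0.24, "c1.xlarge": 0.58}

def estimate_cost(instance_type, instances, hours):
    """Upper-bound cost of a test run: hourly price * instances * hours."""
    return HOURLY_PRICE_USD[instance_type] * instances * hours

# One c1.xlarge server plus ten m1.large client machines running for two hours
total = estimate_cost("c1.xlarge", 1, 2) + estimate_cost("m1.large", 10, 2)
print(round(total, 2))
```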

CHAPTER 7
Appendices

List of Abbreviations
ACK: Acknowledgement
ARP: Address Resolution Protocol
ASCII: American Standard Code for Information Interchange
ATM: Asynchronous Transfer Mode
DNS: Domain Name System
HTML: HyperText Markup Language
HTTP: Hypertext Transfer Protocol
IEEE: Institute of Electrical and Electronics Engineers
ICMP: Internet Control Message Protocol
IP: Internet Protocol
Len: Length (amount of data)
M2M: Machine-to-Machine
MAC: Media Access Control
MSS: Maximum Segment Size
MTU: Maximum Transmission Unit
NAT: Network Address Translation
NetBIOS: Network Basic Input/Output System
OSI: Open Systems Interconnection
PDU: Protocol Data Unit
RPC: Remote Procedure Call
RTT: Round Trip Time
SaaS: Software as a Service
SACK PERM: Selective Acknowledgment Permitted
Seq: Sequence
SSH: Secure Shell
SSL: Secure Sockets Layer
TaaS: Testing as a Service
TCP: Transmission Control Protocol
TSecr: Timestamp echo reply
TLS: Transport Layer Security
TSval: Timestamp value
UDP: User Datagram Protocol

Win: Window size
WS: Window Scale


List of Tables
3.1 RTT with data bursts of 1980 bytes . . . . . . . . . . . . . . . . . . . 41
3.2 RTT with data bursts of 5940 bytes . . . . . . . . . . . . . . . . . . . 42
3.3 Number of TCP retransmissions . . . . . . . . . . . . . . . . . . . . . 42
3.4 Number of lost packets . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.5 Number of duplicate ACK . . . . . . . . . . . . . . . . . . . . . . . . 43
5.1 Percentage of lost packets . . . . . . . . . . . . . . . . . . . . . . . . 53

List of Figures
1.1 Flow diagram of the developed system . . . . . . . . . . . . . . . . . . 10
2.1 OSI model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 HTTP request . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Fields of the IP Header . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4 Datagram fragmentation . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.5 ARP request . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.6 Ethernet layers in OSI model . . . . . . . . . . . . . . . . . . . . . . . 19
2.7 UDP protocol header . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.8 TCP protocol header . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.9 Establishing a connection in TCP . . . . . . . . . . . . . . . . . . . . 23
2.10 Sliding window method . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.11 Example RTT interval . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.12 Jitter effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.13 Relation between Latency and Bandwidth . . . . . . . . . . . . . . . . 27
2.14 Proxy operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1 Structure client server . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2 Structure client proxy server . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Bytes through the proxy with data burst of 1980 bytes . . . . . . . . . 37
3.4 Bytes through the proxy with data burst of 5940 bytes . . . . . . . . . 37
3.5 Structure for simulation . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.6 Bytes through the proxy with data burst of 1980 bytes . . . . . . . . . 39
3.7 Bytes through the proxy with data burst of 5940 bytes . . . . . . . . . 39
3.8 Average RTT with 3 data sources . . . . . . . . . . . . . . . . . . . . 40
3.9 Average RTT with 10 data sources . . . . . . . . . . . . . . . . . . . . 41
4.1 Structure of traffic replayed M2M . . . . . . . . . . . . . . . . . . . . 47
4.2 Comparison between simulation and replayed traffic . . . . . . . . . . 47
5.1 Number of bytes over time in different tests . . . . . . . . . . . . . . . 51
5.2 Bytes using a m1.large instance for the server . . . . . . . . . . . . . . 51
5.3 Bytes using a c1.xlarge instance for the server . . . . . . . . . . . . . 52
5.4 Average RTT extracted from the traffic recreations . . . . . . . . . . . 52

REFERENCES
[1] Alexa Huth and James Cebula, The Basics of Cloud Computing, 2011.
[2] J. Strickland, How cloud computing works. http://computer.howstuffworks.
com/cloud-computing/cloud-computing1.htm. Accessed: January 2014.
[3] J. G. et al., A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation, in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.
[4] Guru99, Performance testing. http://www.guru99.com/performance-testing.
html. Accessed: January 2014.
[5] D. Boswarthick, O. Elloumi, and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.
[6] G. Shields, Testing network performance with real traffic. http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed: January 2014.

[7] OWASP, Testing for WS replay. https://www.owasp.org/index.php/Testing_for_WS_Replay_%28OWASP-WS-007%29. Accessed: January 2014.
[8] A. W. Services, Amazon elastic compute cloud (amazon ec2). http://aws.
amazon.com/ec2/. Accessed: January 2014.
[9] I. Amazon Web Services, Amazon ec2 instances. http://aws.amazon.com/ec2/
instance-types/. Accessed: January 2014.
[10] G. C. et al., tshark - dump and analyze network traffic. http://www.wireshark.
org/docs/man-pages/tshark.html. Accessed: January 2014.
[11] E. Software, tcprewrite. http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed: January 2014.
[12] P. Biondi and the Scapy community, Welcome to Scapy's documentation!. http://www.secdev.org/projects/scapy/doc. Accessed: January 2014.
[13] E. Software, Welcome to tcpreplay. http://tcpreplay.synfin.net. Accessed:
January 2014.
[14] E. Software, Frequently asked questions. http://tcpreplay.synfin.net/wiki/
FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed: January 2014.
[15] L.Peterson and S. Davie, Computer Networks A Systems Approach. San Franciso:
Morgan Kaufmann, 2007.
[16] R. Fielding, J. Gettys, Hypertext Transfer Protocol HTTP/1.1, 1999.
[17] Tutorialspoint, Http - quick guide. http://www.tutorialspoint.com/http/
http_quick_guide.htm. Accessed: January 2014.
[18] Cisco Systems, Ethernet technologies. http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed: January 2014.
[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.
[20] J. Postel, User datagram protocol, 1980.
[21] S. Kollar, Introduction to ipv6, 2007.
[22] P. Karn and C. Partridge, Improving round-trip time estimates in reliable transport
protocols, 1991.
[23] D. Roethlisberger, Sslsplit, transparent and scalable ssl/tls interception. http:
//www.roe.ch/SSLsplit. Accessed: January 2014.
[24] U. Lamping, R. Sharpe, and E. Warnicke, Wireshark User's Guide. http://www.wireshark.org/docs/wsug_html_chunked. Accessed: January 2014.
[25] L. M. Garcia, Tcpdump libpcap. http://www.tcpdump.org, 2010-2014. Accessed: January 2014.
[26] B. Mitchell, Introduction to proxy servers in computer networking. http://
compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed: January 2014.
[27] P. S. Foundation, Python programming language a official website. http://www.
python.org, 1990-2013. Accessed: January 2014.
[28] M. Garnaat, boto 2.24.0. https://pypi.python.org/pypi/boto. Accessed: January 2014.

[29] M. Garrels, Introduction to Linux, A Hands on Guide. 2008.
[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.
[31] T. Beardsley and J. Qian, The tcp split handshake: Practical effects on modern
network equipment, 2010.
[32] Information Sciences Institute, University of Southern California, Transmission Control Protocol, 1981.
[33] P. Owezarski and N. Larrieu, A trace based method for realistic simulation, in
IEEE International Conference on Communications 2004.
[34] P. M. Sandor Molnar and G. Szabo, How to validate traffic generators?, in IEEE
International Conference 2013.
[35] C.-Y. K. et al., Real traffic replay over wlan with environment emulation, in
IEEE Wireless Communications and Networking Conference: Mobile and Wireless
Networks.
[36] J. Silverman, My documentation on dpkt. http://www.commercialventvac.com/dpkt.html. Accessed: January 2014.
[37] G. Shields, Testing network performance with real traffic. http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed: October 2014.

[38] Triometric, Replaying xml traffic. http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed: January 2014.
