DAN PAN
Due to the low cost and portability of Wi-Fi technologies, wireless network deployment has been widely
accepted in residential environments. Evaluating the performance of customers' home wireless networks
gives operators a reference for improving their network capacity to meet the emerging demand for
Wi-Fi service. However, the dynamic nature of Wi-Fi networks makes Wi-Fi performance analysis
difficult. In this thesis, a Wi-Fi parameter visualization tool is implemented to show users' Wi-Fi
performance graphically. The tool can help operators investigate customers' Wi-Fi environments to
see whether performance degradation exists. In addition, a machine learning method is used for Wi-Fi
performance analysis to predict
This function takes the Wi-Fi parameters of both the target AP and nearby interfering APs as input,
and its output is the Wi-Fi throughput category: good, medium, poor or very poor. Different SVM
kernel functions are evaluated with the proposed model, and the results show that the classification
accuracy can reach 0.88. This demonstrates that Wi-Fi throughput can be classified using a simple
Summary
Owing to the low cost and high portability of Wi-Fi technology, wireless networks have become very
common in residential environments. The widespread use of Wi-Fi services means that operators want
to improve their network services by knowing how customers' home wireless networks perform.
However, the dynamic properties of Wi-Fi networks make the analysis of Wi-Fi data difficult to
carry out.
In this thesis, a Wi-Fi parameter visualization tool is implemented to show users' Wi-Fi
performance graphically. This tool can help operators investigate customers'
This classification model works as a prediction function that takes the Wi-Fi parameters of both
the own access point and the interference of nearby access points as input, and as output the data
rate is categorized as good, medium, poor or very poor. Different SVM kernel functions are evaluated
with the proposed model, and the results show that the classification accuracy can be up to 0.88.
This shows that the Wi-Fi data rate can be classified with a simple measurement tool
Acknowledgements
I would like to thank my supervisor Rius i Riu Jaume at Telenor for the opportunity to conduct
this valuable master thesis project. He helped me through all aspects of the project, guiding me in
the right direction and arranging meetings with other Telenor colleagues who could help with my project.
Furthermore, I would like to thank Professor Ki Won Sung, my KTH academic supervisor, and
Professor Anders Västberg, my KTH academic examiner, for organizing the monthly seminar
throughout the project and providing useful feedback from an academic point of view.
I would also like to thank other Telenor colleagues, Tingsborg Fredrik, Wistedt Anna-Clara and
Roos Christer, for providing the data collection tools and experimental equipment and for helping
with technical problems.
Contents
1 Introduction
1.1 Overview
1.2 Related Work
1.3 Problem Statement
1.4 Research objectives
1.5 Methodology
1.6 Benefits and Social Impact
1.7 Thesis Structure
2 Background
2.1 Wi-Fi Network
2.1.1 IEEE 802.11 Architecture
2.1.2 Overview of IEEE 802.11 standards
2.1.3 Wi-Fi network performance parameter
2.2 Wi-Fi data measurement and analysis tool
2.2.1 Data measurement method
2.2.2 Data measurement tool
2.2.3 Data analysis tool
2.3 Machine Learning Overview
Bibliography
Chapter 1
Introduction
1.1 Overview
In today's digital environment, Wi-Fi, the wireless networking technology based on the
IEEE 802.11 specifications, plays a significant role in providing access to the Internet. It lets
anyone connect to the network anywhere without the need for wires. Moreover, according to a Cisco
report [1], traffic from Wi-Fi and mobile devices will account for two-thirds (66 percent) of total
IP traffic by 2020. In other words, clients of Wi-Fi operators are more online than ever. An
important goal for operators is therefore to improve customers' broadband experience by delivering
a reliable wireless service.
Because of these high expectations from wireless users, broadband operators focus on monitoring the
performance of wireless networks and on improving it through a deep understanding of customers'
wireless environments.
For these reasons, Telenor Sverige AB supported three master thesis projects, which aim to:
• Study the Wi-Fi data collected from access points, visualize the data and propose a predictive
Wi-Fi performance model to identify the factors that may affect Wi-Fi performance. This activity
is carried out by the author of this thesis.
• Propose an appropriate performance optimization approach based on the results of the Wi-Fi data
analysis. This activity is carried out by Diego Alonso Landa Torrejon [2].
• Develop a GUI tool to present customers' relevant data. This tool is intended to show performance
metrics at the aggregation and complexity level requested by the end user. This activity is
carried out by Yuqing Gu [3].
This thesis presents two contributions. First, a Wi-Fi data visualization tool was developed to
show how physical-layer metrics vary over a given time interval. Second, the thesis analyzes the
relation between the Wi-Fi performance parameters and a Wi-Fi performance indicator, the saturated
throughput. It describes how a limited set of Wi-Fi parameters can be used to estimate Wi-Fi
throughput accurately in a controlled radio communication environment by using Support Vector
Machine (SVM) learning techniques.
1.5 Methodology
This thesis uses quantitative and experimental research methods [10], including data collection,
visualization and analysis. Based on the research objectives, a sequential (non-iterative) design
process is a suitable methodology for this study. Its phases are illustrated in Figure 1.2:
In this step, a Wi-Fi parameter time-evolution analysis and visualization tool is deployed
to present Wi-Fi parameters graphically over time. The visualization results are discussed in
Chapter 3.
• Wi-Fi performance prediction model building
Here, Wi-Fi link performance is treated as a black box. An estimation model of Wi-Fi saturated
throughput is proposed that takes as input the neighbor APs' information (traffic volume, signal
strength, noise floor and channel) together with the target AP's signal strength and noise floor.
The model building, experimental setup and performance evaluation are presented in
Chapters 4, 5 and 6.
• Chapter 3: a Wi-Fi parameter time-evolution analysis and visualization tool is deployed to
present the time-dependent characteristics of Wi-Fi metrics.
• Chapter 4: a learning model is proposed to predict Wi-Fi saturated throughput.
• Chapter 5: two experiments and their results are described.
Background
The purpose of this section is to provide a basic overview of Wi-Fi networks, including an
introduction to the wireless protocol, its specifications and its performance parameters.
IEEE 802.11 uses a cellular architecture in which a wireless network is divided into several cells.
Each cell, called a basic service set (BSS), is controlled by an access point (AP) that links two
or more stations (STAs), as shown in Figure 2.1; this is called infrastructure mode. Multiple
infrastructure BSSs can be joined into an extended service set (ESS), which provides larger
continuous service coverage [13].
• Access point: Packets delivered inside the wireless network are 802.11 frames. When the wireless
network communicates with an outside network, e.g., the Internet, the access point, a piece of
networking hardware comparable to a hub or a switch, works as a frame converter.
• Station: All devices that can connect to the access point via a wireless network interface are
stations (STAs). STAs use the wireless medium to transfer frames between each other.
The earliest IEEE 802.11 version was released in 1997 and provided wireless communication at a
maximum data rate of 2 Mbit/s based on the DSSS/FHSS modulation schemes.
Because of the low data rate of the first version, the 802.11 working group published two new
protocols in 1999, 802.11b and 802.11a, which use different frequency bands. Compared with the
original 802.11 standard, their maximum data rates reach 11 and 54 Mbit/s, respectively.
In 2003, a new standard called 802.11g was published. It uses the same spectrum band as 802.11b
but adds the OFDM modulation scheme to achieve a higher theoretical throughput.
802.11n was published in 2009. It supports two spectrum bands, 2.4 GHz and 5 GHz. 802.11n was the
first to introduce advanced antenna technology, multiple-input multiple-output (MIMO), with up to
4 spatial streams. As a result, the maximum throughput of 802.11n can reach 600 Mbit/s.
802.11ac was published in 2013 and provides very high throughput on the 5 GHz band by using MIMO
with up to 8 spatial streams. Compared with 802.11n, it offers more channel bandwidth options
(20/40/80/160 MHz).
Table 2.1 summarizes some of the IEEE 802.11 specifications described above.
According to the 802.11 PHY standards, wireless networks typically use two unlicensed spectrum
bands, at 2.4 GHz and 5 GHz.
Figure 2.2 shows the 2.4 GHz spectrum with 20 MHz channel width. Thirteen channels of the 2.4 GHz
spectrum may be used in Europe, and channel 14 is additionally allowed in some other countries,
e.g., Japan. The channel centers are separated by 5 MHz. However, only three channels, 1, 6 and 11,
are non-overlapping and do not interfere with each other.
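The channel layout described above can be checked with a short calculation. The sketch below assumes the standard 2.4 GHz numbering, in which channel n (1 to 13) is centered at 2407 + 5n MHz and channel 14 at 2484 MHz; the function names are illustrative and not part of the thesis tooling.

```python
def center_mhz(channel):
    """Center frequency in MHz of a 2.4 GHz Wi-Fi channel (assumed standard numbering)."""
    if channel == 14:
        return 2484  # channel 14 does not follow the regular 5 MHz grid
    if 1 <= channel <= 13:
        return 2407 + 5 * channel
    raise ValueError(f"not a 2.4 GHz channel: {channel}")


def overlaps(a, b, width_mhz=20):
    """True if two channels of the given width share part of the spectrum."""
    return abs(center_mhz(a) - center_mhz(b)) < width_mhz


print([center_mhz(c) for c in (1, 6, 11)])  # [2412, 2437, 2462]
print(overlaps(1, 6), overlaps(1, 3))       # False True
```

Channels 1, 6 and 11 are 25 MHz apart, so 20 MHz channels centered on them do not share spectrum, whereas channels 1 and 3 are only 10 MHz apart and do.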
Figure 2.3 also describes the 2.4 GHz channel allocation but, unlike Figure 2.2, it shows 40 MHz
channels formed by joining two neighboring channels. As can be seen from Figure 2.3, no 40 MHz
channel is independent of the others. Therefore, 40 MHz channels are not an optimal choice for
deployments with multiple access points.
Because the 2.4 GHz band offers only a few independent channels, the IEEE 802.11 working group
defines more non-overlapping 20 MHz channels with center frequencies from 5170 MHz to 5835 MHz.
Channels 36 to 144 may be used in Europe and are divided into three Unlicensed National
Information Infrastructure (UNII) bands: UNII-1, UNII-2 and UNII-3. Moreover, all these 20 MHz
channels can be bonded into 40 MHz, 80 MHz and even 160 MHz channels, as shown in Figure 2.4.
A 5 GHz deployment is therefore suitable for high-density wireless environments, since it has more
non-overlapping channels.
However, the question of which spectrum band to choose for a wireless communication environment is
not easy to answer. With respect to interference, the 5 GHz band is better than the 2.4 GHz band,
but 2.4 GHz signals travel farther than 5 GHz signals and thus provide larger coverage. Wireless
network deployment should therefore vary according to the requirements.
Although new 802.11 standards emerge continuously, 802.11n is still prevalent in today's wireless
networks. Therefore, 802.11n on the 2.4 GHz band is chosen for performance modeling in the
experimental setup in Chapter 5. The approach can be extended to other standards and bands in
future work if needed.
The common parameters used to indicate wireless network performance are throughput, jitter,
packet loss rate and latency [16].
• Throughput: It represents the amount of data successfully delivered per unit time between two
wireless nodes, measured in bit/s, kbit/s or Mbit/s.
• Latency: In the network context, latency typically means the time it takes for a data packet to
travel from one network node to another. However, in some settings such as TCP traffic, latency
is measured as the round-trip time (RTT), i.e., the delay between sending a packet to the
destination and receiving an acknowledgment from it.
• Jitter: It describes the variation in packet delay, i.e., the difference between the delays of
successive packets. It can be an issue for voice traffic; the lower the jitter, the more stable
the VoIP communication.
• Packet loss: Also known as drop rate, it occurs when packets fail to be delivered from sender
to receiver. It is typically caused by network congestion: when no wireless medium is available
to send a packet, the packet is dropped. Other causes, such as errors during transmission, can
also result in packet loss.
According to [17], the data rate is what most people care about, and this dimension of performance
is the main driver of wireless network deployment. Therefore, the data rate is selected for study
in Chapters 4, 5 and 6.
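The definitions above translate directly into simple calculations. The following is a minimal sketch with hypothetical function names and input values; in particular, jitter is computed here as the mean absolute difference between consecutive packet delays, which is one common convention and an assumption rather than the definition used by any particular tool.

```python
from statistics import mean
from typing import Sequence


def throughput_mbps(bytes_delivered: int, seconds: float) -> float:
    """Successfully delivered data per unit time, in Mbit/s."""
    return bytes_delivered * 8 / seconds / 1e6


def packet_loss_rate(sent: int, received: int) -> float:
    """Fraction of sent packets that never reached the receiver."""
    return (sent - received) / sent


def mean_jitter_ms(delays_ms: Sequence[float]) -> float:
    """Mean absolute difference between consecutive packet delays."""
    diffs = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    return mean(diffs) if diffs else 0.0


print(throughput_mbps(1_250_000, 1.0))        # 10.0
print(packet_loss_rate(sent=100, received=95))  # 0.05
print(mean_jitter_ms([10.0, 12.0, 11.0, 15.0]))
```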
There are two common network data measurement methods [18]: active measurement and passive
measurement.
• Active Measurement
Active measurement injects additional probe packets into the wireless network. Network
performance indicators (such as end-to-end response time, transmit error rate and network
capability) can then be calculated by tracking the probe packets. Active measurements can
better characterize the service quality perceived by clients because they simulate actual traffic
behavior using a few test packets. However, since this method introduces additional traffic,
it shares the network bandwidth with the actual traffic and may disturb the normal traffic
flows.
• Passive Measurement
In passive network measurement, data is collected by passively capturing traffic at monitored
network nodes, e.g., wireless routers. Most wireless routers have pre-installed passive
measurement tools, which provide an easy way to record different types of network data (such as
traffic volume and packet loss). In addition, passive measurements [9][19] are the most widely
used in wireless communication environments. Therefore, the passive measurement method is
selected in this thesis.
Three built-in passive measurement tools are described in this section: Ubus [11], Uci [20] and
Wlctl [21].
• Ubus
Ubus is a command-line tool on OpenWrt-based wireless routers that allows interaction between
the ubus server and all registered services. It calls procedures with parameters and returns
responses in a user-friendly JSON format.
• Wlctl
Wlctl (Web Listener Control) is a common wireless gateway interface for wireless measurement,
which can determine the effects of changes in the wireless network.
• Uci
Uci (Unified Configuration Interface) is the centralized configuration interface of OpenWrt. It
can modify the configuration files of the wireless access point (such as Wi-Fi channel, channel
width and transmit power).
In this thesis, an executable shell script built on ubus and wlctl is used to periodically scan the
access point for experimental data collection in Chapter 5, and uci is used to change the wireless
access point configuration.
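Because ubus replies are JSON, the collected records are easy to post-process. The sketch below is only an illustration of that step: the reply text and its field names (`ssid`, `channel`, `rssi`, `noise`) are hypothetical and are not taken from a real ubus object, and the helper names are assumptions as well.

```python
import json

# Hypothetical reply text: the field names are illustrative only and do not
# correspond to a documented ubus object.
sample_reply = '{"ssid": "home", "channel": 6, "rssi": -58, "noise": -92}'


def flatten_record(reply_text, timestamp):
    """Parse one JSON reply and attach the time at which it was sampled."""
    record = json.loads(reply_text)
    record["timestamp"] = timestamp
    return record


def to_txt_line(record):
    """Serialize one record as a single line suitable for a txt log file."""
    return json.dumps(record, sort_keys=True)


record = flatten_record(sample_reply, timestamp=1500000000)
print(to_txt_line(record))
```

Storing one JSON object per line keeps the periodic samples easy to append to and to read back later.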
• Iperf
Iperf [22] is a network performance measurement tool for the TCP and UDP protocols. It allows
various parameters, such as duration and packet size, to be set for testing a network. It has a
client mode and a server mode and can measure the throughput between two network nodes, either
one-way or two-way. The output of iperf is a time-stamped report that includes the throughput
and the amount of data transferred in a particular time interval. In this thesis, iperf is used
in Chapter 5 to generate experimental traffic flows at different data rates, including saturated
and unsaturated traffic.
• Pandas
Pandas [23] is a powerful data analysis tool for the Python programming language. Its flexible
data structures make labeling and presenting data easier and faster. There are two data
structures, Series and DataFrame. DataFrame is used in this thesis; it is an Excel-like data
structure with ordered columns, each of which can hold a different value type (such as string or
numeric).
• Matplotlib
Matplotlib [24] is a Python toolkit for data visualization. It is a 2D plotting library that
can produce figures in different formats (PDF, JPG, PNG, BMP). In this thesis, after Pandas has
structured the collected data, the Matplotlib library is used to develop the Wi-Fi parameter
visualization tool in Chapter 3.
• Scikit-learn
Scikit-learn [25] is a Python module for machine learning. It integrates various features,
including classification, regression, model selection and preprocessing. In this thesis,
Scikit-learn is selected to implement the machine learning classification, which comprises data
scaling, modeling and performance evaluation.
Machine learning (ML) solves a class of problems by having a computer learn the correlations
between input and output from a collected data set. Normally, an ML algorithm is applied when no
exact mathematical relationship can be observed between the input and the output.
This section gives a brief introduction to ML algorithms based on [26][27]. The specific
ML algorithm chosen for modeling the Wi-Fi environment is introduced in Chapter 4.
The following basic concepts help in understanding ML:
• Data: There are two types of data in ML, training data and testing data. Both are generated by
a testbed or a simulation and contain input vectors xi and the corresponding output vectors yi.
Training data is used for learning in order to build the model. Testing data is used to evaluate
the performance of the built model.
• Feature: The input vectors mentioned above are called features; they describe properties
of the studied problem.
• Classification: Classification tries to find an optimal classifier on the training data. In
other words, in the training step, the training data is separated into several classes. These
classes are then used to predict which class each testing sample belongs to. Figure 2.5
illustrates a simple linear classification problem.
• Regression: Unlike classification, regression works on the values of the training data. The
purpose is to find an optimal mapping function, represented by a curve or a line, that fits
all the data samples. Figure 2.6 illustrates a simple linear regression example.
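To make the regression concept concrete, the sketch below fits a straight line to a few points with the closed-form least-squares solution. The data values and the function name are hypothetical; this is a generic illustration and not the method used later in the thesis, which relies on classification.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept to the given samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training samples lying exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # 2.0 1.0
```

A regression model returns a continuous value for a new input, whereas a classifier returns one of a fixed set of labels.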
Besides these basic concepts, ML is divided into two broad categories: supervised machine learning
and unsupervised machine learning.
• Supervised machine learning: Supervised machine learning learns from labeled data,
a.k.a. training data samples, which include input vectors x = [x1, ..., xi] and output vectors
y = [y1, ..., yi]. This process is known as model building. The model is then used to make
predictions on new data, a.k.a. testing data. Since the quality of the model needs to be tested,
the predictive accuracy is calculated to evaluate model performance.
• Support vector machine (SVM): SVM is one of the supervised machine learning methods. It is
divided into two core groups, support vector classification (SVC) and support vector regression
(SVR). SVC performs classification by finding a decision boundary between categorical labels
that is maximally far from any labeled sample. SVR performs regression to predict a continuous
trend line for the ordered points in the training data.
• Unsupervised machine learning: Unsupervised machine learning works on unlabeled data,
which contains only input vectors x = [x1, ..., xi]. This type of machine learning algorithm tries
to find the hidden structure in the data samples and to distinguish them accordingly.
In this thesis, the Wi-Fi performance analysis focuses on predicting the Wi-Fi throughput based
on the channel condition, which is a supervised machine learning problem.
Chapter 3
This chapter describes a Wi-Fi parameter analysis and visualization tool developed in Python
to diagnose Wi-Fi quality quickly at any time. With this tool, the data of any access point and
its associated STAs, as reported by the AP via the ubus interface, can be selected and visualized
graphically.
Studies in [9][28][29] show that Wi-Fi quality can be affected by the wireless channel condition,
i.e., by factors such as Wi-Fi signal strength (RSSI), traffic volume, resource contention (both
internal and external) and noise level. Therefore, this analysis and visualization tool aims to
analyze and present the following features in order to reveal Wi-Fi performance degradation:
• signal strength (RSSI): RSSI indicates the Wi-Fi signal level at a given location. If the RSSI
value drops sharply for a period, the wireless end user has a weak signal around that
position.
• traffic volume: the packets transmitted and received during a period, which provide
information about local wireless network activity. If no packet is transmitted or received over
a period, either there is no active end user or there is a coverage problem around that location.
• data rate per client: it depends heavily on user activities, e.g., browsing web pages,
downloading a file or watching a video.
• noise level: it indicates non-Wi-Fi interference sources, such as Bluetooth devices, cordless
phones and microwave ovens, that operate in the same radio band (2.4 GHz) as the local wireless
network. A high noise value means a lower signal-to-noise ratio (SNR), which may reduce the data
rate available to wireless clients.
• information about neighbor Wi-Fi activity: it reveals how many other wireless networks are
present and what the signal strength of those neighbor networks is. This external Wi-Fi
contention may also result in a reduced data rate for the local wireless clients.
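The link between noise level and SNR mentioned above can be illustrated with a one-line calculation. The sketch assumes that RSSI and noise floor are both reported in dBm, so that the SNR in dB is their difference; the sample values are hypothetical.

```python
def snr_db(rssi_dbm, noise_dbm):
    """Signal-to-noise ratio in dB from signal strength and noise floor in dBm."""
    return rssi_dbm - noise_dbm


# Hypothetical readings: same signal strength, quiet versus noisy channel.
print(snr_db(-60, -90))  # 30
print(snr_db(-60, -80))  # 20
```

With the same RSSI, a noise floor that is 10 dB higher leaves 10 dB less SNR.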
The tool is developed with the pandas and Matplotlib libraries, which were introduced in
Chapter 2. The graphical results produced by the tool are presented in the following subsections.
The Wi-Fi data used as input to the visualization tool was collected from real Wi-Fi users'
environments.
Figure 3.1 shows the information related to one STA during a specific time range, from 15:55 to
21:50, as described below:
1. Title: 5c:a3:9d:00:5a:e2 802.11n 2X2 WMM AMPDU 2.4G traffic
• 5c:a3:9d:00:5a:e2: STA MAC address
• 802.11n 2X2 WMM AMPDU: STA capabilities; it supports 2X2 MIMO, Quality of Service (QoS)
and frame aggregation.
2. First graph
• red line: STA RSSI over time
• orange bar: STA received packets over time
• blue bar: STA sent packets over time
3. Second graph (STA transmit direction)
• black line: STA transmit PHY data rate over time, in Kbps
• purple line: STA actual transmit data rate over time, in Kbps. According to the Technicolor
paper [30], if the actual data rate is less than 1 Kbps, the recorded value is 0.
• blue bar: STA transmitted packets over time
4. Third graph (STA receive direction): the parameters are the same as in the second graph, but
for the receive direction.
• black line: STA receive PHY data rate over time, in Kbps
• purple line: STA actual receive data rate over time, in Kbps. According to the
Technicolor paper [30], if the actual data rate is less than 1 Kbps, the recorded value is 0.
• blue bar: STA received packets over time
5. Fourth graph: percentage of STA transmissions that failed because no acknowledgment was
received from the receiver (noack failures).
Figure 3.2, Figure 3.3 and Figure 3.4 show how many STAs contend for the same access point
resources during the same time period, together with each STA's individual parameters, i.e., RSSI,
transmit physical rate and receive physical rate over time.
Figure 3.5 describes the 2.4 GHz channel usage of the access point during a specific time period
as follows:
• blue bar: channel width (MHz) over time
• green line: access point physical data rate over time
• red dot: channel used by the access point over time
Figure 3.6 shows the background noise (dBm) of the 2.4 GHz band during a specific time period. The
shade of green runs from light to dark as the noise level goes from low to high.
3.3 Conclusion
As the time evolution of the Wi-Fi performance parameters shown in this chapter illustrates, the
analysis and visualization tool lets operators monitor the Wi-Fi environment and gain visibility
into overall performance, which helps service providers see whether performance degradation exists
and what its potential causes are.
Chapter 4
In this chapter, a Wi-Fi performance estimation model based on a machine learning method is
proposed. It allows service providers to predict Wi-Fi saturated throughput¹ from simple
measurements on the access point.
Based on the factors affecting Wi-Fi performance that are discussed in Chapter 3, the proposed
Wi-Fi throughput prediction model is considered a function of signal strength, resource contention
and noise level, as shown in equation (4.1):
In this approach, the Wi-Fi performance parameters (such as RSSI, noise level and contention
information) extracted with the measurement tools described in Chapter 2 are defined as the
machine learning input features, and the saturated throughput collected in the same way is labeled
as the machine learning output feature. The steps are illustrated in Figure 4.1:
¹ In this thesis, saturated throughput is selected as the Wi-Fi performance indicator in the proposed estimation
First, label each measurement as input and output features during data collection. Second,
preprocess the input features through feature normalization. Third, apply the selected machine
learning method to the features to build the model. Finally, evaluate the performance of the
prediction model.
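The normalization step can be sketched as follows. The text above does not say which normalization is applied, so z-score standardization (zero mean, unit variance per feature) is assumed here purely for illustration; the function name and the RSSI values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(column):
    """Scale one feature column to zero mean and unit variance (assumed z-score scaling)."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(value - mu) / sigma for value in column]


# Hypothetical RSSI samples in dBm.
scaled = standardize([-70.0, -60.0, -50.0])
print(scaled)
```

Scaling matters for SVM-based models because features with large numeric ranges would otherwise dominate the distance and kernel computations.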
SVC was originally designed for simple two-class classification, which uses an optimal
classification line, i.e., a hyperplane, to separate two classes. However, by converting a
multi-class classification problem into several two-class problems, SVC can be applied to Wi-Fi
saturated throughput estimation, which is a multi-class classification problem. The concept of
finding a hyperplane is shown in Figure 4.2:
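The conversion of a multi-class problem into several two-class problems can be sketched as below. The text does not state which decomposition is used, so a one-vs-one scheme with majority voting is assumed; the class names follow the four throughput categories used in this thesis, and the helper names are illustrative.

```python
from collections import Counter
from itertools import combinations

CLASSES = ["good", "medium", "poor", "very poor"]


def binary_problems(classes):
    """All class pairs; one two-class classifier would be trained per pair."""
    return list(combinations(classes, 2))


def vote(pairwise_winners):
    """Final label: the class that wins the most pairwise decisions."""
    return Counter(pairwise_winners).most_common(1)[0][0]


print(len(binary_problems(CLASSES)))  # 6
# Hypothetical outcomes of the six pairwise classifiers for one sample.
print(vote(["good", "good", "medium", "good", "poor", "poor"]))  # good
```

With four classes, one-vs-one requires 4 * 3 / 2 = 6 two-class classifiers.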
Consider a set of training vectors $x = \{x_1, x_2, \cdots, x_i\}$, represented by dots and crosses in
Figure 4.2, as input features. The purpose of SVC is to find an optimal classifier, i.e., a
hyperplane (the red line), which is given by equation (4.2) when $y = 0$:

\[ y = \omega \cdot x + b \tag{4.2} \]

$\omega$ is the weight vector, $x$ is the input feature vector and $b$ is the bias. The two classes
to be predicted are the values of the output feature, i.e., $y = \pm 1$.
Therefore, the main problem for SVC is to find the optimal hyperplane, i.e., the best position
of the red line in Figure 4.2, so as to minimize the misclassification probability. This can be
converted into maximizing the margin $m$ [31] between two other hyperplanes,
$\omega \cdot x + b = +1$ and $\omega \cdot x + b = -1$, which is defined in equation (4.3). The
data samples located on these two hyperplanes are called support vectors.

\[ m = \frac{2}{\|\omega\|} \tag{4.3} \]

$\|\omega\|$ is the norm of $\omega$. The margin $m$ is subject to (4.4):

\[ \begin{cases} \omega \cdot x_i + b \ge 1, & \text{if } y_i = 1 \\ \omega \cdot x_i + b \le -1, & \text{if } y_i = -1 \end{cases} \tag{4.4} \]

where $x_i$ is the $i$th training vector and $y_i$ is the correct output of the SVC classification
for sample $x_i$. Equation (4.3) and the two constraints in (4.4) can be combined into (4.5):

\[ \min \|\omega\|, \quad \text{subject to: } y_i(\omega \cdot x_i + b) \ge 1 \tag{4.5} \]
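The margin in (4.3) and the constraint in (4.5) can be checked numerically for a candidate hyperplane. The sketch below uses a hypothetical two-dimensional toy data set and hand-picked values of the weight vector and bias; it only verifies the constraints and does not solve the optimization problem.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def margin(w):
    """Margin m = 2 / ||w|| between the two supporting hyperplanes, as in (4.3)."""
    return 2 / math.sqrt(dot(w, w))


def satisfies_constraints(w, b, samples):
    """Check y_i * (w . x_i + b) >= 1 for every training sample, as in (4.5)."""
    return all(y * (dot(w, x) + b) >= 1 for x, y in samples)


# Hypothetical toy samples: (feature vector, label in {+1, -1}).
samples = [((2.0, 0.0), 1), ((1.0, 1.0), 1), ((-1.0, 0.0), -1), ((-3.0, 2.0), -1)]

print(satisfies_constraints((1.0, 0.0), 0.0, samples), margin((1.0, 0.0)))  # True 2.0
print(satisfies_constraints((0.5, 0.0), 0.0, samples))                      # False
```

Shrinking the norm of the weight vector widens the margin, but only as long as every sample still satisfies the constraint; this is the trade-off that (4.5) expresses.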
Equation (4.5) is a quadratic programming problem that solves linear classification. A dataset is
not always linearly separable; the linear classifier (4.5) is therefore modified to use the dual
function [31] as the optimization function, which can also solve non-linear problems:

\[ \text{Maximize: } L(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \]
\[ \text{Subject to: } 0 \le \alpha_i \le C, \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \quad \omega = \sum_{i=1}^{N} \alpha_i y_i x_i \tag{4.6} \]

where $K(x_i, x_j)$ is the kernel function. The purpose of the kernel function is to map the
original data vectors into a higher dimension, which can lead to a linear classification solution.
In equation (4.6), $C$ is an upper-bound constant that controls the trade-off between the training
error and the model optimization.
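A kernel function can be written as an ordinary function of two feature vectors. The text does not list which kernels are compared, so the linear, polynomial and radial basis function (RBF) kernels below are shown as common choices; the default parameter values (`degree`, `coef0`, `gamma`) are assumptions for illustration.

```python
import math


def linear_kernel(x, z):
    """K(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=3, coef0=1.0):
    """K(x, z) = (x . z + coef0) ** degree"""
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


print(linear_kernel((1, 2), (3, 4)))                 # 11
print(polynomial_kernel((1, 2), (3, 4), degree=2))   # 144.0
print(rbf_kernel((1, 2), (1, 2)))                    # 1.0
```

Each kernel returns the inner product that the two vectors would have after the mapping into the higher-dimensional space, so the mapping itself never has to be computed explicitly.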
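The optimization problem (4.6) can be solved by a solver [26], and the parameters of the
hyperplane classifier, i.e., equation (4.8), can be obtained as follows:

\[ \omega = \sum_{i=1}^{N} \alpha_i y_i x_i, \qquad b = \frac{1}{N_S} \sum_{i \in S} \Big( y_i - \sum_{j \in S} \alpha_j y_j K(x_i, x_j) \Big) \tag{4.7} \]

where $S$ is the set of support vectors and $N_S$ is the number of support vectors.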
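Equation (4.7) can be evaluated directly once the multipliers are known. In the sketch below, the two training points and the multiplier values are a hand-picked toy example that satisfies the constraints of the dual problem with a linear kernel; no solver is run, and all names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weight_vector(alphas, ys, xs):
    """omega = sum_i alpha_i * y_i * x_i"""
    dim = len(xs[0])
    return [sum(a * y * x[d] for a, y, x in zip(alphas, ys, xs)) for d in range(dim)]


def bias(alphas, ys, xs, kernel=dot):
    """b averaged over the support vectors (samples with alpha_i > 0)."""
    support = [i for i, a in enumerate(alphas) if a > 0]
    total = 0.0
    for i in support:
        total += ys[i] - sum(alphas[j] * ys[j] * kernel(xs[i], xs[j]) for j in support)
    return total / len(support)


# Toy example: one sample per class, both acting as support vectors.
xs = [(1.0, 0.0), (-1.0, 0.0)]
ys = [1, -1]
alphas = [0.5, 0.5]

print(weight_vector(alphas, ys, xs), bias(alphas, ys, xs))  # [1.0, 0.0] 0.0
```

The resulting hyperplane is the vertical line through the origin, midway between the two samples, and both samples lie exactly on the supporting hyperplanes.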
$\hat{y}_i$ is the predicted value of the $i$th test sample, and $y_i$ is the true value of the
$i$th test sample. $I(\cdot)$ is the indicator function, defined by (4.13):

\[ I(y_i, \hat{y}_i) = \begin{cases} 1 & \text{if } y_i = \hat{y}_i \\ 0 & \text{if } y_i \ne \hat{y}_i \end{cases} \tag{4.13} \]

Therefore, the higher the accuracy score, the better the classifier.
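The indicator function in (4.13) leads to a simple accuracy computation. The sketch assumes the usual definition of accuracy as the fraction of test samples whose predicted label equals the true label; the labels in the example are hypothetical.

```python
def indicator(y_true, y_pred):
    """I(y, y_hat): 1 if the prediction matches the true label, otherwise 0."""
    return 1 if y_true == y_pred else 0


def accuracy(true_labels, predicted_labels):
    """Fraction of test samples that are classified correctly (assumed definition)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    hits = sum(indicator(t, p) for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


truth = ["good", "poor", "medium", "poor"]
predicted = ["good", "poor", "poor", "poor"]
print(accuracy(truth, predicted))  # 0.75
```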
4.3 Conclusion
SVM-based modeling can produce robust classification results from relevant input information in
a convenient way, whether or not the input data is linearly separable. Predicting the Wi-Fi
saturated throughput level of a device requires only a few steps. First, normalize the relevant
input information (noise, signal strength, resource contention) as described in subsection
4.2.2. Second, select different model parameters as described in subsection 4.2.3 to find an
optimal prediction model for equation (4.8). Finally, evaluate the prediction model as described
in subsection 4.2.4. For these reasons, SVM-based modeling is selected in this thesis to predict
the saturated throughput of Wi-Fi devices.
Chapter 5
In this chapter, two kinds of experiments are conducted to predict the throughput of a wirelessly
connected device in a real Wi-Fi environment. The first experiment is performed without knowledge
of the traffic volume of the surrounding neighbor APs. The second is performed with knowledge of
the traffic volume of the nearby interfering APs. The dominant factors that affect a Wi-Fi device's
throughput are discussed in the experimental results.
Figure 5.1 shows the network diagram of the testbed experiment. Only one AP is connected to two
nodes: one node is used as the traffic generator, and the other, a wireless node, is the traffic
receiver. The saturated TCP throughput between a client (iPhone 6) and its AP (target AP) is
investigated under controlled wireless communication conditions. The specifications of all nodes
are shown in Table 5.1:
Regarding the traffic sender and receiver nodes in Table 5.1, a network tool application is
installed on the receiver node to run iperf in server mode, and Auto-Lock is turned off to prevent
the receiver from entering sleep mode. Iperf in client mode is used on the traffic sender node to
generate traffic from the AP to the iPhone 6.
The experiments are conducted indoors at Sodra Langgatan 36, 169 59 Solna, in an apartment in a
four-floor building with many surrounding Wi-Fi APs.
Parameter        Values
Frequency band   2.4 GHz
Area size        7 m x 10 m
Target AP        iperf (TCP), saturating the link between the target AP and its client
Channels         [1, 6, 11]
Channel width    20 MHz
The experiments are designed to investigate the factors that affect Wi-Fi throughput in the 2.4 GHz band.
For this purpose, measurement points are generated by periodically sampling the target
AP under the different configurations shown in Table 5.2; uci, described in Chapter 2, is used to
modify the AP's configuration.
For the measurements in the testbed environment of Figure 5.1, the iPhone 6 is moved around the
experiment area (7 m x 10 m) to the positions marked as blue points in Figure 5.2.
Ubus and Wlctl, described in Chapter 2, are used to periodically collect all the measurement
points, which are saved as txt files.
– The power received by the target AP from the client iPhone 6. There is one such power feature.
– The power received by the target AP from neighbor APs. There are sixteen such power features.
– The target AP noise floor. There is one such noise feature.
– The neighbor APs' noise floor. There are sixteen such noise features.
These input features for the iPhone 6 throughput prediction are shown in Figure 5.3.
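The four feature groups listed above add up to 1 + 16 + 1 + 16 = 34 input values per measurement point. A minimal sketch of how such a vector could be assembled follows; the function name, the ordering of the groups and the dBm values are hypothetical and are not taken from the thesis code.

```python
NUM_NEIGHBORS = 16  # number of neighbor APs observed in Experiment 1


def build_feature_vector(client_rssi, neighbor_rssi, target_noise, neighbor_noise):
    """Concatenate the four feature groups into one flat input vector."""
    if len(neighbor_rssi) != NUM_NEIGHBORS or len(neighbor_noise) != NUM_NEIGHBORS:
        raise ValueError("expected one RSSI and one noise value per neighbor AP")
    return [client_rssi, *neighbor_rssi, target_noise, *neighbor_noise]


# Hypothetical readings in dBm, for illustration only.
vector = build_feature_vector(
    client_rssi=-55.0,
    neighbor_rssi=[-80.0] * NUM_NEIGHBORS,
    target_noise=-92.0,
    neighbor_noise=[-90.0] * NUM_NEIGHBORS,
)
print(len(vector))  # 1 + 16 + 1 + 16 = 34
```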
• Performance Evaluation
To use the SVC-based estimation approach, a dataset consisting of all the features illustrated
in Figure 5.3 needs to be established. Python code, introduced in Chapter 2, filters all
necessary features from the raw txt files generated in the Figure 5.1 testbed environment. The
dataset includes 600 data points, as shown in Figure 5.4.
The dataset is divided into two parts: 30% of the dataset is randomly selected as the testing set,
and the remaining 70% is the training set used for modeling. SVC in Python is used to
build a prediction model on the training set. This classification model is then assessed using the
testing set. The testing set is separated from the dataset before the estimation model is built.
Therefore, the accuracy obtained by validating the estimation model on the testing set gives a
reliable performance evaluation result.
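The split-then-fit procedure described above can be sketched with scikit-learn as follows. The data here is randomly generated and only mimics the shape of the dataset (600 points, 34 features, four throughput classes); it is not the measured data, and the fixed `random_state` is an assumption added for repeatability.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for the measured dataset, NOT the thesis data.
X = rng.normal(size=(600, 34))
y = rng.integers(0, 4, size=600)  # four throughput classes, encoded 0..3

# Hold out 30% for testing before any model is fitted.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = SVC(kernel="rbf")
model.fit(X_train, y_train)

# Fraction of held-out points whose class is predicted correctly.
test_accuracy = accuracy_score(y_test, model.predict(X_test))
```

Because the labels in this sketch are random, the resulting accuracy carries no meaning; only the workflow is illustrated.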
In Figure 5.4, the throughput in column A represents the iPhone 6 saturated throughput
under the different wireless communication environments, as shown in Table 5.3.
In order to choose the best kernel function to predict the iPhone 6 throughput, GridSearchCV
is applied to the training set (70% of the dataset) and three different kernel functions (linear,
Gaussian, polynomial) are tested. The linear kernel yields a linear classifier on the training
data set. The Gaussian kernel represents a feature transformation of the input space via a Gaussian
function. The polynomial kernel maps the input space onto polynomials of the
original features. The Gaussian and polynomial kernels are non-linear models. The definitions
of these three kernel functions are given in subsection 4.2.3. The result is shown in
Table 5.4. The accuracy score in the result, described in subsection 4.2.4, indicates how accurate
the prediction is. Its value ranges from 0 to 1, where 1 means that the predictions match
perfectly. Therefore, the higher the accuracy score,
the better the prediction result.
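A kernel comparison of this kind can be sketched with scikit-learn's `GridSearchCV`. The training data below is synthetic, and the grid of `C` values, the five-fold cross-validation and the scaling step are assumptions made for the example, not settings reported in the thesis.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Synthetic training set with a simple two-class label, NOT the thesis data.
X_train = rng.normal(size=(200, 4))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

pipeline = make_pipeline(StandardScaler(), SVC())
param_grid = {
    "svc__kernel": ["linear", "rbf", "poly"],  # "rbf" is the Gaussian kernel
    "svc__C": [0.1, 1, 10],                    # assumed search range
}
search = GridSearchCV(pipeline, param_grid, scoring="accuracy", cv=5)
search.fit(X_train, y_train)

best_kernel = search.best_params_["svc__kernel"]
best_cv_accuracy = search.best_score_  # mean cross-validated accuracy
```

The search only ever sees the training portion, so the held-out testing set remains untouched for the final evaluation.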
From Table 5.4, the overall accuracy for all three kernel functions is low, around 0.55. The Gaussian
kernel has the highest accuracy and the shortest running time, and was therefore selected for
estimation modeling. Finally, this model is evaluated on the test data, that is, the remaining 30% of the
dataset, as shown in Table 5.5.
From Table 5.5, the accuracy of predicting unseen data is very low, only 0.56. The experiment
is then repeated with more measurement points, and the resulting prediction accuracy
for different numbers of data points is shown in Figure 5.5.
As shown in Figure 5.5, the prediction accuracy remains around 0.54 even when the number of data
points is increased to 1355.
• Prediction accuracy with different input features
In Figure 5.6, with the single input feature of iPhone 6 RSSI, the prediction accuracy is 0.38. However,
the prediction accuracy only grows to 0.54 with all four input features. Therefore, in a
multiple dwelling unit (MDU) environment, adding the surrounding neighbors' RSSI and noise
level as input features is not enough to predict a Wi-Fi device's throughput. Besides these
four features, the neighbor APs' traffic volume [33] also affects the target Wi-Fi device's
throughput. Another experiment that adds the neighbor traffic load is therefore discussed in the next
section.
5.2 Experiment 2: Control with neighbor traffic
Figure 5.7 shows the network diagram of the testbed experiment. There are three APs: one is the target AP,
and the other two are neighbor APs that introduce competing traffic to the target AP. In addition, two
nodes are connected to each AP: one node is used as the traffic generator, and the other, wireless
node is the traffic receiver. The saturated TCP throughput between a client (iPhone 6) and
its AP (target AP) is investigated under controlled wireless communication conditions. The
specifications of all nodes are shown in Table 5.6.
The network tool applications installed on the traffic senders and receivers are the same as those
introduced in 5.1.1 Experiment testbed of Experiment 1.
The experiments are also conducted indoors at Sodra Langgatan 36, 169 59 Solna, in an
apartment in a four-floor building. In addition, to minimize interference from other surrounding
neighbors, all the experiments are run between 2 am and 4 am, and a Windows application, Wi-Fi
Chanalyzer [34], is used to check the utilization of the Wi-Fi channels.
Parameter         Values
Frequency band    2.4 GHz
Area size         7 m x 10 m
Target AP         iperf (TCP), saturating the link between the target AP and its client
Interference AP   iperf (UDP), [3 Mbps, 45 Mbps], step = 3 Mbps
Channels          channel(target) = 1, channel(interference) = [1, 2, 3]
Channel width     20 MHz for both the target AP and the interference AP
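The interference sweep in the table (UDP load from 3 Mbps to 45 Mbps in steps of 3 Mbps) could be scripted along the following lines. This sketch only builds the iperf client command lines and does not run them; the helper name, the receiver address and the 60-second duration per load level are assumptions, not values from the thesis.

```python
def interference_commands(server_ip, start_mbps=3, stop_mbps=45, step_mbps=3, seconds=60):
    """Build one iperf UDP client command per offered load level."""
    commands = []
    for rate in range(start_mbps, stop_mbps + 1, step_mbps):
        commands.append(
            ["iperf", "-c", server_ip, "-u", "-b", f"{rate}M", "-t", str(seconds)]
        )
    return commands


# 192.0.2.10 is a documentation address standing in for the receiver node.
sweep = interference_commands("192.0.2.10")
print(len(sweep))  # 15 load levels: 3, 6, ..., 45 Mbps
```

Each command list could be passed to `subprocess.run` on the traffic sender attached to an interference AP.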
The experiment follows the same process as described in 5.1.2 Data Collection of
Experiment 1, except for the different AP configurations shown in Table 5.7.
• Measurement synchronization
Unlike Experiment 1 in 5.1, all the measurements in Experiment 2 are passively
collected from both the target AP and the neighbor APs. Therefore, the measurement points in the dataset
need to be synchronized. This is done by using Linux date [35] on all the APs to set
the same clock time.
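Once the AP clocks agree, samples from different APs can be joined on their timestamps. The sketch below groups samples into shared time buckets and keeps only the buckets in which every AP contributed a value. The function name, the one-second bucket width and the sample values are hypothetical; the thesis does not describe its joining procedure at this level of detail.

```python
from collections import defaultdict


def align_by_time(records_per_ap, interval_s=1):
    """Group samples from several APs into shared time buckets.

    records_per_ap maps an AP name to a list of (unix_time, value) pairs.
    Only buckets that contain a sample from every AP are returned.
    """
    buckets = defaultdict(dict)
    for ap_name, records in records_per_ap.items():
        for timestamp, value in records:
            bucket = int(timestamp // interval_s) * interval_s
            buckets[bucket][ap_name] = value  # keep the latest sample per bucket
    return {
        bucket: values
        for bucket, values in sorted(buckets.items())
        if len(values) == len(records_per_ap)
    }


# Hypothetical samples: (unix time, measured value) per AP.
aligned = align_by_time({
    "target": [(100.2, -55), (101.1, -56), (102.4, -57)],
    "neighbor1": [(100.7, 12), (102.9, 15)],
})
```

In this example the bucket starting at 101 is dropped because the neighbor AP has no sample in it.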
• Performance Evaluation
The whole process of building and evaluating the prediction model is the same as in 5.1.3 Performance
Evaluation of Experiment 1. There are 721 measurement points in total in this experiment,
with a data structure similar to that of the Experiment 1 dataset shown in Figure 5.4, except
for an additional input feature called neighbor traffic load.
The dataset is divided into two parts: 30% of the dataset is randomly selected as the testing set,
and the remaining 70% is used as the training set for building the prediction model.
The prediction accuracy on the training set with three different kernel functions (linear,
Gaussian, polynomial) is shown in Table 5.8. The kernel functions and the
accuracy score are the same as described for Experiment 1 in section 5.1.
From Table 5.8, the Gaussian kernel has the highest accuracy and the shortest running time, and
was therefore selected for estimation modeling. The testing set is then used to evaluate the Gaussian
prediction model, and the result is shown in Table 5.9.
Figure 5.9 shows the accuracy as a function of the data set size. The
prediction accuracy grows as the number of measurement points increases. Moreover,
1304 measurements are enough to obtain a good prediction accuracy of 0.87.
The impact of the different input features used in Experiment 2
is shown in Figure 5.10. The blue bars represent the accuracy score for different feature
combinations. With the single feature of iPhone 6 RSSI, the prediction accuracy is 0.39, and it is
only 0.47 with the additional feature of iPhone 6 noise level. However, with a combination
of iPhone 6 RSSI, iPhone 6 noise level and neighbor traffic load, the accuracy reaches
0.86, a large improvement over the two input features (iPhone 6 RSSI
and iPhone 6 noise level). Moreover, the accuracy only increases to 0.88 when another two
features (neighbor RSSI and neighbor noise level) are added, which means that the RSSI and noise level of
the two neighbors in the experimental deployment of Figure 5.7 do not have much impact on the
prediction accuracy.
Therefore, in an environment where the nearby neighbors' traffic is controlled, the neighbor traffic
load feature plays the most important role in predicting a Wi-Fi device's throughput.
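A feature comparison of this kind can be sketched by training the same classifier on growing feature subsets and comparing cross-validated accuracy. The data below is synthetic and the label is deliberately constructed to depend mostly on the neighbor load column, so the output only illustrates the procedure and is not evidence for the thesis result. The column names and the five-fold cross-validation are assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(2)
n = 400
# Synthetic columns standing in for the measured features, NOT the thesis data.
features = {
    "client_rssi": rng.normal(size=n),
    "client_noise": rng.normal(size=n),
    "neighbor_load": rng.normal(size=n),
}
# Synthetic label that, by construction, depends mostly on neighbor load.
y = (features["neighbor_load"] + 0.2 * features["client_rssi"] > 0).astype(int)

subsets = [
    ["client_rssi"],
    ["client_rssi", "client_noise"],
    ["client_rssi", "client_noise", "neighbor_load"],
]
scores = {}
for names in subsets:
    X = np.column_stack([features[name] for name in names])
    # Mean accuracy over five cross-validation folds for this feature subset.
    scores[tuple(names)] = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
```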
Chapter 6
6.1 Conclusion
This master thesis aims to investigate Wi-Fi performance in residential environments. It
involves the development of a visualization tool for Wi-Fi parameters and of a Wi-Fi throughput
prediction model.
• A Wi-Fi parameter visualization tool has been implemented in Python. It shows time-dependent graphs
for different Wi-Fi parameters in a clear way. All these graphs are plotted from
real users' traffic data and can reveal whether there is Wi-Fi performance degradation
in a residential wireless environment.
• A machine learning based classification model has been proposed, which allows service providers
to predict Wi-Fi saturated throughput from passive measurements in the access point. In this
classification model, Wi-Fi throughput is considered a function of several input features, including
RSSI, noise level and contention traffic. The model is built and evaluated in a testbed with
different network traffic and configurations. The results show that this prediction method can
reach an accuracy of up to 0.88 when the traffic load of nearby interfering APs is known.
Limitation
The features selected for the Wi-Fi throughput prediction model are not only measurable but also
easily accessible from a Wi-Fi vendor's router (e.g. RSSI, noise level and transmit data rate). This is
a trade-off between complexity and estimation accuracy. As a consequence, such measurements do not
capture more detailed environment characteristics such as packet size, 802.11n frame aggregation size
or signal reflection (multi-path fading), which might improve prediction accuracy.
6.2 Future Work
To improve this project, the following future work is considered:
• Add another filter function to the visualization tool to categorize STAs by RSSI level, traffic
volume or data rate. This feature can give service providers an overview of performance
degradation at first glance when many STAs are connected to the same access point.
• Improve and extend the throughput prediction model by introducing local Wi-Fi contention in the
experiment testbed. In this thesis, only external Wi-Fi contention is considered. In addition, more
input features that affect Wi-Fi performance could be included to improve the accuracy of the
prediction model.
Bibliography
[1] Cisco. Cisco visual networking index forecast and methodology, 2015-2020. [Online].
Available: http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/complete-white-paper-c11-481360.html
(Last accessed on 2017-02-10).
[2] Diego Alonso Landa. Analysis and evaluation of viable features for an IEEE 802.11n/ac
self-optimizing solution. [Online]. Available: https://drive.google.com/open?id=0B0sISjvbr4krNm1VTGpKY2Nrc2s
(Last accessed on 2017-07-25).
[3] Yuqing Gu. Home Wi-Fi optimization application front-end design. [Online].
Available: https://drive.google.com/open?id=0B0sISjvbr4krN29pQTF5cFNkWVk
(Last accessed on 2017-07-25).
[4] Z Gal, T Balla, and A Sz Karsai. On the wifi interference analysis based on sensor network
measurements. In Intelligent Systems and Informatics (SISY), 2013 IEEE 11th International
Symposium on, pages 215–220. IEEE, 2013.
[5] A Kamińska-Chuchmala. Performance analysis of access points of university wireless network.
Rynek Energii, 2016.
[6] Marcel Dischinger, Andreas Haeberlen, Krishna P Gummadi, and Stefan Saroiu. Characterizing
residential broadband networks. In Internet Measurement Conference, pages 43–56,
2007.
[7] Shravan Rayanchu, Ashish Patro, and Suman Banerjee. Airshark: detecting non-wifi rf devices
using commodity wifi hardware. In Proceedings of the 2011 ACM SIGCOMM conference on
Internet measurement conference, pages 137–154. ACM, 2011.
[8] Partha Kanuparthy, Constantine Dovrolis, Konstantina Papagiannaki, Srinivasan Seshan, and
Peter Steenkiste. Can user-level probing detect and diagnose common home-wlan pathologies.
ACM SIGCOMM Computer Communication Review, 42(1):7–15, 2012.
[9] Ashish Patro, Srinivas Govindan, and Suman Banerjee. Observing home wireless experience
through wifi aps. In Proceedings of the 19th annual international conference on Mobile
computing & networking, pages 339–350. ACM, 2013.
[10] Anne Håkansson. Portal of research methods and methodologies for research projects and
degree projects. In Proceedings of the International Conference on Frontiers in Education:
Computer Science and Computer Engineering (FECS), page 1. The Steering Committee of
The World Congress in Computer Science, Computer Engineering and Applied Computing
(WorldComp), 2013.
[11] OpenWrt. Ubus (openwrt micro bus architecture). [Online].
Available: https://wiki.openwrt.org/doc/techref/ubus/ (Last accessed on 2017-02-05).
[12] Telenor. How wifi has changed the world. [Online].
Available: http://purple.ai/wifi-changed-world/ (Last accessed on 2017-02-04).
[13] Matthew Gast. 802.11 wireless networks: the definitive guide. O'Reilly Media, Inc., 2005.
[14] IEEE Standards Association. Telecommunications and information exchange between systems
local and metropolitan area networks–specific requirements part 11: Wireless lan medium
access control (mac) and physical layer (phy) specifications. IEEE Std, 802, 2012.
[15] David D Coleman and David A Westcott. Cwna: certified wireless network administrator
official study guide: exam Pw0-105. John Wiley & Sons, 2012.
[16] Theodore S Rappaport. Wireless communications–principles and practice, (the book end).
Microwave Journal, 2002.
[17] Jack L Burbank, Julia Andrusenko, Jared S Everett, and William TM Kasch. Wireless
Networking: Understanding Internetworking Challenges. John Wiley & Sons, 2013.
[18] Venkat Mohan, YR Janardhan Reddy, and K Kalpana. Active and passive network
measurements: a survey. International Journal of Computer Science and Information Technologies,
2(4):1372–1385, 2011.
[19] Ratul Mahajan, Maya Rodrig, David Wetherall, and John Zahorjan. Analyzing the mac-level
behavior of wireless networks in the wild. In ACM SIGCOMM Computer Communication
Review, volume 36, pages 75–86. ACM, 2006.
[22] Jon Dugan, Seth Elliott, Bruce A. Mah, Jeff Poskanzer, and Kaustubh Prabhu. Iperf (the ultimate
speed test tool for tcp, udp and sctp).
[23] Wes McKinney. pandas (powerful python data analysis toolkit). [Online].
Available: http://pandas.pydata.org/pandas-docs/stable/ (Last accessed on 2017-03-15).
[24] Numfocus organization. matplotlib (powerful python visualization tool). [Online].
Available: https://matplotlib.org/index.html (Last accessed on 2017-03-18).
[25] Scikit-learn. Python svc tool: scikit-learn. [Online].
Available: http://scikit-learn.org/stable/modules/svm.html#svc (Last accessed on 2017-03-20).
[26] CM Luscombe. Pattern recognition and machine learning (information science and statistics),
2007.
[27] Christopher JC Burges. A tutorial on support vector machines for pattern recognition. Data
mining and knowledge discovery, 2(2):121–167, 1998.
[28] Guillaume Kremer, Philippe Owezarski, Pascal Berthou, and German Capdehourat. Predictive
estimation of wireless link performance from medium physical parameters using support vector
regression and k-nearest neighbors. In International Workshop on Traffic Monitoring and
Analysis, pages 78–90. Springer, 2014.
[29] Ajay Gupta and Prabhash Dhyani. Performance indicators in a 802.11 wlan deployment.
In Advances in Recent Technologies in Communication and Computing, 2009. ARTCom’09.
International Conference on, pages 490–494. IEEE, 2009.
[30] Ioannis Pefkianakis, Henrik Lundgren, Augustin Soule, Jaideep Chandrashekar, Pascal
Le Guyadec, Christophe Diot, Martin May, Karel Van Doorselaer, and Koen Van Oost.
Characterizing home wireless performance: The gateway view. In Computer Communications
(INFOCOM), 2015 IEEE Conference on, pages 2713–2731. IEEE, 2015.
[31] Tong Zhang. An introduction to support vector machines and other kernel-based learning
methods. AI Magazine, 22(2):103, 2001.
[32] Julien Herzen, Henrik Lundgren, and Nidhi Hegde. Learning wi-fi performance. In Sensing,
Communication, and Networking (SECON), 2015 12th Annual IEEE International Conference
on, pages 118–126. IEEE, 2015.
[33] Aniket Mahanti, Niklas Carlsson, Carey Williamson, and Martin Arlitt. Ambient interference
effects in wi-fi networks. In International Conference on Research in Networking, pages
160–173. Springer, 2010.
[34] Metageek. Metageek wi-fi chanalyzer. [Online].
Available: http://www.metageek.com/products/wi-spy/ (Last accessed on 2017-03-25).
[35] Linux. Linux date man page. [Online].
Available: http://man7.org/linux/man-pages/man1/date.1.html (Last accessed on 2017-04-10).
TRITA TRITA-ICT-EX-2017:63
www.kth.se