
COMP4336 Term Project Report

WiFi Fingerprinting

z5186632 Wenjie Liu


Introduction
Location identification with WiFi fingerprinting is a technique in which a mobile device connects to multiple WiFi access points (APs) in an indoor environment and collects data that is then used to identify its current location. Because wireless signals suffer distance-dependent path loss, the received signal strength (RSS) from the same WiFi AP differs between locations only 2 to 3 meters apart. To reduce identification errors, this project uses more than one WiFi AP in the calculation.

Background

Data Collection
To have several free WiFi APs available, I collected the samples for this project in a shopping center called 9-square, a name I use in the rest of the report. There are only 3 free WiFi networks in 9-square, and I selected all of them for data collection:
AP1: NaiXue Milktea
AP2: 9-square public Wifi
AP3: Watsons
However, a problem appeared during collection. 9-square public Wifi has wide coverage and a strong signal, while the other 2 networks (Watsons and NaiXue Milktea) have relatively small coverage. As a result, at all 10 locations where I collected data, the RSS of 9-square public Wifi was -1 dBm or higher, which cannot be displayed in Wireshark. (I use Windows 10, and Microsoft Network Monitor does not record the RSS when the signal is above -1 dBm, so the RSS field appears blank in Wireshark.) I therefore had to discard the data collected for AP2 (9-square public Wifi), and the dataset for this project contains data from only 2 WiFi APs (AP1 NaiXue Milktea and AP3 Watsons).
The following figure shows the layout of the 9-square shopping center and the 10 locations selected for this project.
As shown in the figure, I selected 10 locations (7 on level 1 and 3 on level 2) to collect data from the 2 APs. At each location, I connected to the 2 APs in turn, used a browser (Google Chrome) to load some web pages so that Microsoft Network Monitor would capture more packets, and stayed there for about 1 minute.
Data Processing
Because I use Windows 10, I used Microsoft Network Monitor to capture packets into capture files and then used Wireshark to analyze them. To represent the RSS of each WiFi AP, I use the signal strength field as the measure.
The original capture files may contain thousands of rows, some of which are useless because their signal strength column is empty. To filter out such rows, I apply the filter wlan_radio.signal_dbm >= -100 and keep the columns No, Time, Source, Destination, Protocol, Length, RSSI, Signal Strength and Info, which help identify each packet and its RSS.
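The Wireshark filter above keeps only packets whose signal strength is present and at least -100 dBm. If packets were exported to CSV without that filter, a roughly equivalent clean-up step in pandas might look like the sketch below. It assumes the exported column is named "Signal strength (dBm)" (the name used later in this report) and that values look like "-60 dBm"; the helper name filter_signal_rows is hypothetical.

```python
import pandas as pd

SIGNAL_COL = "Signal strength (dBm)"  # assumed name of the exported column


def filter_signal_rows(df: pd.DataFrame, floor_dbm: int = -100) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose signal strength is present and >= floor_dbm."""
    # Empty signal-strength cells are read as NaN; drop those rows first.
    df = df.dropna(subset=[SIGNAL_COL])
    # Turn strings like "-60 dBm" into integers so they can be compared.
    dbm = df[SIGNAL_COL].astype(str).str.replace(" dBm", "", regex=False).astype(int)
    # Same condition as the filter wlan_radio.signal_dbm >= -100.
    return df[dbm >= floor_dbm]
```

Typical use would be filter_signal_rows(pd.read_csv(path)) for each exported file.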

If a file contains too many rows, I also filter by time to keep only part of the rows. This matters because the algorithm loads each CSV file and calculates the average signal strength to represent the location, and too many rows would slow the algorithm down. The sample CSV files are attached as follows.
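Time-based selection can also be done after export. The sketch below assumes the Time column holds seconds since the start of the capture (Wireshark's usual time display) and uses hypothetical window bounds; inside Wireshark, a display filter such as frame.time_relative >= 10 && frame.time_relative < 40 would have a similar effect. The helper name select_time_window is hypothetical.

```python
import pandas as pd


def select_time_window(df: pd.DataFrame, start_s: float, end_s: float,
                       time_col: str = "Time") -> pd.DataFrame:
    """Hypothetical helper: keep rows captured in the half-open window [start_s, end_s)."""
    mask = (df[time_col] >= start_s) & (df[time_col] < end_s)
    return df[mask]
```

A half-open window means consecutive windows such as [10, 40) and [40, 70) never count the same packet twice.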

Algorithm

Sample output and input:


To process the collected data and perform the calculation, I implemented the algorithm in Python as a single-file program. It runs in the terminal with several arguments, in the following format:

python algo.py PathOfAP1Directory PathOfAP2Directory PathOfPredictionLocationDirectory

and it outputs several lines reporting the result, as the following figure shows.
Design of algorithm
The basic idea of the algorithm is to compare the goal location (the location to be predicted) with each sample location. For each sample location, the algorithm takes the average signal strength from each WiFi AP, computes a "variance" between the goal location's signal strengths and those averages, and then selects the sample location with the smallest variance as the predicted location.
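The formula is

$$\mathrm{variance}(i) = \sqrt{\left(\mathit{goal}_{1} - \mathit{AP1Place}(i)\right)^{2} + \left(\mathit{goal}_{2} - \mathit{AP2Place}(i)\right)^{2}}$$

where i is the location number, AP1Place(i) and AP2Place(i) are the average signal strengths from AP1 and AP2 received by the mobile device at location i, and $\mathit{goal}_1$ and $\mathit{goal}_2$ are the average signal strengths from AP1 and AP2 measured at the goal location. Using the figure above as an example, the goal location is predicted as location 1.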
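The formula can be checked with a few lines of Python. This is a minimal sketch, assuming the square root is applied (the report computes the variance with math.sqrt()) and that the per-location averages are already available; the function name location_variances and the numbers in the example are hypothetical, not measured data.

```python
import math


def location_variances(goal, ap1_place, ap2_place):
    """Hypothetical helper returning one variance per sample location.

    goal      -- (average RSS from AP1, average RSS from AP2) at the goal location, in dBm
    ap1_place -- ap1_place[i] is AP1's average RSS at location i + 1
    ap2_place -- ap2_place[i] is AP2's average RSS at location i + 1
    Each value is the Euclidean distance between the goal and a location in RSS space.
    """
    goal1, goal2 = goal
    return [
        math.sqrt((goal1 - a1) ** 2 + (goal2 - a2) ** 2)
        for a1, a2 in zip(ap1_place, ap2_place)
    ]


# Made-up averages for three locations, for illustration only.
variances = location_variances((-60, -70), [-61, -50, -75], [-71, -80, -65])
print([round(v, 2) for v in variances])  # [1.41, 14.14, 15.81]
```

With these made-up numbers the smallest value belongs to location 1, so the goal would be predicted as location 1.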

Details of algorithm
Include libraries

Using pandas to load data from the CSV files.
Using sys and os to get the command-line arguments and the path of each CSV file.
Using statistics.mean() to calculate the averages.
Using math.sqrt() to calculate the variance.
Check the number of arguments and get the arguments

To avoid path errors, I combine os.getcwd() with each argument to get the absolute path of each file directory.
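The argument check and path handling described above might look like the following minimal sketch. It assumes the program expects exactly three directory arguments, as in the usage format earlier; the helper name resolve_dirs and the usage message are hypothetical.

```python
import os


def resolve_dirs(argv):
    """Hypothetical helper: check the argument count and build absolute directory paths.

    argv is expected to look like sys.argv: ["algo.py", AP1_DIR, AP2_DIR, GOAL_DIR].
    """
    if len(argv) != 4:
        raise SystemExit("usage: python algo.py AP1_DIR AP2_DIR GOAL_DIR")
    # Join each argument to the current working directory;
    # os.path.join leaves an already-absolute argument unchanged.
    return [os.path.join(os.getcwd(), path) for path in argv[1:]]
```

In the script it would be called as ap1_dir, ap2_dir, goal_dir = resolve_dirs(sys.argv). os.path.abspath() is a standard-library alternative that also normalizes components such as "..".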

Calculate average

Create the average signal strength lists for the 2 WiFi APs.

Each list contains the average signal strength collected by the mobile device at each of the 10 locations, computed with the helper function calAvg, whose parameter is a file path (filePath), shown as follows.
Because the files are sorted by name, the order of the list is 1, 10, 2, 3, ..., 9, so location 10 is moved to the end in a later step.
calAvg uses pandas to load the data from a CSV file and read the Signal strength (dBm) column into a list. Because each value in the list is a string in the format "-60 dBm", I use the replace function to delete " dBm" and then convert the value to int in order to calculate the average.
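A minimal sketch of calAvg as described above, assuming each CSV has a column named "Signal strength (dBm)" with values like "-60 dBm". Dropping blank cells before conversion is an added assumption, since the earlier filter should already have removed them.

```python
import statistics

import pandas as pd


def calAvg(filePath):
    """Average signal strength (in dBm) of one location's CSV file.

    Sketch only: assumes a "Signal strength (dBm)" column with values like "-60 dBm".
    """
    df = pd.read_csv(filePath)
    # Read the column into a list, skipping blank cells.
    values = df["Signal strength (dBm)"].dropna().tolist()
    # "-60 dBm" -> "-60" -> -60
    dbm = [int(str(v).replace(" dBm", "")) for v in values]
    return statistics.mean(dbm)
```

One AP's list could then be built as [calAvg(os.path.join(ap1_dir, name)) for name in sorted(os.listdir(ap1_dir))]. Note that statistics.mean() raises StatisticsError if a file has no usable rows.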

Adjust the order so that location 10 comes last, and show the two lists (avg) in the output.
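Lexicographic sorting of file names explains the 1, 10, 2, ..., 9 order. The sketch below assumes the files are named 1.csv to 10.csv (hypothetical; the report does not give the names). It shows the reordering described above and, as an alternative design, a numeric sort key that produces location order directly.

```python
def move_location_10_last(avgs):
    """Reorder a list in file-name order (1, 10, 2, ..., 9) into 1, 2, ..., 10."""
    # Index 1 holds location 10; take it out and append it at the end.
    return [avgs[0]] + avgs[2:] + [avgs[1]]


def location_number(file_name):
    """Alternative: numeric sort key for hypothetical names like "10.csv"."""
    return int(file_name.split(".")[0])


names = sorted(f"{i}.csv" for i in range(1, 11))
print(names[:3])  # ['1.csv', '10.csv', '2.csv']
print(move_location_10_last(names) == sorted(names, key=location_number))  # True
```

Sorting with key=location_number when listing the directory would make the separate adjustment step unnecessary.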

Finally, calculate the variance for each location and select the smallest one; its corresponding location is the predicted location.
Use diff to store the smallest variance, with an initial value of -1.
Use a for loop to show each variance in the output and find the smallest variance.
Use num to store the predicted location number. Note that the list index is 1 less than the location number, so add 1 at the end.
Finally, show the predicted location in the output.
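The selection steps above can be sketched as a short loop. This is an illustration of the described approach, not the report's actual code; the function name predict_location and the printed format are hypothetical. The -1 sentinel is safe because a variance computed with a square root is never negative.

```python
def predict_location(variances):
    """Return (predicted location number, smallest variance), following the steps above."""
    diff = -1  # smallest variance so far; -1 means "none found yet"
    num = -1   # list index of the smallest variance
    for index, value in enumerate(variances):
        print(f"location {index + 1}: variance {value:.2f}")
        if diff == -1 or value < diff:
            diff = value
            num = index
    # List indices start at 0 but location numbers start at 1.
    return num + 1, diff


location, smallest = predict_location([1.41, 14.14, 15.81])
print(f"predicted location: {location}")  # predicted location: 1
```

The built-in min(range(len(variances)), key=variances.__getitem__) returns the same index without a sentinel; both keep the first location when two variances tie.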

Script
python algo.py data/9squereData/csv/AP1sample data/9squereData/csv/AP3sample data/9squereData/csv/real

Arguments:
AP1 sample files directory path: data/9squereData/csv/AP1sample
AP3 sample files directory path: data/9squereData/csv/AP3sample
Goal (to be predicted) files directory path, which contains the two AP files: data/9squereData/csv/real

Conclusion
This WiFi fingerprinting project uses a simple technique to collect and analyze WiFi data from different APs in an indoor environment, and there is plenty of room for improvement. Machine learning techniques could be used to train on the samples and improve the accuracy of location prediction. Also, rather than collecting data manually with Wireshark and Microsoft Network Monitor at a finite number of locations, a machine learning model could estimate the signal at every location in the indoor environment from several samples and their distances, with accuracy down to the centimeter.
