WiFi Fingerprinting
Background
Data Collection
To meet the need for multiple free WiFi APs, I chose a shopping center as the place to
collect samples for this project. The shopping center is called 9-square, and I use
this name in the rest of the report. 9-square has only 3 free WiFi networks, which I selected
for data collection:
AP1: NaiXue Milktea
AP2: 9-square public Wifi
AP3: Watsons
A problem appeared during collection. 9-square public Wifi has large coverage and a strong
signal, while the coverage of the other 2 networks (Watsons and NaiXue Milktea) is
relatively small. As a result, at all 10 locations where I collected data, the RSS of
9-square public Wifi was -1 dBm or greater, which wireshark cannot display. (I use
Windows 10, and Microsoft Network Monitor cannot record the RSS when the signal is
greater than -1 dBm; the RSS field appears blank in wireshark.) I therefore had to discard
the data collected for AP2 (9-square public Wifi), so the dataset for my project contains
data from only 2 WiFi APs (AP1 NaiXue Milktea and AP3 Watsons).
The following graph shows the layout of the 9-square shopping center and the 10
locations that I selected for my project.
As the graph shows, I selected 10 locations (7 on level 1 and 3 on level 2) to collect
data from the 2 APs. At each location, I connected to the 2 APs in turn, loaded some web
pages in a browser (Google Chrome) so that Microsoft Network Monitor could capture more
packets, and stayed there for about 1 minute.
Data Process
Because I use Windows 10, I capture packets with Microsoft Network Monitor and then
analyze the capture files in wireshark. To represent the RSS of each WiFi AP, I use the
signal strength field as the distinguishing measure.
The original capture files may contain thousands of rows, some of which are useless
because they have no value in the signal strength column. To filter out such rows, I apply
the filter wlan_radio.signal_dbm >= -100 and keep
No, Time, Source, Destination, Protocol, Length, RSSI, Signal Strength and Info as the
remaining columns (attributes), which help identify the packet and its RSS.
If a file contains too many rows, I use time as a filter to select only part of the
rows as data. The reason is that my algorithm loads the csv and calculates the average
signal strength to represent each location, and too many rows would slow the algorithm
down. The sample csv files are attached.
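The averaging step described above can be sketched in Python roughly as follows. This is a minimal sketch, not the project's actual code: the function names are hypothetical, and it assumes the exported csv has a "Signal Strength" column whose cells look like "-60 dBm" or "-60". Blank cells are skipped, mirroring the filtering done in wireshark.

```python
import csv


def parse_dbm(text):
    """Return the reading as a float, or None when the cell is blank or malformed."""
    cleaned = text.strip().replace("dBm", "").strip()
    if not cleaned:
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None


def average_signal_strength(csv_path, column="Signal Strength"):
    """Average the usable readings in one csv file (assumed column name)."""
    readings = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Missing or empty cells become "" and are skipped by parse_dbm.
            value = parse_dbm(row.get(column) or "")
            if value is not None:
                readings.append(value)
    if not readings:
        raise ValueError(f"no usable signal strength readings in {csv_path}")
    return sum(readings) / len(readings)
```

Calling average_signal_strength on one location's file would give the single number that represents that location for one AP.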
Algorithm
The script prints several lines that report the result, as the following graph shows.
Design of algorithm
The basic idea of the algorithm is to compare the goal location's signal strength,
which is to be matched to one of the sample locations, with the average signal strength
of each sample location for each WiFi AP. For every sample location the algorithm
calculates a variance, a sum of squared differences, and the sample location with the
smallest variance is selected as the predicted location.
The formula is
\[ \mathit{variance} = (\mathit{goal} - \mathit{AP1Place}(i))^2 + (\mathit{goal} - \mathit{AP2Place}(i))^2 \]
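Here i is the location number, and AP1Place(i) is the average signal strength received
by the mobile device from AP1 at Place i; AP2Place(i) is the same for the second AP.
In each term, goal is the signal strength measured at the goal location for that term's
AP. In the example shown in the graph above, the goal location is predicted as location 1.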
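As a small worked example of the formula, the sketch below computes the variance for one sample location. The function name and all dBm values are hypothetical and are not taken from the project's data. The quantity the report calls variance is a sum of squared differences, so the same function works for any number of APs.

```python
def fingerprint_variance(goal, place):
    """Sum of squared differences between goal and place averages, one value per AP."""
    if len(goal) != len(place):
        raise ValueError("goal and place must list the same APs")
    return sum((g - p) ** 2 for g, p in zip(goal, place))


# Hypothetical averages in dBm, ordered as (first AP, second AP).
goal = (-60.0, -70.0)
place_1 = (-58.0, -73.0)

# (-60 - -58)^2 + (-70 - -73)^2 = 4 + 9
print(fingerprint_variance(goal, place_1))  # 13.0
```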
Details of algorithm
Include libraries
To avoid path errors, I combine os.getcwd() with the command-line arguments to build the
absolute path of each files directory.
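One way to build such paths is sketched below. The helper names are hypothetical and the original script may do this differently; the sketch only shows how os.getcwd() and a relative argument combine into an absolute path.

```python
import os


def resolve_directory(argument, base=None):
    """Turn a (possibly relative) directory argument into an absolute path."""
    # Default to the directory the script was started from.
    base = os.getcwd() if base is None else base
    # os.path.join keeps `argument` unchanged when it is already absolute.
    return os.path.abspath(os.path.join(base, argument))


def resolve_directories(arguments, base=None):
    """Resolve every directory argument in order."""
    return [resolve_directory(argument, base) for argument in arguments]
```

With the three directory arguments of the script, a call such as resolve_directories(sys.argv[1:4]) (after importing sys) would return the AP1 sample, AP3 sample and goal directories as absolute paths.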
Calculate average
Adjust the position of location 10 in the order and print the two lists of averages (avg).
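A likely reason location 10 needs adjusting is that file names sort as text, so a name ending in 10 lands before one ending in 2. That is an assumption, as are the file names below. Under that assumption, sorting by the number in the name keeps the locations in order:

```python
import os
import re


def location_number(file_name):
    """Extract the last run of digits in the file name (without extension)."""
    stem = os.path.splitext(os.path.basename(file_name))[0]
    digits = re.findall(r"\d+", stem)
    if not digits:
        raise ValueError(f"no location number in {file_name!r}")
    return int(digits[-1])


def sort_by_location(file_names):
    """Order file names by location number instead of by text."""
    return sorted(file_names, key=location_number)


names = ["place1.csv", "place10.csv", "place2.csv"]  # hypothetical names
print(sorted(names))            # text order puts place10 before place2
print(sort_by_location(names))  # numeric order puts place10 last
```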
Finally, calculate the variance for each location and select the smallest one, with its
corresponding location as the predicted location.
Use diff to hold the smallest variance, with an initial value of -1.
Use a for-loop to print each variance and find the smallest one.
Use num to hold the predicted location number. Note that the list index is 1 less than the
location number, so add one at the end.
Finally, print the predicted location.
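The selection steps above can be sketched as follows. This is a minimal sketch and not the report's algo.py: the function name and the sample averages are hypothetical, while diff and num follow the description in the text. The initial value -1 works as a "not set yet" marker because a sum of squares is never negative.

```python
def predict_location(goal, ap1_avg, ap2_avg):
    """Return (num, diff): the 1-based location with the smallest variance."""
    if not ap1_avg or len(ap1_avg) != len(ap2_avg):
        raise ValueError("both AP lists must be non-empty and of equal length")
    diff = -1  # smallest variance so far; -1 means nothing stored yet
    num = 0    # predicted location number (1-based)
    for index, (avg1, avg2) in enumerate(zip(ap1_avg, ap2_avg)):
        variance = (goal[0] - avg1) ** 2 + (goal[1] - avg2) ** 2
        print(f"location {index + 1}: variance = {variance}")
        if diff == -1 or variance < diff:
            diff = variance
            num = index + 1  # list index is 1 less than the location number
    return num, diff


# Hypothetical averages in dBm for three sample locations.
ap1_avg = [-58.0, -45.0, -75.0]
ap2_avg = [-73.0, -80.0, -50.0]
goal = (-60.0, -70.0)  # goal averages for (first AP, second AP)

num, diff = predict_location(goal, ap1_avg, ap2_avg)
print(f"predicted location: {num}")  # predicted location: 1
```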
Script
python algo.py data/9squereData/csv/AP1sample data/9squereData/csv/AP3sample
data/9squereData/csv/real
Arguments:
AP1 sample files directory path: data/9squereData/csv/AP1sample
AP3 sample files directory path: data/9squereData/csv/AP3sample
Goal (to be predicted) files directory path, which contains two AP files:
data/9squereData/csv/real
Conclusion
This WiFi Fingerprinting project uses a simple technique to collect and analyze
WiFi data from different APs in an indoor environment. There is a lot of room for
improvement in the future. Machine learning techniques could be used to train on the
samples and improve the accuracy of location prediction. Also, rather than collecting data
manually with wireshark and Microsoft Network Monitor at a finite number of locations,
machine learning could estimate the signal strength from several samples and their
distances, simulating all locations in the indoor environment with centimeter-level accuracy.