
Wi-Fi Usage Profiles Evolution: A First Look on a National K-12 Education Service Provider


Germán Capdehourat, Cecilia Aguerrebere, Federica Bascans, Germán Álvarez and Pedro Porteiro
{gcapdehourat, caguerrebere, fbascans, galvarez, pporteiro}@ceibal.edu.uy
Plan Ceibal, Avda. Italia 6201, Edificio Los Ceibos, 11500, Montevideo, Uruguay.

Abstract: This paper presents the first study with real-world traffic data from a national K-12 Education Service Provider (ESP). This ESP, which supports a one-to-one computing program in Uruguay, is in charge of the Wi-Fi Internet access at all K-12 schools in the country. Network users include teachers and students, where the latter typically range between 6 and 18 years old. In order to gain knowledge of user behavior in such a novel scenario, the work focuses on finding the typical network usage patterns and studying their evolution during the year. Based on the selected features and proper distance measures, four main Wi-Fi usage profiles were identified through clustering algorithms. In addition, these classes are stable throughout the year, which makes it possible to implement a Wi-Fi activity monitoring system trained with data collected during the first weeks of the year only. The results suggest that network traffic analysis could benefit the ESP for evidence-based decision making, not only from the network operator perspective, but also by providing vital information for learning analytics purposes.

Index Terms: network traffic, education, clustering, usage profiles.

I. INTRODUCTION

The increasing availability of data, together with recent advances in machine learning and big data analysis, has laid the groundwork for several studies with datasets from real-world Wi-Fi networks (e.g. [1], [2], [3], [4]). These studies are typically carried out with data from Internet service providers (ISPs) or mobile network operators (MNOs). Although many articles refer to educational settings such as university campuses, studies in K-12¹ scenarios are very rare. In this paper, we analyze traffic data from Plan Ceibal [5], a major K-12 education service provider² (ESP), which leads a nationwide one-to-one computing program in Uruguay. This governmental agency provides technological support to the national K-12 education system, including Wi-Fi connectivity and videoconference infrastructure for all public schools, as well as access to educational platforms.

To the best of our knowledge, this work stands out from previous ones as the first of its kind in a K-12 scenario, where most of the users are between 6 and 18 years old. The main Wi-Fi usage profiles at schools were analyzed, based on unsupervised classification from the collected traffic data. For this purpose, relevant features were extracted, and appropriate distance measures were selected and validated, ending up in a clustering method to identify the main usage profiles. The methodology developed proved to be useful to compare the profiles found in the two available datasets, corresponding to the beginning and the end of the school year, respectively.

The results indicate that the main usage profiles identified were the same in both periods, with a high level of coincidence. This is a remarkable result, as it means that the categories in which the users should be classified according to their Wi-Fi network activity remain stable during the school year. Based on this result, it is possible to build a Wi-Fi activity monitoring system which can be trained with data collected during the first weeks of the year only. Once the system has learned the usage profiles, the categories remain fixed and allow monitoring the individual evolution of users during the rest of the school year (e.g. increase or decrease in their activity). With this information, the ESP could provide more personalized services to teachers and students, according to their profile. Furthermore, the usage evolution could be exploited for learning analytics purposes (e.g. to study the correlation with academic performance improvements).

¹ Term coined to refer to primary and secondary education (from 6 to 18 years old).
² Education service provider (ESP): Organization which helps the education system to implement comprehensive reforms.

II. PREVIOUS WORK

Most previous works based on data from real-world Wi-Fi networks typically present a descriptive analysis, including several empirical distributions from diverse measurements [1]. We highlight some studies particularly focused on pattern analysis, with traffic data from different networks. Cerquitelli et al. [6] analyze users with a similar Internet access performance, while Mirylenka et al. [3] look for similar temporal activity patterns, both with data from ISP residential clients. Concerning the selected features, they are at opposite ends: from a pure traffic histogram in the first case (where time does not matter at all to describe each user), to a pure time series in the second case (where users must have similar activity at the same time to be grouped together). An intermediate approach is preferred by Mucelli et al. [4], analyzing a large dataset from a major MNO. A first clustering stage is based on features extracted from traffic volumes, identifying three main profiles: light, medium and heavy users. Then, a second clustering is applied, now with features based on the session frequency, resulting in two subcategories: occasional and frequent users. We follow a similar approach, combining both temporal patterns and traffic volumes to represent users' behavior. However, instead of making a separate sequential clustering, we decided to integrate them both in a
single feature vector. Thus, we do not impose a fixed structure of main clusters based on traffic volumes and sub-clusters based on session frequency, but instead we explore the data structure provided by the combined traffic-frequency features. Furthermore, we compute separate features for different times of the day, attending to the main education schedules, which significantly affect the Wi-Fi usage in this context.

III. DESCRIPTION OF THE DATASETS

The datasets used in this study were collected from urban schools, which cover more than 95% of the K-12 students in Uruguay. All these schools are provided with an optical fiber Internet connection and Wi-Fi infrastructure. Each record in the datasets corresponds to the amount of downlink and uplink traffic per device, aggregated during 1 minute, and sampled every 15 minutes. Data were collected from 8587 APs located in 1110 schools, where 65% are primary schools and 35% are secondary schools.

In order to analyze the evolution throughout the year, two datasets were collected: one at the beginning of the school year, right after Easter holidays (April to May 2017), and the other one at the end of the school year, right after Spring break (September to November 2017), hereafter referred to as WIFI-INI and WIFI-END, respectively. The former corresponds to a 6-week period and includes records from 778,940 different devices, where 28% correspond to the ones delivered by Plan Ceibal, and the rest are due to BYOD³. The latter corresponds to an 8-week period and includes 975,441 unique devices, 21% provided by Plan Ceibal and the remaining 79% BYOD.

Figure 1 shows the daily evolution, on school days, of the total number of simultaneously connected devices (the thick line is the median). It is worth noting that, even with an important increase in the total number of unique devices (25% more in WIFI-END), the number of devices simultaneously connected to the Wi-Fi network is quite similar in both datasets. Figure 2 compares the traffic evolution for each period, where the horizontal lines indicate the median of the maximum daily values. A clear boost is observed from April-May to September-November (24% for downlink and 18% for uplink), which can be explained by an increase in the Internet access bandwidth for a large number of schools.

Fig. 1: Daily evolution of the total connected clients throughout school days for the WIFI-END dataset (Sep-Nov '17).

Fig. 2: Traffic evolution during Sep-Nov '17 (WIFI-END dataset).

³ Bring Your Own Device (BYOD): Acronym which refers to the policy of allowing students or employees to bring personally owned devices (laptops, tablets, and smartphones) to their school or workplace.

An additional dataset was collected, including traffic volumes per application (e.g., Facebook, YouTube and Plan Ceibal's educational platforms), which was not considered for the profiling stage, but only for the characterization of the resulting clusters. This data was gathered at the beginning of the school year (during the same period as WIFI-INI) from a subset of 111 schools (10% of the main dataset) and is hereafter referred to as APP-DATA. In this case the vantage point was not the AP but the server located at each school, where the NTOP tool [7] was used to collect traffic flow data per application. The measurements were gathered every 5 minutes with a time window of 5 minutes. The resulting dataset has records from 264,145 different devices, where 19% were delivered by Plan Ceibal, and the remaining ones correspond to BYOD.

IV. UNVEILING WI-FI NETWORK USAGE PROFILES

In order to identify the main usage profiles in Plan Ceibal's wireless network, a standard unsupervised classification pipeline was followed: preprocessing, feature extraction, and clustering. First, a threshold-based filter was used in order to discard the many devices with scarce activity. For this purpose, we defined minimum values for the accumulated downlink and uplink traffic, and for the number of days with network activity.

a) Features: The selected features combine temporal activity with traffic volumes. For this purpose, we used a multi-histogram feature vector, i.e., a concatenation of histograms of different types: traffic histograms and frequency histograms. All of them have the same traffic bins, which follow a quasi-logarithmic scale [6]: {0, 1kbps, 10kbps, 100kbps, 1Mbps, 10Mbps, 20Mbps, 50Mbps, 150Mbps} for downlink and {0, 0.1kbps, 1kbps, 10kbps, 100kbps, 1Mbps, 10Mbps, 20Mbps,
50Mbps} for uplink, which results in an 8-bin histogram for each traffic direction. Records which lie outside the bounds are discarded as outliers.

For the traffic histograms, each bin counts the amount of traffic registered in the corresponding traffic range, at the given time of day (morning, afternoon, evening and nighttime) on school days or anytime on weekends. For frequency histograms, each bin accumulates one for each day where the user has at least one record in the corresponding traffic range, differentiating school days and weekends. Hence, the maximum value for frequency histograms is the total number of days under consideration. Therefore, the resulting feature vector has dimension 112: 80 values for the traffic histograms (8 bins, 2 directions, 5 temporal ranges: morning, afternoon, evening and nighttime on school days and anytime on weekends) and 32 values corresponding to the frequency histograms (8 bins, 2 directions, 2 types of days: school days and weekends).

Next, we present the discrimination power analysis of the selected features, also taking into account the study of an appropriate distance metric. Then, we discuss the clustering strategy used in Section IV-B.

A. Features and Distance Measures Validation

In this section we evaluate the discrimination power of the proposed features, comparing at the same time the performance of a series of histogram distances. For this purpose, we assume that each user has a relatively constant activity pattern in the considered time period. Then, we assess the performance of various histogram distances at the verification task. That is, given a device feature vector and its identity (i.e., the unique MAC address of the device), verify whether the provided identity is correct.

a) Methodology: First, the considered dataset is split into two subsets $S_1$ and $S_2$, each one comprising a fraction of the samples available for each user. Then, using the samples in each subset, two independent feature representations are computed for each user $i$, one to be used as ground truth, $g_i$, and the other one for testing, $f_i$. Next, for each distance $d(\cdot)$ to be evaluated, $N$ users are randomly chosen and for each user we compute i) the distance between the feature representations of the same user in both subsets, $d(f_i, g_i)$, $i = 1, \dots, N$, and ii) the distance between the feature representation of each user and that of $M$ randomly chosen different users, $d(f_i, g_j)$, $j = 1, \dots, M$. If the distance between users is below a threshold, the users are assumed to be the same; otherwise they are not. Then, the performance of each evaluated distance function $d(\cdot)$ is computed as the percentage of correctly classified users. Varying the verification threshold we obtain the ROC curves shown in Figure 3.

Fig. 3: ROC curves for the evaluated histogram distances under the random (left) and sequential (right) subset division.
[Two plots of TPR versus FPR, one curve per distance: Canberra, cross-bin, L1, L2, L2-MinMax, L2-Std, chi-squared and intersection.]

b) Subset division: The division of the dataset into subsets $S_1$ and $S_2$ is performed in two ways: i) sequential, where $S_1$ includes the samples of the first half of the considered time period and $S_2$ those of the second half, and ii) random, where the temporal order is not taken into consideration and both subsets include samples corresponding to interleaved timestamps. Comparing the results under these two division strategies sheds light on the question about the temporal stationarity of users' online behavior. If users' behavior changes drastically from the time period used to build $S_1$ to the one used to build $S_2$, the expected classification performance of all the tested distance functions should be low. Otherwise, it should be close to that of the random division.

c) Histogram distances: A wide variety of histogram distance functions have been proposed in the literature [8], [9]. Most of them fall under one of two categories: bin-to-bin or cross-bin, depending on whether information from different bins is combined or not. Bin-to-bin distances are often faster and simpler to compute, while cross-bin distances tend to be more robust and to perform better, at the cost of a higher computational complexity. In this study, we evaluate a series of widely used bin-to-bin histogram distances: $L_1$, $L_2$, chi-squared, Canberra and intersection, as well as a cross-bin distance: a quadratic form distance combining each bin with its two adjacent neighbors with weights 1/2 and 1/4 respectively. For the $L_2$ distance, two normalization strategies are also evaluated: standard score and min-max scaling.

d) Results: Figure 3 shows the ROC curves for the evaluated distances for the WIFI-INI dataset restricted to Frequent-Active users only (cf. Section V). All of them perform remarkably well, considering that the users' activity profile described by the proposed features is enough to verify their identity well above 50%-50% performance. This shows the capacity of the proposed features for user discrimination. Moreover, the chi-squared distance outperforms the rest, reaching an 87.5% true positive rate with a 12.5% false positive rate under the random subset division. Similar results were obtained under the sequential subset division. Although the classification performance is slightly degraded, a high classification rate
is still achieved, suggesting users' online behavior is fairly stationary.

B. Clustering Approach

Considering that the discriminative capacity of the normalized $L_2$ distance has been validated, we propose to employ a simple, widely used algorithm: k-means. After the feature scaling normalization, the k-means algorithm is applied with different values of $k$. Then, the elbow method is used to find the optimal $k$, by analyzing the evolution of the variance explained. Finally, the stability of the clusters is studied. For this purpose, 85% of the samples are randomly chosen and the k-means algorithm is applied to partition this subset. The Jaccard index is then computed to assess the similarity between the original and the subset partition. If the original clusters are stable, we expect them to remain after randomly dropping 15% of the samples and therefore to have a high Jaccard index with respect to the original clusters. This process is repeated 100 times and clusters with an average Jaccard index above 0.9 are kept.

V. DATA ANALYSIS AND RESULTS

This section presents a summary of the analysis conducted with the ESP data presented in Section III, using the methodology described in Section IV. The first step was the preprocessing, which corresponds to the data filtering based on two thresholds: a minimum number of active days, defined as 15% of the total number of days in each dataset; and a minimum aggregated traffic volume, fixed at 1MB for both datasets. This way, as depicted in Figure 4, four different classes were defined, according to whether the users were frequent or sporadic, and whether they were active or not. It is worth noting that the percentages in each category were very similar for the data of both times of the year.

Fig. 4: Threshold-filter results for the beginning and the end of the school year: (a) WIFI-INI data (Apr-May '17); (b) WIFI-END data (Sep-Nov '17).
[Pie charts of the Frequent-Active, Frequent-Inactive, Sporadic-Active and Sporadic-Inactive classes, with the Ceibal and BYOD shares.]

From now on, we focus only on the Frequent-Active subset, which has 288,413 different devices for the WIFI-INI dataset and 359,319 for the WIFI-END dataset. In both cases the subset corresponds to 37% of the total number of unique MACs observed in the original datasets. Moreover, the resulting split between devices delivered by Ceibal and BYOD after the filter is also the same for both datasets: 32% and 68% respectively.

A. Wi-Fi Usage Profiles and Evolution During the Year

In order to study the evolution of the Wi-Fi usage profiles during the year, two different questions were addressed:
1) Are the clusters found in the WIFI-INI and WIFI-END datasets the same?
2) Do users have the same usage profile throughout the year?

To answer the first question we need to assess whether the feature space partition given by the selected clustering method is the same for both datasets. First, we found that the optimal number of stable clusters for both datasets was four. Then, we aligned the clusters of both datasets by computing the distances between the centroids and matching the closest ones. This way, we found that the defined cluster mapping corresponds to a bijective function. Moreover, the minimum distances were small, meaning that the typical usage profiles identified in both time periods are equivalent. Figure 5 shows the four clusters found in the Frequent-Active subsets. The cluster labels (LOW, MED, HIGH and HIGH+WK) are related to the activity level for each usage profile, as detailed in the characterization presented in Table I. The difference between HIGH and HIGH+WK is mainly explained by the activity on weekends. The results also show that the percentage of BYOD increases with usage intensity.

Fig. 5: Clustering results for the beginning and the end of the school year: (a) WIFI-INI data (Apr-May '17); (b) WIFI-END data (Sep-Nov '17).
[Pie charts of the LOW, MED, HIGH and HIGH+WK clusters, with the Ceibal and BYOD shares.]

This result is quite relevant, since it implies that the partition defined from the data at the beginning of the year has a high level of coincidence with the partition generated from the data at the end of the year. Thus, it is possible to model the usage profiles only once at the beginning of the school year. Then, the resulting model could be used to monitor the evolution of the users' behavior for the rest of the year, which makes it possible to detect changes in user activity and give more personalized services according to their usage profile.

The fact that the clusters found in both datasets are equivalent does not imply that the same users fall in the same clusters at both time periods, as each individual's behavior can vary during the year. Thus, attending to the second question, we focused on the devices common to the two datasets to analyze the changes in individual behavior. The intersection set has 142,688 devices, that is 49.7% of those included in the WIFI-INI dataset. This is explained by the devices' replacement carried out by Plan Ceibal during the year, as well
as those users who renew their own devices. To study the evolution of the users' behavior, we applied the k-means partition learned from the WIFI-INI dataset to the WIFI-END dataset. Then, we focused the analysis on the intersection set only, looking for the cluster mobility observed between the two time periods. Figure 6 summarizes the results, where we highlight that, for all the clusters, a large share of users remain in the same cluster, i.e., 67%, 44%, 46% and 55%, for LOW, MED, HIGH and HIGH+WK respectively. However, considering all the devices and not just the intersection set, we have found that newer devices (i.e., those included only in WIFI-END but not in WIFI-INI) represent a larger proportion in the most active clusters, while the number of devices in those clusters remained almost the same (HIGH+WK fell 2.9%, while the union of HIGH+WK and HIGH fell 0.3%). This insight suggests that renewing the devices promotes an increase in online activity, an effect that has already been observed in previous studies [?].

TABLE I: Main Wi-Fi usage profiles characterization.

            WIFI-INI dataset (Apr-May '17)                  WIFI-END dataset (Sep-Nov '17)
            Avg. # Days (%)     Accum. Down.(a) (%)         Avg. # Days (%)     Accum. Down.(a) (%)
Cluster     Sch. Days   Wknds   Morn.   After.   Even.      Sch. Days   Wknds   Morn.   After.   Even.
LOW         30          3.7     22      21       6          27          3.8     14      13       3
MED         61          6.6     42      38       12         56          6.3     34      33       8
HIGH        84          5.8     100     86       22         77          5.8     100     81       13
HIGH+WK     82          49      78      49       25         81          43      85      52       24

(a) Normalized to 100% for the maximum value observed during the corresponding period.

Fig. 6: Cluster mobility during the school year.

B. School Type, Sessions Time and Relevant Applications

This section introduces an in-depth analysis of the identified usage profiles, focused on other relevant aspects: school type distribution, user sessions and most commonly used applications. In this context, a user session is defined as the time period during which the user was connected to the Wi-Fi network without interruption. Results are presented for the WIFI-INI dataset only, as they are quite similar for the WIFI-END dataset.

a) School type: Figure 7 shows the clusters' composition according to the school type. As we can see, the trend is very clear, showing that the vast majority of the most active users in the network correspond to secondary schools. Furthermore, 73% of the devices corresponding to primary schools are in the cluster with the least network activity (LOW).

b) User sessions statistics: Three indicators were defined in order to analyze user sessions: total sessions per day, session start time and duration. For each indicator, we consider the average over the sessions of each cluster. The time granularity for the session duration is given by the sampling rate, which in this case was 15 minutes. Table II summarizes the results, where we can see that the number and duration of sessions increase for higher activity profiles. This result, combined with the characterization shown in Table I, indicates that the most frequent users with the highest traffic volumes are also those who connect to the network more often (have more sessions per day) and spend more time online (have longer sessions). Moreover, all clusters have an average start time close to noon, which is the midpoint between the two main school shifts (morning and afternoon). However, as the usage increases, the average start time is earlier. This is consistent with the results in Table I, where the bias towards the morning shift is larger for the most active clusters. This fact is also related to the school type, since secondary schools have more activity in the morning, and as we saw above, they have a much greater incidence in the clusters HIGH and HIGH+WK. On the contrary, in primary schools, which have a greater incidence in the clusters LOW and MED, the activity is higher in the afternoon.

Fig. 7: School type distribution.
[Bar chart of the number of devices per cluster, split by school type (primary / secondary): HIGH+WK 4.9% / 95.1%, HIGH 8.8% / 91.2%, MED 20.9% / 79.1%, LOW 46.6% / 53.4%.]

c) Most popular applications: Which are the most popular applications per cluster? In order to answer this question, we use the APP-DATA dataset to compute two indicators: the average number of days (as a percentage of the total days) that each application was used, and the percentage of users in each cluster that have used it. It is worth recalling that the schools considered in the APP-DATA dataset are a subsample of those in the WIFI-INI dataset, which covers more than 30% of the users. Figure 8 summarizes the results for the top 5 applications. We observe that Google, Facebook, and WhatsApp are the most popular applications in terms of average days of usage for all clusters, and that the average days of usage is larger for higher activity profiles. Regarding the percentage of users in the cluster that use the app, Google and Facebook are almost equally (highly) used among clusters, whereas Instagram and WhatsApp have a much larger penetration in the most active clusters (HIGH and HIGH+WK), which clearly
relates to the larger incidence of secondary school students and BYOD in these clusters.

Fig. 8: Popular applications usage for each cluster.
[Two radar plots, "% Days" and "% Users", over YouTube, WhatsApp, Google, Facebook and Instagram, with one curve per cluster: LOW, MED, HIGH and HIGH+WK.]

C. Digging Deeper in the Usage Profiles

In order to deepen the analysis of each usage profile, the same method based on k-means was applied on each of the first-level clusters. In this case, the number of sub-clusters found was 13, again the same for both datasets. For each cluster, 3 sub-clusters were identified, with the exception of HIGH+WK, for which there were 4. The novelty observed in these fine-grained usage profiles was that most of them are associated with the main school shifts (i.e. morning or afternoon), which did not happen for any of the first-level clusters. Moreover, comparing both periods the result was also different, since the sub-cluster alignment was not perfect as it was before. Some clusters were split in a different way, such as HIGH. For this cluster, in the WIFI-INI dataset 2 sub-clusters were identified associated with the morning shift and 1 with the afternoon shift, while for the WIFI-END dataset the cluster was divided into 2 sub-clusters corresponding to the afternoon shift and only 1 to the morning shift. Considering all subclasses, 5 out of the 13 sub-clusters match between both time periods (i.e. their centroids are close to each other). This means that some of those fine-grained usage profiles are also stable during the year.

VI. CONCLUSIONS

The present work tackles the analysis of Wi-Fi usage profiles in a novel context: a national K-12 education service provider. For this purpose, relevant features and distance measures were selected and validated, studying their discrimination power among different Wi-Fi connectivity usage patterns. Then, the main usage profiles at schools were identified, comparing the results for the beginning and the end of the school year. The evolution showed a quite stable behavior throughout the year. An in-depth analysis of the resulting clusters was presented, with different insights about the users' activity in the network. The study shows how ESPs could take advantage of network data in order to obtain valuable information for evidence-based decision making. This holds not only from a network operator perspective, but also for learning analytics purposes, which in the end is the final goal of an ESP. For future work, different features could be explored, as well as introducing weights for the different components, as a way to improve the discrimination power. Another relevant aspect for the ESP will be to analyze the correlation between the Wi-Fi usage profiles and the users' activity in the educational platforms.

ETHICAL STATEMENT

The datasets used in this study were de-identified and handled according to the Uruguayan privacy protection legislation.

REFERENCES

[1] S. Biswas, J. Bicket, E. Wong, R. Musaloiu-E, A. Bhartia, and D. Aguayo, “Large-scale measurements of wireless network behavior,” in Proceedings of ACM SIGCOMM 2015, pp. 153–165.
[2] A. Frömmgen, J. Heuschkel, P. Jahnke, F. Cuozzo, I. Schweizer, P. Eugster, M. Mühlhäuser, and A. Buchmann, “Crowdsourcing measurements of mobile network performance and mobility during a large scale event,” in Proceedings of PAM 2016, pp. 70–82.
[3] K. Mirylenka, V. Christophides, T. Palpanas, I. Pefkianakis, and M. May, “Characterizing Home Device Usage From Wireless Traffic Time Series,” in Proceedings of EDBT 2016, Bordeaux, France, 2016.
[4] E. M. R. Oliveira, A. C. Viana, K. Naveen, and C. Sarraute, “Mobile data traffic modeling: Revealing temporal facets,” Computer Networks, vol. 112, no. Supplement C, pp. 176–193, 2017.
[5] Plan Ceibal, “One-to-one program implementation in Uruguay.” 2007-2018. [Online]. Available: https://www.ceibal.edu.uy/en/institucional
[6] T. Cerquitelli, A. Servetti, and E. Masala, “Discovering users with similar internet access performance through cluster analysis,” Expert. Syst. Appl., vol. 64, no. Supplement C, pp. 536–548, 2016.
[7] ntop, “High-Speed Web-based Traffic Analysis and Flow Collection,” accessed: 2018-01-26. [Online]. Available: https://www.ntop.org/products/traffic-analysis/ntop/
[8] S.-H. Cha and S. N. Srihari, “On measuring the distance between histograms,” Pattern Recognition, vol. 35, no. 6, pp. 1355–1370, 2002.
[9] P. A. Marín-Reyes, J. Lorenzo-Navarro, and M. Castrillón-Santana, “Comparative study of histogram distance measures for re-identification,” arXiv preprint arXiv:1611.08134, 2016.
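As an illustration of the centroid alignment described in Section V-A (matching each cluster of one dataset to the closest centroid of the other and checking that the mapping is bijective), the following Python sketch may help. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names match_centroids and is_bijective, the toy 2-D centroids, and the use of the Euclidean distance are all hypothetical choices made for the example; in the study the centroids would be the k-means centers of the 112-dimensional feature vectors.

```python
import math


def match_centroids(centroids_a, centroids_b):
    """Map each centroid in A to the index of its nearest centroid in B.

    Returns the mapping (list of indices into B) and the matched distances.
    Euclidean distance is an assumption made for this illustration.
    """
    mapping, distances = [], []
    for a in centroids_a:
        d = [math.dist(a, b) for b in centroids_b]
        nearest = min(range(len(d)), key=d.__getitem__)
        mapping.append(nearest)
        distances.append(d[nearest])
    return mapping, distances


def is_bijective(mapping, n_targets):
    """True if every target index is hit exactly once by the mapping."""
    return len(mapping) == n_targets and set(mapping) == set(range(n_targets))


# Hypothetical toy centroids standing in for the WIFI-INI and WIFI-END clusters.
ini = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
end = [(1.1, 0.9), (0.1, 0.0), (0.0, 1.1), (0.9, 0.1)]

mapping, distances = match_centroids(ini, end)
print("mapping:", mapping)
print("bijective:", is_bijective(mapping, len(end)))
print("largest matched distance:", round(max(distances), 3))
```

A bijective mapping with small matched distances corresponds to the situation reported for the four first-level clusters. When two clusters of one period map to the same centroid of the other, the check fails, which is the kind of imperfect alignment reported for the sub-clusters in Section V-C.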
