Unveiling Network Usage Profiles of a Nationwide Education Service Provider


Germán Capdehourat, Cecilia Aguerrebere, Federica Bascans, Germán Álvarez and Pedro Porteiro
{gcapdehourat, caguerrebere, fbascans, galvarez, pporteiro}@ceibal.edu.uy
Plan Ceibal, Avda. Italia 6201, Edificio Los Ceibos, 11500, Montevideo, Uruguay.

Abstract—The present study addresses network usage profiling in a novel educational context: an Education Service Provider (ESP), which, among other tasks, provides Internet access at schools. For this purpose, a multi-stage clustering method is proposed and applied to data from the ESP in charge of the nationwide one-to-one K-12 computing program in Uruguay. The selected features and distance measures are validated by studying their discriminative power. The resulting profiles are analyzed, including their temporal evolution, showing a stable behavior throughout the school year. The applicability of these results to evidence-based decision making at an ESP is twofold: on the one hand from a network operator perspective, and on the other hand as vital information for learning analytics purposes.

Index Terms—network traffic, education, clustering, usage profiles.

I. INTRODUCTION

Traditionally, traffic analysis and demand characterization have been mostly focused on network planning, where the main goal is to guarantee the minimum resources needed to provide a good service [1]. Additionally, knowing the typical traffic dynamics enables the detection of anomalies, which may alert about network failures or traffic attacks [2]. The increasing data availability, jointly with recent advances in machine learning and big data analysis, has laid the groundwork for even more sophisticated studies (e.g. urban mobility analysis [3]).

These studies are typically carried out with data from Internet service providers (ISPs) or mobile network operators (MNOs). Although many articles refer to educational settings such as university campuses, studies in K-12¹ scenarios are very rare. In this paper, we analyze traffic data from Plan Ceibal [4], a major K-12 education service provider² (ESP) which leads a nationwide one-to-one computing program in Uruguay. It provides technological support to the national K-12 education system, including Wi-Fi connectivity and videoconference infrastructure for all public schools, as well as access to educational platforms (e.g. a digital library, a learning management system and an intelligent tutoring system for math).

To the best of our knowledge, this work stands out from previous ones as the first of its kind in a K-12 scenario. We present a general methodology to identify the different network usage profiles, based on the individuals' traffic dynamics. For this purpose, specific features were defined and validated, leading to a multi-stage clustering method to determine the main user profiles. In addition, the evolution of the users' behavior throughout the school year was studied by comparing the clustering results obtained at the beginning and at the end of the year. The results indicate that the four main user profiles identified were the same at both time periods. This is a remarkable result, as it enables characterizing the user profiles from data collected at the beginning of the school year and then tracking the individuals' evolution during the rest of the year. The insights found are useful from a network operator perspective, but also provide very valuable information for the ESP management and for learning analytics purposes.

¹ Term coined to refer to primary and secondary education.
² Education service provider (ESP): Organization which helps the education system to implement comprehensive reforms.

II. PREVIOUS WORK

Typical works found in the literature perform a descriptive analysis, providing a deep view of the network operation through data exploration and several empirical distributions from diverse measurements [5]. However, there are some works particularly focused on finding user behavior patterns. For example, Cerquitelli et al. [6] look for users with a similar Internet access performance, while Mirylenka et al. [7] look for similar temporal activity patterns, both using data from residential ISP clients. The two works are at opposite ends concerning the selected features, from a pure traffic histogram in the first case (where time does not matter at all to describe each user), to pure time series in the second case (where users must have similar activity at the same time to be grouped together).

Mucelli et al. [8] prefer an intermediate solution to analyze a large dataset from a major MNO. A first clustering stage is based on traffic volume features, finding three profiles: light, medium and heavy users. Then, a second clustering is applied, now based on the session frequency features, resulting in two subcategories: occasional and frequent users. We follow a similar approach, combining both temporal patterns and traffic volumes to represent users' behavior. However, instead of making a separate sequential clustering, we integrate them both in a single feature vector. This way, we do not impose a fixed structure of traffic level clusters with session frequency sub-clusters, but explore instead the data structure provided by the combined traffic-frequency features. Furthermore, the selected features are computed for different times of the day, to account for the particular education context, where different shifts exist that significantly affect the users' behavior.
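
To make the idea of a single combined feature vector more concrete, the following is a minimal sketch, not the method used in this work. It assumes a hypothetical per-user session log of (hour of day, megabytes) pairs, two illustrative shifts (morning and afternoon) with made-up hour boundaries, log-scaled traffic volume and a plain session count as features, and k-means with Euclidean distance as a stand-in clustering step. None of these choices is taken from the text above.

```python
import math

# Hypothetical shift boundaries (hours of the day); illustrative only.
SHIFTS = {"morning": range(8, 12), "afternoon": range(13, 17)}

def feature_vector(sessions):
    """Concatenate per-shift traffic volume and session count into one vector."""
    vec = []
    for hours in SHIFTS.values():
        in_shift = [mb for hour, mb in sessions if hour in hours]
        vec.append(math.log1p(sum(in_shift)))  # traffic volume, log-scaled (assumption)
        vec.append(float(len(in_shift)))       # session frequency
    return vec

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point farthest from all centroids chosen so far.
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    def assign():
        return [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign()

# Synthetic users (fabricated toy data, for illustration only).
users = {
    "u1": [(9, 120.0), (10, 80.0), (11, 60.0), (14, 5.0)],
    "u2": [(8, 100.0), (9, 90.0), (10, 70.0)],
    "u3": [(9, 1.0), (13, 5.0), (14, 4.0), (15, 6.0), (16, 3.0)],
    "u4": [(13, 5.0), (14, 5.0), (15, 4.0), (16, 2.0)],
}
vectors = [feature_vector(s) for s in users.values()]
labels = kmeans(vectors, k=2)
print(dict(zip(users, labels)))
```

In this toy example, the two users with heavy morning traffic end up in one cluster and the two users with light but frequent afternoon sessions in the other. Volume and frequency are clustered jointly, so no hierarchy of volume clusters with frequency sub-clusters is imposed.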