
2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)

Active Learning and Machine Teaching for Online Learning: A Study of Attention and Labelling Cost


Agnes Tegen∗, Paul Davidsson, Jan A. Persson
Internet of Things and People Research Center
Malmö University, Malmö, Sweden
∗agnes.tegen@mau.se

Abstract—Interactive Machine Learning (ML) has the potential to lower the manual labelling effort needed, as well as increase classification performance, by incorporating a human-in-the-loop component. However, the assumptions made regarding the interactive behaviour of the human in experiments are often not realistic. Active learning typically treats the human as a passive, but always correct, participant. Machine teaching provides a more proactive role for the human, but generally assumes that the human is constantly monitoring the learning process. In this paper, we present an interactive online framework and perform experiments to compare active learning, machine teaching and combined approaches. We study not only the classification performance, but also the effort (to label samples) and attention (to monitor the ML system) required of the human. Results from experiments show that a combined approach generally performs better with less effort compared to active learning and machine teaching. With regards to attention, the best performing strategy varied depending on the problem setup.

Index Terms—interactive learning, active learning, machine teaching, online learning

I. INTRODUCTION

Annotating data is in general a costly process. With data streaming in a single-pass manner [1], even more issues arise, as the order in which the data samples arrive cannot be controlled. Moreover, only the current data sample can be annotated, i.e. it is not possible to access samples from the past nor to have knowledge of which samples will appear in the future. Applications with this type of data are, for example, real-time activity recognition in healthcare and real-time security supervision of systems. Interactive Machine Learning (ML), including active learning [2] and machine teaching [3], [4], aims to lower the amount of labelled data needed in ML while still retaining a high performance. However, the assumptions made in interactive ML are often unrealistic. Generally, the human annotator is considered a passive oracle that always replies when queried in active learning, while in machine teaching it is assumed that the annotator will constantly monitor the system and data, while the system itself is passive. It is less common to consider systems and annotators that work jointly [5], even though this is a reasonable assumption in many application areas. Furthermore, the cost associated with the labelling budget in interactive ML is often only measured in the number of labelled samples, while the cost of the workload of the annotator, e.g. for monitoring the system, is ignored.

We present a framework for interactive online ML where the efforts of the human annotator are divided into providing labels and attention. The framework highlights the parts of the labelling process where the system can be proactive and the parts where the annotator can be proactive, respectively. We also perform experiments using different interactive ML strategies in an online cold start scenario, where we study the connections between the labelling efforts for the annotator and the classification performance.

This research was partially funded by the Knowledge Foundation.

II. RELATED WORK

While less common than pool-based settings [2], there exists a reasonable amount of work studying active learning in online ML settings. For example, Krawczyk introduces an active ensemble learning algorithm for online activity recognition [6]. Žliobaitė et al. study different active learning strategies' ability to handle concept drift [7] in data streams [8]. Mohamad et al. propose a stream-based active learning algorithm that addresses both concept drift and concept evolution [9].

Except for certain applications and areas, such as agents learning from humans through reinforcement learning [10], [11] and adversarial attacks [12], [13], interactive online ML with a human in the loop is not well studied. Reinforcement learning and interactive online ML have similarities, but where the former aims to find the input that will maximize the reward function, the latter tries to reconstruct the function that will provide the correct output, given the input [14]. Machine teaching and adversarial learning both aim to maximally influence the learner through careful design of the training set [15]. In the former, however, the teacher is benevolent, while the equivalent in the latter, the attacker, is malicious. Liu et al. present an iterative machine teaching framework and three teacher models for gradient learners [16]. Chen et al. study the teaching of version space learners in an interactive setting [17]. Amir et al. investigate when a reinforcement learning framework can benefit from interactive teaching strategies [5]. Strategies where the teacher and learner work jointly are presented and evaluated in experiments in a game setting. A related area is learning from demonstration, where the human teacher shows or directs the learner rather than programming it [18], [19].

The workload needed from the annotator for monitoring, or for being interrupted by queries, is typically not addressed in previous work.
If an annotator is performing a task, but is sometimes interrupted with queries from an active learning system, the workload for the annotator increases [20], even if they do not respond. In the case of machine teaching, the annotator needs to monitor the system and decide when and where to put attention and effort [21]. The cost of interruptions and monitoring, as well as the cost of annotation, are generally all represented by the number of labelled samples. This is not always representative, however, as the interactive process between the annotator and the system is complex [22], [23], meaning that the modelling of the workload might be more complex than typically assumed, as well as dependent on the application area [24], [25].

III. INTERACTIVE ONLINE LEARNING FRAMEWORK

We present a framework (Algorithm 1) that refines the concept of labelling budget in interactive online learning by assigning a cost to the attention demanded from the annotator through interruptions and monitoring, as well as to the cost of providing labels. The framework provides an environment where different setups of query and labelling costs can be explored. This is relevant because of the varying costs for different applications and scenarios. The active learning component, highlighted in pink, contains the steps the system takes before involving the annotator, and the machine teaching component, highlighted in blue, contains the steps where the annotator is included.

Algorithm 1 Interactive Online Learning Framework
Input: data stream X, labelling budget B, query cost qc, labelling cost function flc, classifier Ψ, active learning strategy strategyAL(parameters), machine teaching strategy strategyMT(parameters)

    initialize labelling expenses b̂
    repeat
        x ← read next data sample from X
        ŷ ← Ψ(x)
        query ← false
        estLc ← estimate annotation cost
        if (b̂ + qc + estLc) < B then
            query ← strategyAL(parameters)
        end if
        if query then
            query the user to provide label y for x
            update labelling expenses b̂ with qc
            providesLabel ← false
            calculate annotation cost, lc ← flc(x)
            if (b̂ + lc) < B then
                providesLabel ← strategyMT(parameters)
            end if
            if providesLabel then
                user provides label y for x
                update classifier Ψ with (x, y)
            end if
        end if
        update labelling expenses b̂
    until end of data stream X
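
To make the control flow concrete, a minimal Python sketch of the loop in Algorithm 1 is given below. It is not the published implementation: the names (strategy_al, strategy_mt, expenses, clf.update and so on) are our own, the stream is assumed to yield (sample, true label) pairs so that the annotator can be simulated, and expenses stands for the sliding-window labelling expenses b̂ described further below.

    def interactive_online_learning(stream, B, qc, f_lc, clf,
                                    strategy_al, strategy_mt, expenses):
        # stream yields (x, y_true); y_true stands in for the simulated annotator.
        for x, y_true in stream:
            y_hat = clf.predict(x)                    # estimation of the current status
            query = False
            est_lc = expenses.estimate_label_cost()   # estLc: true cost not yet known
            if expenses.total() + qc + est_lc < B:    # step 1: is a label affordable?
                query = strategy_al(clf, x, y_hat)    # step 2: active learning criterion
            if query:
                expenses.add(qc)                      # step 3: the query always costs qc
                lc = f_lc(x)
                provides_label = False
                if expenses.total() + lc < B:
                    provides_label = strategy_mt(y_hat, y_true)   # step 4: machine teaching
                if provides_label:
                    clf.update(x, y_true)             # train on the provided label
                    expenses.add(lc)
            expenses.advance()                        # slide the expense window one sample
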
Fig. 1 contains a flowchart of the process, where the connection between environment, annotator and learning system is displayed. The environment is the context of the problem that is studied, where the problem is to continuously produce estimations of the current status of the environment. For instance, if the problem at hand is to do activity recognition in real-time via a setup of sensors, the environment contains the information regarding the current status of the environment available to the human (via observations) and to the system (via collecting data with the sensors). When a new data sample reaches the learning system, the machine learning algorithm produces an estimation of the status. The estimated status is provided to the active learning component, which decides whether the human should be queried for the true label. It is also displayed to the human, who can use it to check the machine teaching component. In online learning there is no control of the order of data samples and a decision has to be made in the moment of whether the system should query. Furthermore, it is not known how much data will be streamed and, in theory, it could be infinite. To tackle these issues and keep track of how much labelling budget currently is used up, a sliding window containing the labelling status of the latest data samples, denoted the labelling expenses b̂, is used.
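
The paper does not spell out the window mechanics, so the following is one assumed realisation: the cost charged for each of the most recent samples is kept in a fixed-length window, b̂ is their sum, and the budget B is expressed in the same units (for a 5% budget and a window of 1000 samples, B would be 0.05 * 1000 = 50). The window length and the default cost estimate are illustrative values, not values given in the paper.

    from collections import deque

    class SlidingExpenses:
        """Sliding-window labelling expenses b-hat (window length is an assumption)."""

        def __init__(self, window_len=1000, default_label_cost=0.9):
            self.costs = deque([0.0] * window_len, maxlen=window_len)  # cost per recent sample
            self.current = 0.0                   # cost charged for the sample being processed
            self.default_label_cost = default_label_cost

        def add(self, cost):
            # Called with qc when the annotator is queried and with lc when a label is given.
            self.current += cost

        def advance(self):
            # Move the window one sample forward; the oldest sample's cost falls out.
            self.costs.append(self.current)
            self.current = 0.0

        def total(self):
            # b-hat: total cost charged over the most recent samples.
            return sum(self.costs) + self.current

        def estimate_label_cost(self):
            # estLc: a fixed estimate, since the true labelling cost is not yet known.
            return self.default_label_cost
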
Four steps need to be fulfilled before a label is provided, where the first two belong to the active learning component and the final two belong to the machine teaching component. The first step is to estimate whether a new label can be afforded. The cost for the system to demand attention from the annotator, either through interruption or monitoring, is represented by the query cost qc. Apart from this cost, there is also the cost for the annotator to provide the label. The average total cost for a label, including both query and labelling, is normalized to 1. Since the labelling cost may vary and the cost for the current data instance is unknown by the system at this step, an estimation, estLc, is used. If the current labelling expenses b̂ together with the query cost qc and the estimated labelling cost estLc are lower than the labelling budget B, it is estimated that a label can be afforded. In the second step, the criterion of the given active learning strategy is tested for the data instance. In uncertainty sampling, for instance, a measurement of uncertainty is calculated for the current data instance. If the classifier is considered uncertain with regards to its own estimation, the active learning criterion is fulfilled and the system will query for a label. At step three the system has taken the decision to query the annotator. This incurs a cost regardless of whether a label is provided, and the labelling expenses b̂ are therefore updated with the query cost qc. The actual cost of providing a label can now be calculated. The updated labelling expenses b̂ together with the true labelling cost lc are compared to the labelling budget B and, if labelling can be afforded, the next step is pursued. In the fourth and final step, the criterion defined by the given machine teaching strategy is tested. This is the strategy the annotator employs to decide whether to provide a label or not. If all steps are passed, a label will be provided for the given data sample and b̂ is updated with the labelling cost lc.
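
As concrete examples of the criteria in steps two and four, the sketch below shows an entropy-based uncertainty criterion and an error-driven teaching criterion, in the spirit of the Entropy and Error strategies used later in Section IV. The fixed entropy threshold and the assumption that the classifier exposes predict_proba (as scikit-learn classifiers do) are our choices.

    import numpy as np

    def entropy_uncertainty(clf, x, y_hat, threshold=1.0):
        # Active learning criterion (step 2): query when the class posterior is uncertain.
        # x is assumed to be a 1-D feature vector; the threshold is an illustrative value.
        proba = clf.predict_proba(x.reshape(1, -1))[0]
        entropy = -np.sum(proba * np.log2(proba + 1e-12))
        return entropy > threshold

    def error_teaching(y_hat, y_true):
        # Machine teaching criterion (step 4): label only samples the system got wrong.
        return y_hat != y_true
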

Fig. 1: Flowchart illustrating the process of the interactive online learning framework.

IV. EXPERIMENTS

In this section the framework is used to illustrate examples of the trade-off between attention and labelling effort from the annotator, as well as classification performance, for different interactive learning strategies. Further details on the experiments can be found in the source code.¹

¹ https://github.com/ategen/iml-attention-framework

A. Experimental setup

Two different levels of query cost and labelling cost were used in the experiments to represent two different kinds of cases. In the first, the query cost qc = 0.1 and the labelling cost lc = 0.9, i.e. the cost is relatively low for interrupting the annotator, but if a label is provided a relatively larger effort is needed. An example of this can be a setting where context switching from other activities is comparably low-cost for the annotator, while the labelling process is time consuming and thereby costly. In the second scenario qc = 0.9 and lc = 0.1, i.e. the cost for interrupting the annotator is relatively high compared to the cost of labelling. An example of this can be if the annotator is occupied with another task and the interruption of a query increases the workload for the annotator. When they have already been interrupted, however, the extra work of providing a label is relatively low. The relation between qc and lc is application dependent and should be carefully determined for the application at hand.

TABLE I: A summary of the datasets used in the experiments.

    Name      Features  Classes  Samples  Runs
    mHealth         23       11   343195    10
    HAR             24        7    14301   100

In the experiments we employ a cold start setting, which means the machine learning algorithm is not trained on any data before the first sample in the data stream arrives. This gives insight into how different strategies differ when data is scarce and what the learning curve looks like. The two datasets used in the experiments, mHealth [26], [27] and HAR [28], contain recorded data streams and are summarized in Table I. Certain steps of preprocessing were done before they were used in the experiments. The mHealth dataset contains 10 separate recordings of subjects performing exercises in a sequence, one activity after the other. Since the data is streamed in a single-pass manner and in a cold start scenario, learning from the raw data would never give the algorithm a chance to evaluate properly. Instead, each activity in a recording is divided into 10 intervals, where the order of the samples is kept. The order among the activities is also kept, so that the first segment of the rearranged recording contains one interval from each activity, and this is then repeated for each interval. The HAR dataset contains recordings from 9 subjects of an accelerometer and a gyroscope from phones. The raw data was sampled at 1 Hz and the mean value and variance were calculated. For each run, the recordings from the subjects were concatenated one after the other to create a longer sequence. To evaluate the experiments, a test set of 5% was extracted from the data before training started. Every time a new annotation was added, the updated model was run on the test set and the macro-averaged F1-score was calculated. A Naïve Bayes classifier was chosen as the machine learning algorithm, as it needs a relatively low number of samples to be used and has low computational complexity (i.e. it is suitable for a cold start setting and real-time data streams) [6].
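
A possible realisation of this evaluation protocol with scikit-learn is sketched below. Reading "Naïve Bayes" as GaussianNB trained with per-sample partial_fit calls, and drawing the 5% test set uniformly at random, are our assumptions; the paper only states that the test set is extracted before training starts.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.metrics import f1_score

    def make_cold_start_run(X, y, test_fraction=0.05, seed=0):
        """Split off a test set and return the stream plus an update-and-score hook."""
        rng = np.random.default_rng(seed)
        test_idx = rng.choice(len(X), size=int(test_fraction * len(X)), replace=False)
        test_mask = np.zeros(len(X), dtype=bool)
        test_mask[test_idx] = True
        X_test, y_test = X[test_mask], y[test_mask]
        X_stream, y_stream = X[~test_mask], y[~test_mask]  # kept in their original order

        clf = GaussianNB()                 # cold start: no training before the stream begins
        classes = np.unique(y)

        def add_annotation(x, label):
            # Incremental update on one annotated sample, then macro-F1 on the test set.
            clf.partial_fit(x.reshape(1, -1), [label], classes=classes)
            return f1_score(y_test, clf.predict(X_test), average="macro", zero_division=0)

        return X_stream, y_stream, clf, add_annotation
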
Combinations of two different types of active learning strategies and one machine teaching strategy were included in the experiments. The active learning strategies were random sampling (Random) and uncertainty sampling (Entropy). We also considered the situation with no active learning strategy, where the system is not proactive in choosing which samples to label; instead, the annotator monitors the system to decide. For uncertainty sampling, entropy was chosen as it is a popular strategy [2]. The machine teaching strategy used was Error, a corrective approach where the annotator provides a label only if the prediction from the system is incorrect [29]. Similar to active learning, the situation with no machine teaching strategy was also included, i.e. a label is provided whenever there is a query. Five of the resulting six combinations of these strategies were used in the experiments, as the combination of no active learning strategy and no machine teaching strategy was excluded. A version of pool-based learning was included as a baseline for the experiments, where performance given the number of annotations was studied. As the data samples are not presented in a specific order in pool-based learning, the given number of data instances were randomly sampled from the dataset, excluding the test set. For each run, 1000 runs were done for the pool-based learning, and the average value was calculated over these and all runs in the experiments to produce the Baseline. The labelling budget was set to 5% based on earlier experiments [29]. Experiments were performed on additional datasets, machine learning algorithms and interactive learning strategies, with similar results.
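
To spell out the five combinations mentioned above, they can be written as pairs of an active-learning criterion and a machine-teaching criterion for the loop sketched in Section III. Mapping "no strategy" onto an always-true criterion and the fixed random query rate are our reading, and entropy_uncertainty and error_teaching refer to the criteria sketched earlier.

    import random

    def random_sampling(clf, x, y_hat, rate=0.1):
        # AL: Random -- query with a fixed probability; the rate is an illustrative value.
        return random.random() < rate

    def monitor_every_sample(clf, x, y_hat):
        # No active learning strategy: every estimation is shown to the monitoring annotator.
        return True

    def always_label(y_hat, y_true):
        # No machine teaching strategy: a label is provided whenever there is a query.
        return True

    # The five combinations used in the experiments; the sixth combination
    # (monitor_every_sample + always_label) is the one excluded.
    combinations = {
        "AL: Random":             (random_sampling, always_label),
        "AL: Entropy":            (entropy_uncertainty, always_label),
        "MT: Error":              (monitor_every_sample, error_teaching),
        "AL+MT: Random + Error":  (random_sampling, error_teaching),
        "AL+MT: Entropy + Error": (entropy_uncertainty, error_teaching),
    }
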

Fig. 2: Results from the experiments on the HAR dataset. (a) F1-score vs. annotations, qc = 0.1, lc = 0.9; (b) F1-score vs. annotations, qc = 0.9, lc = 0.1; (c) F1-score vs. attention, qc = 0.1, lc = 0.9; (d) F1-score vs. attention, qc = 0.9, lc = 0.1.

B. Experimental results

The results from the experiments are presented in Fig. 2 and Fig. 3. The first row in each figure displays the F1-score over the number of annotations the machine learning algorithm has been trained on. In the second row, the F1-score over the amount of attention from the annotator is displayed. When looking at the performance compared to the number of annotations, the strategies that combine active learning and machine teaching have the highest values. They are almost on the same level as, or even better than, the results from the pool-based learning. For the mHealth dataset the machine teaching strategy has almost as good results as the combined approaches. For the second row, comparing performance to the amount of attention, the results vary depending on the cost setup and the dataset. For the HAR dataset and qc = 0.1, the combined strategy AL+MT: Random + Error gives the best result, while the other combined strategy AL+MT: Entropy + Error is similar to the active learning strategies. In the case of a higher query cost, qc = 0.9, AL+MT: Random + Error performs at the same level as the active learning strategies and AL+MT: Entropy + Error is slightly worse at the start. For the mHealth dataset, an increase in query cost results in a decrease in performance for the combined approaches. These figures also display that the machine teaching strategy requires a substantially larger amount of attention to reach the same level of performance as the other strategies. Specifically, MT: Error is sometimes omitted from the figure, as it is on a completely different scale compared to the others.

Fig. 3: Results from the experiments on the mHealth dataset. (a) F1-score vs. annotations, qc = 0.1, lc = 0.9; (b) F1-score vs. annotations, qc = 0.9, lc = 0.1; (c) F1-score vs. attention, qc = 0.1, lc = 0.9; (d) F1-score vs. attention, qc = 0.9, lc = 0.1.

V. DISCUSSION

The experiments show that combined strategies have a higher performance with regards to the number of annotations, especially compared to the active learning strategies. The combined strategies do, however, have a performance level more similar to the active learning strategies with regards to the amount of attention. These results show that giving the annotator a more proactive role can be beneficial, but this also depends on the setting. The amount of attention becomes very high for the machine teaching strategy, where the annotator constantly monitors the system, as expected, but the high amount of attention provided does not lead to increased performance in the quality of annotations collected. While the performance of the machine teaching strategy might be good considering the number of annotations, when examining the amount of attention needed to obtain those annotations, the value is high. When employing active learning strategies without machine teaching, the user does not have to be as attentive, which means the attention metric is lower, but it might also result in worse performance. When combining active learning and machine teaching, the user only has to pay attention when queried by the system (less attention needed) but can be more selective with the data instances they label (better performance metric). The results also show how the nature of the dataset can affect the learning process and performance. There will typically be a learning curve, since there is a cold start setting and the order of the data stream cannot be optimized.

This type of streaming data often has the same class for a period of time; e.g. in activity recognition the type of activity does not typically change for every new sample. Furthermore, data collected during an interval of time might not represent the entire range of data for the class, e.g. with concept drift. Another conclusion from the results is that the active learning strategy AL: Entropy performs on the same level as, or worse than, random sampling AL: Random. Other well-known active learning strategies tested yielded similar results. While there might exist strategies that would give higher performance, the focus of this work is to study the connections between attention, labelling effort and performance for different types of interactive learning.

If a lot of attention is given by the annotator but still only few annotations are collected, would it not be better to use all that attention to collect samples, similar to how the active learning strategies are modelled in the experiments? It might be better and result in a higher performance depending on the scenario; however, the experiments show that in some cases it is better to be selective regarding which labels are collected, even at the cost of attention that most of the time does not result in more labels. Another important aspect is when we want to optimize performance. If it is important that the performance is high as soon as possible, one strategy might be chosen, while if there can be a period of time before the performance needs to be optimized and minimizing the labelling effort is of interest, another strategy might be preferred. Apart from the fact that it might not always be preferred to maximise the number of annotations obtained for a given amount of attention, it might not be possible in a real-world scenario, if the assumptions on the annotator are not realistic.

VI. CONCLUSIONS

We presented an interactive online framework that expands on the conventional labelling cost typically used in interactive learning with regards to the workload for the human annotator. The framework separates the query cost (the cost of interrupting with a query or of monitoring) and the labelling cost (the cost of providing a label). Experiments using the framework are carried out where the trade-off between attention from the annotator, labelling effort and classification performance is studied. While the performance varies depending on the application, the results indicate that a combined approach of active learning and machine teaching gives better performance with regards to both the number of annotations and the amount of attention.

REFERENCES

[1] E. Lughofer, “On-line active learning: a new paradigm to improve practical useability of data stream modeling methods,” Information Sciences, vol. 415, pp. 356–376, 2017.
[2] B. Settles, “Active learning literature survey,” University of Wisconsin-Madison, Department of Computer Sciences, Tech. Rep. 1648, 2009.
[3] X. Zhu, “Machine teaching: An inverse problem to machine learning and an approach toward optimal education,” in AAAI, 2015.
[4] X. Zhu, A. Singla, S. Zilles, and A. N. Rafferty, “An overview of machine teaching,” arXiv preprint arXiv:1801.05927, 2018.
[5] O. Amir, E. Kamar, A. Kolobov, and B. J. Grosz, “Interactive teaching strategies for agent training,” in IJCAI, 2016.
[6] B. Krawczyk, “Active and adaptive ensemble learning for online activity recognition from data streams,” Knowledge-Based Systems, vol. 138, pp. 69–78, 2017.
[7] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia, “A survey on concept drift adaptation,” ACM Computing Surveys, vol. 46, no. 4, pp. 1–37, 2014.
[8] I. Žliobaitė, A. Bifet, B. Pfahringer, and G. Holmes, “Active learning with drifting streaming data,” IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 1, pp. 27–39, 2013.
[9] S. Mohamad, M. Sayed-Mouchaweh, and A. Bouchachia, “Active learning for classifying data streams with unknown number of classes,” Neural Networks, vol. 98, pp. 1–15, 2018.
[10] D. Hadfield-Menell, S. J. Russell, P. Abbeel, and A. Dragan, “Cooperative inverse reinforcement learning,” in NIPS, 2016, pp. 3909–3917.
[11] Y.-S. Chuang, X. Zhang, Y. Ma, M. K. Ho, J. L. Austerweil, and X. Zhu, “Using machine teaching to investigate human assumptions when teaching reinforcement learners,” arXiv preprint arXiv:2009.02476, 2020.
[12] M. Abramson, “Toward adversarial online learning and the science of deceptive machines,” in AAAI Fall Symposia, 2015, pp. 2–5.
[13] A. Rakhsha, G. Radanovic, R. Devidze, X. Zhu, and A. Singla, “Policy teaching via environment poisoning: Training-time adversarial attacks against reinforcement learning,” in ICML, vol. 119, PMLR, 2020, pp. 7974–7984.
[14] S. C. Hoi, D. Sahoo, J. Lu, and P. Zhao, “Online learning: A comprehensive survey,” Neurocomputing, vol. 459, pp. 249–289, 2018.
[15] S. Mei and X. Zhu, “Using machine teaching to identify optimal training-set attacks on machine learners,” in AAAI, 2015, pp. 2871–2877.
[16] W. Liu, B. Dai, A. Humayun, C. Tay, C. Yu, L. B. Smith, J. M. Rehg, and L. Song, “Iterative machine teaching,” in ICML, 2017, pp. 2149–2158.
[17] Y. Chen, A. Singla, O. Mac Aodha, P. Perona, and Y. Yue, “Understanding the role of adaptivity in machine teaching: The case of version space learners,” in NIPS, 2018, pp. 1476–1486.
[18] A. Sena and M. Howard, “Quantifying teaching behavior in robot learning from demonstration,” The International Journal of Robotics Research, vol. 39, no. 1, pp. 54–72, 2020.
[19] M. Cakmak and A. L. Thomaz, “Active learning with mixed query types in learning from demonstration,” in Workshop on New Developments in Imitation Learning, 2011.
[20] P. D. Adamczyk, S. T. Iqbal, and B. P. Bailey, “A method, system, and tools for intelligent interruption management,” in 4th International Workshop on Task Models and Diagrams, 2005, pp. 123–126.
[21] C. D. Wickens, J. Goh, J. Helleberg, W. J. Horrey, and D. A. Talleur, “Attentional models of multitask pilot performance using advanced display technology,” Human Factors, vol. 45, no. 3, pp. 360–380, 2003.
[22] R. Loftin, B. Peng, J. MacGlashan, M. L. Littman, M. E. Taylor, J. Huang, and D. L. Roberts, “Learning something from nothing: Leveraging implicit human feedback strategies,” in 23rd IEEE International Symposium on Robot and Human Interactive Communication, 2014, pp. 607–612.
[23] J. MacGlashan, M. K. Ho, R. Loftin, B. Peng, G. Wang, D. L. Roberts, M. E. Taylor, and M. L. Littman, “Interactive learning from policy-dependent human feedback,” in International Conference on Machine Learning, 2017, pp. 2285–2294.
[24] B. Settles, M. Craven, and L. Friedland, “Active learning with real annotation costs,” in NIPS Workshop on Cost-Sensitive Learning, vol. 1, 2008.
[25] S. Sivaraman and M. M. Trivedi, “Active learning for on-road vehicle detection: A comparative study,” Machine Vision and Applications, vol. 25, no. 3, pp. 599–611, 2014.
[26] O. Banos, R. Garcia, J. A. Holgado-Terriza, M. Damas, H. Pomares, I. Rojas, A. Saez, and C. Villalonga, “mHealthDroid: a novel framework for agile development of mobile health applications,” in International Workshop on Ambient Assisted Living, Springer, 2014, pp. 91–98.
[27] O. Banos, C. Villalonga, R. Garcia, A. Saez, M. Damas, J. A. Holgado-Terriza, S. Lee, H. Pomares, and I. Rojas, “Design, implementation and validation of a novel open framework for agile development of mobile health applications,” Biomedical Engineering Online, vol. 14, no. 2, p. S6, 2015.
[28] A. Stisen, H. Blunck, S. Bhattacharya, T. S. Prentow, M. B. Kjærgaard, A. Dey, T. Sonne, and M. M. Jensen, “Smart devices are different: Assessing and mitigating mobile sensing heterogeneities for activity recognition,” in 13th ACM Conference on Embedded Networked Sensor Systems, 2015, pp. 127–140.
[29] A. Tegen, P. Davidsson, and J. A. Persson, “A taxonomy of interactive online machine learning strategies,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, 2020, pp. 137–153.
