Human Boosting

“boost” the learning ability of humans? This question might seem ill-posed, but precisely such a question was posed in the context of computational learning theory by Kearns & Valiant (1989), where they asked whether a “weak learner”, specifically a learning algorithm which can learn a classifier that is only slightly better than random guessing, could be “boosted” into an arbitrarily accurate strong learning algorithm. Interestingly, Schapire (1990) was able to answer this question in the affirmative. This technique, called boosting, has been the subject of considerable research over the last two decades, and its performance is now reasonably well understood (Friedman et al., 2000; Collins et al., 2002; Meir & Rätsch, 2003). Boosting constructs an ensemble of learners by providing a weak learning algorithm with iteratively modified training datasets, putting increasingly greater weights on the examples deemed more difficult as iterations proceed. We review boosting in Section 2.

In this paper, we investigate the possibility of using humans as weak learners and boosting them to strongly learn complex concepts better than any single human. Our principal contributions are as follows. (a) The standard boosting algorithms are not amenable to human learners (weighted examples may not be appropriately interpreted by humans, for instance). We describe the modifications that need to be made, along with theoretical justifications that extend the state of the art in the analysis of boosting. (b) We design suitable experiments for boosting on humans: we consider two synthetic tasks, which we call the crosshair and gabor tasks, and one real-world task, detecting fake reviews in the Opinion Spam dataset of (Ott et al., 2011). (c) We run category learning experiments on the crowdsourcing platform Amazon Mechanical Turk. Our experimental results show that boosting can overcome some intrinsic limitations of human learners and produce gains in accuracy.

Related work: Our work is in the vein of interesting recent work by (Zhu et al., 2011), which has focused on combining human classifiers using co-training. It will be useful to position our work against the frameworks of Human Computation and Crowdsourcing. Human Computation (Quinn & Bederson, 2011) has been termed “a paradigm for utilizing human processing power to solve problems that computers cannot yet solve”. Thus, it uses humans as “subroutines” for solving sub-tasks, particularly those that computers find difficult. Crowdsourcing, on the other hand, has been termed “the outsourcing of a job typically performed by an expert to a crowd that typically have lower expertise but are large in number”. The key idea in crowdsourcing is to suitably aggregate the outputs of the members of the crowd, and this can at times rival the performance of an expert. A key contrast of our work with human computation and crowdsourcing is that the goal in human boosting is to use humans not as fixed classifiers and subroutines, but as changing and learning algorithms. Moreover, as we detail in the main section of the paper, the protocol for using humans in our human boosting method is complicated and even asynchronous, unlike typical tasks in crowdsourcing. Nonetheless, the work here is closely related to these frameworks. As such, our research raises the question of whether a richer use of humans (e.g. as learning algorithms) would allow them to be embedded in non-trivial ways within other, larger machine learning frameworks. Our work can also be positioned as the converse of active learning. In active learning, e.g. (Settles, 2010), the human acts as an oracle and the algorithm asks the human to label what it believes are the “most informative” examples: a machine performs classification and is guided by a human. In our work, the human performs the classification task and the machine guides humans.

Applications: The question of whether human learners can be boosted is certainly interesting from a cognitive science point of view. But as we detail, it has a number of practical applications as well.

The first is the use of humans as “supervised” feature extractors. For many natural tasks such as those in vision and text, humans have access to implicit feature spaces shaped by evolution, which we find difficult to even capture with a computer. If each human learner is asked to learn a “decision stump” involving a single feature or a few features of their choosing within our boosting framework, then we can use these to extract task-driven implicit human features. It is useful to compare this to manifold learning. For instance, in the high-dimensional space of images represented as vectors of their pixels, images of a given face lie on a manifold, and face recognition essentially requires a learner to identify this manifold. However, for any manifold, there is an appropriate mapping into lower-dimensional Euclidean spaces where classification becomes easy. Manifold learning techniques aim to extract these embeddings using large amounts of data. Extracting implicit features from humans could potentially allow us to achieve similar performance using much smaller amounts of training data.

A human boosting framework can also be used to bootstrap labeled data for difficult learning problems. Using a small number of labeled examples, we could have humans “strongly learn” complex concepts and create larger datasets for consumption by machine algorithms. Indeed, ground truth labels for typical machine learning datasets are either provided by or validated by humans. Thus, the problems selected are typically those in which humans excel. Boosting human learners allows us to solve problems which even humans find difficult, and expands the frontier of machine learning research. These methods could be used to replace a domain expert or a complexly engineered solution with a number of non-expert humans. While it is true that it is very expensive to use humans, the success of crowdsourcing has shown that for several problems, it is cheaper on the whole to use humans than to employ engineered solutions.

2. Boosting Human Learners

We start the section by discussing the characteristics of humans as “learners” in the context of machine learning. We then discuss boosting and Adaboost, and why modifications to Adaboost are essential to accommodate the characteristics of human learners. We then detail our “HumanBoost” algorithm. Finally, we provide a convergence analysis for an idealized version of our HumanBoost algorithm.

Human Learners: Humans as learners differ from machine learning algorithms in several ways. First, human learners are a black box, in the sense that we do not have access to their inner workings. Second, machine learners use high-dimensional feature vectors as input, which humans cannot interpret or visualize. On the other hand, humans do understand images, video, audio and text, to an extent that even state-of-the-art machine learning algorithms cannot capture. In this paper in particular, we will represent input vectors via intuitive visual stimuli for use with human learners. Third, machine learning algorithms take large numbers of training examples as input; human learners can only be given a small number of training examples. Finally, it is not clear whether humans can handle classification noise and how they would respond to differing distributions in the data. There is some evidence (Fried & Holyoak, 1984) that humans can learn distributions, and can learn both discriminative and generative models (Hsu et al., 2010).

Boosting and Adaboost: The framework of boosting is now understood (Friedman et al., 2000) to greedily fit an additive model of the form H(x) = Σ_t α_t h_t(x) to a classification-surrogate loss such as the exponential function, where the functions {h_t(·)} are the classifiers learnt by a “weak” learning algorithm, and {α_t} are the weights assigned to these classifiers. With a careful selection of weights and modified training sets for the intermediate steps, it can be shown that the output classifier H(x) is a “strong” classifier that might well have been learnt by a “strong” learning algorithm.

We first describe the popular boosting algorithm Adaboost (Schapire, 2003), as it is closely related to our proposed algorithm. In each round, Adaboost maintains a weighted distribution over the training examples, with weights corresponding to the difficulty of each example. Initially, all examples are weighted equally. Each iteration of Adaboost has three steps: (a) train the weak-learning algorithm on this weighted data distribution, (b) compute its classification error given the new data distribution, and (c) multiplicatively increase the weights on the “difficult” examples. This ensures that the weak learning algorithm tries to fit those points more accurately in the next iteration. Indeed, over the past decade, Adaboost has been analyzed both experimentally (Dietterich, 2000) and theoretically (Friedman et al., 2000; Collins et al., 2002).

Certain properties of Adaboost make it amenable for use with human learners. First, it is a wrapper procedure, i.e. it does not need to know the inner workings of the weak learner; this makes it conducive for use with “black box” human learners. Second, it does not directly operate on the training data (x_i, y_i) (where x_i is the input stimulus and y_i is the label), so x_i could be complex stimuli such as text or video. Third, there is variability among human learners: they are not reliable and their output is noisy. Adaboost addresses this by adaptively downweighting the bad human learners.

Certain other parts of Adaboost need to be modified. Adaboost reweights training examples. Humans may not interpret these weights correctly or consistently, and simply presenting the numerical weights would be unreliable; the theoretical guarantees of Adaboost fail in such a case. One potential solution is to resample training examples, drawn from a multinomial distribution corresponding to the Adaboost weights. The caveat is that this will lead to examples with large weights being repeated, and humans may not interpret repetitions correctly. For instance, if we consider stimuli such as text reviews (as in Section 3.2.3), it will be obvious to the learner if a review was repeated, and the learned results would be unreliable.

Proposed algorithm: Our proposed boosting algorithm, called HumanBoost, is detailed in Algorithm 1. Recall that any boosting algorithm has the following three steps in each iteration: (a) train a weak learner on the new data distribution, (b) compute the classification error given the new data distribution, and (c) create a new data distribution in light of the difficult examples in the previous round.
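The three-step loop just described can be sketched concretely for standard Discrete Adaboost with decision stumps as the weak learners. This is an illustrative sketch only, not the human protocol: all function and variable names are ours, and it uses Adaboost's exponential-loss reweighting rather than the logistic weights introduced later.

```python
import numpy as np

def stump_train(X, y, w):
    """Weak learner: the single-feature threshold rule with the
    smallest weighted 0-1 error on (X, y) under weights w."""
    best = (np.inf, 0, 0.0, 1)            # (error, feature, threshold, sign)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] > thr, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(X, j, thr, sign):
    return sign * np.where(X[:, j] > thr, 1, -1)

def adaboost(X, y, T=10):
    n = len(y)
    w = np.ones(n) / n                               # initially uniform
    ensemble = []
    for _ in range(T):
        err, j, thr, sign = stump_train(X, y, w)     # (a) train weak learner
        err = max(err, 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)        # (b) weight from weighted error
        pred = stump_predict(X, j, thr, sign)
        w *= np.exp(-alpha * y * pred)               # (c) upweight difficult examples
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    score = sum(a * stump_predict(X, j, t, s) for a, j, t, s in ensemble)
    return np.sign(score)
```

On a one-dimensional "interval" concept (labels −, +, +, − over x = 0, 1, 2, 3), no single stump classifies all four points, yet three rounds of this loop already combine stumps into a perfect training-set classifier, illustrating the weak-to-strong phenomenon described above.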
Algorithm 1 HumanBoost
Input: MTurk oracle O, dataset D = {(x_i, y_i)}, i = 1, ..., N; parameters δ_t, ε_t, m_t
Set weights w_i^1 = 1/N, i = 1, ..., N
for t = 1, ..., T do
    Sample training set S using Filter(m_t, w^t, δ_t).
    Use O to obtain weak human classifier h_t for S.
    Label the test set T = D − S using h_t.
    Edge γ_t = Σ_{i∈T} 1[y_i = h_t(x_i)] w_i^t / Σ_{i∈T} w_i^t − 0.5.
    Set weight α_t ← 0.5 log((0.5 + γ_t)/(0.5 − γ_t)).
    Update H_t(x) ← H_{t−1}(x) + α_t h_t(x).
    Update w_i^{t+1} ← 1/(1 + exp(y_i H_t(x_i))).
end for
Return strong learner H_T(x) = sign(Σ_t α_t h_t(x)).

function Filter(m_t, w^t, δ_t)
    r = #calls to Filter; δ′ = δ_t/(r(r + 1)); i_max = (2/ε_t) ln(1/δ′).
    Permute (D, w); set S = ∅; i = 1.
    while |S| ≤ m_t do
        (x, y) ← D(i); i ← i + 1.
        With probability w_i^t: add (x, y) to S.
        If there has been no update to S in the past i_max iterations, hypothesis H_t is sufficiently accurate; exit.
    end while
    Return S.
end function

We now discuss how the HumanBoost algorithm performs these three key steps. HumanBoost maintains the new distribution via weights over the data points, just as in Adaboost. But for training a human weak learner, we do not want to provide either weights over the samples, or a resampled data set with potentially repeated samples. Instead, we perform sampling without replacement, with probability proportional to the normalized weights. Such subsampling without replacement in the context of boosting was first experimentally analyzed for boosted trees by (Friedman, 2002), in the context of Stochastic Gradient Boosting. Here, we use a rejection sampling framework similar to FilterBoost (Bradley & Schapire, 2008) and that of (Zadrozny et al., 2003), as described in the function Filter in Algorithm 1. We go through the samples in a random order, and reject easy examples one by one till we collect a sufficient number of difficult examples S. For step (b), computing the weights α_t, we need to measure the weighted accuracy of h_t. As we detail in the experimental section, we train the human learner h_t over S in an online fashion, and then ask them to label examples from D − S. Learners may memorize examples from S, so we cannot use S in the test set. This introduces an unavoidable error in the computation of α_t, as S contains the most difficult examples and so D − S contains easy examples. However, for human learners, S is small, and if the learner has high accuracy on D − S, we can assume that it is sufficiently accurate. A key algorithmic design aspect of HumanBoost pertaining to step (b) is the classification-surrogate loss used to compute the classification error and the weights. While Adaboost uses the exponential loss, its weights can grow arbitrarily large, which reduces its tolerance to noise. One robust approach, as in Madaboost (Domingo & Watanabe, 2000), is to truncate the weights explicitly to a maximum of 1. We use the logistic function, as in LogitBoost (Friedman et al., 2000). Step (c), creating the new data distribution, is similar to that of Adaboost, given the classification error in step (b).

Remark: Algorithm 1 extends Discrete Adaboost using a logistic weight function and rejection sampling without replacement. Note that we can also extend “Real Adaboost” in a similar fashion. However, Real Adaboost requires classifiers to output probability estimates, and experiments have shown that humans are not good at estimating probabilities (Cosmides & Tooby, 1996). Teaching humans to output confidence values, or calibrating their output for Real Adaboost, is an interesting direction for future research. Another extension of Adaboost potentially amenable to humans is boosting by relabeling, also known as Agnostic Boosting (Ben-David et al., 2001). In agnostic boosting, easy examples are randomly relabeled so that the weak learner can focus on more difficult examples. This effectively adds label noise to the dataset, proportional to the weight in the distribution. Whether humans will interpret this noise correctly is an open question, which we will investigate in future work.

Sampling: We can show that our sampling procedure is efficient, following the lines of (Bradley & Schapire, 2008). Specifically, if the rejective filtering procedure terminates without giving enough examples, then the error err_t of our hypothesis H_t is small.

Proposition 2.1 If in any call of Filter, n ≥ (2/ε) ln(1/δ) consecutive examples are rejected, then err_t ≤ ε with probability 1 − δ.

Analysis: A popular class of analytical results in boosting (Collins et al., 2002; Bradley & Schapire, 2008) uses the recurrences arising from the new data distribution reweighting difficult examples from the previous iteration, and the weak learners having an “edge” over random guessing even on the new data distributions. Another recent line of analytical results is based on the view of boosting as greedy stagewise functional gradient descent (Mason et al., 1999; Friedman et al., 2000; Grubb & Bagnell, 2011). Recently, Grubb & Bagnell (2011) provided an elegant analysis of boosting under the restriction of using weak learners, by interpreting it as restricted gradient descent in function space. Our analysis is an extension of this recent result, where we interpret our algorithm as a stochastic version of their restricted gradient descent.

Our presentation here is terse for lack of space, and we refer to (Grubb & Bagnell, 2011) for extended details of their framework. Consider the function space L2(X, R, P) of square-integrable functions, equipped with the inner product ⟨f, g⟩_P = E_P[f(x)g(x)] = ∫_X f(x)g(x) dP(x). Denote by P̂ the empirical distribution over the data {(x_i, y_i)}, i = 1, ..., N, and the corresponding inner product ⟨f, g⟩_P̂ = Σ_{i=1}^N f(x_i)g(x_i). Consider the objective R : L2(P̂) → R derived from the empirical risk R_emp[f] = (1/N) Σ_{i=1}^N l(y_i f(x_i)); assume the loss function l is convex and differentiable (note that the function l in our case is the logistic function l(t) = log(1 + exp(−t))). Denote the “restriction set” of weak learners as H. Boosting can then be derived as a method to minimize R_emp[f] subject to the constraint that f = Σ_t α_t h_t with h_t ∈ H and α_t ∈ R.

The subgradient of R_emp[f] at any f can be written as ∇R_emp[f] = {g | g(x_i) = ∂l(y_i f(x_i))/∂f(x_i) = l′(y_i f(x_i)) y_i}; denote this by ∇. The key insight in Grubb & Bagnell (2011) was to leverage the fact that each boosting step can be written as f_{t+1} = f_t + α_t h_t, where h_t ∈ arg max_{h∈H} ⟨∇, h⟩/‖h‖, and α_t = −(1/η_t) ⟨∇, h_t⟩/‖h_t‖², for some step-size η_t. Another key idea there was that the weak-learning assumption can be asserted as an approximate gradient condition: max_{h∈H} ⟨∇, h⟩/‖h‖ ≥ γ‖∇‖, for some γ > 0. They then relate this γ to the edge parameter used in typical boosting analyses. Note that the boosting step is in effect a restricted gradient descent, which they then leverage to show that (i) the algorithm asymptotically converges to the optimum of the desired loss function, and that (ii) it does so efficiently, i.e. at a rate polynomial in 1/ε, where ε is the desired error.

The key difference in our boosting algorithm when compared to the setting of (Grubb & Bagnell, 2011) is that we sample training data without replacement, and provide them to the learner without weights, so that instead of the ∇ derived above, we use a stochastic gradient ∇′, where ∇′(x_i) = S_i y_i, with S_i ∈ {0, 1} an indicator for data instance i, i.e. 0 if x_i ∉ S and 1 if x_i ∈ S. Our algorithm is thus a stochastic restricted gradient descent, and we can extend the analysis of (Grubb & Bagnell, 2011) to obtain convergence rates even in our setting. We first note that the function Filter in Algorithm 1 ensures that p(x_i ∈ S) = 1/(1 + exp(−y_i f(x_i))) = l′(y_i f(x_i)), where f is the current hypothesis. It thus follows that the inclusion probability of x_i satisfies E[S_i] = l′(y_i f(x_i)), so that E[∇′] = ∇ on x_i. Our boosting algorithm can then be abstracted as performing the step f_{t+1} = f_t + α_t h_t, where h_t ∈ arg max_{h∈H} ⟨∇′, h⟩/‖h‖, with ∇′ the stochastic gradient detailed above. For the purpose of analysis, set the weight as α_t = −(1/η_t) ⟨∇′, h_t⟩/‖h_t‖². Further, we require that the restriction set satisfy the weak-learning assumption with respect to the stochastic gradient, so that max_{h∈H} ⟨∇′, h⟩/‖h‖ ≥ γ‖∇′‖, for some γ > 0. We can then extend the analysis of (Grubb & Bagnell, 2011) to obtain the following:

Theorem 2.2 Let R_emp[f] be an α-strongly convex and β-strongly smooth functional over F := L2(X, R, P̂). Let H ⊂ F be a restriction set with edge γ. Suppose further that the variance of the stochastic gradients is bounded, so that E[‖∇′_t − ∇_t‖] ≤ κ‖∇_t‖. Let f* ∈ arg min_{f∈F} R_emp[f]. Given a starting point f_0 and step-size η_t = 1/β, after T iterations of our algorithm we have:
E[R_emp[f_T]] − R_emp[f*] ≤ ρ^T (R_emp[f_0] − R_emp[f*]),
where ρ := 1 − (2α/β)(γ/4 − κ² − κ(1 + γ/2)), and where the expectation is over the randomization in our sampling procedure.

3. Experiments and Results

We first describe our experimental setup and then present our experimental results demonstrating Human Boosting on category learning tasks, where learners are expected to learn to distinguish between two classes A and B of stimuli, based on examples.

3.1. Experimental Setup

We conduct experiments using Amazon's Mechanical Turk (MTurk). MTurk is a crowdsourcing marketplace which connects requesters, who assign HITs (Human Intelligence Tasks), to workers who perform the HITs. MTurk is ideally suited to microtasks which can be performed quickly. Our program posted HITs on MTurk to recruit workers and led them to our server for the experiments. On completing the experiment, workers were given a unique random string to enter on MTurk as verification for payment. Since HITs are paid tasks, workers have an incentive to spam tasks, i.e. click through them quickly, giving wrong results. Workers on MTurk are also typically less attentive than volunteers in laboratory experiments. To ensure worker attentiveness and to discourage spammers, a one-second delay was given after every training example and a half-second delay after every test example. Our experiments used workers with > 1000 approved HITs and a > 95% acceptance rate. All HITs were designed to be completed in less than 20 minutes.
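The rejection-sampling Filter of Algorithm 1 can be sketched in a few lines. This is a minimal illustration under our own naming: the logistic weight w_i = 1/(1 + exp(y_i H(x_i))) is the one used by HumanBoost, while the stopping threshold `i_max` plays the role of Filter's early-exit test; everything else (the scorer `H`, the toy data) is an assumption for illustration.

```python
import math
import random

def logistic_weight(y, score):
    """HumanBoost-style example weight w_i = 1 / (1 + exp(y_i * H(x_i))):
    confidently correct examples get weight near 0, difficult ones near 1."""
    return 1.0 / (1.0 + math.exp(y * score))

def filter_sample(data, H, m, i_max, rng=None):
    """Rejection-sample up to m training examples without replacement.
    `data` is a list of (x, y) pairs; `H` maps a stimulus x to the current
    ensemble score.  If i_max consecutive examples are rejected, the
    current hypothesis is taken to be sufficiently accurate and the
    filter exits early with whatever it has collected."""
    rng = rng or random.Random(0)
    order = list(data)
    rng.shuffle(order)                  # "Permute (D, w)"
    S, rejections = [], 0
    for x, y in order:
        if rng.random() < logistic_weight(y, H(x)):
            S.append((x, y))            # keep a difficult example
            rejections = 0
            if len(S) == m:
                break
        else:
            rejections += 1             # easy example: reject it
            if rejections >= i_max:
                break                   # hypothesis sufficiently accurate
    return S
```

Under the zero hypothesis H ≡ 0 every example is kept with probability 1/2, mimicking a uniform first round; under a hypothesis with uniformly large margins almost every draw is rejected and the filter exits early, which is exactly the event Proposition 2.1 turns into an error guarantee.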
Figure 2. Boosting on the Crosshair Task: The training accuracy of the boosted learners approaches the maximum possible accuracy of 85% in 10 rounds, performing better than the best control group learner. Boosting can overcome noise in the dataset.

Figure 3. Boosting on the Gabor Task ((a) histogram for the control group, (b) boosting with 20 examples, (c) boosting with 5 examples): Given 20 examples in each round, the boosted learner suffers the same limitation as the best control group learner (it uses only the stronger perceptual dimension). However, with 5 examples, boosting forces learners to concentrate on the discriminative features and achieves > 90% training accuracy in 30 rounds.

in the control group who were trained with 5 examples each. They achieved an average test accuracy of 55.74% ± 11.35%, with a maximum of 70%. The reported rules were typically noisy decision stumps in either dimension. With 5 examples, humans are weak learners on this task. With more examples, humans do perform better, but report confusion about the class boundary. We perform experiments with 5 examples to demonstrate the effect of boosting.

Figure 2(b) shows the performance of the boosted learners. In 10 rounds, human boosting achieves 85% training accuracy and 75% test accuracy, which is better than the best control group learner. The gap between the training and test accuracy is called the “generalization error” and is a common occurrence in machine learning algorithms. This error typically goes to zero as the size of the training set is increased.

3.2.2. Gabor Patch Task

Figure 6. Gabor Learning Task: the class distribution for the Gabor task (orientation vs. spatial frequency).

A Gabor patch (Figure 6(a)) is a sinusoidal grating with two control parameters: spatial frequency d and orientation θ. The patches are generated using a sine wave z = sin((0.25 + d/50)x), rotated θ anticlockwise from the positive x-axis, and presented as a 200 × 200 pixel image. These dimensions are separable, so humans can reason independently about each of them, but they find it difficult to learn decision rules jointly involving both dimensions. This task is an example of learning information-integration boundaries, albeit one where around 800 examples are required to learn the boundary. The classes are drawn as in Figure 6(b), where the optimal decision boundary is linear, at a 45° angle to the axes. We use parameter values as in (Ell et al., 2009), which were chosen such that the resulting patches are not too dense, it is possible to count the number of lines in the grating, and the effects of the two dimensions are balanced. Since humans naturally find this task difficult, we do not add noise. Note that an ideal boosting procedure using decision stumps on the true parameters can achieve 100% accuracy on this task. Thus, this task is one of learning an information-integration boundary using an aggregation of rule-based learners. We demonstrate that boosting excels at this task and achieves very high accuracy.

Figure 5. Boosting on the Opinion Spam dataset: Due to the complexity of the task, control group learners can identify relevant features, but not the class towards which they point. After 30 rounds, the boosted learner performs better than all control group learners, though the accuracy will increase if we run it for more iterations.

In preliminary experiments, we found that if the decision boundary is axis-parallel, then it is learned perfectly with < 10 examples. On the distribution shown in Figure 6(b), the average accuracy of human learners from the control group trained with 20 examples each was 60.11% ± 12.22% (see Figure 3(a)). Most learners learned decision rules involving the spatial frequency, while none used the orientation. Some participants learned rules based on irrelevant artifacts such as the “ropiness” of the lines, and their responses could not discriminate between the classes. The optimal spatial frequency decision boundary achieves an accuracy of 72%, attained by 3 learners, though 100% performance is possible in principle.

Boosting with 20 training examples (Figure 3(b)) quickly achieves 72% training accuracy and stabilizes at this value (first 11 rounds shown). This is a failure of the boosting algorithm, as it only performs as well as the best human learner. Examining the user responses revealed that all learned decision rules used only the spatial frequency. We speculate that this is because, with 20 examples, learners focus on the most obvious discriminative perceptual dimension (spatial frequency) and ignore the others. Boosting with 5 training examples (Figure 3(c)), on the other hand, allows weak learners to learn decision rules based on the orientation, and consequently achieves > 90% training accuracy. This finding might be of interest even from a cognitive science point of view. It is interesting to note that (Friedman, 2002) observed a similar effect with decision trees, viz. that subsampling can improve performance, though their differences were not as stark.

3.2.3. Opinion Spam Task

While the earlier experiments were based on synthetic data, we now consider the performance of our method on a real-world dataset. The Opinion Spam task aims to separate truthful and deceptive reviews. Each stimulus is a hotel review, usually a small paragraph of 4-5 sentences. We used the opinion spam detection dataset introduced in (Ott et al., 2011), which contains truthful hotel reviews extracted from TripAdvisor and deceptive reviews for the same hotels generated using Mechanical Turk. By design, deceptive reviews are meant to deceive humans, and Turkers do find this task difficult, often performing near random. This task is challenging because of the number of possible perceptual dimensions to the task, because the nature of the decision boundary is unknown, and because the classes may overlap. This task is particularly interesting for a black-box procedure such as boosting, since the relevant features are unknown. The decision boundaries learned by human learners are based on semantic features which are difficult to capture with computers (is the review humorous? sarcastic? well-written?). Preliminary experiments showed that if the learners are informed that they are trying to separate fake and truthful reviews, they completely ignore the given training set and use their own preconceived notions of what constitutes a truthful review. Hence, in our experiments they are only informed that the task is to separate two classes of reviews. This may, however, lead to many features not being discriminative. The average accuracy over the control group human learners (Figure 5(a)) trained with 10 examples was 51% ± 8.76%, i.e. only slightly better than random. In this task, we only expect human learners to identify discriminating features in the training set and to classify the remaining instances according to those features.

With human boosting (Figure 5(b)), we gradually achieve 70% training accuracy and 65% test accuracy on this task after 30 boosting rounds. We expect the increasing trend in the training accuracy to continue if the algorithm is run for more iterations.

Acknowledgments

We acknowledge the support of NSF via IIS-1149803, and DoD via W911NF-12-1-0390.
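For completeness, a grating stimulus as described in Section 3.2.2 (the sine wave z = sin((0.25 + d/50)x), rotated θ anticlockwise, rendered as a 200 × 200 pixel image) can be generated along the following lines. This is a sketch: the centering of the pixel grid and the [0, 1] intensity normalization are our assumptions, since the text specifies only the wave, the rotation, and the image size.

```python
import numpy as np

def grating(d, theta_deg, size=200):
    """Render z = sin((0.25 + d/50) * x'), where x' is the x-coordinate
    after rotating the image plane theta_deg anticlockwise, as a
    size x size array with values in [0, 1].  The pixel-coordinate scale
    is an assumption for illustration."""
    theta = np.deg2rad(theta_deg)
    coords = np.arange(size) - size / 2.0        # grid centered on the patch
    xx, yy = np.meshgrid(coords, coords)
    xr = xx * np.cos(theta) + yy * np.sin(theta) # anticlockwise rotation
    z = np.sin((0.25 + d / 50.0) * xr)
    return (z + 1.0) / 2.0                       # map [-1, 1] to [0, 1]
```

Increasing d packs more cycles into the patch (higher spatial frequency), while theta_deg tilts the stripes, giving the two separable perceptual dimensions the task manipulates.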