Hostile Takeovers in EM-Based Crowdsourcing Systems

Ashwin Kumar, Bradford Orr and Sayantan Kumar


CSE 518A Course Project

This report describes the progress we have made so far on the course project.
The following two sections describe our data generation strategy and the
variants of the EM algorithm, found in our literature review, that we plan to
implement to illustrate the effects of adversarial behaviour on those algorithms.

Data Generation
For our work we generate two types of data sets. The first is the full matrix,
in which every worker answers every question; the second is a subsampled
version of the full matrix. For the subsampled data set we consider four
cases: (1) a fixed number of answers per question, (2) the same number of
questions answered by each user, (3) a variable answer rate per user, and (4) a
different number of answers for each question. During data generation the
fraction of adversarial users is taken as a user input; we plan to model it in
the later stages. Whereas a worker making random guesses is usually assigned a
skill of 0.5, we sample the initial worker skills from a Gaussian distribution
with mean 0.5 and standard deviation 0.25.
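
The sampling described above could be sketched as follows. This is only an
illustration, not our final generator: the function names are hypothetical,
clipping the skills to [0, 1] is an added assumption (the text only fixes the
mean and standard deviation), and only three of the four subsampling cases are
shown.

```python
import numpy as np

def sample_skills(n_workers, rng):
    # Skills ~ N(0.5, 0.25^2). Clipping to [0, 1] is an assumption made here
    # so that each skill can be read as a probability.
    return np.clip(rng.normal(0.5, 0.25, size=n_workers), 0.0, 1.0)

def subsample_fixed_per_question(n_workers, n_questions, k, rng):
    # Case (1): every question receives exactly k answers.
    mask = np.zeros((n_workers, n_questions), dtype=bool)
    for j in range(n_questions):
        mask[rng.choice(n_workers, size=k, replace=False), j] = True
    return mask

def subsample_fixed_per_worker(n_workers, n_questions, k, rng):
    # Case (2): every worker answers exactly k questions.
    mask = np.zeros((n_workers, n_questions), dtype=bool)
    for i in range(n_workers):
        mask[i, rng.choice(n_questions, size=k, replace=False)] = True
    return mask

def subsample_variable_rate(rates, n_questions, rng):
    # Case (3): worker i answers each question independently with
    # probability rates[i].
    rates = np.asarray(rates, dtype=float)
    return rng.random((len(rates), n_questions)) < rates[:, None]
```

In each case mask[i, j] is True when worker i answers question j; applying the
mask to the full matrix gives the subsampled data set.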

Apart from worker skills, we plan to model the difficulty level of each question.
Since we are assuming random values between +1 and −1 for the ground truth and
the answers, it would be more realistic if workers answered only those questions
whose difficulty level is less than their skill. We also attach tags, or labels,
to each question. The tags will not be implicitly known to the workers. It would
be an interesting experiment to analyze how adversarial workers targeting
specific tags affect the system overall.
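
A minimal sketch of the difficulty rule and of tag targeting is given below.
It assumes, beyond what is stated above, that difficulty is measured on the
same scale as skill and that tags are integer ids; both function names are
hypothetical.

```python
import numpy as np

def answerable_mask(skills, difficulty):
    # mask[i, j] is True when question j is easier than worker i's skill.
    # Assumes skill and difficulty share one scale (e.g. both in [0, 1]).
    skills = np.asarray(skills, dtype=float)
    difficulty = np.asarray(difficulty, dtype=float)
    return difficulty[None, :] < skills[:, None]

def targeted_questions(tags, target_tags):
    # True for every question whose tag is one the adversaries target.
    return np.isin(tags, target_tags)
```

For example, with skills [0.2, 0.9] and difficulties [0.1, 0.5, 0.95], the
first worker can answer only the first question and the second worker the
first two.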

For our data generation we assume two main categories of workers: adversarial
workers, whose skill level is 1, and normal workers, who know the correct
answer to a question with probability between 0.25 and 0.75 and guess randomly
otherwise. While this assumption is fairly restrictive, an interesting question
is how the overall model would change if we assumed three categories of
workers: skilled adversarial, skilled normal, and noisy workers.
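
The two-category worker model could be simulated roughly as below. Two points
are assumptions of this sketch rather than decisions stated above: answers are
binary with values +1 and −1, and an adversarial worker (who always knows the
truth) reports the opposite label. The function name is hypothetical.

```python
import numpy as np

def generate_answers(truth, p_know, is_adversary, rng):
    # truth: (n_questions,) array of +1/-1 ground-truth labels.
    # p_know: (n_workers,) probability that a normal worker knows the answer.
    # is_adversary: (n_workers,) boolean flags.
    truth = np.asarray(truth)
    p_know = np.asarray(p_know, dtype=float)
    is_adversary = np.asarray(is_adversary, dtype=bool)
    shape = (len(p_know), len(truth))
    knows = rng.random(shape) < p_know[:, None]
    guesses = rng.choice([-1, 1], size=shape)
    # Normal workers: correct label if they know it, random guess otherwise.
    answers = np.where(knows, truth[None, :], guesses)
    # Assumption: adversaries know the truth and always report its opposite.
    answers[is_adversary] = -truth[None, :]
    return answers
```

Drawing p_know uniformly from [0.25, 0.75] for the normal workers reproduces
the range given above; the fraction of adversaries is set through is_adversary.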

Probable EM Approaches
To test the effect of adversarial workers on the system, we plan to try some
popular algorithms built on the EM framework. While our first aim is to test
the adversarial effect on basic EM, we also want to analyze whether more
complicated frameworks are susceptible to a hostile takeover. Apart from basic
EM, we plan to introduce adversarial users into Majority Voting and Weighted
Majority Voting as baseline approaches.
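
The baselines and a basic EM can be sketched as follows. The EM shown is a
simple one-coin model (one accuracy per worker, uniform prior on the labels),
which is one common reading of "basic EM" and an assumption here. Answers are
assumed to be +1 or −1 with 0 marking an unanswered question, and ties are
broken towards +1; all names are hypothetical.

```python
import numpy as np

def majority_vote(answers):
    # answers: (n_workers, n_questions) with entries in {-1, 0, +1}.
    return np.where(answers.sum(axis=0) >= 0, 1, -1)

def weighted_majority_vote(answers, weights):
    # Each worker's vote is scaled by a per-worker weight.
    return np.where(np.asarray(weights) @ answers >= 0, 1, -1)

def one_coin_em(answers, n_iter=50, eps=1e-6):
    answered = answers != 0
    # Initialise q[j] = P(truth_j = +1) from the fraction of +1 votes.
    n_votes = np.maximum(answered.sum(axis=0), 1)
    q = 0.5 + 0.5 * answers.sum(axis=0) / n_votes
    for _ in range(n_iter):
        # M-step: accuracy = expected fraction of a worker's answers
        # that agree with the current soft labels.
        agree = (answers == 1) * q + (answers == -1) * (1 - q)
        p = agree.sum(axis=1) / np.maximum(answered.sum(axis=1), 1)
        p = np.clip(p, eps, 1 - eps)
        # E-step: posterior log-odds of +1 is a weighted vote with
        # weights log(p / (1 - p)); clipped for numerical safety.
        log_odds = np.clip(np.log(p / (1 - p)) @ answers, -30, 30)
        q = 1 / (1 + np.exp(-log_odds))
    return np.where(q >= 0.5, 1, -1), p
```

Note that the E-step is itself a weighted majority vote, so a worker whose
estimated accuracy is high gains a large weight. This is the mechanism a group
of adversaries would need to exploit to take over the estimate.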

After a brief literature review we selected two papers. The first is the
paper that presents GLAD [Whose Vote Should Count More: Optimal
Integration of Labels from Labelers of Unknown Expertise]. The second is
[Learning From Noisy Singly-labeled Data], which developed the
Model Bootstrapped EM (MBEM) algorithm. We are in the process of reading
more papers to find EM-based algorithms that suit our purpose.
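
As a reminder of why GLAD is relevant to adversaries, the sketch below
evaluates its labeling model as we understand it from the paper: the
probability that labeler i labels question j correctly is the logistic
function of alpha_i * beta_j, where alpha_i is the labeler's expertise
(negative values describe an adversarial labeler) and 1 / beta_j > 0 is the
question's difficulty. The function name is hypothetical and this is not the
full inference procedure.

```python
import numpy as np

def glad_prob_correct(alpha, beta):
    # Returns P[i, j] = sigmoid(alpha[i] * beta[j]).
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    return 1.0 / (1.0 + np.exp(-np.outer(alpha, beta)))
```

A labeler with alpha = 0 is correct half of the time regardless of the
question, while labelers with alpha = 2 and alpha = −2 are correct with
complementary probabilities.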
