Introduction to crowdsourcing
Contents
Lecture 1.1. What is crowdsourcing?
Lecture 1.2. Crowdsourcing in ML
Lecture 1.3. Basic steps of launching a crowdsourcing project
Lecture 1.4. Project decomposition and why we need it
Lecture 1.5. Ways to decompose a task
Crowdsourcing can be applied in any field where a complex problem needs solving,
including astrophysics. The fastest-growing area for crowdsourcing is AI and machine
learning.
Examples of tasks that can be solved with crowdsourcing (quickly, every day):
1. Search engine development. Assess how well hundreds of thousands of sites match user
queries.
2. Navigation app development. Find areas that are open to traffic and areas that are blocked off.
3. Taxi service development. Check if drivers cleaned their cars before driving out to serve
passengers.
Crowdsourcing is an engineering problem:
the final result and the quality of data depend on the quality of the process itself, not on the
competence of a single individual.
Solution. Break these challenges down into clear scenarios and master these scenarios. Then
crowdsource management turns into a clear methodology, and managers turn into crowd solutions
architects (CSA).
Crowdsourcing platforms:
Crowd marketplace: 1. Performers offer their time and skill. 2. Platform offers a meeting point
and useful tools. 3. Requesters offer tasks and earnings.
A well-designed task can be scaled very quickly. Annotation can also be stopped quickly once it is no
longer necessary: since open crowdsourcing platforms are based on the marketplace principle,
performers will simply switch to other projects.
The greater the speed and volume with which a machine learning team can collect data, the
faster they can train algorithms and roll out new technologies.
AI products depend on data labeled by a human either to train algorithms or to validate them. As
AI continues to gain influence, specialists will need more and more data from humans. And the
better the algorithms get, the more new data we'll need to pour into them to get a noticeable
improvement in quality.
The quality of a project doesn’t depend only on the crowd; it is also the responsibility of the
requester. If the instructions are unclear, the crowd cannot deliver good quality. Our
goal is to make the instructions as clear and concise as possible. In short, crowd project design
matters.
There are six necessary steps for launching a crowd-based project that will give you reliable
data:
1. Decomposing a task
2. Writing clear instructions and designing an interface
3. Selecting and training performers
4. Setting robust quality controls
5. Deciding on a price
6. Implementing results aggregation
Decompose a task
— Break up a task into several smaller ones that are easier to complete.
— A simple task attracts more performers who are a good fit for the project.
— Simpler tasks are cheaper because they require less effort and no special knowledge from
the performers.
— Simpler tasks allow requesters to use basic quality control settings that work well out of the
box.
— Decompose a task properly, so you won’t need to write complicated instructions or set up
intricate interfaces.
Write instructions and design an interface
— Write concise, clear instructions and develop convenient task interfaces.
— Use templates for your interfaces.
— Both instructions and interfaces should serve the performers and should help them to
complete your tasks well.
Select and train performers
— Use basic entry filters: for example, a language filter or an age filter.
— Prepare a training or a test for performers to check that they’ve read and understood
the instructions and examples.
— Reuse the same training sets to select additional performers whenever you need them.
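The selection bullets above can be sketched in a few lines of Python. This is a minimal illustration, not a platform API: the `Performer` class, the function names, the default language `"en"` and the 80% pass threshold are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Performer:
    # Hypothetical record for one performer; not a real platform object.
    performer_id: str
    languages: set[str]
    exam_answers: dict[str, str]  # task_id -> answer given on the test


def passes_entry_filter(performer: Performer, required_language: str) -> bool:
    """Basic entry filter: keep only performers who know the required language."""
    return required_language in performer.languages


def score_exam(performer: Performer, answer_key: dict[str, str]) -> float:
    """Share of test tasks answered correctly; unanswered tasks count as wrong."""
    if not answer_key:
        return 0.0
    hits = sum(
        1
        for task_id, expected in answer_key.items()
        if performer.exam_answers.get(task_id) == expected
    )
    return hits / len(answer_key)


def select_performers(performers, answer_key, required_language="en", min_accuracy=0.8):
    """Return IDs of performers who pass the filter and the test (assumed threshold)."""
    return [
        p.performer_id
        for p in performers
        if passes_entry_filter(p, required_language)
        and score_exam(p, answer_key) >= min_accuracy
    ]
```

Because the answer key is plain data, the same test set can be reused each time a new batch of performers has to be selected.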
Decide on a price
— Before launching a task, think about how much the performers will earn for their work and
how to motivate them to put in more effort.
— The basic price depends on the time spent on a task and can be increased if the task
demands some extra skill (for example, speaking a rare language).
— Setting the prices high by default does not guarantee you good quality: overpriced projects
attract cheaters instead of motivating performers to work better.
— Spend extra budget on bonuses or special pricing schemes for those who work with
outstanding quality.
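As a rough worked example of the pricing bullets above, the sketch below derives a base price from the time a task takes and adds a bonus for outstanding quality. The target hourly rate, the skill multiplier, the 95% bonus threshold and the 20% bonus rate are assumptions chosen for illustration; the notes do not prescribe specific numbers.

```python
def base_price(seconds_per_task: float, target_hourly_rate: float,
               skill_multiplier: float = 1.0) -> float:
    """Price per task from time spent, raised by a multiplier for extra skill.

    target_hourly_rate is an assumed budget figure, e.g. 6.0 currency units per hour.
    """
    return round(target_hourly_rate * seconds_per_task / 3600 * skill_multiplier, 4)


def payout(tasks_done: int, price_per_task: float, accuracy: float,
           bonus_threshold: float = 0.95, bonus_rate: float = 0.2) -> float:
    """Total payout; performers at or above the threshold get a bonus on top."""
    earned = tasks_done * price_per_task
    if accuracy >= bonus_threshold:
        earned *= 1 + bonus_rate
    return round(earned, 2)


if __name__ == "__main__":
    price = base_price(seconds_per_task=30, target_hourly_rate=6.0)
    print(price)                               # price for a 30-second task
    print(payout(100, price, accuracy=0.97))   # with quality bonus
    print(payout(100, price, accuracy=0.90))   # without bonus
```

Keeping the base price moderate and routing extra budget through the bonus is one way to reward quality without making the project attractive to cheaters.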
Aggregate results
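One common way to aggregate overlapping answers is majority voting. It is shown here as an assumption, since the notes do not name a specific method. The function name and the `(task_id, performer_id, label)` input format are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Aggregate (task_id, performer_id, label) triples by majority vote.

    Returns {task_id: (winning_label, share_of_votes)}.
    On a tie, the label that was seen first for that task wins.
    """
    votes = defaultdict(Counter)
    for task_id, _performer_id, label in labels:
        votes[task_id][label] += 1

    result = {}
    for task_id, counter in votes.items():
        label, count = counter.most_common(1)[0]
        result[task_id] = (label, count / sum(counter.values()))
    return result
```

The share of votes gives a simple confidence signal: tasks where it is low can be sent to more performers or to manual review.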
The principle of decomposition lies at the core of every crowdsourcing project. The goal is to
crystallize the task so that most performers can understand it and complete it well.
2. Such tasks confuse the performers because the definition of “good enough” is too
broad, and the risk of getting noisy data is very high.
3. To avoid these situations, ask performers simple and straightforward questions: was
the customer support response grammatically correct? Was it comprehensive? Was it
friendly?
4. Going through a checklist will make the task much clearer and easier to control in
real time, and, as a result, it will be completed much faster and with much higher
quality.
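The checklist idea can be sketched as follows. The three questions come from the example above. The rule that a response counts as "good enough" only when every check passes is an assumption made for this sketch, and the names `CHECKLIST` and `evaluate_response` are hypothetical.

```python
# The three yes/no questions from the customer-support example.
CHECKLIST = ("grammatically_correct", "comprehensive", "friendly")


def evaluate_response(answers: dict[str, bool]) -> dict:
    """Turn a performer's yes/no checklist answers into an overall verdict.

    Assumption: the response is 'good enough' only if every check passes.
    """
    missing = [q for q in CHECKLIST if q not in answers]
    if missing:
        raise ValueError(f"unanswered checklist questions: {missing}")
    failed = [q for q in CHECKLIST if not answers[q]]
    return {"good_enough": not failed, "failed_criteria": failed}
```

Because each question is recorded separately, a requester can see which criterion fails most often instead of receiving a single opaque rating.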
— The simpler the task, the more performers can understand the instructions and pass the
training.
— Simple task design reduces the number of mistakes.
— Simple tasks are easier to review.
— Decomposed tasks are cheaper because they don’t require too much effort or rare skills
from the performers.
There’s no need to decompose a task when either of two criteria is met: the task instructions fit on half of an
A4 sheet of paper, OR the task can be completed through a single action like choosing from a
few categories.
— Simplify a complex challenge. If you need an answer to a complicated question, try dividing
the question into a set of simpler ones.
— Split a multi-task. If you need to classify some content and also check if it’s adult-only, ask the
corresponding questions one at a time or even in separate tasks.
— Group a multitude of options. If you have more than 10 possible answers to choose from,
group them thematically and then work inside a specific group.
— Divide into several processes. If your task involves complicated quality control mechanisms
and human evaluation, add a post-verification project where performers will check each other.
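The "group a multitude of options" approach can be illustrated with a two-stage choice. This is a sketch under stated assumptions: the function names and the sample category groups are hypothetical, and the limit of 10 options is taken from the bullet above.

```python
def first_stage_options(groups: dict[str, list[str]], max_options: int = 10) -> list[str]:
    """Options shown first: all answers if they fit, otherwise only the group names."""
    flat = [option for options in groups.values() for option in options]
    if len(flat) <= max_options:
        return flat
    return list(groups)


def second_stage_options(groups: dict[str, list[str]], chosen_group: str) -> list[str]:
    """Options shown after the performer has picked a thematic group."""
    return list(groups[chosen_group])


if __name__ == "__main__":
    # Hypothetical example: 11 answers grouped into 3 themes.
    groups = {
        "clothing": ["shirts", "trousers", "dresses", "jackets"],
        "footwear": ["sneakers", "boots", "sandals"],
        "accessories": ["bags", "belts", "hats", "scarves"],
    }
    print(first_stage_options(groups))
    print(second_stage_options(groups, "footwear"))
```

Each stage then fits the "single action" pattern: the performer picks from a short list twice instead of scanning one long list.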
Lecture 1.5. Ways to decompose a task
— a complex task, which is decomposed into several simple ones, can be completed more
efficiently,
— it makes quality control easier for requesters.
If not decomposed: pictures are good, but business details are bad; business details are good, but
pictures are bad.
! Decomposition both simplifies a project's organization and makes it possible to solve
non-trivial cases that seem unsuitable for crowdsourcing at first glance.