
Week 1.

Introduction to crowdsourcing

Contents
Lecture 1.1. What is crowdsourcing?
Lecture 1.2. Crowdsourcing in ML
Lecture 1.3. Basic steps of launching a crowdsourcing project
Lecture 1.4. Project decomposition and why we need it
Lecture 1.5. Ways to decompose a task

Lecture 1.1. What is crowdsourcing?

Examples of crowdsourcing projects:


— Wikipedia, a massive multilingual encyclopedia written by a huge number of authors.
— NASA’s projects on analyzing the Earth's light pollution and discovering new objects in outer
space, in which millions of photos taken in outer space are classified or annotated by
large numbers of regular people.

Crowdsourcing can be applied in any field where a complex problem needs to be solved,
including astrophysics. The fastest-growing area for crowdsourcing is AI and machine
learning.

Examples of tasks that can be solved with crowdsourcing (quickly, every day):

1. Search engine development. Assess how hundreds of thousands of sites match user
queries.
2. Navigation app development. Find which areas are open to traffic and which are blocked off.
3. Taxi service development. Check if drivers cleaned their cars before driving out to serve
passengers.

Crowdsourcing is... (in the scope of this course):

— a special way of organizing business processes,


a complex task is broken down into multiple micro-tasks, and then those micro-tasks are
completed by thousands of individuals.

— a new style of management,


thousands of performers each complete a small piece of a task, instead of a limited number of
experts doing routine tasks for 8 hours per day.

— an engineering problem,
the final result and the quality of data depend on the quality of the process itself rather than on the
competence of any single individual.

Crowdsourcing brings new challenges:

— How to ensure quality?


— How to choose performers?
— How to make the most of a budget?

Solution. Break these challenges down into clear scenarios and master these scenarios. Then
crowdsourcing management turns into a clear methodology, and managers turn into crowd solutions
architects (CSA).

Skills a CSA should possess:

— ability to design an environment suitable for any performer,


— ability to ensure the quality of the received data,
— ability to create instructions and check if performers read them carefully,
— ability to stay within budget and get the most out of it.

Crowdsourcing platforms:

Platforms that offer ready-made solutions for ML + support for requesters:


— Scale AI
— Hive Data
— Alegion

Open crowdsourcing platforms (crowd marketplace):


— Amazon Mechanical Turk
— Tolóka

Crowd marketplace: 1. Performers offer their time and skill. 2. Platform offers a meeting point
and useful tools. 3. Requesters offer tasks and earnings.

A well-designed task can be scaled very quickly. An annotation can be quickly stopped if it's no
longer necessary: since open crowdsourcing platforms are based on the marketplace principle,
performers will simply switch to other projects.

What is the power of the crowd?


— available 24/7,
— various countries and various languages,
— vast collective experience,
— various services,
— scaled and stopped when necessary.

The benefits of crowdsourcing:


— speed,
— scalability,
— variability.

Lecture 1.2. Crowdsourcing in ML

ML is based on three things:

— big computing power,


— high-level algorithms,
— datasets for training.

The greater the speed and volume with which a machine learning team can collect data, the
faster they can train algorithms and roll out new technologies.

Example 1. Search engine rankings


Ranking algorithms of search engines require ongoing human evaluations.

1. A ranking algorithm is trained using a data pool and then is implemented for real
searches.
2. At the same time, new versions of algorithms are created to replace the current one.
3. Developers need to know: the quality of the current model's output and the quality of
the new algorithms.
4. Developers use crowdsourcing to evaluate a certain set of search queries and matching
results to get the ratings of sites that appear in search results.
5. The ratings are used to construct quality metrics for the algorithms and to compare
them.
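
As a rough illustration of step 5, crowd ratings can be averaged into a simple quality metric for each algorithm. The sketch below assumes a hypothetical 0 to 3 relevance scale and uses the mean rating of the top-k results as the metric. Neither the scale, the metric, nor the sample data comes from the lecture, and a real search team may well use a different metric.

```python
from statistics import mean

def mean_rating_at_k(ratings_per_query, k=3):
    """Average crowd rating of the top-k results, averaged over queries.

    ratings_per_query: dict mapping a query to the list of crowd ratings
    for the results an algorithm returned, in ranked order.
    The 0-3 relevance scale is an assumption made for this example.
    """
    per_query = [mean(ratings[:k]) for ratings in ratings_per_query.values()]
    return mean(per_query)

# Hypothetical crowd ratings for the same queries under two algorithms.
current_model = {"buy laptop": [3, 1, 2], "weather": [2, 2, 0]}
new_model = {"buy laptop": [3, 3, 1], "weather": [3, 2, 1]}

current_score = mean_rating_at_k(current_model)
new_score = mean_rating_at_k(new_model)
better = "new" if new_score > current_score else "current"
```

With these made-up ratings the new algorithm scores higher, which is the kind of comparison the ratings are collected for.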

Human evaluation for a ranking algorithm:


— annotating the training set,
— evaluating the quality of the current model,
— comparing the quality of new models.

Example 2. Human-in-the-loop moderation


Crowdsourcing helps when a machine can't classify a comment and needs human help.

1. There is a service that requires comment moderation.
2. All comments are given to an automatic classifier that makes the right decision in 95% of cases.
3. The difficult cases are given to a group of performers who are familiar with the rules of
moderation.
4. The final verdict is the one that is approved by the majority of performers.
5. Super-complex cases are submitted for moderation by a super-expert.
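
The routing in steps 2 to 5 can be sketched as a small function. This is only an illustration: the confidence cutoff, the agreement threshold used to decide when a case counts as super-complex, and the function name `moderate` are all assumptions, not something the lecture specifies.

```python
from collections import Counter

CONFIDENCE_THRESHOLD = 0.9  # assumed cutoff for "the classifier is sure"
AGREEMENT_THRESHOLD = 0.8   # assumed share of votes needed to skip the expert

def moderate(classifier_label, classifier_confidence, crowd_votes):
    """Return (verdict, source) for one comment.

    crowd_votes: labels from performers; only consulted for difficult
    cases and must be non-empty when it is consulted.
    """
    # Easy case: the automatic classifier is confident enough.
    if classifier_confidence >= CONFIDENCE_THRESHOLD:
        return classifier_label, "classifier"

    # Difficult case: take the label approved by the majority of performers.
    label, count = Counter(crowd_votes).most_common(1)[0]
    if count / len(crowd_votes) >= AGREEMENT_THRESHOLD:
        return label, "crowd"

    # Super-complex case: performers disagree too much, escalate to an expert.
    return None, "expert"

easy = moderate("ok", 0.97, [])
difficult = moderate("ok", 0.6, ["spam", "spam", "spam", "spam", "ok"])
super_complex = moderate("ok", 0.55, ["spam", "ok", "spam", "ok", "ok"])
```

Here the first comment is decided by the classifier, the second by the crowd, and the third is left undecided for the expert.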

More examples of areas to use crowdsourcing for ML:


— self-driving cars,
— voice assistants,
— machine translation,
— online maps and navigation,
— and many other AI products.

AI products depend on data labeled by a human either to train algorithms or to validate them. As
AI continues to gain influence, specialists will need more and more data from humans. And the
better the algorithms get, the more new data we'll need to pour into them to get a noticeable
improvement in quality.

Other examples of areas to use crowdsourcing, not only for ML:


— digitizing document data,
— having offline performers collect data on physical objects in cities,
— running UX testing.

Lecture 1.3. Basic steps of launching a crowdsourcing project

The quality of a project doesn’t depend only on the crowd; it is also the responsibility of the
requester. If the instructions are unclear, the crowd cannot deliver good quality. Our goal is
to make the instructions as clear and as concise as possible, so the design of the crowd project
does matter.

There are six necessary steps for launching a crowd-based project that will give you reliable
data:

1. Decomposing a task
2. Writing clear instructions and designing an interface
3. Selecting and training performers
4. Setting robust quality controls
5. Deciding on a price
6. Implementing results aggregation

Decompose a task

— Break up a task into several smaller ones that are easier to complete.
— A simple task attracts more performers who are a good fit for the project.
— Simpler tasks are cheaper because they require less effort and no special knowledge from
the performers.
— Simpler tasks allow requesters to use basic quality control settings that work well out of the
box.

Develop interfaces and instructions

— Decompose a task properly, so you won’t need to write complicated instructions or set up
intricate interfaces.
— Write concise, clear instructions and develop convenient task interfaces.
— Use templates for your interfaces.
— Both instructions and interfaces should serve the performers and should help them to
complete your tasks well.

Select and train the performers

— Use basic entry filters: for example, a language filter or an age filter.
— Prepare a training or a test for performers in order to check they’ve read and understood
instructions and examples.
— Reuse the same training sets to select additional performers whenever you need them.

Establish quality control


— Use quality control mechanisms to limit the access of dishonest performers, performers who
aren’t paying attention, and performers who don't understand the task for some reason.
Examples of such mechanisms are control tasks, overlap, CAPTCHA, post-verification,
and fast-answer checks.
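
Two of these mechanisms, control tasks and the fast-answer check, can be combined into a simple rule. The following is a minimal sketch under stated assumptions: the function name `should_restrict` and all thresholds are hypothetical, and the values are illustrative rather than recommended.

```python
def should_restrict(control_answers, response_times_sec,
                    min_accuracy=0.8, min_seconds=3.0, max_fast_share=0.5):
    """Decide whether to limit a performer's access to the project.

    control_answers: list of (given_answer, correct_answer) pairs for
        control tasks, i.e. tasks with a known correct answer.
    response_times_sec: seconds the performer spent on each task.
    All thresholds are illustrative assumptions.
    """
    # Control tasks: share of known-answer tasks the performer got right.
    correct = sum(1 for given, expected in control_answers if given == expected)
    accuracy = correct / len(control_answers)

    # Fast-answer check: share of tasks submitted suspiciously quickly.
    fast = sum(1 for t in response_times_sec if t < min_seconds)
    fast_share = fast / len(response_times_sec)

    return accuracy < min_accuracy or fast_share > max_fast_share

careful = should_restrict(
    [("cat", "cat"), ("dog", "dog"), ("cat", "cat")], [12.0, 9.5, 15.1])
careless = should_restrict(
    [("cat", "dog"), ("cat", "cat")], [8.0, 7.5])
clicker = should_restrict(
    [("cat", "cat"), ("dog", "dog")], [1.0, 1.2, 0.9, 10.0])
```

Only the first performer keeps access: the second fails the control tasks, and the third answers most tasks faster than the assumed minimum time.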

Design a pricing scheme

— Before launching a task, think about how much the performers will get for their work, and
how to motivate them to give you more effort.
— The basic price depends on the time spent on a task and can be increased if the task
demands some extra skill (for example, speaking a rare language).
— Setting the prices high by default does not guarantee you good quality: overpriced projects
attract cheaters instead of motivating performers to work better.
— Spend extra budget on bonuses or special pricing schemes for those who work with
outstanding quality.
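
The arithmetic behind such a scheme can be sketched as follows. The hourly rate, skill multiplier, bonus threshold, and bonus rate are all made-up numbers for illustration, and the function names are hypothetical.

```python
def task_price(seconds_per_task, hourly_rate, skill_multiplier=1.0):
    """Base price for one task, from the time it takes and a target hourly rate.

    skill_multiplier > 1 models a task that needs an extra skill,
    such as speaking a rare language.
    """
    return round(seconds_per_task / 3600 * hourly_rate * skill_multiplier, 4)

def payout(tasks_done, price, accuracy, bonus_threshold=0.95, bonus_rate=0.2):
    """Total payout, with a bonus only for outstanding quality."""
    base = tasks_done * price
    bonus = base * bonus_rate if accuracy >= bonus_threshold else 0.0
    return round(base + bonus, 2)

# A 30-second task at an assumed target rate of 6.00 per hour.
basic = task_price(30, 6.0)
rare_language = task_price(30, 6.0, skill_multiplier=1.5)

top_performer = payout(100, basic, accuracy=0.97)
average_performer = payout(100, basic, accuracy=0.90)
```

Keeping the base price tied to time and putting the extra budget into the bonus follows the idea above: paying more only where quality is demonstrated.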

Aggregate results

— Aggregate results in a way that provides the best quality.


— Use aggregation techniques such as “majority vote”, where the answer that appears most
often becomes your final result.
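
Majority vote is simple enough to sketch in a few lines. The example assumes each task was given to three performers; the function name and sample labels are illustrative.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent answer and the share of performers who gave it."""
    counts = Counter(answers)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(answers)

# Three performers labeled the same image.
label, agreement = majority_vote(["cat", "cat", "dog"])
```

In this sketch a tie is resolved in favor of the answer that was seen first, which is an arbitrary choice; a real project might instead send tied tasks to additional performers.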

Lecture 1.4. Project decomposition and why we need it

Project decomposition in real life:


— IKEA furniture. IKEA customers are able to easily put together a chair or a table by
themselves. The reason is that the process is broken down into a series of simple steps, and
each step is described in the manual.

The principle of decomposition lies at the core of every crowdsourcing project. The goal is to
crystallize the task so that most performers can understand it and complete it well.

Example. Project decomposition in crowdsourcing

1. A project consists of screenshots of a customer support service’s conversation with a
client. Performers are asked whether the response of the customer support was good
enough.

2. Such tasks confuse the performers because the definition of “good enough” is too
broad, and the risk of getting noisy data is very high.

3. To avoid these situations, ask performers simple and straightforward questions: was
the customer support response grammatically correct? Was it comprehensive? Was it
friendly?

4. Going through a checklist makes the task much clearer and easier to control in
real time, and, as a result, it is completed much faster and with much higher
quality.
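
The checklist from step 3 can be represented as a set of yes/no questions whose answers are aggregated per question. This is a sketch under assumptions: the question keys, the three-performer sample data, and the rule that a response is good only if every check passes are illustrative choices, not part of the lecture.

```python
from collections import Counter

CHECKLIST = ["grammatically_correct", "comprehensive", "friendly"]

def evaluate_response(performer_answers):
    """Aggregate yes/no checklist answers from several performers.

    performer_answers: list of dicts, one per performer, mapping each
    checklist question to True or False.
    Returns the majority answer per question and an overall verdict.
    """
    per_question = {}
    for question in CHECKLIST:
        votes = Counter(answers[question] for answers in performer_answers)
        per_question[question] = votes[True] > votes[False]
    # Assumed rule: the response is good only if every check passes.
    return per_question, all(per_question.values())

answers = [
    {"grammatically_correct": True, "comprehensive": True, "friendly": False},
    {"grammatically_correct": True, "comprehensive": False, "friendly": False},
    {"grammatically_correct": True, "comprehensive": True, "friendly": True},
]
per_question, good_enough = evaluate_response(answers)
```

Unlike a single "good enough" label, the per-question result shows exactly which criterion the response failed, here friendliness.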

The reasons why a task should be decomposed

— The simpler the task, the more performers can understand the instructions and pass the
training.
— Simple task design reduces the number of mistakes.
— Simple tasks are easier to review.
— Decomposed tasks are cheaper because they don’t require too much effort or rare skills
from the performers.

There’s no need to decompose a task when either of two criteria is met: the task instructions fit on
half of an A4 sheet of paper, OR the task can be completed through a single action like choosing
from a few categories.

In all other cases, designing a project involves several steps:


1. Define your goals and objectives.
2. Think about the design of the pipeline and the steps which are absolutely necessary to
get the result.
3. Decompose steps that are too complicated.
4. Go back to step 2 and do another round if necessary.

Basic decomposition scenarios

— Simplify a complex challenge. If you need an answer to a complicated question, try dividing
the question into a set of simpler ones.

— Split a multi-task. If you need to classify some content and also check if it’s adult-only, ask the
corresponding questions one after another or even in separate tasks.

— Group a multitude of options. If you have more than 10 possible answers to choose from,
group them thematically and then work inside a specific group.

— Divide into several processes. If your task involves complicated quality control mechanisms
and human evaluation, add a post-verification project where performers check each other's work.
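
The "group a multitude of options" scenario can be sketched as a two-stage task. The taxonomy below is entirely hypothetical: 12 answer options are grouped into 3 themes, so a performer first picks a group and then picks a category inside it.

```python
# Hypothetical taxonomy: 12 answer options grouped into 3 themes.
TAXONOMY = {
    "clothing": ["shirts", "trousers", "shoes", "hats"],
    "electronics": ["phones", "laptops", "cameras", "headphones"],
    "home": ["furniture", "kitchenware", "bedding", "lighting"],
}

def first_stage_options():
    """Options for the first task: pick a thematic group."""
    return list(TAXONOMY)

def second_stage_options(group):
    """Options for the follow-up task: pick a category within the chosen group."""
    return TAXONOMY[group]

total_options = sum(len(options) for options in TAXONOMY.values())
largest_task = max(len(first_stage_options()),
                   *(len(options) for options in TAXONOMY.values()))
```

In this example no single task shows more than 4 options, even though there are 12 possible answers overall.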

Lecture 1.5. Ways to decompose a task

Reasons for decomposition:

— a complex task, which is decomposed into several simple ones, can be completed more
efficiently,
— it makes quality control easier for requesters.

Example 1. Collecting offline data for a map service

If not decomposed: some submissions have good pictures but bad business details, and others have
good business details but bad pictures.

Example 2. Outlining objects for computer vision


If not decomposed: some pictures don’t contain necessary objects, standard quality checks don’t
work.

Example 3. Writing ads for online marketing campaigns

If not decomposed: you get snippets that do not meet your criteria.

! Decomposition simplifies a project's organization and also makes it possible to solve
non-trivial cases that seem unsuitable for crowdsourcing at first glance.
