Is word-of-mouth correlated to General Election results?
The results are in. Our experiment to identify whether there is a correlation between the volume of Twitter mentions for election candidates and the election results is complete, and it produced the following results:
1. Individual seat predictions were 69% accurate (average sample size = 677 tweets per constituency).
2. Regional party performance predictions were 87.5% accurate (average sample size = 37,000 tweets per region).
3. National share of vote predictions were 90.5% accurate, with an average error of 1.75 points per party, lower than most opinion polls (sample size = 2,010,000 tweets).
From these results we can draw three key insights:
1. There is a correlation between the number of Twitter mentions and a candidate winning the seat.
2. The model is better at predicting national and regional trends than at predicting the outcome of individual local contests.
3. There is a strong correlation between the sample size of tweets analysed and accuracy.
Although the model is in some instances susceptible to disproportionate media activity, it gives accurate insight into trends at a national and regional level.
This leads to the following conclusions:
1. The experiment succeeded in predicting the national vote with accuracy comparable to opinion polls.
2. The data accurately indicated party performance at a regional level.
3. The larger the sample size of tweets, the higher the accuracy of the predictions.
National and regional trends certainly affect local outcomes, yet, given the smaller sample size of tweets at a local level, the way that impact is distributed cannot be fully assessed by measuring buzz alone.
The results present an interesting correlation between Twitter mentions and electoral success, suggesting that social media ‘buzz’ on platforms like Twitter is a good indicator of election performance and of the public mood.
Recap of the predictive modelling experiment
From Tuesday March 30th up until the election, we counted the mentions for candidates on Twitter and modelled predictions for the constituency, regional and national votes based on this data. The aim of the study was to assess whether the frequency of Twitter mentions for candidates could help predict which ones would be successful. The data set covered all 433 constituencies represented on Twitter, i.e. candidates mentioned on Twitter could be attributed to 433 of the 650 UK constituencies. In total, 2,010,000 tweets were processed over the four-week study period.
The full methodology we used can be found at: http://www.scribd.com/doc/29154537/Tweetminster-Predicts
This experiment was not a polling exercise, nor a statistical analysis project. As a result, we do not present a statistical margin of error or standard deviation calculations, nor do we claim statistical significance for the results. However, the results are too accurate to be accounted for by chance or coincidence; they strongly suggest that the accuracy of the predictions is grounds for confirming that the predictive power of Twitter is reliable.
Methodology notes:
• All data was gathered by querying the Twitter API.
• The mathematical methodology is simple addition (e.g. 1 + 1 = 2) of candidate mentions.
• The candidate with the most mentions in each seat was predicted to win that seat.
• National vote shares were calculated by adding up each party's percentage of mentions in the total sample.
• The regions used for the regional breakdowns follow the standard definitions of UK mainland regions.
• The national vote was calculated by taking the percentage breakdown of party mentions (by candidate) in each of the analysed seats and calculating each party's percentage of mentions across the 433 seats on Twitter.
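The counting rules in these notes can be sketched in a few lines of Python. This is a minimal illustration, not Tweetminster's code: the `seat_mentions` structure, the seat and candidate names and all mention counts are hypothetical, and fetching tweets from the Twitter API is left out. It also assumes national shares are computed by pooling raw mention counts across seats, which is one reading of the notes.

```python
from collections import Counter

# Hypothetical per-seat mention counts: seat -> {(candidate, party): mentions}.
# These numbers are invented for illustration; they are not study data.
seat_mentions = {
    "Seat A": {("Candidate 1", "CON"): 420, ("Candidate 2", "LAB"): 310, ("Candidate 3", "LDEM"): 150},
    "Seat B": {("Candidate 4", "CON"): 90, ("Candidate 5", "LAB"): 260, ("Candidate 6", "LDEM"): 330},
}


def predict_winners(seats):
    """Predict each seat's winner as the candidate with the most mentions.

    max() returns the first key with the highest count, so ties go to
    whichever candidate was listed first.
    """
    return {seat: max(counts, key=counts.get) for seat, counts in seats.items()}


def national_shares(seats):
    """Pool mentions by party across all seats and convert to percentages (no weighting)."""
    totals = Counter()
    for counts in seats.values():
        for (_candidate, party), n in counts.items():
            totals[party] += n
    grand_total = sum(totals.values())
    return {party: 100 * n / grand_total for party, n in totals.items()}


winners = predict_winners(seat_mentions)
shares = national_shares(seat_mentions)
print(winners)
print({party: round(share, 1) for party, share in shares.items()})
```

Because no weighting is applied, a seat with many mentions contributes proportionally more to the national shares than a quiet one.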
National party vote share predictions
To predict the top-line national figures, no weighting was applied: we recorded 2,010,000 mentions and counted the mentions for candidates in each of the 433 constituencies analysed, repeating this count each week to include candidates who joined Twitter during the study period. The percentage of mentions for each party in the 433 constituencies gave the following projected share of the vote:
(Actual May 6th figures, with the error in our prediction in brackets)
Party                Predicted   Actual (error)
Conservatives        35%         37% (-2)
Labour               30%         30% (0)
Liberal Democrats    27%         24% (+3)
Others               8%          10% (-2)
This gives the predictive model an average accuracy of 90.5%, or an average error of 1.75 points.
Compared with polling predictions, our experiment was less accurate than ICM (1.25), on a par with Ipsos MORI, Populus and Harris (1.75), and more accurate than YouGov, ComRes and Opinium (2.25) and Angus Reid and TNS BMRB (3.25). (Source: http://ukpollingreport.co.uk/blog/archives/2692)
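The headline figures can be reproduced from the vote share table. The average error is the mean absolute difference in percentage points. The document does not say how the 90.5% accuracy was derived; one reading that reproduces it is 100% minus the mean relative error (each party's error divided by its actual share), and the sketch below assumes that definition. The pollster errors are the figures quoted from UK Polling Report.

```python
# Predicted vs actual national vote shares (%), from the table above.
predicted = {"Conservatives": 35, "Labour": 30, "Liberal Democrats": 27, "Others": 8}
actual = {"Conservatives": 37, "Labour": 30, "Liberal Democrats": 24, "Others": 10}

# Average pollster errors (points) as quoted from UK Polling Report.
pollster_errors = {
    "ICM": 1.25,
    "Ipsos MORI": 1.75, "Populus": 1.75, "Harris": 1.75,
    "YouGov": 2.25, "ComRes": 2.25, "Opinium": 2.25,
    "Angus Reid": 3.25, "TNS BMRB": 3.25,
}

# Absolute error per party, in percentage points.
errors = {party: abs(predicted[party] - actual[party]) for party in actual}

# Mean absolute error across the four parties.
mean_abs_error = sum(errors.values()) / len(errors)

# Assumption: "accuracy" = 100% minus the mean relative error.
mean_rel_error = sum(errors[p] / actual[p] for p in actual) / len(actual)
accuracy_pct = 100 * (1 - mean_rel_error)

# Rank the pollsters against the model's average error.
better = [name for name, e in pollster_errors.items() if e < mean_abs_error]
level = [name for name, e in pollster_errors.items() if e == mean_abs_error]
worse = [name for name, e in pollster_errors.items() if e > mean_abs_error]

print(f"average error: {mean_abs_error:.2f} points")  # average error: 1.75 points
print(f"accuracy: {accuracy_pct:.1f}%")               # accuracy: 90.5%
print("more accurate than the model:", better)
print("level with the model:", level)
print("less accurate than the model:", worse)
```

Under this definition a 2-point miss on a small party (Others) costs far more accuracy than the same miss on a large one, which is worth keeping in mind when comparing the 90.5% figure with the 1.75-point average error.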
During the four weeks of our study, the top-line figures varied as follows:
Conservatives   Labour   Liberal Democrats   Others
34%   36%   35   33%   35%   33%   32%   30%   22%   23   28%   26%   9%   10   7%   9%
35% (nc)   30% (nc)   27% (+1)   8% (-1)
Verdict: The key insight here is that the media buzz around Nick Clegg after the TV leaders’ debates did not translate into actual votes cast, suggesting that media attention stimulated more mentions of Lib Dem candidates than their eventual vote share reflected.
Regional party performance predictions
The seat wins predicted in UK regions allowed us to make the following predictions for party performance:
• SNP not gaining new seats (validated by election result: the SNP made no gains)
• Labour and Liberal Democrats performing well in Scotland (validated by election result: both emerged with their total number of seats intact, bucking the national trend)
• No significant change in Plaid Cymru support (validated by election result: they gained one seat)
• Liberal Democrats to hold ground against the Conservatives in the South West (validated by election result: there was only a 1% LDEM to CON swing in the South West)
• Labour to perform better in London than polls forecast (validated by election result: the LAB to CON swing was 2.5%, compared with 6.1% in the rest of England and 5.03% across the country as a whole)
• Conservatives to perform well in the East Midlands (validated by election result: in the East Midlands the Conservatives gained 12 seats and the LAB to CON swing was 6.7%)
• Conservatives to perform well in Wales (validated by election result: in Wales the Conservatives gained 5 seats and the LAB to CON swing was 5.6%)
• Conservatives to gain a few seats in Scotland (not validated by election result: they didn’t)
Verdict: 87.5% (7 of 8) of the predicted regional trends were accurate, suggesting Twitter mentions give good insight into UK regions. For each of these regional predictions we analysed an average of 37,000 tweets per region. This suggests the sample size for these predictions was large enough to give accurate insight into trends missed by some opinion polls. It also challenges the perception that Twitter is primarily a London-centric platform with significantly less relevance to the rest of the UK.
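The 87.5% regional figure is the share of the eight regional predictions listed above that the election result validated. A short Python tally makes the arithmetic explicit; the prediction labels are abbreviated from the list.

```python
# The eight regional predictions and whether the election result validated each one.
regional_predictions = [
    ("SNP not gaining new seats", True),
    ("Labour and Liberal Democrats performing well in Scotland", True),
    ("No significant change in Plaid Cymru support", True),
    ("Liberal Democrats to hold ground against the Conservatives in the South West", True),
    ("Labour to perform better in London than polls forecast", True),
    ("Conservatives to perform well in the East Midlands", True),
    ("Conservatives to perform well in Wales", True),
    ("Conservatives to gain a few seats in Scotland", False),
]

# Count validated predictions and express them as a percentage of all predictions.
validated = sum(1 for _, ok in regional_predictions if ok)
accuracy_pct = 100 * validated / len(regional_predictions)
print(f"{validated}/{len(regional_predictions)} validated = {accuracy_pct}%")  # 7/8 validated = 87.5%
```

With only eight predictions, each one moves the headline figure by 12.5 points.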
Constituency-level predictions
For constituency-by-constituency predictions, the most mentioned candidate in each constituency was predicted to win the seat. Some filtering of the sample was necessary so that seats were predicted from consistent sets of data and to reduce errors due to unequal representation. Specifically:
1. Number of mentions in seats where a candidate from each of the 3 major parties (Lib Dem, Con, Lab) was represented (128 seats)
2. Number of mentions in seats where at least one candidate from any of the three major parties was mentioned (367 seats)
The results showed:
1. In 69% of seats where each of the main parties had a candidate on Twitter, the most mentioned candidate won (128 seats, average sample size 677 mentions).
2. In 55% of seats where at least one candidate from any of the 3 major parties was on Twitter, the most mentioned candidate won (367 seats, average sample size 313).
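The two-tier grouping and the "most mentioned candidate won" hit rate can be sketched as follows. This is an illustrative reading of the grouping rules, not the study's code: the `Seat` record, its fields and the three sample seats are hypothetical, and group membership is decided by which major parties had at least one mentioned candidate.

```python
from dataclasses import dataclass

MAJOR_PARTIES = {"CON", "LAB", "LDEM"}


@dataclass
class Seat:
    name: str
    mentions: dict[str, int]  # party -> mentions of that party's candidate (hypothetical)
    actual_winner: str        # party that actually won the seat


def predicted_winner(seat: Seat) -> str:
    """The party of the most-mentioned candidate is the predicted winner."""
    return max(seat.mentions, key=seat.mentions.get)


def mentioned_parties(seat: Seat) -> set[str]:
    """Parties whose candidate received at least one mention."""
    return {party for party, n in seat.mentions.items() if n > 0}


def all_three_major(seat: Seat) -> bool:
    """Group 1: a candidate from each of the three major parties was mentioned."""
    return MAJOR_PARTIES <= mentioned_parties(seat)


def any_major(seat: Seat) -> bool:
    """Group 2: at least one major-party candidate was mentioned."""
    return bool(MAJOR_PARTIES & mentioned_parties(seat))


def hit_rate(seats: list[Seat]) -> float:
    """Share of seats where the most-mentioned candidate actually won."""
    hits = sum(predicted_winner(s) == s.actual_winner for s in seats)
    return hits / len(seats)


# Invented example seats, not study data.
seats = [
    Seat("Seat A", {"CON": 400, "LAB": 250, "LDEM": 120}, "CON"),
    Seat("Seat B", {"CON": 80, "LAB": 300, "LDEM": 310}, "LAB"),
    Seat("Seat C", {"LAB": 45}, "CON"),
]

group_1 = [s for s in seats if all_three_major(s)]
group_2 = [s for s in seats if any_major(s)]
print(len(group_1), hit_rate(group_1))  # 2 0.5
print(len(group_2), hit_rate(group_2))
```

As in the study, group 2 contains group 1, so a seat like "Seat C" (only one candidate mentioned, small sample) can only pull the broader group's accuracy down relative to the filtered one.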
Verdict: Seats with only one candidate mentioned on Twitter are harder to predict than seats with two or more candidates mentioned. The difficulty also stems from the smaller sample size of tweets in those seats. Pooling seats with different numbers of mentioned candidates reduces the average accuracy of the predictions, because the less representative samples blur the results. It is advisable to filter seats into groups with similar levels of representation to increase the accuracy of seat-by-seat predictions.
• The accurate prediction of Caroline Lucas winning in Brighton Pavilion demonstrates that in seats where most candidates are on Twitter and the sample size is significant, the model is more accurate.
• The incorrect prediction of Esther Rantzen winning in Luton South shows that the model is susceptible to media frenzies that generate considerable buzz online.
Conclusions
The accuracy of the national vote share and regional party performance predictions suggests that there is a strong correlation between online buzz (candidate mentions) and party performance. This conclusion is backed up by the fact that the Twitter model's predictions closely resembled opinion poll forecasts and the actual votes cast on May 6th. It would be extremely unlikely for these numbers to match both the forecasts and the actual result by coincidence, and they appear to demonstrate the ‘wisdom of crowds’.
The results also strongly suggest that the demographic make-up and political preferences of Twitter users are not necessarily a significant factor when predicting national and regional trends from large samples of data mined from Twitter posts. The accuracy of the predictions in the Twitter experiment was similar to (and in some cases better than) that of demographically weighted opinion polls. This supports the case that measurements made through data mining in social media channels can be as reliable as traditional opinion polling techniques when the sample size is sufficiently large.
This study makes a robust argument that data such as the volume of posts, the reach of messages through Retweets and the influence of individual Twitter users within the sample are insightful indicators of public opinion and behaviour.
The results also clearly demonstrate that predictions made from small samples are susceptible to disproportionate media activity; an extreme case of a skewed prediction was Esther Rantzen’s candidacy in Luton South. Because this type of skewing involves such small numbers, it doesn’t affect the conclusions drawn from large samples (i.e. the national and regional predictions). The accuracy of the predictions would be improved by using human insight to filter out such anomalies, because they are easy to spot.
While national and regional trends obviously affect local realities to some extent, our assessment of constituency-level predictions shows that a geographically focused sample representing all candidates would give significantly more accurate constituency-level forecasts.
The findings of the study are therefore similar to those of an HP experiment that predicted box office success with 97.3% accuracy (http://www.fastcompany.com/1604125/twitter-predicts-box-office-sales-better-than-anything-else). Based on our experimental results, however, the same methodology would find it harder to predict a film’s income city by city, but would have greater success predicting performance on a regional basis with sample sizes in the tens of thousands.
This kind of large-sample predictive model can accurately predict the success of multiple parties at a national and regional level. At a local level, with small samples, the predictions were 55% to 69% accurate; however, the accuracy of predicting the overall result of the UK election was very high (90.5%). We can therefore conclude that the larger the sample size, the more accurate the predictive power of Twitter analysis, and that, when measured using the frequency-of-mentions methodology in our predictive modelling experiment, it is unaffected by demographics.
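The sample-size claim can be checked against the four accuracy figures the study reports. The sketch below computes a Spearman rank correlation by hand (the Pearson correlation of the ranks, with no tie handling because none of the values tie). With only four points, each measured with a different accuracy metric, this illustrates the monotonic trend rather than testing it; the study itself claims no statistical significance.

```python
from math import sqrt

# (average sample size, reported accuracy %) for the study's four levels of prediction.
levels = [
    (313, 55.0),        # seats with at least one major-party candidate mentioned (367 seats)
    (677, 69.0),        # seats with all three major parties mentioned (128 seats)
    (37_000, 87.5),     # regional predictions
    (2_010_000, 90.5),  # national vote share
]


def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


sizes, accuracies = zip(*levels)
rho = pearson(ranks(sizes), ranks(accuracies))
print(f"Spearman rho = {rho:.3f}")
```

Any strictly increasing pairing of four points gives a rank correlation of 1, so the result only confirms that the ordering of sample sizes matches the ordering of accuracies.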
About Tweetminster
Established in December 2008, Tweetminster is a media utility that aims to make UK politics more open and social. You can use Tweetminster to:
• Find and follow MPs and PPCs on Twitter: http://tweetminster.co.uk/
• Access curated lists of relevant news, commentary and politicians: http://twitter.com/tweetminster
• Measure the pulse of UK politics in real time: dynamically analyse and make sense of information and data around political conversations and news stories: http://search.tweetminster.co.uk/pages/about
Find out more: www.tweetminster.co.uk
Follow us on Twitter: www.twitter.com/tweetminster