A Laboratory Evaluation of Six Electronic Voting Machines

Fred Conrad
University of Michigan

Multi-institution, Multi-disciplinary Project
University of Michigan: Frederick Conrad, Emilia Peytcheva, Michael Traugott
University of Maryland: Paul Herrnson, Ben Bederson
Georgetown University: Michael Hanmer
University of Rochester: Richard Niemi

Agenda

The problem:
Usability can affect election outcomes!

Method:
Anything unique about what we did?

Some results:
Satisfaction
Performance

Implications

Acknowledgements

Wil Dijkstra, Ralph Franklin, Brian Lewis, Esther Park, Roma Sharma, Dale Vieriegge

National Science Foundation:
Grant IIS-0306698

Survey Research Center, Institute for Social Research, University of Michigan

Partners:
Federal Election Commission (FEC), Maryland State Board of Elections, National Institute of Standards and Technology (NIST)

Vendors:
Diebold, Hart InterCivic, ES&S, NEDAP, Avante
Note: Sequoia declined invitation to participate

Scope and limits of current work

Today’s talk presents a small-scale study designed to demonstrate potential challenges and inform future work
It does not address system accuracy, affordability, accessibility, durability, or ballot design
The voting systems tested were those available when the study was conducted; some machines may have been deployed with different options, and some may since have been updated

Voter intent and e-voting

Hanging chads in Florida in 2000 came to symbolize ambiguity about voter intent
E-voting (e.g., touch screen user interfaces) can eliminate this kind of ambiguity

With e-voting, there is no uncertainty about whether a vote is recorded
Though whether or not a voter actually pressed a button on a touch screen can still be ambiguous

E-voting may introduce usability problems that threaten the credibility of voting tallies

Usability ≠ Security

Much of the e-voting controversy surrounds security

Are the systems vulnerable to systematic, widespread fraud?

We propose that usability is at least as serious a threat to the integrity of elections

Are voters ever unable to enact their intentions because of how the user interface is designed?
Are they ever discouraged by the experience?

Procuring e-voting systems may depend on usability, security and cost, among other criteria

Usability is only one characteristic of overall performance

Our focus on usability is not intended to suggest that other dimensions of system performance are unimportant; we are simply focusing on usability
Other dimensions: Accuracy, Accessibility*, Affordability, Durability, Security, Transportability

*we did not test with disabled users

Some Hypotheses

Voters will make more errors

If they have limited computer experience

unfamiliar with interface and input conventions: scroll bars, check boxes, focus of attention, keyboard

For some voting tasks than others
e.g., writing-in votes, changing votes

Voters will be less satisfied

the more effort required to vote

e.g. more actions like touching the touch screen

Current Project

Examines usability of 6 e-voting systems

5 commercial products (used in 2004)
1 research prototype

Field (n ≈ 1500) and laboratory (n = 42)

Breadth vs. depth

Focus today on laboratory study

The machines

Selected to represent specific features
Vendors (with the exception of NEDAP) implemented ballots for best presentation
Photos that follow were taken by our research group, not provided by vendors

Avante Vote Trakker
Image removed to reduce size of file; contact author for complete presentation

Diebold AccuVote TS
Image removed to reduce size of file; contact author for complete presentation

ES&S Optical Scan
Image removed to reduce size of file; contact author for complete presentation

Hart InterCivic eSlate
Image removed to reduce size of file; contact author for complete presentation

NEDAP LibertyVote
Image removed to reduce size of file; contact author for complete presentation

UMD Zoomable System
www.cs.umd.edu/~bederson/voting

Image removed to reduce size of file; contact author for complete presentation

General approach (lab and field)

Before voting, users indicate intentions by circling choices in each contest

In some contests, instructed how to vote

All users asked to vote on all 6 machines

with one of two ballot designs:
Office Block
Straight Party option

in 1 of 6 random orders (Latin Square)
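
To make the counterbalancing concrete, here is a minimal sketch in Python. The participant IDs and the simple cyclic construction are hypothetical illustrations; the study's actual randomization procedure is not described here and may have differed.

```python
# A minimal sketch (not the project's actual code) of the counterbalancing idea:
# six machine orders forming a Latin square, so every machine appears once in
# every serial position, with each (hypothetical) participant assigned one order.

MACHINES = ["Avante", "Diebold", "ES&S", "Hart InterCivic", "NEDAP", "Zoomable"]

def latin_square_orders(items):
    """Row i is the item list rotated left by i positions (a cyclic Latin square)."""
    n = len(items)
    return [[items[(i + j) % n] for j in range(n)] for i in range(n)]

orders = latin_square_orders(MACHINES)

# Cycle through the six orders across the 42 lab participants so each order
# is used roughly equally often (IDs are made up for illustration).
participants = [f"voter_{k:02d}" for k in range(1, 43)]
assignment = {p: orders[i % len(orders)] for i, p in enumerate(participants)}

print(assignment["voter_01"])  # ['Avante', 'Diebold', 'ES&S', 'Hart InterCivic', 'NEDAP', 'Zoomable']
```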

General approach (cont’d)

Tasks:
change a vote
write in a vote
abstain (undervote) in one contest
two contests required voting for 2 candidates

Users complete satisfaction questionnaire after each machine

Lab Study Design
                       Ballot Design
Computer Experience    Office Block    Straight Party
Low                    21              10
High*                  9               2

n = number of voters
* computer use more than twice a week

Lab Study: Design and Procedure
 42
 

people recruited via newspaper ads

31 with limited computer experience 29 over 50 years old

Why did we oversample older users with little computer experience?

Because e-voting systems must be usable by anyone who wants to vote
If anyone is unable to enact their intentions because of the user interface, the technology is failing
We wanted to focus, in our small sample, on those people most likely to have problems

More about users

Visited lab in Ann Arbor, MI in July and August, 2004

paid $50 for 2 hours

Previously voted in an election
95% reported voting previously
7% reported using touch screens when they voted

Prior voting experience

Paper: 43%
Punch card: 69%
Lever machine: 48%
Dials and Knobs: 19%
Touch screen: 7%

Design and Procedure (cont’d)

All machines in a single large room
2 video cameras on rolling tripods (1 per 3 machines)
Proprietary designs ruled out use of direct screen capture (e.g., scan converter or Morae)

Satisfaction Results

Preview:
Left-most bar in each chart: Diebold
Right-most bar in each chart: Hart
Consistent with data from field study (n ≈ 1500)

Provides face validity for lab results with small sample

“The voting system was easy to use”
(1= Strongly Disagree, 7 = Strongly Agree)

Bar chart: Agreement (1–7) by Machine (Diebold, ES&S, Zoomable, Avante, NEDAP, Hart)

“I felt comfortable using the system”
(1= Strongly Disagree, 7 = Strongly Agree)

Bar chart: Agreement (1–7) by Machine (Diebold, ES&S, Zoomable, Avante, NEDAP, Hart)

“Correcting my mistakes was easy”
(1= Strongly Disagree, 7 = Strongly Agree)

Bar chart: Agreement (1–7) by Machine (Diebold, ES&S, Zoomable, Avante, NEDAP, Hart)

“Casting a write-in vote was easy to do”
(1= Strongly Disagree, 7 = Strongly Agree)

Bar chart: Agreement (1–7) by Machine (Diebold, ES&S, Zoomable, Avante, NEDAP, Hart)

“Changing a vote was easy to do”
(1= Strongly Disagree, 7 = Strongly Agree)

Bar chart: Agreement (1–7) by Machine (Diebold, ES&S, Zoomable, Avante, NEDAP, Hart)

Why the differences in satisfaction?

We believe the answer lies in the details of the interaction
Thus, we focus on the subset of voters who used these two machines with:

Office block ballot
Limited computer experience
n = 21
Represents 20% of (what we project will be) 13,000 codable behaviors

Focus on subgroup of users
                       Ballot Design
Computer Experience    Office Block    Straight Party
Low                    21              10
High                   9               2

n = number of voters
Focal subgroup: Low computer experience, Office Block ballot (n = 21)

Coding the Video

Image removed to reduce size of file; contact author for complete presentation

Coding the Video (2)

Image removed to reduce size of file; contact author for complete presentation

Sequential analysis

Goal is to identify and count event patterns

Order is critical because each event provides context for events that follow and precede it
E.g., trouble changing votes when original vote must be deselected:
How many times did voters press a new candidate without first deselecting?
How often did they do this before consulting Help?
How often did they do this after consulting Help?
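
To illustrate the kind of pattern counting involved, here is a minimal sketch in Python. The event codes ("select_X", "deselect_X", "consult_help") and the assumption that the machine ignores a press on a second candidate while another is checked are hypothetical, not the project's actual coding scheme.

```python
# Count, per voter, how often a new candidate was pressed before the original
# selection was deselected, split by whether Help had been consulted yet.
from collections import Counter

def count_premature_presses(events):
    """events: one voter's ordered event codes, e.g.
    ['select_A', 'select_B', 'consult_help', 'deselect_A', 'select_B']"""
    counts = Counter()
    selected = None      # candidate currently checked in this contest
    help_seen = False
    for ev in events:
        if ev == "consult_help":
            help_seen = True
        elif ev.startswith("deselect_"):
            selected = None
        elif ev.startswith("select_"):
            name = ev[len("select_"):]
            if selected is not None and selected != name:
                # new candidate pressed without first deselecting the old one;
                # assume the machine rejects the press, so the old choice stands
                counts["after_help" if help_seen else "before_help"] += 1
            else:
                selected = name
    return counts

print(count_premature_presses(
    ["select_A", "select_B", "consult_help", "deselect_A", "select_B"]))
# Counter({'before_help': 1})
```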

Tree analysis example

Number of Actions

For every touch screen action there are two actions with the rotary wheel
Touch screen: press screen with finger
Rotary wheel: move wheel and press “Enter”

Empirically, people take proportionally more actions with Hart
Diebold: 1.89 actions per task
Hart: 3.92 actions per task

Number of Actions
Bar chart: Number of Actions (0–14) by contest for Diebold and Hart, grouped by Voting Task (Getting started, Change vote, Write-in)

Duration

Voting duration (mins) varied substantially by machine
Diebold: 4.68 (sd = 1.27)
Hart: 10.56 (sd = 4.53)

Presumably due to larger number of actions in Hart than Diebold
And possibly more thorough ballot review

Accuracy

Varies by Machine and Voting Task:

2 Candidates (State Representative)
Inaccurate enough for concern
Errors of omission: just voted for one candidate

Write-In (Member Library Board)
Quite inaccurate for Hart
Errors of commission: name spelled wrong
Errors of omission: no write-in vote (in the end)

Changing Vote (Probate Court Judge)
Overall accurate but slightly less accurate for Diebold
Error of commission: unintended candidate remains selected

Voting Accuracy
Bar chart: Proportion Correct (0.5–1.0) by contest for Diebold and Hart, grouped by Voting Task (2 Candidates, Change vote, Write-in)

Number of Actions: Getting Started

Image removed to reduce size of file; contact author for complete presentation

Hart: 8 actions minimally required to access system: 4 selections and 4 “Enter” presses

Number of Actions: Getting Started

Image removed to reduce size of file; contact author for complete presentation

Diebold: 2 actions required to access system: insert access card and press “Next”

Access examples

Hart:
Voter is not able to select digits with rotary wheel, attempts to press (non-touch) screen, requests help
Help does not help
Voter figures it out

Diebold:
Voter slides access card into reader
Presses “Next”

Number of Actions: Vote Change

Diebold requires “de-selecting” the current vote in order to change it
Clicking on an already checked check box
Likely to be opaque to non-computer users
Despite manufacturer-provided instructions

On only 11/21 occasions did voters correctly deselect on the first try
On 10/21 occasions, voters touched the second candidate without first deselecting the original selection

Number of Actions: Vote Change

Changing votes is essential for correcting errors and expressing a change of heart
Example of a problem changing a vote:

Voter 27

Number of Actions: Write-in

Write-in votes generally involve as many actions as letters in the name

Double this if navigation and selection required

Example of problems correcting write-in mistakes:

Voter 38

Review

Both machines offer similar ballot review:

Displays voters’ choices and highlights unselected contests

In both cases, ballot review spans two pages

Review: Hart
Image removed to reduce size of file; contact author for complete presentation

Review: Diebold
Image removed to reduce size of file; contact author for complete presentation

How often do voters review their votes?

On how many occasions did voters cast the ballot without reviewing all choices (i.e., without displaying the second review page)?
Hart: 8/34
Diebold: 17/29

Diebold reviews were much briefer than Hart reviews, suggesting cursory review
Hart: 55.5 seconds
Diebold: 9.8 seconds

Review Example 1

Diebold:

Voter (apparently by accident) does not vote in one contest, resulting in an undervote
Completes ballot and system displays review screen
She immediately presses “Cast Ballot” and says “That one I felt confident in … didn’t even need to go over it”

Review Example 2

Hart:
Voter (apparently by accident) does not vote in two contests, resulting in two undervotes
Completes ballot and system displays first of two review screens
He selects the first undervote (shown in red text) and the system displays the relevant contest in the ballot
He selects the intended candidates, i.e., votes for the candidates circled in the voter info booklet, and the system returns to the first review screen
He repeats for the second undervote

Review screens

Some designs promote more review and correction of errors than others

Hart review screens are visually distinct from ballot screens and, if the voter presses “Cast Vote” after the first review screen, the system displays the second screen
Diebold review screens are hard to distinguish from ballot screens and, if the voter presses “Cast Ballot” without scrolling to see the lower part of the screen, the system casts the ballot

More review and correction surely improves voting accuracy but involves more work, which may lead to lower satisfaction
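
As a rough way to see why the two designs push voters differently, here is a deliberately simplified sketch in Python of the two confirmation flows described above; the function names are hypothetical and the behavior is a simplification, not vendor code.

```python
# Simplified contrast of the two review flows (a sketch, not detailed vendor behavior):
# Hart only casts after both review pages have been shown; Diebold casts as soon as
# "Cast Ballot" is pressed, even if the voter never scrolled to the second review page.

def hart_cast_vote(review_pages_seen: int) -> str:
    # Pressing "Cast Vote" after the first review page just advances to page two.
    return "ballot cast" if review_pages_seen >= 2 else "show second review page"

def diebold_cast_ballot(review_pages_seen: int) -> str:
    # Pressing "Cast Ballot" casts immediately, regardless of how much was reviewed.
    return "ballot cast"

print(hart_cast_vote(1))       # show second review page
print(hart_cast_vote(2))       # ballot cast
print(diebold_cast_ballot(1))  # ballot cast
```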

Summary

User satisfaction and performance related to particular features

Touch screen involves fewer actions and seemed more intuitive to these users than the wheel-plus-Enter sequence
Deselecting a choice in order to change it seemed counterintuitive to many voters and was responsible for at least one incident of casting an unintended vote
Review screens designed to promote review (visually distinct from the ballot, hard to cast a vote in the middle of review) led to more review and correction

Summary (cont’d)
These users were more successful on some tasks with Hart and on others with Diebold
The fit between features and tasks is a more appropriate level of analysis than the overall machine

Conclusions

In a situation designed to maximize usability problems, the machines mostly fared well
But they did exhibit some usability problems, and accuracy was not perfect

Both unintended votes and no votes
A substantial proportion of voters did not review their ballots

Seems likely that non-computer users will not recognize interface conventions:

e.g., de-selection and scrolling

Even very low error rates, even if only among computer novices, can matter in very close elections

Conclusions (cont’d)

We cannot compare voters’ performance with the new technology to older techniques
But we will be able to use performance with the ES&S (paper ballot, optical scan) as a rough baseline
Certainly, voting systems are now being scrutinized in a way they were not before

Implications

Most of these design problems can be improved by applying usability engineering techniques
But industry and election officials need to make this a priority

EAC/NIST developing usability guidelines

Unparalleled design challenge:
Systems should be usable by all citizens all the time, even if used only once every few years

Thank you!

Additional Slides if time permits
User Interface Can Affect Outcome
Variance
Bias
Some usability measures
Measures (cont’d)

User Interface Can Affect Outcome

Ballot Design:
Butterfly ballot

Interaction:
Casting ballot too soon
Changing votes
Writing-in votes
Navigating between contests
Reviewing votes

Possible consequences:
Frustration, increased cynicism
Abandonment
Lower turnout in future
Voters might question results

Variance

Interface-related error is not systematic

all candidates should suffer equally from this (all else being equal)
E.g., if it is difficult to change votes, it doesn’t matter which selections require change

But unlikely that error for different candidates is exactly complementary

Bias

Interface systematically prevents votes from being cast for a particular candidate

Results either in no vote being cast or voter choosing unintended candidate

e.g. Butterfly Ballot may have led Jewish voters who intended to vote for Al Gore to vote for Pat Buchanan

Some usability measures

Satisfaction

Accuracy
Do voters vote for whom they intend?
In lab, compare circled choices to observable screen actions
In field, compare circled choices to ballot images and audit trails
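
As an illustration of the accuracy measure, here is a minimal sketch in Python with a made-up record layout (not the project's actual analysis code) that scores a task as correct only when the enacted choices exactly match the circled intentions.

```python
# Compare intended (circled) choices with enacted choices and compute the
# proportion correct per (machine, task). The record layout is hypothetical.
from collections import defaultdict

def accuracy_by_machine_task(records):
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        key = (r["machine"], r["task"])
        total[key] += 1
        # Any omission (missing choice) or commission (wrong/extra choice) is an error.
        if r["intended"] == r["enacted"]:
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}

records = [  # made-up examples
    {"machine": "Diebold", "task": "change_vote",
     "intended": {"Jones"}, "enacted": {"Jones"}},
    {"machine": "Hart", "task": "write_in",
     "intended": {"Pat Smith"}, "enacted": {"Pat Smtih"}},  # misspelled name: error of commission
]
print(accuracy_by_machine_task(records))
# {('Diebold', 'change_vote'): 1.0, ('Hart', 'write_in'): 0.0}
```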

Measures (cont’d)

Number of Actions
Presses and clicks
Substantive actions, e.g., requests for system help, revisions of earlier selections

Duration
Per task
Overall
