
Using Episodic Memory for User Authentication

SIMON S. WOO, Sungkyunkwan University


RON ARTSTEIN, USC/Institute for Creative Technologies
ELSI KAISER, USC/Linguistics
XIAO LE and JELENA MIRKOVIC, USC/Information Sciences Institute

Passwords are widely used for user authentication, but they are often difficult for a user to recall, easily
cracked by automated programs, and heavily reused. Security questions are also used for secondary authen-
tication. They are more memorable than passwords, because the question serves as a hint to the user, but
they are very easily guessed. We propose a new authentication mechanism, called “life-experience passwords
(LEPs).” Sitting somewhere between passwords and security questions, an LEP consists of several facts about
a user-chosen life event—such as a trip, a graduation, a wedding, and so on. At LEP creation, the system ex-
tracts these facts from the user’s input and transforms them into questions and answers. At authentication,
the system prompts the user with questions and matches the answers with the stored ones. We show that
question choice and design make LEPs much more secure than security questions and passwords, while the
question-answer format promotes low password reuse and high recall.
Specifically, we find that: (1) LEPs are 10^9–10^14× stronger than an ideal, randomized, eight-character password; (2) LEPs are up to 3× more memorable than passwords and on par with security questions; and (3) LEPs
are reused half as often as passwords. While both LEPs and security questions use personal experiences for
authentication, LEPs use several questions that are closely tailored to each user. This increases LEP security
against guessing attacks. In our evaluation, only 0.7% of LEPs were guessed by casual friends, and 9.5% by
family members or close friends—roughly half of the security question guessing rate. On the downside, LEPs
take around 5× longer to input than passwords. These qualities make LEPs suitable for multi-factor authentication at high-value servers, such as financial or sensitive work servers, where stronger authentication is needed.
CCS Concepts: • Security and privacy → Usability in security and privacy;
Additional Key Words and Phrases: Authentication, password, security question, usability, template
ACM Reference format:
Simon S. Woo, Ron Artstein, Elsi Kaiser, Xiao Le, and Jelena Mirkovic. 2019. Using Episodic Memory for User
Authentication. ACM Trans. Priv. Secur. 22, 2, Article 11 (March 2019), 34 pages.
https://doi.org/10.1145/3308992

This research was supported by the Ministry of Science and ICT (MSIT), Korea, under the ICT Consilience Creative program
(Grant No. IITP-2017-R0346-16-1007) supervised by the Institute for Information & communications Technology Promotion
(IITP), and by the NRF of Korea funded by the MSIT (Grant No. NRF-2017R1C1B5076474).
Authors’ addresses: S. S. Woo, 2066 Seobu-ro jangan-gu, Sungkyunkwan University, Suwon, 16419, South Korea; email:
swoo@g.skku.edu; R. Artstein, USC ICT, 12015 Waterfront Drive, Playa Vista, CA 90094-2536; email: artstein@ict.usc.edu;
E. Kaiser, Department of Linguistics Grace Ford Salvatori 301 Los Angeles, CA 90089-1693; email: emkaiser@usc.edu; X.
Le and J. Mirkovic, USC ISI, 4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292; emails: {xiaole, mirkovic}@isi.edu.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be
honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
2471-2566/2019/03-ART11 $15.00
https://doi.org/10.1145/3308992


1 INTRODUCTION
Textual passwords are widely used for user authentication. An ideal password should be easy
for a user to remember but difficult for others to guess. Users value these two requirements very
differently. High recall is very important, as it allows the user to access the desired service. Security
is important, but less so, as it only guards against unauthorized access, which users see as unwanted but rare. These different priorities lead to two prevalent approaches to password
creation. In one, users relate their passwords to some personally salient facts, such as names and
birth dates, which makes them easy to remember but also easy to guess (Veras et al. 2014; Bonneau
2012). In the other approach, users create several moderately strong passwords (Wash et al. 2016),
but then reuse them to achieve high recall. This lowers security, because passwords stolen from
one server can be used to gain access to another one (Blog 2016).
Many alternatives to textual passwords have been proposed, such as graphical passwords,
cognitive authentication, one-time passwords, hardware tokens, phone-aided authentication, and
biometric passwords. However, research has shown that none of these can offer convenience,
simplicity, and user familiarity (Bonneau et al. 2012, 2015b) comparable to that of textual
passwords. We thus focus on trying to improve text-based authentication. Specifically, we seek to
change how passwords are created and used to make their recall more natural, while improving
their security and diversity.
We observe that it is highly unnatural for humans to create new, complex memories (such as
passwords) and recall them in minute detail (e.g., recall capitalization and placement of special
characters) after a long time period, and without any hints. Humans remember by association,
relating new facts to existing memories (Mastin 2016). This makes it difficult to create and recall
many new, strong passwords. Humans also recall by reconstructing facts, sometimes imprecisely,
from relevant data stored in the brain (Mastin 2016). Thus, a user may recall that they used a family
member’s name plus their birth year for a password, but forget which family member they chose,
if they used their first or last name, how it was capitalized, and so on. This makes it difficult to
precisely recall passwords.
We propose a new authentication method using life-experience passwords (LEPs), which extract
authentication secrets out of a user’s existing memories related to a chosen life experience, such
as a trip, a graduation, a wedding, a place, and so on. This lowers users’ cognitive burden and
mimics what users already do—build passwords out of existing memories. To achieve high recall,
we transform existing user memories into a series of questions and answers. The questions are
used at authentication time as hints for the user, and the answers become the password. We further
apply imprecise matching at authentication. Authentication succeeds when a user recalls enough
answers with enough precision. LEPs have 2–3× higher recall than regular passwords: 73% are recalled after a week and 54% are recalled after 3–6 months. LEPs are 10^9–10^14× stronger than an
ideal, random, eight-character password, making them hard to guess via offline attacks. Friends
and family also have a hard time guessing LEPs. In our studies, only 0.7% of LEPs were guessed by
acquaintances and 9.5% by very close friends or family members. We further achieve low reuse of
LEPs by having different servers prompt users for different memories at password creation time.
In our studies, LEPs were reused half as often as passwords.
LEPs resemble security questions, in that both use personal experiences for authentication and
both achieve very high recall. But LEPs use more diverse, unique, and memorable personal facts
than security questions, and these are deeper, more specific facts. This makes LEPs 24–35× harder
to guess than security questions (Schechter et al. 2009). LEPs contain 2.4–3.2× fewer fake answers
compared to security questions.
There are two downsides to LEPs: (1) user burden for creating and authenticating with an LEP is
3–6× higher than when using passwords—a burden that may be prohibitive for some applications


and devices; (2) LEPs may contain sensitive information, which poses a privacy risk. While these
are serious drawbacks, our studies found that 93.7% of users would use LEPs for high-security
content (e.g., banking applications). We further found that only 3% of LEPs contained generally
sensitive information. We can reduce this risk with better user prompts, which discourage use of
sensitive information. We believe that LEPs are a promising authentication method for some users
and certain applications, where the large benefits would outweigh the user burden and privacy
risk.
LEPs were originally introduced in Woo et al. (2016a), where we described their design and two
reference implementations and evaluated their security, recall, diversity, and resistance to acquain-
tance guessing. In this article, we extend our prior work to add an additional implementation, provide
more details on automatic question generation, improve our evaluation of LEPs’ security, and eval-
uate resistance to guessing by family or very close friends. Thus, the current article represents a
complete study of LEPs for user authentication.
The article is organized as follows. We discuss work related to LEPs in Section 2. We present LEP
design in Section 3, and we detail the setup of our human studies in Section 4. Section 5 presents
the results of our evaluation of LEPs. Section 6 summarizes our results, and Section 7 provides further discussion. Finally, we offer our conclusions in Section 8.

2 RELATED WORK AND COMPARISON TO OTHER APPROACHES


There is much research on non-textual alternatives to passwords, such as graphical pass-
words (Jermyn et al. 1999; Corporation 2015; Davis et al. 2004; Mnemonic Guard 2015a, 2015b;
Inc. 2015; Brain Authentication 2016; Bojinov et al. 2012), video passwords (Yamamoto et al. 2008;
Denning et al. 2011), and biometric passwords (Duncan 2016). Due to space limitations, we only
survey research that is directly related to textual passwords.
Helping users create stronger passwords. A few recent publications (Castelluccia et al. 2012;
Veras et al. 2014; Komanduri et al. 2014; Huh et al. 2015) proposed improvements in password
design using Markov models, semantic patterns of passwords, and user feedback. For example,
Telepathwords (Komanduri et al. 2014) helps users create strong passwords by comparing user input
with popular substrings. When such a substring is detected, Telepathwords provides actionable,
real-time feedback to steer a user toward a different choice. In Huh et al. (2015), the system randomly generates strong passwords and then allows a user to replace a few characters to make the password more memorable. Ur et al. (2017) proposed a neural-network-trained, data-driven password strength meter that provides detailed text-based feedback so that users can make informed decisions when creating their passwords. While all these techniques increase password strength, they all require users to create new memories of complex strings. Our work on
LEPs differs fundamentally from these approaches, because we seek to exploit existing memories
and thus increase strength and memorability of textual passwords and lower their reuse. In par-
ticular, we believe that encouraging use of different LEPs for different accounts is an important
factor in today’s user authentication where password reuse is rampant.
Security Questions. Security questions (Guideline 2006) are often used for secondary authen-
tication, e.g., when a user loses her password, or to supplement password-based authentication
for high-security servers (e.g., banks). A user is offered a choice from a small set of fixed questions, asking for facts such as her mother's maiden name, pet names, favorite teacher names, best friend names, and so on. Recently, Micallef and Arachchilage (2017) proposed a game-based approach to help users
improve memorability of system-generated security questions. While both security questions and
LEPs use personal knowledge for authentication, there are significant differences. We discuss them
here and summarize them in Figure 1, which also illustrates sample security questions and an LEP
on the high school topic.


Fig. 1. Security questions versus LEP example for “high school” topic.

Applicability. Organizations usually offer a very limited choice of security questions. There may
be users to whom no question applies. For example, a user may not have a favorite high school
teacher or a best friend, or may have multiple teachers/friends that she likes. When faced with
such questions, users select answers that they cannot recall at authentication time. Schechter et al.
(2009) and Bonneau et al. (2015a) measured that 20–40% of security questions cannot be recalled
by users. Conversely, during LEP creation users can choose, with very little constraint, the topic
they want to talk about and the facts about that topic, which are memorable to them. This leads
to more personalized facts and thus higher recall.
Depth of facts. Security questions ask for shallow facts or factoids (e.g., pet’s name, best friend’s
name), which are generally applicable to many users. Such facts can be mined from public sources
(Griffith and Jakobsson 2005; Schechter et al. 2009) or guessed using statistical attacks (Bonneau
et al. 2015a). Easy guessability leads users to provide fake answers to security questions, which
leads to low recall when they cannot remember the fake answers. However, LEPs ask for deep
facts—deeper contextual information about memorable events in the user's life, such as memorable people, places, activities, or objects. Answers to such questions are not easily found on social
networks, or guessed by family and friends, which removes the need for users to lie. In our studies,
only 0.7% of LEPs were guessed by friends (compared to 17–25% of security questions (Schechter
et al. 2009)) and only 11.5–15.7% of answers were fake (compared to 37% for security questions
(Bonneau et al. 2015a)).
Number of facts. Security questions contain only one fact, which may be easily guessed or ob-
tained from public sources. LEPs contain a larger number of facts, and a user must recall most or
all of them for authentication. Thus, the barrier for a successful guessing attack is higher.
Another approach to security questions is to let users choose the questions themselves. This
allows users to freely choose facts that are relevant to them, but decreases security (Just and
Aspinall 2009; Schechter et al. 2009). While LEPs also allow users to choose which facts to provide,
our fact elicitation guides them toward secure, memorable, and stable facts. This allows LEPs to
outperform user-chosen security questions.


Fig. 2. LEP creation and authentication (either precise or imprecise matching can be applied for authenti-
cation, where imprecise matching improves usability).

Cognitive Passwords. Similar to LEPs, cognitive passwords are based on personal facts, in-
terests, and opinions that are thought to be easily recalled by a user. The article Cognitive password
(2016) provides an overview, definitions, and some examples of cognitive passwords. Das et al.
(2013) and Nosseir et al. (2005) explore autobiographical authentication that uses facts about past
events, which are captured by smartphones or calendars, without any user input. Similarly Hang
et al. (2015) explore use of call, SMS, and application use/installation facts on smartphones to au-
thenticate phone users. While phone usage and calendar information may be memorable for short
intervals after it is collected, humans do not remember ordinary daily events for long periods nor
with great consistency (Bradburn et al. 1987). Recall of automatically gathered autobiographical
information ranged from 64% to 75%, while acquaintances could guess 40–50% of correct answers.
LEPs, by contrast, require more user effort during creation but elicit more salient facts (Niedzwienska
2003; Bradburn et al. 1987). LEP recall is 70–73% after one week and remains stable over long
time periods (53–54% after 3–6 months), and only 0.7% of LEPs can be guessed by acquaintances.
Mnemonic Passwords. While mnemonic passwords can improve memorability, Kuo et al. (2006) found
that a majority of users’ mnemonic passwords are from popular phrases that can be found on the
Internet. Woo and Mirkovic (2016) proposed hint- and guide-mnemonics to help users create strong
and memorable passphrases. They successfully demonstrated methods to prevent users from using
phrases from popular quotes and dictionary words, when creating mnemonic-based passphrases.
Narrative Authentication. Somayaji et al. (2013) proposed use of narratives for user authen-
tication but did not evaluate them. Their narratives require users to associate imaginary objects
with past memories (e.g., contents of a drawer from a childhood bedroom) and may also be fully
fictional. Because they rely on fictional facts, we expect them to be less memorable than LEPs.

3 LIFE-EXPERIENCE PASSWORDS
This section describes the design and the implementation of LEPs. We discuss our choices for
LEP topics and facts in Section 3.1, attacker models and strength calculation in Section 3.2,
the LEP creation process in Section 3.3, fact extraction and question generation in Section 3.4,
and LEP-based authentication in Section 3.5.
Figure 2 shows the LEP creation process. A user identifies a life-experience she wants to use
for an LEP. She then inputs a title for this experience and recounts interesting facts, with some
guidance from the system. The system mines the facts from the user’s input, and transforms them


Table 1. LEP Topics

Category Subcategory
Event Engagement, wedding, birth, death, accident, graduation, party, trip
Learning Driving, skiing, snowboarding, swimming, biking, skill/art, language
About Person, place

into question-and-answer pairs. Questions and the title must be stored as clear text, because they
are shown to users during authentication. To store the answers, we concatenate either all of them,
or several subsets (see Section 3.5), add the salt, and hash the resulting string(s). During authenti-
cation, the system displays the title and the questions, and user answers are hashed and compared
to those stored by the system using imprecise matching.

3.1 Topics and Facts


In this section, we discuss how much guidance should be provided to users during LEP creation. In general, users need some guidance to remember interesting facts to recount. Further, elicitation must be carefully developed to produce facts that can be accurately recalled by users and cannot be easily guessed by others.
Which experiences can be used for LEPs? Letting a user freely select an experience to talk
about, without any guidance, may not produce secure and memorable input, as shown by research
on self-built security questions (Just and Aspinall 2009; Schechter et al. 2009). This motivated us
to build a list of diverse and general topics, to guide password creation (see Table 1).
How to elicit useful facts? A useful fact in the context of LEP is a fact that is strong, stable,
and immutable.
A strong fact has many possible answers to the question, which gives it strength against pass-
word guessing attacks (see Section 3.2). The stronger the facts used in an LEP, the fewer of them we
need for a desired password strength, thus reducing the time needed for password creation and
authentication.
A stable fact is consistently recalled by a user. We have learned by exploring several LEP designs
that stability is influenced the most by the fact’s type and our elicitation method during creation
and authentication. Subjective facts about feelings and opinions are inconsistently recalled by
users. We thus ask about objective facts, such as names, locations, times, objects, and activities.
Further, elicitation specificity makes a large difference. The more specific questions we pose during
elicitation, the more stable facts we get. A user may use multiple terms for the same person (e.g.,
“my mom,” “mom,” “mother,” “Jennifer”). Transforming “who” questions into “first and last name”
questions reduces the ambiguity and increases stability of answers. Another source of instability
comes from asking for a singular answer to a question that may have a plural answer (e.g., a user
has two best friends, or a user took multiple trips to the same location). However, asking overly specific questions (e.g., "What is the first and last name . . . ?" when the user only recalls the first name) or
questions that do not apply to a given user (e.g., a pet’s name when the user does not own a pet)
also leads to unstable facts. We believe that a lack of stability may be responsible for many authentication failures found in past studies of security questions (Schechter et al. 2009; Bonneau et al. 2015a).
We have refined our elicitation process to contain very specific prompts, which depend on the
user’s prior input. This improved the stability of the facts we were able to elicit.
Another technique we leveraged to improve stability was to ask users, once they chose their
topic and title, if this title would apply to multiple life events. For example, if a user chose to talk
about a trip and supplied the title “Trip to Spain,” then we would ask if this is the only trip to


Spain they took. If the answer was “no,” then we would ask the user to provide more words that
uniquely describe their chosen trip, e.g., “trip to Spain after college.”
An immutable fact does not change over time. For this reason, we do not use preferences and
opinions (e.g., “What is your favorite band?”) for building LEPs, because such facts are mutable.
During LEP creation, we mine facts about people, locations, time, objects, and activities.
These facts are objective, and thus immutable. People and location facts have high strength (see
Section 3.2), while time, object, and activity facts have lower, but still substantial, strength.
Further, we have designed our elicitation process to produce very specific questions, and thus
stable facts.
Privacy risk. LEP questions and answers contain information about some past event, which
may pose a privacy risk to a user if the questions are observed by others, or if the answers are guessed or cracked and the revealed facts are sensitive. We have evaluated how easy it would be for
attackers to guess LEPs in Section 5, and our results show that LEPs are very robust to guessing.
With regard to the disclosure of sensitive facts in LEPs, we have designed our fact elicitation process
to advise users against using sensitive or incriminating facts. We have also analyzed our facts to
determine how many of these could be labeled “generally sensitive.” We say that a fact is “generally
sensitive” if it talks about illness, injury, death, incarceration, crime, or love affair. We found that
only 3% of LEPs in our study had sensitive information. Predominantly these facts were supplied
in response to our “illness” or “death” topic suggestions. Removing these topic suggestions is thus
likely to greatly reduce sensitivity of user-provided information.

3.2 Strength and Attacker Models


Strength of a password can be measured as the number of trials a guessing attacker has to make un-
til success—this is known as the guess number (Bonneau 2012) or the heuristic measure of password
strength (Bonneau et al. 2015b). We use the guess number as our measure of strength of a typi-
cal textual password. In particular, we use the Monte Carlo method of Dell'Amico and Filippone (2015), trained on 21 million leaked passwords from the RockYou, LinkedIn, MySpace, and eHarmony datasets, to obtain the statistical strength.
In calculating strength of LEPs, we consider the following attacker models: A statistical at-
tacker compiles ranked dictionaries of popular answers within a fact category, and tries them in
the order of popularity. An acquaintance attacker forms guesses using her personal knowledge
of the user, and may mine some guesses from the user’s social network pages or use search en-
gines. A family/close-friend attacker behaves as acquaintance attacker but has much deeper
knowledge about the target individual. He/she is the most powerful social knowledge attacker. A
password-reuse attacker has stolen a user’s password from one server and attempts to reuse it,
in the exact or a slightly modified version, to gain access to another server.
We assume that statistical and password-reuse attackers are offline attackers (Bonneau et al.
2015a). They will use automated programs to crack LEPs, just like they do for passwords today.
They can make as many guesses as their dictionaries allow. An acquaintance and a family/close-
friend attacker are online attackers (Bonneau et al. 2015a), and will attempt to guess passwords
manually. We assume that these online attackers are allowed to make a small number of guesses,
before being locked out by the server for excessive failed logins.
Unlike passwords, an LEP consists of multiple facts. Users input all the facts at once and the
system compares combinations of the facts with hashed and salted combinations stored at creation
time (see Section 3.5).
We calculate the strength against a statistical attacker as the product of individual-fact strengths: $S^{st}_{LEP} = \prod_{i=1}^{k} S^{st}_{f_i}$, where $S^{st}_{f_i}$ denotes the rank of the given fact on the list of all facts of the given type, sorted from the most to the least popular, $k$ is the number of facts in the LEP, and $f_i$ is the $i$th fact.

Table 2. Fact Categories and Their Statistical Strengths (See Appendix for Sources)

Category | Description                           | Statistical strength
         |                                       | Lists | Min. size | Max. size | Unique items
FN       | first name (e.g., John)               |   384 |         3 |    38,717 |      150,695
LN       | last name (e.g., Doe)                 |    80 |       100 |   151,671 |      223,096
FL       | first and last name (e.g., John Doe)  | combinations of FN and LN
PL       | place (e.g., UCLA)                    |    48 |        10 |    18,467 |       36,864
CI       | city (e.g., Seattle)                  |     8 |        85 |       870 |        2,230
OBJ      | object (e.g., watch)                  |    30 |         6 |    19,681 |       22,210
ACT      | activity (e.g., kayaking)             |     4 |        14 |       276 |          385
DT       | date (e.g., 2/28/1972)                | n/a
YR       | year (e.g., 2001)                     | n/a
RL       | relationship (e.g., mom)              | n/a
HU       | approx. 100 choices (e.g., Toyota)    | n/a
TN       | approx. 10 choices (e.g., yellow)     | n/a

This method of calculation assumes an ideal guessing strategy for the attacker, where he
keeps guessing the first fact until success, then moves on to the second, and so on. Such guessing
is unrealistic, since the LEPs are stored as hashes of fact combinations, and the attacker cannot
tell if he guessed a part of the combination correctly. Thus, our statistical strength estimate is the
lower bound of the actual strength of LEPs against statistical attacks.
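To make this calculation concrete, the following minimal Python sketch computes the lower bound from per-fact ranks; the example ranks are hypothetical, and the function illustrates the formula above rather than our implementation.

```python
from functools import reduce

def lep_statistical_strength(fact_ranks):
    """Lower-bound guess number of an LEP against a statistical attacker.

    fact_ranks: rank of each fact on the ranked popularity list for its
    category or topic (1 = most popular). The bound is the product of the
    individual ranks, which assumes the attacker could confirm each fact
    independently (a conservative assumption, as discussed above).
    """
    return reduce(lambda acc, rank: acc * rank, fact_ranks, 1)

# Hypothetical 5-fact LEP whose facts rank 1,200th, 90th, 3,500th, 40th,
# and 15th on their respective popularity lists.
print(lep_statistical_strength([1200, 90, 3500, 40, 15]))  # 226800000000
```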
Our method of attacker guessing also assumes fact independence. However, in real life, facts
are related to each other and some combinations will be more likely than others. We found it very
hard to find sufficient public data to explore these correlations and include them in our calcula-
tions. We thus address this issue in a limited and conservative manner: (1) for first and last name
combinations, which have the highest strength, we conservatively assume that there is only one
first name, which can be coupled with a given last name; (2) for objects, actions, and locations
related to a topic (e.g., wedding), we perform Bing search for that topic and mine most frequent
objects and locations from the answers. We use this information to create a ranked list of popular
items per topic, as described below.
To determine strength of facts against statistical attacks, we classify LEP facts into categories,
shown in Table 2. We assume an intelligent attacker, who can infer the fact’s category or topic
from the user’s authentication prompt, and guesses answers only within that category or topic.
We measure the statistical strength of a fact as the rank of the fact on a ranked list of pop-
ular facts in its category or for a given topic. A challenge lies in creating sufficiently large and
comprehensive lists of popular facts, and ranking them based on their popularity.
We examined many different data sources, seeking to identify: (1) the total number of possible
facts, and (2) the ranked list of popular facts, within each fact category. We provide a brief explana-
tion of our data sources here and refer the readers to the Appendix for more details. Our estimates
and popular list sizes are shown in Table 2.
Statistical strength calculation. To calculate statistical strength, we needed lists of popular
facts in each category and for each topic, ranked by popularity. For per-category lists, we used on-
line domain-specific sources to form these lists. We gathered around 434,000 unique popular list
items from more than 530 different online sources. These sources include (1) Wikipedia/DBpedia
(Wikipedia, The Free Encyclopedia 2004), (2) Freebase (Freebase 2016), (3) U.S. Government
sources: U.S. Census, U.S. Social Security Administration, Dept. of Education, Dept. of Labor Stat.,


National Center for Education Statistics, (4) Other domain-specific online sources: TripAdvisor
for popular travel destinations, Forbes and U.S. News for educational institutions, IMDB for movie
names, and so on, (5) Popular English word lists from Google 20K (Inc. 2016), and 5K nouns, words,
and lemmas from the Corpus of Contemporary American English (COCA) (COCA 2015). We fur-
ther incorporated popular lists for different categories from the Bonneau et al. (2010) dataset. More
details are provided in the Appendix.
Some lists had items ranked by popularity, while others did not (e.g., the list of the 100 most
popular Chinese names from Wikipedia). For unranked lists, we used the Bing search engine to
calculate the number of Web pages containing each list item. We automatically built structured
queries as a “fact category + the item” (e.g., “first name Hao”) and mined the number of pages
from the search engine’s reply. We then assumed that the popularity of an item is proportional to
the number of Web pages containing it, and used this to rank the items in the list. While this may
not be an accurate reflection of the popularity of each item, we believe it is a good approximation
for relative rankings.
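As an illustration of this ranking step, the sketch below orders an unranked list by page counts supplied by the caller; page_hits is a hypothetical stand-in for the structured Bing query described above, and the stubbed counts are invented for the example.

```python
def rank_by_popularity(category, items, page_hits):
    """Rank unranked list items by approximate web popularity.

    page_hits: caller-supplied function mapping a structured query string
    (e.g., "first name Hao") to the number of web pages containing it.
    Returns (rank, item) pairs, from most to least popular.
    """
    scored = [(item, page_hits(f"{category} {item}")) for item in items]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(rank, item) for rank, (item, _) in enumerate(scored, start=1)]

# Example with a stubbed page-count function (counts are made up).
fake_counts = {"first name Hao": 120000, "first name Zoltan": 45000}
print(rank_by_popularity("first name", ["Zoltan", "Hao"],
                         lambda q: fake_counts.get(q, 0)))
# [(1, 'Hao'), (2, 'Zoltan')]
```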
If our popular lists were too small, then we would overestimate the statistical strength of LEPs.
We show the count of popular lists per category, and their minimum and maximum sizes, as well
as the total number of unique items in Table 2. For example, in the FN (first name) category, we
have 384 popular lists, ranging from 3 to 38,717 inputs, and containing the total number of 150,695
unique names. We further evaluated how many facts collected in our user studies were covered by
our popular lists. We were able to find 75% of first name (FN), 99% of last name (LN), 81% of first
and last name (FL), 63% of city (CI), 54% of object (OBJ), 46% of activity (ACT), and 34% of place
(PL) inputs on our popular lists. Thus, our popular lists seem comprehensive enough for statistical
strength calculation.
To model attacker’s use of correlations among popular objects, activities and places for a given
topic (e.g., wedding), we crawled the most popular 100 websites returned by Bing search with a
query specific to each LEP topic. We used the Selenium Python package to crawl websites and applied the
Natural Language Toolkit (NLTK) and WordNet to process web data and extract nouns and verbs
as shown in Figure 3. Then, for each topic, we created ranked lists of these nouns and verbs based
on their frequency.
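A simplified sketch of this extraction step is shown below; it assumes the page text has already been crawled (e.g., with Selenium) and uses NLTK's default tokenizer and POS tagger in place of our full WordNet-based pipeline.

```python
from collections import Counter
import nltk  # requires the 'punkt' and 'averaged_perceptron_tagger' data packages

def ranked_terms_for_topic(page_texts):
    """Build ranked noun and verb lists for one LEP topic.

    page_texts: plain-text strings, one per crawled web page.
    Returns two lists of (term, frequency) pairs, most frequent first.
    """
    nouns, verbs = Counter(), Counter()
    for text in page_texts:
        for word, tag in nltk.pos_tag(nltk.word_tokenize(text)):
            if tag.startswith("NN"):
                nouns[word.lower()] += 1
            elif tag.startswith("VB"):
                verbs[word.lower()] += 1
    return nouns.most_common(), verbs.most_common()

# Example with two tiny "pages" about the wedding topic.
nouns, verbs = ranked_terms_for_topic(
    ["The bride and groom cut the cake.", "Guests danced at the reception."])
print(nouns[:3], verbs[:3])
```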
We have multiple lists of popular facts per category, and objects, actions and locations also have
lists of popular facts per topic. We calculate the strength of an LEP fact as its lowest rank on any
list. This approach assumes a strong statistical guessing attacker, which has the best popular list
for each input.

3.3 Creation
LEP creation requires users to actively provide input, from which the system extracts useful facts.
An appropriate method for LEP generation should meet the following requirements: (1) it is easy and natural for humans to use, and (2) it exposes a sufficient number of useful facts for any user. In
our work, we have investigated freeform, guided, and semi-guided methods for LEP input. These
methods are triggered after a user has chosen the topic they want to talk about and provided its
title. Figure 5 illustrates these input methods with one specific title “Trip to France.”
In the freeform method shown in Figure 5(a), the user inputs a paragraph of free narrative about
her chosen life experience. Out of this narrative, we automatically mine useful facts and gener-
ate question and answer pairs. We use question generation based on semantic and dependency
parsing (Woo et al. 2016b).
In the guided method a user is prompted with a series of questions, chosen from a fixed set
about each type of life experience. The questions are displayed one at a time and the choice of the
subsequent questions may depend on the user’s answers to the preceding ones. This is illustrated


Fig. 3. Creating frequently used word lists by extracting terms from the top 100 crawled websites for each topic.
Table 3. Comparison of Input Methods

Dimension                                   Relationships between input methods
1. User burden                              freeform > guided > semi-guided
2. Number of useful facts                   freeform < semi-guided < guided
3. Freedom to choose user-relevant facts    freeform > semi-guided > guided
4. NLP needs                                freeform (always) > guided (only for open-ended questions) > semi-guided (never)

in Figure 5(b). Some questions may be open-ended, e.g., “What else do you remember about . . . ?”.
Samples of the questions used in the guided method are provided in the Appendix.
In the semi-guided method the user is prompted to input a certain number of facts in the given
category, and to provide a “hint” for each fact that will be used to form the authentication prompt.
This is illustrated in Figure 5(c). Samples of the questions used in the semi-guided method are also presented in the Appendix.
With regard to ease of input, freeform poses the highest burden on the user. User input in our
example (28 words) yielded a much smaller amount of useful text, grouped into five useful facts (9 words). The guided and semi-guided methods steer the user toward useful facts, such as names, locations, and objects, and away from facts that are not useful,
such as preferences, opinions, and feelings. In our example, the guided method required the user to
type 7 words, with 5 words leading to useful facts, while semi-guided input required 11 words,
and all led to useful facts. Guiding users toward specific facts lowers strength against statistical
attacks, since the attacker can focus his guessing within specific fact categories. It also increases
the chance that a user may not have answers to some questions (e.g., Who traveled with you on
this trip? No one). But less guidance may also lower the stability of extracted facts. The freeform
and semi-guided methods allow more freedom to the user to choose facts, but lead to less stable
facts, compared to the guided method. Freeform also leads to fewer useful facts, as users like to
chat about feelings and preferences, which cannot be used for LEPs. Finally, user input in natural
language requires some natural-language processing to be converted into the question/answer format that LEPs need. Freeform input requires the most NLP support, followed by the guided and then the semi-guided method. Table 3 summarizes the relationships of the input methods along all
the above-discussed dimensions.

3.4 Fact Extraction and Question Generation


Fact extraction is needed for freeform input and for user answers to open-ended questions in the
guided input. Various NLP techniques such as part-of-speech (POS) tagging (pos 2015), semantic


Fig. 4. Question generation pipeline.

role labeling (srl 2015), named entity recognition (ner 2015), and question generation seem well-
suited for fact extraction. We experimented with two well-known question generation frame-
works: (1) Heilman & Smith (H&S) (Heilman and Smith 2010) and (2) OpenAryhpe (OA) (Yao et al.
2012). However, these tools generated long and complex questions, which leaked answers to other
questions. This led us to develop our own fact extraction and question generation method.
Our fact extraction and question generation proceed through the following stages: (1) pre-
processing, (2) segmentation of complex sentences into simpler ones, (3) post-processing, (4) sen-
tence combining, (5) dependency parsing or semantic role labeling, and (6) question generation.
Question generation returns multiple question and answer pairs and a fact category for each pair.
Our system selects the pair with the highest statistical strength (see Section 3.2). We use mate-
tools (Björkelund et al. 2009; bar 2015) for dependency parsing and semantic role labeling to iden-
tify important parts of speech such as subject, objects, action, roles, time, location, and so on. In
extracting and processing useful facts from user responses, first we normalize inputs for capitaliza-
tion and punctuation. We also use POS tagging (Manning et al. 2014), dependency parsing (Chen
and Manning 2014), noun stemming, and semantic role labeling (srl 2015) to extract the specific
parts of user responses, such as verbs, nouns, subject, object, location, time, action, and person in-
formation. This helps us transform facts into question/answer pairs, and to identify, for multi-word
inputs, those parts that carry the most meaning for the user (nouns, proper names of people and
locations, verbs, or adjectives). The overall question generation pipeline is provided in Figure 4,
and more details on how we construct questions are given in Woo et al. (2016b).
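To illustrate the flavor of this pipeline (Figure 4) without the full mate-tools/CoreNLP machinery, the sketch below uses NLTK's off-the-shelf POS tagger and named-entity chunker to turn a simple declarative sentence into one question/answer pair. It handles only the person-name case and is not our production question generator.

```python
import nltk  # requires 'punkt', 'averaged_perceptron_tagger', 'maxent_ne_chunker', 'words'

def person_question(sentence):
    """Generate a first-and-last-name question from a simple sentence.

    Returns (question, answer) if a PERSON entity is found, else None.
    """
    tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sentence)))
    for subtree in tree.subtrees():
        if subtree.label() == "PERSON":
            name = " ".join(token for token, _ in subtree.leaves())
            rest = sentence.replace(name, "").strip().rstrip(".").strip()
            question = f"What is the first and last name of the person who {rest}?"
            return question, name
    return None

print(person_question("John Smith gave a book"))
# Intended output (actual chunking may vary with NLTK's pretrained model):
# ('What is the first and last name of the person who gave a book?', 'John Smith')
```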
After performing fact extraction, we manually examined all the resulting facts, and corrected
them where needed to produce more useful guidance for users. Manual correction was needed
in approximately 50% of freeform inputs and 5% of open-ended guided inputs. The
main reasons for requiring manual correction were: (1) users could not adhere to the required
input format, e.g., we asked for complete sentences like “John Smith gave a book” but many users
would provide “John Smith, book” instead, (2) users would omit parts of the answer, (3) some
input was in complex sentences, which our segmentation process could not handle, e.g., “because
of the architecture of the library, that is where I developed my love and admiration books, and the
buildings which housed them,” (4) some input was in broken English, like “I am find very happy
at the moment of sky.”


Fig. 5. LEP input methods.

3.5 Authentication
Let an LEP contain N facts. We require that a user recalls M facts for authentication success, where M ≤ N, and that the strength of the recalled facts be greater than some target value. The smaller the difference between N and M, the stronger the authentication criterion. Further, if M < N, the system must store $\binom{N}{M}$ hashes for one LEP. During authentication, the system produces all possible combinations of M user answers, and hashes them. Any match between these and stored hashes
leads to authentication success. We have explored different values for N and M, and provide more
information about their performance in Section 5.
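A minimal sketch of this storage and matching scheme follows, assuming answers have already been normalized as described below; the salt handling and the choice of SHA-256 are illustrative, not a specification of our implementation.

```python
import hashlib
import os
from itertools import combinations

def _digest(answer_combo, salt):
    # Hash one ordered combination of normalized answers with a per-LEP salt.
    return hashlib.sha256(salt + "|".join(answer_combo).encode()).hexdigest()

def store_lep(normalized_answers, m, salt=None):
    """At creation: store salted hashes of every M-sized subset of the N answers."""
    salt = salt if salt is not None else os.urandom(16)
    hashes = {_digest(combo, salt)
              for combo in combinations(normalized_answers, m)}
    return salt, hashes

def authenticate(supplied_answers, m, salt, stored_hashes):
    """At login: succeed if any M-sized subset of the supplied answers matches.

    Answers must be supplied in question order, so correct subsets appear in
    the same relative order as at creation time.
    """
    return any(_digest(combo, salt) in stored_hashes
               for combo in combinations(supplied_answers, m))

salt, hashes = store_lep(["jennifer", "paris", "bicycle", "2009", "kayaking"], m=4)
# One forgotten fact ("hiking" instead of "kayaking") still authenticates with M = 4.
print(authenticate(["jennifer", "paris", "bicycle", "2009", "hiking"], 4, salt, hashes))  # True
```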
Authentication may fail not only because a user forgot her answers but also because she recalled
them imprecisely. Imprecise recall of LEPs occurs due to a high redundancy of natural language,
as explained below. We address some sources of mismatch through imprecise matching, which
includes normalization, keyword extraction, and reorder matching. While imprecise matching will
reduce strength of LEPs, our evaluation shows that resulting LEPs still have high security (Sec-
tion 5), and that imprecise matching significantly improves recall.
Capitalization, reordering, and punctuation. A user might respond to the prompt using
different capitalization or punctuation than she did during password creation, or use a singular
instead of plural, or a different tense for a verb. We overcome this by normalizing user answers
before storage and authentication. This includes stemming verbs and converting all nouns to sin-
gular form. We also remove all capitalization and punctuation. A user may also list several parts
of the answer in a different order, e.g., she may reply “Nice, Paris,” where the original answer was
“Paris, Nice.” We resolve this through reorder matching. We detect when an answer may consist
of multiple parts, try all permutations of these parts in real time, and compare these with stored
hash values.
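The normalization and reorder-matching steps can be sketched as follows; the use of NLTK's Porter stemmer and the comma-based splitting of multi-part answers are simplifying assumptions rather than a description of our exact implementation.

```python
import re
from itertools import permutations
from nltk.stem import PorterStemmer

_stemmer = PorterStemmer()

def normalize(answer):
    """Lowercase an answer, strip punctuation, and stem each word."""
    words = re.sub(r"[^\w\s]", " ", answer.lower()).split()
    return " ".join(_stemmer.stem(w) for w in words)

def reorder_variants(answer):
    """All orderings of a (possibly multi-part) answer, e.g., 'Nice, Paris'."""
    parts = [normalize(p) for p in answer.split(",") if p.strip()]
    return {", ".join(order) for order in permutations(parts)}

# "Nice, Paris" and "Paris, Nice" yield the same candidate set, so either
# phrasing can be hashed and compared against the value stored at creation.
print(reorder_variants("Nice, Paris") == reorder_variants("Paris, Nice"))  # True
```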
Misspelling. A user may misspell a reply during password creation or authentication. We leave
handling of misspelling for future work.
Synonyms. A user may reply to a question with a near-synonym to the extracted term, such as
responding with “lake” instead of “pond”; or with a term that is more specific (hyponym) or more
general (hypernym) than the expected term, such as “dog” instead of “poodle.” We leave handling
of synonyms, hyponyms, and hypernyms for future work.
Extraneous words. A user may provide extraneous words in an answer. For example, a ques-
tion “what was red?” may lead the user to input “my apple was red,” even though we expect just
“apple” as an answer. We address this via keyword extraction, and we store and try to match only
these keywords. We apply keyword extraction both during password creation and during authenti-
cation, in the same manner. The extraction method depends on the answer category (see Table 2).


Table 4. High Level Overview of the User Study Descriptions and Methods

Study                                        Measurement goal
Freeform                                     Performance of the freeform input
Performance                                  Strength, memorability, and reuse
Acquaintance Guessing (AcqGuess)             Strength against acquaintance attackers
Family/Close Friend Guessing (CloseGuess)    Strength against close friend/family attackers

From answers in OBJ category, we extract nouns only; from PL answers, we extract nouns and
out-of-dictionary words (likely place names); from ACT answers, we extract verbs only. Other
categories do not need keyword extraction.
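A sketch of this category-dependent keyword extraction is given below; it uses NLTK's POS tagger and treats NLTK's English word corpus as the dictionary for spotting likely place names, which only approximates the procedure described above.

```python
import nltk
from nltk.corpus import words as english_words  # requires the 'words' corpus

_DICTIONARY = {w.lower() for w in english_words.words()}

def extract_keywords(answer, category):
    """Keep only the parts of an answer that are matched, per fact category."""
    tagged = nltk.pos_tag(nltk.word_tokenize(answer))
    if category == "OBJ":   # objects: nouns only ("my apple was red" -> ["apple"])
        return [w.lower() for w, t in tagged if t.startswith("NN")]
    if category == "ACT":   # activities: verbs only
        return [w.lower() for w, t in tagged if t.startswith("VB")]
    if category == "PL":    # places: nouns plus out-of-dictionary words (likely names)
        return [w.lower() for w, t in tagged
                if t.startswith("NN") or w.lower() not in _DICTIONARY]
    return [w.lower() for w, _ in tagged]  # other categories: no keyword extraction

print(extract_keywords("my apple was red", "OBJ"))  # ['apple']
```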

4 USER STUDIES
We evaluated LEPs through a series of user studies and compared their performance against the performance of textual passwords and security questions. We implemented LEP creation and authen-
tication as a Web application, so that it can be used remotely. We then ran multiple user studies
over the period of three years, with Amazon MTurk participants (Turk 2018) and with students
at University of Southern California. We used results of early studies to refine and improve both
our user interface, and our elicitation process. In this article, we report on the four culminating
studies, which we used to evaluate LEP performance. The high-level overview of each user study
is presented in Table 4, which we describe in detail in the following sections.
All studies were approved by our Institutional Review Board (IRB). All communication with
participants was in English, and their inputs were also required to be in English. Participants were
required to be at least 18 years of age to participate.
Each participant was first shown the Information Sheet explaining the purpose of the study.
Following this, the participant registered in one of three ways, depending on the specific study's design: by entering their MTurk ID, by entering their E-mail address, or by receiving a random identifier assigned by the system.
A participant then input their demographic information (age, gender, and native language) and
proceeded with the study. We describe our study design in this section and provide detailed results
and their interpretation in the next section.

4.1 Freeform Study


This study was run in Spring 2014, to evaluate advantages and disadvantages of the freeform input
method. At the time, we had not yet designed the semi-guided method. We recruited participants
from Amazon MTurk, and from students at our institution. Each participant was asked to create
three LEPs—one for an E-mail server like Gmail, one for a social network site like Facebook, and
one for a bank, like Bank of America. We displayed corresponding logos on our LEP creation and
authentication pages, to help participants associate each LEP with the specific site. Participants
were asked to return for authentication after one week, and were reminded about this via E-mail.
We allowed only one authentication attempt per LEP. All participation was done online. This lim-
ited our ability to understand how participants interacted with the system, and to collect detailed
feedback, but it helped us recruit more participants. We paid 30 cents to Amazon MTurk partic-
ipants for LEP creation and 30 cents for authentication. Our student participants were not paid.
A total of 40 participants completed the study. We provide detailed results of this study in Section 5.
In summary, results showed that the freeform input resulted in low LEP strength, and low recall.
This was mostly due to the participants providing only a few useful facts without system guidance.
We thus abandoned freeform input and continued to refine and improve guided input. We also


designed semi-guided input as a way to allow users to tap into a larger set of facts without jeopardizing their usefulness.

4.2 Performance Study


This study was run in Fall 2015 and Spring 2016. It was designed to evaluate strength against statis-
tical guessing attacks, memorability, and reuse of LEPs, and to compare them to the same qualities
of ordinary, 3class8 (Komanduri et al. 2014) textual passwords. We recruited 93 participants from
Amazon MTurk and asked each to create ten LEPs and ten ordinary passwords. This scenario is
unnatural, since no user would create so many passwords in such a short time span in real life.
However, asking for ten passwords enabled us to study password reuse, because we could measure
how many among the ten (passwords or LEPs) are similar or the same.1 We asked participants to
create passwords (LEPs or 3class8 passwords) for the following ten online sites and displayed their
logos during creation and authentication: Facebook, Google+, Gmail, Outlook, Bank of America,
Chase Bank, Target, WalMart, Wall Street Journal, and CNN. A total of 93 participants completed
the study.
Asking users to create passwords for fictional servers and recall them later will necessarily
underestimate recall in real life, because user motivation to remember these passwords is low.
This confounding factor will likely have a similar effect on LEP recall as on password recall, thus
our results are still useful to relate the performance of these two authentication methods.
We required the ordinary passwords to follow 3class8 policy—being at least eight characters
long, and containing characters from three out of four character classes: uppercase and lowercase
letters, digits and special characters. Each LEP was required to specify at least five facts. Users
were asked to return for authentication after one week. We allowed three authentication attempts
per LEP or password. We paid $1 for the creation task and $2 for the authentication task.
To minimize password reuse, we did not let participants select an LEP topic from a list. Instead,
we offered a topic for each LEP, which was randomly selected from our topic list. A participant
could reject offered topics until she found one that she wanted to talk about. We expect that,
in actual deployment, servers could apply the same elicitation technique to similarly discourage
password reuse.
At registration, each participant was randomly assigned to either the guided or the semi-guided input
category. She then created and authenticated with all 10 LEPs using the same input method. The
guided group was asked between 5 and 15 questions per LEP. The semi-guided group was asked to
specify two people, one location and two objects for each LEP, and to specify a hint for each fact.
After creating all 10 LEPs, each participant was asked to create 10 textual passwords, for the same
servers. All passwords were required to be 3class8 passwords. We suggested to users that they
may want to create different passwords for different servers, but we did nothing to enforce this.
Thus, password (and LEP) reuse were possible. Participants were reminded via E-mail to return for
authentication after one week. We also invited them to return for authentication after 3–6 months
to measure long-term recall.

4.3 Acquaintance Guessing (AcqGuess) Study


This study was run concurrently with our performance study, using the same system. We recruited
47 participants from our institution, University of Southern California (USC), to conduct an in-lab
study, with the goal of measuring the strength of LEPs against acquaintance attackers. In addition

1 We initially attempted a phased study design, where a participant created one password (LEP or 3class8) and returned

after 1 week to authenticate. After authentication the participant created another password for the next cycle. However,
we had to abandon this study idea due to high attrition rates.


to personal knowledge, we encouraged the guessers to fully utilize information available from
various social network sites and search engines. We observed that many participants in the study
indeed made use of these online information sources.
Participants were required to enroll into the study with at least one other friend (acquaintance)
at USC. We advertised the study via class announcements, wall posters, and flyers. Each participant
was paid $10, and took 45–60 minutes to complete the study. Unlike our previous studies, this study
used deception in the Information Sheet (approved by our IRB), by not informing the participants
that they would be guessing each other's LEPs. This was necessary to prevent participants from
intentionally creating LEPs that would be either too easy or too hard for a given acquaintance
to guess. We designed our study to mimic real-life password use, where one does not know
who may try to guess their password.
After reviewing the Information Sheet, each participant was asked how close they were with
their friend on a scale from 1 to 5 (closest), and how long they had known each other. Next, they
were asked to create three LEPs for three different online accounts: Gmail, Facebook, and Bank of
America. We displayed corresponding logos during creation and authentication. Each participant
was randomly assigned to either guided or semi-guided input method, and created all LEPs using
this method.
Next, participants were asked to authenticate with each LEP, and were allowed an unlimited num-
ber of trials, but required to make at least three. After authentication, we debriefed each participant
about the deception and explained our reasons for this. We informed them that their friend would
be guessing their LEP and vice versa. They were offered a chance to quit the study at this time,
and still receive the full payment. No participants quit. Each participant attempted to guess her
friend’s LEP’s, and was allowed unlimited number of trials, and required to make at least three.
Afterwards, we reviewed successfully guessed answers with participants, asking them about their
strategy. We ended the study with a short survey about the usability of LEPs. Last, we asked the participants not to disclose details about the deception to other students on campus, so that we could continue
recruitment.

4.4 Family and Close Friend Guess (CloseGuess) Study


After our Performance study showed that semi-guided LEPs performed best in memorability
and security, we decided to evaluate them more closely against guessing. Specifically, the draw-
back of our AcqGuess study was that participants we could recruit on campus had limited knowl-
edge of each other. In many cases they were classmates but not roommates and knew each other
only via University. In the CloseGuess study, ran in Fall 2016 and Spring 2017, we recruited two
kinds of participants: creators, who created one LEP each, and guessers, who attempted to guess those LEPs. Unlike the AcqGuess study, where both creators and guessers were recruited directly by us from our university, in the CloseGuess study 44 creators were recruited among students. Each creator was then asked
to recruit at least three different guessers among their close friends and family.
We realized that we could not fully prevent cheating during guessing, e.g., a creator guessing her own LEP or working together with the guesser. Instead, we worked to minimize cheating by minimizing incentives for it. We only compensated creators for participation. They received $10 at
the end of the creation phase. We also explained to them that their data is useless to us if they
do not recruit at least one guesser or if they perform guessing themselves. We then emailed them
a unique hyperlink to send to guessers and, if no guesser joined our study within two weeks for
the given creator, we resent that email one more time. We reasoned that this limited reminder
might help creators who wanted to help us but forgot, and would not be so burdensome as to entice
creators to cheat just to stop future reminders. While this study design ran the risk of having too
few guessers, in reality 84% of creators were able to find at least one guesser.


During guessing, a guesser was asked how long they had known the creator and in which capacity. They were then allowed to make as many guesses as they wanted at the creator's
LEP. At least three guesses were required to complete the study. At the end, we asked guessers
about the sources of information they used in guessing. Finally, we notified the creator that the
guesser completed the study.

4.5 Limitations and Ecological Validity


Our studies had the following limitations, many of which are common for online password studies.
First, it is possible, but very unlikely, that a participant enrolled in our Performance study more
than once. While the same Mechanical Turk user (as identified by her MTurkID) could not enter
the study twice, it is possible for someone to create multiple Mechanical Turk accounts. There is
currently no way to identify such participants.
Second, we cannot be sure that our Performance study participants did not write down or pho-
tograph their LEPs or passwords. Similarly, it is possible that in our CloseGuess study creators
attempt to act as guessers. We designed our studies to provide very low incentives to cheat. We
promised payment regardless of authentication or guessing success. Our study mechanisms further
detected copy/paste actions, and we have excluded any participant that used these (for whatever
reason) from the study. We also reminded the participants multiple times to rely on their memory
only. If any cheating occurred, then it was likely to affect all the results equally, without bias to
LEPs or passwords.
Third, we did not ask any participants if they cheated or not. Instead, we worked to design our
studies in such a way to lower incentives to cheat.
Fourth, Mechanical Turkers may not be very motivated or focused, which makes it likely that
actual recall of both real-world LEPs and passwords would be higher. While it would have been
preferable to conduct our Performance study in the lab, the cost would have been too high for us to
afford as large a participant pool as we had through the use of Mechanical Turkers. We believe that
any effect of participant motivation on recall applies equally to LEPs and to passwords. Also,
other researchers have shown (Kelley et al. 2012; Huh et al. 2015) that Mechanical Turkers can
produce high-quality work.
Fifth, participants in our AcqGuess study were all recruited from our university and thus some
of them were not very close. We addressed this limitation by running our CloseGuess study.

5 STUDY RESULTS
In this section, we report on the results of our user studies. We found that: (1) freeform LEPs have
decent security but abysmal recall, (2) guided and semi-guided LEPs (in the rest of this paragraph,
we just call them “LEPs” for short) are 10^9–10^13× stronger than an ideal, randomized, 8-character
password, (3) acquaintance success in guessing LEPs is 24–35× lower than for security questions,
(4) family/close friend success in guessing semi-guided LEPs is 1.8–2.6× lower than for security
questions, (5) LEPs are 2–3× more memorable than passwords, (6) LEPs are reused half as often as
passwords, and (7) LEPs contain 2.4–3.2× fewer fake answers than security questions.

5.1 Participant Statistics


Table 5 shows the breakdown of our participants in the first two rows. We show the count of
participants who completed all required tasks in a given study, and the total number of passwords.
We also collected participant demographics (age and language) but found that these factors did
not have a significant impact on security, recall, or password reuse. With regard to topics chosen by
participants, the most popular were learning, people, and trips.


Table 5. Participant Statistics and Results of Our Studies, Compared to the Baseline:
Passwords and Security Questions

                                     Our results                                          Baseline
                                     Performance study             Acq/CloseGuess studies
 #  Measure                          Free-form  Guided    Semi-G.  Guided    Semi-G.     Passwd.   Sec. Quest. (lit.)
 1  Participants                     40         44        49       47        44          93        —
 2  Passwords                        114        440       490      141       132         930       —
 3  Statistical strength (avg)       10^22      10^24     10^20    —         —           10^11     —
    Recall (1 week)                                                                       26%       32.1%–83.9% [Bonneau et al. 2015]
 4    all-fact                       7%         31.6%     45.7%    —         —
 5    five-fact                      —          47.7%     45.7%    —         —
 6    four-fact                      —          70.0%     73.0%    —         —
 7    three-fact                     —          82.1%     89.2%    —         —
    Long-term Recall (3–6 mo.)                                                            9%        6.4%–79.2% [Bonneau et al. 2015]
 8    all-fact                       —          16.5%     32.3%    —         —
 9    five-fact                      —          33.9%     32.3%    —         —
10    four-fact                      —          53.0%     54.0%    —         —
11    three-fact                     —          66.5%     73.6%    —         —
    Acquaintance Guessing                                                                 —         17–25% [Schechter et al. 2009]
12    all-fact                       —          —         —        0.7%      0%
13    five-fact                      —          —         —        0.7%      0%
14    four-fact                      —          —         —        0.7%      0%
15    three-fact                     —          —         —        1.3%      4.5%
    Close Friend or Family Guessing                                                       —         17–25% [Schechter et al. 2009]
16    all-fact                       —          —         —        —         0%
17    five-fact                      —          —         —        —         0%
18    four-fact                      —          —         —        —         9.5%
19    three-fact                     —          —         —        —         11.9%
20  Fake Information                 —          15.7%     11.5%    —         —           —         37% [Bonneau et al. 2015]
21  Reuse ident. (avg)               —          3.1%      2.7%     —         —           5.7%      —
22  Attack sim. (avg)                —          15.4%     4.6%     —         —           31.6%     —
23  Time to create (med)             236.5s     112.7s    112.0s   —         —           16.8s     —
24  Time to succ. auth (med)         57s        51.9s     37.3s    —         —           11.3s     —

5.2 Freeform Is Not Competitive


Our Freeform study (Table 5, column “Free-form”) resulted in decent security: LEPs were on
average 10^7–10^13 times stronger than a random, 3class8 password. However, recall was very poor.
Only 7% of LEPs were correctly recalled after one week. Freeform input further required the most
NLP support and the most manual intervention to arrive at usable questions and answers.
Because of low recall and the high need for NLP support, we abandoned freeform input and continued
evaluation of guided and semi-guided LEPs for performance and strength against guessing.

5.3 Security
In this section, we report and summarize findings from our Performance study with regard to
security of LEPs.

5.3.1 Statistical Strength. Table 5 shows the total number of LEPs, and their average statistical
strength in the third row. LEPs are 10^9–10^13× stronger than an ideal, randomized, eight-character
password, with guided LEPs being 10^4× stronger than semi-guided because of their higher number
of useful facts.


5.3.2 Reuse Attack. We also explored the strength of LEPs and passwords against a password-reuse
attacker. The results are shown in Table 5 in rows 21 and 22. We first investigated how many out
of 10 passwords were identical, for each given user. An LEP fact is said to be identical to a fact in
another LEP, by the same user, if their answers would match during authentication (accounting for
capitalization, reordering, punctuation, and extraneous words). An LEP l1 is identical to the LEP l2
if all of l1’s facts match the facts of l2. There were 2.7–3.1% identical LEPs, compared to 5.7% identical
passwords.
We next investigated how many out of a user’s ten passwords were sufficiently similar to each
other, so that a password-reuse attacker could easily guess one if he knew the other. Because LEPs
are authenticated based on fact matches, and passwords based on an exact string match, it is hard
to devise a similarity measure that applies equally well to both concepts. To define similarity of two
authentication tokens (LEPs or passwords), we borrow from the Linux Pluggable Authentication
Modules (PAM) (PAM 2015) design. We say that two tokens t1 and t2 are similar if more than 1/2
of the items in t2 also appear in t1. For passwords, items are characters, and for LEPs, items are facts.
This definition allows us to directly apply pam_cracklib to detect similar passwords. We say that
a password op1 is similar to the password op2 if one of the following conditions is met: (1) more
than half of op1’s characters appear in op2 (this includes cases when op1 is a palindrome of op2
or a rotated version of op2), or (2) op1 differs from op2 only in case. There were 4.6–15.4% similar
LEPs, in the semi-guided and guided categories, respectively, compared to 31.6% similar passwords.
Thus, passwords were reused more than twice as often as LEPs, and guided LEPs were reused about 3×
more often than semi-guided LEPs.
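To make these two reuse measures concrete, the sketch below shows one way the identity check and the PAM-style similarity check could be implemented. It is only an illustration under assumptions: the normalization is a simplified stand-in for our imprecise matching (it ignores case, punctuation, and word order, but not extraneous words), and the function names and example values are hypothetical.

    import string

    def normalize_fact(answer):
        """Lowercase, strip punctuation, and ignore word order, roughly mirroring
        imprecise matching (extraneous-word filtering is omitted here)."""
        cleaned = answer.lower().translate(str.maketrans("", "", string.punctuation))
        return frozenset(cleaned.split())

    def leps_identical(lep1, lep2):
        """Two LEPs by the same user are identical if every fact of lep1 matches
        some fact of lep2 under normalization."""
        facts2 = {normalize_fact(a) for a in lep2}
        return all(normalize_fact(a) in facts2 for a in lep1)

    def tokens_similar(t1, t2):
        """PAM-style similarity: more than half of the items of t2 also appear in
        t1. Items are characters for passwords and normalized facts for LEPs."""
        t1_items = set(t1)
        shared = sum(1 for item in t2 if item in t1_items)
        return shared > len(t2) / 2

    # Passwords: items are characters.
    print(tokens_similar("Sunshine2015", "sunshine2016"))       # True (heavy character overlap)

    # LEPs: items are normalized facts (example facts are hypothetical).
    lep_a = ["Joe Smith", "Barcelona", "July 2014"]
    lep_b = ["Smith, Joe", "Madrid", "July 2015"]
    print(leps_identical(lep_a, lep_b))                         # False
    print(tokens_similar([normalize_fact(f) for f in lep_a],
                         [normalize_fact(f) for f in lep_b]))   # False (only 1 of 3 facts shared)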
One possible reason for such low LEP reuse is the way we prompted users for LEPs (described in
Section 4). Whenever an LEP was to be created, we offered the participant one topic, randomly
selected from our list. The participant could reject offered topics until she found one that she
wanted to talk about. These prompts seem to have stimulated recall of diverse memories of
past events and lowered LEP reuse. We expect that servers that adopt LEPs would also use such
user prompts to lower reuse.

5.4 LEPs Are Strong Against Acquaintance and Family/Close Friend Guessing
This section discusses results of our AcqGuess and CloseGuess study.

5.4.1 Acquaintance Guessing (AcqGuess) Attack. We recruited a total of 91 participants, and a
few participants came in groups of 3 or more people, forming 100 different pairs. All of the
participants completed the study in the same sitting. The participants were students from our
institution, from freshmen to graduate students, with majors in engineering, social science, theater,
biology, math, international relations, music, business, economics, psychology, and linguistics.
Participants knew each other for 3 months to 6 years. All of them were already friends on
at least one social network (e.g., Facebook, Instagram, etc.) and were encouraged to use social
networks and public sources during guessing. For participant pairs that were not from the U.S.,
both participants were from the same country and shared the same cultural background, which
helped them make more educated guesses. Average closeness rating on the scale 1–5, with 1 being
the lowest, was 2.6 for the guided group, and 3 for the semi-guided group. These low closeness
values are the reason why we labeled this study “acquaintance guessing study.” In the rest of this
section, we will refer to the LEP creator as a “user” and to the person trying to guess an LEP as an
“acquaintance.”
Most users were able to fully authenticate with their LEPs in the second stage of our study. The
few authentication failures we observed were due to misspellings, synonyms, and more/less specific
answers provided during authentication than during creation.


Table 6. Fact Recall and Guess Success per Category

Category  Recall  Failure reasons                     FriendGuess  Family/CloseGuess
FN        77%     Misspelling, nickname, FL           20%          —
FL        81%     FN, misspelling                     5%           25.9%
PL        82%     Misspell., abbrev, synon.           13%          21.8%
CITY      92%     Misspell., more/less specific ans   53%          —
OBJ       79%     More/less specific ans              10%          9.3%
ACT       51%     Similar verbs                       —            —
DATE      77%     Total miss                          16%          —
YR        73%     Total miss                          17%          —
RL        95%     Abbreviation                        48%          —
HU        82%     Synonym, more/less specific         21%          —
TN        80%     Synonym                             36%          —

We show acquaintance guessing success in Table 5, rows 12–15, using the same authentication
schemes as in Section 3.5. The guess success rate is very low (0–0.7%) for all-fact authentication,
and climbs to 1.3–4.5% for three-fact authentication. Taken together with the recall results, this
data suggests that four-fact authentication strikes the right balance between achieving high
authentication success (70–73%) and reasonable strength, while keeping the acquaintance
guessing success low (0–0.7%).
Compared to security questions, where friends/family could guess 17–25% (Schechter et al.
2009), LEPs are 3.7–35× stronger, assuming four-fact or three-fact authentication. The AcqGuess
attack success rate per fact category, for the facts described in Table 2, is shown in column 4 of Table 6.
Acquaintances could successfully guess more than half of the cities, and more than 20% of rela-
tionships, places, first names, and facts in HU and TN categories. Other categories, such as FL, PL,
OBJ, DATE, and YR had lower guess success. This shows another strength of LEPs over security
questions. Bonneau et al. (2015a) found that security questions could not strike the right balance
between security and memorability, because facts that were memorable for users were also eas-
ily guessed (Schechter et al. 2009). Conversely, LEPs have multiple fact categories with high user
recall and low acquaintance guess rate (e.g., FN, FL, PL, OBJ).
Because guess rate for semi-guided LEPs was zero in this study, we continued investigation of semi-
guided LEPs only for close friend or family guessing.

5.4.2 Family/Close Friend Guessing (CloseGuess) Attack. We recruited a total of 50 participants
directly, to create LEPs, and we asked each participant to recruit close friends or family members
for guessing. We refer to the participants creating LEPs as “users” and to those trying to guess an
LEP as “friends.” Only 42 users were able to recruit at least one friend for the second part of the
study. Among the recruited friends, 32% were family members of a creator, and 68% were close/best
friends of a creator. On the closeness scale of 1–5, with 1 being the lowest and 5 the highest, 17.9%
of participants gave the lowest rating and 50% gave the highest; the average closeness was 3.7. We
show the guessing success in Table 5, rows 16–19, using the same authentication schemes as in
Section 3.5. While no one could guess the full five-question LEP, the overall success rate was 9.5%
for four-fact and 11.9% for three-fact authentication. Close friends and family were able to guess
significantly more LEPs than acquaintances in our AcqGuess study. However, compared to security
questions, where friends/family could guess 17–25% (Schechter et al. 2009), LEPs were still
1.4–2.5× stronger.


Guess success rate per fact category, for facts described in Table 2, is shown in column 5 of
Table 6. Close friends and family could successfully guess 25.9% of names and 21.8% of places,
compared to 9.3% of objects. Thus, one way to increase strength of LEPs against close friend/family
attackers would be to increase the number of object facts.

5.5 Usability
In this section, we report on usability aspects of LEPs, with regard to recall and user-friendliness.

5.5.1 Short-term Recall. We report the percentage of successful authentications, after one week,
in Table 5, in rows 4–11, for LEPs and for passwords. For LEPs, in addition to all-fact recall,
we investigated three alternative authentication schemes: five-fact, four-fact, and three-fact recall.
In these, the user must successfully recall five, four, or three facts, respectively, and the statistical
strength of the recalled facts must exceed the strength of a fully random, 3class8 password of 95^8
guesses, which we denote as S_3c8.
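The acceptance rule for these schemes can be sketched as follows. This is a minimal illustration, not the exact strength model from the paper: it assumes that each recalled fact contributes the size of its candidate answer list and that the combined strength is the product of those sizes; the function name and the example list sizes (loosely based on Table 8) are ours.

    from math import prod

    S_3C8 = 95 ** 8  # strength of a fully random 3class8 password (95 printable characters, 8 positions)

    def m_fact_accept(recalled_fact_strengths, m):
        """Hypothetical M-fact acceptance rule: the user must correctly recall at
        least m facts, and the combined statistical strength of the recalled facts
        (modeled here as the product of their estimated answer-space sizes) must
        exceed the strength of a random 3class8 password."""
        if len(recalled_fact_strengths) < m:
            return False
        return prod(recalled_fact_strengths) > S_3C8

    # Example: four recalled facts with illustrative answer-space sizes
    # (last name, first name, place, object).
    print(m_fact_accept([88879, 5494, 1020, 19000], m=4))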
Authentication success after 1 week is shown in rows 4–7 of Table 5. Password recall in our
study was 26% (others have found 45–70% recall (Komanduri et al. 2014; Shay et al. 2012, 2014),
but they asked users to recall only one password after 2 days). Our findings are consistent with
psychological literature (Averell and Heathcote 2011; Ebbinghaus 1913), where Ebbinghaus found
that people retain only 25% of new information they learned after six to seven days (Ebbinghaus
1913).
Imprecise matching is a trade-off between usability and security. Imprecise matching greatly
helps to increase LEP recall. With exact matching, all-fact authentication would be 19% for guided
and 9.6% for semi-guided LEPs. With imprecise matching, it is 31.6% and 45.7%. Imprecise matching
reduces strength of LEPs against statistical attacks, compared to exact matching, but only slightly.
This is because each fact has only a handful of variations, which are all counted as the same by
imprecise matching (e.g., “see, saw, have seen, will see”). This small cost is more than justified by
significantly better recall, which also means faster authentication and better usability. Our sta-
tistical strength attack models already take into account imprecise matching, i.e., we account for
each noun and each verb once and do not match for order, punctuation or capitalization.
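A rough sketch of such imprecise matching is shown below. The tiny stopword list and verb-lemma map are toy stand-ins for real NLP resources (a stopword list and a lemmatizer), and the matching direction and function names are assumptions rather than the exact rules our system uses.

    # Toy resources; a real system would use a proper stopword list and lemmatizer.
    STOPWORDS = {"the", "a", "an", "of", "to", "with", "we", "have", "has", "had", "will"}
    VERB_LEMMAS = {"saw": "see", "seen": "see", "went": "go", "gone": "go"}

    def content_words(answer):
        """Lowercase, drop punctuation and stopwords, lemmatize verbs, and count
        each remaining word once (so order and repetition do not matter)."""
        words = answer.lower().replace(",", " ").replace(".", " ").split()
        return {VERB_LEMMAS.get(w, w) for w in words if w not in STOPWORDS}

    def facts_match(stored, provided):
        """Imprecise matching: capitalization, punctuation, word order, and
        extraneous words are ignored; every content word of the stored answer
        must appear in the provided answer."""
        return content_words(stored) <= content_words(provided)

    print(facts_match("saw the Eiffel Tower", "We have seen the Eiffel Tower."))  # True
    print(facts_match("saw the Eiffel Tower", "the Louvre"))                      # False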
LEP authentication success with all facts is 30–75% higher than that for passwords. When we
require fewer facts for authentication the success rate increases significantly. At four facts, LEP
success is 2.7× higher than password success rate, and at three facts it is 3.2× higher. Allowing users
to authenticate with fewer than all facts lowers the security of LEPs. However, since we also require
that the statistical strength of the recalled facts exceeds S_3c8, the remaining strength of these
“shortened LEPs” is sufficiently large to thwart attacks. In Section 5.4, we investigated how requiring
M < N facts for authentication affects strength against friend attacks.
Security questions have a wide range of recall rates after 1 month, from 32.1% for frequent flyer
number to 83.9% for city of birth (Bonneau et al. 2015a). LEP recall with four-fact and three-fact
authentication is 70–89.2% and thus resembles recall for memorable security questions. If an LEP
were equivalent to a set containing several security questions, then its best recall rate would be
0.839^4 ≈ 50% for four-fact and 0.839^3 ≈ 59% for three-fact authentication. The fact that LEP recall
exceeds these values shows the power of user-customized questions and imprecise matching over
general questions and exact matching.
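Spelled out, this back-of-the-envelope comparison assumes that the k facts behave like k independent security questions, each recalled with the best single-question rate p = 0.839 reported by Bonneau et al. (2015a):

    P(\text{recall all } k \text{ facts}) = p^{k}, \qquad
    p = 0.839:\quad 0.839^{4} \approx 0.50, \qquad 0.839^{3} \approx 0.59 .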
To understand reasons for failures to recall an LEP, we examine recall rate per fact category (as
given in Table 2). Table 6 shows the percentage of correctly recalled facts (in at least one attempt)
per fact category in column 2. Relationships and cities are most accurately recalled, followed by
items in the HU category, places, and first and last names. Overall, all categories except ACT have
recall of more than 70% after one week. While this is quite high, it may be puzzling why a user
would fail to recall all facts correctly, or at least at higher rates. We note some frequent reasons for
failed recall per category in column 3. Many of these could be handled by better NLP techniques,
e.g., trying synonyms during matching, building a database of common abbreviations (e.g., gf for
girlfriend), and so on. This would further improve recall, at some security cost. We leave this
direction for future work.
Overall, there were 18.2% of facts that a user failed to recall in any authentication attempt.
Users provide fake answers to security questions. They may also provide fake answers to LEPs,
which they could not later recall. While we cannot accurately establish which facts are fake and
which are not, we estimate this by looking for facts that a user failed to recall and whose failure
cannot be attributed to the NLP reasons we listed in Table 6, i.e., a total miss. We find that
11.5–15.7% of facts were not recalled by users, and thus may be fake. This rate is less than half of
the fake-answer rate for security questions (Bonneau et al. 2015a). A finer investigation of these
“fake facts” shows that about half of them fail authentication because the user used an initial instead
of the last name at LEP creation but reverted to the full name at authentication. We could handle
this case with better authentication prompts.
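One way to operationalize this “total miss” heuristic is sketched below. The string-similarity measure and its threshold are our own assumptions for illustration (the paper’s classification was based on the failure reasons in Table 6), and the example answers reuse the anonymized sample from Table 11.

    from difflib import SequenceMatcher

    def is_total_miss(stored_answer, attempts, threshold=0.5):
        """Flag a fact as possibly fake: none of the user's authentication attempts
        is even close to the stored answer, so the failure cannot be blamed on
        misspelling, abbreviation, or more/less specific wording. The similarity
        threshold is an assumption, not the paper's exact rule."""
        best = max((SequenceMatcher(None, stored_answer.lower(), a.lower()).ratio()
                    for a in attempts), default=0.0)
        return best < threshold

    print(is_total_miss("Ashay Ramani", ["A. Ramani", "Ashai Ramani"]))  # False (near misses)
    print(is_total_miss("Ashay Ramani", ["blue", "Paris"]))              # True (total miss)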

5.5.2 Long-Term Recall. We invited all participants in our Performance study to authenticate
with their LEPs and passwords once again, in May 2015. A total of 54 participants returned. The
time between creation and authentication for these return participants ranged from 104 to 231
days, with a median of 120 days. Table 5 shows the long-term authentication success in rows 8–11.
While both LEP and password recall declined, the time lapse affected recall of passwords much
more than recall of LEPs. Password recall declined by 66%, while LEP recall declined by 17–47%.
We thus conclude that LEPs are more robust than passwords with regard to long-term recall.
Security questions have a wide range of recall rates after 3–6 months, from 6.4% for frequent
flyer number to 79.2% for city of birth (Bonneau et al. 2015a). LEP recall with four-fact and three-
fact authentication is 53–73.6%, within the range of the more memorable security questions.

5.5.3 Time to Create and Authenticate. We show the median time to create and authenticate an
LEP or a password in Table 5, in rows 23 and 24. LEPs require 6.7× longer to create and 3.3–4.6×
longer to authenticate than passwords. This is expected, as they require a user both to read the
questions and to provide input that is approximately five times longer than a password.

5.5.4 Participant Feedback. We collected participant feedback in the Performance and Guess
studies.
At the end of the Performance study, we asked participants to give us feedback about advantages
and disadvantages of LEPs, when compared to ordinary passwords. This feedback was given in
freeform and coded by us. With regard to advantages, 58.2% of participants stated that LEPs were
much harder to guess than the current security questions, and more memorable than ordinary
passwords, since they were built from personal experience.
During the AcqGuess study, we carefully observed participant behavior during guessing and
interviewed them about their strategy after each guessing session. Except for two participants, who
attempted random guesses, the rest invested significant effort in looking for possible guess options
online. They reported using the following information sources for guessing: personal knowledge,
Facebook, Google search engine, Instagram, RenRen, WeChat, QQ, Line, LinkedIn, and Spokeo.
Overall, 78–83% of participants reported using personal knowledge, 76–79% used social networks,
and 50–56% used search engines. At social networks, participants checked the personal profile and
friend lists first, and then they scanned the recent wall posts. They complemented this with online
searches for popular items which appeared in LEP questions.


We further collected participant feedback on why guessing LEPs was hard. About half of re-
sponses (48%) stated that LEPs involved too much detail about the topic, which was hard to mine
from online sources or from personal knowledge. Unless the acquaintance participated in the same
event, it was difficult to mine correct answers online. In addition, 26% of participants acknowl-
edged that many facts were too personal and thus not shared between the acquaintance and the
user, nor did they appear in social network posts, such as events from early childhood and elemen-
tary school. These private and unique facts make guessing difficult, unlike facts used for security
questions, which can be easily found in online sources.
During the CloseGuess study, we collected information about guessing strategy using a survey
after the guessing phase. We asked friends to rate their use of personal knowledge, social networks,
and search engines for guessing on a 1 to 10 scale, with 10 being the highest use. The most
prominent source was personal knowledge, with an average score of 7.1, while the average ratings
of social networks and search engines were 3.2 and 3.4, respectively. When asked why LEPs were
hard to guess, friends said the questions were general in nature, abstract, or vague (33.3%), had
inadequate information (13.3%), or were too detailed/specific (53.3%).

5.6 Deployability
5.6.1 Storage. LEP answers should be concatenated and stored as one or several hashes. In the case
of all-fact authentication, we store only one hash per LEP. If we allow for M-fact authentication
(M = 3, 4, or 5), then we would have to create, hash, and store C(N, M) combinations of facts for each
LEP, where N is the total number of facts: 77% of participants would need up to 35 hashes, 88% would
need up to 70, 92% would need up to 126 hashes, and the worst-case scenario would require 1,287
hashes. Even in the worst case, the storage cost would only amount to several kilobytes per user,
which is negligible for today’s server storage. Because one-way hashing is fast, the processing
cost should also be acceptable. We could further limit the storage cost of LEPs by discarding all
but the N strongest facts at creation time. For M ∈ {3, 4} and N = 8, each LEP would require at most
70 hashes.
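A minimal sketch of this storage scheme is given below. The canonical ordering, separator, salt handling, and use of SHA-256 are all illustrative assumptions; a real deployment would use a slow, salted password hash (e.g., bcrypt or Argon2) and the system’s own fact normalization.

    import hashlib
    import os
    from itertools import combinations
    from math import comb

    def store_lep_hashes(normalized_facts, m):
        """Hash every m-fact combination of an LEP so that any m correctly recalled
        facts can be verified later. Facts are assumed to be already normalized;
        sorting gives a canonical order so the same combination hashes identically
        at creation and at authentication."""
        salt = os.urandom(16)
        hashes = {
            hashlib.sha256(salt + "|".join(combo).encode()).hexdigest()
            for combo in combinations(sorted(normalized_facts), m)
        }
        return salt, hashes

    # Storage cost is C(N, m) hashes per LEP, e.g., C(8, 4) = 70.
    print(comb(7, 3), comb(8, 4), comb(9, 4), comb(13, 5))  # 35 70 126 1287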
We conclude that the performance of LEPs was good. The strength of both guided and semi-guided
LEPs was much higher than the strength of passwords, and recall for both, when we allow four-fact
authentication, was comparable to the recall of security questions. We thus continued evaluation of
guided and semi-guided LEPs for acquaintance and family/close-friend guessing.

6 SUMMARY: LEPS ARE USABLE AND SECURE


Bonneau et al. (2012) present the usability-deployability-security (UDS) framework, which defines
25 properties of Web authentication schemes. They use UDS to rate 35 password-replacement
schemes. In this work, we present the rating of passwords and personal knowledge (security)
questions in Table 7, and compare LEPs with other competing approaches. LEPs perform better
than passwords in six categories. Regarding usability, LEPs are Quasi-Memorywise-Effortless,
since users still forget some of the LEP answers. Also, they are Quasi-Scalable-for-Users, since users
have possibly abundant memories that servers can elicit by randomizing topic offerings, as we
demonstrated in our user studies and question design with 10 different password creation sessions.
While the abundant memories make LEPs theoretically scalable, our experiments only tested
creation of up to 10 LEPs, while in reality users have many more password-protected accounts
(Hanamsagar et al. 2018). We hope to further investigate scalability for users in future research.
LEPs are also labeled as Infrequent-Errors, because LEP’s imprecise matching takes care of many
causes of authentication failures. Regarding security, LEPs are Resilient-to-Throttled-Guessing and
Quasi-Resilient-to-Unthrottled-Guessing, as our evaluation shows. In addition, LEPs have low reuse,
as they are also Quasi-Resilient-to-Leaks-from-Other-Verifiers. LEPs perform worse than passwords
in one usability, two deployability, and one security category. They are less Efficient-to-Use, as
LEPs take longer to create and authenticate. In addition, they are not Server-Compatible nor
Mature, because they are still a prototype and are not widely deployed yet. With regard to strength,
LEPs are also Quasi-Resilient-to-Targeted-Impersonation: friends and family members can guess
a relatively small portion of LEPs, but this is likely still higher than the portion of passwords
which could be guessed by friends and family.

Table 7. Comparing LEPs to Security Questions and Passwords Using the UDS Framework

LEPs perform better than security questions in six categories. They are Quasi-Scalable-for-Users,
because of the wide applicability and broad range of life-experience topics, while security questions
are not, being very limited. LEPs are also Infrequent-Errors, because we
apply imprecise matching. LEPs are further Quasi-Resilient-to-Targeted-Impersonation, Resilient-
to-Throttled-Guessing, Quasi-Resilient-to-Unthrottled-Guessing, and Quasi-Resilient-to-Leaks-from-
Other-Verifiers, while security questions offer none of these qualities. There are only two categories
where LEPs lag behind security questions: LEPs are less Efficient-to-Use and not Mature.
Comparing LEPs to the other 33 authentication technologies rated by Bonneau et al. (2012)
(Table 1 in that publication), only federated authentication schemes provide both improved
usability and security, though at the cost of lower deployability. Therefore, we believe LEPs are
competitive with other authentication schemes and can offer a unique combination of
features that other approaches cannot provide.

7 DISCUSSION
We now briefly discuss our findings, limitations, and practicality of LEPs for different uses. LEPs
were clearly superior to passwords with regard to their strength and recall. When servers prompted
users with a random LEP topic, we also ended up with LEPs that were much more diverse than
passwords. LEPs thus successfully addressed the problem we started with—the balance between
security and memorability of passwords. However, this improvement comes at the cost of usability,
because LEPs take much longer to create and to input than passwords. Extended authentication
time may be especially burdensome to users on mobile devices, where keyboard input is slower
than on desktops/laptops. LEPs thus are not well-suited to all authentication tasks.
One possibility would be to use LEPs instead of passwords for first-time authentication, when
cookies have expired, when the user is accessing an online service from a new machine, or when
the user is logging onto his local machine after a logout. In these rare situations the added overhead
of LEPs may be acceptable to users, at the benefit of higher security. LEPs could further be activated
only for high-value accounts, such as bank or e-mail, where secure access is crucial.
LEPs could also be used for secondary or added authentication, instead of security questions.
We show in Section 5 that LEPs surpass security questions in security, recall and strength against
friend guessing. Many high-value services currently use text messages with a code, sent to the
user’s phone, to reduce the risk of password cracking. LEPs could be used in lieu of the code when
a user does not have access to her phone (e.g., during international travel).
LEPs could also be used for continuous authentication on high-security servers, such as gov-
ernment or bank servers. A logged-in user may be prompted for one or several facts after a period
of inactivity, to verify that she still has physical access to her computer.
One limitation of our work is that our strength measurement model may be imprecise. Answers
to specific LEP questions may be correlated both with the topic (e.g., users may talk about “cake”
in the context of a wedding) and among themselves. We modeled these correlations in a naive way
due to lack of publicly available information, but it is possible that attackers may come up with
a smarter strategy or have better access to useful data about correlations. Another limitation of
our work is due to the Hawthorne effect, where users tend to provide more positive answers in a
study than they would in their real life. Therefore, the low fake-answer and password-reuse rates
we obtained in our studies could map to higher values in real life.
Yet another limitation of ours, and of similar password studies, is that we measured recall over
relatively short time intervals and with small user populations. We were limited by the research
funds we had and could not afford a longer-term, larger-population study. In particular, studying
the likelihood that a user will share the specific memory they used for an LEP in their daily social
interactions would be interesting future work.
LEPs could also be vulnerable to social engineering, when a user is coaxed by a friend to provide
an answer to their authentication questions. For example, if a user’s LEP contains the fact that “Joe
Smith” accompanied this user on a trip to Spain, a friend may ask “Who went with you to Spain?”
We hope that the specificity of LEP questions may represent a hindrance to social engineering
attacks. For example, if asked a “who” question the user will likely reply with too short an answer
(e.g., “John went with me”). But if asked a “first and last name” question, the user may get suspicious,
because such a question is too specific for a casual conversation. We hope to perform a user study
in the future to validate this intuition.
We have not tested LEPs with different cultures and did not consider the cross-cultural aspect.
For example, our initial list of 17 LEP topics in Table 1 relates to our life experiences and has worked
well in our user studies, but it may not be general and broad enough across different cultures. We
believe that it would be possible to identify a wealth of topics in any culture, which are relevant
to many people in that culture. We leave the cross-cultural aspects for future work.
It is possible that a negative emotional effect could arise when users are asked to recall a neg-
ative topic such as “death” or “accident.” According to psychology research by Kensinger (2007),
negative events are remembered in greater detail than positive ones, which would aid LEP recall.
However, users may experience emotional pain if asked to recall traumatic events. We have also
seen that these topics lead to leakage of sensitive information. We thus plan to
remove those negative topics from LEP offerings and replace them with more positive ones. We
leave the definition of such added topics for our future work.
Last, reminiscing about positive topics, such as “trip” or “wedding,” could have a positive emo-
tional effect on users, making password creation and recall more pleasant than it is today.

8 CONCLUSIONS
Textual passwords are a widely used form of authentication and suffer from many deficiencies,
because users trade security for memorability. Users create weak passwords because they are
memorable, and they reuse the same password across many sites. Forcing users to create strong
passwords does not help, as these are easily forgotten.
We have proposed life-experience passwords (LEPs) as a new authentication mechanism, which
strikes a good balance between security and memorability. We investigated several LEP designs
and evaluated them in two user studies. Our results show that LEPs are much more memorable
and secure than passwords, they are less often reused, and they are strong against friend guessing.
While they take more time to create and input during authentication, we believe their benefits
may make them a viable primary authentication mechanism for high-security servers or a much
better secondary authentication mechanism than security questions.

A APPENDIX
A.1 Popular List Sources and Sample Questions
Table 8 shows the data sources that we used to build our popular item lists. Samples of our guided
and semi-guided questions are shown in Tables 9 and 10, respectively, and sample guided and
semi-guided LEPs are shown in Tables 11 and 12.

Table 8. Popular Lists, Their Sizes, and Sources

Subcategory Possible ans. Sources


Last Name (LN)
US ≥88,879 [L1, L2]
China ≥100 [L3]
India ≥580 [L4]
43 European Countries ≥77,000 [L5] (Bonneau et al. 2010)
23 Other Countries ≥116,000 [L6, 7, 8, 9]
First Name (FN)
US ≥5,494 [L1, L10]
China ≥259 [L11, L12]
India ≥1833 [L13, L14]
43 European Countries ≥72,408 [L15] (Bonneau et al. 2010)
23 Other Countries ≥1,240 [L15]
Cities (CI)
US ≥298 [L16]
China ≥642 [L17]
India ≥870 [L18]
U.K Cities ≥100 (Bonneau et al. 2010)
Top 85 largest Cities in the world by Population 85 [L19]
Popular Places (PL)
US and World Tourist Attraction Places (Best Places to visit in ≥568 [L20, L21]
US and World)
Other Places (Schools, Hotels, Hospitals, Restaurants, etc) ≥452 [L22-L26]
Activities/Actions (ACT)
Top 25 activities in US 25 [L27]
List of Hobbies 276 [L28]
Objects (OBJ)
Top Hobby, Popular jobs in US, Popular Grad. Gift, Top Activ- ≥1,162 [L29-L47]
ities, Best Movies, Best Singers, Best TV Series, Popular Sports
in US, Popular Sports in World, Best Writer, Best Books, Pop-
ular Wedding flower, Popular Wedding cake, Popular color of
bride maid dress, Top 50 food in the world
Google 20,000 words (*after removing stopwords) 19,000 [L48]
COCA 5,000 [L49]

Popular List Sources


—L1. Frequently Occurring Surnames from Census 1990—Names Files.
http://www.census.gov/topics/population/genealogy/data/1990_census/1990_census_
namefiles.html
—L2. Frequently Occurring Surnames from the Census 2000.
http://www.census.gov/topics/population/genealogy/data/2000_surnames.html
—L3. List of common Chinese surnames.
https://en.wikipedia.org/wiki/List_of_common_Chinese_surnames
—L4. List of common Chinese surnames.
https://en.wikipedia.org/wiki/List_of_common_Chinese_surnames
—L5. List of the most common surnames in Europe.
https://en.wikipedia.org/wiki/List_of_the_most_common_surnames_in_Europe
—L6. List of most common surnames in Asia.
https://en.wikipedia.org/wiki/List_of_most_common_surnames_in_Asia
—L7. List of most common surnames in South America.
https://en.wikipedia.org/wiki/List_of_most_common_surnames_in_South_America
—L8. List of most common surnames in South America.
https://en.wikipedia.org/wiki/List_of_most_common_surnames_in_South_America
—L9. List of most common surnames in Oceania.
https://en.wikipedia.org/wiki/List_of_most_common_surnames_in_Oceania
—L10. Popular Baby Names.
https://www.ssa.gov/OACT/babynames/limits.html
—L11. Chinese girl names.
http://www.top-100-baby-names-search.com/chinese-girl-names.html
—L12. Chinese male names.
http://www.top-100-baby-names-search.com/chinese-male-names.html
—L13. Popular Indian Boy Names.
http://babynames.extraprepare.com/boy-popular.php
—L14. Popular Indian Girl Names.
http://babynames.extraprepare.com/girl-popular.php
— L15. List of most popular given names.
https://en.wikipedia.org/wiki/List_of_most_popular_given_names
— L16. List of United States cities by population.
https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population
— L17. List of cities in China.
https://en.wikipedia.org/wiki/List_of_cities_in_China
— L18. List of towns in India by population.
https://en.wikipedia.org/wiki/List_of_cities_and_towns_in_India_by_population
— L19. List of cities proper by population.
https://en.wikipedia.org/wiki/List_of_cities_proper_by_population
— L20. Tripadvisor.
http://www.tripadvisor.com
— L21. List of national parks of the United States.
https://en.wikipedia.org/wiki/List_of_national_parks_of_the_United_States
— L22. Top 100 Chains: U.S. Sales.
http://nrn.com/us-top-100/top-100-chains-us-sales
— L23. Academic Ranking of World Universities.
https://en.wikipedia.org/wiki/Academic_Ranking_of_World_Universities
— L24. List of largest hotels in the world.
https://en.wikipedia.org/wiki/List_of_largest_hotels_in_the_world
— L25. The Worlds Top 20 Honeymoon Destinations.
http://www.brides.com/honeymoons/2014/04/best-honeymoon-destinations#slide=2
— L26. Best Places to Propose.
http://www.brides.com/honeymoons/2014/04/best-honeymoon-destinations#slide=2
— L27. Top 25 Most Popular Sports/Recreational Activities in the U.S.
https://www.sfia.org/
— L28. List of hobbies.
https://en.wikipedia.org/wiki/List_of_hobbies
— L29. Highest Rated TV Series With At Least 5,000 Votes.
http://www.imdb.com/chart/top
— L30. 100 Greatest Actors of All Time.
http://www.imdb.com/list/ls000034841/
— L31. Top Rated Movies.
http://www.imdb.com/search/title?num_votes=5000,&sort=user_rating,desc&title_type=tv_series
— L32. List of best-selling music artists.
https://en.wikipedia.org/wiki/List_of_best-selling_music_artists
— L33. Sports in the United States.
https://en.wikipedia.org/wiki/Sports_in_the_United_States
— L34. List of sports attendance figures.
https://en.wikipedia.org/wiki/List_of_sports_attendance_figures
— L35. Forbes Magazine. 2017. The Worlds Most Valuable Sports Teams.
https://www.forbes.com/sites/kurtbadenhausen/2017/07/12/
full-list-the-worlds-50-most-valuable-sports-teams-2017/#6981f5f4a05c
— L36. List of best-selling fiction authors.
https://en.wikipedia.org/wiki/List_of_best-selling_fiction_authors/
— L37. List of best-selling music artists.
https://en.wikipedia.org/wiki/List_of_best-selling_music_artists
— L38. List of best-selling fiction authors.
https://en.wikipedia.org/wiki/List_of_best-selling_fiction_authors/
— L39. Top Ten Most Requested Cake Combos.
http://nymag.com/shopping/guides/weddings/planner/features/toptencakes.htm
— L40. Top Ten Most Requested Cake Combos.
http://nymag.com/shopping/guides/weddings/planner/features/toptencakes.htm
— L41. Wikipedia Party.
https://en.wikipedia.org/wiki/Party
— L42. Your pick: Worlds 50 best foods.
http://travel.cnn.com/explorations/eat/readers-choice-worlds-50-most-delicious-foods-
012321
— L45. 50 College Graduation Gift Ideas.
https://www.universityparent.com/topics/parent-posts/50-college-graduation-gift-ideas-
for-parents-2/
— L46. List of most commonly learned foreign languages in the United States.
https://en.wikipedia.org/wiki/List_of_most_commonly_learned_foreign_languages_in_the_United_States
— L47. Occupational Employment Statistics.
http://www.bls.gov/oes/current/oes_nat.htm
— L48. Google-10000-english.
https://github.com/first20hours/google-10000-english/
— L49. The Corpus Of Contemporary American English (COCA).
http://corpus.byu.edu/coca

Table 9. Some Samples of Guided Input Questions

topic question fact category


engagement Was this your engagement? BI
engagement What was the first and last name of the [bride/groom]-to-be? FL
engagement When did the engagement occur? DATE
engagement In which city did the couple get engaged? CITY
engagement At which location did the couple get engaged? (e.g., name of the PL
hotel/park/chapel/street)
engagement How many other memorable people were present at the engagement? (e.g., special HM
friends, parents)
engagement List the first and last names of other memorable people present. FL
engagement Was the ring’s design special in any way? BI
engagement Describe what was special about the ring. OPEN
engagement Did the groom propose in a special/unusual way? (e.g., with a song) BI
engagement Describe what was special about the proposal. OPEN
wedding Was this your wedding? BI
wedding What was the first and last name of the [bride/groom]? FL
wedding What was the first and last name of the bride? FL
wedding What was the first and last name of the groom? FL
wedding When did the wedding occur? DATE
wedding Which city did the wedding occur in? CITY
wedding At which location did the wedding occur? (e.g., name of the hotel/park/chapel/street) PL
wedding How many bridesmaids were there? HM
wedding List the first and last names of bridesmaids. FL
wedding What was the first and last name of the best man? FL
wedding What was the first and last name of the maid of honor? FL
wedding What color were the bridesmaids’ dresses? HU
wedding Which flowers were in the bride’s bouquet? HU
wedding What was the first and last name of the person who married the couple? FL
wedding Which city did the couple go to for their honeymoon? CITY
wedding What kind of cake was served? HU
wedding How many other memorable guests were present? (e.g., the only grandfather at the HM
wedding, someone who did something unexpected)
wedding For each memorable guest list their first and last name and why they were memorable OPEN
(e.g., John Smith came from Nevada, Alice Red was a flower-girl)
birth What was the first and last name of the baby? FL
birth When did the birth occur? DATE
birth Which city did the birth occur in? CITY
birth Which hospital did the birth occur in? PL
birth What was the first and last name of the father? FL
birth What was the first and last name of the mother? FL
birth How many siblings did the baby have at that time? HM
birth List the first and last names of the siblings. FL
birth What was the first and last name of the doctor that delivered the baby? FL
birth How many people visited the baby while at hospital? HM
birth List the first and last names of the visitors. FL
death What was the first and last name of the person who passed away? FL
death What was their relationship to you? RL
death In which city did XX live? CITY
death In which city did XX die? CITY
death Did XX die in a hospital? BI
death What is the name of the hospital where XX died? PL
death What was the cause of death? HU
death How old was XX when they died? HU
death What was the XX’s profession? HU
death How many siblings did XX have? HM
death List the first and last names of the siblings. FL
death What was the first and last name of the XX’s father? FL
death What was the first and last name of the XX’s mother? FL
accident Did this accident happen to you? BI
accident What was the first and last name of the main person involved? FL
accident What type of accident was this (e.g., fall, car crash, etc.)? HU
accident When did the accident occur? DATE
accident In which city did the accident occur? CITY
accident Where did the accident occur? (e.g., name of the street/park/building/school/room) PL
accident How many other people were involved, except that main person? HM
accident List the first and last names of people who were involved (other than the main person). FL
accident List the first and last names of people that got hurt and their injuries (e.g., Alice Red OPEN
had a broken nose, John Smith had a concussion).
Italicized questions are used only to form titles. XX in questions is replaced with relevant information from the title. HM
questions are used only to inform the system how many text boxes to show for the next question and how to phrase the
next question (e.g., List the first and last names of N people that were there).


Table 10. Some Samples of Semi-guided Input Questions

topic question fact category


engagement Was this your engagement? BI
engagement Please provide a title for this engagement. ST
engagement List two people that are related to this engagement, other than the couple. For each FLH
person, specify a hint that will help you recall them during authentication.
engagement List one location that is related to this engagement or the couple. Specify a hint that PLH
will help you recall this location during authentication.
engagement List two objects related to this engagement or the couple. For each object, specify a OBJH
hint that will help you recall it during authentication.
wedding Was this your wedding? BI
wedding Please provide a title for this wedding. ST
wedding List two people that are related to this wedding, other than the couple. For each FLH
person, specify a hint that will help you recall them during authentication.
wedding List one location that is related to this wedding or the couple. Specify a hint that will PLH
help you recall this location during authentication.
wedding List two objects related to this wedding or the couple. For each object, specify a hint OBJH
that will help you recall it during authentication.
birth What was the first and last name of the baby? FL
birth List two people that are related to this birth, other than the baby. For each person, FLH
specify a hint that will help you recall them during authentication.
birth List one location that is related to this birth. Specify a hint that will help you recall PLH
this location during authentication.
birth List two objects related to this birth. For each object, specify a hint that will help you OBJH
recall them during authentication.
death What was the first and last name of the person who passed away? FL
death List two people that are related to this death, other than the deceased. For each FLH
person, specify a hint that will help you recall them during authentication.
death List one location that is related to this death. Specify a hint that will help you recall PLH
this location during authentication.
death List two objects related to this death. For each object, specify a hint that will help you OBJH
recall them during authentication.
accident Did this accident happen to you? BI
accident What was the first and last name of the main person involved? FL
accident What type of accident was this (e.g., fall, car crash, etc.)? HU
accident List two people that are related to this accident, other than the main person. For each FLH
person, specify a hint that will help you recall them during authentication.
accident List one location that is related to this accident. Specify a hint that will help you recall PLH
this location during authentication.
accident List two objects related to this accident. For each object, specify a hint that will help OBJH
you recall them during authentication.
graduation Was this your graduation? BI
graduation What type of school was this graduation for (e.g., college, elementary, high school)? TN
graduation Please provide a title for this graduation. FL
graduation List two people that are related to this graduation, other than the main person. For FLH
each person, specify a hint that will help you recall them during authentication.
graduation List one location that is related to this graduation. Specify a hint that will help you PLH
recall this location during authentication.
graduation List two objects related to this graduation. For each object, specify a hint that will help OBJH
you recall them during authentication.

party Provide a title for this party. ST
party List two people that are related to this party. For each person, specify a hint that will FLH
help you recall them during authentication.
party List one location that is related to this party. Specify a hint that will help you recall PLH
this location during authentication.
party List two objects related to this party. For each object, specify a hint that will help you OBJH
recall them during authentication.
trip Did you visit the same location at any other trip? BI
trip What was your main destination on this trip? PLH
trip List two people that are related to this trip. For each person, specify a hint that will FLH
help you recall them during authentication.
trip List one location that is related to this trip. Specify a hint that will help you recall this PLH
location during authentication.
trip List two objects related to this trip. For each object, specify a hint that will help you OBJH
recall them during authentication.
Italicized questions are used only to form titles. XX in questions is replaced with relevant information from the title. FLH =
FL + Hint, PLH = PL + Hint, OBJH = OBJ + Hint.

Table 11. Guided Input: A Sample LEP

Question Answer
“Learning to swim”
Which year did you learn to swim? 2010
In which city did you learn to swim? Jaipur
Provide the name of the lake, sea, beach, pool or river where you learned to swim. Pico Paradise
What was the full name of the swim school? Pico
List the first and last name of one person that taught you to swim? Ashay Ramani
Names have been anonymized to protect participant privacy.

Table 12. Semi-guided Input: A Sample LEP

Question Answer
“My high-school graduation”
List the first and last name of “first in line.” Nana C
List the first and last name of “second in line.” Dad Bob
Specify the location where “graduation took place.” gym
Specify the object related to “new for graduation.” tan shirt
Specify the object related to “from Steven’s family.” rose
Names have been anonymized to protect participant privacy.

A.2 Survey Questionnaires


Instruction. Please rate the following qualities of ordinary and life-experience passwords on the
scale of 0 (lowest) to 10 (highest).
For Ordinary passwords:

(1) How difficult it was to create one ordinary password? (0–10 scale)
(2) How difficult it was to come up with different ordinary passwords? (0–10 scale)
(3) How secure are your ordinary passwords against guessing by a stranger? (0–10 scale)
(4) How secure are your ordinary passwords against guessing by a close friend? (0–10 scale)
(5) How difficult it was to remember your ordinary passwords? (0–10 scale)
For LEPs:
(1) How difficult it was to create one LEP? (0–10 scale)
(2) How difficult it was to come up with different LEPs? (0–10 scale)
(3) How secure are your LEPs against guessing by a stranger? (0–10 scale)
(4) How secure are your LEPs against guessing by a close friend? (0–10 scale)
(5) How difficult it was to remember your LEPs? (0–10 scale)
For Acq/Close Guessing:
Instruction. Please help us understand the closeness by assigning the score to it from 0 (do not
know him/her well) to 5 (family member).
(1) Closeness (0–5 scale)
Instruction. Please help us understand which source of information you used in guessing your
friend’s password by assigning the score to it from 0 (not used at all) to 10 (used all the time).
(1) Personal knowledge (0–10 scale)
(2) My friend’s profile on social network (Facebook, Google plus) (0–10 scale)
(3) Search engine (0–10 scale)
(4) Other methods (0–10 scale)

ACKNOWLEDGMENTS
We thank Jason Hong (CMU) for valuable feedback on previous versions of this manuscript. We
also thank the anonymous reviewers for their helpful feedback on drafts of this article.

REFERENCES
Lee Averell and Andrew Heathcote. 2011. The form of the forgetting curve and the fate of memories. J. Math. Psychol. 55,
1 (2011), 25–35.
Anders Björkelund, Love Hafdell, and Pierre Nugues. 2009. Multilingual semantic role labeling. In Proceedings of the
13th Conference on Computational Natural Language Learning: Shared Task. Association for Computational Linguistics,
43–48.
The GitHub Blog. 2016. GitHub Security Update: Reused password attack. Retrieved from https://github.com/blog/
2190-github-security-update-reused-password-attack.
Hristo Bojinov, Daniel Sanchez, Paul Reber, Dan Boneh, and Patrick Lincoln. 2012. Neuroscience meets cryptography:
Designing crypto primitives secure against rubber hose attacks. In Proceedings of the 21st USENIX Conference on Security
Symposium. USENIX Association, 33–33.
Joseph Bonneau. 2012. The science of guessing: Analyzing an anonymized corpus of 70 million passwords. In Proceedings
of the IEEE Symposium on Security and Privacy. Retrieved from http://www.jbonneau.com/doc/B12-IEEESP-analyzing_
70M_anonymized_passwords.pdf.
Joseph Bonneau, Elie Bursztein, Ilan Caron, Rob Jackson, and Mike Williamson. 2015a. Secrets, lies, and account recovery:
Lessons from the use of personal knowledge questions at google. In Proceedings of the 24th International Conference on
World Wide Web. International World Wide Web Conferences Steering Committee, 141–150.
Joseph Bonneau, Cormac Herley, Paul C. van Oorschot, and Frank Stajano. 2012. The quest to replace passwords: A frame-
work for comparative evaluation of web authentication schemes. In Proceedings of the IEEE Symposium on Security and
Privacy. Retrieved from http://www.jbonneau.com/doc/BHOS12-IEEESP-quest_to_replace_passwords.pdf.
Joseph Bonneau, Cormac Herley, Paul C. van Oorschot, and Frank Stajano. 2015b. Passwords and the evolution of imperfect
authentication. Commun. ACM (July 2015). Retrieved from http://www.jbonneau.com/doc/BHOS15-CACM-imperfect_
authentication.pdf.
Joseph Bonneau, Mike Just, and Greg Matthews. 2010. What’s in a name? In Financial Cryptography and Data Security.
Springer, 98–113.
N. M. Bradburn, L. J. Rips, and S. K. Shevell. 1987. Answering autobiographical questions: The impact of memory and
inference on surveys. Science 236, 4798 (1987).
Brain Authentication. 2016. http://brainauth.com/testdrive/.
Claude Castelluccia, Markus Dürmuth, and Daniele Perito. 2012. Adaptive password-strength meters from Markov models.
In Proceedings of the Network and Distributed System Security Symposium (NDSS’12).
Danqi Chen and Christopher D. Manning. 2014. A fast and accurate dependency parser using neural networks. In Proceed-
ings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’14), Vol. 1. 740–750.
Cognitive password. 2016. http://en.wikipedia.org/wiki/Cognitive_password/.
Microsoft Corporation. 2015. Sketch-based password authentication. US Patent number 8,024,775.
Sauvik Das, Eiji Hayashi, and Jason I. Hong. 2013. Exploring capturable everyday memory for autobiographical authenti-
cation. In Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing. ACM, 211–220.
Darren Davis, Fabian Monrose, and Michael K. Reiter. 2004. On user choice in graphical password schemes. In Proceedings
of the USENIX Security Symposium, Vol. 13. 11–11.
Matteo Dell’Amico and Maurizio Filippone. 2015. Monte Carlo strength evaluation: Fast and reliable password checking.
In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. ACM, 158–169.
Tamara Denning, Kevin Bowers, Marten van Dijk, and Ari Juels. 2011. Exploring implicit memory for painless password
recovery. In Proceedings of the Annual Conference on Human Factors in Computing Systems. ACM, 2615–2618.
Geoff Duncan. 2016. Why haven’t biometrics replaced passwords yet? Retrieved from http://www.digitaltrends.com/android/can-biometrics-secure-our-digital-lives/.
Hermann Ebbinghaus. 1913. Memory: A Contribution to Experimental Psychology. Number 3. University Microfilms.
Named entity recognition. 2015. Retrieved October 10, 2014 from https://en.wikipedia.org/wiki/Named-entity_recognition.
Pluggable Authentication Modules for Linux (PAM). 2015. Retrieved October 10, 2014 from http://www.linux-pam.org/.
Freebase. 2016. Retrieved from http://www.freebase.com/.
Virgil Griffith and Markus Jakobsson. 2005. Messin’ with Texas: Deriving mother’s maiden names using public records. In
Applied Cryptography and Network Security. Springer, 91–103.
NIST Electronic Authentication Guideline. 2006. NIST Special Publication 800-63 Version 1.0.2.
Ameya Hanamsagar, Simon S. Woo, Chris Kanich, and Jelena Mirkovic. 2018. Leveraging semantic transformation to in-
vestigate password habits and their causes. In Proceedings of the Conference on Human Factors in Computing Systems
(CHI’18). ACM, 570.
Alina Hang, Alexandre De Luca, and Heinrich Hussmann. 2015. I know what you did last week! Do you?: Dynamic security
questions for fallback authentication on smartphones. In Proceedings of the 33rd Annual ACM Conference on Human
Factors in Computing Systems (CHI’15). 1383–1392.
Michael Heilman and Noah A. Smith. 2010. Good question! Statistical ranking for question generation. In Proceedings of the
2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language
Technologies. Association for Computational Linguistics, 609–617.
Jun Ho Huh, Seongyeol Oh, Hyoungshick Kim, Konstantin Beznosov, Apurva Mohan, and S. Raj Rajagopalan. 2015. Sur-
pass: System-initiated user-replaceable passwords. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and
Communications Security. ACM, 170–181.
Google Inc. 2015. Facial Recognition. U.S. Patent number 8,457,367.
Google Inc. 2016. 10,000 Most Common English Words. Retrieved October 10, 2015 from https://github.com/first20hours/
google-10000-english/.
Ian Jermyn, Alain Mayer, Fabian Monrose, Michael K. Reiter, and Aviel D. Rubin. 1999. The design and analysis of graphical
passwords. In Proceedings of the 8th USENIX Security Symposium. Washington DC, 1–14.
Mike Just and David Aspinall. 2009. Personal choice and challenge questions: A security and usability assessment. In
Proceedings of the 5th Symposium on Usable Privacy and Security. ACM, 8.
Patrick Gage Kelley, Saranga Komanduri, Michelle L. Mazurek, Richard Shay, Timothy Vidas, Lujo Bauer, Nicolas Christin,
Lorrie Faith Cranor, and Julio Lopez. 2012. Guess again (and again and again): Measuring password strength by simulat-
ing password-cracking algorithms. In Proceedings of the IEEE Symposium on Security and Privacy (SP’12). IEEE, 523–537.
Elizabeth A. Kensinger. 2007. Negative emotion enhances memory accuracy: Behavioral and neuroimaging evidence. Curr.
Direct. Psychol. Sci. 16, 4 (2007), 213–218.
Saranga Komanduri, Richard Shay, Lorrie Faith Cranor, Cormac Herley, and Stuart Schechter. 2014. Telepathwords: Pre-
venting weak passwords by reading users’ minds. In 23rd USENIX Security Symposium (USENIX Security’14). 591–606.
Cynthia Kuo, Sasha Romanosky, and Lorrie Faith Cranor. 2006. Human selection of mnemonic phrase-based passwords.
In Proceedings of the 2nd Symposium on Usable Privacy and Security. ACM, 67–78.
LTH Semantic Role Labeler. 2015. Retrieved October 10, 2014 from http://barbar.cs.lth.se:8081/.
Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stan-
ford CoreNLP natural language processing toolkit. In Proceedings of 52nd Annual Meeting of the Association for Compu-
tational Linguistics: System Demonstrations. 55–60. Retrieved from http://www.aclweb.org/anthology/P/P14/P14-5010.
Luke Mastin. 2016. The Human Memory. Retrieved from http://www.human-memory.net/.
Nicholas Micallef and Nalin Asanka Gamagedara Arachchilage. 2017. A gamified approach to improve users’ memorability of fall-back authentication. In Proceedings of the Symposium on Usable Privacy and Security (SOUPS’17).
Mnemonic Guard. 2015a. Retrieved from http://www.mneme.co.jp/english/index.html.
Mnemonic Guard Blog. 2015b. Retrieved from http://mnemonicguard.blogspot.com/.
A. Niedzwienska. 2003. Distortion of autobiographical memories. Appl. Cogn. Psychol. 17, 1 (2003), 81–91.
Ann Nosseir, Richard Connor, and M. D. Dunlop. 2005. Internet authentication based on personal history—A feasibility
test. In Proceedings of Customer Focused Mobile Services Workshop.
The Corpus of Contemporary American English (COCA). 2015. Retrieved October 10, 2014 from http://corpus.byu.edu/coca/.
Part-of-speech tagging. 2015. Retrieved October 10, 2014 from https://en.wikipedia.org/wiki/Part-of-speech_tagging.
Semantic role labeling. 2015. Retrieved October 10, 2014 from https://en.wikipedia.org/wiki/Semantic_role_labeling.
Stuart Schechter, A. J. Bernheim Brush, and Serge Egelman. 2009. It’s no secret. Measuring the security and reliability of
authentication via ’secret’ questions. In Proceedings of the 30th IEEE Symposium on Security and Privacy. IEEE, 375–390.
Richard Shay, Patrick Gage Kelley, Saranga Komanduri, Michelle L. Mazurek, Blase Ur, Timothy Vidas, Lujo Bauer, Nico-
las Christin, and Lorrie Faith Cranor. 2012. Correct horse battery staple: Exploring the usability of system-assigned
passphrases. In Proceedings of the 8th Symposium on Usable Privacy and Security. ACM, 7.
Richard Shay, Saranga Komanduri, Adam L. Durity, Phillip Seyoung Huh, Michelle L. Mazurek, Sean M. Segreti, Blase Ur,
Lujo Bauer, Nicolas Christin, and Lorrie Faith Cranor. 2014. Can long passwords be secure and usable? In Proceedings
of the 32nd Annual ACM Conference on Human Factors in Computing Systems. ACM, 2927–2936.
Anil Somayaji, David Mould, and Carson Brown. 2013. Towards narrative authentication: Or, against boring authentication.
In Proceedings of the New Security Paradigms Workshop. ACM, 57–64.
Amazon Mechanical Turk. 2018. Retrieved from https://www.mturk.com/.
Blase Ur, Felicia Alfieri, Maung Aung, Lujo Bauer, Nicolas Christin, Jessica Colnago, Lorrie Cranor, Harold Dixon,
Pardis Emami Naeini, Hana Habib, Noah Johnson, and William Melicher. 2017. Design and evaluation of a data-driven
password meter. In Proceedings of the 35th Annual ACM Conference on Human Factors in Computing Systems (CHI’17).
Rafael Veras, Christopher Collins, and Julie Thorpe. 2014. On the semantic patterns of passwords and their security impact.
In Proceedings of the Network and Distributed System Security Symposium (NDSS’14).
Rick Wash, Emilee Rader, Ruthie Berman, and Zac Wellmer. 2016. Understanding password choices: How frequently entered
passwords are re-used across websites. In Proceedings of the Symposium on Usable Privacy and Security (SOUPS’16).
Wikipedia, The Free Encyclopedia. 2004. Retrieved February 12, 2016 from https://en.wikipedia.org/wiki/Main_Page.
Simon Woo, Elsi Kaiser, Ron Artstein, and Jelena Mirkovic. 2016a. Life-experience passwords (LEPs). In Proceedings of the
32nd Annual Conference on Computer Security Applications. ACM Press, 113–126.
Simon S. Woo, Zuyao Li, and Jelena Mirkovic. 2016b. Good automatic authentication question generation. In Proceedings
of the 9th International Natural Language Generation Conference. 203.
Simon S. Woo and Jelena Mirkovic. 2016. Improving recall and security of passphrases through use of mnemonics. In
Proceedings of the 10th International Conference on Passwords (Passwords).
Takumi Yamamoto, Atsushi Harada, Takeo Isarida, and Masakatsu Nishigaki. 2008. Improvement of user authentication
using schema of visual memory: Exploitation of “schema of story.” In Proceedings of the 22nd International Conference
on Advanced Information Networking and Applications (AINA’08). IEEE, 40–47.
Xuchen Yao, Emma Tosch, Grace Chen, Elnaz Nouri, Ron Artstein, Anton Leuski, Kenji Sagae, and David Traum. 2012.
Creating conversational characters using question generation tools. Dialog. Discourse 3, 2 (2012), 125–146.

Received March 2018; revised October 2018; accepted January 2019