Using Episodic Memory for User Authentication
Passwords are widely used for user authentication, but they are often difficult for a user to recall, easily
cracked by automated programs, and heavily reused. Security questions are also used for secondary authen-
tication. They are more memorable than passwords, because the question serves as a hint to the user, but
they are very easily guessed. We propose a new authentication mechanism, called “life-experience passwords
(LEPs).” Sitting somewhere between passwords and security questions, an LEP consists of several facts about
a user-chosen life event—such as a trip, a graduation, a wedding, and so on. At LEP creation, the system ex-
tracts these facts from the user’s input and transforms them into questions and answers. At authentication,
the system prompts the user with questions and matches the answers with the stored ones. We show that
question choice and design make LEPs much more secure than security questions and passwords, while the
question-answer format promotes low password reuse and high recall.
Specifically, we find that: (1) LEPs are 10^9–10^14× stronger than an ideal, randomized, eight-character pass-
word; (2) LEPs are up to 3× more memorable than passwords and on par with security questions; and (3) LEPs
are reused half as often as passwords. While both LEPs and security questions use personal experiences for
authentication, LEPs use several questions that are closely tailored to each user. This increases LEP security
against guessing attacks. In our evaluation, only 0.7% of LEPs were guessed by casual friends, and 9.5% by
family members or close friends—roughly half of the security question guessing rate. On the downside, LEPs
take around 5× longer to input than passwords. These qualities make LEPs suitable for multi-factor
authentication at high-value servers, such as financial or sensitive work servers, where stronger
authentication is needed.
CCS Concepts: • Security and privacy → Usability in security and privacy;
Additional Key Words and Phrases: Authentication, password, security question, usability, template
ACM Reference format:
Simon S. Woo, Ron Artstein, Elsi Kaiser, Xiao Le, and Jelena Mirkovic. 2019. Using Episodic Memory for User
Authentication. ACM Trans. Priv. Secur. 22, 2, Article 11 (March 2019), 34 pages.
https://doi.org/10.1145/3308992
This research was supported by the Ministry of Science and ICT (MSIT), Korea, under the ICT Consilience Creative program
(Grant No. IITP-2017-R0346-16-1007) supervised by the Institute for Information & communications Technology Promotion
(IITP), and by the NRF of Korea funded by the MSIT (Grant No. NRF-2017R1C1B5076474).
Authors’ addresses: S. S. Woo, 2066 Seobu-ro jangan-gu, Sungkyunkwan University, Suwon, 16419, South Korea; email:
swoo@g.skku.edu; R. Artstein, USC ICT, 12015 Waterfront Drive, Playa Vista, CA 90094-2536; email: artstein@ict.usc.edu;
E. Kaiser, Department of Linguistics, Grace Ford Salvatori 301, Los Angeles, CA 90089-1693; email: emkaiser@usc.edu; X.
Le and J. Mirkovic, USC ISI, 4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292; emails: {xiaole, mirkovic}@isi.edu.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be
honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
2471-2566/2019/03-ART11 $15.00
https://doi.org/10.1145/3308992
ACM Transactions on Privacy and Security, Vol. 22, No. 2, Article 11. Publication date: March 2019.
11:2 S. S. Woo et al.
1 INTRODUCTION
Textual passwords are widely used for user authentication. An ideal password should be easy
for a user to remember but difficult for others to guess. Users value these two requirements very
differently. High recall is very important, as it allows the user to access the desired service. Security
is important, but less so, as it only guards against an unwanted, but from a user’s perception
rare, unauthorized access. These different priorities lead to two prevalent approaches to password
creation. In one, users relate their passwords to some personally salient facts, such as names and
birth dates, which makes them easy to remember but also easy to guess (Veras et al. 2014; Bonneau
2012). In the other approach, users create several moderately strong passwords (Wash et al. 2016),
but then reuse them to achieve high recall. This lowers security, because passwords stolen from
one server can be used to gain access to another one (Blog 2016).
Many alternatives to textual passwords have been proposed, such as graphical passwords,
cognitive authentication, one-time passwords, hardware tokens, phone-aided authentication, and
biometric passwords. However, research has shown that none of these can offer convenience,
simplicity, and user familiarity (Bonneau et al. 2012, 2015b) comparable to that of textual
passwords. We thus focus on trying to improve text-based authentication. Specifically, we seek to
change how passwords are created and used to make their recall more natural, while improving
their security and diversity.
We observe that it is highly unnatural for humans to create new, complex memories (such as
passwords) and recall them in minute detail (e.g., recall capitalization and placement of special
characters) after a long time period, and without any hints. Humans remember by association,
relating new facts to existing memories (Mastin 2016). This makes it difficult to create and recall
many new, strong passwords. Humans also recall by reconstructing facts, sometimes imprecisely,
from relevant data stored in the brain (Mastin 2016). Thus, a user may recall that they used a family
member’s name plus their birth year for a password, but forget which family member they chose,
if they used their first or last name, how it was capitalized, and so on. This makes it difficult to
precisely recall passwords.
We propose a new authentication method using life-experience passwords (LEPs), which extract
authentication secrets out of a user’s existing memories related to a chosen life experience, such
as a trip, a graduation, a wedding, a place, and so on. This lowers users’ cognitive burden and
mimics what users already do—build passwords out of existing memories. To achieve high recall,
we transform existing user memories into a series of questions and answers. The questions are
used at authentication time as hints for the user, and the answers become the password. We further
apply imprecise matching at authentication. Authentication succeeds when a user recalls enough
answers with enough precision. LEPs have 2–3× higher recall than regular passwords: 73% are
recalled after a week and 54% after 3–6 months. LEPs are 10^9–10^14× stronger than an
ideal, random, eight-character password, making them hard to guess via offline attacks. Friends
and family also have a hard time guessing LEPs. In our studies, only 0.7% of LEPs were guessed by
acquaintances and 9.5% by very close friends or family members. We further achieve low reuse of
LEPs by having different servers prompt users for different memories at password creation time.
In our studies, LEPs were reused half as often as passwords.
LEPs resemble security questions, in that both use personal experiences for authentication and
both achieve very high recall. But LEPs use more diverse, unique, and memorable personal facts
than security questions, and these are deeper, more specific facts. This makes LEPs 24–35× harder
to guess than security questions (Schechter et al. 2009). LEPs contain 2.4–3.2× fewer fake answers
compared to security questions.
There are two downsides to LEPs: (1) user burden for creating and authenticating with an LEP is
3–6× higher than when using passwords—a burden that may be prohibitive for some applications
and devices; (2) LEPs may contain sensitive information, which poses a privacy risk. While these
are serious drawbacks, our studies found that 93.7% of users would use LEPs for high-security
content (e.g., banking applications). We further found that only 3% of LEPs contained generally
sensitive information. We can reduce this risk with better user prompts, which discourage use of
sensitive information. We believe that LEPs are a promising authentication method for some users
and certain applications, where the large benefits would outweigh the user burden and privacy
risk.
LEPs were originally introduced in Woo et al. (2016a), where we described their design and two
reference implementations and evaluated their security, recall, diversity, and resistance to acquaintance
guessing. In this article, we extend our prior work to add an additional implementation, provide
more details on automatic question generation, improve our evaluation of LEPs’ security, and eval-
uate resistance to guessing by family or very close friends. Thus, the current article represents a
complete study of LEPs for user authentication.
The article is organized as follows. We discuss work related to LEPs in Section 2. We present LEP
design in Section 3, and we detail the setup of our human studies in Section 4. Section 5 presents
the results of our evaluation of LEPs. Section 6 summarizes our results, and Section 7 provides
further discussion. Finally, we offer our conclusions in Section 8.
Fig. 1. Security questions versus LEP example for “high school” topic.
Applicability. Organizations usually offer a very limited choice of security questions. There may
be users to whom no question applies. For example, a user may not have a favorite high school
teacher or a best friend, or may have multiple teachers/friends that she likes. When faced with
such questions, users select answers that they do not recall at authentication time. Schechter et al.
(2009) and Bonneau et al. (2015a) measured that 20–40% of security questions cannot be recalled
by users. Conversely, during LEP creation users can choose, with very little constraint, the topic
they want to talk about and the facts about that topic, which are memorable to them. This leads
to more personalized facts and thus higher recall.
Depth of facts. Security questions ask for shallow facts or factoids (e.g., pet’s name, best friend’s
name), which are generally applicable to many users. Such facts can be mined from public sources
(Griffith and Jakobsson 2005; Schechter et al. 2009) or guessed using statistical attacks (Bonneau
et al. 2015a). Easy guessability leads users to provide fake answers to security questions, which
leads to low recall when they cannot remember the fake answers. LEPs, by contrast, ask for deep
facts: deeper contextual information about memorable events in the user's life, such as memorable
people, places, activities, or objects. Answers to such questions are not easily found on social
networks, or guessed by family and friends, which removes the need for users to lie. In our studies,
only 0.7% of LEPs were guessed by friends (compared to 17–25% of security questions (Schechter
et al. 2009)) and only 11.5–15.7% of answers were fake (compared to 37% for security questions
(Bonneau et al. 2015a)).
Number of facts. Security questions contain only one fact, which may be easily guessed or ob-
tained from public sources. LEPs contain a larger number of facts, and a user must recall most or
all of them for authentication. Thus, the barrier for a successful guessing attack is higher.
Another approach to security questions is to let users choose the questions themselves. This
allows users to freely choose facts that are relevant to them, but decreases security (Just and
Aspinall 2009; Schechter et al. 2009). While LEPs also allow users to choose which facts to provide,
our fact elicitation guides them toward secure, memorable, and stable facts. This allows LEPs to
outperform user-chosen security questions.
Fig. 2. LEP creation and authentication (either precise or imprecise matching can be applied for authenti-
cation, where imprecise matching improves usability).
Cognitive Passwords. Similar to LEPs, cognitive passwords are based on personal facts, interests,
and opinions that are thought to be easily recalled by a user. The article Cognitive password
(2016) provides an overview, definitions, and some examples of cognitive passwords. Das et al.
(2013) and Nosseir et al. (2005) explore autobiographical authentication that uses facts about past
events, which are captured by smartphones or calendars, without any user input. Similarly, Hang
et al. (2015) explore the use of call, SMS, and application use/installation facts on smartphones to
authenticate phone users. While phone usage and calendar information may be memorable in short
intervals after it is collected, humans do not remember ordinary daily events for long periods nor
with great consistency (Bradburn et al. 1987). Recall of automatically gathered autobiographical
information ranged from 64% to 75%, while acquaintances could guess 40–50% of correct answers.
LEPs require more user effort during creation, but elicit more salient facts (Niedzwienska
2003; Bradburn et al. 1987). LEP recall is 70–73% after one week and remains stable over long
time periods (53–54% after 3–6 months), and only 0.7% of LEPs can be guessed by acquaintances.
Mnemonic. While mnemonic passwords can improve memorability, Kuo et al. (2006) found
that a majority of users’ mnemonic passwords are from popular phrases that can be found on the
Internet. Woo and Mirkovic (2016) proposed hint- and guide-mnemonics to help users create strong
and memorable passphrases. They successfully demonstrated methods to prevent users from using
phrases from popular quotes and dictionary words, when creating mnemonic-based passphrases.
Narrative Authentication. Somayaji et al. (2013) proposed the use of narratives for user authentication
but did not evaluate them. Their narratives require users to associate imaginary objects
with past memories (e.g., contents of a drawer from a childhood bedroom) and may also be fully
fictional. Because of fictional facts, we expect they would be less memorable than LEPs.
3 LIFE-EXPERIENCE PASSWORDS
This section describes the design and the implementation of LEPs. We discuss our choices for
LEP topics and facts in Section 3.1, attacker models and strength calculation in Section 3.2,
the LEP creation process in Section 3.3, fact extraction and question generation in Section 3.4,
and LEP-based authentication in Section 3.5.
Figure 2 shows the LEP creation process. A user identifies a life-experience she wants to use
for an LEP. She then inputs a title for this experience and recounts interesting facts, with some
guidance from the system. The system mines the facts from the user's input and transforms them
into question-and-answer pairs. Questions and the title must be stored as clear text, because they
are shown to users during authentication. To store the answers, we concatenate either all of them,
or several subsets (see Section 3.5), add the salt, and hash the resulting string(s). During authentication,
the system displays the title and the questions, and user answers are hashed and compared
to those stored by the system using imprecise matching.

Category   Subcategories
Event      Engagement, wedding, birth, death, accident, graduation, party, trip
Learning   Driving, skiing, snowboarding, swimming, biking, skill/art, language
About      Person, place
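The answer-storage step described above can be sketched in a few lines. SHA-256, the sorted subset ordering, and the function and variable names are our choices for illustration, not details of the reference implementation:

```python
import hashlib
import itertools
import os

def store_lep(answers, m, salt=None):
    """Store an LEP as salted hashes of every m-sized answer subset.

    Hashing subsets instead of keeping plaintext answers lets the server
    later verify that m of the n answers were recalled, without learning
    which individual answers matched.
    """
    salt = salt if salt is not None else os.urandom(16)
    hashes = set()
    # Sort so creation and authentication enumerate subsets identically.
    for subset in itertools.combinations(sorted(answers), m):
        digest = hashlib.sha256(salt + "|".join(subset).encode()).hexdigest()
        hashes.add(digest)
    return salt, hashes

answers = ["paris", "nice", "louvre", "july"]
salt, stored = store_lep(answers, m=3)
print(len(stored))  # 4 subset hashes: C(4, 3)
```

Storing one hash per subset is what makes imprecise (M-of-N) matching possible later without keeping any answer in the clear.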
Spain they took. If the answer was “no,” then we would ask the user to provide more words that
uniquely describe their chosen trip, e.g., “trip to Spain after college.”
An immutable fact does not change over time. For this reason, we do not use preferences and
opinions (e.g., “What is your favorite band?”) for building LEPs, because such facts are mutable.
During LEP creation, we mine facts about people, locations, time, objects, and activities.
These facts are objective, and thus immutable. People and location facts have high strength (see
Section 3.2), while time, object, and activity facts have lower, but still substantial, strength.
Further, we have designed our elicitation process to produce very specific questions, and thus
stable facts.
Privacy risk. LEP questions and answers contain information about some past event, which
may pose a privacy risk to a user if the questions are observed by others, or if the answers are
guessed or cracked and the revealed facts are sensitive. We evaluate how easy it would be for
attackers to guess LEPs in Section 5, and our results show that LEPs are very robust to guessing.
With regard to revealing sensitive facts in LEPs, we have designed our fact elicitation process
to advise users against using sensitive or incriminating facts. We have also analyzed our facts to
determine how many of these could be labeled “generally sensitive.” We say that a fact is “generally
sensitive” if it talks about illness, injury, death, incarceration, crime, or love affair. We found that
only 3% of LEPs in our study had sensitive information. Predominantly these facts were supplied
in response to our “illness” or “death” topic suggestions. Removing these topic suggestions is thus
likely to greatly reduce sensitivity of user-provided information.
Table 2. Fact Categories and Their Statistical Strengths (See Appendix for Sources)
type, sorted from the most to the least popular, k is the number of facts in a given LEP, and f_i is the
ith fact. This method of calculation assumes an ideal guessing strategy for the attacker, where he
keeps guessing the first fact until success, then moves on to the second, and so on. Such guessing
is unrealistic, since the LEPs are stored as hashes of fact combinations, and the attacker cannot
tell if he guessed a part of the combination correctly. Thus, our statistical strength estimate is the
lower bound of the actual strength of LEPs against statistical attacks.
Our method of attacker guessing also assumes fact independence. However, in real life, facts
are related to each other and some combinations will be more likely than others. We found it very
hard to find sufficient public data to explore these correlations and include them in our calcula-
tions. We thus address this issue in a limited and conservative manner: (1) for first and last name
combinations, which have the highest strength, we conservatively assume that there is only one
first name, which can be coupled with a given last name; (2) for objects, actions, and locations
related to a topic (e.g., wedding), we perform a Bing search for that topic and mine the most frequent
objects and locations from the results. We use this information to create a ranked list of popular
items per topic, as described below.
To determine strength of facts against statistical attacks, we classify LEP facts into categories,
shown in Table 2. We assume an intelligent attacker who can infer the fact's category or topic
from the user's authentication prompt, and guesses answers only within that category or topic.
We measure the statistical strength of a fact as its rank on a ranked list of popular
facts in its category or for a given topic. A challenge lies in creating sufficiently large and
comprehensive lists of popular facts, and ranking them based on their popularity.
We examined many different data sources, seeking to identify: (1) the total number of possible
facts, and (2) the ranked list of popular facts, within each fact category. We provide a brief explana-
tion of our data sources here and refer the readers to the Appendix for more details. Our estimates
and popular list sizes are shown in Table 2.
Statistical strength calculation. To calculate statistical strength, we needed lists of popular
facts in each category and for each topic, ranked by popularity. We formed the per-category lists
from online domain-specific sources, gathering around 434,000 unique popular-list
items from more than 530 different online sources. These sources include (1) Wikipedia/DBpedia
(Wikipedia, The Free Encyclopedia 2004), (2) Freebase (Freebase 2016), (3) U.S. Government
sources: U.S. Census, U.S. Social Security Administration, Dept. of Education, Dept. of Labor Statistics,
National Center for Education Statistics, (4) Other domain-specific online sources: TripAdvisor
for popular travel destinations, Forbes and U.S. News for educational institutions, IMDB for movie
names, and so on, (5) Popular English word lists from Google 20K (Google Inc. 2016), and 5K nouns, words,
and lemma from the Corpus of Contemporary American English (COCA) (COCA 2015). We fur-
ther incorporated popular lists for different categories from the Bonneau et al. (2010) dataset. More
details are provided in the Appendix.
Some lists had items ranked by popularity, while others did not (e.g., the list of the 100 most
popular Chinese names from Wikipedia). For unranked lists, we used the Bing search engine to
calculate the number of Web pages containing each list item. We automatically built structured
queries as a “fact category + the item” (e.g., “first name Hao”) and mined the number of pages
from the search engine’s reply. We then assumed that the popularity of an item is proportional to
the number of Web pages containing it, and used this to rank the items in the list. While this may
not be an accurate reflection of the popularity of each item, we believe it is a good approximation
for relative rankings.
If our popular lists were too small, then we would overestimate the statistical strength of LEPs.
We show the count of popular lists per category, and their minimum and maximum sizes, as well
as the total number of unique items in Table 2. For example, in the FN (first name) category, we
have 384 popular lists, ranging in size from 3 to 38,717 entries, and containing a total of 150,695
unique names. We further evaluated how many facts collected in our user studies were covered by
our popular lists. We were able to find 75% of first name (FN), 99% of last name (LN), 81% of first
and last name (FL), 63% of city (CI), 54% of object (OBJ), 46% of activity (ACT), and 34% of place
(PL) inputs on our popular lists. Thus, our popular lists seem comprehensive enough for statistical
strength calculation.
To model the attacker's use of correlations among popular objects, activities, and places for a given
topic (e.g., wedding), we crawled the most popular 100 websites returned by Bing search with a
query specific to each LEP topic. We used the Selenium Python package to crawl websites and applied
Natural Language Toolkit (NLTK) and WordNet to process web data and extract nouns and verbs
as shown in Figure 3. Then, for each topic, we created ranked lists of these nouns and verbs based
on their frequency.
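The frequency-ranking step can be sketched as follows. The real pipeline tags nouns and verbs with NLTK and WordNet; this sketch approximates that with a fixed candidate vocabulary and a naive regex tokenizer, which is enough to show the ranking itself:

```python
import re
from collections import Counter

def ranked_terms(documents, vocabulary):
    """Rank candidate topic terms by frequency across crawled pages.

    documents are the page texts crawled for one topic; vocabulary is
    the set of candidate nouns/verbs (stand-in for POS tagging).
    """
    counts = Counter()
    for doc in documents:
        for token in re.findall(r"[a-z]+", doc.lower()):
            if token in vocabulary:
                counts[token] += 1
    # Most frequent first; a term's rank is its 1-based position here.
    return [term for term, _ in counts.most_common()]

# Toy "crawled pages" for the wedding topic.
pages = [
    "The wedding cake and the wedding dress were white.",
    "Guests loved the cake; the cake had three tiers.",
]
print(ranked_terms(pages, {"cake", "dress", "ring"}))  # ['cake', 'dress']
```

Terms absent from every page ("ring" above) simply do not appear on the topic's ranked list.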
We have multiple lists of popular facts per category, and objects, actions and locations also have
lists of popular facts per topic. We calculate the strength of an LEP fact as its lowest rank on any
list. This approach assumes a strong statistical guessing attacker, which has the best popular list
for each input.
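The lowest-rank rule can be stated directly in code; the fallback rank for facts absent from every list is our assumption for the sketch:

```python
def fact_strength(fact, ranked_lists):
    """Strength of a fact = its best (lowest) 1-based rank on any list.

    ranked_lists is a collection of lists, each ordered from the most to
    the least popular item. This models a strong statistical attacker
    who always guesses from the best available popular list.
    """
    best = None
    for lst in ranked_lists:
        if fact in lst:
            rank = lst.index(fact) + 1
            best = rank if best is None else min(best, rank)
    if best is None:
        # Fact is off every list: assign a rank past the longest list
        # (an assumption of this sketch, not taken from the paper).
        best = 1 + max(len(lst) for lst in ranked_lists)
    return best

lists = [["paris", "london", "rome"], ["rome", "paris"]]
print(fact_strength("rome", lists))  # 1: first on the second list
```

Taking the minimum across lists is what makes this a conservative (attacker-favoring) strength estimate.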
3.3 Creation
LEP creation requires users to actively provide input, from which the system extracts useful facts.
An appropriate method for LEP generation should meet two requirements: (1) it is easy
and natural for humans to use, and (2) it exposes a sufficient number of useful facts for any user. In
our work, we have investigated freeform, guided, and semi-guided methods for LEP input. These
methods are triggered after a user has chosen the topic they want to talk about and provided its
title. Figure 5 illustrates these input methods with one specific title “Trip to France.”
In the freeform method shown in Figure 5(a), the user inputs a paragraph of free narrative about
her chosen life experience. Out of this narrative, we automatically mine useful facts and gener-
ate question and answer pairs. We use question generation based on semantic and dependency
parsing (Woo et al. 2016b).
In the guided method a user is prompted with a series of questions, chosen from a fixed set
about each type of life experience. The questions are displayed one at a time and the choice of the
subsequent questions may depend on the user's answers to the preceding ones. This is illustrated
in Figure 5(b). Some questions may be open-ended, e.g., “What else do you remember about . . . ?”.

Fig. 3. Creating frequently used word lists by crawling the top 100 websites for each topic and extracting terms.

Table 3. Comparison of Input Methods
Sample questions used in the guided method are provided in the Appendix.
In the semi-guided method the user is prompted to input a certain number of facts in the given
category, and to provide a “hint” for each fact that will be used to form the authentication prompt.
This is illustrated in Figure 5(c). Sample questions used in the semi-guided method are also
presented in the Appendix.
With regard to ease of input, freeform poses the highest burden on the user: in our example,
28 words of user input yielded only five useful facts (9 words). The guided and semi-guided
methods steer the user toward useful facts, such as names, locations, and objects, and away from
facts that are not useful, such as preferences, opinions, and feelings. In our example, the guided
method required the user to type 7 words, with 5 words leading to useful facts, while semi-guided
input required 11 words, all of which led to useful facts. Guiding users toward specific facts lowers strength against statistical
attacks, since the attacker can focus his guessing within specific fact categories. It also increases
the chance that a user may not have answers to some questions (e.g., Who traveled with you on
this trip? No one). But lower guidance may also lower stability of extracted facts. The freeform
and semi-guided methods allow more freedom to the user to choose facts, but lead to less stable
facts, compared to the guided method. Freeform also leads to fewer useful facts, as users like to
chat about feelings and preferences, which cannot be used for LEPs. Finally, user input in natural
language requires natural-language processing to convert it to the question/answer format that
LEPs need. Freeform input requires the most NLP support, followed by the semi-
guided and guided methods. Table 3 summarizes the relationships of the input methods along all
the above-discussed dimensions.
3.4 Fact Extraction and Question Generation
NLP techniques such as semantic role labeling (srl 2015), named entity recognition (ner 2015), and question generation seem well-suited
for fact extraction. We experimented with two well-known question generation frameworks:
(1) Heilman & Smith (H&S) (Heilman and Smith 2010) and (2) OpenAryhpe (OA) (Yao et al.
2012). However, these tools generated long and complex questions, which leaked answers to other
questions. This led us to develop our own fact extraction and question generation method.
Our fact extraction and question generation proceed through the following stages: (1) pre-
processing, (2) segmentation of complex sentences into simpler ones, (3) post-processing, (4) sen-
tence combining, (5) dependency parsing or semantic role labeling, and (6) question generation.
Question generation returns multiple question and answer pairs and a fact category for each pair.
Our system selects the pair with the highest statistical strength (see Section 3.2). We use mate-
tools (Björkelund et al. 2009; bar 2015) for dependency parsing and semantic role labeling to iden-
tify important parts of speech such as subject, objects, action, roles, time, location, and so on. In
extracting and processing useful facts from user responses, we first normalize inputs for capitalization
and punctuation. We also use POS tagging (Manning et al. 2014), dependency parsing (Chen
and Manning 2014), noun stemming, and semantic role labeling (srl 2015) to extract the specific
parts of user responses, such as verbs, nouns, subject, object, location, time, action, and person in-
formation. This helps us transform facts into question/answer pairs, and to identify, for multi-word
inputs, those parts that carry the most meaning for the user (nouns, proper names of people and
locations, verbs, or adjectives). The overall question generation pipeline is provided in Figure 4,
and more details on how we construct questions are given in Woo et al. (2016b).
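As a toy illustration of the final stage, fixed templates can map extracted semantic roles to question/answer candidates. The real system derives questions from the dependency parse itself so they stay grammatical for any input; the role names, templates, and category labels here are ours:

```python
def generate_qa(roles):
    """Map extracted semantic roles to (question, answer, category)
    candidates; the system then keeps the pair whose category has the
    highest statistical strength (Section 3.2).
    """
    event = roles.get("event", "this event")
    templates = {
        "agent":    (f"Who was involved in {event}?", "person"),
        "location": (f"Where did {event} take place?", "place"),
        "time":     (f"When did {event} happen?", "time"),
        "object":   (f"What do you remember from {event}?", "object"),
    }
    return [(question, roles[role], category)
            for role, (question, category) in templates.items()
            if role in roles]

# One parsed fact from a hypothetical "Trip to France" narrative.
roles = {"event": "the trip to France", "agent": "John",
         "location": "Paris", "object": "the Louvre"}
qa = generate_qa(roles)
for question, answer, category in qa:
    print(f"{question} -> {answer} [{category}]")
```

Note how each question exposes only the event title, never the answer itself, which is the property the H&S and OA tools failed to provide.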
After performing fact extraction, we manually examined all the resulting facts, and corrected
them where needed to produce more useful guidance for users. Manual correction was needed
in approximately 50% of freeform inputs, and in 5% of guided inputs (those that were open-ended). The
main reasons for requiring manual correction were: (1) users could not adhere to the required
input format, e.g., we asked for complete sentences like “John Smith gave a book” but many users
would provide “John Smith, book” instead, (2) users would omit parts of the answer, (3) some
input was in complex sentences, which our segmentation process could not handle, e.g., “because
of the architecture of the library, that is where I developed my love and admiration books, and the
buildings which housed them,” (4) some input was in broken English, like “I am find very happy
at the moment of sky.”
3.5 Authentication
Let an LEP contain N facts. We require that a user recalls M facts for authentication success, where
M ≤ N, and that the strength of the recalled facts be greater than some target value. The smaller
the difference between N and M, the stronger the authentication criterion. Further, if M < N, the
system must store C(N, M) = N!/(M!(N−M)!) hashes, one per M-sized answer subset, for one LEP.
During authentication, the system produces all possible
combinations of M user answers, and hashes them. Any match between these and stored hashes
leads to authentication success. We have explored different values for N and M, and provide more
information about their performance in Section 5.
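The M-of-N check can be sketched as follows, assuming answers were hashed in a canonical (sorted) order at creation; the names and the demo values are illustrative:

```python
import hashlib
import itertools

def authenticate(candidate_answers, m, salt, stored_hashes):
    """M-of-N check: hash every m-sized combination of the supplied
    answers and accept if any digest matches a stored one.
    """
    for combo in itertools.combinations(sorted(candidate_answers), m):
        digest = hashlib.sha256(salt + "|".join(combo).encode()).hexdigest()
        if digest in stored_hashes:
            return True
    return False

# Creation side, for the demo: store hashes of all 3-subsets of 4 answers.
salt = b"demo-salt"
answers = sorted(["paris", "nice", "louvre", "july"])
stored = {hashlib.sha256(salt + "|".join(c).encode()).hexdigest()
          for c in itertools.combinations(answers, 3)}

# Three correct answers out of four suffice; one correct answer does not.
print(authenticate(["paris", "nice", "july", "wrong"], 3, salt, stored))    # True
print(authenticate(["paris", "wrong1", "wrong2", "wrong3"], 3, salt, stored))  # False
```

Because only hashes are compared, the server never learns which particular answers the user got wrong.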
Authentication may fail not only because a user forgot her answers but also because she recalled
them imprecisely. Imprecise recall of LEPs occurs due to a high redundancy of natural language,
as explained below. We address some sources of mismatch through imprecise matching, which
includes normalization, keyword extraction, and reorder matching. While imprecise matching
reduces the strength of LEPs, our evaluation shows that the resulting LEPs still have high security
(Section 5), and that imprecise matching significantly improves recall.
Capitalization, reordering, and punctuation. A user might respond to the prompt using
different capitalization or punctuation than she did during password creation, or use a singular
instead of plural, or a different tense for a verb. We overcome this by normalizing user answers
before storage and authentication. This includes stemming verbs and converting all nouns to sin-
gular form. We also remove all capitalization and punctuation. A user may also list several parts
of the answer in a different order, e.g., she may reply “Nice, Paris,” where the original answer was
“Paris, Nice.” We resolve this through reorder matching. We detect when an answer may consist
of multiple parts, try all permutations of these parts in real time, and compare these with stored
hash values.
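A minimal sketch of normalization plus reorder matching might look as follows. The suffix-stripping rules are a toy stand-in for a real stemmer, and `hash_fn` is a placeholder for whatever hash the server uses; the key point is that normalization is applied identically before storage and at authentication.

```python
import hashlib
import re
import string
from itertools import permutations

def normalize(answer):
    # Lowercase, drop punctuation, and crudely strip plural/tense
    # suffixes. These toy rules approximate stemming/singularization.
    cleaned = re.sub(f"[{re.escape(string.punctuation)}]", " ", answer.lower())
    out = []
    for w in cleaned.split():
        for suffix in ("ies", "es", "ing", "ed", "s"):
            if w.endswith(suffix) and len(w) > len(suffix) + 2:
                w = w[: -len(suffix)]
                break
        out.append(w)
    return out

def hash_fn(text):
    return hashlib.sha256(text.encode()).hexdigest()

def store_answer(answer):
    # Normalization is applied before storage...
    return hash_fn(" ".join(normalize(answer)))

def reorder_match(answer, stored_hash):
    # ...and again at authentication, where every ordering of the
    # answer's parts is tried against the stored hash.
    parts = normalize(answer)
    return any(hash_fn(" ".join(p)) == stored_hash
               for p in permutations(parts))
```

For example, an answer stored as “Paris, Nice” then matches a login reply of “nice PARIS” but not “nice lyon”.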
Misspelling. A user may misspell a reply during password creation or authentication. We leave
handling of misspelling for future work.
Synonyms. A user may reply to a question with a near-synonym to the extracted term, such as
responding with “lake” instead of “pond”; or with a term that is more specific (hyponym) or more
general (hypernym) than the expected term, such as “dog” instead of “poodle.” We leave handling
of synonyms, hyponyms, and hypernyms for future work.
Extraneous words. A user may provide extraneous words in an answer. For example, a ques-
tion “what was red?” may lead the user to input “my apple was red,” even though we expect just
“apple” as an answer. We address this via keyword extraction, and we store and try to match only
these keywords. We apply keyword extraction both during password creation and during authenti-
cation, in the same manner. The extraction method depends on the answer category (see Table 2).
Using Episodic Memory for User Authentication 11:13
Table 4. High Level Overview of the User Study Descriptions and Methods
From answers in OBJ category, we extract nouns only; from PL answers, we extract nouns and
out-of-dictionary words (likely place names); from ACT answers, we extract verbs only. Other
categories do not need keyword extraction.
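As an illustration, per-category keyword extraction could be sketched with a part-of-speech lookup; the tiny lexicon below is a hypothetical stand-in for the NLP tagger and dictionary a real implementation would use.

```python
# Toy lexicon standing in for a real POS tagger and dictionary.
NOUNS = {"apple", "book", "lake", "school", "dog"}
VERBS = {"swim", "give", "read", "travel"}
DICTIONARY = NOUNS | VERBS | {"my", "the", "was", "red", "we", "at", "in", "to"}

def extract_keywords(answer, category):
    # OBJ -> nouns only; ACT -> verbs only; PL -> nouns plus
    # out-of-dictionary words (likely place names). Other categories
    # are stored as-is.
    words = answer.lower().split()
    if category == "OBJ":
        return [w for w in words if w in NOUNS]
    if category == "ACT":
        return [w for w in words if w in VERBS]
    if category == "PL":
        return [w for w in words if w in NOUNS or w not in DICTIONARY]
    return words
```

Under these rules, “my apple was red” in the OBJ category reduces to the single keyword “apple”, matching the example in the text.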
4 USER STUDIES
We evaluated LEPs through a series of user studies and compared their performance against that
of textual passwords and security questions. We implemented LEP creation and authentication
as a Web application, so that it could be used remotely. We then ran multiple user studies
over a period of three years, with Amazon MTurk participants (Turk 2018) and with students
at the University of Southern California. We used results of early studies to refine and improve both
our user interface and our elicitation process. In this article, we report on the four culminating
studies, which we used to evaluate LEP performance. A high-level overview of each user study
is presented in Table 4; we describe the studies in detail in the following sections.
All studies were approved by our Institutional Review Board (IRB). All communication with
participants was in English, and their inputs were also required to be in English. Participants were
required to be at least 18 years of age.
Each participant was first shown the Information Sheet explaining the purpose of the study.
Following this, they registered in one of three ways, depending on the specific study’s design: by
entering their MTurk ID or their e-mail address, or by having the system assign them a random identifier.
A participant then input their demographic information (age, gender, and native language) and
proceeded with the study. We describe our study design in this section and provide detailed results
and their interpretation in the next section.
We designed semi-guided input as a way to allow users to tap into a larger set of useful facts without
jeopardizing the usefulness of those facts.
1 We initially attempted a phased study design, where a participant created one password (LEP or 3class8) and returned
after 1 week to authenticate. After authentication the participant created another password for the next cycle. However,
we had to abandon this study idea due to high attrition rates.
In addition to personal knowledge, we encouraged the guessers to fully utilize information available from
various social network sites and search engines. We observed that many participants in the study
indeed made use of these online information sources.
Participants were required to enroll in the study with at least one other friend (acquaintance)
at USC. We advertised the study via class announcements, wall posters, and flyers. Each participant
was paid $10 and took 45–60 min to complete the study. Unlike our previous studies, this study
used deception in the Information Sheet (approved by our IRB), by not informing the participants
that they would be guessing each other’s LEPs. This was necessary to prevent participants from
intentionally creating LEPs that would be either too easy or too hard for a given acquaintance
to guess. We designed our study to mimic real-life password use, where one does not know
who may try to guess their password.
After reviewing the Information Sheet, each participant was asked how close they were with
their friend on a scale from 1 to 5 (closest), and how long they had known each other. Next, they
were asked to create three LEPs for three different online accounts: Gmail, Facebook, and Bank of
America. We displayed corresponding logos during creation and authentication. Each participant
was randomly assigned to either guided or semi-guided input method, and created all LEPs using
this method.
Next, participants were asked to authenticate with each LEP; they were allowed an unlimited
number of trials but were required to make at least three. After authentication, we debriefed each participant
about the deception and explained our reasons for it. We informed them that their friend would
be guessing their LEP and vice versa. They were offered a chance to quit the study at this time
and still receive the full payment. No participants quit. Each participant then attempted to guess her
friend’s LEPs, and was likewise allowed an unlimited number of trials, with at least three required.
Afterwards, we reviewed successfully guessed answers with participants, asking them about their
strategy. We ended the study with a short survey about the usability of LEPs. Last, we asked the par-
ticipants not to disclose details of the deception to other students on campus, so that we could continue
recruitment.
During guessing, a guesser was asked how long they had known the creator and in what
capacity. They were then allowed as many guesses at the creator’s LEP as they wanted; at least
three guesses were required to complete the study. At the end, we asked guessers
about the sources of information they used in guessing. Finally, we notified the creator that the
guesser had completed the study.
5 STUDY RESULTS
In this section, we report on the results of our user studies. We found that: (1) freeform LEPs have
decent security but abysmal recall; (2) guided and semi-guided LEPs (in the rest of this paragraph,
we just call them “LEPs” for short) are 10⁹–10¹³× stronger than an ideal, randomized, 8-character
password; (3) acquaintance success in guessing LEPs is 24–35× lower than for security questions;
(4) family/close-friend success in guessing semi-guided LEPs is 1.8–2.6× lower than for security
questions; (5) LEPs are 2–3× more memorable than passwords; (6) LEPs are reused half as often as
passwords; and (7) LEPs contain 2.4–3.2× fewer fake answers than security questions.
Table 5. Participant Statistics and Results of Our Studies, Compared to the Baseline:
Passwords and Security Questions
Table 5, rows 12–19 (guessing success for guided and semi-guided LEPs vs. security questions):

Acquaintance guessing:
  Row 12, all-fact:    guided 0.7%, semi-guided 0%
  Row 13, five-fact:   guided 0.7%, semi-guided 0%
  Row 14, four-fact:   guided 0.7%, semi-guided 0%
  Row 15, three-fact:  guided 1.3%, semi-guided 4.5%
  Security-question baseline: 17–25% (Schechter et al. 2009)
Close friend or family guessing:
  Row 16, all-fact:    semi-guided 0%
  Row 17, five-fact:   semi-guided 0%
  Row 18, four-fact:   semi-guided 9.5%
  Row 19, three-fact:  semi-guided 11.9%
  Security-question baseline: 17–25% (Schechter et al. 2009)
5.3 Security
In this section, we report and summarize findings from our Performance study with regard to
security of LEPs.
5.3.1 Statistical Strength. Table 5 shows the total number of LEPs, and their average statistical
strength in the third row. LEPs are 10⁹–10¹³× stronger than an ideal, randomized, eight-character
password, with guided LEPs being 10⁴× stronger than semi-guided LEPs because of a higher number of
useful facts.
5.3.2 Reuse Attack. We also explored the strength of LEPs and passwords against a password-reuse
attacker. The results are shown in rows 22 and 23 of Table 5. We first investigated how many out
of 10 passwords were identical, for each given user. An LEP fact is said to be identical to a fact in
another LEP, by the same user, if their answers would match during authentication (accounting for
capitalization, reordering, punctuation, and extraneous words). An LEP l1 is identical to an LEP l2
if all of l1’s facts match the facts of l2. There were 2.7–3.1% identical LEPs, compared to 5.7% identical
passwords.
We next investigated how many of a user’s ten passwords were sufficiently similar to each
other that a password-reuse attacker could easily guess one knowing the other. Because LEPs
are authenticated based on fact matches, and passwords based on an exact string match, it is hard
to devise a similarity measure that applies equally well to both concepts. To define similarity of two
authentication tokens (LEPs or passwords), we borrow from the Linux Pluggable Authentication
Modules (PAM) (PAM 2015) design. We say that two tokens t1 and t2 are similar if more than half
of the items in t2 also appear in t1. For passwords, items are characters; for LEPs, items are facts.
This definition allows us to directly apply pam_cracklib to detect similar passwords. We say that
a password op1 is similar to a password op2 if one of the following conditions is met: (1) more
than half of op1’s characters appear in op2 (this includes cases when op1 is a palindrome of op2
or a rotated version of op2), or (2) op1 differs from op2 only in case. There were 4.6–15.4% similar
LEPs, in the semi-guided and guided categories, respectively, compared to 31.6% similar passwords.
Thus, passwords were reused more than twice as often as LEPs, and guided LEPs were reused 3×
more often than semi-guided LEPs.
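The PAM-style similarity rule translates directly into code. A minimal sketch, where a “token” is either a password string (items are characters) or a list of normalized facts (items are facts):

```python
def similar(t1, t2):
    # Tokens are similar if more than half of t2's items also appear
    # in t1. For passwords the items are characters; for LEPs, facts.
    items_in_t1 = set(t1)
    shared = sum(1 for item in t2 if item in items_in_t1)
    return shared > len(t2) / 2
```

Because Python strings iterate over characters and lists over elements, the same function covers both token types.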
One possible reason for such low LEP reuse is the way we prompted users for LEPs (described in
Section 4). Whenever an LEP was to be created, we offered the participant one topic, randomly
selected from our list. The participant could reject offered topics until she found one that
she wanted to talk about. These prompts seem to have stimulated recall of diverse memories of
past events, and lowered LEP reuse. We expect that servers that adopt LEPs would also use such
user prompts to lower reuse.
5.4 LEPs Are Strong Against Acquaintance and Family/Close Friend Guessing
This section discusses results of our AcqGuess and CloseGuess study.
We show acquaintance guessing success in Table 5, rows 13–16, using the same authentication
schemes as in Section 3.5. Guess success rate is very low (0–0.7%) for all-fact authentication,
and climbs to 1.3–4.5% for three-fact authentication. Taken together with the recall results, these
data suggest that four-fact authentication strikes the right balance between high authentication
success (70–73%) and reasonable strength, while keeping the acquaintance guessing success low
(0–0.7%).
Compared to security questions, where friends/family could guess 17–25% (Schechter et al.
2009), LEPs are 3.7–35× stronger, assuming four-fact or three-fact authentication. AcqGuess
attack success rate per fact category, for the facts described in Table 2, is shown in column 4 of Table 6.
Acquaintances could successfully guess more than half of the cities, and more than 20% of rela-
tionships, places, first names, and facts in the HU and TN categories. Other categories, such as FL, PL,
OBJ, DATE, and YR, had lower guess success. This shows another strength of LEPs over security
questions. Bonneau et al. (2015a) found that security questions could not strike the right balance
between security and memorability, because facts that were memorable for users were also eas-
ily guessed (Schechter et al. 2009). Conversely, LEPs have multiple fact categories with high user
recall and low acquaintance guess rate (e.g., FN, FL, PL, OBJ).
Because the guess rate for semi-guided LEPs was zero in this study, we continued our investigation
with semi-guided LEPs only, for close friend or family guessing.
Guess success rate per fact category, for the facts described in Table 2, is shown in column 5 of
Table 6. Close friends and family could successfully guess 25.9% of names and 21.8% of places,
compared to 9.3% of objects. Thus, one way to increase the strength of LEPs against close friend/family
attackers would be to increase the number of object facts.
5.5 Usability
In this section, we report on usability aspects of LEPs, with regard to recall and user-friendliness.
5.5.1 Short-term Recall. We report the percentage of successful authentications, after one week,
in Table 5, in rows 5–12, for LEPs and for passwords. For LEPs, in addition to all-fact recall,
we investigated three alternative authentication schemes: five-fact, four-fact, and three-fact recall.
In these, the user must successfully recall five, four, or three facts, respectively, and the statistical
strength of the recalled facts must exceed the strength of a fully random 3class8 password, i.e., 95⁸
guesses, which we denote as S3c8.
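For concreteness, the S3c8 threshold is the guess space of a random eight-character password drawn from the 95 printable ASCII symbols:

```python
import math

S_3c8 = 95 ** 8          # 6,634,204,312,890,625 guesses
bits = math.log2(S_3c8)  # about 52.6 bits of strength
```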
Authentication success after 1 week is shown in rows 5–8 of Table 5. Password recall in our
study was 26% (others have found 45–70% recall (Komanduri et al. 2014; Shay et al. 2012, 2014),
but they asked users to recall only one password after 2 days). Our findings are consistent with the
psychological literature (Averell and Heathcote 2011; Ebbinghaus 1913): Ebbinghaus found
that people retain only 25% of newly learned information after six to seven days.
Imprecise matching is a trade-off between usability and security. Imprecise matching greatly
helps to increase LEP recall: with exact matching, all-fact authentication success would be 19% for guided
and 9.6% for semi-guided LEPs; with imprecise matching, it is 31.6% and 45.7%, respectively. Imprecise matching
reduces the strength of LEPs against statistical attacks, compared to exact matching, but only slightly.
This is because each fact has only a handful of variations, which are all counted as the same by im-
precise matching (e.g., “see, saw, have seen, will see”). This small cost is more than justified by
significantly better recall, which also means faster authentication and better usability. Our statistical
strength attack models already take imprecise matching into account, i.e., we count each noun and
each verb once and do not match on order, punctuation, or capitalization.
LEP authentication success with all facts is 30–75% higher than that for passwords. When we
require fewer facts for authentication, the success rate increases significantly: at four facts, LEP
success is 2.7× higher than password success, and at three facts it is 3.2× higher. Allowing users
to authenticate with fewer than all facts lowers the security of LEPs. However, since we also require that the
statistical strength of the recalled facts exceed S3c8, the remaining strength of these “shortened LEPs”
is sufficiently large to thwart attacks. In Section 5.4, we investigate how requiring M < N facts
for authentication affects strength against friend attacks.
Security questions have a wide range of recall rates after 1 month, from 32.1% for frequent-flyer
number to 83.9% for city of birth (Bonneau et al. 2015a). LEP recall with four-fact and three-fact
authentication is 70–89.2% and thus resembles recall for memorable security questions. If an LEP
were equivalent to a set of several independent security questions, then its best recall rate would be
0.839⁴ ≈ 50% for four-fact and 0.839³ ≈ 59% for three-fact authentication. The fact that LEP recall
exceeds these values shows the power of user-customized questions and imprecise matching, over
general questions and exact matching.
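The back-of-the-envelope comparison assumes independent recall of each question at the best per-question rate of 83.9%:

```python
best_question_recall = 0.839
four_fact = best_question_recall ** 4   # roughly 0.50
three_fact = best_question_recall ** 3  # roughly 0.59
```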
To understand reasons for failures to recall an LEP, we examine recall rate per fact category (as
given in Table 2). Table 6 shows the percentage of correctly recalled facts (in at least one attempt)
per fact category in column 2. Relationships and cities are most accurately recalled, followed by
items in the HU category, places, and first and last names. Overall, all categories except ACT have
recall of more than 70% after one week. While this is quite high, it may be puzzling why a user
would fail to recall all facts correctly, or at least at higher rates. We note some frequent reasons for
failed recall per category in column 3. Many of these could be handled by better NLP techniques,
e.g., trying synonyms during matching, building a database of common abbreviations (e.g., gf for
girlfriend), and so on. This would further improve recall, at some security cost. We leave this
direction for future work.
Overall, 18.2% of facts were not recalled by their user in any authentication attempt.
Users provide fake answers to security questions, and they may also provide fake answers to LEPs,
which they then cannot recall. While we cannot accurately establish which facts are fake and
which are not, we estimate this by looking for facts that a user failed to recall and where the failure
cannot be attributed to the NLP reasons we listed in Table 6; i.e., it is a total miss. We find that
11.5–15.7% of facts were not recalled by users, and thus may be fake. This rate is less than half of
the fake-answer rate for security questions (Bonneau et al. 2015a). A finer investigation of these
“fake facts” shows that about half of them fail authentication because a user used an initial instead
of a last name at LEP creation, but reverted to the full name at authentication. We could handle this
case with better authentication prompts.
5.5.2 Long Term Recall. We invited all participants in our performance study to authenticate
with their LEPs and passwords once again, in May 2015. A total of 54 participants returned. The
time between creation and authentication for these return participants ranged from 104 to 231
days, with a median of 120 days. Table 5 shows the long-term authentication success in rows 9–12.
While both LEP and password recall declined, the time lapse affected recall of passwords much
more than recall of LEPs: password recall declined by 66%, while LEP recall declined by 17–47%.
We thus conclude that LEPs are more robust than passwords with regard to long-term recall.
Security questions have a wide range of recall rates after 3–6 months—from 6.4% for frequent
flyer number to 79.2% for city of birth (Bonneau et al. 2015a). LEP recall with four-fact and three-
fact authentication is 53–73.6%, within the range of more memorable security questions.
5.5.3 Time to Create and Authenticate. We show the median time to create and authenticate an
LEP or a password in Table 5, in rows 24 and 25. LEPs require 6.7× longer to create and 3.3–4.6×
longer to authenticate than passwords. This is expected, as they require a user both to read the
questions and to provide input that is approximately five times longer than a password.
We further collected participant feedback on why guessing LEPs was hard. About half of
responses (48%) stated that LEPs involved too much detail about the topic, which was hard to mine
from online sources or from personal knowledge. Unless the acquaintance had participated in the same
event, it was difficult to mine correct answers online. In addition, 26% of participants acknowledged
that many facts, such as events from early childhood and elementary school, were too personal and
thus neither shared between the acquaintance and the user nor posted on social networks.
These private and unique facts make guessing difficult, unlike the facts used for security
questions, which can easily be found in online sources.
During the CloseGuess study, we collected information about guessing strategy using a survey
after the guessing phase. We asked friends to rate their use of personal knowledge, social networks,
and search engines on a 1-to-10 scale, with 10 being the highest use. The most
prominent source was personal knowledge, with an average score of 7.1, while the average ratings
of social networks and search engines were 3.2 and 3.4, respectively. When asked why LEPs were
hard to guess, friends said the questions were general in nature, abstract, or vague (33.3%), had
inadequate information (13.3%), or were too detailed/specific (53.3%).
5.6 Deployability
5.6.1 Storage. LEP answers should be concatenated and stored as one or several hashes. In the case
of all-fact authentication, we store only one hash per LEP. If we allow for M-fact authentication
(M = 3, 4, or 5), then we would have to create, hash, and store C(N, M) combinations of facts for each
LEP, where N is the number of all facts: 77% of participants would need up to 35 hashes, 88% would
need up to 70, 92% would need up to 126 hashes, and the worst case would require 1,287
hashes. Even in the worst case, the storage cost would amount to only several kilobytes per user,
which is negligible for today’s server storage. Because one-way hashing is fast, the processing
cost should also be acceptable. We could further limit the storage cost of LEPs by discarding all
but the N strongest facts at creation time. For M ∈ {3, 4} and N = 8, each LEP would require at most
70 hashes.
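The combination counts quoted above are binomial coefficients; a one-liner (using Python’s `math.comb`, available since 3.8) reproduces them:

```python
from math import comb

def hashes_per_lep(n_facts, m):
    # One stored hash per M-sized combination of the N facts.
    return comb(n_facts, m)
```

For instance, `hashes_per_lep(8, 4)` gives the “at most 70 hashes” figure for N = 8, and `hashes_per_lep(13, 5)` gives the worst case of 1,287.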
We conclude that the performance of LEPs was good. The strength of both guided and semi-guided
LEPs was much higher than the strength of passwords, and the recall of both, when we allow four-fact
authentication, was comparable to the recall of security questions. We thus continued evaluation of guided
and semi-guided LEPs for acquaintance and family/close-friend guessing.
Table 7. Comparing LEPs to Security Questions and Passwords Using the UDS Framework
7 DISCUSSION
We now briefly discuss our findings, limitations, and practicality of LEPs for different uses. LEPs
were clearly superior to passwords with regard to their strength and recall. When servers prompted
users with a random LEP topic, we also ended up with LEPs that were much more diverse than
passwords. LEPs thus successfully addressed the problem we started with—the balance between
security and memorability of passwords. However, this improvement comes at the cost of usability,
because LEPs take much longer to create and to input than passwords. Extended authentication
time may be especially burdensome to users on mobile devices, where keyboard input is slower
than on desktops/laptops. LEPs thus are not well-suited to all authentication tasks.
One possibility would be to use LEPs instead of passwords for first-time authentication, when
cookies have expired, when the user is accessing an online service from a new machine, or when
the user is logging onto his local machine after a logout. In these rare situations, the added overhead
of LEPs may be acceptable to users in exchange for higher security. LEPs could further be activated
only for high-value accounts, such as bank or e-mail accounts, where secure access is crucial.
LEPs could also be used for secondary or added authentication, instead of security questions.
We show in Section 5 that LEPs surpass security questions in security, recall, and strength against
friend guessing. Many high-value services currently send a code in a text message to the user’s
phone, to reduce the risk of password cracking. LEPs could be used in lieu of the code when a user
does not have access to her phone (e.g., during international travel).
LEPs could also be used for continuous authentication on high-security servers, such as gov-
ernment or bank servers. A logged-in user may be prompted for one or several facts after a period
of inactivity, to verify that she still has physical access to her computer.
One limitation of our work is that our strength measurement model may be imprecise. Answers
to specific LEP questions may be correlated both with the topic (e.g., users may talk about “cake”
in the context of a wedding) and among themselves. We modeled these correlations in a naive way
due to lack of publicly available information, but it is possible that attackers may come up with
a smarter strategy or have better access to useful data about correlations. Another limitation of
our work is due to the Hawthorne effect, where users tend to provide more positive answers in a
study than they would in real life. Therefore, the low fake-answer and password-reuse rates
we obtained in our studies could map to higher values in real life.
Yet another limitation of ours, and of similar password studies, is that we measured recall over
relatively short time intervals and with small user populations. We were limited by our research
funds and could not afford a longer-term, larger-population study. In particular, studying
how likely a user is to share the specific memory they used for an LEP in their daily social
interactions would be interesting future work.
LEPs could also be vulnerable to social engineering, when a user is coaxed by a friend to provide
an answer to their authentication questions. For example, if a user’s LEP contains the fact that “Joe
Smith” accompanied this user on a trip to Spain, a friend may ask “Who went with you to Spain?”
We hope that the specificity of LEP questions may represent a hindrance to social engineering
attacks. For example, if asked a “who” question the user will likely reply with too short an answer
(e.g., “John went with me”). But if asked a “first and last name” question the user may get suspicious,
because such question is too specific for a casual conversation. We hope to perform a user study
in the future to validate this intuition.
We have not tested LEPs with different cultures and did not consider the cross-cultural aspect.
For example, our initial list of 17 LEP topics in Table 1 relates to our life experiences and has worked
well in our user studies, but it may not be general and broad enough across different cultures. We
believe that it would be possible to identify a wealth of topics in any culture, which are relevant
to many people in that culture. We leave addressing of the cross-cultural aspects for future work.
It is possible that a negative emotional effect could arise when users are asked to recall a neg-
ative topic such as “death” or “accident.” According to psychology research by Kensinger (2007),
negative events are remembered in greater detail than positive ones, which would aid LEP recall.
However, users may experience emotional pain if we asked them to recall traumatic events. And
we have already seen that these topics lead to leakage of sensitive information. We thus plan to
remove those negative topics from LEP offerings and replace them with more positive ones. We
leave the definition of such added topics for our future work.
Last, reminiscing about positive topics, such as “trip” or “wedding,” could have a positive emo-
tional effect on users, making password creation and recall more pleasant than it is today.
8 CONCLUSIONS
Textual passwords are a widely used form of authentication, but they suffer from many deficiencies
because users trade security for memorability. Users create weak passwords because these are
memorable, and they reuse the same password across many sites. Forcing users to create strong
passwords does not help, as those are easily forgotten.
We have proposed life-experience passwords (LEPs) as a new authentication mechanism, which
strikes a good balance between security and memorability. We investigated several LEP designs
and evaluated them in two user studies. Our results show that LEPs are much more memorable
and secure than passwords, they are less often reused, and they are strong against friend guessing.
While they take more time to create and input during authentication, we believe their benefits
may make them a viable primary authentication mechanism for high-security servers or a much
better secondary authentication mechanism than security questions.
A APPENDIX
A.1 Popular List Sources and Sample Questions
Table 8 shows data sources, which we used to build our popular item lists. Some samples of our
guided questions and semi-guided questions are shown in Tables 9 and 10, respectively.
Question Answer
“Learning to swim”
Which year did you learn to swim? 2010
In which city did you learn to swim? Jaipur
Provide the name of the lake, sea, beach, pool or river where you learned to swim. Pico Paradise
What was the full name of the swim school? Pico
List the first and last name of one person that taught you to swim? Ashay Ramani
Names have been anonymized to protect participant privacy.
Question Answer
“My high-school graduation”
List the first and last name of “first in line.” Nana C
List the first and last name of “second in line.” Dad Bob
Specify the location where “graduation took place.” gym
Specify the object related to “new for graduation.” tan shirt
Specify the object related to “from Steven’s family.” rose
Names have been anonymized to protect participant privacy.
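The sample tables above show the question-answer structure of an LEP. As a minimal sketch of how such a credential might be stored and verified (this is a hypothetical illustration, not the authors' implementation; the paper's actual matcher, threshold, and answer normalization may differ), each answer can be kept as a salted hash of its normalized text, and authentication can succeed when enough answers match:

```python
import hashlib
import os


def _normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so ' Ashay  Ramani ' matches 'ashay ramani'."""
    return " ".join(answer.lower().split())


def create_lep(qa_pairs):
    """Store each answer as a salted SHA-256 hash alongside its question."""
    stored = []
    for question, answer in qa_pairs:
        salt = os.urandom(16)  # per-answer salt, so identical answers hash differently
        digest = hashlib.sha256(salt + _normalize(answer).encode()).hexdigest()
        stored.append({"question": question, "salt": salt, "hash": digest})
    return stored


def authenticate(stored, answers, threshold=4):
    """Accept if at least `threshold` of the supplied answers match the stored hashes.

    The threshold of 4-of-5 is an assumed parameter for illustration only.
    """
    correct = 0
    for record, answer in zip(stored, answers):
        digest = hashlib.sha256(record["salt"] + _normalize(answer).encode()).hexdigest()
        if digest == record["hash"]:
            correct += 1
    return correct >= threshold


# Usage with the "Learning to swim" example from the table above:
lep = create_lep([
    ("Which year did you learn to swim?", "2010"),
    ("In which city did you learn to swim?", "Jaipur"),
    ("Where did you learn to swim?", "Pico Paradise"),
    ("What was the full name of the swim school?", "Pico"),
    ("Who taught you to swim?", "Ashay Ramani"),
])
print(authenticate(lep, ["2010", "jaipur", "Pico Paradise", "pico", "ashay ramani"]))
print(authenticate(lep, ["2011", "Delhi", "Lake X", "Swim Co", "John Doe"]))
```

Hashing the answers means the server never stores the life-event facts in the clear, at the cost of only supporting exact (normalized) matches; fuzzier matching would require storing answers in a recoverable form.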
For ordinary passwords:
(1) How difficult was it to create one ordinary password? (0–10 scale)
(2) How difficult was it to come up with different ordinary passwords? (0–10 scale)
(3) How secure are your ordinary passwords against guessing by a stranger? (0–10 scale)
(4) How secure are your ordinary passwords against guessing by a close friend? (0–10 scale)
(5) How difficult was it to remember your ordinary passwords? (0–10 scale)
For LEPs:
(1) How difficult was it to create one LEP? (0–10 scale)
(2) How difficult was it to come up with different LEPs? (0–10 scale)
(3) How secure are your LEPs against guessing by a stranger? (0–10 scale)
(4) How secure are your LEPs against guessing by a close friend? (0–10 scale)
(5) How difficult was it to remember your LEPs? (0–10 scale)
For Acq/Close Guessing:
Instruction. Please rate how well you know this person, from 0 (do not know him/her well) to 5 (family member).
(1) Closeness (0–5 scale)
Instruction. Please rate how heavily you relied on each source of information when guessing your friend's password, from 0 (not used at all) to 10 (used all the time).
(1) Personal knowledge (0–10 scale)
(2) My friend’s profile on a social network (Facebook, Google Plus) (0–10 scale)
(3) Search engine (0–10 scale)
(4) Other methods (0–10 scale)
Survey results are presented in Tables 11 and 12.
ACKNOWLEDGMENTS
We thank Jason Hong (CMU) for valuable feedback on previous versions of this manuscript. We
also thank the anonymous reviewers for their helpful feedback on drafts of this article.
REFERENCES
Lee Averell and Andrew Heathcote. 2011. The form of the forgetting curve and the fate of memories. J. Math. Psychol. 55,
1 (2011), 25–35.
Anders Björkelund, Love Hafdell, and Pierre Nugues. 2009. Multilingual semantic role labeling. In Proceedings of the
13th Conference on Computational Natural Language Learning: Shared Task. Association for Computational Linguistics,
43–48.
The GitHub Blog. 2016. GitHub Security Update: Reused password attack. Retrieved from https://github.com/blog/2190-github-security-update-reused-password-attack.
Hristo Bojinov, Daniel Sanchez, Paul Reber, Dan Boneh, and Patrick Lincoln. 2012. Neuroscience meets cryptography:
Designing crypto primitives secure against rubber hose attacks. In Proceedings of the 21st USENIX Conference on Security
Symposium. USENIX Association, 33–33.
Joseph Bonneau. 2012. The science of guessing: Analyzing an anonymized corpus of 70 million passwords. In Proceedings of the IEEE Symposium on Security and Privacy. Retrieved from http://www.jbonneau.com/doc/B12-IEEESP-analyzing_70M_anonymized_passwords.pdf.
Joseph Bonneau, Elie Bursztein, Ilan Caron, Rob Jackson, and Mike Williamson. 2015a. Secrets, lies, and account recovery: Lessons from the use of personal knowledge questions at Google. In Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 141–150.
Joseph Bonneau, Cormac Herley, Paul C. van Oorschot, and Frank Stajano. 2012. The quest to replace passwords: A framework for comparative evaluation of web authentication schemes. In Proceedings of the IEEE Symposium on Security and Privacy. Retrieved from http://www.jbonneau.com/doc/BHOS12-IEEESP-quest_to_replace_passwords.pdf.
Joseph Bonneau, Cormac Herley, Paul C. van Oorschot, and Frank Stajano. 2015b. Passwords and the evolution of imperfect authentication. Commun. ACM (July 2015). Retrieved from http://www.jbonneau.com/doc/BHOS15-CACM-imperfect_authentication.pdf.
Joseph Bonneau, Mike Just, and Greg Matthews. 2010. What’s in a name? In Financial Cryptography and Data Security.
Springer, 98–113.
N. M. Bradburn, L. J. Rips, and S. K. Shevell. 1987. Answering autobiographical questions: The impact of memory and
inference on surveys. Science 236, 4798 (1987).
Brain Authentication. 2016. http://brainauth.com/testdrive/.
Claude Castelluccia, Markus Dürmuth, and Daniele Perito. 2012. Adaptive password-strength meters from Markov models.
In Proceedings of the Network and Distributed System Security Symposium (NDSS’12).
Danqi Chen and Christopher D. Manning. 2014. A fast and accurate dependency parser using neural networks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’14), Vol. 1. 740–750.
Cognitive password. 2016. http://en.wikipedia.org/wiki/Cognitive_password.
Microsoft Corporation. 2015. Sketch-based password authentication. US Patent number 8,024,775.
Sauvik Das, Eiji Hayashi, and Jason I. Hong. 2013. Exploring capturable everyday memory for autobiographical authentication. In Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing. ACM, 211–220.
Darren Davis, Fabian Monrose, and Michael K. Reiter. 2004. On user choice in graphical password schemes. In Proceedings
of the USENIX Security Symposium, Vol. 13. 11–11.
Matteo Dell’Amico and Maurizio Filippone. 2015. Monte Carlo strength evaluation: Fast and reliable password checking.
In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. ACM, 158–169.
Tamara Denning, Kevin Bowers, Marten van Dijk, and Ari Juels. 2011. Exploring implicit memory for painless password
recovery. In Proceedings of the Annual Conference on Human Factors in Computing Systems. ACM, 2615–2618.
Geoff Duncan. 2016. Why haven’t biometrics replaced passwords yet? http://www.digitaltrends.com/android/can-biometrics-secure-our-digital-lives/.
Hermann Ebbinghaus. 1913. Memory: A Contribution to Experimental Psychology. Number 3. University Microfilms.
Named entity recognition. 2015. Retrieved on October 10, 2014 from https://en.wikipedia.org/wiki/Named-entity_recognition.
Pluggable Authentication Modules for Linux (PAM). 2015. Retrieved on October 10, 2014 from http://www.linux-pam.org/.
Freebase. 2016. http://www.freebase.com/.
Virgil Griffith and Markus Jakobsson. 2005. Messin’ with Texas: Deriving mother’s maiden names using public records. In Applied Cryptography and Network Security. Springer, 91–103.
NIST Electronic Authentication Guideline. 2006. NIST Special Publication 800-63, Version 1.0.2.
Ameya Hanamsagar, Simon S. Woo, Chris Kanich, and Jelena Mirkovic. 2018. Leveraging semantic transformation to investigate password habits and their causes. In Proceedings of the Conference on Human Factors in Computing Systems (CHI’18). ACM, 570.
Alina Hang, Alexandre De Luca, and Heinrich Hussmann. 2015. I know what you did last week! Do you?: Dynamic security
questions for fallback authentication on smartphones. In Proceedings of the 33rd Annual ACM Conference on Human
Factors in Computing Systems (CHI’15). 1383–1392.
Michael Heilman and Noah A. Smith. 2010. Good question! Statistical ranking for question generation. In Proceedings of the
2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language
Technologies. Association for Computational Linguistics, 609–617.
Jun Ho Huh, Seongyeol Oh, Hyoungshick Kim, Konstantin Beznosov, Apurva Mohan, and S. Raj Rajagopalan. 2015. Surpass: System-initiated user-replaceable passwords. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. ACM, 170–181.
Google Inc. 2015. Facial Recognition. U.S. Patent number 8,457,367.
Google Inc. 2016. 10,000 Most Common English Words. Retrieved October 10, 2015 from https://github.com/first20hours/google-10000-english/.
Ian Jermyn, Alain Mayer, Fabian Monrose, Michael K. Reiter, and Aviel D. Rubin. 1999. The design and analysis of graphical
passwords. In Proceedings of the 8th USENIX Security Symposium. Washington DC, 1–14.
Mike Just and David Aspinall. 2009. Personal choice and challenge questions: A security and usability assessment. In
Proceedings of the 5th Symposium on Usable Privacy and Security. ACM, 8.
Patrick Gage Kelley, Saranga Komanduri, Michelle L. Mazurek, Richard Shay, Timothy Vidas, Lujo Bauer, Nicolas Christin, Lorrie Faith Cranor, and Julio Lopez. 2012. Guess again (and again and again): Measuring password strength by simulating password-cracking algorithms. In Proceedings of the IEEE Symposium on Security and Privacy (SP’12). IEEE, 523–537.
Elizabeth A. Kensinger. 2007. Negative emotion enhances memory accuracy: Behavioral and neuroimaging evidence. Curr.
Direct. Psychol. Sci. 16, 4 (2007), 213–218.
Saranga Komanduri, Richard Shay, Lorrie Faith Cranor, Cormac Herley, and Stuart Schechter. 2014. Telepathwords: Preventing weak passwords by reading users’ minds. In Proceedings of the 23rd USENIX Security Symposium (USENIX Security’14). 591–606.
Cynthia Kuo, Sasha Romanosky, and Lorrie Faith Cranor. 2006. Human selection of mnemonic phrase-based passwords.
In Proceedings of the 2nd Symposium on Usable Privacy and Security. ACM, 67–78.
LTH Semantic Role Labeler. 2015. Retrieved October 10, 2014 from http://barbar.cs.lth.se:8081/.
Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. 55–60. Retrieved from http://www.aclweb.org/anthology/P/P14/P14-5010.
Luke Mastin. 2016. The Human Memory. Retrieved from http://www.human-memory.net/.
Nicholas Micallef and Nalin Asanka Gamagedara Arachchilage. 2017. A gamified approach to improve users’ memorability of fall-back authentication. In Proceedings of the Symposium on Usable Privacy and Security (SOUPS’17).
Mnemonic Guard. 2015a. http://www.mneme.co.jp/english/index.html.
Mnemonic Guard Blog. 2015b. http://mnemonicguard.blogspot.com/.
A. Niedzwienska. 2003. Distortion of autobiographical memories. Appl. Cogn. Psychol. 17, 1 (2003), 81–91.
Ann Nosseir, Richard Connor, and M. D. Dunlop. 2005. Internet authentication based on personal history—A feasibility
test. In Proceedings of Customer Focused Mobile Services Workshop.
The Corpus of Contemporary American English (COCA). 2015. Retrieved October 10, 2014 from http://corpus.byu.edu/coca/.
Part-of-speech tagging. 2015. Retrieved October 10, 2014 from https://en.wikipedia.org/wiki/Part-of-speech_tagging.
Semantic role labeling. 2015. Retrieved October 10, 2014 from https://en.wikipedia.org/wiki/Semantic_role_labeling.
Stuart Schechter, A. J. Bernheim Brush, and Serge Egelman. 2009. It’s no secret. Measuring the security and reliability of
authentication via ’secret’ questions. In Proceedings of the 30th IEEE Symposium on Security and Privacy. IEEE, 375–390.
Richard Shay, Patrick Gage Kelley, Saranga Komanduri, Michelle L. Mazurek, Blase Ur, Timothy Vidas, Lujo Bauer, Nicolas Christin, and Lorrie Faith Cranor. 2012. Correct horse battery staple: Exploring the usability of system-assigned passphrases. In Proceedings of the 8th Symposium on Usable Privacy and Security. ACM, 7.
Richard Shay, Saranga Komanduri, Adam L. Durity, Phillip Seyoung Huh, Michelle L. Mazurek, Sean M. Segreti, Blase Ur,
Lujo Bauer, Nicolas Christin, and Lorrie Faith Cranor. 2014. Can long passwords be secure and usable? In Proceedings
of the 32nd Annual ACM Conference on Human Factors in Computing Systems. ACM, 2927–2936.
Anil Somayaji, David Mould, and Carson Brown. 2013. Towards narrative authentication: Or, against boring authentication. In Proceedings of the New Security Paradigms Workshop. ACM, 57–64.
Amazon Mechanical Turk. 2018. https://www.mturk.com/.
Blase Ur, Felicia Alfieri, Maung Aung, Lujo Bauer, Nicolas Christin, Jessica Colnago, Lorrie Cranor, Harold Dixon,
Pardis Emami Naeini, Hana Habib, Noah Johnson, and William Melicher. 2017. Design and evaluation of a data-driven
password meter. In Proceedings of the 35th Annual ACM Conference on Human Factors in Computing Systems (CHI’17).
Rafael Veras, Christopher Collins, and Julie Thorpe. 2014. On the semantic patterns of passwords and their security impact.
In Proceedings of the Network and Distributed System Security Symposium (NDSS’14).
Rick Wash, Emilee Rader, Ruthie Berman, and Zac Wellmer. 2016. Understanding password choices: How frequently entered
passwords are re-used across websites. In Proceedings of the Symposium on Usable Privacy and Security (SOUPS’16).
Wikipedia, The Free Encyclopedia. 2004. Retrieved on February 12, 2016 from https://en.wikipedia.org/wiki/Main_Page.
Simon Woo, Elsi Kaiser, Ron Artstein, and Jelena Mirkovic. 2016a. Life-experience passwords (LEPs). In Proceedings of the
32nd Annual Conference on Computer Security Applications. ACM Press, 113–126.
Simon S. Woo, Zuyao Li, and Jelena Mirkovic. 2016b. Good automatic authentication question generation. In Proceedings
of the 9th International Natural Language Generation Conference. 203.
Simon S. Woo and Jelena Mirkovic. 2016. Improving recall and security of passphrases through use of mnemonics. In
Proceedings of the 10th International Conference on Passwords (Passwords).
Takumi Yamamoto, Atsushi Harada, Takeo Isarida, and Masakatsu Nishigaki. 2008. Improvement of user authentication
using schema of visual memory: Exploitation of “schema of story.” In Proceedings of the 22nd International Conference
on Advanced Information Networking and Applications (AINA’08). IEEE, 40–47.
Xuchen Yao, Emma Tosch, Grace Chen, Elnaz Nouri, Ron Artstein, Anton Leuski, Kenji Sagae, and David Traum. 2012.
Creating conversational characters using question generation tools. Dialog. Discourse 3, 2 (2012), 125–146.