Abstract

Large language models (LLMs) have been proven capable of memorizing their training data, which can be extracted through specifically designed prompts. As the scale of datasets continues to grow, privacy risks arising from memorization have attracted increasing attention. Quantifying language model memorization helps evaluate potential privacy risks. However, prior works on quantifying memorization require access to the precise original data or incur substantial computational overhead, making them difficult to apply to real-world language models. To this end, we propose a fine-grained, entity-level definition to quantify memorization with conditions and metrics closer to real-world scenarios. In addition, we present an approach for efficiently extracting sensitive entities from autoregressive language models. We conduct extensive experiments based on the proposed definition, probing language models’ ability to reconstruct sensitive entities under different settings. We find that language models have strong memorization at the entity level and are able to reproduce the training data even with partial leakages. The results demonstrate that LLMs not only memorize their training data but also understand associations between entities. These findings necessitate that trainers of LLMs exercise greater prudence regarding model memorization, adopting memorization mitigation techniques to preclude privacy violations.

* Corresponding author

Figure 1: Privacy leakage of LLMs in real-world scenarios. Attackers can bypass safety measures and collect sensitive information from LLMs.

Introduction

Pretrained large language models (LLMs) have made significant breakthroughs in various downstream tasks (Devlin et al. 2019; Raffel et al. 2020; Ouyang et al. 2022; OpenAI 2023). However, research shows that LLMs memorize their training data (Carlini et al. 2019, 2021), typically sourced from crowd-sourced corpora or the Internet. Through crafted prompts, language models can reproduce the training data (Huang, Shao, and Chang 2022; Lukas et al. 2023), leading to serious concerns regarding data privacy. As shown in Figure 1, attackers have bypassed the security constraints of language models via the “Grandma Exploit” and successfully extracted sensitive information such as Windows keys or Apple IDs from ChatGPT. This suggests that privacy leakages in LLMs may be incurred in real-world settings due to the emission of memorized sensitive entities. Quantifying the memorization in language models and analyzing their capability of emitting memorized sensitive entities helps researchers understand the potential privacy risks of LLMs.

Recently, several studies have been conducted to quantify memorization in language models. Prior attempts compared models trained under different conditions to evaluate language model memorization (Zhang et al. 2021; Mireshghallah et al. 2022), but these approaches incur high computational costs. Carlini et al. (2023) used prefix prompts to prompt models to complete training-set suffixes verbatim and quantified memorization based on completion accuracy. However, in real-world scenarios, malicious or curious users are unlikely to have direct access to the training set for obtaining the original prefixes. Additionally, the information leakage from language models is usually fine-grained. For instance, in the “Grandma Exploit” against ChatGPT, the language model generated content that poses privacy risks; among these texts, certain key entities (for instance, the keys shown in the figure) constitute the sensitive information, rather than the entire content or verbatim suffixes. Therefore, previous research may not adequately quantify real-world memorization in LLMs.

Apart from the limitations of quantification methodologies, challenges also arise in approaches for extracting memorized training data from language models. Researchers typically employ original prefixes (Carlini et al. 2023) or carefully crafted prompts (Shao et al. 2023; Li et al. 2023) as input. However, obtaining original prefixes is difficult in practical applications of language models. If we forgo the use of prefixes in favor of handcrafted prompts, we must consider that the structure and order (Shin et al. 2020; Lu et al. 2022; Gao, Fisch, and Chen 2021; Jiang et al. 2020) of designed textual prompts can significantly influence the results. In summary, these challenges make it tricky to probe memorization across language model families using textual prompts.

In this paper, we conduct comprehensive testing and evaluation of real-world language models to quantify their memorization. To quantify fine-grained model memorization in a manner that more closely resembles real-world privacy leakages, we propose a definition for memorization extraction at the entity level. We also introduce an approach for learning prompts adaptively, which utilizes entity attribute information and soft prompts (Li and Liang 2021; Lester, Al-Rfou, and Constant 2021). This approach enables efficient emission of language model memorization. Through these, researchers can quantify and analyze LLMs’ memorization under conditions closer to real-world applications. Besides, activating more memorization allows researchers to ascertain potential hazards arising from language models’ memorization. For entity-level specific memorization extraction, our method achieves a highest accuracy of 61.2%. In terms of average accuracy, our method yields a three-fold increase in entity extraction rate on a 6B language model compared to textual prompts when the data is unique in the dataset.

To summarize, our contributions are as follows:
1. We propose a quantifiable, fine-grained definition for evaluating language model memorization without requiring updates to the original model parameters or precise data from the training set.
2. We present a method to test language model memorization under conditions that closely resemble real-world scenarios. More memorized information can be extracted by leveraging entity attributes and soft prompts, helping to understand potential privacy risks.
3. We conduct comprehensive experiments on language model entity-level memorization. Our analysis delves into the processes involved in activating entity memorization in language models and discusses factors that may influence entity memorization.

Figure 2: Comparison of the extraction processes of verbatim memorization and entity memorization. Verbatim memorization emphasizes the generation of verbatim-matched suffixes of training data, effectively serving as perfect continuations of malicious queries. In contrast, the entity memorization extraction process first extracts entities carrying critical information from training data and chooses a uniquely identifiable entity set, which is then coupled with a soft prompt embedding to form a malicious query, expecting the model’s response to contain the required specific entity. This entity-level memorization pattern is more common in real-world scenarios.

Related Work

Prompt for LLMs

Pre-trained language models store a vast amount of language knowledge. As models continue to scale up in parameters, methods based on fine-tuning incur high computational overhead. In contrast, prompt-based methods run directly on large language models, achieving results comparable to fine-tuning. Brown et al. (2020) initially attempted to leverage the language knowledge of pre-trained language models through prompts. Shin et al. (2020) introduced an approach for searching for optimal prompt words; prompts generated through this method enhance utility. Recognizing that prompts’ primary role is to improve models’ performance on specific tasks, researchers have begun exploring approaches that are not constrained by natural language and instead use continuous vectors directly in the embedding space of the model to prompt language models. This prompt-based approach has yielded many influential extensions, such as p-tuning (Liu et al. 2021b), prompt-tuning (Lester, Al-Rfou, and Constant 2021), and prefix-tuning (Li and Liang 2021). These methods have proposed effective solutions for improving model performance.
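The continuous-prompt methods above share one mechanism: instead of prepending discrete prompt tokens, trainable vectors are prepended directly in the model's embedding space. A minimal stdlib-only sketch of that mechanism (toy embedding table and dimensions; the names `soft_prompt`, `embed`, and `build_input` are illustrative, not from any of the cited libraries):

```python
import random

EMB_DIM = 4  # toy embedding size

def embed(token_ids, table):
    """Look up (toy) embeddings for discrete prompt tokens."""
    return [table[t] for t in token_ids]

# A soft prompt is a list of trainable continuous vectors -- it need not
# correspond to any real token in the vocabulary.
random.seed(0)
soft_prompt = [[random.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(3)]

# Toy embedding table for a 10-token vocabulary.
table = {t: [float(t)] * EMB_DIM for t in range(10)}

def build_input(soft_prompt, token_ids, table):
    """Prefix-tuning-style input: soft vectors followed by token embeddings."""
    return soft_prompt + embed(token_ids, table)

inputs = build_input(soft_prompt, [1, 2, 3], table)
print(len(inputs))  # 3 soft vectors + 3 token embeddings
```

In practice the soft vectors are optimized by gradient descent while the model's own parameters stay frozen, which is what makes the approach cheap relative to fine-tuning.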
Figure 4: The trend of entity extraction rate for prefix lengths across different model sizes. When the model’s parameter size
expands to a certain scale, EME demonstrates substantial advantages in entity extraction rates compared to baseline methods.
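The entity extraction rate plotted in Figures 3 and 4 can be read as the fraction of test instances whose generated continuation contains the expected entity. A minimal sketch under that reading (the function name and the exact-substring matching rule are our assumptions, not the paper's stated protocol):

```python
def entity_extraction_rate(generations, expected_entities):
    """Fraction of instances where the expected entity string
    appears in the model's generated output (exact substring match)."""
    assert len(generations) == len(expected_entities)
    hits = sum(1 for gen, ent in zip(generations, expected_entities) if ent in gen)
    return hits / len(generations)

# Toy example: 2 of 3 generations contain the expected entity.
gens = ["contact alice@enron.com for details",
        "the meeting is on 2001-05-14",
        "no entity here"]
ents = ["alice@enron.com", "2001-05-14", "bob@enron.com"]
print(entity_extraction_rate(gens, ents))  # 2/3
```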
the language model, leaving significant room for improvement. Additionally, designing prompts individually for multiple models is laborious and cannot guarantee optimality. These challenges obstruct efficient evaluation of universal language model memorization. Even across models within the same family, crafting universal textual prompts for memorization activation is challenging.

Soft prompts extend prompts into continuous space, providing continuous prompt vectors as input to the language model. They exhibit similar workflows across diverse models and excellent performance in downstream tasks. Inspired by this, we explore applying soft prompts to activate memorization in models. These continuous prompts are typically learned through approaches like prompt tuning. Since the optimal memorization extraction prompt is learned by the model itself, this approach enables efficient memorization activation while maintaining strong generalization across diverse models.

We introduce Efficient Memorization Extraction (EME), which combines effective memorization content filtering with memorization activation. Specifically, EME compresses textual content into entity attribute annotations and combines them with soft prompts as inputs. Through this workflow, testing memorization across different language models follows an identical process, enabling the examination of diverse models’ memorization capacities. We explore applying soft prompts for entity-level memorization activation, with experiments validating fine-grained memorized information extraction. Soft prompts encompass a wide variety of techniques; unless otherwise specified, the default method used in this work is prefix tuning (Li and Liang 2021).

The lower section of Figure 2 illustrates our approach to implementing a universal entity-level evaluation of language model memorization. For documents that experience partial leakage from the language model’s training set, we extract key entities that uniquely identify the document. Following this, we construct preliminary prompts based on the attribute tags of these entities. Finally, soft prompts are incorporated into the textual prompt embeddings to obtain the complete prompts used as inputs for the model.

Experiments and Results

Experimental Setup

The GPT-Neo model family (Black et al. 2021) includes a set of causal language models (CLMs) trained on “The Pile” dataset (Gao et al. 2020) and available in four sizes, with 125 million, 1.3 billion, 2.7 billion, and 6 billion parameters respectively. In our experimental setup, we use the greedy decoding strategy by default to generate the output with the minimum perplexity (PPL), which is then utilized for evaluating the model’s entity memorization capabilities.

The Enron email dataset (Klimt and Yang 2004), a subset of “The Pile”, encompasses over 500,000 emails from approximately 150 users of the Enron Corporation. Owing to the dataset’s inherently structured nature, it is conducive to extracting entities from the data, facilitating entity-level memorization extraction experiments.

For data preprocessing, we extract entities from the Enron dataset and tag their attributes. “Date” and “Content” are retained, while “X-From” and “X-To” are relabeled as the “Sender” and “Recipient” attributes of each data instance; these labels were modified to be more interpretable for models. The four columns together uniquely identify each data instance in the Enron dataset. We select a target entity column as the “expected entity” to recover and then construct prompts from the other column entities. To ensure the model searches its memorization rather than deducing answers directly from prompts, we exclude data where the constructed prompt contains the target entity.

We construct textual prompts with entities from the training data to prompt the model to generate the expected entities, and use handcrafted textual prompts as a baseline. The results are shown in Figure 3. As models scale up, their ability to associate entities and emit memorization improves, revealing a more pronounced tendency to emit memorization. Nevertheless, by utilizing entity attributes and soft prompts, the language model’s memorization can be more effectively activated at the entity level. Our approach delivers markedly superior performance over textual prompts on larger models. For instance, on GPT-Neo 6B, our approach demonstrates a three-fold increase in memorization extraction efficiency compared to textual prompts.
the prefix length ranging from 5 to 10, which tends to optimally activate the model’s memorization, while excessively long prefixes, such as 100, compel the model to prioritize learning the relationships between the prompted entities rather than activating memorization. Thus, we conjecture that this

[Figure legend residue: entity extraction rate curves for “p-tuning”, “prompt-tuning”, “prefix-tuning”, and “activate ensemble”.]