(Figure 2 diagram: document inputs, the language model agent, and the memory stream with activated and flash memory (memory from turn #T−1); the agent produces response #T and writes a new memory, with steps numbered 1–6.)
Figure 2: The workflow of our proposed Self-Controlled Memory (SCM) system, where numbers 1–6 represent the sequential process of one iteration with a new observation #T.
lem into multiple sub-problems. However, these methods fail to capture the relationships among sub-problems.

3 Methodology

The Self-Controlled Memory (SCM) system proposed in this paper aims to give large-scale language models (LLMs) the capability to store long-term memories, allowing them to process lengthy inputs and retain information across multiple interactions with the user.

3.1 System Overview

In this section, we introduce the workflow of our proposed SCM system. As illustrated in Figure 2, our SCM system comprises three modules: a language model agent, a memory stream, and a memory controller. The three modules work together to process lengthy documents and provide more accurate and coherent responses. Our system workflow consists of six explicit steps:

1. Input Acquisition: The agent receives an observation in turn T (i.e., an ultra-long document input or a user question), either through direct input or from an external source.

2. Memory Activation: Based on the current observation, the memory controller determines whether it is necessary to activate memory for the current user input. If memory activation is warranted, relevant memories are retrieved by executing steps 3 and 4 (for details on memory information, refer to § 3.2); otherwise, the process moves directly to step 5. § 3.3.1 provides a comprehensive explanation of the control flow of the memory controller.

3. Memory Retrieval: In this step, we use the observation as a query to identify related memories. Each memory's ranking score is computed along two dimensions: relevance and recency. For relevance, we evaluate how similar the memory's content is to the observation; for recency, we consider the time elapsed since the memory was last accessed. We then retain the top-K ranked memories.

4. Memory Reorganization: In this step, the controller determines whether to use the original memory directly or its summarized form. If summarized memory is chosen, the original memory is compressed; § 3.3.2 provides a detailed explanation of the state compression process. The system then combines the retrieved memories in a structured manner to serve as background information for response generation.

5. Input Fusion: In this step, we carefully design a prompt that fuses the restructured memory with the present observation to serve as the model's input. A thorough description is given in § 3.4.

6. Response Generation: The model generates a response based on the result of the previous step and incorporates the current interaction, including the observation and the generated response, into memory.
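One iteration of the six steps above can be sketched in Python. The class below is a minimal illustration, not the paper's implementation: the word-overlap retrieval, the recency normalization, and the prompt format are all hypothetical stand-ins, and `llm` is any text-in, text-out model call.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    content: str   # full interaction text
    summary: str   # compressed form (unused in this sketch)
    turn: int      # interaction index, used for recency


class SCMAgent:
    """Minimal sketch of one SCM iteration; all helpers are stand-ins."""

    def __init__(self, llm, top_k=3):
        self.llm = llm      # text-in, text-out language model call
        self.stream = []    # memory stream
        self.top_k = top_k
        self.turn = 0

    # Step 2: memory-activation decision (a real system asks the controller).
    def needs_memory(self, observation):
        return len(self.stream) > 0

    # Step 3: toy relevance = word overlap; recency = newer turn, higher score.
    def retrieve(self, observation):
        words = set(observation.lower().split())

        def score(m):
            relevance = len(words & set(m.content.lower().split()))
            recency = m.turn / max(self.turn, 1)
            return relevance + recency

        return sorted(self.stream, key=score, reverse=True)[: self.top_k]

    # Step 4: reorganize retrieved memories into background text.
    def reorganize(self, memories):
        return "\n".join(m.content for m in memories)

    # Step 5: fuse restructured memory with the current observation.
    def fuse(self, background, observation):
        return f"Relevant history:\n{background}\n###\nUser: {observation}\nAssistant:"

    # Step 6: generate a response and store the new interaction as a memory.
    def step(self, observation):
        self.turn += 1
        background = ""
        if self.needs_memory(observation):
            background = self.reorganize(self.retrieve(observation))
        response = self.llm(self.fuse(background, observation))
        self.stream.append(
            Memory(f"User: {observation}\nAssistant: {response}", "", self.turn)
        )
        return response
```

With a stub model (`SCMAgent(llm=lambda prompt: "ok")`), each call to `step` generates a response and appends one new memory to the stream.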
(Figure 3 diagram: an observation enters the memory controller, which first asks "Is memory retrieval necessary?" (Y/N); if yes, it retrieves from archived memory and then asks "Summary or full content?" (Y/N), either summarizing the activated content into a summary or passing the full content onward.)

Figure 3: Workflow of the Memory Controller.

Given a user command, determine whether executing the command requires historical or previous information, or whether it requires recalling the conversation content. Simply answer yes (A) or no (B) without explaining the information:

Command: [User Input]

Figure 4: English prompt for the necessity of using memory.

Given a user command, determine if it can be executed correctly based solely on the summarized historical information provided. Simply answer yes (A) or no (B), without explaining the information.

Figure 5: English prompt for whether or not to use the summary of memory.

Figure 6: Prompt for dialogue memory summarization.

###
User: [current user input]
Assistant:
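The two yes/no gates of the memory controller (Figures 3–5) can be sketched as plain LLM calls. The prompt strings below follow the wording of Figures 4 and 5; the function name and the answer-parsing logic are illustrative assumptions, not the paper's code.

```python
NEED_MEMORY_PROMPT = (
    "Given a user command, determine whether executing the command requires "
    "historical or previous information, or whether it requires recalling the "
    "conversation content. Simply answer yes (A) or no (B) without explaining "
    "the information:\n\nCommand: {command}"
)

USE_SUMMARY_PROMPT = (
    "Given a user command, determine if it can be executed correctly based "
    "solely on the summarized historical information provided. Simply answer "
    "yes (A) or no (B), without explaining the information.\n\n"
    "Command: {command}"
)


def controller_decisions(llm, command):
    """Run the two yes/no gates of Figure 3; answers parsed as A=yes, B=no."""

    def ask(prompt):
        answer = llm(prompt.format(command=command)).strip().upper()
        return answer.startswith(("A", "YES"))

    need_memory = ask(NEED_MEMORY_PROMPT)
    # The summary-vs-full-content gate only matters when memory is activated.
    use_summary = ask(USE_SUMMARY_PROMPT) if need_memory else False
    return need_memory, use_summary
```

Constraining the model to a single letter keeps the control flow cheap and makes the answer trivially machine-parseable.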
recently, reinforcing the idea that the agent's attention remains on the states of the latest interactions. The relevance factor assigns a higher score to memory items that are related to the current observation. In our implementation, we create an embedding vector for the text description of every memory using a language model⁵. The cosine similarity between the embedding vector of the memory and that of the query observation is calculated to determine relevance. The rank score of each memory is the sum of its recency and relevance scores: rank_score = recency_score + relevance_score. Depending on the length limit, we designate the top k memories with the highest rank scores as activated memories, where k varies between 3 and 10.

If the controller determines that using a summary allows the normal execution of instructions, the currently extracted memories must be summarized. Specific information regarding memory summarization is given in the following subsection.

3.3.2 Memory Summarization

Memory summarization is a crucial aspect in scenarios such as document summarization, where a single interaction or dialogue turn can exceed 3,000 tokens in length. It enables stacking multiple memories into an activated memory section. Figure 6 shows the English prompt specifically designed for memory summarization in individual interactions (i.e., dialogue tasks). Other language versions of the prompt can be found in § A.3.

⁵ We used the OpenAI embedding model text-embedding-ada-002.

Figure 7: English prompt of ultra-long dialogue generation.

3.4 Response Generation

As mentioned in the introduction, SCM allows non-dialogue-optimized LLMs to emulate ChatGPT. This necessitates explicit instructions that prompt LLMs to use activated and flash memory to generate the desired response. Figure 7 shows an English prompt intended for extremely long multi-interaction dialogues; other language versions are included in § A.2.

4 Experiments

4.1 Settings

Our framework is preliminarily evaluated in two scenarios: ultra-long dialogues and ultra-long document summarization. We conduct experiments to answer three research questions (RQs):

• RQ1. Can the SCM system compete with or even outperform ChatGPT within a specific token limit?

• RQ2. Can the SCM system scale to provide accurate responses to users' questions that relate to historical contexts dating back hundreds or even thousands of turns?

• RQ3. Can SCM generalize to other scenarios, including long document summarization?

The following experiment evaluates the performance of the text-davinci-003 model, which lacks dialogue optimization, in comparison to the ChatGPT-Turbo model. The evaluation focuses on the tasks of ultra-long dialogues and ultra-long document summarization.

Example dialogue (figure content):

User: I feel like I've gained weight recently and I'm planning to lose some. My goal weight is 65 kilograms.
Assistant: That's a great goal! I can help you reach your goal weight. What kind of exercise plan do you have in mind?
[...]
Assistant: It sounds like you have a great plan! To help you stay motivated, why don't you set goals for each session?
User: I've got some good news to share - my weight loss has been going well lately and I'm down to 68kg now.
Assistant: Congratulations! That's great news. What have you been doing to help you reach this goal?
User: It's because I've recently started enjoying sports like swimming and basketball again.
Assistant: That's wonderful! It's great that you're finding activities that you enjoy and can stick with.
Assistant: Yes, you mentioned that you wanted to start running as your first sport. Are you still running?

After ten days, the length of the dialogue reaches 10,000 tokens. The user then asks the model "Do you remember my fir". Our SCM system accurately responds to this question.
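The ranking rule of § 3.3.1 (rank_score = recency_score + relevance_score, with relevance as cosine similarity between embeddings and the top k of 3–10 memories kept) can be sketched as follows. The recency normalization here is an assumed choice for illustration, since the exact recency formula is not given.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_memories(memories, query_vec, current_turn, k=3):
    """Score memories as recency_score + relevance_score; keep the top k.

    `memories` is a list of (embedding, last_access_turn, text) tuples.
    """
    scored = []
    for vec, last_access, text in memories:
        relevance = cosine(query_vec, vec)     # embedding similarity
        recency = last_access / current_turn   # assumption: newer -> closer to 1
        scored.append((relevance + recency, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]
```

A recently accessed but weakly related memory can thus still outrank an old exact match, which matches the stated goal of keeping the agent's attention on the latest interactions.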
question-asking methodology during single-block summarization. For instance, if a user poses a question such as "Please provide a summary of the technical details and evaluation findings of GPT-4 in image processing", the model will access prior summary memories and extract the relevant content. We will continue to improve this aspect in the future.

5 Limitations and Risks

Limitations A lack of appropriate datasets for evaluating the handling of extremely lengthy texts has resulted in our model being validated solely through manual verification. This method, however, is inadequate for evaluating different scenarios comprehensively and objectively. We therefore aim to construct a specific test set that incorporates the key indicators essential for processing long texts in diverse settings. This test set will be accompanied by a manual evaluation standard to enable a more equitable comparison with related methods. Moreover, we will assess the efficacy of our system on more open-source models that possess single-turn instruction comprehension capability.

Risks Our system can attach to any LLM, which may be prone to factual errors, hallucinations, toxic language, and malicious responses. Consequently, we restrict the usage of our system to academic research purposes for now.

6 Conclusion and Future Work

In this paper, we propose a Self-Controlled Memory (SCM) system to extend the input length of any LLM to unlimited length and to effectively capture useful information from all historical inputs. This method requires no training or modification of models and therefore has broad applicability. We validated the effectiveness of our method through manual evaluation of ChatGPT and of the text-davinci-003 model equipped with our system, demonstrating superior performance in certain aspects of long-text scenarios compared to ChatGPT. Our future work will focus on releasing a comprehensive test set and its manual evaluation criteria, and on testing our system on various open-source models currently available.
A Prompt List

A.1 Prompt for Memory Controller

The Chinese prompt, translated into English: Given a user instruction, determine whether executing the instruction requires historical information or information from the preceding context, or requires recalling the conversation content. Simply answer yes (A) or no (B), without explaining:

Instruction: [user input]

A.2 Prompt for Dialogue Generation

The Chinese prompt, translated into English: The following is a conversation between a user and an AI assistant. Based on the historical dialogue, answer the user's current question:

Relevant historical dialogue:
[historical dialogue content]
Previous turn:
[content of the previous turn]
###
User: [user question]
Assistant:

The following is a conversation between a user and an AI assistant. Please write a one-sentence summary of the user and a one-sentence summary of the assistant, listed separately, preserving as much key information from the user's questions and the assistant's answers as possible.
User: [user input]
Assistant: [system reply]
Summary:

Figure 12: Chinese prompt of ultra-long dialogue summarization (shown here in English translation).
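A minimal sketch of how the dialogue-generation prompt of § A.2 might be filled in at step 5 (Input Fusion). The slot names (`activated_memory`, `last_turn`, `question`) are hypothetical labels for the bracketed placeholders above, not identifiers from the paper.

```python
DIALOGUE_PROMPT = """The following is a conversation between a user and an AI \
assistant. Based on the historical dialogue, answer the user's current question:

Relevant historical dialogue:
{activated_memory}
Previous turn:
{last_turn}
###
User: {question}
Assistant:"""


def build_input(activated_memory, last_turn, question):
    # Fuse the restructured memory with the current observation (step 5),
    # leaving the trailing "Assistant:" for the model to complete.
    return DIALOGUE_PROMPT.format(
        activated_memory=activated_memory,
        last_turn=last_turn,
        question=question,
    )
```

Keeping the previous turn in a separate slot from the retrieved history mirrors the template's distinction between flash memory and activated memory.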