Bein Niv Schemas - RL - Clean - Psyrxiv - 092023
Affiliations
1Princeton Neuroscience Institute and 2Psychology Department,
Princeton University, Princeton, NJ, 08540, USA.
*Correspondence: oded.bein@princeton.edu
Key words:
Schema, reinforcement learning, orbitofrontal cortex, medial prefrontal cortex, learning,
memory
Abstract
Schemas are rich and complex knowledge structures about the typical unfolding of events in a
context, such as a schema of a lovely dinner at a restaurant. Schemas are central in
psychology and neuroscience. Here, we suggest that reinforcement learning (RL), a
computational theory of learning the structure of the world and relevant goal-oriented
behavior, underlies schema learning. We synthesize literature about schemas and RL to offer
that three RL principles might govern the learning of schemas: learning via prediction errors,
constructing hierarchical knowledge using hierarchical RL, and dimensionality reduction
through learning a simplified and abstract representation of the world. We then offer that the
orbito-medial prefrontal cortex is involved in both schemas and RL due to its role in
dimensionality reduction and in guiding memory reactivation through interactions with
posterior brain regions. Finally, we hypothesize that the amount of dimensionality reduction
might underlie gradients of involvement along the ventral-dorsal and posterior-anterior axes of
the orbito-medial prefrontal cortex. More specific and detailed representations might engage
the ventral and posterior parts, while abstraction might shift representations toward the dorsal
and anterior parts of the medial prefrontal cortex.
Introduction
Imagine entering a restaurant. An influx of knowledge informs you about the likely sequence of
occurrences and the relevant set of behaviors. You know you will be seated at a table and given
a menu. You will then read the menu and choose your food. After placing your order, you will
receive a delicious meal, and maybe a glass of fine wine to go along. This delightful dinner will
be followed by paying the bill and leaving the restaurant. The general knowledge of what
typically occurs in an event and in what order, as well as the appropriate behavior, is referred
to as the “schema” of the event (Alba & Hasher, 1983; Bartlett, 1932; Ghosh & Gilboa, 2014;
Schank & Abelson, 1977). While schemas are widely used in psychology and, more recently, in
neuroscience, they also remain notoriously elusive and ill-defined (Ghosh & Gilboa, 2014;
Gilboa & Marlatte, 2017). Importantly, we still lack a satisfying computational account of how
schemas are learned, how they guide behavior, and how they influence perception, attention,
learning, and memory.
Reinforcement learning (RL) offers a computational theory of how we learn the
structure of our environment and the relevant behaviors through experience (Sutton & Barto,
2018). RL algorithms have been powerful in accounting for behavioral and neural findings in
simplified environments (Dayan & Niv, 2008; Schultz et al., 1997). However, these algorithms
suffer from a “curse of dimensionality”: they scale poorly to rich, high-dimensional everyday life
events (Niv, 2019; Sutton & Barto, 2018). Thus, RL and schemas can be thought of as two ends
of a spectrum: at one end, highly rich and ecological knowledge structures that lack a
satisfying computational account; at the other, a precise computational account that is limited in
the complexity of the phenomena it explains. Can we bring these seemingly disparate fields of
research together, and are they indeed so different?
The goal of this opinion paper is to answer this question. We are motivated by recent
neuroscientific research that has emphasized the importance of the medial prefrontal cortex
(mPFC) and orbitofrontal cortex (OFC) for both schemas and RL. Within the RL framework, the
medial orbitofrontal cortex and ventral part of the medial prefrontal cortex (mOFC/vmPFC) are
thought to represent a “cognitive map” of the current task¹ (Chan et al., 2016; Klein-Flügge et
al., 2022; Schuck et al., 2016; Wilson et al., 2014; Zhou et al., 2020). In RL, this map partitions
the task into specific contexts or time points termed “states,” each state including information
relevant to guiding behavior in that context. In parallel, memory research on schemas suggests
that the mPFC (which includes the mOFC/vmPFC) represents schemas and mediates the
influence of schemas on memory (e.g., Baldassano et al., 2018; Bonasia et al., 2018; Gilboa &
Marlatte, 2017; Giuliano et al., 2021; van Kesteren et al., 2012; Varga et al., 2022). This co-
localization prompted us to explore common mechanisms that might underlie RL and schemas.
Below, we synthesize selected research on schemas and RL to propose that RL and
complementary algorithms may provide a computational framework for learning schemas. We
focus on three core computational principles that could underlie learning schemas: (1) learning
a summary of the environment through prediction errors, (2) grouping of states through
hierarchical RL, and (3) dimensionality reduction through learning of abstract state
representations. We begin with a brief description of schemas and the above three aspects of
RL and show how schemas and RL mechanisms are related. We then postulate that the mPFC is
involved in both RL and schemas as it mediates dimensionality reduction and guides memory
retrieval through communicating with more posterior brain regions. We conclude by
postulating that graded recruitment along the ventral-dorsal and anterior-posterior axes of the
mPFC might reflect the amount of dimensionality reduction required in a current situation.
¹ Another prominent view within the RL literature proposes that the mOFC/vmPFC represents economic value that
guides decisions (Levy & Glimcher, 2012; Padoa-Schioppa & Conen, 2017); for additional theories about these
brain areas, see, e.g., Delgado et al. (2016).
include knowledge of the temporal structure of an event.
Critically, schemas entail knowledge of commonalities extracted over multiple
experiences (Ghosh & Gilboa, 2014). Our schema of a restaurant includes the knowledge that
typically, we order, wait, and then receive our food. This knowledge was extracted from
multiple visits to restaurants in which this sequence occurred. The flip side of including
knowledge of the repeated structure of the world is that schemas are, by definition, devoid of
details of specific episodes. That we ordered salmon on one specific visit is not likely to become
a part of our schema of a restaurant, because it does not occur typically enough (though if we
usually order pasta, that may be part of our schema of eating out).
Schemas can guide behavior as they include knowledge of context-appropriate actions.
For example, the knowledge that upon receiving a menu, one should read it and place an order.
Finally, schemas can be described as hierarchically organized “modules” that can be shared
across schemas: both the restaurant schema and that of having dinner at home can include a
module of sitting at the table and eating. Additionally, the schema of an airport can include a
restaurant as a module because restaurants are found in airports. Thus, schemas can be a part
of other schemas, as well as include other schemas.
Despite decades of research on the influence of schemas on cognition (Alba & Hasher,
1983; Bartlett, 1932; Gilboa & Marlatte, 2017; Piaget, 1952), it is not completely clear how
schemas are learned and how they influence perception, action, learning, and memory (Elman
& McRae, 2019; Franklin et al., 2020; Varga et al., 2022). Computational models of semantic
networks, as well as models of concepts and category learning (A. M. Collins & Loftus, 1975; A.
M. Collins & Quillian, 1969; Kenett et al., 2017; Kumar, 2021; Love et al., 2004; McClelland,
McNaughton, & Oreilly, 1995; Murphy, 2004; Rumelhart & Ortony, 1977), characterize some
aspects of extracting general knowledge about entities and their co-occurrence, as well as the
hierarchical structure of conceptual knowledge, but do not seem to fully capture the scope and
richness of schemas. For example, they do not capture the fact that schemas are characteristically
learned through experience that involves having goals, taking actions toward these goals, and
learning from the outcomes of those actions. Experience is also dynamic in time and involves temporal
information. To incorporate these aspects into a theory of schemas, we turn to the framework
of RL.
predicted one (Niv & Schoenbaum, 2008; Rescorla, 1988; Rescorla & Wagner, 1972). Prediction
errors include both reward prediction errors, which refer to obtaining more or less reward than
expected, and state prediction errors, which refer to transitioning to a different state than
expected (with expectations – in the form of values or transition probabilities – obtained from
past learning). When encountering a prediction error, the agent adaptively updates their
expectations so that these expectations align better with the observed outcome. In this way,
through experience, the agent can learn a world model, which includes the representation of
states, transitions between them, and the distribution of rewards in each state (this is termed
“model-based RL”; Daw et al., 2005). The agent can then mentally simulate actions within the
learned world model to determine which action is best in what situation. Alternatively, in
“model-free RL,” the agent can learn the optimal policy directly from trial and error using
reward prediction errors, without learning a world model.
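As an illustration of the learning rule described here, below is a minimal sketch of model-free learning from reward prediction errors (a temporal-difference update). The "restaurant" state names and the learning-rate and discount parameters are illustrative assumptions, not taken from any study reviewed here:

```python
def td_update(V, state, reward, next_state, alpha=0.1, gamma=0.9):
    """One temporal-difference update: move V[state] toward the observed
    reward plus the discounted value of the successor state."""
    prediction_error = reward + gamma * V[next_state] - V[state]
    V[state] += alpha * prediction_error
    return prediction_error

# A toy "restaurant" episode: order -> wait -> eat (reward arrives on eating).
V = {"order": 0.0, "wait": 0.0, "eat": 0.0, "end": 0.0}
episode = [("order", 0.0, "wait"), ("wait", 0.0, "eat"), ("eat", 1.0, "end")]

# Repeated visits propagate value backward to earlier states.
for _ in range(50):
    for state, reward, next_state in episode:
        td_update(V, state, reward, next_state)
```

After repeated episodes, value spreads from the rewarded state backward through the sequence (V["eat"] > V["wait"] > V["order"]), and the prediction errors that drove learning shrink toward zero.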
From this description, it is already clear how schemas might be mapped to a
representation of a task in model-based RL, including the world model and the policy. In what
follows, we unpack that mapping, focusing on three aspects: learning through prediction errors,
hierarchical organization of task representations, and the development of an efficient state
representation through dimensionality reduction (Figure 1).
Figure 1. Three reinforcement learning principles contribute to schema learning. Circles represent states/time points
in the schema or the episode, with different shades and colors representing different features of states. Top:
Prediction errors, namely, the difference between the schema-based predictions (blue) and the evidence from a
specific episode (orange), drive schema update (red). This eventually converges to the typical unfolding of events.
The episode and schemas to be updated are selected through latent cause inference, illustrated by the green circles
‘selected’ from the grey ones. Middle: dimensionality reduction, implemented via schema-guided attention,
mediates the elimination of episodic details that differ across episodes (symbols dimension), and the inclusion of
goal-relevant information as well as repeating, but potentially goal-irrelevant, information (purple dimension).
Bottom: schemas’ hierarchical structures are learned via identifying subgoals that chunk sub-schemas.
Are schemas learned through prediction errors?
Since RL algorithms use prediction-error driven learning, the first question we ask is whether
schemas are also learned and updated via prediction errors (Fig. 1, top). Before diving into
studies supporting this view and outlining outstanding questions, it is useful to mention briefly
an additional hypothesis: that a summary of the typical and repeating structure of the world is
learned by tracking the frequency of occurrences (termed “unsupervised learning”; e.g.,
(Kumar, 2021)). In the frequency hypothesis, learning does not require a prediction and an
update following an error – each experience leads to an update of associations between
contiguous events.
In animal learning, the discovery of the phenomenon of “blocking” (Kamin, 1968, 1969)
led theorists to shift from assuming that contiguity is sufficient for associative learning to
considering prediction errors as the main force driving learning. In blocking, a stimulus
previously associated with a motivationally relevant outcome (e.g., an electric shock or food)
prevents a co-occurring neutral stimulus from also becoming associated with the same
outcome. The idea is that because the first stimulus fully predicts the outcome, there is no
prediction error when the outcome occurs, and thus learning about the association with the
newly added stimulus is “blocked” (Rescorla, 1988; Rescorla & Wagner, 1972). In humans, a
wealth of research shows that reward prediction errors drive learning (d’Acremont et al., 2013;
Niv & Langdon, 2016; Niv & Schoenbaum, 2008) and facilitate long-term memory (Ergo et al.,
2020; Greve et al., 2019; Rouhani et al., 2018; Rouhani & Niv, 2021).
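Blocking follows directly from the Rescorla-Wagner rule, in which all stimuli present on a trial share a single prediction error. A minimal simulation, with illustrative stimulus names, learning rate, and trial counts:

```python
def rw_trial(w, stimuli, outcome, alpha=0.2):
    """Rescorla-Wagner update: all stimuli present on a trial share one
    prediction error (outcome minus the summed prediction)."""
    error = outcome - sum(w[s] for s in stimuli)
    for s in stimuli:
        w[s] += alpha * error
    return error

w = {"light": 0.0, "tone": 0.0}

# Phase 1: the light alone comes to fully predict the outcome.
for _ in range(50):
    rw_trial(w, ["light"], outcome=1.0)

# Phase 2: light + tone are paired with the same outcome. Because the light
# already predicts it, almost no error remains, and learning about the
# tone is "blocked".
for _ in range(50):
    rw_trial(w, ["light", "tone"], outcome=1.0)
```

A frequency-tracking (unsupervised) learner would instead strengthen the tone-outcome association on every pairing; the near-zero final tone weight is the signature of error-driven learning.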
To establish that prediction errors drive schema learning, one can modify transition
probabilities between states and test whether the resulting state prediction errors lead to
updating of that model of the world (the schema) and to changes in behavior. Recent work
hints at this process. In rodents, Sharpe and colleagues demonstrated blocking of learning of
simple stimulus-stimulus associations (without rewards or otherwise motivationally relevant
stimuli), showing that learning such “neutral” associations is also driven by prediction errors
(Sharpe et al., 2017, 2020). Computational models that learn via state prediction errors have
been shown to explain choice data in studies that involved frequent changes (reversals) of
either transition probabilities or the state structure in humans (Momennejad et al., 2017; Sharp
et al., 2021, 2022) and animals (Akam et al., 2021; Bartolo & Averbeck, 2020). Other studies
involved extensive training of participants on state transitions, and found enhanced memory of
items that violated these transitions (Bein et al., 2021; Greve et al., 2017; Kafkas & Montaldi,
2018; for reviews, see: Henson & Gagnepain, 2010; Quent et al., 2021). Additionally, memory
was reduced for items that cued a future state that was (surprisingly) not transitioned to (G.
Kim et al., 2014; H. Kim et al., 2020). Enhanced memory for violations together with reduced
memory for items that elicited predictions that were violated is consistent with updating a
model of the world. These studies, which focused on learning over a few trials or sessions,
provide evidence that initial learning of schemas might be driven by state prediction errors.
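The model updating described here can be sketched as an error-driven update of learned transition probabilities; the states and learning rate are illustrative assumptions:

```python
def update_transitions(T, state, observed_next, alpha=0.1):
    """State-prediction-error update: the observed successor gains
    probability mass and all other successors lose it."""
    for s2 in T[state]:
        target = 1.0 if s2 == observed_next else 0.0
        T[state][s2] += alpha * (target - T[state][s2])

states = ["menu", "order", "food"]
# Start from a flat (uninformed) transition model.
T = {s: {s2: 1.0 / len(states) for s2 in states} for s in states}

# Experiencing the typical restaurant sequence sharpens the model.
for _ in range(100):
    update_transitions(T, "menu", "order")
    update_transitions(T, "order", "food")
```

Because each update is a convex step toward a one-hot target, each row remains a valid probability distribution; a surprising transition (a state prediction error) would shift mass toward the newly observed successor at the same rate.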
An interesting question is whether updating of established and consolidated schemas
uses the same mechanism. Compared to newly learned knowledge, consolidated and well-
learned knowledge is thought to be (1) stable and less amenable to change, (2) largely
supported by cortical structures (whereas newly-acquired knowledge is supported by the
hippocampus, see Box 1), and (3) more abstract and including fewer specific episodic details
(Antony & Schapiro, 2019; Frankland & Bontempi, 2005; Gilboa & Moscovitch, 2021;
Moscovitch et al., 2016; Squire & Alvarez, 1995). We are not aware of studies that directly
tested whether changes to established models of the world are learned and updated via state
prediction errors.
Another aspect of the prediction-error account of schema learning is that it requires
having predictions. Otherwise, there is no error. In contrast, the frequency account does not
require prediction prior to the update. Behavioral studies have linked prediction strength to
memory for violations, providing evidence that memory enhancement is indeed related to
forming predictions (Bein et al., 2021; Greve et al., 2017). Further, explicitly generating a
prediction before being provided with a piece of violating information boosted memory for that
information (Brod et al., 2018, 2020). fMRI studies additionally showed that better predictions,
defined as higher accuracy of decoding a predicted category from multivoxel BOLD activity
patterns, correlated with better memory for violations of those predictions (H. Kim et al.,
2020; see also Long et al., 2016, for related findings).
By showing that prediction errors facilitate learning and memory of the structure of the
environment, the studies surveyed here lay the ground for our proposal that prediction errors
facilitate learning and updating of schemas. However, these studies were mostly done in
simplified environments that trained participants on only one or a few associations. It is not clear
that their findings generalize to updating of complex and well-learned associative structures
that are characteristic of schemas, as work in humans shows that complex semantic knowledge
can both impair and enhance learning and memory of new associations (Anderson & Reder,
1999; Bein et al., 2019; Bellana et al., 2021; Gasser & Davachi, 2022; Reder et al., 2007;
Tompary & Thompson-Schill, 2021).
One example of this complexity is that in everyday life, cues and outcomes are not as
clearly defined as in many of the studies mentioned here, but rather dynamically evolve in time,
and span multiple temporal scales (Brunec & Momennejad, 2022; Gravina & Sederberg, 2017;
Lee et al., 2021). In one study that extends ideas about the effects of prediction errors on
memory updating to such a scenario, Sinclair and colleagues (2018) used rich movie-clip stimuli
to elicit predictions of action outcomes that had been learned over a lifetime of experience, for
example, a baseball batter hitting a home run. They then violated these predictions by stopping
the movie before the expected outcome, essentially omitting the outcome. Participants were
then presented with another, semantically related movie clip. In a subsequent memory test of
the original clips, the researchers found that participants recalled details from the related clips
as if they were in the original clip, and this happened more frequently if the original clip was
stopped as compared to when it was played fully, suggesting that stopping the clips leads to
more memory intrusions from related clips (Sinclair et al., 2021; Sinclair & Barense, 2018).
These intrusions might reflect updating of the memory of the original movie clips, which was enhanced by
violations of everyday life expectations (see also (Antony et al., 2020)). Note, however, that in
this study clips were stopped to elicit prediction errors. In everyday life, experience never
stops. Rather, surprise occurs when events unfold in an unexpected way. In sum, while many
questions remain open, emerging literature suggests that schemas might indeed be learned and
updated via prediction errors, similar to ideas about learning in RL.
Schema hierarchies might be learned, and instantiated, via hierarchical RL and latent cause
inference mechanisms
As mentioned, schemas are hierarchically organized: each schema can be composed of
subschemas and might be a subschema of another, larger schema. Hierarchical RL algorithms
(Badre, 2008; Barto & Mahadevan, 2003; Botvinick, 2008, 2012; Botvinick et al., 2009; A. G. E.
Collins, 2018; Correa et al., 2022; Tomov et al., 2020) might provide a blueprint for how such a
schema hierarchy is acquired (Fig. 1, bottom). In RL, hierarchical RL alleviates the scaling
problem (i.e., that learning via RL algorithms becomes prohibitively slow in complex
environments) by grouping together states and actions into larger units in a process termed
“temporal abstraction” (Barto & Mahadevan, 2003; Sutton et al., 1999). In temporal
abstraction, a temporally extended task is divided into different subunits, called “subtasks”. A
subtask is defined by states that are possible start states, a policy that includes a sequence of
actions and is sometimes called an “option”, the transition probabilities between states, and a
set of termination states (also called “subgoals”) in which the subtask will terminate and cede
control back to the overarching model of the entire task (Botvinick et al., 2009; Correa et al.,
2022; Solway et al., 2014; Sutton et al., 1999; Tomov et al., 2020). For example, ‘ordering food’
can be a subtask in a restaurant that starts upon receiving a menu, continues with a sequence
of actions including reading the menu, deciding on a dish, waiting for the server, and conveying
your choices to them, and ends when the subgoal is reached: the server successfully obtained
your request. The term “subgoal” is used to distinguish the termination state of the subtask –
having conveyed your choices to a staff member – from the final goal of the larger task at hand
(having a full stomach). In some hierarchical RL algorithms, reaching a subgoal leads to
obtaining a pseudo-reward signal (Botvinick et al., 2009; Diuk et al., 2013; Ribas-Fernandes et
al., 2011). This signal allows a standard reward-maximizing algorithm to discover the optimal
policy for the subtask using standard RL mechanisms.
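The 'ordering food' subtask can be sketched as an option in the hierarchical RL sense: initiation states, an internal policy, termination states (subgoals), and a pseudo-reward delivered on termination. The state and action names below are illustrative:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Option:
    initiation_states: set      # states where the subtask can be entered
    policy: dict                # state -> action, the subtask's own policy
    termination_states: set     # subgoals that end the subtask
    pseudo_reward: float = 1.0  # signal delivered on reaching a subgoal

    def run(self, state, transition: Callable):
        """Follow the option's policy until a subgoal is reached, then
        cede control back to the higher-level task."""
        assert state in self.initiation_states
        while state not in self.termination_states:
            state = transition(state, self.policy[state])
        return state, self.pseudo_reward

# 'Ordering food' as a subtask of dining at a restaurant.
ordering = Option(
    initiation_states={"got_menu"},
    policy={"got_menu": "read_menu", "chose_dish": "tell_server"},
    termination_states={"order_placed"},
)
world = {("got_menu", "read_menu"): "chose_dish",
         ("chose_dish", "tell_server"): "order_placed"}
end_state, pseudo = ordering.run("got_menu", lambda s, a: world[(s, a)])
```

The pseudo-reward returned at the subgoal is what allows a standard reward-maximizing algorithm to train the option's internal policy without reference to the final goal of the larger task.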
An additional important feature of hierarchical organization is that subtasks can be
shared across tasks – the subtask of ‘adding salt,’ which includes some starting state, a series of
basic actions, and a termination state, can be shared by both the ‘dining at a restaurant’ and the
‘having dinner at home’ tasks. Thus, we offer that such temporal abstraction may play a role in
how simple sequences of states and actions are grouped into sub-schemas, that are in turn
grouped into a sequence comprising a larger schema.
An important question in hierarchical RL is: how are subgoals selected? In terms of
schemas, this will address how continuous experience is segmented into discrete event
schemas (Zacks, 2020b). Hierarchical RL offers more than one mechanism (Botvinick et al.,
2009; Correa et al., 2022; Solway et al., 2014; Tomov et al., 2020). Some ideas draw on
statistical learning literature, which has established that as early as infancy, we learn to identify
frequently repeating patterns, including hierarchical structures in streams of perceptual input
(Frost et al., 2015; Saffran et al., 1996; Saffran & Wilson, 2003; Schapiro & Turk-Browne, 2015;
Turk-Browne et al., 2005). Thus, an agent can explore an environment, keeping track of
sequences of states and actions that co-occur frequently and identify the transitions between
sequences. These transitions are good candidates for becoming subgoals. For example, a state
initializing an ‘adding salt’ sequence would lead to a series of states and actions, which upon
completion might transition to other sequences, depending on whether we are cooking a
chicken dish or seasoning a salad. The end state of ‘salt added’ could turn into a subgoal,
towards the final goal of having a delicious dinner (Balaguer et al., 2016; Barto & Mahadevan,
2003; Botvinick, 2012). Other algorithms introduce Bayesian inference to maximize the
discovery of optimal hierarchies given the structure of the environment (Solway et al., 2014)
while accounting for similarity in the reward probabilities across states and changes in task
demands (Tomov et al., 2020), or the cost of planning (Correa et al., 2022). These algorithms
rely on repeated experience to construct a hierarchical model of the world.
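The statistical-learning route to subgoal discovery sketched above can be illustrated as follows: track transition frequencies in a stream of states and actions, and flag states whose outgoing transitions are unreliable (the ends of frequently repeated sub-sequences) as candidate subgoals. The 'adding salt' stream and the threshold are illustrative assumptions:

```python
from collections import defaultdict

def transition_probs(sequence):
    """Empirical probability of each observed successor, per state."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(sequence, sequence[1:]):
        counts[a][b] += 1
    return {a: {b: n / sum(nexts.values()) for b, n in nexts.items()}
            for a, nexts in counts.items()}

def candidate_subgoals(sequence, probs, threshold=0.5):
    """States followed by unreliable transitions mark the ends of
    frequently repeated sub-sequences: candidate subgoals."""
    return {a for a, b in zip(sequence, sequence[1:]) if probs[a][b] < threshold}

# 'Adding salt' always unfolds the same way; what follows it varies.
stream = ["grab_salt", "shake", "taste", "grab_salt", "shake", "taste",
          "season_chicken", "grab_salt", "shake", "taste", "toss_salad"]
probs = transition_probs(stream)
subgoals = candidate_subgoals(stream, probs)
```

Here 'taste' (salt added) is flagged: within the sub-sequence every transition is deterministic, but after it, experience branches toward the chicken dish or the salad.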
Salient events can act as subgoals, and also cause event segmentation.
Another idea is that salient events trigger the creation of a subgoal. Salient events
create an intrinsic reward signal and engage motivation-related neural systems, much like
reward (Bunzeck et al., 2010; Bunzeck & Düzel, 2006; E. T. Cowan et al., 2021; Düzel et al.,
2010; Kamiński et al., 2018; Murty & Adcock, 2014; Wittmann et al., 2007). Therefore, salient
events are suitable for becoming subgoals. Research on event segmentation that focuses on
how ongoing and continuous experience is chunked into discrete events (Clewett et al., 2019;
Shin & DuBrow, 2021; Zacks, 2020a) has shown that salient changes, termed “event
boundaries,” cause humans to segment their experiences, both during perception and as they
are stored in memory. For example, events that span a boundary are remembered as
happening further apart in time from each other, and memory of their temporal order is often
impaired compared to that of events not separated by a boundary (Clewett et al., 2020;
DuBrow & Davachi, 2013; Ezzyat & Davachi, 2014). This suggests that event boundaries, like
subgoals, structure our experiences into discrete, segmented units. Interestingly, this role of
structuring of memories has been shown for reward prediction errors as well (Rouhani et al.,
2020), consistent with the notion that salient prediction errors can act as subgoals.
Of note, a mechanism that relies on salient changes to create subgoals does not require
repetition (i.e., statistical learning). A change of context, perceptual details, or internal state,
can trigger segmentation in the first instance (Clewett et al., 2019; DuBrow et al., 2017; DuBrow
& Davachi, 2013; Shin & DuBrow, 2021). This discrete event representation can form a base
that future instances will join to build an event schema. This proposal resonates with recent
behavioral work in humans suggesting that schema memories can be created rapidly (Tompary
et al., 2020; see Box 1 for the importance of initial learning). Such rapid extraction of structure
can facilitate goal-oriented learning and behavior in new situations (A. G. E. Collins & Frank,
2013, 2016; Eckstein & Collins, 2020), with additional learning following more experience
refining the initial structure to construct the full event schema (Davachi & DuBrow, 2015).
Latent-cause inference might be the computation by which salient changes trigger
the instantiation of an existing schema or subschema, or the initiation of a new (sub)schema.
Latent-cause inference is a computational theory of how observations in the world are grouped
according to similarity, with each group representing a set of observations that are presumed
to stem from similar sources, namely, from a single “latent cause” (Franklin et al., 2020;
Gershman & Niv, 2010; Niv, 2019). According to this framework, the current latent cause can be
inferred by combining prior knowledge of the probability of various latent causes with evidence
from the current external environment using Bayesian inference. When current external
evidence is different (above some threshold) from all past knowledge, a new latent cause is
created (Gershman, 2015; Gershman et al., 2017). For example, upon entering a restaurant,
the myriad of perceptual cues (tables set up for dining, customers enjoying their meal, etc.)
may lead to the inference that the current relevant schema is that of a restaurant. Here,
external cues are sufficient to infer the correct cause. However, when going to a ballet
performance for the first time, it might be adaptive to initiate a new latent cause because the
sequence of unfolding events and the relevant behaviors might differ from other familiar
events, for which we have an existing schema. Inferring a new latent cause would support
learning a new state-transition model and submodels and their respective policies. Recent
theoretical work has begun to explore how salient changes such as event boundaries might
trigger the inference of a new latent cause (Franklin et al., 2020; Shin & DuBrow, 2021) and how
that invokes the instantiation of a relevant event schema (Franklin et al., 2020). In sum, we
offer that latent-cause inference can be used to instantiate an existing schema, or to initiate a
new schema (or subschema). The content of this schema is the model and policy, where the
model can include submodels terminating in subgoals that are used to train optimal subpolicies
through RL mechanisms.
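A deliberately simplified, threshold-based sketch of this inference follows. A full account would combine priors over causes with current evidence via Bayesian inference (e.g., under a Chinese-restaurant-process prior); the feature sets, similarity measure, and threshold here are illustrative assumptions:

```python
def similarity(observation, cause_features):
    """Jaccard overlap between an observation and a cause's features."""
    return len(observation & cause_features) / len(observation | cause_features)

def infer_cause(observation, causes, threshold=0.3):
    """Assign the observation to the most similar known latent cause, or
    create a new cause when no existing one is similar enough."""
    if causes:
        best = max(causes, key=lambda c: similarity(observation, causes[c]))
        if similarity(observation, causes[best]) >= threshold:
            causes[best] |= observation    # refine the existing schema
            return best
    new_cause = f"cause_{len(causes)}"     # evidence too different: new schema
    causes[new_cause] = set(observation)
    return new_cause

causes = {}
c1 = infer_cause({"tables", "menus", "servers"}, causes)    # first restaurant
c2 = infer_cause({"tables", "menus"}, causes)               # another restaurant
c3 = infer_cause({"stage", "dancers", "audience"}, causes)  # first ballet
```

The second restaurant visit is absorbed into the existing cause (instantiating the schema), while the first ballet performance, sharing no features with past causes, initiates a new one.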
2020) in a single lab session.
The hippocampus, known to be involved in rapid learning (Eichenbaum, 2004;
McClelland, McNaughton, & O’Reilly, 1995; Moscovitch et al., 2016; Norman & O’Reilly, 2003),
and event segmentation (Ben-Yakov & Dudai, 2011; Ben-Yakov & Henson, 2018; Clewett et al.,
2019), also mediates learning the structure of the environment (Farzanfar et al., 2022; Kumaran
et al., 2009; McKenzie et al., 2014; Schapiro et al., 2014, 2016; Shohamy & Turk-Browne, 2013;
Theves et al., 2019, 2021). Thus, converging behavioral and neural evidence suggests that rapid
initial learning largely shapes our schemas. Another, not mutually exclusive, theoretical
proposal for the role of the hippocampus in learning structure attributes relatively slow
learning to one hippocampal pathway (entorhinal cortex-subregion CA1) that might underlie the
extraction of regularities and learning structure, while rapid learning is mediated by another
hippocampal pathway (entorhinal cortex-subregions CA3-dentate gyrus-CA1; Schapiro et al.,
2017). The proximity of these hippocampal pathways, and the convergence of both in the
same hippocampal subregion (CA1), makes the interaction between the pathways a candidate
mechanism by which initial learning can quickly sketch a schema, which in turn influences how
further slow learning refines that schema.
----- end of Box 1-----
are down-weighted?
In RL, an optimal representation of a state does not include all the dimensions of the
environment, but rather only goal-relevant information (Langdon et al., 2019; Schuck et al.,
2018; Sutton & Barto, 2018). The process by which an agent learns what dimensions of the
environment are important to a given task has been termed “representation learning” (Niv,
2019), and often involves dimensionality reduction. Through experience, we learn what
dimensions of our environment are relevant to our goals and therefore should be attended to,
and what dimensions are irrelevant and thus can be ignored. For example, when choosing a
meal at a restaurant we should attend to the prices of dishes, but we can ignore the color of
tablecloths. Research has shown that learning the relevant (i.e., reward-predicting) dimensions
of a state guides attention to these dimensions, which in turn prioritizes learning predictions
associated with these dimensions (Daniel et al., 2020; Farashahi et al., 2017; Leong, Radulescu,
Daniel, Dewoskin, et al., 2017; Niv, 2019; Niv et al., 2015; Radulescu et al., 2016, 2019). These
studies suggest that goal relevance and selective attention might mediate dimensionality
reduction during schema learning.
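A minimal sketch of such representation learning: value is predicted as a weighted sum over stimulus dimensions, and error-driven updates concentrate weight on the goal-relevant (reward-predicting) dimension, effectively reducing dimensionality. The dimensions, reward structure, and parameters are illustrative assumptions:

```python
def predict(weights, features):
    """Value prediction as a weighted sum over stimulus dimensions."""
    return sum(weights[d] * features[d] for d in weights)

def learn(weights, features, reward, alpha=0.1):
    """Error-driven update: only dimensions present on this trial are
    credited, so reliably predictive dimensions accumulate weight."""
    error = reward - predict(weights, features)
    for d in weights:
        weights[d] += alpha * error * features[d]

weights = {"price": 0.0, "tablecloth_color": 0.0}

# Reward tracks the goal-relevant dimension (price) only; tablecloth color
# varies independently and predicts nothing.
for t in range(500):
    features = {"price": float(t % 2), "tablecloth_color": float((t // 2) % 2)}
    learn(weights, features, reward=features["price"])
```

Weight, and with it attention, concentrates on the price dimension, while the weight on the irrelevant dimension decays toward zero: the learned state representation has effectively dropped a dimension.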
However, repetition of features might prioritize learning even in goal-irrelevant
dimensions. Including goal-irrelevant but regular information in our schema representations
might be adaptive because it allows flexible behavior when the world changes (Ghosh & Gilboa,
2014; Tolman, 1948). For example, in restaurants, the cashier is typically close to the bar.
Although this is mostly goal irrelevant because we typically pay the server, it might become useful if
ever we are asked to pay at the cashier. Vast literature shows that indeed, we learn regularities,
namely, features of the environment that are statistically reliable, even if goal irrelevant (for
reviews, see: Frost et al., 2015; Schapiro & Turk-Browne, 2015; Sherman et al., 2020). The question
we focus on here is whether and how these dimensions are prioritized during schema learning.
Studies suggest that attention is drawn towards repeated but goal-irrelevant
information, providing initial evidence for prioritization during schema learning. For example,
participants are faster to identify a target stimulus if it appears in a location where, on other
trials, regularities existed in a stream of symbols (Zhao et al., 2013). Similarly, items that are
semantically congruent and encountered repeatedly in daily life (e.g., restaurant-menu)
typically show enhanced processing (reduced reaction times and increased accuracy) compared
to incongruent pairs that are rarely encountered (e.g., spinach-train; Bein et al., 2015; Gronau,
2021; van Kesteren et al., 2013). This is true even if congruency is task irrelevant (Bein et al.,
2015) and pairs are presented only briefly (Gronau, 2021; Gronau et al., 2008). Task-irrelevant
congruency also enhances long-term memory (Gronau & Shachar, 2015). Other studies showed
that the prediction of a feature in a goal-irrelevant but repeating dimension (i.e., prediction of the
category of the next item in a stream of items, estimated using classification of multivoxel BOLD
activity patterns) comes at the expense of later memory of the unique details (the specific item
image; G. Kim et al., 2014; H. Kim et al., 2020; Sherman & Turk-Browne, 2020). Together, these
findings suggest that attentional mechanisms might prioritize the learning of repeating but
goal-irrelevant information, potentially while down weighting episodic details. Future research
could specify how this unfolds during schema learning, and how mechanisms underlying
dimensionality reduction might differ based on goal relevancy.
Relationships that may be important in schemas are not symmetric: menus are found in restaurants, but
restaurants are not found in menus. Thus, hierarchical information cannot be mapped to a
distance measure. Interestingly, some frameworks propose that even spatial navigation might
rely on strategies or computations that are not based solely on distance (e.g., Brunec et al., 2018; Ekstrom et al., 2020; Farzanfar et al., 2022; Peer et al., 2021; Warren, 2019).
Orbito-medial PFC involvement in schemas and states: dimensionality reduction and the
guidance of memory reactivation in posterior brain regions.
There is a wide agreement that the orbitofrontal (OFC) and medial prefrontal cortex (mPFC) are
involved in both RL and schema-related processes. However, the functions these regions play
are a topic of intense debate (Delgado et al., 2016; Frankland & Bontempi, 2005; Gilboa &
Marlatte, 2017; Klein-Flügge et al., 2022; Padoa-Schioppa & Conen, 2017; Stalnaker et al., 2015;
Varga et al., 2022; Wilson et al., 2014). In this part, we focus on the medial part of the OFC and
ventromedial PFC (mOFC/vmPFC) and the mid-mPFC (the area dorsal to the mOFC/vmPFC on
the medial wall, but ventral to the most dorsal part of the mPFC, which is more traditionally
thought of as dorsal mPFC), using “mPFC” to collectively refer to these areas. We first
summarize findings that suggest that these regions represent both states in RL and schemas.
Then we offer that dimensionality reduction in the mPFC might underlie these representations
(Varga et al., 2022). We go on to propose that these low-dimensional representations in the
mPFC guide memory reactivation in more posterior brain regions. Motivated by the anatomical
heterogeneity of the mPFC and its varied connectivity patterns (Cavada et al., 2000; Kahnt et
al., 2012; Mackey & Petrides, 2010; Öngür et al., 2003; Price, 2007), we postulate that the
gradual involvement of subparts along the ventral-dorsal and anterior-posterior axes of the mPFC might be driven by the amount of dimensionality reduction, that is, by levels of specificity vs. abstraction of schemas and states.
A prominent theory suggests that the mPFC and OFC represent a map of task states, and
more specifically, the current state of the task at any point in time (Boorman et al., 2021; Klein-
Flügge et al., 2022; Vaidya & Badre, 2022; Wilson et al., 2014; Zhou, Gardner, et al., 2021).
Recent empirical studies indeed show that a representation of states and the relationships
between them can be found in the mOFC/vmPFC as participants navigate a conceptual space
(Constantinescu et al., 2016) and sequential structures (Baram et al., 2021; Klein-Flügge et al.,
2019; Schapiro et al., 2013), as well as in the absence of active navigation (Viganò & Piazza,
2020). Some theories specify that the mOFC/vmPFC is particularly needed when states cannot
be determined based on perceptual input alone but are latent (like latent causes above) and
require the retrieval of information from memory (Boorman et al., 2021; Wilson et al., 2014).
Empirically, mOFC/vmPFC multivoxel activation patterns were found to be consistent with
Bayesian inference about the current state that requires integrating retrieved prior memories
and current observations (Chan et al., 2016). Another study successfully classified from
mOFC/vmPFC activity patterns states that included information from the current and the
previous trial, and thus relied on memory (Schuck et al., 2016).
Additional results suggest that the mOFC/vmPFC represents not only each state but also the relationships between states. In the study by Schuck et al. (2016), the similarity of
mOFC/vmPFC multivoxel activity patterns was higher for states that were more similar in their
cognitive function. Similar results were obtained in the social domain (Park et al., 2020; see Mızrak et al., 2021, for findings in the lateral OFC). The mid-mPFC was also found to be
sensitive to the transition probability and physical distance between states (Klein-Flügge et al.,
2019) and to mediate the retrieval and recombination of memories needed to make choices
about novel stimuli (Barron, Dolan, & Behrens, 2013). Together, these studies are consistent
with the hypothesis that the mOFC/vmPFC and the mid-mPFC represent a map of states, in
particular when information needs to be retrieved from memory.
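The pattern-similarity logic behind such findings can be sketched with simulated data (all numbers are hypothetical; two of four "states" share a cognitive function, modeled as a shared pattern component):

```python
import numpy as np

rng = np.random.default_rng(1)
n_voxels, n_trials = 50, 20

# four task states; states 0 and 1 share a cognitive function (common component)
shared = rng.standard_normal(n_voxels)
state_means = {
    0: shared + 0.7 * rng.standard_normal(n_voxels),
    1: shared + 0.7 * rng.standard_normal(n_voxels),
    2: rng.standard_normal(n_voxels),
    3: rng.standard_normal(n_voxels),
}

# noisy single-trial "multivoxel patterns" around each state's mean pattern
data = {s: m + 0.5 * rng.standard_normal((n_trials, n_voxels))
        for s, m in state_means.items()}

def mean_similarity(a, b):
    """Average Pearson correlation between all trial-pattern pairs."""
    return np.mean([np.corrcoef(x, y)[0, 1] for x in a for y in b])

same_function = mean_similarity(data[0], data[1])
diff_function = mean_similarity(data[0], data[2])
print(same_function, diff_function)  # same-function similarity should be higher
```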
As proposed above, schemas include hierarchically organized states. Further, schemas
hold knowledge of what typically occurs in an event, and therefore require retrieving
information from memory. Consistent with the view that the mPFC represents latent states that
rely on memory, growing evidence demonstrates the involvement of the mPFC in schemas
(Gilboa & Marlatte, 2017; Varga et al., 2022). Lesions to the mOFC/vmPFC impair the
appropriate deployment of schema knowledge (Ghosh et al., 2014; Gilboa, 2010; Giuliano et al.,
2021; Spalding et al., 2015). Activation in the mPFC is modulated by whether information is
consistent or inconsistent with schematic knowledge in both humans and rodents (Alonso et al.,
2020; Amer et al., 2019; Bonasia et al., 2018; Brod & Shing, 2018; Preston & Eichenbaum, 2013;
Reggev et al., 2016; Tse et al., 2007, 2011; van Kesteren et al., 2013; van Kesteren, Rijpkema, et
al., 2010; Wang et al., 2012; Yacoby et al., 2021). Moreover, a recent study showed that mid-
mPFC activation patterns were more similar for events that belonged to the same schema
compared to different schemas (i.e., different examples of visiting a restaurant versus an
airport), even when similarity was computed across visual movies and audio recordings
(Baldassano, Hasson, & Norman, 2018). These results suggest a schematic representation in the
mid-mPFC that goes beyond perceptual features. Another study found similar results and added
that the similarity of neural representations of a movie to itself (across repetitions) was
comparable to its similarity to other movies belonging to the same schema in the mPFC, as if
the mPFC predominantly encodes schematic information, and does not differentiate specific
instances of schemas (Reagh & Ranganath, 2023; though see Masís-Obando et al., 2022, who found both schematic and specific movie representations in the mPFC; more on this below). Together
with the finding above that mPFC representations follow Bayesian inference (Chan et al., 2016),
these findings strengthen the proposal that schemas are instantiated via Bayesian latent cause
inference (Varga et al., 2022).
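A toy version of such latent-state inference, with made-up likelihoods (the schemas, features, and probabilities below are purely illustrative), combines a prior retrieved from memory with the likelihood of current observations:

```python
import numpy as np

states = ["restaurant", "cafe"]           # candidate latent states / schemas
prior = np.array([0.5, 0.5])              # prior retrieved from memory

# P(feature observed | state): hypothetical values
likelihood = {
    "menu":    np.array([0.9, 0.6]),
    "waiter":  np.array([0.8, 0.2]),
    "counter": np.array([0.2, 0.9]),
}

def infer(observations, prior):
    """Posterior over latent states given observed features (naive Bayes)."""
    post = prior.astype(float).copy()
    for obs in observations:
        post *= likelihood[obs]          # multiply in each observation's likelihood
        post /= post.sum()               # renormalize to a probability distribution
    return post

post = infer(["menu", "waiter"], prior)
print(dict(zip(states, post)))           # "restaurant" dominates
```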
The mPFC might represent schemas and states through dimensionality reduction
Learning and representing schemas and states involve dimensionality reduction: the creation of
a simplified representation of information (Fig. 2). In an RL task, mid-mPFC activation correlates
with predicted rewards as computed based on attending to one relevant task dimension out of three available (Leong, Radulescu, Daniel, DeWoskin, et al., 2017; Fig. 2a). Outside of RL, the mOFC/vmPFC also represents episodes with similar attentional goals similarly (Günseli & Aly, 2020). Additionally, the mPFC is involved in categorization, especially when it relies on extracting rules from multiple instances or developing a summary representation (e.g., a prototype; Bowman et al., 2020; Bowman & Zeithamova, 2018; Goldstone et al., 2012; Kumaran et al., 2009; Nosofsky, 1986; Zeithamova et al., 2019). A recent fMRI study used a
categorization task and principal components analysis (PCA) to extract the number of
orthogonal components that account for variance in mOFC/vmPFC multivoxel activity patterns
(Mack et al., 2020; Fig. 2b). A lower number of components was interpreted as a simpler neural representation. Simpler categorization rules corresponded to fewer components needed to explain mOFC/vmPFC activity patterns, consistent with dimensionality reduction (see Zhou et al., 2020, for a similar finding in rodents, slightly lateral to the mOFC).
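The compression measure described here can be sketched as follows (a simplified illustration of the general idea, not the exact pipeline of Mack et al., 2020; the simulated data are invented):

```python
import numpy as np

rng = np.random.default_rng(2)

def pct_components(patterns, threshold=0.90):
    """% of principal components needed to explain `threshold` of the variance.

    patterns: (n_trials, n_voxels) activity patterns.
    A lower % means fewer effective dimensions, i.e., stronger compression."""
    x = patterns - patterns.mean(axis=0)           # center (PCA via SVD)
    s = np.linalg.svd(x, compute_uv=False)         # singular values
    var = s**2 / np.sum(s**2)                      # variance explained per component
    n_needed = np.searchsorted(np.cumsum(var), threshold) + 1
    return 100 * n_needed / len(var)

n_trials, n_voxels = 40, 60
# "simple" patterns vary along only 2 latent dimensions (plus a little noise)
latent = rng.standard_normal((n_trials, 2)) @ rng.standard_normal((2, n_voxels))
simple = latent + 0.1 * rng.standard_normal((n_trials, n_voxels))
# "complex" patterns have no low-dimensional structure
complex_ = rng.standard_normal((n_trials, n_voxels))

print(pct_components(simple), pct_components(complex_))  # simple needs far fewer
```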
As mentioned above, schemas involve eliminating episodic details to represent a typical
course of events, potentially through dimensionality reduction. Studies showing schematic
representations and a lack of episodic details in the mPFC are consistent with this notion
(Baldassano et al., 2018; Reagh & Ranganath, 2023; Fig. 2c). Additional work allows more
direct inference of dimensionality reduction. These studies exposed participants to item-scene
associations, with some items sharing the same scene (Audrain & McAndrews, 2020; Tompary
& Davachi, 2017). After consolidation, the neural representations of items that shared the same
scene, but not different scenes, showed stronger similarity to each other in the mid-mPFC.
These results suggest that specific episodes (each item-scene pair) were grouped based on a
shared feature (the scene), while the details of each episode were reduced, or eliminated, in
mid-mPFC representations (Fig. 2d; see Richards et al., 2014, for similar findings in rodents).
Consistent with our proposal that dimensionality reduction prioritizes goal-relevant as
well as repeating but goal-irrelevant dimensions, evidence suggests that the mPFC represents
both types of information (Boorman et al., 2021; Moneta et al., 2023; Rushworth et al., 2011;
Wilson et al., 2014; Zhou, Gardner, et al., 2021). Regarding goal-relevant dimensions, a wealth
of research has established that mPFC activity correlates with subjective preferences used for
goal-oriented behavior, namely, choices aimed toward maximizing reward (Padoa-Schioppa &
Conen, 2017; Rushworth et al., 2011). The studies above that support a broader role of the
mPFC in representing cognitive maps tested mPFC representations in the service of solving a
task, and therefore are goal-oriented (Schuck et al., 2016; Wilson et al., 2014). Studies that
found differential activity and connectivity of the mPFC during encoding of semantically
congruent (vs. incongruent) information required participants to indicate congruency, thus
making semantic knowledge goal relevant (Bein et al., 2014; van Kesteren et al., 2013). This is
also the case in mOFC/vmPFC involvement in object recognition tasks that require access to
semantic information (Bar et al., 2006). During movie watching, where the mPFC has been
shown to represent schemas (Baldassano et al., 2018; Reagh & Ranganath, 2023; van Kesteren,
Fernández, et al., 2010), it can be argued that the goal is to understand and interpret the
movie, and toward that goal, retrieval of prior semantic knowledge is advantageous. Thus, goal-
relevant schematic and state representations are represented in the mPFC.
Regarding goal-irrelevant information, the mOFC/vmPFC represents the values of
outcomes even when these values are task-irrelevant (Lebreton et al., 2009; Moneta et al.,
2023). The mOFC/vmPFC also encoded a map of a two-dimensional well-learned social
hierarchy, even in a task that asked participants to make inferences only based on one
dimension (Park et al., 2020). Likewise, mid-mPFC representations tracked the structure of a
task even when it was goal-irrelevant (Klein-Flügge et al., 2019; for similar findings in rodents, see Zhou et al., 2020; Zhou, Gardner, et al., 2019). The mPFC also seems to be engaged in
schema processing even when participants are involved in an unrelated task: mid-mPFC BOLD
signal is higher for encoding semantically congruent vs. incongruent information when
participants indicate the grammatical correctness of word stimuli, regardless of semantics
(Reggev et al., 2016; Yacoby et al., 2021, 2023). In line with dimensionality reduction, similar
neural representations of items that share a dimension (e.g., the context of learning) are found
in the mOFC/vmPFC when participants perform an unrelated task (E. Cowan et al., 2020;
Schlichting et al., 2015).
Temporal order seems to be an important dimension in mPFC schematic representations. Sequentiality – akin to the transition probabilities between states – is an essential part of the world model in model-based RL. Interestingly, scrambling the order of
events in a schema disrupts mPFC representation, evidence that mPFC schematic
representations retain sequential order (Baldassano et al., 2018). In rodents, lesions to the
mPFC impair temporal memory (Chiba et al., 1997). In humans, mPFC lesions specifically impair
schema knowledge, while sparing category knowledge (Giuliano et al., 2021). Arguably, the
temporal order of events is a critical aspect of schemas but not of categories. The
representation of sequential order might be supported by strong anatomical connections to the
hippocampus (Barbas & Blatt, 1995; Cavada et al., 2000; Öngür & Price, 2000; Price, 2007),
which represents temporal and sequential information (Buzsáki & Tingley, 2018; Davachi &
DuBrow, 2015; Eichenbaum, 2014; Kesner & Hunsaker, 2010; Ranganath & Hsieh, 2016).
Consistent with that, mPFC-hippocampal functional connectivity supports learning and memory
of sequential information (DuBrow & Davachi, 2016; Schapiro et al., 2016).
Figure 2. Dimensionality reduction in the mPFC. a. Participants learned that one of three category-dimensions
is relevant for obtaining reward. A model that biased attention towards that category best explained behavior,
suggesting a dimensionality-reduced representation of the task. Activity in the mPFC correlated with values as estimated by that dimensionality-reduced representation (Leong et al., 2017). b. Participants categorized bugs
based on 1/2/3 dimensions. PCA was used to extract the dimensions of vmPFC multivoxel activity patterns,
and dimensionality reduction (‘compression’) was quantified as the % of PCA components that explained 90%
of the variance in vmPFC activity patterns, with fewer components indicating stronger compression. As participants learned the categories, stronger compression was observed in the vmPFC for simpler categorization rules, suggesting that dimensionality reduction in vmPFC tracked the dimensions of the categories
(Mack et al., 2020). c. Participants encoded associations between objects and overlapping scenes. During
retrieval, greater neural similarity was observed in the mPFC between objects that appeared with the same
scene compared to different (nonoverlapping) scenes during encoding. This greater similarity potentially
reflects loss of distinct details, i.e., fewer dimensions in the representation of these objects, based on scene
overlap. This study also showed that similarity only emerged following a period of consolidation (remote
memories; Tompary and Davachi, 2017). d. Participants watched movie clips showing different instances of
schemas. MPFC representations generalized across instances (e.g., all café clips) of schemas, as indicated by
instances being relatively similar within the same schema, but different from another schema (top right; Reagh & Ranganath, 2023). This indicates reduced dimensionality, as specific details of each instance were not represented in the
mPFC. Similar representations were also found across visual vs. auditory modalities, but not when the order of
events was compromised (left side; Baldassano et al., 2018). This suggests that the modality dimension is
reduced in mPFC schematic representations, while sequential information is preserved.
MPFC schematic representations guide memory reactivation in posterior brain regions.
One potential role of schema and state representations in the mPFC is to guide the retrieval of
task-relevant knowledge via memory reactivation in posterior brain regions including the
hippocampus (Varga et al., 2022). Using electrophysiology in rodents, Place et al. (2016) found that
mPFC activity preceded hippocampal activity during memory retrieval. Correspondingly, in
humans, magnetoencephalography (MEG) studies found that mOFC/vmPFC activity precedes
and correlates with hippocampal activity during retrieval (McCormick et al., 2020). Similarly,
during object recognition, activity in the mOFC/vmPFC correlates with and precedes that in
ventral-temporal regions involved in object recognition (Bar, 2009; Bar et al., 2006). Mid-mPFC
activity also correlated with the persistence (across time) of ventral-temporal and hippocampal
representations of items experienced within the same context (Ezzyat & Davachi, 2021). Lesion
studies demonstrate causality: in a reversal-learning task that required resolving interference to
infer the correct state, mPFC lesions in rodents impaired hippocampal representations that
mediated interference resolution (Guise & Shapiro, 2017). In humans, neurotypical participants
demonstrate an increase in an N170 EEG signal for familiar versus unfamiliar faces. This
increase, which is localized to posterior brain regions and indicates knowledge about familiar
faces, is diminished in patients with mOFC/vmPFC lesions (Gilboa & Moscovitch, 2017).
MOFC/vmPFC lesions further impair the evaluation of retrieved memories (Hebscher et al.,
2016; Hebscher & Gilboa, 2016), which can result in confabulation – retrieval of memories that
are irrelevant to a specific context (Gilboa, 2010; Gilboa et al., 2009).
Studies that specifically target schema instantiation converge on the same idea (Gilboa
& Marlatte, 2017). Consistency versus inconsistency with prior knowledge modulates mPFC
functional connectivity towards posterior cortical regions versus the hippocampus, respectively
(Alonso et al., 2020; Bein et al., 2014; Bonasia et al., 2018; Preston & Eichenbaum, 2013; van
Kesteren et al., 2012, 2013; van Kesteren, Fernández, et al., 2010; van Kesteren, Rijpkema, et
al., 2010). These differential interactions support memory encoding and retrieval, suggesting
that the mPFC routes the involvement of cortical vs. hippocampal memory systems based on
whether information encoded or retrieved is consistent with schematic knowledge or not (van
Kesteren et al., 2012). More directly, a recent study showed that the extent of schematic
representation in the mPFC correlated with the strength of the representation of specific
instances of that schema (i.e., a specific movie belonging to that schema) in a posterior medial
cortical region (Masís-Obando et al., 2022).
Dimensionality-reduced representations and guided memory retrieval could be used for
predictions and inference because both functions rely on the reactivation of learned
representations. Indeed, the mPFC is involved in both (Boorman et al., 2021; DeVito et al.,
2010; Preston & Eichenbaum, 2013; Rushworth et al., 2011; Sherrill et al., 2023; Spalding et al.,
2018; Zhou, Gardner, et al., 2021; Zhou, Zong, et al., 2021). In the case of inference, the
retrieval of learned representations is needed to construct an accurate representation of the
current situation. Lesions to the mOFC/vmPFC impair both transitive inference (if A > B and B > C, then A > C) and associative inference (if A is associated with B and B is associated with C, then A is associated with C; DeVito et al., 2010; Preston & Eichenbaum, 2013; Spalding et al., 2018). Such inferences might be mediated by similar representations for the A and C items
in the mOFC/vmPFC (Morton et al., 2017; Schlichting et al., 2015), consistent with
dimensionality reduction. Regarding the prediction of future events, “expected value”, thought
to be represented in the mPFC, is by definition a predictive representation. Moreover, several
recent studies found neural evidence consistent with prediction of complex events in the mPFC.
For instance, with repeated exposure to a movie, mPFC activity patterns that represent
segments of the movie shifted back in time, as if predicting future events (Lee et al., 2021). A
navigation study showed that activity patterns in the mPFC became more similar across time
points when participants navigated toward a goal compared to following a GPS, suggesting
prediction of goal location (Brunec & Momennejad, 2022).
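The logic of such an anticipatory-shift analysis can be illustrated with simulated data (a simplified sketch, not the actual method of Lee et al., 2021): if repeat-viewing patterns anticipate upcoming events, their correlation with first-viewing patterns peaks at a negative lag.

```python
import numpy as np

rng = np.random.default_rng(3)
n_events, n_voxels = 20, 50
events = rng.standard_normal((n_events, n_voxels))   # true event patterns

# first viewing tracks the current event; the repeat shifts one event ahead
view1 = events + 0.3 * rng.standard_normal((n_events, n_voxels))
view2 = np.roll(events, -1, axis=0) + 0.3 * rng.standard_normal((n_events, n_voxels))

def mean_corr_at_lag(a, b, lag):
    """Correlation of a[t] with b[t + lag], averaged over valid time points."""
    ts = range(max(0, -lag), min(len(a), len(b) - lag))
    return np.mean([np.corrcoef(a[t], b[t + lag])[0, 1] for t in ts])

lags = list(range(-3, 4))
corrs = [mean_corr_at_lag(view1, view2, lag) for lag in lags]
best = lags[int(np.argmax(corrs))]
print(best)  # negative: repeat-viewing patterns run ahead of the events
```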
We conclude that the mPFC represents dimensionality-reduced knowledge of the
structure of the environment, including both goal-relevant and irrelevant but statistically reliable information – the hallmarks of schemas. This representation guides the reactivation of
learned representations in other brain regions and can support prediction and inference. We
postulate that guiding reactivation might be achieved through different populations of mPFC
neurons communicating with different populations of neurons in posterior brain regions, and
thus activating different representations in these regions. This suggestion is consistent with
ideas about the mPFC as a ‘hub’ linking together well-learned and consolidated memories
(Frankland & Bontempi, 2005; van Kesteren et al., 2012). Potentially for this reason, the mPFC is
critical for tasks involving partially observed states, where memory reactivation is needed, but
not for fully observed states (Bradfield et al., 2015; Wilson et al., 2014).
2 The medial-to-lateral axis in the OFC, with some focus on the lateral OFC, has been discussed elsewhere, suggesting that while the vmPFC/mOFC is involved in latent states that require memory retrieval, the lateral OFC is involved in representing states based on observable information (Rushworth et al., 2011; Stalnaker et al., 2015; Wilson et al., 2014).
Anterior parts of the mPFC have been found to represent future states and actions, but not current ones (Boorman et al., 2009, 2011; Haynes et al., 2007;
Momennejad & Haynes, 2012, 2013). Further, studies on prospective planning and predictions
found a gradient of predictions in mPFC, whereby predictions of the near future are
represented more posteriorly and predictions of the far future are represented more anteriorly
(Brunec & Momennejad, 2022; Lee et al., 2021). Potentially, the farther we prospect to the
future, the more abstract and less concrete and detailed are our thoughts (Liberman et al.,
2002; Trope & Liberman, 2003, 2010), and therefore they are represented more anteriorly. This
might be true also for counterfactual thoughts compared to actions and events that have
materialized. Additionally, judging values on a relative scale engages a posterior cluster of
voxels, compared to making binary decisions which engages a more anterior cluster
(Grabenhorst et al., 2008). A relative judgment might require attending to more details, as
compared to when making a simpler binary decision. The specific identity of an expected
reward has also been found more posteriorly, at least in humans (Howard & Kahnt, 2017,
2021). Finally, studies reporting dimensionality reduction in simple and abstract tasks in the
mOFC/vmPFC find a more anterior cluster of voxels (Leong, Radulescu, Daniel, DeWoskin, et al.,
2017; Mack et al., 2020) compared to studies addressing retrieval of autobiographical memories
of specific events (Takashima et al., 2006).
In terms of connectivity, the more posterior (and ventral) part of the mPFC is connected
to the hippocampus (Barbas & Blatt, 1995; Cavada et al., 2000; Öngür & Price, 2000; Price,
2007). The hippocampus is critical for the retrieval of detailed memories that support fine
discrimination, potentially through pattern separation, namely, the allocation of distinct
representations to similar stimuli (Baker et al., 2016; Bakker et al., 2008; Berron et al., 2016;
Leutgeb et al., 2007; Yassa & Stark, 2011). Recent studies have shown that the hippocampus
supports fine discrimination of states (Ballard et al., 2019; Jiang et al., 2020; Zhou, Montesinos-
Cartagena, et al., 2019). Thus, the posterior mOFC/vmPFC might guide the retrieval of detailed
state representations through interaction with the hippocampus.
Specificity versus abstraction might also underlie graded involvement from ventral to
dorsal mPFC, supported by changes in functional connectivity. For instance, identity-specific
expected value representations have been found ventral to general (scalar) expected value
representations in the mPFC (Howard et al., 2015; McNamee et al., 2013). Further, while
retrieval of specific autobiographical memories tends to involve a ventral cluster of voxels
(Takashima et al., 2006), a study addressing rule learning that requires abstraction across
multiple episodes showed a mid-mPFC activation (Sweegers et al., 2014). While the
mOFC/vmPFC is connected to the hippocampus (important for detailed memories), the mid-
mPFC is functionally connected with the posterior medial cortex (Jackson et al., 2020; Kahnt et
al., 2012; Price, 2007). The posterior medial cortex has been shown to represent events over
large timescales, potentially abstracting away more specific details (Baldassano et al., 2017;
Chen et al., 2017; Hasson et al., 2015; Masís-Obando et al., 2022). Of note, the studies
mentioned here employed different learning protocols and stimuli. Nevertheless, they are in
line with our proposal that the extent of dimensionality reduction underlies differential
involvement of mPFC subregions, and the change in connectivity patterns along mPFC axes.
Abstraction, or schematization, is related to consolidation, namely, the transformation
of memories in the brain over time (Alvarez & Squire, 1994; Antony & Schapiro, 2019; Gilboa &
Moscovitch, 2021; Nadel et al., 2000). Consolidated memories tend to be more schematic and
involve fewer specific details (Gilboa & Moscovitch, 2021; Moscovitch et al., 2016; Robin &
Moscovitch, 2017). Thus, consolidation may also play a part in whether more ventral or dorsal
mPFC subregions are involved in event schemas and states. For example, studies examining the
integration of memory representations show a ventral cluster immediately after learning or after only one day of consolidation (E. Cowan et al., 2020; Schlichting et al., 2015). However, a more
dorsal cluster in the mid-mPFC is revealed after one week (Tompary & Davachi, 2017), or for
well-consolidated schemas (Baldassano et al., 2018). Given that the retrieval of consolidated
but specific autobiographical memories involves the mOFC/vmPFC, it is possible that more
consolidated memories involve more dorsal regions because these memories are more
abstract, but this should be directly tested.
Other proposals for mPFC function have been made, such as the evaluation of retrieved
memories (Hebscher & Gilboa, 2016), indicating confidence (Barron et al., 2015; Hebscher &
Gilboa, 2016), or signaling the match between prior schemas and perceptual information (van
Kesteren et al., 2012). In our view, recent studies that examined multivoxel activity patterns
better support state or schematic representations, because these studies show different
representations for memories that are retrieved with similar confidence levels or that have
similar levels of match to prior memories (Masís-Obando et al., 2022; Schuck et al., 2016;
Tompary & Davachi, 2017). Coding of different identities of equally valued stimuli is also
consistent with our proposal (McNamee et al., 2013; for reviews, see Howard & Kahnt, 2021; Zhou, Gardner, et al., 2021). An interesting possibility is that multiple codes exist in the mPFC:
different populations of neurons might represent different states and schemas, leading to the
multivariate effects reviewed here, whereas the overall level of activity in the neurons (total
firing rate or average BOLD signal) can signal value, match with prior memories or other
monitoring signals. Indeed, a recent rodent study found a multiplicity of codes more laterally in
the OFC (Zhou et al., 2020), and similar notions can potentially explain mPFC coding as well.
Conclusion
We outlined how RL, state representations, and event schemas might be related. We proposed
that schemas might be learned via RL-related mechanisms such as learning through prediction
errors, hierarchical RL, and dimensionality reduction. We then hypothesized that dimensionality
reduction might underlie the involvement of the mPFC in both schemas and reinforcement
learning and postulated that the extent of abstraction might determine the locus of
involvement along mPFC axes. Broadly, we hope to facilitate integration of the fields of learning
and memory (e.g., Biderman et al., 2020; Ergo et al., 2020; Hartley et al., 2021), to advance
our understanding of human cognition and how it is implemented in the brain.
References
Akam, T., Rodrigues-Vaz, I., Marcelo, I., Zhang, X., Pereira, M., Oliveira, R. F., Dayan, P., & Costa, R. M. (2021). The anterior cingulate cortex predicts future states to mediate model-based action selection. Neuron. https://doi.org/10.1016/j.neuron.2020.10.013
Alba, J. W., & Hasher, L. (1983). Is memory schematic? Psychological Bulletin, 93(2), 203–231.
https://doi.org/10.1037//0033-2909.93.2.203
Alonso, A., van der Meij, J., Tse, D., & Genzel, L. (2020). Naïve to expert: Considering the role of
239821282094868. https://doi.org/10.1177/2398212820948686
Alvarez, P., & Squire, L. R. (1994). Memory consolidation and the medial temporal lobe: A simple network model. Proceedings of the National Academy of Sciences of the United States of America, 91(15), 7041–7045.
Amer, T., Giovanello, K. S., Nichol, D. R., Hasher, L., & Grady, C. L. (2019). Neural Correlates of
Enhanced Memory for Meaningful Associations with Age. Cerebral Cortex, 29(11), 4568–
4579. https://doi.org/10.1093/cercor/bhy334
Anderson, J. R., & Reder, L. M. (1999). The fan effect: New results and new theories. Journal of Experimental Psychology: General, 128(2), 186–197. https://doi.org/10.1037/0096-3445.128.2.186
Antony, J. W., Hartshorne, T. H., Pomeroy, K., Gureckis, T. M., Hasson, U., McDougle, S. D., &
Naturalistic Sports Viewing. Neuron, 1–14.
https://doi.org/10.1016/j.neuron.2020.10.029
Antony, J. W., & Schapiro, A. C. (2019). Active and effective replay: Systems consolidation
https://doi.org/10.1038/s41583-019-0191-8
Audrain, S., & McAndrews, M. P. (2020). Schemas provide a scaffold for neocortical integration
https://doi.org/10.1101/2020.10.11.335166
Badre, D. (2008). Cognitive control, hierarchy, and the rostro-caudal organization of the frontal
Baker, S., Vieweg, P., Gao, F., Gilboa, A., Wolbers, T., Black, S. E., & Rosenbaum, R. S. (2016).
The Human Dentate Gyrus Plays a Necessary Role in Discriminating New Memories.
Bakker, A., Kirwan, C. B., Miller, M., & Stark, C. E. L. (2008). Pattern separation in the human
https://doi.org/10.1126/science.1152882
Balaguer, J., Spiers, H., Hassabis, D., & Summerfield, C. (2016). Neural Mechanisms of
https://doi.org/10.1016/j.neuron.2016.03.037
Baldassano, C., Chen, J., Zadbood, A., Pillow, J. W., Hasson, U., & Norman, K. A. (2017).
Baldassano, C., Hasson, U., & Norman, K. A. (2018). Representation of real-world event schemas
https://doi.org/10.1523/JNEUROSCI.0251-18.2018
Ballard, I. C., Wagner, A. D., & McClure, S. M. (2019). Hippocampal pattern separation supports
019-08998-1
Bar, M. (2009). The proactive brain: Memory for predictions. Philos. Trans. R. Soc. Lond. B Biol.
Bar, M., Kassam, K. S., Ghuman, A. S., Boshyan, J., Schmid, A. M., Dale, A. M., Hamalainen, M. S.,
Marinkovic, K., Schacter, D. L., Rosen, B. R., & Halgren, E. (2006). Top-down facilitation
https://doi.org/10.1073/pnas.0507062103
Baram, A. B., Muller, T. H., Nili, H., Garvert, M. M., & Behrens, T. E. J. (2021). Entorhinal and
https://doi.org/10.1016/j.neuron.2020.11.024
Barbas, H., & Blatt, G. J. (1995). Topographically specific hippocampal projections target
functionally distinct prefrontal areas in the rhesus monkey. Hippocampus, 5(6), 511–
533. https://doi.org/10.1002/hipo.450050604
Barron, H. C., Garvert, M. M., & Behrens, T. E. J. (2015). Reassessing VMPFC: full of confidence?
Bartlett, F. C. (1932). Remembering: A study in experimental and social psychology. Cambridge
University Press.
Barto, A. G., & Mahadevan, S. (2003). Recent Advances in Hierarchical Reinforcement Learning.
Bartolo, R., & Averbeck, B. B. (2020). Prefrontal Cortex Predicts State Switches during Reversal
Behrens, T. E. J., Muller, T. H., Whittington, J. C. R., Mark, S., Baram, A. B., Stachenfeld, K. L., &
Bein, O., Livneh, N., Reggev, N., Gilead, M., Goshen-Gottstein, Y., & Maril, A. (2015). Delineating
the effect of semantic congruency on episodic memory: The role of integration and
Bein, O., Plotkin, N. A., & Davachi, L. (2021). Mnemonic prediction errors promote detailed
Bein, O., Reggev, N., & Maril, A. (2014). Prior knowledge influences on hippocampus and medial
https://doi.org/10.1016/j.neuropsychologia.2014.09.046
Bein, O., Trzewik, M., & Maril, A. (2019). The role of prior knowledge in incremental associative
https://doi.org/10.1016/j.jml.2019.03.006
Bellana, B., Mansour, R., Ladyka-Wojcik, N., Grady, C. L., & Moscovitch, M. (2021). The influence
of prior knowledge on the formation of detailed and durable memories. J. Mem. Lang.,
Bellmund, J. L. S., Gärdenfors, P., Moser, E. I., & Doeller, C. F. (2018). Navigating cognition:
https://doi.org/10.1126/science.aat6766
Ben-Yakov, A., & Dudai, Y. (2011). Constructing realistic engrams: Poststimulus activity of
Ben-Yakov, A., & Henson, R. N. (2018). The Hippocampal Film Editor: Sensitivity and Specificity
https://doi.org/10.1523/JNEUROSCI.0524-18.2018
Berron, D., Schütze, H., Maass, A., Cardenas-Blanco, A., Kuijf, H. J., Kumaran, D., & Düzel, E.
(2016). Strong Evidence for Pattern Separation in Human Dentate Gyrus. J. Neurosci.,
Biderman, N., Bakkour, A., & Shohamy, D. (2020). What Are Memories For? The Hippocampus
Bridges Past Experience with Future Decisions. Trends in Cognitive Sciences, 24(7), 542–
556. https://doi.org/10.1016/j.tics.2020.04.004
Bonasia, K., Sekeres, M. J., Gilboa, A., Grady, C. L., Winocur, G., & Moscovitch, M. (2018). Prior
events at short and long delays. Neurobiol. Learn. Mem., 153(Pt A), 26–39.
https://doi.org/10.1016/j.nlm.2018.02.017
Boorman, E. D., Behrens, T. E. J., Woolrich, M. W., & Rushworth, M. F. S. (2009). How green is
the grass on the other side? Frontopolar cortex and the evidence in favor of alternative
https://doi.org/10.1016/j.neuron.2009.05.014
Boorman, E. D., Behrens, T. E., & Rushworth, M. F. (2011). Counterfactual Choice and Learning
in a Neural Network Centered on Human Lateral Frontopolar Cortex. PLoS Biology, 9(6),
e1001093. https://doi.org/10.1371/journal.pbio.1001093
Boorman, E. D., Witkowski, P. P., Zhang, Y., & Park, S. A. (2021). The orbital frontal cortex, task
https://doi.org/10.1037/bne0000465
Botvinick, M. M. (2008). Hierarchical models of behavior and prefrontal function. Trends Cogn.
Botvinick, M. M. (2012). Hierarchical reinforcement learning and decision making. Curr. Opin.
Botvinick, M. M., Niv, Y., & Barto, A. G. (2009). Hierarchically organized behavior and its neural
https://doi.org/10.1016/j.cognition.2008.08.011
Bowman, C. R., Iwashita, T., & Zeithamova, D. (2020). Tracking prototype and exemplar
https://doi.org/10.7554/eLife.59360
Bowman, C. R., & Zeithamova, D. (2018). Abstract Memory Representations in the
Bradfield, L. A., Dezfouli, A., van Holstein, M., Chieng, B., & Balleine, B. W. (2015). Medial
Brod, G., Breitwieser, J., Hasselhorn, M., & Bunge, S. A. (2020). Being proven wrong elicits
learning in children—But only in those with higher executive function skills. Dev. Sci.,
Brod, G., Hasselhorn, M., & Bunge, S. A. (2018). When generating a prediction boosts learning:
https://doi.org/10.1016/j.learninstruc.2018.01.013
Brod, G., & Shing, Y. L. (2018). Specifying the role of the ventromedial prefrontal cortex in
https://doi.org/10.1016/j.neuropsychologia.2018.01.005
https://doi.org/10.1523/JNEUROSCI.1327-21.2021
Brunec, I. K., Moscovitch, M., & Barense, M. D. (2018). Boundaries Shape Cognitive
https://doi.org/10.1016/j.tics.2018.03.013
Bunzeck, N., Dayan, P., Dolan, R. J., & Duzel, E. (2010). A common mechanism for adaptive
https://doi.org/10.1002/hbm.20939
Bunzeck, N., & Düzel, E. (2006). Absolute coding of stimulus novelty in the human substantia
Buzsáki, G., & Tingley, D. (2018). Space and Time: The Hippocampus as a Sequence Generator.
Cavada, C., Compañy, T., Tejedor, J., Cruz-Rizzolo, R. J., & Reinoso-Suárez, F. (2000). The
Chan, S. C. Y., Niv, Y., & Norman, K. A. (2016). A Probability Distribution over Latent Causes, in
https://doi.org/10.1523/JNEUROSCI.0659-16.2016
Chen, J., Leong, Y. C., Honey, C. J., Yong, C. H., Norman, K. A., & Hasson, U. (2017). Shared
memories reveal shared structure in neural activity across individuals. Nat. Neurosci.,
Chiba, A. A., Kesner, R. P., & Gibson, C. J. (1997). Memory for temporal order of new and
familiar spatial location sequences: Role of the medial prefrontal cortex. Learning &
Clewett, D., DuBrow, S., & Davachi, L. (2019). Transcending time in the brain: How event
https://doi.org/10.1002/hipo.23074
Clewett, D., Gasser, C., & Davachi, L. (2020). Pupil-linked arousal signals track the temporal
https://doi.org/10.1038/s41467-020-17851-9
Bornstein, & A. Shenhav (Eds.), Goal-Directed Decision Making (pp. 105–123). Academic
Press. https://doi.org/10.1016/B978-0-12-812098-9.00005-X
Collins, A. G. E., & Frank, M. J. (2013). Cognitive control over learning: Creating, clustering, and
https://doi.org/10.1037/a0030852
Collins, A. G. E., & Frank, M. J. (2016). Neural signature of hierarchically structured expectations
predicts clustering and transfer of rule sets in reinforcement learning. Cognition, 152,
160–169. https://doi.org/10.1016/j.cognition.2016.04.002
Collins, A. M., & Loftus, E. F. (1975). Spreading activation theory of semantic processing.
Collins, A. M., & Quillian, M. R. (1969). Retrieval time from semantic memory. Journal of Verbal Learning and Verbal Behavior, 8(2), 240–247.
Correa, C. G., Ho, M. K., Callaway, F., Daw, N. D., & Griffiths, T. L. (2022). Humans decompose
http://arxiv.org/abs/2211.03890
Cowan, E., Liu, A., Henin, S., Kothare, S., Devinsky, O., & Davachi, L. (2020). Sleep Spindles
Cowan, E. T., Schapiro, A. C., Dunsmoor, J. E., & Murty, V. P. (2021). Memory consolidation as
d’Acremont, M., Schultz, W., & Bossaerts, P. (2013). The human brain encodes event
https://doi.org/10.1523/JNEUROSCI.5829-12.2013
Daniel, R., Radulescu, A., & Niv, Y. (2020). Intact Reinforcement Learning But Impaired
19.2019
Davachi, L., & DuBrow, S. (2015). How the hippocampus preserves order: The role of prediction
Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P., & Dolan, R. J. (2011). Model-based
influences on humans’ choices and striatal prediction errors. Neuron, 69(6), 1204–1215.
https://doi.org/10.1016/j.neuron.2011.02.027
Daw, N. D., Niv, Y., & Dayan, P. (2005). Uncertainty-based competition between prefrontal and
dorsolateral striatal systems for behavioral control. Nature Neuroscience, 8(12), 1704–
1711. https://doi.org/10.1038/nn1560
Dayan, P., & Niv, Y. (2008). Reinforcement learning: The good, the bad and the ugly. Curr. Opin.
Delgado, M. R., Beer, J. S., Fellows, L. K., Huettel, S. A., Platt, M. L., Quirk, G. J., & Schiller, D.
DeVito, L. M., Lykken, C., Kanter, B. R., & Eichenbaum, H. (2010). Prefrontal cortex: Role in
Diuk, C., Tsai, K., Wallis, J., Botvinick, M., & Niv, Y. (2013). Hierarchical learning induces two
Doya, K., Samejima, K., Katagiri, K., & Kawato, M. (2002). Multiple Model-Based Reinforcement
DuBrow, S., & Davachi, L. (2013). The influence of context boundaries on memory for the
1286. https://doi.org/10.1037/a0034024
DuBrow, S., & Davachi, L. (2016). Temporal binding within and across events. Neurobiol. Learn.
DuBrow, S., Rouhani, N., Niv, Y., & Norman, K. A. (2017). Does mental context drift or shift? Curr
Düzel, E., Bunzeck, N., Guitart-Masip, M., & Düzel, S. (2010). Novelty-related motivation of
Neurosci. Biobehav. Rev., 34(5), 660–669.
https://doi.org/10.1016/j.neubiorev.2009.08.006
Eckstein, M. K., & Collins, A. G. E. (2020). Computational evidence for hierarchically structured
reinforcement learning in humans. Proc. Natl. Acad. Sci. U. S. A., 117(47), 29381–29389.
https://doi.org/10.1073/pnas.1912330117
https://doi.org/10.1016/j.neuron.2004.08.028
Eichenbaum, H. (2014). Time cells in the hippocampus: A new dimension for mapping
https://doi.org/10.1038/nrn3827
Ekstrom, A. D., Harootonian, S. K., & Huffman, D. J. (2020). Grid coding, spatial representation,
https://doi.org/10.1002/hipo.23175
Elman, J. L., & McRae, K. (2019). A model of event knowledge. Psychol. Rev., 126(2), 252–291.
https://doi.org/10.1037/rev0000133
Ergo, K., De Loof, E., & Verguts, T. (2020). Reward Prediction Error and Declarative Memory.
Ezzyat, Y., & Davachi, L. (2014). Similarity breeds proximity: Pattern similarity within and across
1179–1189. https://doi.org/10.1016/j.neuron.2014.01.042
Ezzyat, Y., & Davachi, L. (2021). Neural Evidence for Representational Persistence Within
https://doi.org/10.1523/JNEUROSCI.0073-21.2021
Farashahi, S., Rowe, K., Aslami, Z., Lee, D., & Soltani, A. (2017). Feature-based learning improves
https://doi.org/10.1038/s41467-017-01874-w
Farzanfar, D., Spiers, H. J., Moscovitch, M., & Rosenbaum, R. S. (2022). From cognitive maps to
00655-9
Frankland, P. W., & Bontempi, B. (2005). The organization of recent and remote memories. Nat.
Franklin, N. T., Norman, K. A., Ranganath, C., Zacks, J. M., & Gershman, S. J. (2020). Structured
Event Memory: A neuro-symbolic model of event cognition. Psychol. Rev., 127(3), 327–
361. https://doi.org/10.1037/rev0000177
Frost, R., Armstrong, B. C., Siegelman, N., & Christiansen, M. H. (2015). Domain generality
Garvert, M. M., Dolan, R. J., & Behrens, T. E. J. (2017). A map of abstract relational knowledge in
https://doi.org/10.7554/eLife.17086
Gasser, C., & Davachi, L. (2022). Cross-modal facilitation of memory through actions [Preprint].
In Review. https://doi.org/10.21203/rs.3.rs-1466467/v1
Gershman, S. J. (2015). A Unifying Probabilistic View of Associative Learning. PLoS Comput. Biol.,
Gershman, S. J., Monfils, M.-H., Norman, K. A., & Niv, Y. (2017). The computational nature of
Gershman, S. J., & Niv, Y. (2010). Learning latent structure: Carving nature at its joints. Curr.
Ghosh, V. E., & Gilboa, A. (2014). What is a memory schema? A historical perspective on current
https://doi.org/10.1016/j.neuropsychologia.2013.11.010
Ghosh, V. E., Moscovitch, M., Melo Colella, B., & Gilboa, A. (2014). Schema representation in
https://doi.org/10.1523/JNEUROSCI.0740-14.2014
Gilboa, A. (2010). Strategic retrieval, confabulations, and delusions: Theory and data. Cogn.
Gilboa, A., Alain, C., He, Y., Stuss, D. T., & Moscovitch, M. (2009). Ventromedial prefrontal
cortex lesions produce early functional alterations during remote memory retrieval. J.
Gilboa, A., & Marlatte, H. (2017). Neurobiology of Schemas and Schema-Mediated Memory.
Gilboa, A., & Moscovitch, M. (2017). Ventromedial prefrontal cortex generates pre-stimulus
30. https://doi.org/10.1016/j.cortex.2016.10.008
Gilboa, A., & Moscovitch, M. (2021). No consolidation without representation: Correspondence
Giuliano, A. E., Bonasia, K., Ghosh, V. E., Moscovitch, M., & Gilboa, A. (2021). Differential
https://doi.org/10.1162/jocn_a_01746
Goldstone, R. L., Kersten, A., & Carvalho, P. F. (2012). Concepts and Categorization. In
Handbook of psychology: Experimental psychology (pp. 607–630). John Wiley & Sons,
Inc.
Grabenhorst, F., Rolls, E. T., & Parris, B. A. (2008). From affective value to decision-making in
https://doi.org/10.1111/j.1460-9568.2008.06489.x
Gravina, M. T., & Sederberg, P. B. (2017). The neural architecture of prediction over a
202. https://doi.org/10.1016/j.cobeha.2017.09.001
Greve, A., Cooper, E., Kaula, A., Anderson, M. C., & Henson, R. (2017). Does prediction error
drive one-shot declarative learning? Journal of Memory and Language, 94(June), 149–
165. https://doi.org/10.1016/j.jml.2016.11.001
Greve, A., Cooper, E., Tibon, R., & Henson, R. N. (2019). Knowledge is power: Prior knowledge
aids memory for both congruent and incongruent events, but in different ways. Journal
of Experimental Psychology. General, 148(2), 325–341.
https://doi.org/10.1037/xge0000498
Gronau, N. (2021). To Grasp the World at a Glance: The Role of Attention in Visual and
https://doi.org/10.3390/jimaging7090191
Gronau, N., Neta, M., & Bar, M. (2008). Integrated Contextual Representation for Objects’
Gronau, N., & Shachar, M. (2015). Contextual consistency facilitates long-term memory of
Guise, K. G., & Shapiro, M. L. (2017). Medial Prefrontal Cortex Reduces Memory Interference by
https://doi.org/10.1016/j.neuron.2017.03.011
Günseli, E., & Aly, M. (2020). Preparation for upcoming attentional states in the hippocampus
Hartley, C. A., Nussenbaum, K., & Cohen, A. O. (2021). Interactive Development of Adaptive
Hasson, U., Chen, J., & Honey, C. J. (2015). Hierarchical process memory: Memory as an integral
https://doi.org/10.1016/j.tics.2015.04.006
Haynes, J.-D., Sakai, K., Rees, G., Gilbert, S., Frith, C., & Passingham, R. E. (2007). Reading
https://doi.org/10.1016/j.cub.2006.11.072
Hebscher, M., Barkan-Abramski, M., Goldsmith, M., Aharon-Peretz, J., & Gilboa, A. (2016).
Memory, Decision-Making, and the Ventromedial Prefrontal Cortex (vmPFC): The Roles
Hebscher, M., & Gilboa, A. (2016). A boost of confidence: The role of the ventromedial
58. https://doi.org/10.1016/j.neuropsychologia.2016.05.003
Henson, R. N., & Gagnepain, P. (2010). Predictive, interactive multiple memory systems.
Howard, J. D., Gottfried, J. A., Tobler, P. N., & Kahnt, T. (2015). Identity-specific coding of future
rewards in the human orbitofrontal cortex. Proc. Natl. Acad. Sci. U. S. A., 112(16), 5195–
5200. https://doi.org/10.1073/pnas.1503550112
https://doi.org/10.1523/JNEUROSCI.3473-16.2017
Howard, J. D., & Kahnt, T. (2021). To be specific: The role of orbitofrontal cortex in signaling
https://doi.org/10.1037/bne0000455
Jackson, R. L., Bajada, C. J., Lambon Ralph, M. A., & Cloutman, L. L. (2020). The Graded Change
Jiang, J., Wang, S.-F., Guo, W., Fernandez, C., & Wagner, A. D. (2020). Prefrontal reinstatement
Kafkas, A., & Montaldi, D. (2018). Expectation affects learning and modulates memory
https://doi.org/10.1016/j.cognition.2018.07.010
Kahnt, T., Chang, L. J., Park, S. Q., Heinzle, J., & Haynes, J.-D. (2012). Connectivity-based
https://doi.org/10.1523/JNEUROSCI.0257-12.2012
Kamin, L. J. (1968). “Attention-like” processes in classical conditioning. In M. R. Jones (Ed.), Miami Symposium on the Prediction of Behavior, 1967: Aversive Stimulation (pp. 9–31). University of Miami Press. https://psychology.nottingham.ac.uk/staff/mxh/C82MPR/Useful%20reading%20(pdfs)/Kamin%20(1968).pdf
Kamin, L. J. (1969). Predictability, surprise, attention, and conditioning. In B. A. Campbell & R. M. Church (Eds.), Punishment and Aversive Behavior. Appleton-Century-Crofts. https://ntrs.nasa.gov/api/citations/19680014821/downloads/19680014821.pdf
Kamiński, J., Mamelak, A. N., Birch, K., Mosher, C. P., Tagliati, M., & Rutishauser, U. (2018).
of Declarative Memory Formation. Curr. Biol., 28(9), 1333-1343.e4.
https://doi.org/10.1016/j.cub.2018.03.024
Kenett, Y. N., Levi, E., Anaki, D., & Faust, M. (2017). The Semantic Distance Task: Quantifying
https://doi.org/10.1037/xlm0000391
Kesner, R. P., & Hunsaker, M. R. (2010). The temporal attributes of episodic memory.
Kim, G., Lewis-Peacock, J. A., Norman, K. A., & Turk-Browne, N. B. (2014). Pruning of memories
Kim, H., Schlichting, M. L., Preston, A. R., & Lewis-Peacock, J. A. (2020). Predictability Changes
Klein-Flügge, M. C., Bongioanni, A., & Rushworth, M. F. S. (2022). Medial and orbital frontal
https://doi.org/10.1016/j.neuron.2022.05.022
Klein-Flügge, M. C., Wittmann, M. K., Shpektor, A., Jensen, D. E. A., & Rushworth, M. F. S.
https://doi.org/10.1038/s41467-019-12557-z
Kumar, A. A. (2021). Semantic memory: A review of methods, models, and current challenges.
In Psychonomic Bulletin and Review (Vol. 28, Issue 1). Psychonomic Bulletin & Review.
https://doi.org/10.3758/s13423-020-01792-x
Kumaran, D., Summerfield, J. J., Hassabis, D., & Maguire, E. A. (2009). Tracking the Emergence
https://doi.org/10.1016/j.neuron.2009.07.030
Langdon, A. J., Song, M., & Niv, Y. (2019). Uncovering the ‘state’: Tracing the hidden state
Lebreton, M., Jorge, S., Michel, V., Thirion, B., & Pessiglione, M. (2009). An Automatic Valuation
System in the Human Brain: Evidence from Functional Neuroimaging. Neuron, 64(3),
431–439. https://doi.org/10.1016/j.neuron.2009.09.040
Lee, C. S., Aly, M., & Baldassano, C. (2021). Anticipation of temporally structured events in the
Leong, Y. C., Radulescu, A., Daniel, R., DeWoskin, V., & Niv, Y. (2017). Dynamic Interaction between Reinforcement Learning and Attention in Multidimensional Environments. Neuron, 93(2), 451–463. https://doi.org/10.1016/j.neuron.2016.12.040
Leutgeb, J. K., Leutgeb, S., Moser, M.-B., & Moser, E. I. (2007). Pattern separation in the dentate
https://doi.org/10.1126/science.1135801
Levy, D. J., & Glimcher, P. W. (2012). The root of all value: A neural common currency for
https://doi.org/10.1016/j.conb.2012.06.001
Liberman, N., Sagristano, M. D., & Trope, Y. (2002). The effect of temporal distance on level of
Long, N. M., Lee, H., & Kuhl, B. A. (2016). Hippocampal Mismatch Signals Are Modulated by the
12677–12687. https://doi.org/10.1523/JNEUROSCI.1850-16.2016
Love, B. C., Medin, D. L., & Gureckis, T. M. (2004). SUSTAIN: A Network Model of Category
295X.111.2.309
Mack, M. L., Preston, A. R., & Love, B. C. (2020). Ventromedial prefrontal cortex compression
13930-8
areas within the ventromedial and lateral orbital frontal cortex in the human and the
https://doi.org/10.1111/j.1460-9568.2010.07465.x
Mackey, S., & Petrides, M. (2014). Architecture and morphology of the human ventromedial
https://doi.org/10.1111/ejn.12654
Masís-Obando, R., Norman, K. A., & Baldassano, C. (2022). Schema representations in distinct
brain networks support narrative memory during encoding and retrieval. eLife, 11,
e70445. https://doi.org/10.7554/eLife.70445
McClelland, J. L., McNaughton, B. L., & O’Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3), 419–457. https://doi.org/10.1037/0033-295X.102.3.419
McCormick, C., Barry, D. N., Jafarian, A., Barnes, G. R., & Maguire, E. A. (2020). vmPFC Drives
McKenzie, S., Frank, A. J., Kinsky, N. R., Porter, B., Rivière, P. D., & Eichenbaum, H. (2014).
https://doi.org/10.1016/j.neuron.2014.05.019
McNamee, D., Rangel, A., & O’Doherty, J. P. (2013). Category-dependent and category-
Mızrak, E., Bouffard, N. R., Libby, L. A., Boorman, E. D., & Ranganath, C. (2021). The
hippocampus and orbitofrontal cortex jointly represent task structure during memory-
https://doi.org/10.1016/j.celrep.2021.110065
Momennejad, I., & Haynes, J.-D. (2012). Human anterior prefrontal cortex encodes the “what”
https://doi.org/10.1016/j.neuroimage.2012.02.079
Momennejad, I., & Haynes, J.-D. (2013). Encoding of prospective tasks in the human prefrontal
https://doi.org/10.1523/JNEUROSCI.0492-13.2013
Momennejad, I., Russek, E. M., Cheong, J. H., Botvinick, M. M., Daw, N. D., & Gershman, S. J.
(2017). The successor representation in human reinforcement learning. Nat Hum Behav,
Moneta, N., Garvert, M. M., Heekeren, H. R., & Schuck, N. W. (2023). Task state representations
in vmPFC mediate relevant and irrelevant value signals and their behavioral influence.
Morton, N. W., Sherrill, K. R., & Preston, A. R. (2017). Memory integration constructs maps of
space, time, and concepts. Curr Opin Behav Sci, 17, 161–168.
https://doi.org/10.1016/j.cobeha.2017.08.007
Moscovitch, M., Cabeza, R., Winocur, G., & Nadel, L. (2016). Episodic Memory and Beyond: The
https://doi.org/10.1146/annurev-psych-113011-143733
Murty, V. P., & Adcock, R. A. (2014). Enriched encoding: Reward motivation organizes cortical
networks for hippocampal detection of unexpected events. Cereb. Cortex, 24(8), 2160–
2168. https://doi.org/10.1093/cercor/bht063
Nadel, L., Samsonovich, A., Ryan, L., & Moscovitch, M. (2000). Multiple trace theory of human memory: Computational, neuroimaging, and neuropsychological results. Hippocampus, 10, 352–368.
Niv, Y. (2019). Learning task-state representations. Nature Neuroscience, 22(10), 1544–1553. https://doi.org/10.1038/s41593-019-0470-8
Niv, Y., Daniel, R., Geana, A., Gershman, S. J., Leong, Y. C., Radulescu, A., & Wilson, R. C. (2015).
https://doi.org/10.1523/JNEUROSCI.2978-14.2015
Niv, Y., & Langdon, A. (2016). Reinforcement learning with Marr. Current Opinion in Behavioral
Niv, Y., & Schoenbaum, G. (2008). Dialogues on prediction errors. Trends in Cognitive Sciences,
Norman, K. A., & O’Reilly, R. C. (2003). Modeling hippocampal and neocortical contributions to
Öngür, D., Ferry, A. T., & Price, J. L. (2003). Architectonic subdivision of the human orbital and
https://doi.org/10.1002/cne.10609
Öngür, D., & Price, J. L. (2000). The Organization of Networks within the Orbital and Medial
Prefrontal Cortex of Rats, Monkeys and Humans. Cereb. Cortex, 10, 206–219.
Padoa-Schioppa, C., & Conen, K. E. (2017). Orbitofrontal Cortex: A Neural Circuit for Economic
Park, S. A., Miller, D. S., Nili, H., Ranganath, C., & Boorman, E. D. (2020). Map Making:
1226-1238.e8. https://doi.org/10.1016/j.neuron.2020.06.030
Peer, M., Brunec, I. K., Newcombe, N. S., & Epstein, R. A. (2021). Structuring Knowledge with
Cognitive Maps and Cognitive Graphs. Trends Cogn. Sci., 25(1), 37–54.
https://doi.org/10.1016/j.tics.2020.10.004
Place, R., Farovik, A., Brockmann, M., & Eichenbaum, H. (2016). Bidirectional prefrontal-
994. https://doi.org/10.1038/nn.4327
Preston, A. R., & Eichenbaum, H. (2013). Interplay of hippocampus and prefrontal cortex in
Price, J. L. (2007). Definition of the orbital cortex in relation to specific connections with limbic
and visceral structures and other cortical regions. Ann. N. Y. Acad. Sci., 1121, 54–71.
https://doi.org/10.1196/annals.1401.008
Quent, J. A., Henson, R. N., & Greve, A. (2021). A predictive account of how novelty influences
https://doi.org/10.1016/j.nlm.2021.107382
Radulescu, A., Daniel, R., & Niv, Y. (2016). The effects of aging on the interaction between
https://doi.org/10.1037/pag0000112
Radulescu, A., Niv, Y., & Ballard, I. (2019). Holistic Reinforcement Learning: The Role of
https://doi.org/10.1016/j.tics.2019.01.010
Ranganath, C., & Hsieh, L.-T. (2016). The hippocampus: A special place for time. Ann. N. Y. Acad.
during encoding and recall of naturalistic events. Nature Communications, 14(1), 1279.
https://doi.org/10.1038/s41467-023-36805-5
Reder, L. M., Paynter, C., Diana, R. A., Ngiam, J., & Dickison, D. (2007). Experience is a Double-Edged Sword: A Computational Model of the Encoding/Retrieval Trade-Off With Familiarity. In Psychology of Learning and Motivation (Vol. 48, pp. 271–312). Elsevier. https://doi.org/10.1016/S0079-7421(07)48007-0
Reggev, N., Bein, O., & Maril, A. (2016). Distinct Neural Suppression and Encoding Effects for
https://doi.org/10.1162/jocn_a_00994
Rescorla, R. A. (1988). Pavlovian conditioning: It’s not what you think it is. American Psychologist, 43(3), 151–160.
Ribas-Fernandes, J. J. F., Solway, A., Diuk, C., McGuire, J. T., Barto, A. G., Niv, Y., & Botvinick, M.
370–379. https://doi.org/10.1016/j.neuron.2011.05.042
Richards, B. A., Xia, F., Santoro, A., Husse, J., Woodin, M. A., Josselyn, S. A., & Frankland, P. W.
(2014). Patterns across multiple memories are identified over time. Nat. Neurosci.,
Robin, J., & Moscovitch, M. (2017). Details, gist and schema: Hippocampal–neocortical
interactions underlying recent and remote episodic and spatial memory. Current
https://doi.org/10.1016/j.cobeha.2017.07.016
Rouhani, N., & Niv, Y. (2021). Signed and unsigned reward prediction errors dynamically
Rouhani, N., Norman, K. A., & Niv, Y. (2018). Dissociable effects of surprising rewards on
learning and memory. J. Exp. Psychol. Learn. Mem. Cogn., 44(9), 1430–1443.
https://doi.org/10.1037/xlm0000518
Rouhani, N., Norman, K. A., Niv, Y., & Bornstein, A. M. (2020). Reward prediction errors create
https://doi.org/10.1016/j.cognition.2020.104269
Rumelhart, D. E., & Ortony, A. (1977). The Representation of Knowledge in Memory. Schooling
Rushworth, M. F. S., Noonan, M. P., Boorman, E. D., Walton, M. E., & Behrens, T. E. (2011).
Frontal cortex and reward-guided learning and decision-making. Neuron, 70(6), 1054–
1069. https://doi.org/10.1016/j.neuron.2011.05.014
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274(5294), 1926–1928.
Saffran, J. R., & Wilson, D. P. (2003). From syllables to syntax: Multilevel statistical learning by
https://doi.org/10.1207/s15327078in0402_07
Schank, R., & Abelson, R. (1977). Scripts. In Scripts Plans Goals and Understanding: An inquiry
005
Schapiro, A. C., Gregory, E., Landau, B., McCloskey, M., & Turk-Browne, N. B. (2014). The
necessity of the medial temporal lobe for statistical learning. J. Cogn. Neurosci., 26(8),
1736–1747. https://doi.org/10.1162/jocn_a_00578
Schapiro, A. C., Rogers, T. T., Cordova, N. I., Turk-Browne, N. B., & Botvinick, M. M. (2013).
Schapiro, A. C., & Turk-Browne, N. (2015). Statistical Learning. Brain Mapping: An Encyclopedic
Schapiro, A. C., Turk-Browne, N. B., Botvinick, M. M., & Norman, K. A. (2017). Complementary
reconciling episodic memory with statistical learning. Philos. Trans. R. Soc. Lond. B Biol.
Schapiro, A. C., Turk-Browne, N. B., Norman, K. A., & Botvinick, M. M. (2016). Statistical learning
https://doi.org/10.1002/hipo.22523
Schiller, D., Eichenbaum, H., Buffalo, E. A., Davachi, L., Foster, D. J., Leutgeb, S., & Ranganath, C.
(2015). Memory and space: Towards an understanding of the cognitive map. Journal of
https://doi.org/10.1038/ncomms9151
Schuck, N. W., Cai, M. B., Wilson, R. C., & Niv, Y. (2016). Human Orbitofrontal Cortex Represents a Cognitive Map of State Space. Neuron, 91(6), 1402–1412. https://doi.org/10.1016/j.neuron.2016.08.019
Schuck, N. W., Wilson, R., & Niv, Y. (2018). A state representation for reinforcement learning
812098-9.00012-7
Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593–1599. https://doi.org/10.1126/science.275.5306.1593
Sharp, P. B., Dolan, R. J., & Eldar, E. (2021). Disrupted state transition learning as a
https://doi.org/10.1017/S0033291721003846
Sharp, P. B., Russek, E. M., Huys, Q. J., Dolan, R. J., & Eldar, E. (2022). Humans perseverate on
https://doi.org/10.7554/eLife.74402
Sharpe, M. J., Batchelor, H. M., Mueller, L. E., Yun Chang, C., Maes, E. J. P., Niv, Y., &
https://doi.org/10.1038/s41467-019-13953-1
Sharpe, M. J., Chang, C. Y., Liu, M. A., Batchelor, H. M., Mueller, L. E., Jones, J. L., Niv, Y., &
https://doi.org/10.1038/nn.4538
Sherman, B. E., Graves, K. N., & Turk-Browne, N. B. (2020). The prevalence and importance of
statistical learning in human cognition and behavior. Curr Opin Behav Sci, 32, 15–20.
https://doi.org/10.1016/j.cobeha.2020.01.015
Sherman, B. E., & Turk-Browne, N. B. (2020). Statistical prediction of the future impairs episodic
https://doi.org/10.1101/851147
Sherrill, K. R., Molitor, R. J., Karagoz, A. B., Atyam, M., Mack, M. L., & Preston, A. R. (2023).
Generalization of cognitive maps across space and time. Cerebral Cortex, bhad092.
https://doi.org/10.1093/cercor/bhad092
Shin, Y. S., & DuBrow, S. (2021). Structuring Memory Through Inference-Based Event
https://doi.org/10.1037/a0034461
Shteingart, H., Neiman, T., & Loewenstein, Y. (2013). The role of first impression in operant
https://doi.org/10.1037/a0029550
Sinclair, A. H., & Barense, M. D. (2018). Surprise and destabilize: Prediction error influences
https://doi.org/10.1101/lm.046912.117
Sinclair, A. H., Manalili, G. M., Brunec, I. K., Adcock, R. A., & Barense, M. D. (2021). Prediction
of the National Academy of Sciences, 118(51), e2117625118.
https://doi.org/10.1073/pnas.2117625118
Solway, A., Diuk, C., Córdova, N., Yee, D., Barto, A. G., Niv, Y., & Botvinick, M. M. (2014).
https://doi.org/10.1371/journal.pcbi.1003779
Spalding, K. N., Jones, S. H., Duff, M. C., Tranel, D., & Warren, D. E. (2015). Investigating the
https://doi.org/10.1523/JNEUROSCI.2767-15.2015
Spalding, K. N., Schlichting, M. L., Zeithamova, D., Preston, A. R., Tranel, D., Duff, M. C., &
https://doi.org/10.1523/JNEUROSCI.2501-17.2018
Squire, L. R., & Alvarez, P. (1995). Retrograde amnesia and memory consolidation: A
https://doi.org/10.1016/0959-4388(95)80023-9
Stalnaker, T. A., Cooch, N. K., & Schoenbaum, G. (2015). What the orbitofrontal cortex does not
Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press.
Sutton, R. S., Precup, D., & Singh, S. (1999). Between MDPs and semi-MDPs: A framework for
211. https://doi.org/10.1016/S0004-3702(99)00052-1
Sweegers, C. C. G., Takashima, A., Fernández, G., & Talamini, L. M. (2014). Neural mechanisms
Takashima, A., Petersson, K. M., Rutters, F., Tendolkar, I., Jensen, O., Zwarts, M. J.,
humans: A prospective functional magnetic resonance imaging study. Proc. Natl. Acad.
Tavares, R. M., Mendelsohn, A., Grossman, Y., Williams, C. H., Shapiro, M., Trope, Y., & Schiller,
D. (2015). A Map for Social Navigation in the Human Brain. Neuron, 87(1), 231–243.
https://doi.org/10.1016/j.neuron.2015.06.011
Theves, S., Fernandez, G., & Doeller, C. F. (2019). The Hippocampus Encodes Distances in
https://doi.org/10.1016/j.cub.2019.02.035
Theves, S., Fernández, G., & Doeller, C. F. (2020). The hippocampus maps concept space, not
https://doi.org/10.1523/JNEUROSCI.0494-20.2020
Theves, S., Neville, D., Fernández, G., & Doeller, C. F. (2021). Learning and Representation of
https://doi.org/10.1523/JNEUROSCI.0657-21.2021
Tolman, E. C. (1948). Cognitive maps in rats and men. Psychological Review, 55(4), 189–208.
https://doi.org/10.1037/h0061626
Tomov, M. S., Yagati, S., Kumar, A., Yang, W., & Gershman, S. J. (2020). Discovery of hierarchical
https://doi.org/10.1371/journal.pcbi.1007594
Tompary, A., & Davachi, L. (2017). Consolidation Promotes the Emergence of Representational
Overlap in the Hippocampus and Medial Prefrontal Cortex. Neuron, 96(1), 228–241.
https://doi.org/10.1016/j.neuron.2017.09.005
Tompary, A., Zhou, W., & Davachi, L. (2020). Schematic memories develop quickly, but are not
020-73952-x
Trope, Y., & Liberman, N. (2003). Temporal construal. Psychological Review, 110(3), 403–421.
https://doi.org/10.1037/0033-295x.110.3.403
Tse, D., Langston, R. F., Kakeyama, M., Bethus, I., Spooner, P. A., Wood, E. R., Witter, M. P., &
https://doi.org/10.1126/science.1135935
Tse, D., Takeuchi, T., Kakeyama, M., Kajii, Y., Okuno, H., Tohyama, C., Bito, H., & Morris, R. G. M.
Turk-Browne, N. B., Jungé, J., & Scholl, B. J. (2005). The automaticity of visual statistical learning. Journal of Experimental Psychology: General, 134(4), 552–564. https://doi.org/10.1037/0096-3445.134.4.552
Uylings, H. B. M., Sanz-Arigita, E. J., de Vos, K., Pool, C. W., Evers, P., & Rajkowska, G. (2010). 3-D cytoarchitectonic parcellation of human orbitofrontal cortex: Correlation with postmortem MRI. Psychiatry Research: Neuroimaging, 183(1), 1–20.
Vaidya, A. R., & Badre, D. (2022). Abstract task representations for inference and control.
van Kesteren, M. T. R., Beul, S. F., Takashima, A., Henson, R. N., Ruiter, D. J., & Fernández, G.
(2013). Differential roles for medial prefrontal and medial temporal cortices in schema-
2359. https://doi.org/10.1016/j.neuropsychologia.2013.05.027
van Kesteren, M. T. R., Fernández, G., Norris, D. G., & Hermans, E. J. (2010). Persistent schema-
postencoding rest in humans. Proc. Natl. Acad. Sci. U. S. A., 107(16), 7550–7555.
https://doi.org/10.1073/pnas.0914892107
van Kesteren, M. T. R., Rijpkema, M., Ruiter, D. J., & Fernández, G. (2010). Retrieval of
prefrontal activity and connectivity. J. Neurosci., 30(47), 15888–15894.
https://doi.org/10.1523/JNEUROSCI.2674-10.2010
van Kesteren, M. T. R., Ruiter, D. J., Fernández, G., & Henson, R. N. (2012). How schema and
https://doi.org/10.1016/j.tins.2012.02.001
Varga, N., Morton, N., & Preston, A. (2022). Schema, Inference, and Memory. In M. J. Kahana &
A. D. Wagner (Eds.), The Oxford Handbook of Human Memory. Oxford University Press.
https://doi.org/10.31234/osf.io/m9adb
Viganò, S., & Piazza, M. (2020). Distance and Direction Codes Underlie Navigation of a Novel Semantic Space in the Human Brain. J. Neurosci., 40(13), 2727–2736. https://doi.org/10.1523/JNEUROSCI.1849-19.2020
Wang, S.-H., Tse, D., & Morris, R. G. M. (2012). Anterior cingulate cortex in schema assimilation and expression. Learn. Mem., 19(8), 315–318. https://doi.org/10.1101/lm.026336.112
Wilson, R. C., Takahashi, Y. K., Schoenbaum, G., & Niv, Y. (2014). Orbitofrontal cortex as a cognitive map of task space. Neuron, 81(2), 267–279. https://doi.org/10.1016/j.neuron.2013.11.005
Wittmann, B. C., Bunzeck, N., Dolan, R. J., & Düzel, E. (2007). Anticipation of novelty recruits reward system and hippocampus while promoting recollection. NeuroImage, 38(1), 194–202. https://doi.org/10.1016/j.neuroimage.2007.06.038
Yacoby, A., Reggev, N., & Maril, A. (2021). Examining the transition of novel information toward familiarity. Neuropsychologia, 161, 107993. https://doi.org/10.1016/j.neuropsychologia.2021.107993
Yacoby, A., Reggev, N., & Maril, A. (2023). Lack of source memory as a potential marker of early assimilation of novel items into current knowledge. Neuropsychologia, 188, 108569. https://doi.org/10.1016/j.neuropsychologia.2023.108569
Yassa, M. A., & Stark, C. E. L. (2011). Pattern separation in the hippocampus. Trends Neurosci., 34(10), 515–525.
Zacks, J. M. (2020). Event Perception and Memory. Annual Review of Psychology, 71, 165–191. https://doi.org/10.1146/annurev-psych-010419-051101
Zeithamova, D., Mack, M. L., Braunlich, K., Davis, T., Seger, C. A., van Kesteren, M. T. R., & Wutz, A. (2019). Brain Mechanisms of Concept Learning. J. Neurosci., 39(42), 8259–8266. https://doi.org/10.1523/JNEUROSCI.1166-19.2019
Zhao, J., Al-Aidroos, N., & Turk-Browne, N. B. (2013). Attention is spontaneously biased toward regularities. Psychological Science, 24(5), 667–677. https://doi.org/10.1177/0956797612460407
Zhou, J., Gardner, M. P. H., & Schoenbaum, G. (2021). Is the core function of orbitofrontal
cortex to signal values or make predictions? Curr Opin Behav Sci, 41, 1–9.
https://doi.org/10.1016/j.cobeha.2021.02.011
Zhou, J., Gardner, M. P. H., Stalnaker, T. A., Ramus, S. J., Wikenheiser, A. M., Niv, Y., & Schoenbaum, G. (2019). Rat Orbitofrontal Ensemble Activity Contains Multiplexed but Dissociable Representations of Value and Task Structure in an Odor Sequence Task. Curr. Biol., 29(6), 897–907.
Zhou, J., Jia, C., Montesinos-Cartagena, M., Gardner, M. P. H., Zong, W., & Schoenbaum, G. (2021). Evolving schema representations in orbitofrontal ensembles during learning. Nature, 590, 606–611.
Zhou, J., Montesinos-Cartagena, M., Wikenheiser, A. M., Gardner, M. P. H., Niv, Y., & Schoenbaum, G. (2019). Complementary Task Structure Representations in Hippocampus and Orbitofrontal Cortex during an Odor Sequence Task. Curr. Biol., 29(20), 3402–3409.
Zhou, J., Zong, W., Jia, C., Gardner, M. P. H., & Schoenbaum, G. (2021). Prospective representations in rat orbitofrontal ensembles. Behav. Neurosci., 135(4), 518–527. https://doi.org/10.1037/bne0000451