
Towards a new model of neuronal computation

++
please add your name, alphabetically for now

November 9, 2023

Abstract

1 Introduction (Matthew and Wolfgang)

General strategy: instead of starting with how great the brain is, focus on the computational problems to be solved, show that they are not well solved by current ANNs, and ask how we can learn from the solutions developed by the brain.

Explain that it is remarkable that we can express complex biophysical events in terms of simple models.

[Matthew: If you walk into an AI meeting and say NMDA spikes explain everything and your model of the neuron is wrong, they will say you are mad, and that they are not going to change their ANN units based on your opinion alone...

A dendritic expert walks into a bar at the NeurIPS meeting. The bartender asks: "What do you want?" The expert says: "A network model with NMDA spikes". The bartender says: "We don't serve that kind of exotic spirit in here".]

2 Goal: SSL on unstructured / multimodal data streams (Robert and Wolfgang)
• Neural networks handle inhomogeneous but structured data (e.g. text)
• The brain, in contrast, is exposed to "the world", which features highly multimodal data
• If you intermingle audio, text and visual inputs, most ANN architectures run into trouble (i.e. they cannot handle mixed/interdependent simultaneous stimuli). Alice: "Everyone knows this is a weakness and they are looking for a solution" (Robert: cite a few recent reviews which point to the current deficiencies of LLMs)
• Evolutionary argument: why did evolution converge towards this solution to information processing in the cortex? Why is the problem solved in this specific way? [Part of the answer is the layering of cortex; the need to incorporate loops; etc.] Mammalian brains vs fish: different worlds. Our (human) brains are specialized for the terrestrial world.

• In LLMs it is hard to add or delete information (they have to be re-trained on giant data sets that are accessible only to a few companies). (Describe this more concretely, to prepare the ground for sections 3 and 4.)
• Energy-efficiency issues (which arise when most units of the network are more or less active for all computations)
• Fast-learning issues, e.g. updating or complementing learnt knowledge from new experience

Recent work has started to investigate the deficiencies and limitations of current large language models (LLMs). LLMs such as the generative pretrained transformer (GPT) are based on the transformer architecture, a deep neural network with a self-attention mechanism as the core computational operation. In contrast to recurrent neural networks (RNNs), where input tokens are processed sequentially and information about past inputs is kept as context in the activity state of the RNN, transformers process the whole input token sequence at once. In a deep stack of layers, each layer enriches each token with context information about other tokens through the self-attention mechanism. While this architecture has important advantages in terms of trainability and parallelization, it also limits expressiveness [Hahn, 2020, Tran et al., 2018, Dehghani et al., 2018]. In particular, since transformers do not process their input sequentially, they cannot maintain an internal state over arbitrary time spans as the input sequence progresses. As a result, transformers do not generalize well to input lengths not encountered during training [Dehghani et al., 2018]. Other results indicate that transformers are less suited than RNNs to model hierarchical structure [Tran et al., 2018]. These claims have been backed up by theoretical results [Hahn, 2020].
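The parallel, whole-sequence processing described above can be sketched in a few lines of NumPy. This is the generic scaled dot-product self-attention operation; the dimensions and random weight matrices are arbitrary illustrations, not the parameters of any particular model:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a whole token sequence.

    X: (T, d) matrix of T token embeddings. Unlike an RNN, every token
    attends to every other token in one parallel step; no recurrent
    state is carried across time.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (T, T) token-to-token scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over context tokens
    return weights @ V                               # each token enriched with context

rng = np.random.default_rng(0)
T, d = 5, 8                                          # toy sequence length and width
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

Because the (T, T) score matrix is computed in one step, the whole sequence must be available at once, which is exactly what precludes carrying an internal state across an unbounded input stream.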

From the more practical side, there have been many investigations of the quality of the responses of LLMs [Goertzel, 2023]. A general weakness that has attracted attention is their tendency to hallucinate [McKenna et al., 2023, Azaria and Mitchell, 2023]. Further research indicates that LLMs show a limited ability to construct and use world models, a key aspect of human cognition [Aflalo et al., 2022]. While there is evidence for a detailed world model in limited setups, such as a GPT model trained on the Othello board game [Li et al., 2022], ChatGPT fails on basic temporal and spatial reasoning and inference problems in toy domains [Bang et al., 2023].

Multi-scale temporal structure of data and network

• The brain can deal with multiscale spatial and temporal dependencies
• Neuromorphic chips depend on spike timescales
• AI needs short latency for making decisions, but the timescale of behaviour is very broad: milliseconds to minutes to days to years
• We need multiscale plasticity rules that take this into account

3 Event-based energy-efficient processing with different types of events (Arnd, Guozhang, Wolfgang)
• Everyone in the neuromorphic community uses spikes as events
• In biological brains there are different types of (dendritic) events: NMDA spikes, Ca spikes, BDNF release, etc., at different scales
• Contrast the AMPA vs NMDA models of signalling
• Leave open how the different types of events are implemented. Allow, for example, that different types/groups of events occupy a particular phase of an oscillation, or a particular frequency band (since building dendritic arbors in NMHW tends to be costly).
• Give an energy budget for different types of (dendritic) signals?
• Are dendrites key to the energy efficiency of the brain?
• Focus also on different types of neurons specialized for regulating different types of signals, e.g. interneurons involved in E/I regulation; point out that pyramidal cell firing is strongly regulated by the firing of specific interneurons, which effectively decide when it should fire
• Role of bursts (as a separate type of event)
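One way to make "different types of events" concrete in simulation is an event queue in which each event kind carries its own duration and energy cost, processed in temporal order. The event names echo the bullet list above, but all numerical values below are placeholders, not measured biophysical quantities:

```python
import heapq

# Hypothetical event kinds; durations and energy costs are illustrative only.
EVENT_KINDS = {
    "somatic_spike": {"duration_ms": 1.0,   "energy": 1.0},
    "nmda_spike":    {"duration_ms": 50.0,  "energy": 0.3},
    "ca_spike":      {"duration_ms": 100.0, "energy": 0.5},
}

def run(events):
    """Process a stream of (time_ms, kind) events in temporal order and
    tally a per-kind energy budget."""
    heap = list(events)           # copy, then turn into a min-heap keyed by time
    heapq.heapify(heap)
    order, budget = [], {k: 0.0 for k in EVENT_KINDS}
    while heap:
        t, kind = heapq.heappop(heap)
        order.append(kind)
        budget[kind] += EVENT_KINDS[kind]["energy"]
    return order, budget

stream = [(3.0, "nmda_spike"), (1.0, "somatic_spike"), (2.0, "ca_spike"),
          (4.0, "somatic_spike")]
order, budget = run(stream)
print(order)
print(budget)
```

The point of the sketch is that the event *type*, not just its time stamp, determines its downstream cost and effect, which is what current spike-only neuromorphic pipelines flatten away.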

4 Intertwining computation and learning with dendrites (Jackie, Michael, Arnd, Guozhang)
• Specific mechanisms that allow computation and learning to be combined
• Interweave computation, learning and STORAGE (including recall mechanisms)
• Dendritic gating of plasticity and processing (Robert)
• Segregation of processing and learning rules in different branches / dendritic regions (Jackie)

Dendritic Computation and Learning Integration

Dendrites, far from being passive conduits, actively engage in computation and learning at the same time. The intertwining of these functions within dendritic arbors offers a template for machine learning: by emulating this integration, artificial models could perform real-time computation while dynamically adapting to new patterns, much like the brain's ability to encode and retrieve information swiftly.

Segregation of Processing and Learning Rules

The finding that processing and learning rules are specialized within dendritic branches offers a compelling analogy to domain adaptation in machine learning. Just as different dendritic branches are tailored to specific information-processing tasks, domain adaptation techniques aim to optimize machine learning models for different data domains.

In this analogy, each specialized branch corresponds to a distinct data domain, and the branch-specific learning rules align with domain-specific characteristics. Such specialization would let machine learning models adapt their processing strategies efficiently when confronted with diverse datasets, much as dendritic branches adjust to different types of information.

Equipping models with specialized learning strategies for each domain could, in the same way, improve performance across diverse datasets, make more efficient use of resources, and yield domain-specific systems that cope with varied and complex real-world scenarios.
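The branch-per-domain analogy can be sketched as a model that keeps one weight vector per data domain, each updated with its own learning rule (here simply its own learning rate). All names and rates are hypothetical illustrations, not a published architecture:

```python
import numpy as np

class BranchedRegressor:
    """Linear model with one weight vector ("branch") per data domain,
    each trained with its own learning rate, loosely analogous to
    branch-specific dendritic learning rules."""
    def __init__(self, dim, domains, lrs):
        self.w = {d: np.zeros(dim) for d in domains}
        self.lr = dict(zip(domains, lrs))

    def predict(self, x, domain):
        return float(self.w[domain] @ x)

    def update(self, x, y, domain):
        err = y - self.predict(x, domain)
        self.w[domain] += self.lr[domain] * err * x  # delta rule, per-branch rate

model = BranchedRegressor(dim=3, domains=["visual", "audio"], lrs=[0.1, 0.01])
x = np.array([1.0, 0.5, -0.5])
for _ in range(200):
    model.update(x, 2.0, "visual")     # only the "visual" branch learns
print(round(model.predict(x, "visual"), 3))  # ≈ 2.0; "audio" branch untouched
```

Training one branch leaves the other branches' weights untouched, which is the property the text borrows from dendritic segregation: learning in one domain does not interfere with another.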

3
Distinctive Hierarchical Computation in Dendrites and Neural Networks: A Paradigm Shift in Multi-Scale Learning

Dendritic structures within the brain exhibit their own form of hierarchical organization: information undergoes intricate processing through multiple layers of dendritic branches before reaching the neuron's soma, or cell body. This multi-scale hierarchical computation, specific to dendrites, is distinct from the network-level hierarchical computation prevalent in machine learning models. While network-level hierarchies involve layered architectures, dendritic hierarchies encompass complex interactions within individual neurons, a more granular, multi-scale approach to information processing.

This hierarchical processing within dendrites offers a departure point for the design of deep learning architectures. Integrating dendritic-inspired hierarchies with network-level architectures would allow machine learning models to operate across different scales, which may help them learn abstract features and decipher complex patterns relevant to image recognition, natural language processing and other cognitive tasks, and to build a more comprehensive and nuanced understanding of complex datasets.
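As a rough illustration of within-neuron hierarchy, the following is a minimal "two-layer" neuron sketch in the spirit of published dendritic-subunit abstractions: each branch applies its own nonlinearity to its local synaptic sum before the soma combines the branch outputs. The sigmoid nonlinearities and random weights are illustrative assumptions, not a committed model:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dendritic_neuron(inputs, branch_weights, soma_weights):
    """Two-layer abstraction of a pyramidal neuron: each dendritic branch
    applies its own nonlinearity to its local synaptic sum, and the soma
    nonlinearly combines the branch outputs. A point neuron would instead
    sum all inputs in a single linear step."""
    branch_out = np.array([sigmoid(w @ x) for w, x in zip(branch_weights, inputs)])
    return float(sigmoid(soma_weights @ branch_out))

rng = np.random.default_rng(2)
inputs = [rng.standard_normal(4) for _ in range(3)]   # 3 branches, 4 synapses each
branch_w = [rng.standard_normal(4) for _ in range(3)]
soma_w = rng.standard_normal(3)
y = dendritic_neuron(inputs, branch_w, soma_w)
print(0.0 < y < 1.0)  # True
```

The key difference from a standard ANN unit is the intermediate nonlinearity per branch, which makes the mapping from synapses to output non-additive within a single "neuron".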

Spatial structure of synaptic activation and plasticity rules (Jackie)

• Point to the fundamental fact that there is a metric on the input channels to a biological neuron (implemented through vicinity on dendrites), whereas neurons in ANNs do not have such a metric.
• Point out that this metric is subject to plasticity in the brain (rewiring, spine motility).
• Highlight that machine learning approaches are very inefficient in their use of synapses to store information (i.e. they use "too many synapses", Jackie)
• Introduce the idea that synaptic input has structure in the spatial and temporal domain, e.g. clustering of active inputs
• Plasticity rules that take the spatial and temporal adjacency of active synapses into account
• Enable continual learning

Plasticity rules are fundamental mechanisms used by both biological and artificial neural networks to govern the modification of synaptic strength based on the activity of connected neurons. These rules are crucial for learning and memory formation in mammalian brains and are used in ANNs to update connection weights when learning new tasks.

While ANNs can reach high performance levels, sometimes surpassing human experts (REF XXX), they differ from mammalian brains in their learning capabilities in two main respects. First, while they excel at supervised learning under defined conditions, usually requiring large training data sets, they struggle, and often fail, in more flexible environments that are unsupervised and require continual learning. In contrast, the human brain is built for fast unsupervised learning from only a few examples in ever-changing environments. Second, the basic learning rules differ fundamentally between biological and artificial neural networks. In ANNs, the "neuron" simply applies a weighted sum to its inputs, adds a bias, and passes the result through an activation function to produce its output. During training, these weights are adjusted, for example by backward propagation, so that the network accurately maps input data to the correct output. In this process, the vast majority of connections are changed using the same plasticity rule, since all inputs to a neuron are treated equivalently. In contrast, in biological networks such as the mammalian cortical networks, the basic computational unit, the pyramidal neuron, is equipped with a set of smart and diverse plasticity rules that can be applied preferentially to its different inputs.
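The standard ANN unit described in the previous paragraph can be written in two lines. This is the generic textbook formulation, with ReLU chosen arbitrarily as the activation function:

```python
import numpy as np

def ann_neuron(x, w, b):
    """The standard ANN unit: weighted sum of inputs, plus a bias,
    passed through an activation function (here ReLU). Every input
    contributes through the same additive rule."""
    return max(0.0, float(np.dot(w, x) + b))

print(ann_neuron(np.array([1.0, -2.0]), np.array([0.5, 0.25]), 1.0))  # 1.0
```

Note that nothing in this formulation distinguishes one input from another except its scalar weight; there is no notion of input location, which is exactly the contrast the text draws with the pyramidal neuron.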

The different plasticity rules are governed by three principles that contribute to the efficiency, speed and continual-learning capabilities of the cortical network in diverse environments:

First, not all inputs are treated alike. Which plasticity rule is applied depends on the spatial and temporal adjacency of active synapses and on their location on the dendritic tree: different synapses are modified by different plasticity rules as a function of their absolute and relative position on the tree. This dependency on the spatial arrangement of synapses arises because the voltage in dendrites is non-uniform, so synapses at different dendritic locations experience different degrees of depolarization, for example from back-propagating action potentials (bAPs), which are main mediators of the voltage in spike-timing-dependent plasticity (STDP). Voltage non-uniformities are also generated when clustered inputs are activated in specific dendritic compartments. When a small number of spatially clustered inputs is activated, especially in more distal dendritic compartments, dendritic spikes such as NMDA spikes can be generated locally, and their voltage can significantly influence which synapses undergo plasticity, selectively strengthening inputs whose activity is correlated in time and clustered in space. Importantly, inputs from different sources are not randomly distributed along the dendritic tree but tend to terminate in a location-selective manner. Dendritic location therefore reflects the information carried by synapses, which can be strengthened selectively, and unique plasticity rules can potentially be used for different information sources.
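A toy sketch of this first principle, assuming a one-dimensional arrangement of synapses along a branch: a synapse is potentiated only when enough of its spatial neighbors are co-active in the same time window, mimicking NMDA-spike-gated strengthening of clustered inputs. The neighborhood radius, threshold and learning rate are invented for illustration:

```python
import numpy as np

def cluster_plasticity(weights, active, radius=2, lr=0.1):
    """Toy rule: potentiate a synapse only if at least 2 of its spatial
    neighbors (within `radius` positions on a 1-D dendrite) are active in
    the same window, so isolated activity causes no change."""
    n = len(weights)
    new_w = weights.copy()
    for i in range(n):
        if not active[i]:
            continue
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        neighbors = int(active[lo:hi].sum()) - 1   # co-active neighbors, self excluded
        if neighbors >= 2:                          # local "NMDA spike" threshold
            new_w[i] += lr
    return new_w

w = np.ones(10)
active = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0], dtype=bool)
print(cluster_plasticity(w, active))
```

Here the three clustered synapses (positions 0-2) are potentiated while the equally active but isolated synapse at position 8 is not, which is the spatial selectivity an input-location-free ANN weight update cannot express.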

The second principle of synaptic plasticity in biological networks is that, in contrast to ANNs, where most synapses change during the learning of a task, in cortical pyramidal neurons only a small number of synapses relevant to the task is changed. Learning thus involves sparse changes in the network that are specific and energy-efficient. These specific changes can be realized by local plasticity rules that are applied through local signals such as dendritic NMDA spikes. In addition, local structural plasticity mechanisms support these sparse, compartmentalized changes: learning has been shown to be associated with the addition of new, spatially clustered spines. These structural changes involve only a small number of new spines that are task-dependent and tend to cluster on the specific dendritic branches involved in the learning.

Third, the different plasticity rules in pyramidal neurons involve a range of temporal windows, extending from a few milliseconds to several seconds and bridging up to behaviorally relevant time windows. These temporal rules depend on different voltage signals in dendrites, ranging from fast signals such as back-propagating action potentials to local dendritic spikes and plateau-potential events, which can be prolonged, from a few tens of milliseconds up to seconds. The mechanisms for bridging prolonged time windows are inherent to dendrites and include the recruitment of calcium from intracellular stores, prolonged (seconds-long) binding of glutamate to NMDA receptors, and other biochemical cascades. These temporal rules interact strongly with the spatial location of synapses, so different temporal rules may apply to different synapses. In addition, the different time scales of plasticity allow for different causal rules, ranging from Hebbian rules, where synapses are either potentiated or depressed according to the causal order of pre- and postsynaptic activation within a narrow, milliseconds-long time window, to non-Hebbian rules, where local dendritic spikes potentiate spatially clustered inputs within a window of some hundreds of milliseconds and up to several seconds, irrespective of their order of activation. These non-Hebbian rules can not only bridge prolonged time windows but also have the capacity to induce fast learning after only a few exposures, and they are sensitive to neuromodulation, which provides the relevant context for the change.
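The contrast between the narrow, order-sensitive Hebbian window and the long, order-insensitive plateau-gated window can be sketched as two toy weight-change functions of the pre-post time difference. All amplitudes and time constants below are illustrative, not fitted values:

```python
import numpy as np

def stdp(dt_ms, a_plus=0.1, a_minus=0.12, tau_ms=20.0):
    """Classic pair-based STDP: the sign depends on pre-before-post order,
    and the effect decays within a ~tens-of-milliseconds window."""
    if dt_ms > 0:                          # pre before post -> potentiation
        return a_plus * np.exp(-dt_ms / tau_ms)
    return -a_minus * np.exp(dt_ms / tau_ms)   # post before pre -> depression

def plateau_rule(dt_ms, a=0.05, window_ms=2000.0):
    """Non-Hebbian, plateau-gated rule: potentiation for any pre activity
    within a seconds-long window around the dendritic event, regardless
    of temporal order."""
    return a if abs(dt_ms) <= window_ms else 0.0

print(stdp(10.0) > 0, stdp(-10.0) < 0)              # order-sensitive
print(plateau_rule(800.0) == plateau_rule(-800.0))  # order-insensitive
```

Placing both rules on the same neuron, gated by different dendritic voltage signals, is a compact way to express the multi-timescale causal structure described above.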

Taken together, biological cortical neurons contain different plasticity compartments with diverse plasticity rules that may serve to increase the processing capabilities of the network in terms of flexibility, speed and the ability for continual learning without catastrophic forgetting. The different plasticity rules can support different learning and behavioral purposes: while Hebbian STDP rules can serve for learning causality when the accurate timing of events is important (episodic memory), non-Hebbian plasticity can serve for encoding context when associations matter regardless of their temporal sequence.

Recent work in the field of ANNs aims to overcome some important limitations, such as adaptive learning, using multiple methods, some of which are biologically inspired (REF XXX). Examples include introducing autoencoders to improve efficiency; using "replay mechanisms", which take data from previously learnt tasks and interweave them with new training data to overcome catastrophic forgetting; and treating each neuron as a deep artificial neuron (DAN) in a larger neural network, to mimic some of the complex behaviors of real neurons and increase learning capacity. A few very recent ANN studies have successfully used a dendrite-inspired configuration of neurons to solve the interference problem of multi-task learning (Levy and Baxter, 2023; Ahmed XXX; Poirazi). Here we propose that using biologically inspired plasticity rules could further improve the efficiency and speed of ANNs and their ability to cope with learning in ever-changing complex environments, making them more versatile and intelligent.

Guozhang's take for the last paragraph: Recent advances in AI, particularly in continual learning, have drawn inspiration from the brain, primarily through replay-based methods and architecture-based strategies. Replay-based methods, such as experience replay, generative replay, and feature replay, preserve old data distributions and prevent catastrophic forgetting by storing a limited number of past training samples and utilizing generative models or feature-level distributions. Architecture-based approaches focus on incorporating task-specific parameters, employing techniques like parameter allocation, model decomposition, and modular networks. In addition to these approaches, we propose here that AI can be inspired by the brain's biological plasticity rules. Unlike traditional ANNs, where most synapses undergo uniform changes during learning, the brain employs spatially and temporally diverse plasticity rules. These rules allow for the selective strengthening of inputs that are spatially clustered and temporally correlated, enabling efficient learning in continually changing environments. Moreover, biological neurons exhibit sparse and specific changes during learning, emphasizing relevant synapses. These unique plasticity rules bridge different time scales, accommodating both fast and prolonged learning events. Such approaches, influenced by the brain's natural learning mechanisms, have the potential to make artificial systems more versatile and intelligent, adapting to new challenges in real time.
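A minimal sketch of the experience-replay strategy mentioned above: keep a bounded buffer of past training examples and interleave a sample of them with each new batch. The buffer capacity, mixing ratio and replacement policy here are arbitrary choices for illustration:

```python
import random

class ReplayBuffer:
    """Minimal experience-replay buffer: retain a bounded sample of past
    training examples and mix them with new data, the standard
    replay-based defence against catastrophic forgetting."""
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buf = []
        self.rng = random.Random(seed)

    def add(self, example):
        if len(self.buf) < self.capacity:
            self.buf.append(example)
        else:
            # overwrite a random old slot so every past task stays represented
            self.buf[self.rng.randrange(self.capacity)] = example

    def mixed_batch(self, new_examples, k):
        """New examples plus k replayed old ones (fewer if the buffer is small)."""
        return new_examples + self.rng.sample(self.buf, min(k, len(self.buf)))

buf = ReplayBuffer(capacity=100)
for i in range(500):
    buf.add(("task_A", i))                     # stream of old-task examples
batch = buf.mixed_batch([("task_B", 0), ("task_B", 1)], k=4)
print(len(batch))  # 6
```

Training on such mixed batches keeps gradients for the old task flowing even while new-task data dominates the stream, which is the mechanism by which replay prevents forgetting.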

5 Perspective: a new neuron model (Yiota, Guozhang, Matthew, Robert)
• A new model is emerging that appears to provide solutions to many of these problems and may provide a starting point for further research
• Need to justify this solution with a bottom-up argument
• A "wish list" of things we would want AI / neuromorphic engineers to implement
• Outline a concrete research program?
• Propose a specific model that can be used and tested in specific simulations?
• "Ignite the fantasy of machine learning people" (Wolfgang)

• List the most important components of dendritic processing along with their respective functionalities (e.g., use different "events" to bridge different temporal scales; use the multiplicative "calcium spike" to bind input streams in a selective manner) that people can combine in their models (Yiota)
• Probably best to avoid proposing an actual, concrete model (since that is easily attackable/falsifiable)
• Set the groundwork for future reviewers to assess "dendritic" models

References

[Aflalo et al., 2022] Aflalo, T., Chivukula, S., Zhang, C., Rosario, E. R., Pouratian, N., and Andersen, R. A. (2022). Cognition through internal models: Mirror neurons as one manifestation of a broader mechanism. bioRxiv, pages 2022–09.

[Azaria and Mitchell, 2023] Azaria, A. and Mitchell, T. (2023). The internal state of an LLM knows when it's lying. arXiv preprint arXiv:2304.13734.

[Bang et al., 2023] Bang, Y., Cahyawijaya, S., Lee, N., Dai, W., Su, D., Wilie, B., Lovenia, H., Ji, Z., Yu, T., Chung, W., et al. (2023). A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity. arXiv preprint arXiv:2302.04023.

[Dehghani et al., 2018] Dehghani, M., Gouws, S., Vinyals, O., Uszkoreit, J., and Kaiser, L. (2018). Universal transformers. arXiv preprint arXiv:1807.03819.

[Goertzel, 2023] Goertzel, B. (2023). Generative AI vs. AGI: The cognitive strengths and weaknesses of modern LLMs. arXiv preprint arXiv:2309.10371.

[Hahn, 2020] Hahn, M. (2020). Theoretical limitations of self-attention in neural sequence models. Transactions of the Association for Computational Linguistics, 8:156–171.

[Li et al., 2022] Li, K., Hopkins, A. K., Bau, D., Viégas, F., Pfister, H., and Wattenberg, M. (2022). Emergent world representations: Exploring a sequence model trained on a synthetic task. arXiv preprint arXiv:2210.13382.

[McKenna et al., 2023] McKenna, N., Li, T., Cheng, L., Hosseini, M. J., Johnson, M., and Steedman, M. (2023). Sources of hallucination by large language models on inference tasks. arXiv preprint arXiv:2305.14552.

[Tran et al., 2018] Tran, K., Bisazza, A., and Monz, C. (2018). The importance of being recurrent for modeling hierarchical structure. arXiv preprint arXiv:1803.03585.
