Deep Continual Learning: Jawad Tariq Supervised By: Dr. Mohsen Ali
Continual learning: algorithms that keep learning new tasks online, adding new knowledge to the model without sacrificing previously acquired knowledge.
Figure: multi-head vs. single-head architectures — in the multi-head setup each task T1–T5 has its own output head (e.g. "Head for T1", "Head for T3"), while in the single-head setup all tasks share one head.
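To make this distinction concrete, here is a minimal PyTorch-style sketch (the backbone, feature dimension, and class counts are illustrative assumptions): the multi-head model keeps a separate output head per task, while the single-head model shares one output layer over all classes seen so far.

```python
import torch.nn as nn

class MultiHeadModel(nn.Module):
    """Shared backbone with one classification head per task (multi-head setup)."""
    def __init__(self, backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.backbone = backbone
        self.feat_dim = feat_dim
        self.heads = nn.ModuleDict()  # task id -> task-specific head

    def add_task(self, task_id: str, num_classes: int):
        self.heads[task_id] = nn.Linear(self.feat_dim, num_classes)

    def forward(self, x, task_id: str):
        # the task identity must be known at test time to pick the right head
        return self.heads[task_id](self.backbone(x))

class SingleHeadModel(nn.Module):
    """Shared backbone with a single head covering all classes of all tasks."""
    def __init__(self, backbone: nn.Module, feat_dim: int, total_classes: int):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(feat_dim, total_classes)

    def forward(self, x):
        # no task id needed, but old and new classes compete in the same output layer
        return self.head(self.backbone(x))
```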
Continual learning: Timeline
DIFFERENT STRATEGIES
• Model growing
• Replay
• Knowledge distillation
• Regularization
• Parameter isolation
…but the same goal: protect (either explicitly or implicitly) important parameters for prior tasks!
Continual Learning Desiderata
1. Avoid forgetting
• Performance over previous tasks should not decrease
2. Fixed memory and compute
• If not possible, grow sub-linearly with tasks
3. Enable forward transfer
• Knowledge acquired over previous tasks should help learning future tasks
4. Enable backward transfer
• While learning the current task, performance in previous tasks may also increase
5. Do not store examples
• Or store as few as possible
• Model growing: increase the model capacity for every new task
• Regularization: penalize (some) parameter variations
• Knowledge distillation: use the model in a previous training state as a teacher
• Rehearsal (replay): store old inputs and replay them to the model (see the sketch below)
• Hybrid: use multiple approaches at the same time to implement the algorithm
• Drawbacks of model growing:
1. The model grows linearly with the number of trained tasks.
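As a concrete illustration of the rehearsal strategy listed above, here is a minimal sketch of a replay memory (the ReplayBuffer class and its reservoir-sampling policy are illustrative choices, not prescribed by the slides): old inputs are kept under a fixed capacity and mixed back into the batches of the current task.

```python
import random
import torch

class ReplayBuffer:
    """Fixed-capacity memory of (input, label) pairs from previous tasks."""
    def __init__(self, capacity: int = 2000):
        self.capacity = capacity
        self.data = []   # list of (x, y) tensor pairs
        self.seen = 0    # number of examples offered to the buffer so far

    def add(self, x: torch.Tensor, y: torch.Tensor):
        for xi, yi in zip(x, y):
            self.seen += 1
            if len(self.data) < self.capacity:
                self.data.append((xi, yi))
            else:
                # reservoir sampling keeps a uniform sample under fixed memory
                j = random.randrange(self.seen)
                if j < self.capacity:
                    self.data[j] = (xi, yi)

    def sample(self, batch_size: int):
        batch = random.sample(self.data, min(batch_size, len(self.data)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)
```

During training on a new task, each minibatch of new data would be concatenated with a minibatch sampled from the buffer before the usual loss is computed.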
• iCaRL: a representation-learning step uses the stored exemplars in combination with distillation to avoid catastrophic forgetting.
A combination of a classification loss for new samples and a distillation loss for old samples is used (sketched below).
Rebuffi, S. A., Kolesnikov, A., Sperl, G., & Lampert, C. H. (2017). iCaRL: Incremental classifier and representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2001-2010).
• Drawbacks:
1. These methods store examples.
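A hedged sketch of the loss just described, following the binary cross-entropy formulation of the iCaRL paper: ground-truth one-hot targets are used for the new classes, while the previous model's sigmoid outputs serve as distillation targets for the old classes (function and variable names here are illustrative).

```python
import torch
import torch.nn.functional as F

def icarl_loss(logits, targets_onehot, old_logits, num_old_classes):
    """Classification loss on new classes combined with distillation on old classes.

    logits:         current model outputs, shape (batch, num_classes_so_far)
    targets_onehot: one-hot ground-truth labels, same shape
    old_logits:     frozen previous model's outputs on the same batch
    """
    targets = targets_onehot.clone().float()
    if num_old_classes > 0:
        # distillation targets: previous model's sigmoid outputs for the old classes
        targets[:, :num_old_classes] = torch.sigmoid(old_logits[:, :num_old_classes])
    return F.binary_cross_entropy_with_logits(logits, targets)
```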
• The posterior over the weights after the first task becomes the prior over the weights during the second task (the idea behind Elastic Weight Consolidation; Kirkpatrick et al., 2017).
• Other variants have been proposed, changing the way in which the penalties for each parameter are computed:
• Zenke, Friedemann, Ben Poole, and Surya Ganguli. "Continual learning through synaptic intelligence." International
Conference on Machine Learning (2017).
• Aljundi, Rahaf, et al. "Memory aware synapses: Learning what (not) to forget." Proceedings of the European
Conference on Computer Vision (ECCV). 2018.
• Chaudhry, Arslan, et al. "Riemannian walk for incremental learning: Understanding forgetting and
intransigence." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
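A minimal sketch of this idea in its usual quadratic form (an EWC-style penalty in which a diagonal Fisher estimate acts as the per-parameter importance; names and the weighting factor lam are illustrative). The variants cited above differ mainly in how this importance term is computed.

```python
import torch

def ewc_penalty(model, old_params, fisher, lam=1000.0):
    """Quadratic penalty that keeps important parameters close to their old values.

    old_params, fisher: dicts mapping parameter name -> tensor saved after the
    previous task; fisher is the diagonal Fisher information (importance weights).
    """
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for name, p in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty
```

The training loss for the new task is then the usual task loss plus this penalty.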
• For each task, there is a task-specific private module and a task-independent shared module.
The shared module is trained with generative/adversarial modeling, where the generator tries to learn a task-invariant representation.
The shared and task-specific private features are concatenated and passed through a task-specific head (see the sketch below).
Ebrahimi, S., Meier, F., Calandra, R., Darrell, T., & Rohrbach, M. (2020). Adversarial continual learning. In Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XI (pp. 386-402). Springer International Publishing.
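A rough sketch of the forward pass implied by this description (module names and dimensions are assumptions, and the adversarial training of the shared module is omitted): shared task-invariant features and private task-specific features are concatenated and routed to a task-specific head.

```python
import torch
import torch.nn as nn

class ACLStyleModel(nn.Module):
    """Shared (task-invariant) module plus per-task private modules and heads."""
    def __init__(self, in_dim: int = 784, shared_dim: int = 128, private_dim: int = 64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, shared_dim), nn.ReLU())
        self.private = nn.ModuleDict()  # task id -> private feature extractor
        self.heads = nn.ModuleDict()    # task id -> classification head
        self.in_dim, self.shared_dim, self.private_dim = in_dim, shared_dim, private_dim

    def add_task(self, task_id: str, num_classes: int):
        self.private[task_id] = nn.Sequential(nn.Linear(self.in_dim, self.private_dim), nn.ReLU())
        self.heads[task_id] = nn.Linear(self.shared_dim + self.private_dim, num_classes)

    def forward(self, x, task_id: str):
        # concatenate task-invariant and task-specific features, then classify
        feats = torch.cat([self.shared(x), self.private[task_id](x)], dim=1)
        return self.heads[task_id](feats)
```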
Table: per-task results over T1–T5; only the iCaRL entry (89.29) is recoverable from the extracted slide.
• Continual learning is currently a very active research area in machine learning; it is strongly motivated and has many practical applications.
• Study continual learning in graph neural network settings and see how existing methods behave.
No previous work has been done in this direction.
The goal is to see how to set up continual learning on graph datasets.
• Use multiple benchmarks with various other evaluation metrics, including backward and forward transferability (see the sketch below).
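A sketch of how these metrics are commonly computed from an accuracy matrix (a standard formulation, not a protocol fixed by this proposal): R[i, j] is the test accuracy on task j after training up to task i, and b[j] is the accuracy on task j of a model that has never been trained on it.

```python
import numpy as np

def continual_metrics(R: np.ndarray, b: np.ndarray):
    """Average accuracy, backward transfer (BWT) and forward transfer (FWT).

    R: (T, T) matrix, R[i, j] = accuracy on task j after training on tasks 0..i.
    b: (T,) vector, b[j] = accuracy on task j before any training on it (baseline).
    """
    T = R.shape[0]
    avg_acc = R[T - 1].mean()
    # BWT: how later training changed accuracy on earlier tasks (negative = forgetting)
    bwt = np.mean([R[T - 1, j] - R[j, j] for j in range(T - 1)])
    # FWT: how training on earlier tasks helped a task before it was ever trained on
    fwt = np.mean([R[j - 1, j] - b[j] for j in range(1, T)])
    return avg_acc, bwt, fwt
```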
Kirkpatrick, James, et al. "Overcoming catastrophic forgetting in neural networks." Proceedings of the National Academy of Sciences 114.13 (2017): 3521-3526.
Rebuffi, S. A., Kolesnikov, A., Sperl, G., & Lampert, C. H. (2017). iCaRL: Incremental classifier and representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2001-2010).
Li, Z., & Hoiem, D. (2017). Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12), 2935-2947.
Rusu, Andrei A., et al. "Progressive neural networks." arXiv preprint arXiv:1606.04671 (2016).
■ https://www.youtube.com/watch?v=k0kMx4BFLmI&t=388s