Review: Distilling the Knowledge in a Neural Network
Why soft targets? Hard targets discard much of the valuable information contained in the large model's predictions.
ex (Hard): 2 => P(2) = 1, P(3) = 0, P(7) = 0 (MNIST - database of handwritten digits)
ex (Soft): 2 => P(2) = 0.9, P(3) = 10⁻⁶, P(7) = 10⁻⁹ (MNIST - database of handwritten digits)
How to produce a softer probability distribution over classes? Raise the temperature T of the final softmax.
Softmax with temperature: q_i = exp(z_i / T) / sum_j exp(z_j / T), where z_i are the logits.
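A minimal NumPy sketch of the temperature softmax described above (the function name is illustrative): raising T above 1 flattens the distribution, exposing the relative probabilities the model assigns to wrong classes.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Subtract the max logit for numerical stability, then divide by T
    z = (logits - np.max(logits)) / T
    e = np.exp(z)
    return e / e.sum()

logits = np.array([8.0, 2.0, 0.5])
print(softmax(logits, T=1.0))  # sharp, close to a hard one-hot target
print(softmax(logits, T=5.0))  # softer: small classes get visible mass
```

At T = 1 this is the ordinary softmax; as T grows, the output approaches a uniform distribution.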
• Use the same high temperature to train the smaller model to match the soft targets.
• Preliminary experiments on MNIST
• Experiments on speech recognition
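The matching step above can be sketched as a distillation loss: cross-entropy between the teacher's soft targets and the student's predictions, both computed at the same temperature T. The T² factor follows the paper's observation that soft-target gradients scale as 1/T². Function and variable names here are illustrative.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax with a stability shift
    z = (z - z.max()) / T
    e = np.exp(z)
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, T=5.0):
    p = softmax(teacher_logits, T)  # teacher's soft targets
    q = softmax(student_logits, T)  # student's predictions at the same T
    # Cross-entropy, scaled by T^2 to keep gradient magnitudes comparable
    return (T ** 2) * -np.sum(p * np.log(q))
```

In the paper this term is combined with an ordinary cross-entropy on the true hard labels (computed at T = 1), weighted rather lower than the soft-target term.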
Future Research Directions
Using soft targets to prevent specialists from overfitting
What is good about the paper
• Addresses a real problem.
• Shows that a simple concept can help the technology advance.
• Provides plenty of experiments on their idea.
• Validates their idea through experiments.
• Good introduction with real-world examples.
What is bad about the paper
• Requires substantial effort to get a sense of it.
• Needs a more extended literature review.
Distilling the Knowledge in a Neural Network
Thank You