Hanting Chen (1,2), Yunhe Wang (2), Chunjing Xu (2), Boxin Shi (1), Chao Xu (1), Qi Tian (2), Chang Xu (3)
(1) Peking University; (2) Huawei Technologies; (3) The University of Sydney
AdderNet: Do We Really Need Multiplications in Deep Learning? CVPR 2020 conference paper
Background
• The high power consumption of deep networks has prevented modern deep learning systems from being deployed on mobile devices, e.g., smartphones, cameras, and watches.
• Existing methods focus on pruning, quantization, etc., yet the compressed networks still involve massive multiplications.
• CNN layers measure the similarity between input patches and filters by cross-correlation, which is dominated by multiplications:
$Y(m,n,t) = \sum_{i=0}^{d}\sum_{j=0}^{d}\sum_{k=0}^{c_{in}} X(m+i, n+j, k) \times F(i, j, k, t)$
• AdderNet replaces cross-correlation with the negative ℓ1-distance, which requires only additions and subtractions:
$Y(m,n,t) = -\sum_{i=0}^{d}\sum_{j=0}^{d}\sum_{k=0}^{c_{in}} \left| X(m+i, n+j, k) - F(i, j, k, t) \right|$
where X is the input feature map, F is a d × d filter with c_in input channels, and Y is the output.
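The adder layer above can be sketched in a few lines of PyTorch. This is a minimal illustration only: the function name adder2d and its signature are our own for this sketch, and the official repository ships a far more efficient CUDA kernel.

```python
import torch
import torch.nn.functional as F

def adder2d(x, weight, stride=1, padding=0):
    """L1-distance 'convolution': out = -sum |patch - filter|.

    x:      (N, C_in, H, W) input feature map
    weight: (C_out, C_in, k, k) adder filters
    A naive sketch of the AdderNet forward pass, not the
    official implementation.
    """
    n, c_in, h, w = x.shape
    c_out, _, k, _ = weight.shape
    h_out = (h + 2 * padding - k) // stride + 1
    w_out = (w + 2 * padding - k) // stride + 1

    # Extract sliding patches: (N, C_in*k*k, L), L = output positions
    patches = F.unfold(x, k, padding=padding, stride=stride)
    # Flatten filters to (C_out, C_in*k*k)
    filt = weight.view(c_out, -1)

    # Negative L1 distance between every patch and every filter:
    # out[n, o, l] = -sum_d |patches[n, d, l] - filt[o, d]|
    out = -(patches.unsqueeze(1) - filt[None, :, :, None]).abs().sum(dim=2)
    return out.view(n, c_out, h_out, w_out)
```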
• The gradients of AdderNets are sign gradients (±1), which are unsuitable for optimizing neural networks with a huge number of parameters.
• To achieve better performance, we develop a special back-propagation approach for AdderNets by investigating the full-precision gradient (sketched below).
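A minimal sketch of this modified backward pass, written as a hypothetical torch.autograd.Function for a single patch/filter pair: the forward pass computes the negative ℓ1-distance, while the backward pass substitutes the full-precision gradient (x − f) for sign(x − f) on the filter side and clips the input-side gradient to [−1, 1].

```python
import torch

class AdderSimilarity(torch.autograd.Function):
    """Sketch of AdderNet's modified backward pass (one patch/filter pair).

    Forward: y = -sum |x - f|. The exact gradient w.r.t. f would be the
    sign gradient sign(x - f); the backward pass below replaces it with
    the full-precision gradient (x - f), and clips the input-side
    gradient, as described in the paper.
    """

    @staticmethod
    def forward(ctx, x, f):
        ctx.save_for_backward(x, f)
        return -(x - f).abs().sum()

    @staticmethod
    def backward(ctx, grad_output):
        x, f = ctx.saved_tensors
        # dY/dF: full-precision gradient (x - f) instead of sign(x - f)
        grad_f = grad_output * (x - f)
        # dY/dX: clip (f - x) to [-1, 1] so back-propagated errors stay
        # bounded across many stacked adder layers
        grad_x = grad_output * torch.clamp(f - x, -1.0, 1.0)
        return grad_x, grad_f

# Usage: y = AdderSimilarity.apply(x, f) for tensors x, f of equal shape.
```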
• Besides, the norms of the gradients of filters in AdderNets are much smaller than those in CNNs, which could slow down the update of the filters.
• We therefore propose an adaptive learning rate strategy to enhance the training procedure of AdderNets according to the magnitude of each neuron's gradient (see the sketch below).
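The strategy amounts to rescaling each adder layer's gradient before the SGD step. The function below is illustrative: the names adaptive_lr_step and adder_filters, and the default eta=0.1, are assumptions of this sketch rather than details taken from the poster.

```python
import math
import torch

def adaptive_lr_step(adder_filters, global_lr, eta=0.1):
    """One gradient step with an adaptive per-layer learning rate.

    For each adder-layer filter F_l the update is
        F_l <- F_l - global_lr * alpha_l * grad(F_l),
    with alpha_l = eta * sqrt(k) / ||grad(F_l)||_2, where k is the
    number of elements in F_l. eta=0.1 is a placeholder, not a value
    from the source; the repo wraps this logic in a custom optimizer.
    """
    with torch.no_grad():
        for p in adder_filters:
            if p.grad is None:
                continue
            k = p.numel()
            grad_norm = p.grad.norm(p=2).clamp_min(1e-12)  # guard /0
            alpha = eta * math.sqrt(k) / grad_norm.item()
            p.add_(p.grad, alpha=-global_lr * alpha)
```

Scaling by sqrt(k) / ||grad||_2 normalizes each layer's step to a comparable magnitude, compensating for the small gradient norms of adder filters.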
• As a result, the proposed AdderNets achieve performance comparable to CNNs with few multiplications on the CIFAR and ImageNet datasets.
• We visualize the filters, features, and weight distributions of AdderNets (left) and CNNs (right).
Mail: chenhanting@pku.edu.cn
Github: https://github.com/huawei-noah/AdderNet