Professional Documents
Culture Documents
YouTube Playlist
Maziar Raissi
Assistant Professor
fi
maziar.raissi@colorado.edu
Introduction to Convolutional Neural Networks
YouTube Playlist
<latexit sha1_base64="0hPK07QDr9ZadJ6v6irNBezwGaM=">AAACSHicbVBNS8NAEN3Ur1q/oh69LBbBU0lUVDwVeyl4UbG20NSy2WzbpZtN2J2IJeRH+Tf8A+JNL569iTc3tYJWBwYeb94wb54fC67BcZ6swszs3PxCcbG0tLyyumavb1zrKFGUNWgkItXyiWaCS9YADoK1YsVI6AvW9Ie1fN68ZUrzSF7BKGadkPQl73FKwFBd+6zf9WDAgJzgFva4xF5IYOD76WV2k9axBzxkGje/QS3LBbGGCN9Ny4OuXXYqzrjwX+BOQBlN6rxrv3pBRJOQSaCCaN12nRg6KVHAqWBZyUs0iwkdkj5rGyiJsdBJx09neMcwAe5FyrQEPGZ/bqQk1HoU+kaZe9TTs5z8b9ZOoHfcSbmME2CSfh3qJQKbn/MEccAVoyBGBhCquPGK6YAoQsHk/OtKoHNrWckE407H8Bdc71Xcw8r+xUG5ejqJqIi20DbaRS46QlVUR+eogSi6R4/oGb1YD9ab9W59fEkL1mRnE/2qQuETniqyNQ==</latexit>
X 2 RH⇥W ⇥C ! an image
<latexit sha1_base64="67tzYPgWGQJH189svGEeuQIDkhI=">AAACP3icbVDLSsNAFJ3Ud31FXboZLIKrkqioy1I3LlWsLTS1TCbTduhkEmZu1BLyP/6GP+BW/QHdiVt3TmoErV64cDjnPo8fC67BcZ6t0tT0zOzc/EJ5cWl5ZdVeW7/UUaIoa9BIRKrlE80El6wBHARrxYqR0Bes6Q+Pc715zZTmkbyAUcw6IelL3uOUgKG6dr2FPS6xFxIY+H56nl2lJ9gDHjKNm9/gOMOe4v0BEKWiG8OyW0iJxNwMY1nXrjhVZxz4L3ALUEFFnHbtFy+IaBIyCVQQrduuE0MnJQo4FSwre4lmMaFDM7xtoCTmhk46/jXD24YJcC9SJiXgMfuzIyWh1qPQN5X5T3pSy8n/tHYCvaNOymWcAJP0a1EvERginBuHA64YBTEygFDFza2YDogiFIy9v7YEOj8tKxtj3Ekb/oLL3ap7UN0726/U6oVF82gTbaEd5KJDVEMn6BQ1EEV36AE9oifr3nq13qz3r9KSVfRsoF9hfXwCWcmwRw==</latexit>
| {z }
xn
g✓ ! CNN
<latexit sha1_base64="evuGds2hzXJHQciqvnnvMxOCQv8=">AAACHHicbVDJSgNBEO2JW4xb1KMHG4PgKcyoqMdgLp5CBLNAJoSeTiVp0rPQXaOGIUd/wx/wqn/gTbwK/oDfYWc5mMQHBY/3qqiq50VSaLTtbyu1tLyyupZez2xsbm3vZHf3qjqMFYcKD2Wo6h7TIEUAFRQooR4pYL4noeb1iyO/dg9KizC4w0EETZ91A9ERnKGRWtnDbsvFHiCjrhLdHjKlwgfqIjxiUiyVhq1szs7bY9BF4kxJjkxRbmV/3HbIYx8C5JJp3XDsCJsJUyi4hGHGjTVEjPdZFxqGBswH3UzGjwzpsVHatBMqUwHSsfp3ImG+1gPfM50+w56e90bif14jxs5VMxFBFCMEfLKoE0uKIR2lQttCAUc5MIRxJcytlPeYYhxNdjNb2np02jBjgnHmY1gk1dO8c5E/uz3PFa6nEaXJATkiJ8Qhl6RAbkiZVAgnT+SFvJI369l6tz6sz0lryprO7JMZWF+/F72iew==</latexit>
H
<latexit sha1_base64="UNo1T9dTZCrR2zm8p5TSCkm56yE=">AAAB/HicbVDLTsJAFL3FF+ILdemmkZi4Iq0adUl0wxISQRJoyHR6gQnTaTMzNSEN/oBb/QN3xq3/4g/4HU6hCwFPMsnJOffmnjl+zJnSjvNtFdbWNza3itulnd29/YPy4VFbRYmk2KIRj2THJwo5E9jSTHPsxBJJ6HN89Mf3mf/4hFKxSDzoSYxeSIaCDRgl2kjNer9ccarODPYqcXNSgRyNfvmnF0Q0CVFoyolSXdeJtZcSqRnlOC31EoUxoWMyxK6hgoSovHQWdGqfGSWwB5E0T2h7pv7dSEmo1CT0zWRI9Egte5n4n9dN9ODWS5mIE42Czg8NEm7ryM5+bQdMItV8YgihkpmsNh0RSag23SxcCVQWbVoyxbjLNayS9kXVva5eNq8qtbu8oiKcwCmcgws3UIM6NKAFFBBe4BXerGfr3fqwPuejBSvfOYYFWF+/veWVUQ==</latexit>
1 ⇥ 1 Convolution
<latexit sha1_base64="07WnS3P/AiPREp6xTgyMjLuRIV8=">AAACIHicbVBLTsMwFHTKr5RfgSUbixaJVZUUCVhWdMOySPQjtVHlOE5r1bEj26lURTkA1+ACbOEG7BBLOADnwE2zoC0jWRrNvOc3Gi9iVGnb/rIKG5tb2zvF3dLe/sHhUfn4pKNELDFpY8GE7HlIEUY5aWuqGelFkqDQY6TrTZpzvzslUlHBH/UsIm6IRpwGFCNtpGG5Msj+SCTx06oDB5qGREGnCpuCTwWL8ym7ZmeA68TJSQXkaA3LPwNf4DgkXGOGlOo7dqTdBElNMSNpaRArEiE8QSPSN5Qjc9NNsiApvDCKDwMhzeMaZurfjQSFSs1Cz0yGSI/VqjcX//P6sQ5u3YTyKNaE48WhIGZQCzhvBvpUEqzZzBCEJTVZIR4jibA2/S1d8dU8WloyxTirNayTTr3mXNeuHuqVxl1eURGcgXNwCRxwAxrgHrRAG2DwBF7AK3iznq1368P6XIwWrHznFCzB+v4FBUmjYg==</latexit>
Y (i, j) = |{z}
| {z } 0
| {z }
2RC ⇥C
2RC 0 2RC
3 ⇥ 3 Convolution
<latexit sha1_base64="n5CKsI+lGrCsIPvr4FzSVFL2E9Q=">AAACIHicbVBLTsMwFHTKr5RfgSUbixaJVZW0ErCs6IZlkehHaqrKcZzWqmNHtlOpinoArsEF2MIN2CGWcADOgZNmAS0jWRrNvOc3Gi9iVGnb/rQKG5tb2zvF3dLe/sHhUfn4pKtELDHpYMGE7HtIEUY56WiqGelHkqDQY6TnTVup35sRqajgD3oekWGIxpwGFCNtpFG54mZ/JJL4i2oDupqGRMFGFbYEnwkW51N2zc4A14mTkwrI0R6Vv11f4DgkXGOGlBo4dqSHCZKaYkYWJTdWJEJ4isZkYChH5uYwyYIs4IVRfBgIaR7XMFN/byQoVGoeemYyRHqiVr1U/M8bxDq4GSaUR7EmHC8PBTGDWsC0GehTSbBmc0MQltRkhXiCJMLa9Pfniq/SaIuSKcZZrWGddOs156rWuK9Xmrd5RUVwBs7BJXDANWiCO9AGHYDBI3gGL+DVerLerHfrYzlasPKdU/AH1tcPC++jZg==</latexit>
W X X
<latexit sha1_base64="fowvhvbNRw7XNEJtQqPofYe4ch8=">AAAB/HicbVBLSgNBFHwTfzH+oi7dNAbBVZhRUZdBNy4TMB9IhtDT85I06fnQ3SOEIV7Ard7Anbj1Ll7Ac9iTzMIkFjQUVe/xqsuLBVfatr+twtr6xuZWcbu0s7u3f1A+PGqpKJEMmywSkex4VKHgITY11wI7sUQaeALb3vg+89tPKBWPwkc9idEN6DDkA86oNlKj3S9X7Ko9A1klTk4qkKPeL//0/IglAYaaCapU17Fj7aZUas4ETku9RGFM2ZgOsWtoSANUbjoLOiVnRvHJIJLmhZrM1L8bKQ2UmgSemQyoHqllLxP/87qJHty6KQ/jRGPI5ocGiSA6Itmvic8lMi0mhlAmuclK2IhKyrTpZuGKr7Jo05IpxlmuYZW0LqrOdfWycVWp3eUVFeEETuEcHLiBGjxAHZrAAOEFXuHNerberQ/rcz5asPKdY1iA9fUL1c2VYA==</latexit>
<latexit sha1_base64="T0c71ZzNkzcOSZLEKrEhjHgcMM4=">AAAC93icbVHLjtMwFHXCayiPKbBkc0WFWjSlSgoCNkgjumE5IDotakpwHLd1x3Ei2wEqy1u28AfsEFs+hx/gI1jhpF10pr2SpaNzfe65Pk4KzpQOgj+ef+nylavXDq43bty8dfuweefuqcpLSeiQ5DyX4wQrypmgQ800p+NCUpwlnI6Ss0HVH32iUrFcvNOrgk4zPBdsxgjWjoqb/6JSpFQmEhNq3ndYd/nIxiZiAqIM60WSmLf2gxm0rYWXEKkyiw1rQ903j8Nu0A0ja9f8cpffmj3q9I9Yuwv9o2V7vwVEmmVUwcB5bSvHHQWs0ipY7hcPbBeiWS4x51A/ASLJ5guNpcw/u7H0izYFljijmkpQCyyZmNu42Qp6QV2wC8INaKFNncTNv1GakzKjQhOOlZqEQaGnBkvNCKe2EZWKFpic4TmdOCicn5qa+o8sPHRMCm5Jd4SGmt1WGJwptcoSd7N6mLrYq8h9vUmpZy+mhomi1FSQtdGs5KBzqD4cUiYp0XzlACaSuV2BuAQwcVmcd0lVtZptuGDCizHsgtN+L3zWe/Lmaev41SaiA3QfPUAdFKLn6Bi9RidoiIj30fvqffO++yv/h//T/7W+6nsbzT10rvzf/wEuoO4I</latexit>
2R C0 2RC 0 ⇥C 2RC
s ! stride
<latexit sha1_base64="pZx4TjTMSIzM9wmSv3mAPUtOnoc=">AAACGHicbVDJSgNBEO1xjXGLesylMQiewoyKegx68RjBLJAMoaenJmnSs9Bdo4YhB3/DH/Cqf+BNvHrzB/wOO8vBJD4oeLxXRVU9L5FCo21/W0vLK6tr67mN/ObW9s5uYW+/ruNUcajxWMaq6TENUkRQQ4ESmokCFnoSGl7/euQ37kFpEUd3OEjADVk3EoHgDI3UKRQ1bSvR7SFTKn6gbYRHzDQq4cOwUyjZZXsMukicKSmRKaqdwk/bj3kaQoRcMq1bjp2gmzGFgksY5tuphoTxPutCy9CIhaDdbPzEkB4ZxadBrExFSMfq34mMhVoPQs90hgx7et4bif95rRSDSzcTUZIiRHyyKEglxZiOEqG+UMBRDgxhXAlzK+U9phhHk9vMFl+PThvmTTDOfAyLpH5Sds7Lp7dnpcrVNKIcKZJDckwcckEq5IZUSY1w8kReyCt5s56td+vD+py0LlnTmQMyA+vrF5b/oTo=</latexit>
<latexit sha1_base64="/xZwqSHLnke7J4r8+stf9NH4Wio=">AAACJnicbVDLTsJAFJ3iC/FVdelmImhgQ1pI1I0JgYUu0cgjgYZMhylMmD4yMzUhDd/gb/gDbvUP3BnjzpXf4bR0IeBZnZxzb+65xw4YFdIwvrTM2vrG5lZ2O7ezu7d/oB8etYUfckxa2Gc+79pIEEY90pJUMtINOEGuzUjHnjRiv/NIuKC+9yCnAbFcNPKoQzGSShropUIDXsNqARbvb+ol6HOYCKYS6gzhCeyfw86YSlIa6HmjbCSAq8RMSR6kaA70n/7Qx6FLPIkZEqJnGoG0IsQlxYzMcv1QkECdQCPSU9RDLhFWlLw0g2dKGUJH5XF8T8JE/bsRIVeIqWurSRfJsVj2YvE/rxdK58qKqBeEknh4fsgJGZQ+jPuBQ8oJlmyqCMKcqqwQjxFHWKoWF64MRRxtllPFmMs1rJJ2pWxelKt3lXytnlaUBSfgFBSBCS5BDdyCJmgBDJ7AC3gFb9qz9q59aJ/z0YyW7hyDBWjfvyAaoLo=</latexit>
<latexit sha1_base64="/xZwqSHLnke7J4r8+stf9NH4Wio=">AAACJnicbVDLTsJAFJ3iC/FVdelmImhgQ1pI1I0JgYUu0cgjgYZMhylMmD4yMzUhDd/gb/gDbvUP3BnjzpXf4bR0IeBZnZxzb+65xw4YFdIwvrTM2vrG5lZ2O7ezu7d/oB8etYUfckxa2Gc+79pIEEY90pJUMtINOEGuzUjHnjRiv/NIuKC+9yCnAbFcNPKoQzGSShropUIDXsNqARbvb+ol6HOYCKYS6gzhCeyfw86YSlIa6HmjbCSAq8RMSR6kaA70n/7Qx6FLPIkZEqJnGoG0IsQlxYzMcv1QkECdQCPSU9RDLhFWlLw0g2dKGUJH5XF8T8JE/bsRIVeIqWurSRfJsVj2YvE/rxdK58qKqBeEknh4fsgJGZQ+jPuBQ8oJlmyqCMKcqqwQjxFHWKoWF64MRRxtllPFmMs1rJJ2pWxelKt3lXytnlaUBSfgFBSBCS5BDdyCJmgBDJ7AC3gFb9qz9q59aJ/z0YyW7hyDBWjfvyAaoLo=</latexit>
<latexit sha1_base64="CMnLzqdN4dICz9MDT04MS12iBaE=">AAACMHicbVDLSgNBEJz1bXxFPXoZDIKCWXYV1KPoxWMEo4EkhN7ZThycnVlmZsWw5EP8DX/Aq/6BnsSLB7/CyeNgjA0NRVV3dVNRKrixQfDuTU3PzM7NLywWlpZXVteK6xvXRmWaYZUpoXQtAoOCS6xabgXWUo2QRAJvorvzvn5zj9pwJa9sN8VmAh3J25yBdVSreNgYeOQa415FKWfTobvod/x9msBDmYKMKTgD6GA5Hep7rWIp8INB0UkQjkCJjKrSKn41YsWyBKVlAoyph0Fqmzloy5nAXqGRGUyB3bkjdQclJGia+eCxHt1xTEzbSruWlg7Y3xs5JMZ0k8hNJmBvzV+tT/6n1TPbPmnmXKaZRcmGh9qZoFbRflI05hqZFV0HgGnufqXsFjQw6/IcuxKb/mu9ggsm/BvDJLg+8MMj//DyoHR6NopogWyRbbJLQnJMTskFqZAqYeSRPJMX8uo9eW/eh/c5HJ3yRjubZKy87x8S3al9</latexit>
0 0
<latexit sha1_base64="cgfVuOY4+sL/j9cwKAEJ+4tPHy0=">AAACx3icdVHLjtMwFHXCayivAks2V1SoRROqBBCwQRoxG9gNiHYGmhIcx23dcezIdoZWlhf8Hn/AD/AdOGkXM1O4kqWjc699zj3OK860iePfQXjl6rXrN/Zudm7dvnP3Xvf+g7GWtSJ0RCSX6iTHmnIm6Mgww+lJpSguc06P89PDpn98RpVmUnw264pOSzwXbMYINp7Kur/SWhRU5QoTar8OWLR86jKbMgFpic0iz+0n980e9p2Dtw21yizrQ9u3z5IojpLUuQ2/3OXPvf1loIHts36kYbm/7P9PJYJ0JhXmHFovkCo2XxislPwBqaErY4UEXBSssY85VFjhkhq/ocu6vXgYtwW7INmCHtrWUdb9kxaS1CUVhnCs9SSJKzO1WBlGOHWdtNa0wuQUz+nEQ+GF9NS2mTt44pkCvFd/hIGWPX/D4lLrdZn7yWZFfbnXkP/qTWozezO1TFS1oYJshGY1ByOh+UAomKLE8LUHmCifAwGy8CmQJoQLKoVurLmODya5HMMuGD8fJq+GLz6+7B2820a0hx6hx2iAEvQaHaD36AiNEAniYBxkwffwQyjDs3C1GQ2D7Z2H6EKFP/8CfuTb6w==</latexit>
Z(i, j) != normalize
max max Y (si + i , sj + j ), 8(i, j) ! no additional parameters
<latexit sha1_base64="I+cIXyOtu5k4rrMXjER44243/s8=">AAACKnicbVDLSgNBEJz1bXxFPXoZDIIX467viyB68ahgNJCEMDvpJENmZ5aZXjUu+xX+hj/gVf/Am3gVv8NJ3IOvgoaiqpvurjCWwqLvv3ojo2PjE5NT04WZ2bn5heLi0qXVieFQ4VpqUw2ZBSkUVFCghGpsgEWhhKuwdzLwr67BWKHVBfZjaESso0RbcIZOahY3qvSQ1hFuMa1ubu3uZrRuRKeLzBh9kxtKm4hJcQdZs1jyy/4Q9C8JclIiOc6axY96S/MkAoVcMmtrgR9jI2UGBZeQFeqJhZjxHutAzVHFIrCNdPhWRtec0qJtbVwppEP1+0TKImv7Ueg6I4Zd+9sbiP95tQTbB41UqDhBUPxrUTuRFDUdZERbwgBH2XeEcSPcrZR3mWEcXZI/trTs4LSs4IIJfsfwl1xulYO98vb5TunoOI9oiqyQVbJOArJPjsgpOSMVwsk9eSRP5Nl78F68V+/tq3XEy2eWyQ94759oYqfS</latexit>
X = X/255
| {z } i 2{ 1,0,1} j 2{ 1,0,1} |
0 0 {z }
Training CData
<latexit sha1_base64="xLEohGksaBrelpOKInVKXS74iwI=">AAACFXicbVDLSsNAFJ3UV62vqBvBzWARXJWkgros6sJlhbYW2lBuJtN26GQSZiZCCfU3/AG3+gfuxK1rf8DvcJpmYVsPDBzOuXfO5fgxZ0o7zrdVWFldW98obpa2tnd29+z9g5aKEklok0Q8km0fFOVM0KZmmtN2LCmEPqcP/uhm6j88UqlYJBp6HFMvhIFgfUZAG6lnH3WzP1JJg0lDAhNMDPAtaOjZZafiZMDLxM1JGeWo9+yfbhCRJKRCEw5KdVwn1l4KUjPC6aTUTRSNgYxgQDuGCgip8tIsfYJPjRLgfiTNExpn6t+NFEKlxqFvJkPQQ7XoTcX/vE6i+1deykScaCrILKifcKwjPK0DB0xSovnYECCSmVsxGYIEok1pcymBmp42KZli3MUalkmrWnEvKuf31XLtOq+oiI7RCTpDLrpENXSH6qiJCHpCL+gVvVnP1rv1YX3ORgtWvnOI5mB9/QK2zp+i</latexit>
Vector Representation
<latexit sha1_base64="09GsjJo767DJJWgiB0A4NjFj2x4=">AAACHXicbVC7TsMwFHV4lvIqMLJYVEhMVVIkYKxgYSyIPqQ2qhznprXqxJHtIFVRV36DH2CFP2BDrIgf4Dtw0gy05U5H59zHuceLOVPatr+tldW19Y3N0lZ5e2d3b79ycNhWIpEUWlRwIbseUcBZBC3NNIduLIGEHoeON77J9M4jSMVE9KAnMbghGUYsYJRoQw0quJ/vSCX40zZQLSS+B7NBQaSLlqpds/PCy8ApQBUV1RxUfvq+oEloFlBOlOo5dqzdlEjNKIdpuZ8oiAkdkyH0DIxICMpNcxdTfGoYHwfGRSAijXP270RKQqUmoWc6Q6JHalHLyP+0XqKDKzdlUZxoiOjsUJBwrAXOYsE+k+Z7PjGAUMmMV0xHRBKqTXhzV3yVWZuWTTDOYgzLoF2vORe187t6tXFdRFRCx+gEnSEHXaIGukVN1EIUPaEX9IrerGfr3fqwPmetK1Yxc4Tmyvr6BegYo4I=</latexit>
0 C 0
2R 2R
{(Xn , yn ) : n = 1, . . . , N }
<latexit sha1_base64="ruODafN6+oBzwshTFRonCg3dlL8=">AAACHXicbVDLSgMxFM3UV62vUZdugkWoUMqMioogFN24kgq2FjplyGTSNjSTDElGKEO3/oY/4Fb/wJ24FX/A7zDTdmFbD1w4nHMv994TxIwq7TjfVm5hcWl5Jb9aWFvf2Nyyt3caSiQSkzoWTMhmgBRhlJO6ppqRZiwJigJGHoL+deY/PBKpqOD3ehCTdoS6nHYoRtpIvg29tNT0eRkOfH54ATm8hG4ZeiwUWpXhrTf07aJTcUaA88SdkCKYoObbP14ocBIRrjFDSrVcJ9btFElNMSPDgpcoEiPcR13SMpSjiKh2OvpkCA+MEsKOkKa4hiP170SKIqUGUWA6I6R7atbLxP+8VqI75+2U8jjRhOPxok7CoBYwiwWGVBKs2cAQhCU1t0LcQxJhbcKb2hKq7LRhwQTjzsYwTxpHFfe0cnx3UqxeTSLKgz2wD0rABWegCm5ADdQBBk/gBbyCN+vZerc+rM9xa86azOyCKVhfvyBooAY=</latexit>
<latexit sha1_base64="jYC4lTAk2nRwWysBZNDP4ACPKj8=">AAACJHicbVC7SgNBFJ2NrxhfUUubwSBqYdiNoJZBG4sEIphESEKYnb3RIbOzy8xdISz5BH/DH7DVP7ATCxtLv8PJo1DjgQuHc1+H48dSGHTdDyczN7+wuJRdzq2srq1v5De3GiZKNIc6j2Skb3xmQAoFdRQo4SbWwEJfQtPvX4z6zXvQRkTqGgcxdEJ2q0RPcIZW6ub32+MbqYZgWE0kiqMKG4CmNbD3Y9SRogfVSu2wmy+4RXcMOku8KSmQKWrd/Fc7iHgSgkIumTEtz42xkzKNgksY5tqJgZjxPruFlqWKhWA66djMkO5ZJaC9SNtSSMfqz42UhcYMQt9OhgzvzN/eSPyv10qwd9ZJhYoTBMUnj3qJpBjRUTo0EBo4yoEljGthvVJ+xzTjaDP89SUwI2vDnA3G+xvDLGmUit5J8fiqVCifTyPKkh2ySw6IR05JmVySGqkTTh7IE3kmL86j8+q8Oe+T0Ywz3dkmv+B8fgOLraUz</latexit>
<latexit sha1_base64="NwDOKo3eoxXVYPCh0r+oRlFOBgQ=">AAACUXicbZBNixNBEIYr49ea9SPq0UtjEBKUMLOKirCw6MWDh1U2yUImDDU9NZtme3qG7hpJGOaP+Tc8efMk6D/wZufj4O5a0PDyvlVU9ZNWWjkOw++d4Nr1Gzdv7d3u7t+5e+9+78HDiStrK2ksS13a0xQdaWVozIo1nVaWsEg1TdPz9+t8+oWsU6U54VVF8wLPjMqVRPZW0jupkiZGXS3wuYhTYmwHy6E4FDHTkhtX5lzgsh1MRVybjGxqUVKzDT/Tx3E7mIileCZw2CbN4dtF63U6THr9cBRuSlwV0U70YVfHSe9nnJWyLsiw1OjcLAornjdoWUlNbTeuHVUoz/GMZl4aLMjNm83vW/HUO5nIS+ufYbFx/51osHBuVaS+s0BeuMvZ2vxfNqs5fzNvlKlqJiO3i/JaCy7FGqXIlCXJeuUFSqv8rUIu0BNiD/zClsytT2u7Hkx0GcNVMTkYRa9GLz697B+92yHag8fwBAYQwWs4gg9wDGOQ8BV+wC/43fnW+RNAEGxbg85u5hFcqGD/L6k5sv8=</latexit>
| {z }
Multi-column Deep Neural Networks
for Image Classi cation YouTube Video
prediction of column j
! Latin characters
! Chinese characters
Ciregan, Dan, Ueli Meier, and Jürgen Schmidhuber. "Multi-column deep neural networks for image classi cation." 2012 IEEE conference on computer vision and
pattern recognition. IEEE, 2012.
fi
fi
ImageNet Classi cation with Deep
Convolutional Neural Networks YouTube Playlist
“To learn about thousands of objects from millions of images, we need a model with a large
learning capacity. However, the immense complexity of the object recognition task means that
this problem cannot be speci ed even by a dataset as large as ImageNet, so our model should
also have lots of prior knowledge to compensate for all the data we don’t have.”
“CNNs make strong and mostly correct assumptions about the nature of images (namely,
stationarity of statistics and locality of pixel dependencies).”
X
Architecture: I 2 R224⇥224⇥3 7! p 2 {p 2 R1000 : pi = 1, pi 0 8i}
i
1x1 Conv: X 2 RH⇥W ⇥C 7! Y 2 RH⇥W ⇥F N
X
Y (x, y) = W X(x, y), W 2 RF ⇥C , X(x, y) 2 RC , Y (x, y) 2 RF Loss: L(✓) = log(pin (In ))
n=1
X X
3x3 Conv: Y (x, y) = W (i, j)X(s(x 2) + i, s(y 2) + j), W 2 R3⇥3⇥F ⇥C
i=1,2,3 j=1,2,3
Krizhevsky, Alex, Ilya Sutskever, and Geo rey E. Hinton. "Imagenet classi cation with deep convolutional neural
networks." Advances in neural information processing systems. 2012.
fi
fi
ff
fi
Dropout: A Simple Way to Prevent
Neural Networks from Over tting YouTube Playlist
Hinton, Geo rey E., et al. "Improving neural networks by preventing co-adaptation of feature detectors." arXiv preprint arXiv:1207.0580 (2012).
Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from over tting." The journal of machine learning research 15.1 (2014): 1929-1958.
ff
fi
fi
Network In Network
YouTube Playlist
1) MLP Convolution Layers (1x1 Convolutions) Input patch centered at location (i, j)
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." arXiv preprint arXiv:1312.4400 (2013).
Very Deep Convolutional Networks for
Large-Scale Image Recognition YouTube Playlist
Data Augmentation:
1) Image translations (random crops) and horizontal re ection
2) Altering the intensities of the RGB channels in training images (object identity is invariant
to changes in the intensity and color of the illumination )
random variable drawn from a Gaussian with mean zero and standard deviation 0.1
R G B T T
Ixy = [Ixy , Ixy , Ixy ] + [p1 , p2 , p3 ][↵1 1 , ↵2 2 , ↵3 3 ]
3) Scale jittering
Each training image is individually rescaled by randomly sampling S from a certain range [Smin , Smax ]
S is the smallest side of an isotropically-rescaled training image.
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
fl
Going Deeper with Convolutions
YouTube Playlist
Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
Batch Normalization: Accelerating Deep Network
Training by Reducing Internal Covariate Shift YouTube Playlist
Mini-network replacing
{
Replace q(k|x) with q 0 (k|x) = (1 ✏)q(k|x) + ✏u(k)
label-smoothing regularization
n=7 x ! training example
H(q 0 , p) = (1 ✏)H(q, p) + ✏H(u, p)
k 2 {1, . . . , K} ! label
K H(u, p) = KL(u||p) + H(u)
X
p(k|x) = exp(zk )/ exp(zi ) u = 1/K
i=1
zi ! logits (un-normalized log-probabilities)
q(k|x) ! ground truth distribution over labels
k ⇤ ! ground truth label
⇢
0 if k 6= k ⇤ ;
q(k|x) = k⇤ (k) = ⇤
K 1 if k = k .
X
cross-entropy loss L= q(k|x) log(p(k|x)) = log(p(k ⇤ |x)) ! negative log-likelihood
H(q, p) k=1
Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE conference on computer vision and pattern recognition.
2016.
fi
Training Very Deep Networks
YouTube Video
Highway Networks
<latexit sha1_base64="bltVsaAtI6J+DMBb5J6xxKAs7u8=">AAACBHicbVC7TsMwFHV4lvIKMHaxqJCYqqQDMFawdEJFog+pjSrHuWmtOg/ZDlUUdWDhV1gYQIiVj2Djb3DTDNBypCsdnXOvfe9xY86ksqxvY219Y3Nru7RT3t3bPzg0j447MkoEhTaNeCR6LpHAWQhtxRSHXiyABC6Hrju5mfvdBxCSReG9SmNwAjIKmc8oUVoampVB/kYmwJs12Wg8JSm+BTWNxEQOzapVs3LgVWIXpIoKtIbm18CLaBJAqCgnUvZtK1ZORoRilMOsPEgkxIROyAj6moYkAOlk+QIzfKYVD/uR0BUqnKu/JzISSJkGru4MiBrLZW8u/uf1E+VfORkL40RBSBcf+QnHKsLzRLDHBFDFU00IFUzviumYCEKVzq2sQ7CXT14lnXrNvqjV7+rVxnURRwlV0Ck6Rza6RA3URC3URhQ9omf0it6MJ+PFeDc+Fq1rRjFzgv7A+PwBoyGYug==</latexit>
<latexit sha1_base64="BQDeInhVwFPHb9E1l+LnZNXiusA=">AAAB/XicbVDJSgNBEO2JW4xbXG5eGhMhggwzQdRj0IvHCNkgCaGnU0ma9Cx014hxCP6KFw+KePU/vPk3dpaDRh8UPN6roqqeF0mh0XG+rNTS8srqWno9s7G5tb2T3d2r6TBWHKo8lKFqeEyDFAFUUaCERqSA+Z6Euje8nvj1O1BahEEFRxG0fdYPRE9whkbqZA9aCPeYFIQN9inNe51K/mTcyeYc25mC/iXunOTIHOVO9rPVDXnsQ4BcMq2brhNhO2EKBZcwzrRiDRHjQ9aHpqEB80G3k+n1Y3pslC7thcpUgHSq/pxImK/1yPdMp89woBe9ifif14yxd9lORBDFCAGfLerFkmJIJ1HQrlDAUY4MYVwJcyvlA6YYRxNYxoTgLr78l9SKtntuF2/PcqWreRxpckiOSIG45IKUyA0pkyrh5IE8kRfyaj1az9ab9T5rTVnzmX3yC9bHNx+Mk7c=</latexit>
<latexit sha1_base64="h6NO6gJ/XLhZv/qRNdyEBR0psCs=">AAAB+3icbVDJSgNBEO2JW4xbjEcvjYkQQYaZIOox6MVjhGyQhNDTqSRNeha6ayRhyK948aCIV3/Em39jZzlo4oOCx3tVVNXzIik0Os63ldrY3NreSe9m9vYPDo+yx7m6DmPFocZDGaqmxzRIEUANBUpoRgqY70loeKP7md94AqVFGFRxEkHHZ4NA9AVnaKRuNtdGGGNSFDbYl7RQLVxMu9m8Yztz0HXiLkmeLFHpZr/avZDHPgTIJdO65ToRdhKmUHAJ00w71hAxPmIDaBkaMB90J5nfPqXnRunRfqhMBUjn6u+JhPlaT3zPdPoMh3rVm4n/ea0Y+7edRARRjBDwxaJ+LCmGdBYE7QkFHOXEEMaVMLdSPmSKcTRxZUwI7urL66Rest1ru/R4lS/fLeNIk1NyRorEJTekTB5IhdQIJ2PyTF7JmzW1Xqx362PRmrKWMyfkD6zPH6R8kuI=</latexit>
(i.e., bT ) (i.e., T )
y = H(x; WH ) ! a single layer
<latexit sha1_base64="1tfWuKE2xJ/qwIha+zMYBHgADKM=">AAACGHicbVDLSgNBEJz1bXxFPXoZDIJe4q6ICiKIXnJUMEZIQuiddJLB2dllplddlnyGF3/FiwdFvHrzb5zEHHwVNBRV3XR3hYmSlnz/wxsbn5icmp6ZLczNLywuFZdXLm2cGoFVEavYXIVgUUmNVZKk8CoxCFGosBZenw782g0aK2N9QVmCzQi6WnakAHJSq7id8SNe2bw75LVWZYs3jOz2CIyJb3mD8I5y4FbqrkKuIEPTbxVLftkfgv8lwYiU2AhnreJ7ox2LNEJNQoG19cBPqJmDISkU9guN1GIC4hq6WHdUQ4S2mQ8f6/MNp7R5JzauNPGh+n0ih8jaLApdZwTUs7+9gfifV0+pc9DMpU5SQi2+FnVSxSnmg5R4WxoUpDJHQBjpbuWiBwYEuSwLLoTg98t/yeVOOdgr75zvlo5PRnHMsDW2zjZZwPbZMauwM1Zlgt2zR/bMXrwH78l79d6+Wse80cwq+wHv/RPmlJ8m</latexit>
<latexit sha1_base64="gngleh6CltvwxD8+C5iCkautMc8=">AAACeXicbVFNb9NAEF2brxK+AhzhsJAGhQosO0htjxVcciwSaSolUTTejJNV17vW7rglWPkP/Lbe+CNcuLBOfCgpI6309Gbe7MybtFDSURz/CsI7d+/df7D3sPXo8ZOnz9rPX5w5U1qBQ2GUsecpOFRS45AkKTwvLEKeKhylF1/q/OgSrZNGf6NVgdMcFlpmUgB5atb+2R3wiZWLJYG15opPCL9TpY3+WHcEy8mCdpmx+UbAC7CQI1n5A+c8XfH90Wywv271MFpEHzhkmVftajKjlLna1gO/0RsEycumptSiBu9n7U4cxZvgt0HSgA5r4nTWvp7MjShz1CQUODdO4oKmFViSQuG6NSkdFiAuYIFjD7Uf302rjXNr3vXM3A9o/dPEN+xNRQW5c6s89ZV+maXbzdXk/3LjkrLjaSV1URJqsf0oKxUnw+sz8Lm0KEitPABhpZ+Vi6X3VpA/VsubkOyufBuc9aPkMOp/7XdOPjd27LFX7C3rsYQdsRM2YKdsyAT7HbwOusG74E/4JuyFB9vSMGg0L9k/EX76Cz7RwIc=</latexit>
element-wise multiplication
<latexit sha1_base64="jqZTPQnnHZrzUBFgRha1qEBPS+0=">AAACC3icbVC7SgNBFJ31GeNr1dJmSBBsDLtB1DJoYxnBPCBZwuzkJhky+2DmrhqW9Db+io2FIrb+gJ1/4yTZQhMvDBzOOffeucePpdDoON/W0vLK6tp6biO/ubW9s2vv7dd1lCgONR7JSDV9pkGKEGooUEIzVsACX0LDH15N9MYdKC2i8BZHMXgB64eiJzhDQ3XsQhvhAVOQEECIJ/dCAw0SicIsn3nGHbvolJxp0UXgZqBIsqp27K92N+LJZCCXTOuW68TopUyh4BLG+XaiIWZ8yPrQMjBkAWgvnd4ypkeG6dJepMwLkU7Z3x0pC7QeBb5xBgwHel6bkP9prQR7F14qwjhBCPlsUS+RFCM6CYZ2hQKOcmQA40qYv1I+YIpxNPHlTQju/MmLoF4uuWel8s1psXKZxZEjh6RAjolLzkmFXJMqqRFOHskzeSVv1pP1Yr1bHzPrkpX1HJA/ZX3+AIHsm/w=</latexit>
T ! transform gate
<latexit sha1_base64="KadYP5ne/qGVzRZmTTsZjALivOQ=">AAACDHicbVDLSgNBEJz1GeMr6tHLYBA8hV0R9Rj04jFCHkISpHfSSQZnZ5eZXjUs+QAv/ooXD4p49QO8+TdO4h58FQwUVdX0dIWJkpZ8/8ObmZ2bX1gsLBWXV1bX1ksbm00bp0ZgQ8QqNhchWFRSY4MkKbxIDEIUKmyFV6cTv3WNxspY12mUYDeCgZZ9KYCcdFkq13nHyMGQwJj4hncIbykjA9r2YxPxARCOXcqv+FPwvyTISZnlqF2W3ju9WKQRahIKrG0HfkLdDAxJoXBc7KQWExBXMMC2oxoitN1sesyY7zqlx9129zTxqfp9IoPI2lEUumQENLS/vYn4n9dOqX/czaROUkItvhb1U8Up5pNmeE8aFKRGjoAw0v2ViyEYEOT6K7oSgt8n/yXN/UpwWNk/PyhXT/I6Cmyb7bA9FrAjVmVnrMYaTLA79sCe2LN37z16L97rV3TGy2e22A94b5+eiJv/</latexit>
C ! carry gate
<latexit sha1_base64="bT7DnABtmn1FLbrm3J9RELyWJD8=">AAACCHicbVC7SgNBFJ31GeNr1dLCwSBYhd0gahlMYxnBPCAJYXZykwyZfTBzV12WlDb+io2FIrZ+gp1/4yTZQhMPXDhzzr3MvceLpNDoON/W0vLK6tp6biO/ubW9s2vv7dd1GCsONR7KUDU9pkGKAGooUEIzUsB8T0LDG1UmfuMOlBZhcItJBB2fDQLRF5yhkbr2UYW2lRgMkSkV3tM2wgOm3DwSOmAI465dcIrOFHSRuBkpkAzVrv3V7oU89iFALpnWLdeJsJMyhYJLGOfbsYaI8REbQMvQgPmgO+n0kDE9MUqP9kNlKkA6VX9PpMzXOvE90+kzHOp5byL+57Vi7F92UhFEMULAZx/1Y0kxpJNUaE8o4CgTQxhXwuxK+ZApxtFklzchuPMnL5J6qeieF0s3Z4XyVRZHjhySY3JKXHJByuSaVEmNcPJInskrebOerBfr3fqYtS5Z2cwB+QPr8wcY/JoL</latexit>
C=1 T
<latexit sha1_base64="l3zbcwg3aAJQxfOBfHVN6leZ7Io=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRbBi2W3iHoRir14rNAvaZeSTbNtaJJdkqxQlv4KLx4U8erP8ea/MW33oK0PBh7vzTAzL4g508Z1v53c2vrG5lZ+u7Czu7d/UDw8aukoUYQ2ScQj1QmwppxJ2jTMcNqJFcUi4LQdjGszv/1ElWaRbJhJTH2Bh5KFjGBjpccaukUeukCNfrHklt050CrxMlKCDPV+8as3iEgiqDSEY627nhsbP8XKMMLptNBLNI0xGeMh7VoqsaDaT+cHT9GZVQYojJQtadBc/T2RYqH1RAS2U2Az0sveTPzP6yYmvPFTJuPEUEkWi8KEIxOh2fdowBQlhk8swUQxeysiI6wwMTajgg3BW355lbQqZe+qXHm4LFXvsjjycAKncA4eXEMV7qEOTSAg4Ble4c1Rzovz7nwsWnNONnMMf+B8/gDylI6N</latexit>
<latexit sha1_base64="n/vD4R9Sr6XuUROe9uG+3sbYwlI=">AAACBHicbVDLSsNAFJ3UV62vqMtuBovQIpSkiLoRim5cVkgf0IYwmU7aoZNJmJlIS+jCjb/ixoUibv0Id/6N0zYLbT1w4XDOvdx7jx8zKpVlfRu5tfWNza38dmFnd2//wDw8askoEZg0ccQi0fGRJIxy0lRUMdKJBUGhz0jbH93O/PYDEZJG3FGTmLghGnAaUIyUljyz6JTHFXgNe5IOQgTLbc+BY3gGfc+peGbJqlpzwFViZ6QEMjQ886vXj3ASEq4wQ1J2bStWboqEopiRaaGXSBIjPEID0tWUo5BIN50/MYWnWunDIBK6uIJz9fdEikIpJ6GvO0OkhnLZm4n/ed1EBVduSnmcKMLxYlGQMKgiOEsE9qkgWLGJJggLqm+FeIgEwkrnVtAh2Msvr5JWrWpfVGv356X6TRZHHhTBCSgDG1yCOrgDDdAEGDyCZ/AK3own48V4Nz4WrTkjmzkGf2B8/gBX+JVa</latexit>
T (x) = (WT x + bT )
<latexit sha1_base64="jf3tihwCQvUVpuyihMoF4AzemUY=">AAACOXicbVBNbxMxEPW2QNNAaVqOXCyiqu0h0W5BbS+VIrhwDFLSRkqidNY7Sax67ZU9W4hW6c/qhX/BDYkLhyLElT+A84FEG55k6c17MxrPizMlHYXh12Bt/dHjJxulzfLTZ1vPtys7u+fO5FZgWxhlbCcGh0pqbJMkhZ3MIqSxwov46t3Mv7hG66TRLZpk2E9hpOVQCiAvDSrNeNDiZ7wW8R7hJypujJ3y2uu/ldSSJCg14Qex9GsSTuYj2MRxGiO/vBRg7WR/n8c4hmtp7OF0UKmG9XAOvkqiJamyJZqDypdeYkSeoiahwLluFGbUL8CSFAqn5V7uMANxBSPseqohRdcv5pdP+V4++9PQWP808bn670QBqXOTNPadKdDYPfRm4v+8bk7D034hdZYTarFYNMyVP5/PYuSJtCjI55JIENaHJLgYgwVBPuyyDyF6ePIqOT+qR8f1ow9vqo23yzhK7CV7xQ5YxE5Yg71nTdZmgt2yb+yO/Qg+B9+Dn8GvRetasJx5we4h+P0H9f+sfQ==</latexit>
<latexit sha1_base64="nBASIm9oRMCGhVPVDfPfdaoa9Lg=">AAAB/XicbVDLSsNAFJ3UV62v+Ni5GSxCBQlJF+qy6MZlBfuANpTJ5KYdOpmEmYlQS/FX3LhQxK3/4c6/cdpmoa0HLhzOuZd77wlSzpR23W+rsLK6tr5R3Cxtbe/s7tn7B02VZJJCgyY8ke2AKOBMQEMzzaGdSiBxwKEVDG+mfusBpGKJuNejFPyY9AWLGCXaSD37qMIccM7xgIUhCJwJptVZzy67jjsDXiZeTsooR71nf3XDhGYxCE05Uarjuan2x0RqRjlMSt1MQUrokPShY6ggMSh/PLt+gk+NEuIokaaExjP198SYxEqN4sB0xkQP1KI3Ff/zOpmOrvwxE2mmQdD5oijjWCd4GgUOmQSq+cgQQiUzt2I6IJJQbQIrmRC8xZeXSbPqeBdO9a5arl3ncRTRMTpBFeShS1RDt6iOGoiiR/SMXtGb9WS9WO/Wx7y1YOUzh+gPrM8fhTKT/g==</latexit>
(i.e., y)
<latexit sha1_base64="cFVtwIlInLF4qSQZ4Zdj/Ki4EKg=">AAAB+XicbVA9SwNBEN2LXzF+nVraLAYhNuEuhVoGbSwjmA9IjrC3mUuW7O4du3vBcOSf2FgoYus/sfPfuEmu0MQHA4/3ZpiZFyacaeN5305hY3Nre6e4W9rbPzg8co9PWjpOFYUmjXmsOiHRwJmEpmGGQydRQETIoR2O7+Z+ewJKs1g+mmkCgSBDySJGibFS33UrmskhBwxPRCQcLvtu2at6C+B14uekjHI0+u5XbxDTVIA0lBOtu76XmCAjyjDKYVbqpRoSQsdkCF1LJRGgg2xx+QxfWGWAo1jZkgYv1N8TGRFaT0VoOwUxI73qzcX/vG5qopsgYzJJDUi6XBSlHJsYz2PAA6aAGj61hFDF7K2Yjogi1NiwSjYEf/XlddKqVf2rau2hVq7f5nEU0Rk6RxXko2tUR/eogZqIogl6Rq/ozcmcF+fd+Vi2Fpx85hT9gfP5A+awky8=</latexit>
(single example)
Srivastava, Rupesh Kumar, Klaus Gre , and Jürgen Schmidhuber. "Training very deep networks." arXiv preprint arXiv:1507.06228 (2015).
Srivastava, Rupesh Kumar, Klaus Gre , and Jürgen Schmidhuber. "Highway networks." arXiv preprint arXiv:1505.00387 (2015).
ff
ff
Deep Residual Learning for Image Recognition
YouTube Playlist
bottleneck
8
<
:
He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
Identity Mappings in Deep Residual Networks
YouTube Playlist
:
<
8
Original
| {z }
He, Kaiming, et al. "Identity mappings in deep residual networks." European conference on computer vision. Springer, Cham, 2016.
Deep Networks with Stochastic Depth
For each mini-batch, randomly select sets of layers and remove their correspon
<latexit sha1_base64="kwqQc42cmtM+Z/nU8U5L3y4wS5U=">AAACinicbVHLbtNAFB2bR0soEGDJ5ooIiUUb2UFqQWwqQIhlkUhbKYmi8fg6vso8rJlxpSjqx/BL7Pgbrt0soOVKIx+dc5/HRaMpxCz7naT37j94uLf/aPD44MnTZ8PnL86Da73CqXLa+ctCBtRkcRoparxsPEpTaLwo1p87/eIKfSBnf8RNgwsjV5YqUjIytRz+nFtHtkQb4avzgFLVYMjSUSGjqg/BS1s6ozfAI1BF/sQArgItN9wVWAWPxl0hxBrJg3LeY2icLcmuIHJ5qJw3/TSoWqs6EA7BWe65Rmz6tBqBuh0o8qA1NdzGWuxzx8vhKBtnfcBdkO/ASOzibDn8NS+dag33U1qGMMuzJi620kdSGq8H8zZgI9VarnDG0EqDYbHtrbyGN8yUwCvzY0969u+KrTQhbEzBmXxUHW5rHfk/bdbG6v1iS7ZpI1p1M6hqNUQH3X+BkjwfzKaUJJUn3hVULb1UkX0esAn57ZPvgvPJOD8ev/s+GZ1+2tmxL16J1+KtyMWJOBXfxJmYCpXsJUfJcXKSHqST9EP68SY1TXY1L8U/kX75A9SkxsQ=</latexit>
For each mini-batch, randomly select sets of layers and remove their correspond-ing transformation functions, only keeping the identity skip connection.
<latexit sha1_base64="kwqQc42cmtM+Z/nU8U5L3y4wS5U=">AAACinicbVHLbtNAFB2bR0soEGDJ5ooIiUUb2UFqQWwqQIhlkUhbKYmi8fg6vso8rJlxpSjqx/BL7Pgbrt0soOVKIx+dc5/HRaMpxCz7naT37j94uLf/aPD44MnTZ8PnL86Da73CqXLa+ctCBtRkcRoparxsPEpTaLwo1p87/eIKfSBnf8RNgwsjV5YqUjIytRz+nFtHtkQb4avzgFLVYMjSUSGjqg/BS1s6ozfAI1BF/sQArgItN9wVWAWPxl0hxBrJg3LeY2icLcmuIHJ5qJw3/TSoWqs6EA7BWe65Rmz6tBqBuh0o8qA1NdzGWuxzx8vhKBtnfcBdkO/ASOzibDn8NS+dag33U1qGMMuzJi620kdSGq8H8zZgI9VarnDG0EqDYbHtrbyGN8yUwCvzY0969u+KrTQhbEzBmXxUHW5rHfk/bdbG6v1iS7ZpI1p1M6hqNUQH3X+BkjwfzKaUJJUn3hVULb1UkX0esAn57ZPvgvPJOD8ev/s+GZ1+2tmxL16J1+KtyMWJOBXfxJmYCpXsJUfJcXKSHqST9EP68SY1TXY1L8U/kX75A9SkxsQ=</latexit>
ing transformation
f` !
<latexit sha1_base64="GszjD2FTJoBPhsKXgZvj+POm77g=">AAAB/HicbVBNS8NAEJ3Ur1q/oj16WSyCp5KoqMeiF48V7Ac0IWy2m3bpZhN2N0oo9a948aCIV3+IN/+N2zYHbX0w8Hhvhpl5YcqZ0o7zbZVWVtfWN8qbla3tnd09e/+grZJMEtoiCU9kN8SKciZoSzPNaTeVFMchp51wdDP1Ow9UKpaIe52n1I/xQLCIEayNFNjVKPAo58iTbDDUWMrkEQV2zak7M6Bl4hakBgWagf3l9ROSxVRowrFSPddJtT/GUjPC6aTiZYqmmIzwgPYMFTimyh/Pjp+gY6P0UZRIU0Kjmfp7YoxjpfI4NJ0x1kO16E3F/7xepqMrf8xEmmkqyHxRlHGkEzRNAvWZpETz3BBMJDO3IjLEEhNt8qqYENzFl5dJ+7TuXtTP7s5rjesijjIcwhGcgAuX0IBbaEILCOTwDK/wZj1ZL9a79TFvLVnFTBX+wPr8AYgBlLQ=</latexit>
Stochastic depth
<latexit sha1_base64="Mxpzg0cJKoCpd5Odz+BTDSSgP9U=">AAACBHicbVDLSsNAFJ34rPUVddlNsAiuSlJBXRbduKxoH9CGMpnctkMnD2ZuhBK6cOOvuHGhiFs/wp1/4yTNQlsPDBzOuY+5x4sFV2jb38bK6tr6xmZpq7y9s7u3bx4ctlWUSAYtFolIdj2qQPAQWshRQDeWQANPQMebXGd+5wGk4lF4j9MY3ICOQj7kjKKWBmaln89IJfizO4zYmCrkzPIhxvHArNo1O4e1TJyCVEmB5sD86vsRSwIIkQmqVM+xY3RTKvVIAbNyP1EQUzahI+hpGtIAlJvmH5hZJ1rxrWEk9QvRytXfHSkNlJoGnq4MKI7VopeJ/3m9BIeXbsrDOEEI2XzRMBEWRlaWiOVzCQzFVBPKJM/O1zFIylDnVtYhOIsnL5N2veac185u69XGVRFHiVTIMTklDrkgDXJDmqRFGHkkz+SVvBlPxovxbnzMS1eMoueI/IHx+QOfO5i3</latexit>
Huang, Gao, et al. "Deep networks with stochastic depth." European conference on computer vision. Springer, Cham, 2016.
Wide Residual Networks
YouTube Playlist
{z } ! BN-ReLU-Conv
Conv-BN-ReLU
| | {z }
ResNet Identity Mapping
32
X
Wi xi = [W1 , . . . , W32 ] [x1 ; . . . ; x32 ]
|{z} |{z} | {z }| {z }
i=1 2R256⇥4 2R4
2R256⇥128 R128
C
X
split transform
| {z } merge strategy ! inception F(x) = Ti (x) ! “Network in Neuron”
|{z} | {z }
1 ⇥ 1 conv 3 ⇥ 3, 5 ⇥ 5, etc. conv concatenation i=1
D
X C ! cardinality
cardinality: size of the set of transformations C
wi xi ! neuron X
i=1
y =x+ Ti (x)
i=1
x = [x1 , x2 , . . . , xD ]
split: x ! xi
transform: xi wi
X D
aggregate: w i xi
i=1
Xie, Saining, et al. "Aggregated residual transformations for deep neural networks." Proceedings of the IEEE conference on computer vision and pattern recognition.
2017.
Densely Connected Convolutional Networks
YouTube Playlist
x0 ! single image
L ! number of layers
H` (·) ! nonlinear transformation
` = 1, . . . , L ! layer index
x` ! output of the `-th layer 4k feature maps
x` = H` (x` 1 )
x` = H` (x` 1 ) + x` 1
x` = H` ([x0 , x1 , . . . , x` 1 ])
| {z }
growth rate of the network
concatenate k = 12
If each function H` produces k feature-maps,
then the `-th layer has k0 + k(` 1) input feature maps
Huang, Gao, et al. "Densely connected convolutional networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
Huang, Gao, et al. "Convolutional networks with dense connectivity." IEEE transactions on pattern analysis and machine intelligence (2019).
Inception-v4, Inception-ResNet and
the Impact of Residual Connections on Learning YouTube Playlist
Stem:
Inception-Resnet-v1
Refer to the paper for the exact forms of
Inception-B and Inception-C blocks
Szegedy, Christian, et al. "Inception-v4, inception-resnet and the impact of residual connections on learning." arXiv preprint arXiv:1602.07261 (2016).
mixup: Beyond Empirical Risk Minimization
YouTube Playlist
X m
data-agnostic data augmentation
<latexit sha1_base64="F+YSwLidKuXl3fEcLgHx8j62kKk=">AAACG3icbVC7TsMwFHV4lvIqMDJgUSGxUCUdgLGChbFI9CG1UXXjOK1Vx45sB6mqOvIZfAErfAEbYmXgA/gPnDYDbbmSpeNzzr3XPkHCmTau++2srK6tb2wWtorbO7t7+6WDw6aWqSK0QSSXqh2AppwJ2jDMcNpOFIU44LQVDG8zvfVIlWZSPJhRQv0Y+oJFjICxVK90EoKBC8tJbRjB2Q1D2o+pMLmj7FbcaeFl4OWgjPKq90o/3VCSNBtAOGjd8dzE+GNQdjynk2I31TQBMoQ+7VgoIKbaH08/MsFnlglxJJU9wuAp+7djDLHWoziwzhjMQC9qGfmf1klNdO2PmUhSQwWZLYpSjo3EWSo4ZIoSw0cWAFEsi4IMQAExNru5LaHOnjaxuXiLKSyDZrXiXVaq99Vy7SZPqICO0Sk6Rx66QjV0h+qogQh6Qi/oFb05z8678+F8zqwrTt5zhObK+foFYaCiWA==</latexit>
<latexit sha1_base64="D8W73lp6Wi8kFfcjNpBZfgoImXI=">AAACdXicbVHBbtQwEHUChVKgXeCIkCwW0FZCS1Ih4FKpggvHgti20nqJHGeyO1rbiWynNLLyoZw48Q1ccbY50JaRRn7z3oxm9JzXEq1Lkp9RfOv21p272/d27j94uLs3evT4xFaNETATlazMWc4tSNQwc+gknNUGuMolnObrT71+eg7GYqW/ubaGheJLjSUK7gKVjVqmuFsJLv3XLmO6mZT79JCy0nDh086rjjLbqMzjYdp9V5SBlJNywhzKAvxFl+H+azpUbV9RZnC5ctyY6kcQ4MJ5UDWasFDScxSow2vQrrtsNE6mySboTZAOYEyGOM5Gv1lRiUaBdkJya+dpUruF58ahkNDtsMZCzcWaL2EeoOYK7MJvLOroy8AUtKxMSO3ohv13wnNlbavy0NkbYq9rPfk/bd648sPCo64bB1pcLiobSV1Fe79pgQaEk20AXBgMt1Kx4sFeF37lypbC9qf1vqTXXbgJTg6m6bvpwZe346OPg0Pb5Cl5TiYkJe/JEflMjsmMCPIr2op2o73oT/wsfhG/umyNo2HmCbkS8Zu/2M7Abw==</latexit>
1
R⌫ (f ) = `(f (x̃i ), ỹi ) ! empirical vicinal risk
<latexit sha1_base64="NoflTODSggNNWgjoH+b2x8iap34=">AAACKHicbVDdSgJBGJ21P7M/q8tuhiQwCNn1orqUQuhGMMkUdJHZ2VEHZ2aXmdnAln2JHqMn6LaeoLvwNug9GlcvUjswcDjn++Y7HC9kVGnbnliZtfWNza3sdm5nd2//IH949KiCSGLSxAELZNtDijAqSFNTzUg7lARxj5GWN7qd+q0nIhUNxIMeh8TlaCBon2KkjdTLX3TTP2JJ/KTKQyqNw2CDqhGsUUE5fU4HYbHaqJ338gW7ZKeAq8SZkwKYo97L/3T9AEecCI0ZUqrj2KF2YyQ1xYwkuW6kSIjwCA1Ix1CBOFFunCZK4JlRfNgPpHlCw1T9uxEjrtSYe2aSIz1Uy95U/M/rRLp/7cZUhJEmAs8O9SMGdQCnFUGfSoI1GxuCsKQmK8RDJBHWpsiFK76aRktML85yC6vksVxyLkvl+3KhcjNvKAtOwCkoAgdcgQq4A3XQBBi8gDfwDj6sV+vT+rIms9GMNd85Bguwvn8BCoCnPw==</latexit>
m
D {(x̃
<latexit sha1_base64="TQ5zwsAnjoYONwArl6G4KOOYe6U=">AAACNHicbZDNSgMxEMezftb6VfXoJVgEBSm7IlqEQlEPHitYK3Trks2mGkyyS5IVS9hX8TF8Aq96F7xJrz6D2XYPVh0Y+PGfGWbmHyaMKu26787U9Mzs3Hxpoby4tLyyWllbv1JxKjFp45jF8jpEijAqSFtTzch1IgniISOd8P40r3ceiFQ0Fpd6kJAeR7eC9ilG2kpBpe5zpO8wYuYsC3yRwuMG9M2OrymLiHnMArpX8MDyrp8Fhja87IYHlapbc0cB/4JXQBUU0QoqQz+KccqJ0Jghpbqem+ieQVJTzEhW9lNFEoTv0S3pWhSIE9Uzow8zuG2VCPZjaVNoOFJ/ThjElRrw0Hbm/6jftVz8r9ZNdb/eM1QkqSYCjxf1UwZ1DHO7YEQlwZoNLCAsqb0V4jskEdbW1IktkcpPy6wv3m8X/sLVfs07rO1fHFSbJ4VDJbAJtsAO8MARaIJz0AJtgMETeAGv4M15dj6cT2c4bp1yipkNMBHO1zdBu6yJ</latexit>
⌫ i i i=1
mixup
<latexit sha1_base64="ENGZe/SC4yHGgdofcsdOEccavn8=">AAACDnicbVBLTsMwFHT4lvILZcnGokJiVSVdAMsKNiyLRD9SG1WO47RWbSeyHdQqyh04AVs4ATvElitwAO6Bk2ZBW0ayNJp5z/M0fsyo0o7zbW1sbm3v7Fb2qvsHh0fH9kmtq6JEYtLBEYtk30eKMCpIR1PNSD+WBHGfkZ4/vcv93hORikbiUc9j4nE0FjSkGGkjjezasPgjlSTIOJ0lMayO7LrTcArAdeKWpA5KtEf2zzCIcMKJ0JghpQauE2svRVJTzEhWHSaKxAhP0ZgMDBWIE+WlRW4GL4wSwDCS5gkNC/XvRoq4UnPum0mO9EStern4nzdIdHjjpVTEiSYCL4LChEEdwbwIGFBJsGZzQxCW1NwK8QRJhLWpayklUPlpmenFXW1hnXSbDfeq0Xxo1lu3ZUMVcAbOwSVwwTVogXvQBh2AwQy8gFfwZj1b79aH9bkY3bDKnVOwBOvrFyMsnNw=</latexit>
<latexit sha1_base64="6gNx7hL5SZGaJU+7wgtaR1Ol2YI=">AAACSnicbVDLSgNBEJyN73fUo5fBIGxAwq6IehFEL54kilEhG8LsbG8yZPbBTK8mLPkrP8Mf0Kv6A97Ei7MxB18NDUV1N11VfiqFRsd5tEoTk1PTM7Nz8wuLS8sr5dW1K51kikODJzJRNz7TIEUMDRQo4SZVwCJfwrXfOynm17egtEjiSxyk0IpYJxah4AwN1S6feRHDLmcyvxjaYZUeUk/ESD2Qktqh3a9u00GVBrRu97cN8JTodJEpldxRD6GPOfRT4AgBVUL3hu1yxak5o6J/gTsGFTKuerv86gUJzyKIkUumddN1UmzlTKHgEobzXqYhZbzHOtA0MGYR6FY+8j2kW4YJaJgo00b0iP1+kbNI60Hkm83Cpf49K8j/Zs0Mw4NWLuI0Q4j516MwkxQTWoRIA6GMaTkwgHEljFbKu0wxE4T6+SXQhbQiF/d3Cn/B1U7N3avtnO9Wjo7HCc2SDbJJbOKSfXJETkmdNAgn9+SJPJMX68F6s96tj6/VkjW+WSc/qjT5CU5wsjA=</latexit>
P is unknown!
<latexit sha1_base64="7ff5k1Un8H2E/DEhwY7yk7PMEJ8=">AAACC3icbVBLTgJBFOzxi/hh1KWbVjBxRWZYqEuiG5eYyCeBCenp6YEOPd2T/mjIhCN4Ard6AnfGrYfwAN7DBmYhYCUvqVS9l1epMGVUac/7dtbWNza3tgs7xd29/YOSe3jUUsJITJpYMCE7IVKEUU6ammpGOqkkKAkZaYej26nffiRSUcEf9DglQYIGnMYUI22lvluqNCqQKmj4iIsnftp3y17VmwGuEj8nZZCj0Xd/epHAJiFcY4aU6vpeqoMMSU0xI5NizyiSIjxCA9K1lKOEqCCbBZ/Ac6tEMBbSDtdwpv69yFCi1DgJ7WaC9FAte1PxP69rdHwdZJSnRhOO549iw6AWcNoCjKgkWLOxJQhLarNCPEQSYW27WvgSqWm0ie3FX25hlbRqVf+yWruvles3eUMFcALOwAXwwRWogzvQAE2AgQEv4BW8Oc/Ou/PhfM5X15z85hgswPn6BScHmpg=</latexit>
↵ 2 {1/4, 1/2, 1, 2, 4}
<latexit sha1_base64="jUa1qkdXIOruyVAeasv15MlxkLI=">AAACHHicbVDLSgMxFM34rPVVdekmWAQXpZ0pRV0W3bisYB/QGcqdTNqGZjJDkhHK0K2f4Re41S9wJ24FP8D/MNN2YVsP5HI4517uzfFjzpS27W9rbX1jc2s7t5Pf3ds/OCwcHbdUlEhCmyTikez4oChngjY105x2Ykkh9Dlt+6PbzG8/UqlYJB70OKZeCAPB+oyANlKvgF3g8RCwywR2U6dSK2GnUjWlhE2tuZNeoWiX7SnwKnHmpIjmaPQKP24QkSSkQhMOSnUdO9ZeClIzwukk7yaKxkBGMKBdQwWEVHnp9CcTfG6UAPcjaZ7QeKr+nUghVGoc+qYzBD1Uy14m/ud1E92/9lIm4kRTQWaL+gnHOsJZLDhgkhLNx4YAkczciskQJBBtwlvYEqjstCwXZzmFVdKqlp3LcvW+VqzfzBPKoVN0hi6Qg65QHd2hBmoigp7QC3pFb9az9W59WJ+z1jVrPnOCFmB9/QK7sp7R</latexit>
(xi , yi ) ⇠ P, i = 1, . . . , n
<latexit sha1_base64="DmqdnyV15uJOUn2pAGuJEqlE/bc=">AAACH3icbVDLSsNAFJ3UV62vqks3g0WoEEpSRN0IRTcuK9gHNCFMJpN26OTBzEQMoR/gZ/gFbvUL3InbfoD/4aTNwrYeGDiccy73znFjRoU0jKlWWlvf2Nwqb1d2dvf2D6qHR10RJRyTDo5YxPsuEoTRkHQklYz0Y05Q4DLSc8d3ud97IlzQKHyUaUzsAA1D6lOMpJKcaq3+7FAdpg49h5agAWzrkMIbaOrQYl4khQ7zlNEwZoCrxCxIDRRoO9Ufy4twEpBQYoaEGJhGLO0McUkxI5OKlQgSIzxGQzJQNEQBEXY2+8wEninFg37E1QslnKl/JzIUCJEGrkoGSI7EspeL/3mDRPrXdkbDOJEkxPNFfsKgjGDeDPQoJ1iyVBGEOVW3QjxCHGGp+lvY4on8tInqxVxuYZV0mw3zstF8uKi1bouGyuAEnII6MMEVaIF70AYdgMELeAPv4EN71T61L+17Hi1pxcwxWIA2/QUic6C7</latexit>
Xn
<latexit sha1_base64="ss6xc5C+KdVbW7yXnXA0Ii19c44=">AAACaXicbVHBattAEF3JbeMmTeukl9JelpiCA8FIoSS5BEJ76dGFOglYrlitRvbi3ZXYHSUWQl+ZUz4gp35Bb13ZPjRJBxYeb97wZt4mhRQWg+De8zsvXr7a6r7e3nmz+/Zdb2//0ual4TDmuczNdcIsSKFhjAIlXBcGmEokXCWLb23/6gaMFbn+iVUBU8VmWmSCM3RU3FOjOEpBIqOD5RGtDuk5jTLDeB02tW5oZEsV1+I8bH5puhYOlk6zjIVTO1DF4pBGRszmyIzJb2mEsMQaVCGMM5E0dTcYkZStXRP3+sEwWBV9DsIN6JNNjeLeQ5TmvFSgkUtm7SQMCpzWzKDgEprtqLRQML5gM5g4qJkCO61XsTT0s2NSmuXGPY10xf47UTNlbaUSp1QM5/ZpryX/15uUmJ1Na6GLEkHztVFWSoo5bTN2NxvgKCsHGDfC7Ur5nLlU0f3EI5fUtqu1uYRPU3gOLo+H4cnw+MeX/sXXTUJd8okckAEJySm5IN/JiIwJJ3fkj+d7He+3v+d/8D+upb63mXlPHpXf/wtQwLp8</latexit>
1
P (x, y) = (x = xi , y = yi ) ! empirical distribution
Z n i=1 Xn
<latexit sha1_base64="Bzk5t5HAZhp4S99fnVgbyb3IPXA=">AAAChnicbVFdi9NAFJ3Er7p+VQVffPBiEVJYSrKo68tC0Rcfq9jdhaaG6eSmHTqZhJkbbQj5A/5Df4Dv/gQn3SDurhdmOJx7DvfOmVWppKUw/On5N27eun1ncPfg3v0HDx8NHz85tUVlBM5FoQpzvuIWldQ4J0kKz0uDPF8pPFttP3T9s29orCz0F6pLXOZ8rWUmBSdHJcMfcc5pI7hqPrdJnKIiHmRjOIFYaoIYlQqyYDc+hHoMKcx6CQS7w3qvygwXTdQ2uoXYVnnSyJOo/ar/OhPZed0NsZHrDXFjiu8QE+6owbyUxm2iwEi7bZPhKJyE+4LrIOrBiPU1S4a/4rQQVY6ahOLWLqKwpGXDDUmhsD2IK4slF1u+xoWDmudol80+sxZeOSaFrDDuuJfu2X8dDc+trfOVU3YJ2au9jvxfb1FR9m7ZSF1WhFpcDMoqBVRA9wGQSoOCVO0AF0a6XUFsuIuR3DddmpLabrUul+hqCtfB6dEkejs5+vR6NH3fJzRgz9lLFrCIHbMp+8hmbM4E++0981544A/8if/GP76Q+l7vecoulT/9A6vFwto=</latexit>
1
R (f ) = `(f (x), y)dP (x, y) = `(f (xi ), yi ) ! empirical risk
mixup GANs
<latexit sha1_base64="2a6cEp6sQwuuZz7LduUlVbmxuKQ=">AAACEXicbVBLTsMwFHTKr5RfALFiY1EhsaqSLoBlgQWsUJFoi9RGleO4rVXbiWwHUUU5BSdgCydgh9hyAg7APXDSLGjLSJZGM+95nsaPGFXacb6t0tLyyupaeb2ysbm1vWPv7rVVGEtMWjhkoXzwkSKMCtLSVDPyEEmCuM9Ixx9fZX7nkUhFQ3GvJxHxOBoKOqAYaSP17YNe/kciSZBy+hRH8PriVvXtqlNzcsBF4hakCgo0+/ZPLwhxzInQmCGluq4TaS9BUlPMSFrpxYpECI/RkHQNFYgT5SV5dAqPjRLAQSjNExrm6t+NBHGlJtw3kxzpkZr3MvE/rxvrwbmXUBHFmgg8DRrEDOoQZl3AgEqCNZsYgrCk5laIR0girE1jMymByk5LTS/ufAuLpF2vuae1+l292rgsGiqDQ3AEToALzkAD3IAmaAEMEvACXsGb9Wy9Wx/W53S0ZBU7+2AG1tcvpMmeOQ==</latexit>
n i=1
<latexit sha1_base64="fGA6Ris+0IC18Q/ByH6XYuUv4O8=">AAACJnicbVDNTgIxGOziH+LfqkcvjcQEPZBdDuqR6MULCRJBEiCk2y3Q0HY3bdcEN/sOPoZP4FWfwJsx3rz4HnYXDgJO0mQy8339JuOFjCrtOF9WbmV1bX0jv1nY2t7Z3bP3D1oqiCQmTRywQLY9pAijgjQ11Yy0Q0kQ9xi598bXqX//QKSigbjTk5D0OBoKOqAYaSP17bNu9kcsiZ+0KKYCMdigagxrVFBOH7MxWGo1aqd9u+iUnQxwmbgzUgQz1Pv2T9cPcMSJ0JghpTquE+pejKSmmJGk0I0UCREeoyHpGCoQJ6oXZ3kSeGIUHw4CaZ7QMFP/bsSIKzXhnpnkSI/UopeK/3mdSA8uezEVYaSJwNNDg4hBHcC0IOhTSbBmE0MQltRkhXiEJMLa1Dh3xVdptMT04i62sExalbJ7Xq7cVorVq1lDeXAEjkEJuOACVMENqIMmwOAJvIBX8GY9W+/Wh/U5Hc1Zs51DMAfr+xdtWaZs</latexit>
1
P⌫ (x̃, ỹ) = ⌫(x̃, ỹ|xi , yi )
n i=1
⌫ ! vicinity distribution (measures the probability of finding the
<latexit sha1_base64="/6SKto5FPoMwTF8WdrXOfTZU3nY=">AAACUXicbVBNbxMxEJ0sX6XlI8CRi0WE1FO0WyHgWMGFY5FIWykbRbPe2cSq117Z40K0yi/jZ3DiyIFL+wu4Yac50JaRLD2990bz/KpOK895/nOQ3bl77/6DnYe7e48eP3k6fPb82NvgJE2k1dadVuhJK0MTVqzptHOEbaXppDr7mPSTc3JeWfOFVx3NWlwY1SiJHKn5cFKaIEqnFktG5+xXUTJ94/5cSWUUr0QdIzhVheQW+y2hD4684CWJztkKK6WTzTaiUaZWZpGk9Xw4ysf5ZsRtUGzBCLZzNB/+LmsrQ0uGpUbvp0Xe8axHx0pqWu+WwVOH8gwXNI3QYEt+1m++vxavI1OLxrr4DIsN++9Gj633q7aKzhZ56W9qifyfNg3cvJ/1ynSBycirQ03Qgq1IXcZyHEnWqSWUTsWsQi7RoeTY+LUrtU/RUi/FzRZug+ODcfF2fPD5zejww7ahHXgJr2AfCngHh/AJjmACEr7DL7iAy8GPwZ8MsuzKmg22Oy/g2mR7fwFK2LaE</latexit>
<latexit sha1_base64="+9f8mm8zJjiALxuql0jvGrS2GB8=">AAACS3icbVBNT9tAEF2HUr5KCfTIZdVQiUolsnMAjqhceiNIDSAlUTRej5MV67W1O46wLP8sfkZ/QNUr+QXcEAc2wYfy8aSV3rw3o5l9YaakJd//6zWWPix/XFldW9/4tPl5q7m9c2HT3AjsiVSl5ioEi0pq7JEkhVeZQUhChZfh9encv5yisTLVv6nIcJjAWMtYCiAnjZpnU2koB8VjBMoNHhCYMRLf2x+QVBGWN9UPXtOi+r7HpeY0QT6VQmpJBU/jRU0GXK3Ho2bLb/sL8LckqEmL1eiOmrNBlIo8QU1CgbX9wM9oWIIhKRRW64PcYgbiGsbYd1RDgnZYLj5e8W9OiXicGvc08YX6/0QJibVFErrOBGhiX3tz8T2vn1N8PCylznJCLZ4XxbnilPJ5ijySBgWpwhEQRrpbuZiAAUEu6xdbIjs/rXK5BK9TeEsuOu3gsN0577ROftYJrbJd9pXts4AdsRP2i3VZjwl2y/6xOzbz/nj33oP3+Nza8OqZL+wFGstPn6y0dg==</latexit>
virtual feature-target (x̃, ỹ) in the vicinity of the training <latexit sha1_base64="5W0Ttkbgaf3hdjcUl3JTXqJ0YTI=">AAACHXicbVDLSgNBEJyN7/eqRy+jUTCgYTcH9Sh68ahgNBDD0jvpTYbMPpjpFUPI2c/wC7zqF3gTr+IH+B9OYg5qLGgoqrrp7gozJQ153odTmJicmp6ZnZtfWFxaXnFX165MmmuBVZGqVNdCMKhkglWSpLCWaYQ4VHgddk4H/vUtaiPT5JK6GTZiaCUykgLISoG7GSFQrnGfQLeQ+PbuXSD3eDeQpW2egdSlwC16ZW8IPk78ESmyEc4D9/OmmYo8xoSEAmPqvpdRoweapFDYn7/JDWYgOtDCuqUJxGgaveErfb5jlSaPUm0rIT5Uf070IDamG4e2MwZqm7/eQPzPq+cUHTV6MslywkR8L4pyxSnlg1x4U2oUpLqWgNDS3spFGzQIsun92tI0g9P6Nhf/bwrj5KpS9g/KlYtK8fhklNAs22BbbJf57JAdszN2zqpMsHv2yJ7Ys/PgvDivztt3a8EZzayzX3DevwBFZKGC</latexit>
<latexit sha1_base64="242uxs5MO1qu43qskWjFvBDWs8I=">AAACN3icbVC7SgNBFJ31bXxFLW0Gg6BN2E2hgo1ooaWCiYEYwt3Zm2RwdnaZuRsJIR/jZ/gFtlpaWSm2/oGzMYVGDwycOedezswJUyUt+f6LNzU9Mzs3v7BYWFpeWV0rrm/UbJIZgVWRqMTUQ7CopMYqSVJYTw1CHCq8Dm9Pc/+6h8bKRF9RP8VmDB0t21IAOalVPNqFrBOjJqk7nLrIyYDU+SUCAn4nqcshiiTJHvIzyKyVoLlOpMW9VrHkl/0R+F8SjEmJjXHRKr7dRInI8jihwNpG4KfUHIAhKRQOCzeZxRTELXSw4aiGGG1zMPrkkO84JeLtxLijiY/UnxsDiK3tx6GbjIG6dtLLxf+8Rkbtw+ZA6jQj1OI7qJ0pTgnPG+ORNChI9R0BYVwTgosuGBDkev2VEtn8aUPXSzDZwl9Sq5SD/XLlslI6Phk3tMC22DbbZQE7YMfsnF2wKhPsnj2yJ/bsPXiv3rv38T065Y13NtkveJ9fGC+tYQ==</latexit>
1 X X
<latexit sha1_base64="qMeOAsbzWAJ1ltsPE2/0+5nwexw=">AAACKHicbZDLSsNAFIYn9VbrLerSTbAIFaQkXajLooW6rNReoA1lMjlth04uzEyEEvoSPoZP4FafwJ10K/geTtIsbOuBYX7+c+VzQkaFNM25ltvY3Nreye8W9vYPDo/045O2CCJOoEUCFvCugwUw6kNLUsmgG3LAnsOg40zuk3znGbiggf8kpyHYHh75dEgJlsoa6Ff9dEbMwZ01ZUDGWEhKjDrHLgVfGjUQJPlLzXrtcqAXzbKZhrEurEwUURaNgf7TdwMSeWoCYViInmWG0o4xVzsYzAr9SECIyQSPoKekjz0QdpxeNDMulOMaw4Crpy5I3b8dMfaEmHqOqvSwHIvVXGL+l+tFcnhrx9QPIwk+WSwaRsyQgZEgMlzKgUg2VQITThMeigvHRCqQS1tckZw2U1ysVQrrol0pW9flymOlWL3LCOXRGTpHJWShG1RFD6iBWoigF/SG3tGH9qp9al/afFGa07KeU7QU2vcv0venGw==</latexit>
1 X 1 X X wt+k = wt ⌘ rl(x; wt+j ) ! k iterations of SGD with learning rate ⌘ and a mini-b <latexit sha1_base64="YWWm3eL1kGTW+oDKdz5nHOSTCN0=">AAACJnicbZDLSgMxFIYzXmu9jbp0EyxC66LMFFFBhKIbFy4q2At0SsmkmTY0kxmSjG2Zzjv4GD6BW30CdyLu3PgeppeFbf0h8POfczgnnxsyKpVlfRlLyyura+upjfTm1vbOrrm3X5FBJDAp44AFouYiSRjlpKyoYqQWCoJ8l5Gq270Z1auPREga8Ac1CEnDR21OPYqR0lHTPLnL9nLwCjqeQDi2k3hYGybQkZHfjPvQoRzWEsiy/ctermlmrLw1Flw09tRkwFSlpvnjtAIc+YQrzJCUddsKVSNGQlHMSJJ2IklChLuoTeracuQT2YjHf0rgsU5a0AuEflzBcfp3Ika+lAPf1Z0+Uh05XxuF/9XqkfIuGjHlYaQIx5NFXsSgCuAIEGxRQbBiA20QFlTfCnEHaTpKY5zZ0pKj0xLNxZ6nsGgqhbx9li/cn2aK11NCKXAIjkAW2OAcFMEtKIEywOAJvIBX8GY8G+/Gh/E5aV0ypjMHYEbG9y9CwKUv</latexit>
<latexit sha1_base64="AGZ5HUu5CDfC2ka9u4m0pnuGx5w=">AAACwHicbVHbbtNAEF2bWym3FB55GREjFVAju0IUCZCqgASPQZC2EEfWeLOON1mv3d0xSWr57/gJPoD/wE6DRFtGWunonJmdozNxoaQl3//luNeu37h5a+v29p279+4/6Ow8PLJ5abgY8lzl5iRGK5TUYkiSlDgpjMAsVuI4nr9v9eMfwliZ66+0KsQ4w6mWieRIDRV1fi6iil7Ma3gHi4hgD0JBCGFikFdBXekaQltmUeVDqMQpzOAtzP9ySwilhjBDSjmqql9Hs0bSGCsEtbt8A+u/Z/UzCI2cpoTG5AsISSyp8uYeSBJmbcNCnsCXjx9gISkFJdBoqafQiAK81pAHqCeAkEkt92IknoKVZ6Id87RXR52u3/PXBVdBsAFdtqlB1PkdTnJeZkITV2jtKPALGldoSHIl6u2wtKJAPsepGDVQYybsuFqnXcPThplAkpvmaYI1++9EhZm1qyxuOtto7GWtJf+njUpKXo8rqYuShObni5JSAeXQng4m0ghOatUA5EY2XoGn2FyqyfHiloltrbW5BJdTuAqO9nvBq97+55fdw/4moS32mD1huyxgB+yQfWIDNmTcee4MnG/Od7fvpm7unp63us5m5hG7UO7ZH1lE2H8=</latexit>
0j<k x2Bj
L(w) ! loss function ŵt+1 = wt ⌘ˆ rl(x; wt ) ! one iteration of SGD with learning rate ⌘ˆ and a large
<latexit sha1_base64="Mv6mwwDQhdDEwrOQd+qryPdKqWM=">AAACIXicbVDLTgIxFO34RHyhLt00EhLckBli1CXRjQsXmMgjAUI6pQMNnXbS3hHJhC/wM/wCt/oF7ow749r/cAZmIeBJmpycc27u7XEDwQ3Y9pe1srq2vrGZ2cpu7+zu7ecODutGhZqyGlVC6aZLDBNcshpwEKwZaEZ8V7CGO7xO/MYD04YreQ/jgHV80pfc45RALHVzhdvi6BS3Ne8PgGitRrgN7BEioYzBXihpEpt0c3m7ZE+Bl4mTkjxKUe3mfto9RUOfSaCCGNNy7AA6EdHAqWCTbDs0LCB0SPqsFVNJfGY60fQ7E1yIlR72lI6fBDxV/05ExDdm7Ltx0icwMIteIv7ntULwLjsRl0EITNLZIi8UGBROusE9rhkFMY4JoZrHt2I6IJpQiBuc29IzyWlJL85iC8ukXi4556Xy3Vm+cpU2lEHH6AQVkYMuUAXdoCqqIYqe0At6RW/Ws/VufVifs+iKlc4coTlY37843aTq</latexit>
w ! weights
<latexit sha1_base64="ThM5TXWr/3ArlmONGwKS713TI38=">AAACGHicbZBLTgJBEIZ78IX4GnWpi47ExBWZIUZdEt24xEQeCRDS0xTQoeeR7hqRTNh4DE/gVk/gzrh15wG8hz3AQsA/6eTLX1Wp6t+LpNDoON9WZmV1bX0ju5nb2t7Z3bP3D6o6jBWHCg9lqOoe0yBFABUUKKEeKWC+J6HmDW7Seu0BlBZhcI+jCFo+6wWiKzhDY7Xt4yFtKtHrI1MqNIzwiMkQUkeP23beKTgT0WVwZ5AnM5Xb9k+zE/LYhwC5ZFo3XCfCVsIUCi5hnGvGGiLGB6wHDYMB80G3kskvxvTUOB3aDZV5AdKJ+3ciYb7WI98znT7Dvl6speZ/tUaM3atWIoIoRgj4dFE3lhRDmkZCO0IBRzkywLgS5lbK+0wxjia4uS0dnZ6W5uIuprAM1WLBvSgU787zpetZQllyRE7IGXHJJSmRW1ImFcLJE3khr+TNerberQ/rc9qasWYzh2RO1tcv6bihpw==</latexit>
0j<k x2Bj
ŵt+1 6= wt+k
<latexit sha1_base64="NZlSzswu73wV8ij3q4xnrz4CWLE=">AAACFnicbVDLSgMxFM34rPVVdaebYBEEocwUUZdFNy4r2Ae0Zchk0jY0kxmTO5YyDPgZfoFb/QJ34tatH+B/mGm7sK0HAueecy/35niR4Bps+9taWl5ZXVvPbeQ3t7Z3dgt7+3UdxoqyGg1FqJoe0UxwyWrAQbBmpBgJPMEa3uAm8xuPTGkeynsYRawTkJ7kXU4JGMktHLb7BJJh6iZw5qS4LdkDHmbFIHULRbtkj4EXiTMlRTRF1S38tP2QxgGTQAXRuuXYEXQSooBTwdJ8O9YsInRAeqxlqCQB051k/IcUnxjFx91QmScBj9W/EwkJtB4FnukMCPT1vJeJ/3mtGLpXnYTLKAYm6WRRNxYYQpwFgn2uGAUxMoRQxc2tmPaJIhRMbDNbfJ2dluXizKewSOrlknNRKt+dFyvX04Ry6Agdo1PkoEtUQbeoimqIoif0gl7Rm/VsvVsf1uekdcmazhygGVhfvxegoAo=</latexit>
rl(x; wt ) label
⇡ rl(x; wt+j ) ! assume
<latexit sha1_base64="378l+1EGh2AqVKD1KHFcGS1WhBU=">AAACQ3icbVDLSgMxFM34flt16SZYBEUoMyIquBFd6FLBqtAp5U6atrGZZEjuaMvQT/Iz/AJXgi5duRO3gpnahVUPBA7n3FdOlEhh0fefvJHRsfGJyanpmdm5+YXFwtLypdWpYbzMtNTmOgLLpVC8jAIlv04MhziS/CpqH+f+1S03Vmh1gd2EV2NoKtEQDNBJtcJJqCCSQOVG54De1XCThpAkRnfosJHh1k3PmUY0WwjG6DsaIu9gBtamMe/VCkW/5PdB/5JgQIpkgLNa4TWsa+ZaFTLphlQCP8FqBgYFk7w3E6aWJ8Da0OQVRxXE3Faz/od7dN0pddrQxj2FtK/+7MggtrYbR64yBmzZ314u/udVUmzsVzOhkhS5Yt+LGqmkqGmeHq0LwxnKriPAjHC3UtYCAwxdxkNb6jY/Lc8l+J3CX3K5XQp2S9vnO8XDo0FCU2SVrJENEpA9ckhOyRkpE0buySN5Ji/eg/fmvXsf36Uj3qBnhQzB+/wCHJOx3A==</latexit>
⌘ˆ = k⌘ =) ŵt+1 ⇡ wt+k
<latexit sha1_base64="z4N0gk0K6eAeNCttDBnMsZOQhtw=">AAACNnicbVDLSgNBEJz1bXxFPXoZDIIghF0RFUEQvXhUMDGQDaF30jFDZh/M9Kph2X/xM/wCr3r14k28+gnOxhyM2jBQXdVN9VSQKGnIdV+dicmp6ZnZufnSwuLS8kp5da1u4lQLrIlYxboRgEElI6yRJIWNRCOEgcLroH9W6Ne3qI2MoysaJNgK4SaSXSmALNUuH/k9oMxHgpwf8z4vEPdlaL3R8KF4l7cz2vFy7kOS6Pie3xV9P2+XK27VHRb/C7wRqLBRXbTL734nFmmIEQkFxjQ9N6FWBpqkUJiX/NRgAqIPN9i0MIIQTSsb/jHnW5bp8G6s7YuID9mfGxmExgzCwE6GQD3zWyvI/7RmSt3DViajJCWMxLdRN1WcYl4ExjtSoyA1sACElvZWLnqgQZCNdcylY4rTily83yn8BfXdqrdf3b3cq5ycjhKaYxtsk20zjx2wE3bOLliNCfbAntgze3EenTfn3fn4Hp1wRjvrbKyczy8wzaz0</latexit>
<latexit sha1_base64="WmRxOn24c2wz17DLx7m+ZWI+YTM=">AAACFnicbVDLSsNAFJ34rPUVdaebwSK4KkkX6rLoQpcV7APaUG4mk3bo5MHMRCgh4Gf4BW71C9yJW7d+gP/hJM3Cth4YOJxz75zLcWPOpLKsb2NldW19Y7OyVd3e2d3bNw8OOzJKBKFtEvFI9FyQlLOQthVTnPZiQSFwOe26k5vc7z5SIVkUPqhpTJ0ARiHzGQGlpaF5PCj+SAX1slsBXgIcd0EESYyHZs2qWwXwMrFLUkMlWkPzZ+BFJAloqAgHKfu2FSsnBaEY4TSrDhJJYyATGNG+piEEVDppkZ/hM6142I+EfqHChfp3I4VAymng6skA1Fguern4n9dPlH/lpCyME0VDMgvyE45VhPNCsMcEJYpPNQEimL4VkzEIIErXNpfiyfy0TPdiL7awTDqNun1Rb9w3as3rsqEKOkGn6BzZ6BI10R1qoTYi6Am9oFf0Zjwb78aH8TkbXTHKnSM0B+PrF4d0oEs=</latexit>
ss-entropy + a regularization on w) rl(x; wt ) ⇡ rl(x; wt+j ) ! does not hold initially during training when the network is changing
<latexit sha1_base64="DbiwYHVJHbpfyLFCdUjF8/rRwTw=">AAACiXicbZHfahNBFMZn13+19U/UK/Hm0CBUhJAtokVvir3xsoJpC9kQzs6e7I6ZnVlmzpqEJY/gA/oAPoEv4GyaC9N6YODj+87hDL+T1Vp5Hg5/RfGdu/fuP9h7uH/w6PGTp71nzy+8bZykkbTauqsMPWllaMSKNV3VjrDKNF1m87Muv/xBzitrvvGqpkmFhVEzJZGDNe39TA1mGkEfLT/BYspvIMW6dnYJu0HLb7+vQ+hUUTI6ZxeQMi25zS15MJahtDoHZRQr1HoFeeOUKYAdBi+IRUkGuCQwxAvr5qA8yBJN0YUOa5Xr1Xra6w8Hw03BbZFsRV9s63za+53mVjYVGZYavR8nw5onLTpWUtN6P2081SjnWNA4SIMV+Um7wbaG18HJYWZdeIZh4/470WLl/arKQmeFXPqbWWf+Lxs3PDuZtMrUDZOR14tmjQa20N0AcuVIcgdJoXSBmOxQOJQcLrWzJffd1zouyU0Kt8XF8SB5Pzj++q5/+nlLaE+8EofiSCTigzgVX8S5GAkp/kQvo8OoHx/ESXwSf7xujaPtzAuxU/HZX6LjxtY=</latexit>
x; wt ) ⇡ rl(x; wt+j1) !Xdoes not hold initially during training when the network is changing rapidly <latexit sha1_base64="AfYEUtfP3QV7teRD6dcL7JQ1u/Y=">AAACaXicbVFdaxQxFM1M/aj1o1t9EX25uAgVcZkpooIIpQr6WNFtCzvLcCeb2Q1NMkNyx90lnV/pkz/AJ3+Bb2a2+9APDwQO59ybe3NS1Eo6SpJfUbxx4+at25t3tu7eu/9gu7fz8MhVjeViyCtV2ZMCnVDSiCFJUuKktgJ1ocRxcfqx849/COtkZb7TshZjjVMjS8mRgpT39Dz39DJt4QPMc4JXkAlCyEqL3KetP8s00oyj8gftWQuZa3TuF5BJAxecYBgsFILaXbzv7nkBmZXTGaG11RwyEgvy3z5/avNePxkkK8B1kq5Jn61xmPd+Z5OKN1oY4gqdG6VJTWOPliRXot3KGidq5Kc4FaNADWrhxn4VSwvPgzKBsrLhGIKVerHDo3ZuqYtQ2b3FXfU68X/eqKHy3dhLUzckDD8fVDYKqIIuY5hIKzipZSDIrQy7Ap9hiJTCT1yaMnHdal0u6dUUrpOjvUH6ZrD39XV//2Cd0CZ7yp6xXZayt2yffWGHbMg4+8n+RnG0Ef2Jd+LH8ZPz0jha9zxilxD3/wH49btD</latexit>
wt+1 = wt ⌘ rl(x; wt ) ! SGD Start with a learning rate of ⌘ and increment it by a constant amount at each
<latexit sha1_base64="B1xJMPiQVlm2j/2OvhT18PvbvXk=">AAACmnicbVHLbtNAFB2bAqW8AixhcUWChFhEdnhukKqyAbEpomkrJVF0Pb6ORxnPWDPXoMjKr/BffAD/wTj1grZcaTRH59zXnMlqrTwnye8ovrF389bt/TsHd+/df/Bw8OjxqbeNkzSVVlt3nqEnrQxNWbGm89oRVpmms2z9qdPPfpDzypoT3tS0qHBlVKEkcqCWg19zY5XJyTB8Z3QMPxWXgKAJnVFmBQ6ZwBYwmhPjCNDkoIx0VHUliiHbhGxpjWcMBFa26S4GQlkGndxuEARW6S7fdQL50K9EbrumW/gIa+jbF6ECRm9HQLWVpV8Ohsk42QVcB2kPhqKP4+Xgzzy3sum2kxq9n6VJzYs2vExJTduDeeOpRrnGFc0CNFiRX7Q7H7fwIjA5FNaFE16xY/+taLHyflNlIbNCLv1VrSP/p80aLj4sWmXqhsnIi0FFo4EtdJ8CuXIkWW8CQOlU2BVkiQ5lcOPylNx3q22DL+lVF66D08k4fTeefJsMD496h/bFU/FcvBSpeC8OxWdxLKZCRnvRq+h19CZ+Fh/FX+KvF6lx1Nc8EZciPvkLV3PKLQ==</latexit>
|B|
x2B
iteration until it reaches ⌘ˆ = k⌘ after 5 epochs
B ! mini-batch sampled from X
<latexit sha1_base64="xCExidHpsjW/ddyEDUbAWuxCSYk=">AAACNnicbVDLThtBEJwFwisBDDnmMsIgccHatRBEnBC5cCQSBktey+odt+0R81jN9EKslf8ln5Ev4EquXHJDueYTmDU+8EhLLZWqulXdleVKeorjh2hufuHD4tLyyurHT2vrG7XNrUtvCyewJayyrp2BRyUNtkiSwnbuEHSm8Cq7/lbpVzfovLTmgsY5djUMjRxIARSoXu041UAjAao8nfDUyeGIwDl7y1PCH1RqaeR+BiRG3IPOFfb5wFnNd9o7k16tHjfiafH3IJmBOpvVea/2mPatKDQaEgq87yRxTt0SHEmhcLKaFh5zENcwxE6ABjT6bjn9ccJ3AxPMrQttiE/ZlxslaO/HOguT1Uf+rVaR/9M6BQ2+dktp8oLQiGejQaE4WV4FxvvSoSA1DgCEk+FWLkbgQFCI9ZVL31enVbkkb1N4Dy6bjeSw0fx+UD85nSW0zL6wbbbHEnbETtgZO2ctJthPdsfu2e/oV/Qneoz+Po/ORbOdz+xVRf+eANzIrUc=</latexit>
lB (x; w) ! loss of a single sample x depends on the statistic of all samples in its m
<latexit sha1_base64="JWA4mKLEVDCQRYbsg038adFpmxE=">AAACiHicbVHLbtswEKTUV+q+nPZSoJdt7ALpoYYUFE2LXgL30mMK1EkAyzAoamURoUiBXCU2BP1Bf7Af0B/oF5RydMijCxAczMxyF8O0UtJRFP0Ownv3Hzx8tPN48OTps+cvhrsvT5yprcCZMMrYs5Q7VFLjjCQpPKss8jJVeJqef+v00wu0Thr9kzYVLkq+0jKXgpOnlsNfatkkJadCcNVM23Z//RUu30Ni5aogbq25hIRwTY0yzoHJgYOTeqUQHC8rf43XY8iwQp15WQMVXiH/uCMptn6leqsDqUGSg1Jq+SHlJAoYX5s9bpfDUTSJtgV3QdyDEevreDn8k2RG1CVqEoo7N4+jihYNt364wnaQ1A4rLs75Cuceal6iWzTb1Fp455kMcmP90QRb9npHw0vnNmXqnd2S7rbWkf/T5jXlnxeN1FVNqMXVoLxWQAa6L4BMWhSkNh5wYWUXlCi45YL8R92YkrlutS6X+HYKd8HJwST+NDn48XF0NO0T2mFv2B7bZzE7ZEfsOztmMybY3+B18DbYCwdhFB6GX66sYdD3vGI3Kpz+A+Pxxb0=</latexit>
lB (x; w) ! loss of a single sample x depends on the statistic of all samples in its mini-batch B
<latexit sha1_base64="JWA4mKLEVDCQRYbsg038adFpmxE=">AAACiHicbVHLbtswEKTUV+q+nPZSoJdt7ALpoYYUFE2LXgL30mMK1EkAyzAoamURoUiBXCU2BP1Bf7Af0B/oF5RydMijCxAczMxyF8O0UtJRFP0Ownv3Hzx8tPN48OTps+cvhrsvT5yprcCZMMrYs5Q7VFLjjCQpPKss8jJVeJqef+v00wu0Thr9kzYVLkq+0jKXgpOnlsNfatkkJadCcNVM23Z//RUu30Ni5aogbq25hIRwTY0yzoHJgYOTeqUQHC8rf43XY8iwQp15WQMVXiH/uCMptn6leqsDqUGSg1Jq+SHlJAoYX5s9bpfDUTSJtgV3QdyDEevreDn8k2RG1CVqEoo7N4+jihYNt364wnaQ1A4rLs75Cuceal6iWzTb1Fp455kMcmP90QRb9npHw0vnNmXqnd2S7rbWkf/T5jXlnxeN1FVNqMXVoLxWQAa6L4BMWhSkNh5wYWUXlCi45YL8R92YkrlutS6X+HYKd8HJwST+NDn48XF0NO0T2mFv2B7bZzE7ZEfsOztmMybY3+B18DbYCwdhFB6GX66sYdD3vGI3Kpz+A+Pxxb0=</latexit>
<latexit sha1_base64="SVWed4FJaIfB4GNZqMh7VOUttN8=">AAACH3icbVDLTgIxFO3gC/GFunTTSExwQ2ZYqEuiG5eYyCMBQjqdCzS0M5P2jgkhfICf4Re41S9wZ9zyAf6HHZiFgidpcnLOffX4sRQGXXfu5DY2t7Z38ruFvf2Dw6Pi8UnTRInm0OCRjHTbZwakCKGBAiW0Yw1M+RJa/vgu9VtPoI2IwkecxNBTbBiKgeAMrdQvlgKGjMZMMyntEKNoWSUSRSyBGtBp66WtcivuAnSdeBkpkQz1fvG7G0Q8URAil8yYjufG2JsyjYJLmBW6iYGY8TEbQsfSkCkwveniMzN6YZWADiJtX4h0of7umDJlzET5tlIxHJlVLxX/8zoJDm56UxHGCULIl4sGiaQY0TQZGggNHOXEEsa1sLdSPrLBcLQh/NkSmPS0mc3FW01hnTSrFe+qUn2olmq3WUJ5ckbOSZl45JrUyD2pkwbh5Jm8kjfy7rw4H86n87UszTlZzyn5A2f+A0LSo9I=</latexit>
<latexit sha1_base64="vTlClPQmNYtuqLqu5le/hH6Tyts=">AAACTnicbVDLSsNAFJ3UV31XXboZrIKrkhR8IAiiGxcufFWFppSb6U0dOpmEmYlQYv7Lz3Drxp3oF7gTnbRd+LowcDjn3Ln3niARXBvXfXJKY+MTk1Pl6ZnZufmFxcrS8pWOU8WwwWIRq5sANAousWG4EXiTKIQoEHgd9I4K/foOleaxvDT9BFsRdCUPOQNjqXbl/MQ2gqIXDOwXXXqeCtyj6z4aoPvUrXnUDxWw7N6PwNxaU3aY3+dZfWs7t/o3Q4/KIb3erlTdmjso+hd4I1AlozptV178TszSCKVhArRuem5iWhkow5nAfMZPNSbAetDFpoUSItStbHB7Tjcs06FhrOyThg7Y7x0ZRFr3o8A6iwv0b60g/9OaqQl3WxmXSWpQsuGgMBXUxLQIkna4QmZE3wJgittdKbsFG4Wxcf+Y0tHFarnNxfudwl9wVa9527X6Wb16cDhKqExWyRrZJB7ZIQfkmJySBmHkgTyTV/LmPDrvzofzObSWnFHPCvlRpfIXbvWyYQ==</latexit>
|B| kn
Linear Scaling Rule: ⌘ = 0.1 256 = 0.1 256 X n ! all distinct subsets of size n drawn from the original training set X
<latexit sha1_base64="Do4OkWlsoThMNegjJPdw0fFemnA=">AAACXXicbZA9TxtBEIbXByTEIeCQIkWaEXakVNYdQoQSJQ0lkWKwZDvW3N6cvWJv97Q7B3FO/n35DVRUlGmTNmvjgq+RVnr1zuc+aamV5zi+bkRr6xsvXm6+ar7eerO903q7e+Zt5ST1pNXW9VP0pJWhHivW1C8dYZFqOk8vvi7y55fkvLLmO89KGhU4MSpXEjlY4xb2fxgYOjWZMjpnr2DI9JNr1BqysF0ZyeCr1BN7sDl49YugYzqQObwykDtbAE8JbJigDGpgh8ooM4HQAZ1+Z94ct9pxN14GPBXJSrTFKk7HrdthZmVVkGGp0ftBEpc8qtGxkprmzWHlqUR5gRMaBGmwID+qlyjm8DE4GeTWhWcYlu79jhoL72dFGioL5Kl/nFuYz+UGFedHo1qZsmIy8m5RXoUPW1hwDbQcSdazIFA6FW4FOUWHkgP9B1syvzhtHrgkjyk8FWf73eSwu//toH38ZUVoU3wQe+KTSMRncSxOxKnoCSl+iz/ir/jXuIk2oq1o+640aqx63okHEb3/D8UOuDE=</latexit>
<latexit sha1_base64="S4Zg1H6m7KBeumGlG/iiVYDfVic=">AAACGnicbVC7TsMwFHXKq5RXgBEJGVqkslRJB2CsYGFgKBJ9SG1UOY7TWnWcyHZAVdSNz+ALWOEL2BArCx/Af+C0EaItR7J07jn36l4fN2JUKsv6MnJLyyura/n1wsbm1vaOubvXlGEsMGngkIWi7SJJGOWkoahipB0JggKXkZY7vEr91j0Rkob8To0i4gSoz6lPMVJa6pmHpZvyw2kJ4gHifSIhkrDEf8ujnlm0KtYEcJHYGSmCDPWe+d31QhwHhCvMkJQd24qUkyChKGZkXOjGkkQID1GfdDTlKCDSSSb/GMMTrXjQD4V+XMGJ+nciQYGUo8DVnQFSAznvpeJ/XidW/oWTUB7FinA8XeTHDKoQpqFAjwqCFRtpgrCg+tY0AYGw0tHNbPFketpY52LPp7BImtWKfVap3laLtcssoTw4AMegDGxwDmrgGtRBA2DwCJ7BC3g1now34934mLbmjGxmH8zA+PwBmBGfgg==</latexit>
[ The BN statistics should not be computed across all workers, not only for the sake of reducing
communication, but also for maintaining the same underlying loss function being optimized.
<latexit sha1_base64="HiDuPg0AdmYLZo9nymosvt2I7kk=">AAACUXicbVA9b9RAEN0zgYQkwAElzSiXSKlOdoSAgiIKTcogcUmk88ka7419m1vvmt1x4LD8y/gZVCkpaOAX0GFfriAfTxrp6b0ZzcxLS608h+FVL3iw9vDR+sbjza3tJ0+f9Z+/OPW2cpJG0mrrzlP0pJWhESvWdF46wiLVdJbOP3T+2SU5r6z5xIuSJgXmRmVKIrdS0h/FqcplVSZ1CLGmz3AB72HeQFwgzyTq+qhJLiB2Kp8xOme/QMz0lWuNLicolFEpspyBzcCrbwS7czC7TdIfhMNwCbhLohUZiBVOkv6veGplVZBhqdH7cRSWPKnRsZKams248lSinGNO45YaLMhP6uX7Dey1yhQy69oyDEv1/4kaC+8XRdp2dl/5214n3ueNK87eTWplyorJyOtFWaWBLXRZwlQ5kqwXLUHpVHsryBk6lNwmfmPL1HendblEt1O4S04PhtGb4cHH14PDo1VCG+KV2BH7IhJvxaE4FidiJKT4Ln6K3+JP70fvbyCC4Lo16K1mXoobCLb+AYOGtGI=</latexit>
f : Rn ! R
<latexit sha1_base64="TkUgG4EOGrwbUNJftZ37scNlgb4=">AAACEHicbVC7TsMwFHV4lvIKMLJYVAimKgEEiKmChbEg+pCaUDmu01p17Mh2QFXUT2DhV1gYQIiVkY2/wWkzlJYjXenonHt17z1BzKjSjvNjzc0vLC4tF1aKq2vrG5v21nZdiURiUsOCCdkMkCKMclLTVDPSjCVBUcBII+hfZX7jgUhFBb/Tg5j4EepyGlKMtJHa9kF4Ab0I6V4QpLfDew49Sbs9jaQUjxMGbNslp+yMAGeJm5MSyFFt299eR+AkIlxjhpRquU6s/RRJTTEjw6KXKBIj3Edd0jKUo4goPx09NIT7RunAUEhTXMOROjmRokipQRSYzuxENe1l4n9eK9HhuZ9SHieacDxeFCYMagGzdGCHSoI1GxiCsKTmVoh7SCKsTYZFE4I7/fIsqR+V3dPy8c1JqXKZx1EAu2APHAIXnIEKuAZVUAMYPIEX8AberWfr1fqwPsetc1Y+swP+wPr6BZMxnPI=</latexit>
function to be minimized
<latexit sha1_base64="XhPGxVJ2VVphVS++9+2VcL3vNAc=">AAACCHicbVC7SgNBFJ31bXxFLS0cDIJV2FVRy6CNpYJ5QBLC7OSuDs7MLjN3xbiktPFXbCwUsfUT7PwbZ5MUmnhg4HDOvdw5J0yksOj7397U9Mzs3PzCYmFpeWV1rbi+UbNxajhUeSxj0wiZBSk0VFGghEZigKlQQj28Pcv9+h0YK2J9hb0E2opdaxEJztBJneJ2C+EesyjVPBcoxjQEqoQWSjxAt98plvyyPwCdJMGIlMgIF53iV6sb81SBRi6Ztc3AT7CdMYOCS+gXWqmFhPFbdg1NRzVTYNvZIEif7jqlS6PYuKeRDtTfGxlT1vZU6CYVwxs77uXif14zxeiknQmdpAiaDw9Fqczj5q3QrjDAUfYcYdwI91fKb5hhHF13BVdCMB55ktT2y8FR+eDysFQ5HdWxQLbIDtkjATkmFXJOLkiVcPJInskrefOevBfv3fsYjk55o51N8gfe5w9mh5o4</latexit>
rft (xt ) ! gradient information obtained on a relatively small t-th batch of b data-points
<latexit sha1_base64="kiUvbru1VbE18mCdGVnQk3lKVTM=">AAACX3icbVFNb9NAEF2bj5a0tC6cEJcRCVI5NLIBFY4VXDgWibSV4siaXa/jVde71u44NLLyJ7khceGfsElzgJaRVnp6M6N57y1vtfKUpj+j+MHDR493dp8M9vafHhwmR88uvO2ckBNhtXVXHL3UysgJKdLyqnUSG67lJb/+vO5fLqTzyppvtGzlrMG5UZUSSIEqkkVukGuEqqDjm4LeQO7UvCZ0zn6HnOQN9XOHpZKGQJnKumazCJYThpMlBIzgpA70Quol+Aa1hhGNTqgGjiRqsBWM+AhKJDxprTLkV0UyTMfppuA+yLZgyLZ1XiQ/8tKKrgk6hEbvp1na0qxHR0pouRrknZctimucy2mABhvpZ/0mnxW8DkwJQXx4wceG/Xujx8b7ZcPDZLBX+7u9Nfm/3rSj6uOsV6btSBpxe6jqNJCFddhQKicFhVRKhcKpoBVEjQ4FhS8ZhBCyu5bvg4u34+x0/O7r++HZp20cu+wle8WOWcY+sDP2hZ2zCRPsVxRHe9F+9DveiQ/i5HY0jrY7z9k/Fb/4A1yTtX8=</latexit>
⌘t ! learning rate
<latexit sha1_base64="l606J9IF17qrLg3reFdZOssTuoA=">AAACEHicbVA9SwNBEN3z2/gVtbRZDKJVuFNRy6CNpYJJhFwIc5tJsri3d+zOqeHIT7Dxr9hYKGJraee/cRNT+PVg4PHeDDPzolRJS77/4U1MTk3PzM7NFxYWl5ZXiqtrNZtkRmBVJCoxlxFYVFJjlSQpvEwNQhwprEdXJ0O/fo3GykRfUD/FZgxdLTtSADmpVdwOkaBFPDSy2yMwJrnhIeEt5QrBaKm73ADhoFUs+WV/BP6XBGNSYmOctYrvYTsRWYyahAJrG4GfUjMHQ1IoHBTCzGIK4gq62HBUQ4y2mY8eGvAtp7R5JzGuNPGR+n0ih9jafhy5zhioZ397Q/E/r5FR56iZS51mhFp8LepkilPCh+nwtjQoSPUdAWGku5WLHhgQ5DIsuBCC3y//JbXdcnBQ3jvfL1WOx3HMsQ22yXZYwA5ZhZ2yM1Zlgt2xB/bEnr1779F78V6/Wie88cw6+wHv7RPUMZ28</latexit>
<latexit sha1_base64="x1H+8zlD1WJ0v3cE1Z4bQ+R8I70=">AAACKXicbVDLTgIxFO3gC/GFunTTSExwQ2Y0UZdETXCJIEIChHQ6F2nsPNLe0ZAJv+PGX3GjiUbd+iN2gIWiN2l6cs69p7fHjaTQaNsfVmZufmFxKbucW1ldW9/Ib25d6zBWHBo8lKFquUyDFAE0UKCEVqSA+a6Epnt7lurNO1BahMEVDiPo+uwmEH3BGRqqly93xh6JAm9Ux5APmEbBaUUxT0CA9Bw0T+97gQPaZMqnNdDIFGparFfOa/u9fMEu2eOif4EzBQUyrWov/9LxQh77xpVLpnXbsSPsJsZScAmjXCfWEDF+y26gbWDAfNDdZLzliO4ZxqP9UJljthqzPycS5ms99F3T6TMc6FktJf/T2jH2T7qJCKIYIeCTh/qxpBjSNDbqCQUc5dAAxpVIMzJZKcbRhJszITizX/4Lrg9KzlHp8PKgUD6dxpElO2SXFIlDjkmZXJAqaRBOHsgTeSVv1qP1bL1bn5PWjDWd2Sa/yvr6BrrLptQ=</latexit>
t = 0 =) ⌘t = ⌘max
i
<latexit sha1_base64="z2AuyhhVaX7E9Q3w1yfxjP64ZBo=">AAACGnicbZDLSgMxFIYz3q23qks3wSK4KjMq6kYQ3bis0Bt06pBJTzWYZIbkjFiGPocbX8WNC0XciRvfxvSyUOsPgY//nMPJ+eNUCou+/+VNTc/Mzs0vLBaWlldW14rrG3WbZIZDjScyMc2YWZBCQw0FSmimBpiKJTTi2/NBvXEHxopEV7GXQluxay26gjN0VlQMqlGIcI85z0yfntBqJGgolFsNloaALELnDiFXQvevRFQs+WV/KDoJwRhKZKxKVPwIOwnPFGjkklnbCvwU2zkzKLiEfiHMLKSM37JraDnUTIFt58PT+nTHOR3aTYx7GunQ/TmRM2VtT8WuUzG8sX9rA/O/WivD7nE7FzrNEDQfLepmkmJCBznRjjDAUfYcMG6E+yvlN8wwji7Nggsh+HvyJNT3ysFhef/yoHR6No5jgWyRbbJLAnJETskFqZAa4eSBPJEX8uo9es/em/c+ap3yxjOb5Je8z2/Hy6C5</latexit>
Tcur = Ti =) ⌘t = ⌘min
Loshchilov, Ilya, and Frank Hutter. "Sgdr: Stochastic gradient descent with warm restarts." arXiv preprint arXiv:1608.03983 (2016).
Decoupled Weight Decay Regularization
! weight decay
<latexit sha1_base64="7pdDBZiPxE3vsVdsizFcboFxIfc=">AAACCHicbVA9SwNBEN3zM8avqKWFi0GwCncqahm0sYxgPiA5wt7eJFmy98HunDEcKW38KzYWitj6E+z8N+4lKTTxwcDjvRlm5nmxFBpt+9taWFxaXlnNreXXNza3tgs7uzUdJYpDlUcyUg2PaZAihCoKlNCIFbDAk1D3+teZX78HpUUU3uEwBjdg3VB0BGdopHbhoKVEt4dMqWhAWwgPmA4gU6gPnA1H7ULRLtlj0HniTEmRTFFpF75afsSTAELkkmnddOwY3ZQpFFzCKN9KNMSM91kXmoaGLADtpuNHRvTIKD7tRMpUiHSs/p5IWaD1MPBMZ8Cwp2e9TPzPaybYuXRTEcYJQsgnizqJpBjRLBXqCwUc5dAQxpUwt1LeY4pxNNnlTQjO7MvzpHZScs5Lp7dnxfLVNI4c2SeH5Jg45IKUyQ2pkCrh5JE8k1fyZj1ZL9a79TFpXbCmM3vkD6zPH8T/mnU=</latexit>
Attention Module
<latexit sha1_base64="/iC8fzPgD/TE1QkXLj4Mdkqkn+o=">AAACF3icbVC7TsMwFHXKq5RXgLGLRYXEVCUdgLHAwoJUJPqQ2qhynNvWqvOQ7SBVUQY+gy9ghS9gQ6yMfAD/gZNmoC1HsnR07uv4uBFnUlnWt1FaW9/Y3CpvV3Z29/YPzMOjjgxjQaFNQx6KnkskcBZAWzHFoRcJIL7LoetOb7J69xGEZGHwoGYROD4ZB2zEKFFaGprVQb4jEeClV0pBkMn4LvRiDkOzZtWtHHiV2AWpoQKtofkz8EIa+3oL5UTKvm1FykmIUIxySCuDWEJE6JSMoa9pQHyQTpIbSPGpVjw8CoV+gcK5+nciIb6UM9/VnT5RE7lcy8T/av1YjS6dhAVRrL9H54dGMccqxFki2GMCqOIzTQgVTHvFdEIEoUrntnDFk5m1VOdiL6ewSjqNun1eb9w3as3rIqEyqqITdIZsdIGa6Ba1UBtR9IRe0Ct6M56Nd+PD+Jy3loxi5hgtwPj6BepVoRU=</latexit>
<latexit sha1_base64="Qz7OhACABr4nC0TzhYGYBfQwrE8=">AAACKXicbVDLSgMxFM34rO+qSzfBIuhiykwX6rLoxo1QwVqhDiWTSdvQTDIkN0op/Qo/wy9wq1/gTt2K/2Gm7cJaD1w4nHMv994TZ4IbCIIPb25+YXFpubCyura+sblV3N65McpqyupUCaVvY2KY4JLVgYNgt5lmJI0Fa8S989xv3DNtuJLX0M9YlJKO5G1OCTipVfR9H18S08OxJpJ28WGsAFTq2wyDyvxEPUhsQFsKVrOjVrEUlIMR8CwJJ6SEJqi1it93iaI2ZRKoIMY0wyCDaEA0cCrYcPXOGpYR2iMd1nRUkpSZaDB6a4gPnJLgttKuJOCR+ntiQFJj+mnsOlMCXfPXy8X/vKaF9mk04DKzwCQdL2pb4T7GeUY44ZpREH1HCNXc3Yppl2hCwSU5tSUx+WlDl0v4N4VZclMph8flylWlVD2bJFRAe2gfHaIQnaAqukA1VEcUPaJn9IJevSfvzXv3Psetc95kZhdNwfv6AYBHpvQ=</latexit>
!mask
<latexit sha1_base64="ISx3CrbdzVfp6JKybyLmFefRSjU=">AAACMHicbVC7TgMxEPTxfhOgpLGIkKCJ7hACSh4NBQUgAkhJFO35NomFz3ey94DolB/hM/gCWvgCqBAFDV+BL6TgNZKl0cyud3fCVElLvv/iDQ2PjI6NT0xOTc/Mzs2XFhbPbZIZgVWRqMRchmBRSY1VkqTwMjUIcajwIrw6KPyLazRWJvqMuik2Ymhr2ZICyEnN0mbdyHaHwJjkhtcJbynfI0JduPwUrYwyUPwIwWip23xt7/Rovdcslf2K3wf/S4IBKbMBjpul93qUiCx2/woF1tYCP6VGDoakUNibqmcWUxBX0MaaoxpitI28f12Przol4q3EuKeJ99XvHTnE1nbj0FXGQB372yvE/7xaRq2dRi51mrmDxdegVqY4JbyIikfSoCDVdQSEkW5XLjpgQJAL9MeUyBarFbkEv1P4S843KsFWZeNks7y7P0hogi2zFbbGArbNdtkhO2ZVJtgde2CP7Mm79569V+/tq3TIG/QssR/wPj4Byf2qtA==</latexit>
Attentiontrunk
Residual Learning (ARL)
<latexit sha1_base64="ZrICqKy6BLXXbJrG5943RXSCnj0=">AAAB/nicbVDLSgMxFL1TX7W+qi7dBIvgqsx0oS6LblxWsA9oh5LJZNrQJDMkGaEMBb/ArX6BO3Hrr/gB/oeZdha29UDgcM693JMTJJxp47rfTmljc2t7p7xb2ds/ODyqHp90dJwqQtsk5rHqBVhTziRtG2Y47SWKYhFw2g0md7nffaJKs1g+mmlCfYFHkkWMYJNLAuvJsFpz6+4caJ14BalBgdaw+jMIY5IKKg3hWOu+5ybGz7AyjHA6qwxSTRNMJnhE+5ZKLKj2s3nWGbqwSoiiWNknDZqrfzcyLLSeisBOCmzGetXLxf+8fmqiGz9jMkkNlWRxKEo5MjHKP45CpigxfGoJJorZrIiMscLE2HqWroQ6jzazvXirLayTTqPuXdUbD41a87ZoqAxncA6X4ME1NOEeWtAGAmN4gVd4c56dd+fD+VyMlpxi5xSW4Hz9AtmhlsA=</latexit> <latexit sha1_base64="T/naqgBifIMOdMz8ava/Pje18Ok=">AAAB/3icbVDLTsJAFL3FF+ILdemmkZi4Ii0LdUl04xITCyTQkOl0gAnTaTNza0IaFn6BW/0Cd8atn+IH+B9OoQsBTzLJyTn35p45QSK4Rsf5tkobm1vbO+Xdyt7+weFR9fikreNUUebRWMSqGxDNBJfMQ46CdRPFSBQI1gkmd7nfeWJK81g+4jRhfkRGkg85JWgkD1UqJ4Nqzak7c9jrxC1IDQq0BtWffhjTNGISqSBa91wnQT8jCjkVbFbpp5olhE7IiPUMlSRi2s/mYWf2hVFCexgr8yTac/XvRkYiradRYCYjgmO96uXif14vxeGNn3GZpMgkXRwapsLG2M5/bodcMYpiagihipusNh0TRSiafpauhDqPNjO9uKstrJN2o+5e1RsPjVrztmioDGdwDpfgwjU04R5a4AEFDi/wCm/Ws/VufVifi9GSVeycwhKsr1/Y55dS</latexit>
Wang, Fei, et al. "Residual attention network for image classi cation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
fi
fi
Squeeze-and-Excitation Networks
YouTube Playlist
H 0 ⇥W 0 ⇥C 0
X2R U 2 RH⇥W ⇥C x̃c = Fscale (uc , sc ) = sc uc
Ftr : X 7! U uc 2 RH⇥W
V = [v1 , . . . , vC ] ! learned set of filter kernels X̃ = [x̃1 , . . . , x̃C ]
vc ! parameters of the c-th filter
U = [u1 , . . . , uC ]
0
XC
uc = v c ⇤ X = vcs ⇤ xs
s=1
1 C0
vc = [vc , . . . , vc ]
1 C0
X = [x , . . . , x ]
Squeeze 1 XH X W
Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-excitation networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
CBAM: Convolutional Block Attention Module
YouTube Video
r ! reduction ratio
Woo, Sanghyun, et al. "Cbam: Convolutional block attention module." Proceedings of the European conference on computer vision (ECCV). 2018.
ResNeSt: Split-Attention Networks
ResNeSt = ResNeXt + Squeeze-and-Excitation
<latexit sha1_base64="fFaMv+6YkKNbH1dvOn+QbzMnfs0=">AAACLnicbVDLSgMxFM34tr6qLt0EiyCIZUZF3QhFEVyJr2qhLSWTua3BTGZM7kjr0O/wN/wBt/oHggtxJ36G6diFrwOBwzn3cG+OH0th0HVfnIHBoeGR0bHx3MTk1PRMfnbu3ESJ5lDmkYx0xWcGpFBQRoESKrEGFvoSLvyrvZ5/cQPaiEidYSeGeshaSjQFZ2ilRt6rIbQxPQFzCKdId2jGKkhX6Ol1AnALq0wFq/ttLjCLdBv5glt0M9C/xOuTAunjqJF/rwURT0JQyCUzpuq5MdZTplFwCd1cLTEQM37FWlC1VLEQTD3NvtalS1YJaDPS9imkmfo9kbLQmE7o28mQ4aX57fXE/7xqgs3teipUnCAo/rWomUiKEe31RAOhgaPsWMK4FvZWyi+ZZhxtmz+2BKZ3Wjdni/F+1/CXnK8Vvc3i+vFGobTbr2iMLJBFskw8skVK5IAckTLh5I48kEfy5Nw7z86r8/Y1OuD0M/PkB5yPT9OUqO8=</latexit>
h⇥w⇥(c0 /k)
<latexit sha1_base64="VsARmfhf0n95ZNAQshfW7XQ+ATQ=">AAACKnicbVDNTsJAGNz6i/hX9ehlIzHiQWzVqEeiF49oLJDQSrbLAhu222Z3qyFNn8LX8AW86ht4I16Nz+EWMBHwSzaZzHxfZnb8iFGpLGtgzM0vLC4t51byq2vrG5vm1nZVhrHAxMEhC0XdR5IwyomjqGKkHgmCAp+Rmt+7zvTaIxGShvxe9SPiBajDaZtipDTVNI+cJoUu5dANkOr6fnKXPiRd6CoaEAmffkERHxz3DtOmWbBK1nDgLLDHoADGU2ma324rxHFAuMIMSdmwrUh5CRKKYkbSvBtLEiHcQx3S0JAjbeYlw2+lcF8zLdgOhX5cwSH79yJBgZT9wNebWXg5rWXkf1ojVu1LL6E8ihXheGTUjhlUIcw6gi0qCFasrwHCguqsEHeRQFjpJidcWjKLluZ1MfZ0DbOgelKyz0unt2eF8tW4ohzYBXugCGxwAcrgBlSAAzB4Bq/gDbwbL8aHMTA+R6tzxvhmB0yM8fUDvG+mvA==</latexit>
Ui 2 R
Split Attention
<latexit sha1_base64="mbS3Ud3igkww+YK4GmK0Y1WcyoA=">AAACF3icbVDLSsNAFJ3UV62vqjvdDBbBVUkqqMuqG5cVbSu0oUwmk3bo5MHMjVBCwN/wB9zqH7gTty79Ab/DSZqFbb0wcDjn3nvuHCcSXIFpfhulpeWV1bXyemVjc2t7p7q711FhLClr01CE8sEhigkesDZwEOwhkoz4jmBdZ3yd6d1HJhUPg3uYRMz2yTDgHqcENDWoHvTzHYlkbnqnDQFfArBgKtbMupkXXgRWAWqoqNag+tN3Qxr7epwKolTPMiOwEyKBU8HSSj9WLCJ0TIasp2FAfKbsJPdP8bFmXOyFUr8AcM7+nUiIr9TEd3SnT2Ck5rWM/E/rxeBd2AkPolj/i06NvFhgCHEWCHa5ZBTERANCJde3YjoiklDQsc24uCo7La3oYKz5GBZBp1G3zuqnt41a86qIqIwO0RE6QRY6R010g1qojSh6Qi/oFb0Zz8a78WF8TltLRjGzj2bK+PoFv+qgwg==</latexit>
Xr <latexit sha1_base64="SsOtgily3IVXvCK05izEwDRXYOc=">AAACZ3icbVHLbtNAFJ2YVwmPBpAQEpsLKVIRamTTCthUqmDDski4rRSn1vV4HE869lgz10A08j+y5QeQ+AG2MHGzoC1ndXTOfencrFHSUhj+GATXrt+4eWvj9vDO3Xv3N0cPHh5Z3RouYq6VNicZWqFkLWKSpMRJYwRWmRLH2dmHlX/8RRgrdf2Zlo2YVTivZSE5kpfS0SIpkVzcnS5gHxLbVqmT+1F36kwHceoMbC92opfwCmQHiZHzktAY/RUSEt/IGeGXWVFTPw0KbYBKAVuLrR0qgaPJZY0K5ka3TZeOxuEk7AFXSbQmY7bGYTr6meSat5WfzxVaO43ChmYODUmuRDdMWisa5Gc4F1NPa6yEnbk+kw5eeCXvLyp0TdCr/3Y4rKxdVpmvrJBKe9lbif/zpi0V72ZO1k1Loubni4pWAWlYBQy5NIKTWnqC3Eh/K/ASDXLyb7iwJber07qhDya6HMNVcvR6Er2Z7H7aGx+8X0e0wZ6y52ybRewtO2Af2SGLGWff2W/2Z8AGv4LN4HHw5Lw0GKx7HrELCJ79BX6ZurU=</latexit>
i=1 Û j 2 Rh⇥w⇥c , j = 1, . . . , k
h X w
<latexit sha1_base64="quCH8WHZGF/fJVoUBao6pRJoQ+8=">AAACSnicbVDLSgMxFM3UqrW+qi7dBIugIGVGRd0URDduhCpWhU5bMmnGSZvJDEnGUsJ8lb/hD7hV8APciRszdRa2eiBw7rn3ck+OFzMqlW2/WIWZ4uzcfGmhvLi0vLJaWVu/lVEiMGniiEXi3kOSMMpJU1HFyH0sCAo9Ru68wXnWv3skQtKI36hRTNoheuDUpxgpI3Url7LTh3Xo+gJh7aQ6gMMUujIJu5rWnbQT5EU/K4bQDZDSzbTT36F7/V3oUg7dEKnA8/R12sHdStWu2WPAv8TJSRXkaHQr724vwklIuMIMSdly7Fi1NRKKYkbSsptIEiM8QA+kZShHIZFtPf52CreN0oN+JMzjCo7V3xsahVKOQs9MZh7ldC8T/+u1EuWftDXlcaIIxz+H/IRBFcEsQ9ijgmDFRoYgLKjxCnGATILKJD1xpScza2nZBONMx/CX3O7XnKPawdVh9fQsj6gENsEW2AEOOAan4AI0QBNg8ARewCt4s56tD+vT+voZLVj5zgaYQKH4DUwZswM=</latexit>
1 X
j
s = Û j (i, j) 2 Rc
hw i=1 j=1
global contextual information with embedded channel-wise statistics
<latexit sha1_base64="XcZgHpZK5JAzZ7fhUWwUm6/DL+4=">AAACQHicbVBNSwMxEM3Wr1q/qh69BIvgxbJbQT2KvXisYK1QS8lmp20wmyzJrFpKf5B/wz/g1f4B8SZePZlde7Dqg8DjvZnMzAsTKSz6/sQrzM0vLC4Vl0srq2vrG+XNrSurU8OhybXU5jpkFqRQ0ESBEq4TAywOJbTC23rmt+7AWKHVJQ4T6MSsr0RPcIZO6pbrfalDJinXCuEBU0eF6mkT5z69FzigEIcQRRBRPmBKgTy4FxaoRVdiUXDbLVf8qp+D/iXBlFTIFI1u+fUm0jyNQSGXzNp24CfYGTHjfpMwLt2kFhLGb1kf2o4qFoPtjPJjx3TPKRF1K7qnkObqz44Ri60dxqGrdEcM7G8vE//z2in2TjojoZIUQfHvQb1UUtQ0S45GwgBHOXSEcSPcrlkghnF0+c5MiWy22rjkggl+x/CXXNWqwVH18KJWOT2bRlQkO2SX7JOAHJNTck4apEk4eSTP5IVMvCfvzXv3Pr5LC960Z5vMwPv8AkZ2sdk=</latexit>
j
<latexit sha1_base64="SNx+gXPX+Ny0pGCPNVFoIs3qx+E=">AAACQHicbVC7TsMwFHV4U14FRhaLCompSgABI4KFERAFpKZUjnvTGhw7sm+AKsoH8Rv8ACv8AGJDrEw4pQOvK13p6Jz7PFEqhUXff/ZGRsfGJyanpiszs3PzC9XFpTOrM8OhwbXU5iJiFqRQ0ECBEi5SAyyJJJxH1welfn4DxgqtTrGfQithXSViwRk6ql09YG1xeUVDoWiYMOxFUX5SXOa8oKER3R4yY/QtDRHuMLc6RsqsFV2VgEJ6C2WFLdrVml/3B0H/gmAIamQYR+3qS9jRPCuHcOkGNgM/xVbODAouoaiEmYWU8WvWhaaDiiVgW/ng2YKuOaZDY21cuiMG7PeOnCXW9pPIVZYP2d9aSf6nNTOMd1u5UGmGoPjXojiTFDUtnaMdYYCj7DvAuBHuVsp7zDCOzt8fWzq2PK2oOGOC3zb8BWcb9WC7vnm8VdvbH1o0RVbIKlknAdkhe+SQHJEG4eSePJIn8uw9eK/em/f+VTriDXuWyY/wPj4BwHOyLQ==</latexit>
ResNeSt Block
<latexit sha1_base64="vSqGFUfrJt02Cahh9lTce6fxrF4=">AAACFXicbVDLSgMxFM3UV62vqhvBTbAIrspMBXVZ6saV1EdboS0lk7nThmYmQ5IRyjD+hj/gVv/Anbh17Q/4HabtLGzrgcDhnHtzLseNOFPatr+t3NLyyupafr2wsbm1vVPc3WsqEUsKDSq4kA8uUcBZCA3NNIeHSAIJXA4td3g59luPIBUT4b0eRdANSD9kPqNEG6lXPOhM/kgkeOktqGu407jGBR32iiW7bE+AF4mTkRLKUO8VfzqeoHEAoaacKNV27Eh3EyI1oxzSQidWEBE6JH1oGxqSAFQ3maSn+NgoHvaFNC/UeKL+3UhIoNQocM1kQPRAzXtj8T+vHWv/opuwMIo1hHQa5Mcca4HHdWCPSaCajwwhVDJzK6YDIgnVprSZFE+NT0sLphhnvoZF0qyUnbPy6U2lVK1lFeXRITpCJ8hB56iKrlAdNRBFT+gFvaI369l6tz6sz+lozsp29tEMrK9fdqqfew==</latexit>
j j
<latexit sha1_base64="GQIJNDivez4t5m+xrWIObtSiUTg=">AAACOnicbVDLSsNAFJ34tr6qLt0MFqFFrImKuhFEF7pUsCo0NUymNzo6eTBzI5YQf8bf8Afc6sqtC0Hc+gFOahe+DgwczrmXe+b4iRQabfvZ6usfGBwaHhktjY1PTE6Vp2eOdZwqDg0ey1id+kyDFBE0UKCE00QBC30JJ/7VbuGfXIPSIo6OsJNAK2TnkQgEZ2gkr7zFPHF2Sbeos1x16CJ14SapLlE3ZHjBmcz2ck9U9dllrUZr1EW4wexWBLc5VcWOV67YdbsL+pc4PVIhPRx45Ve3HfM0hAi5ZFo3HTvBVsYUCi4hL7mphoTxK3YOTUMjFoJuZd1v5nTBKG0axMq8CGlX/b6RsVDrTuibySK+/u0V4n9eM8Vgs5WJKEkRIv51KEglxZgWndG2UMBRdgxhXAmTlfILphhH0+yPK21dRMtLphjndw1/yfFK3Vmvrx6uVbZ3ehWNkDkyT6rEIRtkm+yTA9IgnNyRB/JInqx768V6s96/Rvus3s4s+QHr4xNDu6rB</latexit>
ai j
= exp(Gi (s ))/ j
exp(Gi0 (s )) if r > 1 ai = 1/(1 + exp( Gi (sj ))) if r = 1
V = concat{V j , j = 1, . . . , k}
<latexit sha1_base64="f41eLxvEWDFIvGiPsRVxjyht6GU=">AAACJ3icbZDLSgMxFIYzXmu9VV26CRZBpJQZFXUjFN24rGAv0Kklk0k1bSYZkjNiGeYdfA1fwK2+gTvRpRufw/Sy8PZD4Oc753BO/iAW3IDrvjtT0zOzc/O5hfzi0vLKamFtvW5UoimrUSWUbgbEMMElqwEHwZqxZiQKBGsE/bNhvXHLtOFKXsIgZu2IXEve5ZSARZ3Cbh2fYB/YHaRUSUszP61f9Uq4Z7lXwr4IFZgS7vtZp1B0y+5I+K/xJqaIJqp2Cp9+qGgSMQlUEGNanhtDOyUaOBUsy/uJYTGhfXLNWtZKEjHTTkd/yvC2JSHuKm2fBDyi3ydSEhkziALbGRG4Mb9rQ/hfrZVA97idchknwCQdL+omAoPCw4BwyDWjIAbWEKq5vRXTG6IJBRvjjy2hGZ6W5W0w3u8Y/pr6Xtk7LO9fHBQrp5OIcmgTbaEd5KEjVEHnqIpqiKJ79Iie0LPz4Lw4r87buHXKmcxsoB9yPr4AXhmk7w==</latexit>
r i0 =1
X
<latexit sha1_base64="GSuJSQ3etg2zPcKB9yV4aikL0NM=">AAACjnicbVFdb9MwFHXCx0b5yuARIV1RAUOIKtnQxkvFBC97HIh2k5o2cpyb1ptjR7bDVln5Efw8/gB/gVecrkhs40qWjs79Oj43rwU3No5/BuGt23fubmze691/8PDR42jrydioRjMcMSWUPsmpQcEljiy3Ak9qjbTKBR7nZ5+7/PF31IYr+c0ua5xWdC55yRm1nsqiH+PZKQwhNU2VOT5M2pkGmnFPjjKnYfv0XfIG3gJvIeUS0oraRZ67r+3MLSC1vEID538B80WazxeWaq06Fi+sO8eOwQLKphMBqgRGdcElFTDXqqlBo1dsUNqVpDaL+vEgXgXcBMka9Mk6jrLoV1oo1lR+AhPUmEkS13bqqLacCWx7aWOwpuyMznHioaRe69StrGvhpWe8NqX9kxZW7L8djlbGLKvcV3Z/N9dzHfm/3KSx5Yep47JuLEp2uahsBFgF3R2g4BqZFUsPKNPcawW2oJoy6691ZUthOmltzxuTXLfhJhjvDJK9we6X9/2DT2uLNskz8oJsk4TskwNySI7IiDDyO3gevApeh1G4Fw7Dj5elYbDueUquRHj4B4Zhx9k=</latexit>
j Y=V+X
<latexit sha1_base64="MTJYf34k/Hvvlztvv2yGeR1DPv8=">AAACBHicbVDLSsNAFL2pr1pfVZduBosgCCWpoG6EohuXFWxtaUOZTCbt0MkkzEyEErr1B9zqH7gTt/6HP+B3OGmzsK0HBg7n3Ms9c7yYM6Vt+9sqrKyurW8UN0tb2zu7e+X9g5aKEklok0Q8km0PK8qZoE3NNKftWFIcepw+eqPbzH98olKxSDzocUzdEA8ECxjB2kidDrpGLXSG2v1yxa7aU6Bl4uSkAjka/fJPz49IElKhCcdKdR071m6KpWaE00mplygaYzLCA9o1VOCQKjedBp6gE6P4KIikeUKjqfp3I8WhUuPQM5Mh1kO16GXif1430cGVmzIRJ5oKMjsUJBzpCGW/Rz6TlGg+NgQTyUxWRIZYYqJNR3NXfJVFm5RMMc5iDcukVas6F9Xz+1qlfpNXVIQjOIZTcOAS6nAHDWgCgRBe4BXerGfr3fqwPmejBSvfOYQ5WF+/j4WXRg==</latexit>
CutMix: Patches are cut and pasted among training images where the ground
<latexit sha1_base64="kOrAqxOv7CQnBlnz7K4wLBEucHo=">AAAClnicbVHbihNBEO0Zb2u8RX0RfGkMCz7FmV1Q8UEWF9EXIYLZXUhCqOmpyTTb0z10V+uGkA/x0/wBf0Mrkzy4uxY0HKpO1ak6XbRGB8qyX0l64+at23f27vbu3X/w8FH/8ZOT4KJXOFbOOH9WQECjLY5Jk8Gz1iM0hcHT4vx4Uz/9jj5oZ7/RssVZAwurK62AODXv/5xap22JluRxpC/64p0cAakagwSPUkWSYEvZQiAsJTTOLiR50FYz0DyMiT9qZCrVKBfeRWaTj1RLAwWa7RgwwclGX/CI1rvW+Y04GLOU5LpGJoF0VYfbrf68P8iGWRfyOsh3YCB2MZr3f09Lp2LDtygDIUzyrKXZClhMGVz3pjFgC+qcd54wtNBgmK06C9dynzOlrJznx1502X87VtCEsGwKZjZAdbha2yT/V5tEqt7OVtq2kdCqrVAVTXc4/4cstUdF7ESpQXnNu0pVgwdF/GuXVMqwWW3dY2PyqzZcBycHw/z18PDrweDow86iPfFcvBAvRS7eiCPxWYzEWCjxJ9lPhsmr9Fn6Pv2YftpS02TX81RcinT0F/kZy78=</latexit>
truth labels are also mixed proportionally to the area of the patches
x 2 RW ⇥H⇥C ! training image
<latexit sha1_base64="XuQYsclh2N+nHCpypH33ixmserw=">AAACRXicbVC7TsMwFHV4lvIKMLJYVEhMVQIIGCtYOhZEKVJTKsdxWwvHiewboIryS/wGP8DCACMbG2IFpwQJWq50paNz7vP4seAaHOfJmpqemZ2bLy2UF5eWV1bttfULHSWKsiaNRKQufaKZ4JI1gYNgl7FiJPQFa/nXJ7neumFK80iewzBmnZD0Je9xSsBQXbt+hz0usRcSGPh+epZdpS3sAQ+ZxvUfcJJhT/H+AIhS0a1h2R2koAiXXPYxNyNZ1rUrTtUZBZ4EbgEqqIhG1371gogmIZNABdG67ToxdFKigFPBsrKXaBYTem2Gtw2UxFzSSUcfZ3jbMAHuRcqkBDxif3ekJNR6GPqmMv9Mj2s5+Z/WTqB31Em5jBNgkn4v6iUCQ4Rz+3DAFaMghgYQqri5FdMBUYSCMfnPlkDnp2VlY4w7bsMkuNitugfVvdP9Su24sKiENtEW2kEuOkQ1VEcN1EQU3aNH9IxerAfrzXq3Pr5Lp6yiZwP9CevzCyTTszA=</latexit>
y ! corresponding label
<latexit sha1_base64="5RHyJmki9cdnf6vBSm2tKTJqrNg=">AAACJXicbVDLSgNBEJz1bXytevQyGARPYVdFPYpePEYwUUhCmJ3tJIOzM8tMr7os+QV/wx/wqn/gTQRP3vwOJzEHTSxoKKq66e6KUiksBsGHNzU9Mzs3v7BYWlpeWV3z1zfqVmeGQ41rqc11xCxIoaCGAiVcpwZYEkm4im7OBv7VLRgrtLrEPIVWwrpKdARn6KS2v5vTphHdHjJj9B1tItxjwbUxYFOtYqG6VLIIZL/tl4NKMASdJOGIlMkI1bb/1Yw1zxJQyCWzthEGKbYKZlBwCf1SM7OQMn7DutBwVLEEbKsYftSnO06JaUcbVwrpUP09UbDE2jyJXGfCsGfHvYH4n9fIsHPcKoRKMwTFfxZ1MklR00E8NBYGOMrcEcaNcLdS3mOGcXQh/tkS28Fp/ZILJhyPYZLU9yrhYWX/4qB8cjqKaIFskW2yS0JyRE7IOamSGuHkgTyRZ/LiPXqv3pv3/tM65Y1mNskfeJ/fSLmm1A==</latexit>
<latexit sha1_base64="BqIYfxnezhcVW+m+6kft4MGvxGg=">AAACgHicbVFdaxNBFJ1dv2r8aNQXxZfBREihxN0KKj618UEfK5i2kITl7uxNOnRmdpm5a7Ms++Sv9A8I/gsnmwVt64WBw7nncO89kxZKOoqin0F46/adu/d27vcePHz0eLf/5OmJy0srcCpylduzFBwqaXBKkhSeFRZBpwpP04tPm/7pd7RO5uYbVQUuNKyMXEoB5Kmk/2NucmkyNMQ/o0ELhBy4wUtOFqSRZsVxDbpQyIejOUmVYb1u9nkHq2ZvyNOKi1ynWzVd5n+trnU6b10nR/u8So68HEzWEpMNMdkbjpP+IBpHbfGbIO7AgHV1nPR/zbNclNpvLRQ4N4ujghY1WJJCYdOblw4LEBewwpmHBjS6Rd2G1fDXnsn4Mrf++atb9l9HDdq5SqdeqYHO3fXehvxfb1bS8sOilqYoCY3YDlqWipMPxCfPM2lRkKo8AGGl35WLc7AgyP/PlSmZ26zW9Hww8fUYboKTg3H8bvz268HgcNJFtMNesldsxGL2nh2yL+yYTZlgv4Pd4HnwIgzDUfgmjLfSMOg8z9iVCj/+AaYUv1o=</latexit>
Generate a new training example (x̃, ỹ) by combining two training samples
(xA , yA ) and (xB , yB ).
<latexit sha1_base64="S91TUU0TrL5vj1FAd2s+aTRv1aU=">AAACKHicbVDLSsNAFJ34tr6iLt0MFkERS6KibgQfGzeFCrYKbQiTydQOnWTCzI20hHyEv+EPuNU/cCduXfgdTmoRrR64cDjnXu69J0gE1+A4b9bY+MTk1PTMbGlufmFxyV5eaWiZKsrqVAqpbgKimeAxqwMHwW4SxUgUCHYddM8L//qOKc1lfAX9hHkRuY15m1MCRvLt7RZwEbKsl+NjXMUtGUrAPf8Ub+NNF+/g6ta3dubbZafiDID/EndIymiImm9/tEJJ04jFQAXRuuk6CXgZUcCpYHmplWqWENolt6xpaEwipr1s8FSON4wS4rZUpmLAA/XnREYirftRYDojAh096hXif14zhfaRl/E4SYHF9GtROxUYJC4SwiFXjILoG0Ko4uZWTDtEEQomx19bQl2clpdMMO5oDH9JY7fiHlT2LvfLJ2fDiGbQGlpHm8hFh+gEXaAaqiOK7tEjekLP1oP1Yr1ab1+tY9ZwZhX9gvX+Cfe6pAI=</latexit>
x̃ = M xA + (1 M ) xB
<latexit sha1_base64="hzrOGiC4w1g2yKGEHk+/lsVzApE=">AAACKHicbVDLSsNAFJ34rPUVdelmsAiVYklU1I1Q68ZlBfuAJpTJZNoOnUzCzEQIoR/hb/gDbvUP3Em3LvwOp20E23pg4HDOPdw7x4sYlcqyRsbS8srq2npuI7+5tb2za+7tN2QYC0zqOGShaHlIEkY5qSuqGGlFgqDAY6TpDe7GfvOJCElD/qiSiLgB6nHapRgpLXXMkqMo80maDOENdJgO+ggmnVtYgkUbnv5KJ1qrdsyCVbYmgIvEzkgBZKh1zG/HD3EcEK4wQ1K2bStSboqEopiRYd6JJYkQHqAeaWvKUUCkm04+NYTHWvFhNxT6cQUn6t9EigIpk8DTkwFSfTnvjcX/vHasutduSnkUK8LxdFE3ZlCFcNwQ9KkgWLFEE4QF1bdC3EcCYaV7nNniy/Fpw7wuxp6vYZE0zsr2Zfn84aJQqWYV5cAhOAJFYIMrUAH3oAbqAINn8ArewLvxYnwYn8ZoOrpkZJkDMAPj6wfOWKSB</latexit>
ỹ = yA + (1 )yB
M 2 {0, 1}W ⇥H ! a binary mask indicating where to drop out and fill in from two images
<latexit sha1_base64="zhhETVu+mbv9GgUwAbJEEV6IUcQ=">AAACcXicbVHbihNBEO0Zb7vxFvVJfCk2CIISZlTUx0Vf9kVYwWwWMjHU9NQkTXq6h+4aYxjmQ/UD/AF/wJ5sHtxdCxoOpy6n6nRea+U5SX5G8Y2bt27fOTgc3L13/8HD4aPHZ942TtJEWm3deY6etDI0YcWazmtHWOWapvn6U5+ffifnlTVfeVvTvMKlUaWSyIFaDPkzZMpA1iav0qz71k4hY1WRh5MOMqeWK0bn7Caw9INbhFwZdFuo0K9BmWI3xyxhsyJHwBYKZ2uwDQOaAkqldaiC0tkKeGNBBXXy3WI4SsbJLuA6SPdgJPZxuhj+zgorm4oMS43ez9Kk5nmLjpXU1A2yxlONch2mzwI0GC6Ytzt3OngemLCLdeEZhh37b0eLlffbKg+VFfLKX8315P9ys4bLD/NWmbphMvJCqGx0b0NvNRTKkWS9DQClU2FXkCt0KDl8yCWVwverdYNgTHrVhuvg7PU4fTd+8+Xt6Pjj3qID8UwciRciFe/FsTgRp2IipPgViegwGkR/4qcxxEcXpXG073kiLkX88i9dCLz5</latexit>
↵ = 1 =) ⇠ Unif(0, 1)
<latexit sha1_base64="8YgNB8ooAVxJzY+B4PR4Vlh5KOk=">AAACLnicbVDLSgMxFM34rPVVdekmWAQFKTMq6kYQ3bhUsCp0SrmTubXBZGZI7ohl6Hf4G/6AW/0DwYW4Ez/DtHbh60DgcO7j3JwoU9KS7794I6Nj4xOTpany9Mzs3HxlYfHcprkRWBepSs1lBBaVTLBOkhReZgZBRwovouujfv3iBo2VaXJG3QybGq4S2ZYCyEmtShCCyjrA93nAQ6mdJVoeKrcgBh5aqXlIeEtF3Q311vyNYL1Vqfo1fwD+lwRDUmVDnLQq72GcilxjQkKBtY3Az6hZgCEpFPbKYW4xA3ENV9hwNAGNtlkMvtbjq06JeTs17iXEB+r3iQK0tV0duU4N1LG/a33xv1ojp/Zes5BJlhMm4suonStOKe/nxGNpUJDqOgLCSHcrFx0wIMil+cMltv3TemUXTPA7hr/kfLMW7NS2TrerB4fDiEpsma2wNRawXXbAjtkJqzPB7tgDe2RP3r337L16b1+tI95wZon9gPfxCRsUp+Q=</latexit>
<latexit sha1_base64="Fn5R4B6AlXS28ipEQOaLsU585A8=">AAACa3icbZHNSsNAFIUn8b/+VbtTF4NFUNCSqKgboeimywq2FZoSJpOpHZxM4syNWkLe0o0v4M4XcOWkdmFtLwwczncv93ImSATX4Dgflj03v7C4tLxSWl1b39gsb223dZwqylo0FrF6CIhmgkvWAg6CPSSKkSgQrBM83Ra888KU5rG8h2HCehF5lLzPKQFj+WWp/DfsaR5hD9gbZC0D80PnGHeOjrHyhzNZY8Re8TXuGP6sIHPxCfaEWRuSvGADwxozmF+uOjVnVHhauGNRReNq+uVPL4xpGjEJVBCtu66TQC8jCjgVLC95qWYJoU/kkXWNlCRiupeNcsnxgXFC3I+VeRLwyP07kZFI62EUmM6IwED/Z4U5i3VT6F/1Mi6TFJikv4v6qcAQ4yJkHHLFKIihEYQqbm7FdEAUoWC+YmJLqIvT8pIJxv0fw7Ron9bci9rZ3Xm1fjOOaBnton10iFx0ieqogZqohSh6R9/WnDVvfdkVe8fe+221rfFMBU2UffADXVa3ng==</latexit>
p p
rx ⇠ Unif(0, W ), ry ⇠ Unif(0, H), rw = W 1 , rh = H 1
r w rh
<latexit sha1_base64="esqKbyJzEg1wH1/zGfzw3euZH2s=">AAACRHicbVBNb9NAEF2XQkv4CvTIZdQIiQuRTRFwQargQI9FappKcWSN1+Nk1bXX2h23RJZ/En+DP8AJiV576g1xRV2nOdCWkVZ6em9m3uxLK60ch+HPYO3O+t17G5v3ew8ePnr8pP/02aEztZU0kkYbe5SiI61KGrFiTUeVJSxSTeP0+FOnj0/IOmXKA15UNC1wVqpcSWRPJf3PcW5RNjY5BZvM22YMey18gAheQaz9mgwhtmo2Z7TWnELM9JUbaU1VUQboncB2m9qkPwiH4bLgNohWYCBWtZ/0z+PMyLqgkqVG5yZRWPG0QctKamp7ce2oQnmMM5p4WGJBbtosP9zCC89kkBvrX8mwZP+daLBwblGkvrNAnrubWkf+T5vUnL+fNqqsaqZSXhnltQY20KUHmbIkWS88QGmVvxXkHH2C7DO+5pK57rS254OJbsZwGxy+HkZvhztf3gx2P64i2hTPxbZ4KSLxTuyKPbEvRkKKb+KH+CXOgu/BRfA7+HPVuhasZrbEtQr+XgLnJrH5</latexit>
Yun, Sangdoo, et al. "Cutmix: Regularization strategy to train strong classi ers with localizable features." Proceedings of the IEEE/CVF International Conference on
Computer Vision. 2019.
fi
fi
Neural Ordinary Di erential Equations
YouTube Playlist
ht+1 = ht + f✓ (ht ) ! residual networks, recurrent neural networks, and normalizing flows (density estimation)
<latexit sha1_base64="W+Q7R42MBLdwAbxbWSk13xtXmZo=">AAAChnicbVHbihNBEO0ZbzHeouKToIVByLJLmPG2vghBX3xcwewuZMLQ01OTadLTPXTXGOMwH+An+gO++gt2snkwuxY0nDqnijqczmolHUXRryC8dv3GzVu92/07d+/dfzB4+OjUmcYKnAqjjD3PuEMlNU5JksLz2iKvMoVn2fLTRj/7htZJo7/SusZ5xRdaFlJw8lQ6+FmmLR3GHXyAMiU4hCJNqETiI98eQGLloiRurVlBQvidWotO5g1XoJFWxi7dEVgUjbWoyXON3ZO4zkEbW3Elf0i9gEKZlYNRjtpJWgM6ktXWyUGXDobRONoWXAXxDgzZrk7Swe8kN6Kp/GGhuHOzOKpp3nJLUijs+knjsOZiyRc481DzCt283UbWwUvP5FAY6583vmX/3Wh55dy6yvykN1i6y9qG/J82a6h4P2+lrhtCLS4OFY0CMrDJH3Lp4yK19oALK71XECW3XJD/pb0rudtY6/o+mPhyDFfB6atx/G78+sub4eTjLqIee8pesBGL2TGbsM/shE2ZYH+CJ8Gz4HnYC8fh2/D4YjQMdjuP2V6Fk7+WpsXH</latexit> <latexit sha1_base64="mL7CCJAv72xGyo9+3G3L6nxbi2Y=">AAACGHicbVC7SgNBFJ31GeMraplmMAjahF0VtQzaWEYwD0hCmJ29SQZnZ5eZu2JYUvgb/oCt/oGd2Nr5A36Hk00KTTwwcDjnXu6Z48dSGHTdL2dhcWl5ZTW3ll/f2NzaLuzs1k2UaA41HslIN31mQAoFNRQooRlrYKEvoeHfXY39xj1oIyJ1i8MYOiHrK9ETnKGVuoViG+EB08MAlBE4pGBQhJl3NOoWSm7ZzUDniTclJTJFtVv4bgcRT0JQyCUzpuW5MXZSplFwCaN8OzEQM37H+tCyVLEQTCfNPjGiB1YJaC/S9imkmfp7I2WhMcPQt5M24MDMemPxP6+VYO+ikwoVJwiKTw71EkkxouNGaCA0cJRDSxjXwmalfMA042h7+3MlMONoo7wtxputYZ7Uj8veWfnk5rRUuZxWlCNFsk8OiUfOSYVckyqpEU4eyTN5Ia/Ok/PmvDsfk9EFZ7qzR/7A+fwBRdWhBQ==</latexit>
<latexit sha1_base64="tPndAJIykfgGkH4ri+tM4vzYbds=">AAACK3icbVDLSgMxFM34rPVVdekmWARXZUZBXRbrwpWo2FZoS8lkbm1oJhmSG6UM/Qt/wx9wq3/gSnGr3+FM7cLXgcDhnHtzDydMpLDo+y/e1PTM7Nx8YaG4uLS8slpaW29Y7QyHOtdSm6uQWZBCQR0FSrhKDLA4lNAMB7Xcb96AsUKrSxwm0InZtRI9wRlmUrdUaY//SA1Eo5pWKJTTztJjSLBPL8CKyDFJTwFvtRnYYrdU9iv+GPQvCSakTCY465Y+2pHmLgaFXDJrW4GfYCdlBgWXMCq2nYWE8QG7hlZGFYvBdtJxphHdzpSI9rTJnkI6Vr9vpCy2dhiH2WTMsG9/e7n4n9dy2DvspEIlDkHxr0M9JylqmpdEI2GAoxxmhHEjsqyU95lhHLMqf1yJbB5tlBcT/K7hL2nsVoL9yt75brl6NKmoQDbJFtkhATkgVXJCzkidcHJHHsgjefLuvWfv1Xv7Gp3yJjsb5Ae8909kMajU</latexit>
<latexit sha1_base64="HyU6SA9TkwyAANeL7cdQ3jeulUQ=">AAACFHicbVDLSsNAFJ3UV62vqAsXbgaLUEFKoqJuhKIblxXsA9pSJtNJM3TyYOZGKCG/4Q+41T9wJ27d+wN+h5M2C9t6YOBwzn3NcSLBFVjWt1FYWl5ZXSuulzY2t7Z3zN29pgpjSVmDhiKUbYcoJnjAGsBBsHYkGfEdwVrO6C7zW09MKh4GjzCOWM8nw4C7nBLQUt886A5CSLwU32C33wWPAal4p3DSN8tW1ZoALxI7J2WUo943f/QkGvssACqIUh3biqCXEAmcCpaWurFiEaEjMmQdTQPiM9VLJh9I8bFWBtgNpX4B4In6tyMhvlJj39GVPgFPzXuZ+J/XicG97iU8iGJgAZ0ucmOBIcRZGnjAJaMgxpoQKrm+FVOPSEJBZzazZaCy09KSDsaej2GRNM+q9mX1/OGiXLvNIyqiQ3SEKshGV6iG7lEdNRBFKXpBr+jNeDbejQ/jc1paMPKefTQD4+sXLUCeJg==</latexit>
ḣ = f✓ (h, t)
Replacebackpropagation
the 6 standard through
residualODE
blocks in ResNet with ODESolve modules!
<latexit sha1_base64="XaGmOn2N9wsVahlnXVRgoFOqVDk=">AAACRnicbVA9TxtBEJ0z+QDny0lKmkmsSKmsO5BISpQEKTSBkBiQjGXN7Y3xynu7p905Isvyb+Jv8AcoaEiXMh1KmzvjIkCeNNLTezOamZcWRgeJ44uosXTv/oOHyyvNR4+fPH3Wev5iP7jSK+4qZ5w/TCmw0Za7osXwYeGZ8tTwQTr+WPsHJ+yDdva7TAru53Rs9VArkkoatLb3uDCkGGXEuIFByGbkM/QcdFaSwdQ4NQ6oLe5x+MKCP7SMcOfT1jdnThhzl5WGw6tmc9Bqx514DrxLkgVpwwK7g9avo8ypMmcrylAIvSQupD8lL1oZnjWPysAFqTEdc6+ilnIO/en85Rm+qZQMh85XZQXn6r8TU8pDmORp1ZmTjMJtrxb/5/VKGb7vT7UtSmGrrhcNS4PisM4PM+1ZiZlUhJTX1a2oRuRJSZXyjS1ZqE+b1cEkt2O4S/bXOslGZ/3rWnvzwyKiZViF1/AWEngHm/AZdqELCk7hHC7hZ3QW/Y6uoj/XrY1oMfMSbqABfwGxB7CX</latexit>
solvers
<latexit sha1_base64="ku9ZABlYqWt7IpVoFABLu6BhxUA=">AAACNHicbVDLSgMxFM34rPVVdekmWARXZUZFXYoPcGcFq0JbSiZz2wYzyZDcEcswn+Jv+ANu9QMEd6JLv8FM7cLXgcDh3Me5OWEihUXff/bGxicmp6ZLM+XZufmFxcrS8oXVqeHQ4FpqcxUyC1IoaKBACVeJARaHEi7D68OifnkDxgqtznGQQDtmPSW6gjN0Uqey20pVBCY0jEPWQrjFLGT8OjE6Yb1hD8W+0WmvT0+PjqnVstiW551K1a/5Q9C/JBiRKhmh3qm8tyLN0xgUcsmsbQZ+gu2MGRRcQl5upRYS58x60HRUsRhsOxt+MKfrToloVxv3FNKh+n0iY7G1gzh0nTHDvv1dK8T/as0Uu3vtTKgkRVD8y6ibSoqaFmnRSBjgKAeOMG6Eu5XyPnNhoUvhh0tki9Pysgsm+B3DX3KxWQt2altn29X9g1FEJbJK1sgGCcgu2ScnpE4ahJM78kAeyZN37714r97bV+uYN5pZIT/gfXwCFhStWg==</latexit>
h(0) ! input
<latexit sha1_base64="lB4oZluf6F9pXWxoWMWwgr9DxGY=">AAACGnicbVDLSsNAFJ34rPVVdSnCYBHqpiQq6rLoxmUF+4AmlMlk2g6dTMLMjVpKV/6GP+BW/8CduHXjD/gdTtosbOuBC4dz7uXee/xYcA22/W0tLC4tr6zm1vLrG5tb24Wd3bqOEkVZjUYiUk2faCa4ZDXgIFgzVoyEvmANv3+d+o17pjSP5B0MYuaFpCt5h1MCRmoXDnol+xi7ind7QJSKHrAL7BGGXMYJjNqFol22x8DzxMlIEWWotgs/bhDRJGQSqCBatxw7Bm9IFHAq2CjvJprFhPZJl7UMlSRk2huO3xjhI6MEuBMpUxLwWP07MSSh1oPQN50hgZ6e9VLxP6+VQOfSm7zEJJ0s6iQCQ4TTTHDAFaMgBoYQqri5FdMeUYSCSW5qS6DT00Z5E4wzG8M8qZ+UnfPy6e1ZsXKVRZRD++gQlZCDLlAF3aAqqiGKntALekVv1rP1bn1Yn5PWBSub2UNTsL5+AQ9FoWk=</latexit>
| {z }
h(T ) ! output (black-box di↵erential equation solver)
<latexit sha1_base64="8XaIiG8+BwAd8CjO9ksodKW/E3k=">AAACRHicbVA7SwNBEN7zGeMrammzGIRYGO5U1DJooWWEJApJCHubuWRx7/bcndOEIz/Jv+EfsBK0tbITW3HzKHwNDHx8M/PNzOfHUhh03Sdnanpmdm4+s5BdXFpeWc2trdeMSjSHKldS6SufGZAigioKlHAVa2ChL+HSvz4d1i9vQRuhogr2Y2iGrBOJQHCGlmrlzrqFyg5taNHpItNa3dEGQg9TlWCcIC34kvHrXV/1aFsEAWiIUDBJ4SYZCVCjpJXfGbRyebfojoL+Bd4E5Mkkyq3ca6OteBJaQS6ZMXXPjbGZMo2CSxhkG4mB2O5mHahbGLEQTDMdPTyg25Zp00BpmxHSEft9ImWhMf3Qt50hw675XRuS/9XqCQbHzVRE9neI+HhRkEiKig7dsyZo4Cj7FjCuhb2V8i7TjKP1+MeWthmeNshaY7zfNvwFtb2id1jcvzjIl04mFmXIJtkiBeKRI1Ii56RMqoSTe/JInsmL8+C8Oe/Ox7h1ypnMbJAf4Xx+AYZ/suQ=</latexit>
@L
<latexit sha1_base64="j6Vl7OgabiiT3oHoVYgiCotMvtA=">AAACQHicbVDLShxBFK028TW+xmTppsgg6GboVlERAuJssnBhIKPC9DDcrq6eKae6qqm6rRma/iB/wx9wm/mB4C5k68rqcSDxcaDgcM69nFsnyqSw6Ptjb+bDx9m5+YXF2tLyyupaff3TudW5YbzNtNTmMgLLpVC8jQIlv8wMhzSS/CIatir/4pobK7T6gaOMd1PoK5EIBuikXr0FW7hNj77SMDHAijADgwIkPS3/8YEbKWloRH+AYIy+oSHyn1hAfKWFwrJXb/hNfwL6lgRT0iBTnPXqv8NYszzlCpkEazuBn2G3qOKY5GUtzC3PgA2hzzuOKki57RaTz5Z00ykxTbRxTyGdqP9vFJBaO0ojN5kCDuxrrxLf8zo5JofdQqgsR67Yc1CSS4qaVs3RWBjOUI4cAWaEu5WyAbjW0PX7IiW21WllzRUTvK7hLTnfaQb7zd3ve43jk2lFC2SDfCFbJCAH5Jh8I2ekTRi5JffkFxl7d96D98f7+zw64013PpMX8B6fADtosMU=</latexit>
<latexit sha1_base64="5bwyPMgEWwcVZMeFr/nmRsRxtrk=">AAACYHicbVBNS8NAEN3Gr1q/qt70slgEPVgSFfUiFL14VLAqNKVsNpN26eaD3YlY0/xAf4JXD1696s1NW/BzYOHNe/N2huclUmi07eeSNTU9MztXnq8sLC4tr1RX1250nCoOTR7LWN15TIMUETRRoIS7RAELPQm3Xv+80G/vQWkRR9c4SKAdsm4kAsEZGqpT5a4IzR7Q1JVxlyY7jx1nl55+dfYu3Rt3roQAh9RFeMDMB8ypGyjGMzdhCgWTNMi/sDEaXYluD4edas2u26Oif4EzATUyqctO9dX1Y56GECGXTOuWYyfYzoqvuYS84qYaEsb7rAstAyMWgm5nozByum0YnwaxMi9COmK/OzIWaj0IPTMZMuzp31pB/qe1UgxO2pmIkhQh4uNFQSopxrRIlvpCAUc5MIBxJcytlPeYSQhN/j+2+Lo4La+YYJzfMfwFN/t156h+cHVYa5xNIiqTTbJFdohDjkmDXJBL0iScPJE38k4+Si9W2VqxVsejVmniWSc/ytr4BNAkuJI=</latexit>
@f (h, t) @f
<latexit sha1_base64="/6p6DnVUQnbJLAcCf4bmroUfUk0=">AAACNnicbVDLSsNAFJ34tr6qLt0MFkFBS6LiYyGIblwq2FpoarmZTszg5MHMjVBCvsXf8Afc6tqNO3XrJzipBW31wMDh3Me5c7xECo22/WKNjI6NT0xOTZdmZufmF8qLS3Udp4rxGotlrBoeaC5FxGsoUPJGojiEnuRX3u1pUb+640qLOLrEbsJbIdxEwhcM0Ejt8qHbiTGDnB7RLQrXl9T1FbDMTUChAEn9tosBR1gPNnEj/9GDvF2u2FW7B/qXOH1SIX2ct8vvxoulIY+QSdC66dgJtrJiIZM8L7mp5gmwW7jhTUMjCLluZb0v5nTNKB3qx8q8CGlP/T2RQah1N/RMZwgY6OFaIf5Xa6boH7QyESUp8oh9G/mppBjTIi/aEYozlF1DgClhbqUsAJMRmlQHXDq6OC0vmWCc4Rj+kvp21dmr7lzsVo5P+hFNkRWyStaJQ/bJMTkj56RGGLknj+SJPFsP1qv1Zn18t45Y/ZllMgDr8wsTF6wV</latexit>
T ✓
ȧ = a @z0
@h T
<latexit sha1_base64="osxPzMIwoYWRpEeozVaJ6BJd9IQ=">AAACSnicbVDRShtBFJ2N2tpYa9RHXy4GIUEIu1ravgiiL30RFIwKSRruTmaTwdmZZeZuY1zyVf6GP+BrC/2AvokvTmKEqj0wcO4593LvnDhT0lEY3gWlufmFd+8XP5SXPi5/Wqmsrp05k1sumtwoYy9idEJJLZokSYmLzApMYyXO48vDiX/+U1gnjT6lUSY6Kfa1TCRH8lK3cnRdo+2oDnvgSR22IYcB1IY/Tp/ruA5tK/sDQmvNENokrqjIFGq0oI1NUclrqfuQKDMcdyvVsBFOAW9JNCNVNsNxt/Kn3TM8T4UmrtC5VhRm1CnQkuRKjMvt3IkM+SX2RctTjalwnWL67TFseaUHibH+aYKp+u9EgalzozT2nSnSwL32JuL/vFZOybdOIXWWk9D8aVGSKyADkwyhJ63gpEaeILfS3wp8gBY5+aRfbOm5yWnjsg8meh3DW3K204i+NHZPPlf3D2YRLbINtslqLGJf2T77zo5Zk3F2w+7YL/Y7uA3+BvfBw1NrKZjNrLMXKM0/AhmCsNY=</latexit>
cubic cost!
<latexit sha1_base64="S3QZVC1xzz/kqdj0C8SQ7ykBr7I=">AAACBnicbVDLSsNAFJ3UV62vqks30SK4KkkFdVl047KCfUAaymQyaYdOZsLMjVBC9/6AW/0Dd+LW3/AH/A4nbRa29cCFwzn3cu89QcKZBsf5tkpr6xubW+Xtys7u3v5B9fCoo2WqCG0TyaXqBVhTzgRtAwNOe4miOA447Qbju9zvPlGlmRSPMEmoH+OhYBEjGIzkkTRgxCZSw+mgWnPqzgz2KnELUkMFWoPqTz+UJI2pAMKx1p7rJOBnWAEjnE4r/VTTBJMxHlLPUIFjqv1sdvLUPjdKaEdSmRJgz9S/ExmOtZ7EgemMMYz0speL/3leCtGNnzGRpEAFmS+KUm6DtPP/7ZApSoBPDMFEMcjfH2GFCZiUFraEOj9tWjHBuMsxrJJOo+5e1S8fGrXmbRFRGZ2gM3SBXHSNmugetVAbESTRC3pFb9az9W59WJ/z1pJVzByjBVhfvzIWmWs=</latexit>
@h(T ) T @h
=) log p(z(t + 1)) = log p(z(t)) log 1 + u
@L @z
<latexit sha1_base64="p5KpxNBpqcNFkwIIrE6IHgNfL2A=">AAACXnicbZBNT9tAEIY35qOQQknhgsRlRYQEh0Y2oLaXSoheOHAAiQBSHEXjzThZsfaa3TEQWf59/Q299dRbr3BlbSLx1ZFWevXO7LyjJ8qUtOT7vxvezOzc/IeFxebHpeVPK63Pq+dW50ZgV2ilzWUEFpVMsUuSFF5mBiGJFF5EVz+r/sUNGit1ekaTDPsJjFIZSwHkrEELYNvf4T94GBsQRZiBIQmKH5fPeuwmSh4aORoTGKNveUh4R0WkQFx9ifQdH8o4RoNpPY7Xeb2bW61ccjlotf2OXxd/L4KpaLNpnQxaf8OhFnni9gkF1vYCP6N+UV0jFJbNMLeYuWgYYc/JFBK0/aJGUfIt5wx5rI17KfHaffmjgMTaSRK5yQRobN/2KvN/vV5O8fd+IdMsJ0zFU1CcK06aV1wdA4OC1MQJEEa6W7kYg4NKjv6rlKGtTiubDkzwFsN7cb7bCb529k732weHU0QLbINtsm0WsG/sgB2xE9Zlgv1i/9g9e2j88ea9ZW/ladRrTP+ssVflrT8CVn25Cw==</latexit>
target density
<latexit sha1_base64="R+jhTZtMnEnaU3tuPRwwAikqH54=">AAACCXicbVBLSgNBFOyJvxh/UZduGoPgKsxEUJdBNy4jmA8kY+jpeUma9PQM3W+EIeQEXsCt3sCduPUUXsBz2PksTGLBg6LqPV5RQSKFQdf9dnJr6xubW/ntws7u3v5B8fCoYeJUc6jzWMa6FTADUiioo0AJrUQDiwIJzWB4O/GbT6CNiNUDZgn4Eesr0ROcoZUekek+IA1BGYFZt1hyy+4UdJV4c1Iic9S6xZ9OGPM0AoVcMmPanpugP2IaBZcwLnRSAwnjQ9aHtqWKRWD80TT1mJ5ZJaS9WNtRSKfq34sRi4zJosBuRgwHZtmbiP957RR71/5IqCRFUHz2qJdKijGdVEBDoYGjzCxhXAublfIB04yjLWrhS2gm0cYFW4y3XMMqaVTK3mX54r5Sqt7MK8qTE3JKzolHrkiV3JEaqRNONHkhr+TNeXbenQ/nc7aac+Y3x2QBztcvUpebMA==</latexit>
<latexit sha1_base64="esaUt63aXY+a1c+BYNec+YSpWtA=">AAACaHicbVHLbtNAFB27QEt4uWWBEJtRE6QiochupcKygg3Lgpq2UhJF1+PreIhnxpq5pkRWPpIlP8CCH2DLTJoFbbnSSEfnPs7RmbyppaM0/RnFW/fuP9jeedh79PjJ02fJ7t65M60VOBKmNvYyB4e11DgiSTVeNhZB5TVe5IuPoX/xDa2TRp/RssGpgrmWpRRAnpoliwnhd+q+oDCqaQn5oDqgNwOeg1hcgS241JykQk5mjlSh5VeSKi7JcSi+GqmJDyBsvOWOwJLUc15ao8KdM38HdBEGPByuZkk/Habr4ndBtgF9tqnTWfJrUhjRKtQkanBunKUNTbsgI2pc9Satw8YbhTmOPdSg0E27dSgr/tozBS+N9c+7XLP/bnSgnFuq3E8qoMrd7gXyf71xS+X7aSd1iEuLa6GyrX1CPCTMC2lRUL30AISV3isXFVgQ5P/hhkrhgrVVzweT3Y7hLjg/HGbHw6PPh/2TD5uIdtgrts8OWMbesRP2iZ2yERPsB/sTsSiKfsdJ/CJ+eT0aR5ud5+xGxft/AZVKtz4=</latexit>
Recompute h(t) backward in time together with its adjoint a(t), starting from h(T
Continuous Normalizing Flows
<latexit sha1_base64="EFdvCFfQ9Sad20lj40IUzN77SEo=">AAACJXicbVDLSsNAFJ3UV62vqEs3g0XoqiQV1GWxIK6kgn1AG8pkMmmHTmbCzESpob/gb/gDbvUP3Ingyp3fYZJmYVsPDBzOuXfu4bgho0pb1pdRWFldW98obpa2tnd298z9g7YSkcSkhQUTsusiRRjlpKWpZqQbSoICl5GOO26kfueeSEUFv9OTkDgBGnLqU4x0Ig3MSj/7I5bEmzYE15RHIlLwRsgAMfpI+RBeMfGgSgOzbFWtDHCZ2DkpgxzNgfnT9wSOAsI1Zkipnm2F2omR1BQzMi31I0VChMdoSHoJ5SggyomzNFN4kige9IVMHtcwU/9uxChQahK4yWSA9Egteqn4n9eLtH/hxJSHkSYczw75EYNawLQe6FFJsGaThCAsaZIV4hGSCOukxLkrnkqjTdNi7MUalkm7VrXPqqe3tXL9Mq+oCI7AMagAG5yDOrgGTdACGDyBF/AK3oxn4934MD5nowUj3zkEczC+fwHQxaaC</latexit>
<latexit sha1_base64="Y057faftSfVthrKeoiMHiW+VcB8=">AAACsXiclVHLbtNAFB2bVwmvFHawGREhpVKJbIoKywg20FWRmrYoDuF6fB2PMp6xZq4LkeW/4mf4Ab6DcRIJ2rLhSiMdnfs6c25aKekoin4G4Y2bt27f2bnbu3f/wcNH/d3Hp87UVuBEGGXseQoOldQ4IUkKzyuLUKYKz9Ll+y5/doHWSaNPaFXhrISFlrkUQJ6a93/AlxOe5BZEk1RgSYLi+TyhAgmGxT6nvfZPomj3+f/Ub+iWJ1YuCgJrzTfP4XdqLlCQsS+PQJhUguaVNVktiA8dgc7AZhxqMqUXKXgm8xwtaj+y0+z22nl/EI2idfDrIN6CAdvG8bz/K8mMqEs/RChwbhpHFc2aTqZQ2PaS2mEFYgkLnHqooUQ3a9butvyFZzKeG+ufJr5m/+5ooHRuVaa+0gsu3NVcR/4rN60pfztrpK5qQi02i/JacTK8O5X/uPU2qZUHIKzsvBAFeOvJH/TSlsx10tqeNya+asN1cPpqFB+ODj69HozfbS3aYc/YczZkMXvDxuwDO2YTJoKnwTj4GByFB+Hn8GuYbkrDYNvzhF2KcPkbVbbXDg==</latexit>
Source Target
Bilinear sampling
Jaderberg, Max, Karen Simonyan, and Andrew Zisserman. "Spatial transformer networks." Advances in neural information processing systems. 2015.
Dynamic Routing Between Capsules
YouTube Playlist
LN ! layer normalization
<latexit sha1_base64="+Z7d8yz/tWP3U03x+EreZF8At6M=">AAACGXicbVBNTxsxFPSGj9Lw0QDHXqxGSJyi3Qi1PaJy6QEhkAggJavorfOSWPHaK/ttIV3lb/TCX+mlBxDiCKf+mzrJHkpgJEujmTd6fpNkSjoKw79BZWl5ZfXd2vvq+sbm1ofa9s6FM7kV2BJGGXuVgEMlNbZIksKrzCKkicLLZHQ09S9/oHXS6HMaZxinMNCyLwWQl7q1sEN4Q8XxyYR3rBwMCaw113yuKhij5drYFJT8OUtMurV62Ahn4K9JVJI6K3HarT11ekbkKWoSCpxrR2FGcQGWpFA4qXZyhxmIEQyw7amGFF1czC6b8D2v9HjfWP808Zn6f6KA1LlxmvjJFGjoFr2p+JbXzqn/NS6kznJCLeaL+rniZPi0Jt6TFgWpsScgrPR/5WIIFgT5Mqu+hGjx5NfkotmIPjeaZwf1w29lHWvsI/vE9lnEvrBD9p2dshYT7Bf7ze7YfXAb/Akegsf5aCUoM7vsBYLnf74Kogc=</latexit>
z 2 RN ⇥D
<latexit sha1_base64="RI+WJQ+DrEJwvjroF7GMPt0P17s=">AAACBXicbVA9SwNBEJ3zM8avU0stFoNgFe6CqGVQCyuJYj4gd4a9zSZZsrd37O4J8Uhj41+xsVDE1v9g579xk1yhiQ8GHu/NMDMviDlT2nG+rbn5hcWl5dxKfnVtfWPT3tquqSiRhFZJxCPZCLCinAla1Uxz2oglxWHAaT3on4/8+j2VikXiVg9i6oe4K1iHEayN1LL3HpDHBPJCrHtBkN4M79IrT7OQKnQxbNkFp+iMgWaJm5ECZKi07C+vHZEkpEITjpVquk6s/RRLzQinw7yXKBpj0sdd2jRUYLPHT8dfDNGBUdqoE0lTQqOx+nsixaFSgzAwnaNr1bQ3Ev/zmonunPopE3GiqSCTRZ2EIx2hUSSozSQlmg8MwUQycysiPSwx0Sa4vAnBnX55ltRKRfe4WLo+KpTPsjhysAv7cAgunEAZLqECVSDwCM/wCm/Wk/VivVsfk9Y5K5vZgT+wPn8AFLuYUQ==</latexit>
p
A = softmax(qk T / Dh ), A 2 RN ⇥N
<latexit sha1_base64="YWLftFLWAKgAi2CmKXDDuBywNKw=">AAACLnicbVBNa9tAEF25bZq6Taq0x16GmoILxZFCaXIJJGkLOYU0xB9g2Wa1XtmLVytldxRihH5RL/0rzSHQhJBrfkZXjg+p3QcDj/dmmJkXplIY9Lw/TuXJ02crz1dfVF++Wlt/7W68aZkk04w3WSIT3Qmp4VIo3kSBkndSzWkcSt4OJ19Lv33OtRGJOsVpynsxHSkRCUbRSgP3+z7sQoD8AnOTRBjTi6J+BpP+KWxCYM405t8G4+LjJ9iHQCgIYorjMMxPin5+ZOdEzA0cFQO35jW8GWCZ+HNSI3McD9zLYJiwLOYKmaTGdH0vxV5ONQomeVENMsNTyiZ0xLuWKmr39PLZuwV8sMoQokTbUggz9fFETmNjpnFoO8tzzaJXiv/zuhlGO71cqDRDrtjDoiiTgAmU2cFQaM5QTi2hTAt7K7Ax1ZShTbhqQ/AXX14mra2G/6Wx9eNzbe9gHscqeUfekzrxyTbZI4fkmDQJIz/Jb3JNbpxfzpVz69w9tFac+cxb8g+c+79uOKeY</latexit>
<latexit sha1_base64="0jnq5Wf1uT1bOfr/6VTKv0ZB4nA=">AAAB+3icbVDLTgJBEJzFF+JrxaOXicQEL2SXGPViAnrxiFEeCWzI7DDAhNlHZnoJuNlf8eJBY7z6I978GwfYg4KVdFKp6k53lxsKrsCyvo3M2vrG5lZ2O7ezu7d/YB7mGyqIJGV1GohAtlyimOA+qwMHwVqhZMRzBWu6o9uZ3xwzqXjgP8I0ZI5HBj7vc0pAS10z3wE2gfihmhSfzvA1ruJx1yxYJWsOvErslBRQilrX/Or0Ahp5zAcqiFJt2wrBiYkETgVLcp1IsZDQERmwtqY+8Zhy4vntCT7VSg/3A6nLBzxXf0/ExFNq6rm60yMwVMveTPzPa0fQv3Ji7ocRMJ8uFvUjgSHAsyBwj0tGQUw1IVRyfSumQyIJBR1XTodgL7+8Shrlkn1RKt+fFyo3aRxZdIxOUBHZ6BJV0B2qoTqiaIKe0St6MxLjxXg3PhatGSOdOUJ/YHz+ACDukzI=</latexit>
SA(z) = Av
MSA(z) = [SA1 (z); SA2 (z); . . . ; SAk (z)]Umsa , Umsa 2 RkDh ⇥D
<latexit sha1_base64="eQNEU1mRX0BqeabSLbRSIE1yM8w=">AAACTXicbZFRSxtBEMf3YtWYVhvtY1+GhoIFCXehqCCCUR98KVhtVMidx97exizZ2zt258R43Bfsi9A3v4UvPlhK6eaM0GoHFn78Z4aZ+W+USWHQdW+d2syr2bn5+kLj9ZvFpbfN5ZUTk+aa8R5LZarPImq4FIr3UKDkZ5nmNIkkP41Ge5P86SXXRqTqG44zHiT0QomBYBStFDZjH/kVFl+Ou+Xq9SfYhv5xN/QsboGFTgW+jFM0lTCyQgC9sEgMLdeeAHyhwE8oDqOoOCrPixHsh0PwUSTcwH4ZNltu260CXoI3hRaZxmHY/OHHKcsTrpBJakzfczMMCqpRMMnLhp8bnlE2ohe8b1FROycoKjdK+GiVGAaptk8hVOrfHQVNjBknka2crGye5ybi/3L9HAebQSFUliNX7HHQIJeAKUyshVhozlCOLVCmhd0V2JBqytB+QMOa4D0/+SWcdNreervz9XNrZ3dqR528Jx/IKvHIBtkhB+SQ9Agj38kdeSA/nRvn3vnl/H4srTnTnnfkn6jN/wHSWLDr</latexit>
<latexit sha1_base64="aSZ7QxdzQdF3QnKDm+JrVgpEJSI=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRbBU90tol6Eoj14rGA/pF1KNs22oUl2SbJCWforvHhQxKs/x5v/xrTdg7Y+GHi8N8PMvCDmTBvX/XZyK6tr6xv5zcLW9s7uXnH/oKmjRBHaIBGPVDvAmnImacMww2k7VhSLgNNWMLqd+q0nqjSL5IMZx9QXeCBZyAg2Vnqs9YboGtXORr1iyS27M6Bl4mWkBBnqveJXtx+RRFBpCMdadzw3Nn6KlWGE00mhm2gaYzLCA9qxVGJBtZ/ODp6gE6v0URgpW9Kgmfp7IsVC67EIbKfAZqgXvan4n9dJTHjlp0zGiaGSzBeFCUcmQtPvUZ8pSgwfW4KJYvZWRIZYYWJsRgUbgrf48jJpVsreRblyf16q3mRx5OEIjuEUPLiEKtxBHRpAQMAzvMKbo5wX5935mLfmnGzmEP7A+fwBBrKPQQ==</latexit>
Dh = D/k
Inductive Bias in CNNs: translation equivariance and locality
<latexit sha1_base64="ZcGpMYnrl0EiZijMjJyLSpIQ2JQ=">AAACJ3icbVDLSgNBEJz1bXxFPXoZDIKnsJuDigcJ8aKXEMEkQgyhd7ajg7Oz60xvIIT8jRd/xYugInr0T5zEHHwVDBRV1fR0hamSlnz/3Zuanpmdm19YzC0tr6yu5dc3GjbJjMC6SFRiLkKwqKTGOklSeJEahDhU2Axvjkd+s4fGykSfUz/FdgxXWnalAHJSJ390qqNMkOwhr0iwXGp+XK3aQ04GtFXjFMfbTPbASNACOeiIq0SAktTnnXzBL/pj8L8kmJACm6DWyT9dRonIYtQkFFjbCvyU2gMwJIXCYe4ys5iCuIErbDmqIUbbHozvHPIdp0S8mxj3NPGx+n1iALG1/Th0yRjo2v72RuJ/Xiuj7kF7IHWaEWrxtaibKU4JH5XGI2lQkOo7AsJI91cursGAIFdtzpUQ/D75L2mUisFesXRWKpQrkzoW2BbbZrssYPuszE5YjdWZYHfsgT2zF+/ee/Revbev6JQ3mdlkP+B9fALJtaXp</latexit>
x 2 RH⇥W ⇥C ! image
<latexit sha1_base64="o4QZt5HqVIMH2+3+nRA7XqCoa1k=">AAACKHicbVDLSgMxFM3UV62vqks3wSK4KjNF1J3FbrqsYh/QqSWTpm1oJjMkd7RlmM9x46+4EVGkW7/E9CFo64HA4dxH7jleKLgG2x5bqZXVtfWN9GZma3tndy+7f1DTQaQoq9JABKrhEc0El6wKHARrhIoR3xOs7g1Kk3r9gSnNA3kHo5C1fNKTvMspASO1s1dD7HKJXZ9A3/Pi2+Q+LmMXuM80rv+QUoJdxXt9IEoFj0ZlQ4i52cSSdjZn5+0p8DJx5iSH5qi0s29uJ6CRzyRQQbRuOnYIrZgo4FSwJONGmoWEDszypqGSmANa8dRogk+M0sHdQJknAU/V3xMx8bUe+Z7pnBjSi7WJ+F+tGUH3shVzGUbAJJ191I0EhgBPUsMdrhgFMTKEUMXNrZj2iSIUTLYZE4KzaHmZ1Ap55zxfuDnLFa/ncaTRETpGp8hBF6iIyqiCqoiiJ/SC3tGH9Wy9Wp/WeNaasuYzh+gPrK9vZsqm3A==</latexit>
N ⇥(P 2 C)
<latexit sha1_base64="Kp4NYPU8AQTw/UbyWb6avXPVWQc=">AAACT3icbVFNbxMxEPWGrzZ8BThyGZEitZdoN0LAsaIcOKGASFspm0ZeZzZr1WsbexYSrfYfcoEbf4MLBxDCG/YALSNZenpvRjPvObNKeorjr1HvytVr12/s7PZv3rp95+7g3v1jbyoncCqMMu404x6V1DglSQpPrUNeZgpPsvOjVj/5gM5Lo9/RxuK85Cstcyk4BWoxyNcLC6nUkJaciiyr3zZn9WtISZboYX9yNj46aCB1clUQd858DBKuqfb4vkItEEwOueJEqHEJ45dgOYmiHXXoC24R9tZ7B81iMIxH8bbgMkg6MGRdTRaDL+nSiKpETUJx72dJbGlec0dSKGz6aeXRcnHOVzgLUPNw7rze5tHA48AsITcuPE2wZf+eqHnp/abMQmfr2l/UWvJ/2qyi/Pm8ltpWwa/4syivFJCBNlxYSoeC1CYALpwMt4IouOOCwhf0QwjJRcuXwfF4lDwdjd88GR6+6OLYYQ/ZI7bPEvaMHbJXbMKmTLBP7Bv7wX5Gn6Pv0a9e19qLOvCA/VO93d8iAbNO</latexit>
HW
<latexit sha1_base64="7N6zly6VnADmJjmpW569Ve/Qee0=">AAACK3icbVBNa9tAEF25aeo4Teq2x16WmEJORjKh7aUQnEtOwYE6NliuWa1H0uLVrtgdJTVC/yeX/pUe0kOb0mv+R9cfh8bOg4HHezPMzItyKSz6/r1Xe7bzfPdFfa+x//Lg8FXz9ZsrqwvDoc+11GYYMQtSKOijQAnD3ADLIgmDaHa28AfXYKzQ6gvOcxhnLFEiFpyhkybN7gX9TMPYMF6e00FV9r52KhoakaTIjNE3NET4hqUBW0gUKqGqyCIwVMc0Z8hTsNWk2fLb/hJ0mwRr0iJr9CbNu3CqeZGBQi6ZtaPAz3FcMoOCS6gaYWEhZ3zGEhg5qlgGdlwuf63oe6dMaayNK4V0qf4/UbLM2nkWuc6MYWo3vYX4lDcqMP40LoXKCwTFV4viQlLUdBEcnQoDHOXcEcaNcLdSnjIXHLp4Gy6EYPPlbXLVaQcf2p3Lk9Zpdx1HnbwjR+SYBOQjOSXnpEf6hJNb8oP8Ir+9795P74/3d9Va89Yzb8kjeA//AMdsqBk=</latexit>
N= 2
! resulting number of patches
P
D ! latent vector size
<latexit sha1_base64="GINUJ9y/cDEi2T17eh/Hs2/C71U=">AAACEHicbVC7TsMwFHV4lvIKMLJYVAimKqkQMFbAwFgk+pDaqHJcp7XqxJF9UyhRP4GFX2FhACFWRjb+BrfNAC13Ojrnvs7xY8E1OM63tbC4tLyymlvLr29sbm3bO7s1LRNFWZVKIVXDJ5oJHrEqcBCsEStGQl+wut+/HOv1AVOay+gWhjHzQtKNeMApAUO17aMr3FK82wOilLzDLWD3kAoCLAI8YBSkwpo/sFHbLjhFZ1J4HrgZKKCsKm37q9WRNAnNIiqI1k3XicFLiQJOBRvlW4lmMaF90mVNAyMSMu2lE0MjfGiYDg7M8UCaRybs74mUhFoPQ990hgR6elYbk/9pzQSCcy/lUZwYh3R6KEgEBonH6eAOV8a0GBpAqOLmV0x7RBEKJsO8CcGdtTwPaqWie1os3ZwUyhdZHDm0jw7QMXLRGSqja1RBVUTRI3pGr+jNerJerHfrY9q6YGUze+hPWZ8/jBmdkA==</latexit>
(P 2 C)⇥D
<latexit sha1_base64="CGQSHpxX7rNArWVVPLW9EpJ0UXQ=">AAACh3icbVFdaxNBFJ1dq7ZRa6qPvlwMQmtL3I2lLYhQrYE+lShNW8hultnJJB06O7PM3JXEZf+KP8o3/42zScS28cKFw7nncL/SXAqLQfDb8x+sPXz0eH2j8eTps83nza0XF1YXhvE+01Kbq5RaLoXifRQo+VVuOM1SyS/Tm5O6fvmdGyu0OsdZzuOMTpQYC0bRUUnz548kgI8wmCZlhHyKJZPU2qr6ANMkH4bQXYBODSI50mgXxBl0Y9iF7l9brp1pD7oQCQVRRvE6Tctv1bDc7g07JzsRioxb+FJL7lhW9We74T950mwF7WAesArCJWiRZfSS5q9opFmRcYXzTQZhkGNcUoOCSV41osLynLIbOuEDBxV1feJyfscK3jhmBGNtXCqEOXvbUdLM2lmWOmU9sb1fq8n/1QYFjo/iUqi8QK7YotG4kIAa6qfASBjOUM4coMwINyuwa2ooQ/e6hjtCeH/lVXDRaYcH7c7X/dbx5+U51skr8ppsk5AckmNySnqkT5i35r313nv7/ob/zj/wjxZS31t6XpI74X/6A/m3wSM=</latexit>
1 2 N
z0 = [xclass ; xp E; xp E; . . . ; xp E] + Epos , E 2 R , Epos 2 R(N +1)⇥D
zl0 = MSA(LN(zl zl = MLP(LN(zl0 )) + zl0 , l = 1, . . . , L 0
<latexit sha1_base64="B+bSfZj0qiL1weZ6SM1kKZqqtkk=">AAAB/XicbVDJSgNBEO1xjXEbl5uXxiDES5gJol6EoBcPQSKYBZIx9HR6kiY9C9014mQI/ooXD4p49T+8+Td2loMmPih4vFdFVT03ElyBZX0bC4tLyyurmbXs+sbm1ra5s1tTYSwpq9JQhLLhEsUED1gVOAjWiCQjvitY3e1fjfz6A5OKh8EdJBFzfNINuMcpAS21zf0EX+AWsEdIyzfD/KBdvreO22bOKlhj4HliT0kOTVFpm1+tTkhjnwVABVGqaVsROCmRwKlgw2wrViwitE+6rKlpQHymnHR8/RAfaaWDvVDqCgCP1d8TKfGVSnxXd/oEemrWG4n/ec0YvHMn5UEUAwvoZJEXCwwhHkWBO1wyCiLRhFDJ9a2Y9ogkFHRgWR2CPfvyPKkVC/ZpoXh7kitdTuPIoAN0iPLIRmeohK5RBVURRQP0jF7Rm/FkvBjvxsekdcGYzuyhPzA+fwDI4JQn</latexit>
1 ) + zl 1 ), l = 1, . . . , L y = LN(zL )
Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020).
MLP-Mixer: An all-MLP Architecture for Vision
YouTube Video
S⇥C
X2R
<latexit sha1_base64="cZbqZhENTtXVeZ70QvrdqH28/m0=">AAACMXicbVDLTsJAFJ36Fl9Vl24mEhM3ktYYdUlk49IXQkKR3A4DTJhOm5lblDT8khv/xLhxoTFu/QkHZKHgTW5ycu7znDCRwqDnvTozs3PzC4tLy7mV1bX1DXdz69bEqWa8zGIZ62oIhkuheBkFSl5NNIcolLwSdkvDeqXHtRGxusF+wusRtJVoCQZoqYZ7XqWBUDSIADthmF0N7rJrGqCIuKGlAQ20aHcQtI7vLcsfMLPL5UEPZMqbVKgkRYpgjw0abt4reKOg08AfgzwZx0XDfQ6aMUsjrpBJMKbmewnWM9AomN2XC1LDE2BdaPOahQrsS/VspHhA9yzTpK1Y21RIR+zviQwiY/pRaDuHysxkbUj+V6ul2DqtZyNhXLGfQ61UUozp0D7aFJozlH0LgGlhf6WsAxoYWpNz1gR/UvI0uD0s+MeFw8ujfPFsbMcS2SG7ZJ/45IQUyTm5IGXCyCN5IW/k3XlyXp0P5/OndcYZz2yTP+F8fQMd2qrT</latexit>
HW
<latexit sha1_base64="T816eTeoqkx5GMR6vQS18bZklGU=">AAAB/XicbVDLSsNAFL2pr1pf8bFzM1gEVyUpom6EopsuK9oHtLVMppN26GQSZiZCDcFfceNCEbf+hzv/xmmbhbYeuHA4517uvceLOFPacb6t3NLyyupafr2wsbm1vWPv7jVUGEtC6yTkoWx5WFHOBK1rpjltRZLiwOO06Y2uJ37zgUrFQnGnxxHtBnggmM8I1kbq2Qe3CF2iji8xSaqomSa1+3Las4tOyZkCLRI3I0XIUOvZX51+SOKACk04VqrtOpHuJlhqRjhNC51Y0QiTER7QtqECB1R1k+n1KTo2Sh/5oTQlNJqqvycSHCg1DjzTGWA9VPPeRPzPa8fav+gmTESxpoLMFvkxRzpEkyhQn0lKNB8bgolk5lZEhtgkoU1gBROCO//yImmUS+5ZqXxzWqxcZXHk4RCO4ARcOIcKVKEGdSDwCM/wCm/Wk/VivVsfs9aclc3swx9Ynz9t3JPx</latexit>
S= 2
P Linear computational complexity
P ⇥ P ! resolution of each patch in the number of input patches
<latexit sha1_base64="44meArKE8XxsFY5DNE++U8rgwvs=">AAACH3icbVDLSgNBEJz1bXxFPXoZDIKnsCsSPYpePEYwUUhC6J30Zgdnd5aZXjUs+RMv/ooXD4qIt/yNk5iDr4KBoqqa6a4wU9KS74+8mdm5+YXFpeXSyura+kZ5c6tpdW4ENoRW2lyHYFHJFBskSeF1ZhCSUOFVeHM29q9u0Vip00saZNhJoJ/KSAogJ3XLtTpvk0zQckeM7McExug7J+I9FQatVvk4yXXEEUTMMyARD7vlil/1J+B/STAlFTZFvVv+aPe0yBNMSSiwthX4GXUKMCSFwmGpnVvMQNxAH1uOpuBW6hST+4Z8zyk9HmnjXkp8on6fKCCxdpCELpkAxfa3Nxb/81o5RcedQqZZTpiKr4+iXHHSfFwW70mDgtTAERBGul25iMGAIFdpyZUQ/D75L2keVINa9eDisHJyOq1jie2wXbbPAnbETtg5q7MGE+yBPbEX9uo9es/em/f+FZ3xpjPb7Ae80SePx6NU</latexit>
C ! hidden dimension
<latexit sha1_base64="2bG1gB2LtLTWvgj88TdmdU78++M=">AAACDnicbVC7SgNBFJ31GeMramkzGAJWYTeIWgbTWEYwD0hCmJ29SYbMzi4zd9Ww5Ats/BUbC0Vsre38GyePQhMPDBzOuYc79/ixFAZd99tZWV1b39jMbGW3d3b39nMHh3UTJZpDjUcy0k2fGZBCQQ0FSmjGGljoS2j4w8rEb9yBNiJStziKoROyvhI9wRlaqZsrVGhbi/4AmdbRPW0jPGA6EEEAigYiBDVJjru5vFt0p6DLxJuTPJmj2s19tYOIJzaPXDJjWp4bYydlGgWXMM62EwMx40PWh5alioVgOun0nDEtWCWgvUjbp5BO1d+JlIXGjELfToYMB2bRm4j/ea0Ee5edVKg4QVB8tqiXSIoRnXRjD9bAUY4sYVwL+1fKB0wzjrbBrC3BWzx5mdRLRe+8WLo5y5ev5nVkyDE5IafEIxekTK5JldQIJ4/kmbySN+fJeXHenY/Z6IozzxyRP3A+fwD4OJy3</latexit>
! token mixing
<latexit sha1_base64="287xY6GjCWatioS/32Fmj0X6iaA=">AAACCHicbVA9SwNBEN3z2/gVtbRwMQhW4S6IWoo2lhFMFJIj7G0myZK93WN3ThOOlDb+FRsLRWz9CXb+GzfxCjU+GHi8N8PMvCiRwqLvf3ozs3PzC4tLy4WV1bX1jeLmVt3q1HCocS21uYmYBSkU1FCghJvEAIsjCddR/3zsX9+CsUKrKxwmEMasq0RHcIZOahV3m0Z0e8iM0Xe0iTDADHUfFI3FQKjuqFUs+WV/AjpNgpyUSI5qq/jRbGuexqCQS2ZtI/ATDDNmUHAJo0IztZAw3mddaDiqWAw2zCaPjOi+U9q0o40rhXSi/pzIWGztMI5cZ8ywZ/96Y/E/r5Fi5yTMhEpSBMW/F3VSSVHTcSq0LQxwlENHGDfC3Up5jxnG0WVXcCEEf1+eJvVKOTgqVy4PS6dneRxLZIfskQMSkGNySi5IldQIJ/fkkTyTF+/Be/Jevbfv1hkvn9kmv+C9fwH0JpqT</latexit>
! channel mixing
<latexit sha1_base64="cQTaJDyoGSjQku5ImSmtqI3MOjA=">AAACCnicbVA9TwJBEN3zE/ELtbRZJSZW5I4YtSTaWGIiHwkQsrcMsGFv77I7p5ALtY1/xcZCY2z9BXb+Gxe4QsGXTPLy3kxm5vmRFAZd99tZWl5ZXVvPbGQ3t7Z3dnN7+1UTxppDhYcy1HWfGZBCQQUFSqhHGljgS6j5g+uJX7sHbUSo7nAUQStgPSW6gjO0Ujt31NSi10emdfhAmwhDTHifKQWSBmIoVG/czuXdgjsFXSReSvIkRbmd+2p2Qh4HoJBLZkzDcyNsJUyj4BLG2WZsIGJ8wHrQsFSxAEwrmb4ypidW6dBuqG0ppFP190TCAmNGgW87A4Z9M+9NxP+8Rozdy1YiVBQjKD5b1I0lxZBOcqEdoYGjHFnCuBb2VmqD0IyjTS9rQ/DmX14k1WLBOy8Ub8/ypas0jgw5JMfklHjkgpTIDSmTCuHkkTyTV/LmPDkvzrvzMWtdctKZA/IHzucPbjebXw==</latexit>
Patch Merging
<latexit sha1_base64="s1MymVVWVy48NNhVPHgezhq0QGw=">AAACAXicbVDLSsNAFJ34rPUVdSO4GSyCq5JUUJdFN26ECvYBbSiTyU07dDIJMxOhhLrxV9y4UMStf+HOv3GaZqGtBy4czrl37tzjJ5wp7Tjf1tLyyuraemmjvLm1vbNr7+23VJxKCk0a81h2fKKAMwFNzTSHTiKBRD6Htj+6nvrtB5CKxeJejxPwIjIQLGSUaCP17cNe/kYmIZg0iKZDfAtywMSgb1ecqpMDLxK3IBVUoNG3v3pBTNMIhKacKNV1nUR7GZGaUQ6Tci9VkBA6IgPoGipIBMrL8u0TfGKUAIexNCU0ztXfExmJlBpHvumMiB6qeW8q/ud1Ux1eehkTSapB0NmiMOVYx3gaBw6YBKr52BBCJTN/xXRIJKHahFY2IbjzJy+SVq3qnlfP7mqV+lURRwkdoWN0ilx0geroBjVQE1H0iJ7RK3qznqwX6936mLUuWcXMAfoD6/MH3uyXKA==</latexit>
is set to 2C.
|| {z
{z}}
patch size
Linear Embedding
<latexit sha1_base64="VSg/mWhBglKFJV0V0anOltXF/2M=">AAACBHicbVDLSsNAFJ3UV62vqMtuBovgqiQV1GVRBBcuKtgHtKFMJrft0MkkzEyEErpw46+4caGIWz/CnX/jNM1CWw8MHM65rzl+zJnSjvNtFVZW19Y3ipulre2d3T17/6ClokRSaNKIR7LjEwWcCWhqpjl0Ygkk9Dm0/fHVzG8/gFQsEvd6EoMXkqFgA0aJNlLfLveyGamEYHprZhCJr0MfgoCJYd+uOFUnA14mbk4qKEejb3/1gogmIQhNOVGq6zqx9lIiNaMcpqVeoiAmdEyG0DVUkBCUl2YHTPGxUQI8iKR5QuNM/d2RklCpSeibypDokVr0ZuJ/XjfRgwsvZSJONAg6XzRIONYRniWCAyaBaj4xhFDJzK2YjogkVJvcSiYEd/HLy6RVq7pn1dO7WqV+mcdRRGV0hE6Qi85RHd2gBmoiih7RM3pFb9aT9WK9Wx/z0oKV9xyiP7A+fwAiVZhn</latexit>
LN ! Layer Norm
<latexit sha1_base64="qVenOu9h8p8VuCuCY8gOkU1kURA=">AAACEHicbVA9SwNBEN3z2/h1ammzGESrcKeilkEbCxEFY4RcCHubSbK4d3vszqnhuJ9g41+xsVDE1tLOf+MmXqHGBwOP92aYmRcmUhj0vE9nbHxicmp6ZrY0N7+wuOQur1walWoONa6k0lchMyBFDDUUKOEq0cCiUEI9vD4a+PUb0Eao+AL7CTQj1o1FR3CGVmq5mwHCHWYnpzkNtOj2kGmtbmmhsj5oeqp0lLfcslfxhqCjxC9ImRQ4a7kfQVvxNIIYuWTGNHwvwWbGNAouIS8FqYGE8WvWhYalMYvANLPhQzndsEqbdpS2FSMdqj8nMhYZ049C2xkx7Jm/3kD8z2uk2DloZiJOUoSYfy/qpJKiooN0aFto4Cj7ljCuhb2V8h7TjKPNsGRD8P++PEoutyv+XmXnfLdcPSzimCFrZJ1sEZ/skyo5JmekRji5J4/kmbw4D86T8+q8fbeOOcXMKvkF5/0LsBSdow==</latexit>
<latexit sha1_base64="5yJwGPCyQ/6NPFcVzC299cuj3S0=">AAACI3icbVC7TgJBFJ31ifhCLW0mEhMsILuYqLECbWxIMMgjAUJmZ+/ChNlHZmY1ZMO/2PgrNhYaYmPhvzgLFAreZJKTc+49c++xQ86kMs0vY2V1bX1jM7WV3t7Z3dvPHBw2ZBAJCnUa8EC0bCKBMx/qiikOrVAA8WwOTXt4m+jNRxCSBf6DGoXQ9UjfZy6jRGmql7nuTD1iAc64yXwneMondg6uRFyx/ACIg2vA3XxZKfCTGZxr5iu18lm6l8maBXNaeBlYc5BF86r2MpOOE9DI0z6UEynblhmqbkyEYpTDON2JJISEDkkf2hr6xAPZjaf7jfFplGzlBkI/X+Ep+3siJp6UI8/WnR5RA7moJeR/WjtS7lU3Zn4Y6QPp7CM34lgFOAkMO0wAVXykAaGC6V0xHRBBqNKxJiFYiycvg0axYF0Uzu+L2dLNPI4UOkYnKIcsdIlK6A5VUR1R9Ixe0Tv6MF6MN2NifM5aV4z5zBH6U8b3D8EEo54=</latexit>
<latexit sha1_base64="Lm8WJp8vQJHVA9J512Mfl/5weqg=">AAACJnicbVDLSgNBEJz1GeMr6tHLYBD0kLAbQb0IiV68CJEYIyQhzM72miGzD2Z6lbDka7z4K148KCLe/BRnkxx8FQwUVd093eXGUmi07Q9rZnZufmExt5RfXlldWy9sbF7rKFEcmjySkbpxmQYpQmiiQAk3sQIWuBJa7uAs81t3oLSIwiscxtAN2G0ofMEZGqlXOOmMZ6QKvFGjL3wEj7ZE6EX39CKRKEp9YB5tgPRLNUQIsy6612iVLhq1/XyvULTL9hj0L3GmpEimqPcKLx0v4klgBnHJtG47dozdlCkUXMIo30k0xIwP2C20DQ1ZALqbjlcc0V2jeNSPlHkh0rH6vSNlgdbDwDWVAcO+/u1l4n9eO0H/uJuKME7MhXzykZ9IihHNMqOeUMBRDg1hXAmzK+V9phhHk2wWgvP75L/kulJ2DssHl5Vi9XQaR45skx2yRxxyRKrknNRJk3DyQJ7IC3m1Hq1n6816n5TOWNOeLfID1ucXDHOkyg==</latexit>
Liu, Ze, et al. "Swin transformer: Hierarchical vision transformer using shifted windows."
Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
High-Performance Large-Scale Image
Recognition Without Normalization YouTube Playlist
✓ ! model parameters
<latexit sha1_base64="pT0tMd5pZHmWcAwrae/w3UcY3pA=">AAACJnicbVDLSsNAFJ34tr6qLt0MFkFclEREXRbduFSwrdCEMpnctIOTTJi5UUvoP/gZfoFb/QJ3Iu7c+B9O2i58HRg4nPs4d06YSWHQdd+dqemZ2bn5hcXK0vLK6lp1faNlVK45NLmSSl+FzIAUKTRRoISrTANLQgnt8Pq0rLdvQBuh0kscZBAkrJeKWHCGVupW93zsAzLqa9HrI9Na3VIf4Q6LREUgacY0SwDthmG3WnPr7gj0L/EmpEYmOO9WP/1I8TyBFLlkxnQ8N8OgYBoFlzCs+LmBjPFr1oOOpak1MkEx+tOQ7lglorHS9qVIR+r3iYIlxgyS0HYmDPvmd60U/6t1coyPg0KkWY6Q8rFRnEuKipYB0Uho4CgHljCuhb2V8r5NgZch/HCJTHlamYv3O4W/pLVf9w7r+xcHtcbJJKEFskW2yS7xyBFpkDNyTpqEk3vySJ7Is/PgvDivztu4dcqZzGySH3A+vgBHT6eY</latexit>
@L
<latexit sha1_base64="ngytrpztVDA6Rs5LAn1zoyMEn3E=">AAACRXicbVBNa9tAEF2laZM6/XDaYy5LTCEnI4XS9hIIyaEt9OBCbAcsY0arkb1kpRW7oyRG6DflZ/QX9JBLeuqxt9Brs7IFbZw+WHi8mdl586JcSUu+f+2tPVp//GRj82lr69nzFy/b268GVhdGYF9opc1pBBaVzLBPkhSe5gYhjRQOo7Pjuj48R2Olzk5onuM4hWkmEymAnDRpf/7ID3iYGBBlmIMhCYp/qf7ykGZIUPHQyOmMwBh94TS8pHJqIJaYET9HQdpUk3bH7/oL8IckaEiHNehN2j/DWIsidX8IBdaOAj+ncVkvFgqrVlhYzEGcwRRHjmaQoh2Xi5Mr/sYpMU+0cc95WKj/TpSQWjtPI9eZAs3saq0W/1cbFZR8GJcyywvCTCwXJYXipHmdH4+lcfequSMgjHReuZiBy49cyve2xLa2VucSrKbwkAz2u8G77v7Xt53DoyahTbbDdtkeC9h7dsg+sR7rM8Gu2Hd2w35437xf3q33e9m65jUzr9k9eH/uAHPYtDQ=</latexit>
G= ! gradient vector
Normalizer-Free ResNets @✓ <latexit sha1_base64="fNwYbH5bi6o3ocUNOSK7o6C3c9Y=">AAACHnicbVDLTgIxFO3gC/E16tJNAzFxI5lhoS6JJsYVQSOPBAjpdC7Q0M5M2o4JTtj7GX6BW/0Cd8atfoD/YRlYCHiSJifn3Nt7cryIM6Ud59vKrKyurW9kN3Nb2zu7e/b+QV2FsaRQoyEPZdMjCjgLoKaZ5tCMJBDhcWh4w6uJ33gAqVgY3OtRBB1B+gHrMUq0kbp2vp3+kUjwx5VQCsLZI8jTawmA70BVQKuuXXCKTgq8TNwZKaAZql37p+2HNBYQaMqJUi3XiXQnIVIzymGca8cKIkKHpA8tQwMiQHWSNMcYHxvFx71QmhdonKp/NxIilBoJz0wKogdq0ZuI/3mtWPcuOgkLolhDQKeHejHHOsSTYrDPJFDNR4YQKpnJiumASEK1qW/uiq8m0camF3exhWVSLxXds2LptlQoX84ayqIjlEcnyEXnqIxuUBXVEEVP6AW9ojfr2Xq3PqzP6WjGmu0cojlYX78ytqPV</latexit>
`
<latexit sha1_base64="lJa3mRo7ucEzuuMFhV9B3fx3EAw=">AAACSnicbVBNT9tAEF2Hr/DZQI9cVg1IXIhshApHBJeeUJAaQIpDNF6P4xXrXWt3TBtZ+Vf8DP4AvdL+gd5QL7VDDgX6pJGe3pvRzLwoV9KR7z96jbn5hcWl5vLK6tr6xofW5talM4UV2BNGGXsdgUMlNfZIksLr3CJkkcKr6Pas9q/u0Dpp9Fca5zjIYKRlIgVQJQ1b58lNGaJSEx5aOUoJrDXfeEj4ncqk0KLu4sJkeUEY82jMKUW+U0/s7FPKLToZF6B4ZEGLdDJstf2OPwV/T4IZabMZusPWrzA2oshQk1DgXD/wcxqUYEkKhZOVsHCYg7iFEfYrqiFDNyinf0/4bqXEPDG2Kk18qv47UULm3DiLqs4MKHVvvVr8n9cvKDkelFLXX2vxsigpFCfD6xB5LC0KUuOKgLCyupWLFCwIqqJ+tSV29Wl1LsHbFN6Ty4NO8LlzcHHYPjmdJdRk2+wT22MBO2In7Avrsh4T7J79YE/sp/fg/faevT8vrQ1vNvORvUJj/i8rgLTT</latexit>
! clipping threshold hyper-parameter f ! function computed by the `-th residual branch <latexit sha1_base64="/NSJwbu+vvsn+Poszn8YtcBZr9Q=">AAACOXicbVDLThsxFPXQ8myBtCzZWI2Qumk0EyFgg4RgwzKVGkDKRJHHc5Ox4rEt+w4QjfI1/Qy+gC2sWLJAQt32B+pJsigJR7J0dO7jXJ/ESOEwDJ+CpQ8fl1dW19Y3Pn3e3Nquffl64XRhObS5ltpeJcyBFAraKFDClbHA8kTCZTI8q+qX12Cd0OoXjgx0czZQoi84Qy/1asex9M0po7EVgwyZtfqGxgi3WHIpjBFqQDGz4DItU5r5DfaHYZblgGDHvVo9bIQT0EUSzUidzNDq1V7iVPMiB4VcMuc6UWiwWzKLgksYb8SFA8P4kA2g46nyPq5bTr45pnteSWlfW/8U0on6/0TJcudGeeI7c4aZm69V4nu1ToH9o24plCkQFJ8a9QtJUdMqM5oKCxzlyBPGrfC3Up75ELjP4K1L6qrTqlyi+RQWyUWzER00mj/36yens4TWyC75Rr6TiBySE3JOWqRNOPlN7skDeQzugufgNfgzbV0KZjM75A2Cv/8AQ5OvtQ==</latexit>
<latexit sha1_base64="ZLTUkgDj4Wl2unre2h9a0KzqURE=">AAACPXicbZC7SgNBFIZnvd+NWtoMBkGbsCuillEbywhGhSSEs7MncXB2dpk5G4xLnsfH8AlsFR8gndjaOhtTeDsw8POf63xhqqQl33/1Jianpmdm5+YXFpeWV1ZLa+uXNsmMwLpIVGKuQ7CopMY6SVJ4nRqEOFR4Fd6eFvmrHhorE31B/RRbMXS17EgB5Kx26bhJeEf5TgoGYiQj7zHilPAQeQ+MBC2Qu4EWTU/qLgfiUkuSoOT9aMLuoF0q+xV/FPyvCMaizMZRa5eGzSgRWYyahAJrG4GfUisHQ1IoHCw0M4spiFvoYsNJ7e6yrXz01QHfdk7EO4lxTxMfud87coit7cehq4yBbuzvXGH+l2tk1Dlq5VKnGaEWX4s6mSpYFNx4JA0KUn0nQBiHQHBx46AJcnR/bIlscVrBJfhN4a+43KsEB5W98/1y9WRMaI5tsi22wwJ2yKrsjNVYnQn2wJ7YM3vxHr2h9+a9f5VOeOOeDfYjvI9PsxSw0A==</latexit>
Adaptive Gradient Clipping for Efficient Large-Batch Training (parametrized to be variance preserving at initialization) <latexit sha1_base64="fprqNr76nncqOD2OtHAfuteJqLs=">AAACRHicbVC7TgMxEPTxDO8AJY1FhERDdJcCKAMRgoICJAJISRTt+fYSC5/Psn1I0Sm/xGfwBTQU0NHRIVqEc6TgNdVodmd3NKES3Fjff/QmJqemZ2ZLc/MLi0vLK+XVtUuTZpphk6Ui1dchGBRcYtNyK/BaaYQkFHgV3jRG86tb1Ian8sIOFHYS6EkecwbWSd3ySbu4kWuMhgcRKMtvkR5riDhKSxuCK8Vlj8appkexsxXyKege7hyCZX16oYFLt9ItV/yqX4D+JcGYVMgYZ93ySztKWZa4g0yAMa3AV7aTg7acCRzOtzODCtgN9LDlqIQETScv0g7pllOiIlWcukCF+t2RQ2LMIAndZgK2b37PRuJ/s1Zm4/1OzqXKLEr29SjOBLUpHdVHI66RWTFwBJjmLitlfdDArCv5x5fIjKINXS/B7xb+kstaNdit1s5rlfrhuKES2SCbZJsEZI/UyQk5I03CyB15IE/k2bv3Xr037/1rdcIbe9bJD3gfn2fcsvc=</latexit>
`
<latexit sha1_base64="yxH0arw6nAS3FaE+Kh6uG6dY5wI=">AAACL3icbVBLSgNBFOyJ//iLunTTGIQEJMyoqBtBdONSwSRCJoaezpuksedD9xsxDjmIx/AEbvUE4kbcuPAW9iRZaGJBQ1H1HvW6vFgKjbb9buWmpmdm5+YX8otLyyurhbX1mo4SxaHKIxmpa49pkCKEKgqUcB0rYIEnoe7dnmV+/Q6UFlF4hb0YmgHrhMIXnKGRWoU9F+Ee0xpT/ZJ/44KUpYdymR7TX/pDeYe6fqSYlDSbaBWKdsUegE4SZ0SKZISLVuHLbUc8CSBELpnWDceOsZkyhYJL6OfdREPM+C3rQMPQkAWgm+ngc326bZQ2NfHmhUgH6u+NlAVa9wLPTAYMu3rcy8T/vEaC/lEzFWGcIIR8GOQnkmJEs6ZoWyjgKHuGMK6EuZXyLlOMo+nzT0pbZ6f1TS/OeAuTpLZbcQ4qu5f7xZPTUUPzZJNskRJxyCE5IefkglQJJ4/kmbyQV+vJerM+rM/haM4a7WyQP7C+fwArB6lD</latexit>
` N ⇥M `
<latexit sha1_base64="vOqk3Wa1qb9FtMCmcQhPedsV/yg=">AAACU3icbVC5TsNAEN2YKxyBACXNioBEFdkIAWUEBTSggAhBipNovZ4kq6wP7Y4JkeVP4zMoqBEdfAEN65CCa6SRnt5cb54XS6HRtp8L1szs3PxCcXFpeaW0ulZe37jVUaI4NHgkI3XnMQ1ShNBAgRLuYgUs8CQ0veFpXm/eg9IiCm9wHEM7YP1Q9ARnaKhuuXnWcUFK6oqQugHDgeel11knvaQuigA0vcioq0R/gEypaGRYeMC0r5gvIEQ6EjigCnQMHClGdKc5WbeTdcsVu2pPgv4FzhRUyDTq3fKr60c8CcxWLpnWLceOsZ0yhYJLyJbcREPM+JD1oWVgyIy4djoxIKO7hvFpL1ImjaoJ+30iZYHW48AznfmP+nctJ/+rtRLsHbdTEcYJQsi/DvUSmb+au0l9oczncmwA40oYrZQPmGIcjec/rvg6l5b74vx24S+43a86h9X9q4NK7WTqUJFskW2yRxxyRGrknNRJg3DySF7IG3kvPBU+LMua/Wq1CtOZTfIjrNInhza18w==</latexit>
<latexit sha1_base64="HYBzBR1ZSiyDh1oIjRoRUFyxKA0=">AAACInicbVDLSgNBEJz1GeNr1aOXwaDES9gNol6EoBePEcwDsjHMTjrJkNmHM71iWPIHfoZf4FW/wJt4Erz6H+4mOZjEgoaiqpvuLjeUQqNlfRkLi0vLK6uZtez6xubWtrmzW9VBpDhUeCADVXeZBil8qKBACfVQAfNcCTW3f5X6tQdQWgT+LQ5CaHqs64uO4AwTqWUeOS4gu3NASnpBHX2vMHYQHjGuMjXM90bO8bBl5qyCNQKdJ/aE5MgE5Zb547QDHnngI5dM64ZthdiMmULBJQyzTqQhZLzPutBIqM880M149M+QHiZKm3YClZSPdKT+nYiZp/XAc5NOj2FPz3qp+J/XiLBz3oyFH0YIPh8v6kSSYkDTcGhbKOAoBwlhXInkVsp7TDGOSYRTW9o6PS3NxZ5NYZ5UiwX7tFC8OcmVLicJZcg+OSB5YpMzUiLXpEwqhJMn8kJeyZvxbLwbH8bnuHXBmMzskSkY37+sY6UX</latexit>
` `)
N M = Var(h
<latexit sha1_base64="46ZLVMZz8uetNKKu94iZ8ymqo3Y=">AAACNXicbVDLTgIxFO3gC/GFunTTSExwQ2aI8bEgIZoYNxpM5JEwMOmUAoXOI23HhAzzLX6GX+BW1y7cqVt/wQ7MQsCTNDn3nHtzb4/tMyqkrr9rqaXlldW19HpmY3Nreye7u1cTXsAxqWKPebxhI0EYdUlVUslIw+cEOTYjdXt4Ffv1R8IF9dwHOfJJy0E9l3YpRlJJVvbCHNfbJmHMHFvXEJagKQLHCmnJiNp3STGIi1uYnzYqcxAdt4tWNqcX9AngIjESkgMJKlb2y+x4OHCIKzFDQjQN3ZetEHFJMSNRxgwE8REeoh5pKuoih4hWOPliBI+U0oFdj6vnSjhR/06EyBFi5Niq00GyL+a9WPzPawaye94KqesHkrh4uqgbMCg9GOcFO5QTLNlIEYQ5VbdC3EccYalSndnSEfFpkcrFmE9hkdSKBeO0ULw/yZUvk4TS4AAcgjwwwBkogxtQAVWAwRN4Aa/gTXvWPrRP7XvamtKSmX0wA+3nF0Voq0g=</latexit>
XX
kW ` kF = (Wij` )2 Scaled Weight Standardization
<latexit sha1_base64="zWDDqNQtVZoLepELN3K9shdHX6I=">AAACK3icbVDLSsNAFJ34rO+qSzeDRXBVExF1WXTjUtFaoS3lZnLbDk4mYeZGrKGf4Wf4BW71C1wpbvsfJmkXvg4MHM65d+7h+LGSllz33Zmanpmdmy8tLC4tr6yuldc3rm2UGIF1EanI3PhgUUmNdZKk8CY2CKGvsOHfnuZ+4w6NlZG+okGM7RB6WnalAMqkTnmvRXhPaav4KTUYDC8FKAx4A2WvT/ySQAdgAvlQLAw75YpbdQvwv8SbkAqb4LxTHrWCSCQhahIKrG16bkztFAxJoXC42EosxiBuoYfNjGoI0bbTIs6Q72RKwLuRyZ4mXqjfN1IIrR2EfjYZAvXtby8X//OaCXWP26nUcUKoxfhQN1GcIp63xANpUJAaZASEkVlWLvpgQFDW5Y8rgc2j5b14v1v4S673q95hdf/ioFI7mTRUYltsm+0yjx2xGjtj56zOBHtkz+yFvTpPzpvz4XyOR6ecyc4m+wFn9AWz2Knf</latexit>
i=1 j=1
kW ` kF
<latexit sha1_base64="vqkkzwhegHfxZ07Xbr6B/wKh4sU=">AAACjXicbVHbbtNAEF2bWymXpvCIkEYkXJ4iuyqFB4QqkCiPRSJNpThE6/XYXnVv2l03RG4+gs/jA/gGXlknkaAtI6105pwdndGZ3AjufJL8jOIbN2/dvrN1d/ve/QcPd3q7j06cbizDEdNC29OcOhRc4chzL/DUWKQyFzjOzz52+vgcreNaffULg1NJK8VLzqgP1Kz3I/P43beDrLSUtdnF+FuGQmQXs0/L0B397QZgrD7nBTqg4Lg0AkEidY1F0CXUeg6yYfVKVFUQK0sLjsqD82hgzoUAVlNVIfg6jFhecUUFzJFXtXcwWDsPlrNePxkmq4LrIN2APtnU8az3Kys0a2TwYoI6N0kT46cttZ4zgcvtrHFoKDujFU4CVFSim7ar6JbwPDAFlNqGF3Zdsf9OtFQ6t5B5+Cmpr91VrSP/p00aX76dtlyZxqNia6OyEeA1dHeAgltkXiwCoMzysGsXTziCD9e65FK4brUul/RqCtfByd4wPRjufdnvH37YJLRFnpBn5BVJyRtySD6TYzIijPyOnkYvopfxTvw6fhe/X3+No83MY3Kp4qM/fGjIuw==</latexit>
`
provides a simple measure of how much a single gradient step will change the original weights W
kG` kF
ch a single gradient step will change the original weights W `
W ` = hG` , h ! learning rate
<latexit sha1_base64="IgEpVWMhEaDfVt0jE8BIBeLSHkU=">AAACOXicbZDPbtNAEMbXpdCQUjBw5LIiQuqBRnZUUS6RIorUHlOJ/JHiEI03k3jV9draHQORlafpY/AEXOHUYw+VKq68QO0khybhk1b66ZsZzewXpkpa8rxrZ+fR7uMne5Wn1f1nB89fuC9fdW2SGYEdkajE9EOwqKTGDklS2E8NQhwq7IWXp2W99w2NlYn+QrMUhzFMtZxIAVRYI7cZfEZFwHtfA1SKN/lRxM8W/J5HPDByGhEYk3znAeEPyhWC0VJPuQHC+citeXVvIb4N/gpqbKX2yL0NxonIYtQkFFg78L2UhjkYkkLhvBpkFlMQlzDFQYEaYrTDfPHNOX9XOGM+SUzxNPGF+3Aih9jaWRwWnTFQZDdrpfm/2iCjycdhLnWaEWqxXDTJFKeEl5nxsTQoSM0KAGFkcSsXERgQVCS7tmVsy9PKXPzNFLah26j7H+qNi+Na69MqoQp7w96yQ+azE9Zi56zNOkywK/aL/WZ/nJ/OjXPn/F227jirmddsTc6/e0tCrXM=</latexit>
k W ` kF kG` kF
<latexit sha1_base64="R8LoJtbk/OiM9KDGPtkyQlv8hb4=">AAACSnicdVDLSgMxFM3UqrW+qi7dBIvgqswUUTdCUVFXUsE+oFNLJr1jQzMPkoxQpv0rP8Mf0K36A+7EjZm2Qh96IHDuuedyb44TciaVab4YqYX04tJyZiW7ura+sZnb2q7KIBIUKjTggag7RAJnPlQUUxzqoQDiORxqTvc86dceQUgW+HeqF0LTIw8+cxklSkut3I3tCkJju29fAFcE1+5t4Nzuty4HWpyo8Cnu4F/z1X+uVi5vFswh8DyxxiSPxii3ch92O6CRB76inEjZsMxQNWMiFKMcBlk7khAS2iUP0NDUJx7IZjz89wDva6WN3UDo5ys8VCcnYuJJ2fMc7fSI6sjZXiL+1WtEyj1pxswPIwU+HS1yI45VgJMQcZsJoIr3NCFUMH0rph2is1E66qktbZmcluRizaYwT6rFgnVUKN4e5ktn44QyaBftoQNkoWNUQteojCqIoif0it7Qu/FsfBpfxvfImjLGMztoCqn0DypstOM=</latexit>
The activation functions are also scaled by a non-linearity specific scalar gain .
<latexit sha1_base64="PbZjPJYFy8A/AdG9MuQ5Zu1VsgA=">AAACXXicbZA9bxNBEIbXB4EQQmKgoKAZ4SDRYN25AMoIGsogxUkk27Lm9ubsVfbjtDsXcTr59/EbqKgoaaFl7+KCJIy00qv3nY/Vk1daBU7T74Pk3v2dBw93H+093n9ycDh8+uwsuNpLmkqnnb/IMZBWlqasWNNF5QlNruk8v/zU5edX5INy9pSbihYGV1aVSiJHaznEOdNXbk/XBChZXfU2lLWVnQiAPgY6OAgSNRWQN4BgnX3bHUSvuIFQkew29i3oYYXKwtF8hcbg0XizHI7ScdoX3BXZVozEtk6Ww5/zwsnakGWpMYRZlla8aNGzkpo2e/M6UIXyElc0i9KiobBoexQbeB2dAkrn47MMvfvvRIsmhMbksdMgr8PtrDP/l81qLj8sWmWrmsnK60NlrYEddFyhUJ4k6yYKlJFL5CHX6CPUSP/GlSJ0X+u4ZLcp3BVnk3H2bjz5Mhkdf9wS2hUvxSvxRmTivTgWn8WJmAopvolf4rf4M/iR7CT7ycF1azLYzjwXNyp58RetRbi4</latexit>
`
=h `
kW kF kW kF The activation functions are also scaled by a non-linearity specific scalar gain .
<latexit sha1_base64="PbZjPJYFy8A/AdG9MuQ5Zu1VsgA=">AAACXXicbZA9bxNBEIbXB4EQQmKgoKAZ4SDRYN25AMoIGsogxUkk27Lm9ubsVfbjtDsXcTr59/EbqKgoaaFl7+KCJIy00qv3nY/Vk1daBU7T74Pk3v2dBw93H+093n9ycDh8+uwsuNpLmkqnnb/IMZBWlqasWNNF5QlNruk8v/zU5edX5INy9pSbihYGV1aVSiJHaznEOdNXbk/XBChZXfU2lLWVnQiAPgY6OAgSNRWQN4BgnX3bHUSvuIFQkew29i3oYYXKwtF8hcbg0XizHI7ScdoX3BXZVozEtk6Ww5/zwsnakGWpMYRZlla8aNGzkpo2e/M6UIXyElc0i9KiobBoexQbeB2dAkrn47MMvfvvRIsmhMbksdMgr8PtrDP/l81qLj8sWmWrmsnK60NlrYEddFyhUJ4k6yYKlJFL5CHX6CPUSP/GlSJ0X+u4ZLcp3BVnk3H2bjz5Mhkdf9wS2hUvxSvxRmTivTgWn8WJmAopvolf4rf4M/iR7CT7ycF1azLYzjwXNyp58RetRbi4</latexit>
k W ` kF
<latexit sha1_base64="KRwVF8Y/3X+FYcLpCyoTmh/iWtg=">AAACXnicbZA9TxtBEIbXRyCEj+CEBinNJCZSisi6c5GkRCFCpCNSjJF8xppbz9kr9vZOu3MB6/D/y1+go6GlTcrsGRd8ZKSV3nlnRjP7JIVWjsPwqhEsPVteeb76Ym19Y/PlVvPV62OXl1ZSV+Y6tycJOtLKUJcVazopLGGWaOolZ/t1vfeLrFO5+cnTggYZjo1KlUT21rCZfE9hN04tyiq+jL+RZoTeaUxax5fDg5k372W7oBxotGP6COcEdFGQZOAJAVtURpkxcA4JyTwjKI1j9Ge8HTZbYTucBzwV0UK0xCKOhs2beJTLMiPDUqNz/SgseFChZSU1zdbi0lGB8gzH1PfSYEZuUM1ZzOC9d0aQ5tY/wzB3709UmDk3zRLfmSFP3ONabf6v1i85/TKolClKJiPvFqWlrn9cg4WRsp6GnnqB0ip/K8gJerDs8T/YMnL1aTPPJXpM4ak47rSjT+3Oj05r7+uC0Kp4I96JDyISn8WeOBRHoiuk+C1uxR/xt3EdrASbwdZda9BYzGyLBxHs/AN5V7ii</latexit>
If kW ` kF
is large, we expect the training to become unstable!
Ensures that the combination of the -scaled activation function and a Scaled Weight
<latexit sha1_base64="VcpJYHDTmvzaJ7oL4wnpNBW9op4=">AAACinicbZHPbtNAEMbX5l8JBQLc4LIiQeJCZOcArXqpCkgci0qaSkkUjddjZ9X9Y+2OI1Irr9D34wF4Ax6AjZMDbRlpV59+84129G1WKekpSX5F8b37Dx4+2nvcebL/9Nnz7ouX597WTuBIWGXdRQYelTQ4IkkKLyqHoDOF4+zy86Y/XqLz0poftKpwpqE0spACKKB593pK+JOar8bXDj2nBVC4kAurM2laE7dFi/rTErSG/gcvQGHOQZBcbh1FbUQrwATOz7aGMcpyQfyMAgWXy6vAFKzQcen5EpwEI5CHdT26pTTlYN2Zd3vJIGmL3xXpTvTYrk7n3d/T3IpaoyGhwPtJmlQ0a8CRFArXnWntsQJxCSVOgjSg0c+aNrc1fxdIzgvrwjHEW/rvRAPa+5XOglMDLfzt3gb+rzepqTiYNdJUNaER24eKWnGyfPMJPJcOBalVECCcDLtysQAXEg1fdeOV3G9WW4dc0tsp3BXnw0H6cTD8Puwdn+wS2mNv2Fv2nqXsEztm39gpGzHB/kSvo17Uj/fjYXwYH22tcbSbecVuVPzlL9akxyY=</latexit>
`
<latexit sha1_base64="Kkzynynl4KXpAAKwpJN5qghEbfo=">AAACMHicbVBLTsMwFHT4/ymwZGPRIrGhSioELBEsYAkSLUhNqRz3pbVw4sh+AaqoF+EYnIAtnABWiAUbToHTdkEpI1kazbynN54gkcKg6747E5NT0zOzc/MLi0vLK6uFtfWaUanmUOVKKn0dMANSxFBFgRKuEw0sCiRcBbcnuX91B9oIFV9iN4FGxNqxCAVnaKVmYe+0KW58kJL6WrQ7yLRW99RHeMCsJEq72KG5oEIaMdTigZZOS71moeiW3T7oOPGGpEiGOG8WvvyW4mkEMXLJjKl7boKNjGkUXEJvwU8NJIzfsjbULY1ZBKaR9X/Xo9tWadFQaftipH3190bGImO6UWAnbcaO+evl4n9ePcXwsJGJOEkRYj44FKaSoqJ5VbQlNHCUXUsY18JmpbzDNONoCx250jJ5tLwX728L46RWKXv75crFXvHoeNjQHNkkW2SHeOSAHJEzck6qhJNH8kxeyKvz5Lw5H87nYHTCGe5skBE43z8S7qm2</latexit>
<latexit sha1_base64="YizidxFdsbfT/jVy2d1FiOCsByE=">AAACHXicbVDLTgJBEJzFF+IL9ehlFEzwINnloB6JXjxiIo8ECJkdemHC7OxmpteEEM5+hl/gVb/Am/Fq/AD/w+FxELSSTipV3enu8mMpDLrul5NaWV1b30hvZra2d3b3svsHNRMlmkOVRzLSDZ8ZkEJBFQVKaMQaWOhLqPuDm4lffwBtRKTucRhDO2Q9JQLBGVqpkz0uJEogzYs8jQKKfaD5FkiZP8c+lWwI+qyTzblFdwr6l3hzkiNzVDrZ71Y34kkICrlkxjQ9N8b2iGkUXMI400oMxIwPWA+alioWgmmPpq+M6alVujSItC2FdKr+nhix0Jhh6NvOkGHfLHsT8T+vmWBw1R4JFScIis8WBYmkGNFJLrQrNHCUQ0sY18LeSnmfacbRprewpWsmp41tLt5yCn9JrVT0Loqlu1KufD1PKE2OyAkpEI9ckjK5JRVSJZw8kmfyQl6dJ+fNeXc+Zq0pZz5zSBbgfP4AM3ag5g==</latexit>
G ! i-th
suresithat the row
combination (unit
of matrixofGthe i of the
-scaled `-th layer)
activation function and a Scaled Weight Standardized layer is variance preserving.
<latexit sha1_base64="lwHYEf7JSy8GrHdeI0RIjnxvDsE=">AAACG3icbVC7TgJBFJ3FF+ILtbRwIjGxIrsUakm0scREHgkQMjtcYMLs7GbmLpFstvQz/AJb/QI7Y2vhB/gfDo9CwJNMcnLOPbl3jh9JYdB1v53M2vrG5lZ2O7ezu7d/kD88qpkw1hyqPJShbvjMgBQKqihQQiPSwAJfQt0f3k78+gi0EaF6wHEE7YD1legJztBKnfxpC+ERE2HoiGnBFAdq8wb0SKh+Mc118gW36E5BV4k3JwUyR6WT/2l1Qx4HoJBLZkzTcyNsJ0yj4BLSXCs2EDE+ZH1oWqpYAKadTD+S0nOrdGkv1PYppFP1byJhgTHjwLeTAcOBWfYm4n9eM8bedTsRKooRFJ8t6sWSYkgnrdCu0MBRji1hXAt7K+UDphlH293Clq6ZnJbaXrzlFlZJrVT0Loul+1KhfDNvKEtOyBm5IB65ImVyRyqkSjh5Ii/klbw5z8678+F8zkYzzjxzTBbgfP0CXB2iTg==</latexit>
Brock, Andrew, et al. "High-performance large-scale image recognition without normalization." arXiv preprint arXiv:2102.06171 (2021).
A ConvNet for the 2020s
Liu, Zhuang, et al. "A ConvNet for the 2020s." arXiv preprint arXiv:2201.03545 (2022).
Questions?
YouTube Playlist