Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields
Jonathan T. Barron Ben Mildenhall Dor Verbin Pratul P. Srinivasan Peter Hedman
Google Research
arXiv:2304.06706v1 [cs.CV] 13 Apr 2023
(a) Ground Truth (b) Our Model (c) mip-NeRF 360 [4] (d) iNGP [23] (e) mip-NeRF 360 + iNGP
Figure 1: (a) Test-set images from the mip-NeRF 360 dataset [4] with renderings from (b) our model and (c, d, e) three
state-of-the-art baselines. Our model accurately recovers thin structures and finely detailed foliage, while the baselines either
oversmooth or exhibit aliasing in the form of jaggies and missing scene content. PSNR values for each patch are inset.
Abstract

Neural Radiance Field training can be accelerated through the use of grid-based representations in NeRF's learned mapping from spatial coordinates to colors and volumetric density. However, these grid-based approaches lack an explicit understanding of scale and therefore often introduce aliasing, usually in the form of jaggies or missing scene content. Anti-aliasing has previously been addressed by mip-NeRF 360, which reasons about sub-volumes along a cone rather than points along a ray, but this approach is not natively compatible with current grid-based techniques. We show how ideas from rendering and signal processing can be used to construct a technique that combines mip-NeRF 360 and grid-based models such as Instant NGP to yield error rates that are 8% – 76% lower than either prior technique, and that trains 22× faster than mip-NeRF 360.

1. Introduction

In Neural Radiance Fields (NeRF), a neural network is trained to model a volumetric representation of a 3D scene such that novel views of that scene can be rendered via ray-tracing [21]. NeRF has proven to be an effective tool for tasks such as view synthesis [19], generative media [27], robotics [35], and computational photography [20].

The original NeRF model used a multilayer perceptron (MLP) to parameterize the mapping from spatial coordinates to colors and densities. Though compact and expressive, MLPs are slow to train, and recent work has accelerated training by replacing or augmenting MLPs with voxel-grid-like data structures [8, 16, 29, 37]. One example is Instant NGP (iNGP), which uses a pyramid of coarse and fine grids (the finest of which are stored using a hash map) to construct learned features that are processed by a tiny MLP, enabling greatly accelerated training [23].

In addition to being slow, the original NeRF model was also aliased: NeRF reasons about individual points along a ray, which results in "jaggies" in rendered images and limits NeRF's ability to reason about scale. Mip-NeRF [3] resolved this issue by casting cones instead of rays, and by featurizing the entire volume within a conical frustum for
[Figure graphic: per-scale features, ordered fine ←→ coarse.]
(a) Input / Baseline, 14.6 dB (b) Ground-truth (c) Downweighting, 24.3 dB (d) Multisampling, 25.2 dB (e) Our Approach, 26.5 dB
Figure 2: Here we show a toy 1-dimensional iNGP [23] with 1 feature per scale. Each subplot represents a different strategy for querying the NGP at all coordinates along the x axis — imagine a Gaussian moving left to right, where each line is the NGP feature for each coordinate, and where each color is a different scale in the iNGP. (a) The naive solution of querying the Gaussian's mean results in features with piecewise-linear kinks, where the high frequencies past the bandwidth of the Gaussian are large and inaccurate. (b) The true solution, obtained by convolving the NGP features with a Gaussian — an intractable solution in practice — results in coarse features that are smooth but informative and fine features that are near 0. (c) We can suppress unreliable high frequencies by downweighting them based on the scale of the Gaussian (color bands behind each feature indicate the downweighting), but this results in unnaturally sharp discontinuities in coarse features. (d) Alternatively, supersampling produces reasonable coarse-scale features but erratic fine-scale features. (e) We therefore multisample isotropic sub-Gaussians (5 shown here) and use each sub-Gaussian's scale to downweight frequencies.
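The multisampling idea in (e) can be sketched in a toy 1D form. Below is a minimal analogue, assuming a pyramid of linearly interpolated grids with random feature values (all names are illustrative, not the paper's implementation): averaging naive queries over samples drawn from the Gaussian approximates the intractable convolution in (b). Drawing samples randomly corresponds most closely to ablation (H) in Table 1, not to the paper's deterministic spiral pattern.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 1D "iNGP": one feature per grid cell, at several resolutions.
# Values are random stand-ins for learned features.
resolutions = [4, 16, 64]
grids = [rng.standard_normal(n + 1) for n in resolutions]

def query(x):
    """Naively interpolate each grid at coordinate(s) x in [0, 1]."""
    x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)
    feats = []
    for n, g in zip(resolutions, grids):
        t = x * n
        i = np.minimum(t.astype(int), n - 1)
        f = t - i
        feats.append((1 - f) * g[i] + f * g[i + 1])
    return np.stack(feats, axis=-1)  # [..., num_scales]

def query_gaussian(mean, sigma, num_samples=32):
    """Approximate Gaussian-prefiltered features by multisampling:
    average naive queries over draws from N(mean, sigma^2)."""
    xs = mean + sigma * rng.standard_normal(num_samples)
    return query(xs).mean(axis=0)
```

As the Gaussian's scale shrinks, the multisampled features reduce to the naive point query, which is why anti-aliasing only changes behavior where the Gaussian spans multiple grid cells.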
use as input to the MLP. Mip-NeRF and its successor mip-NeRF 360 [4] showed that this approach enables highly accurate rendering on challenging real-world scenes.

Regrettably, the progress made on these two problems of fast training and anti-aliasing is, at first glance, incompatible. This is because mip-NeRF's anti-aliasing strategy depends critically on the use of positional encoding [30, 32] to featurize a conical frustum into a discrete feature vector, but current grid-based approaches do not use positional encoding, and instead use learned features that are obtained by interpolating into a hierarchy of grids at a single 3D coordinate. Though anti-aliasing is a well-studied problem in rendering [9, 10, 28, 33], most approaches do not generalize naturally to inverse-rendering in grid-based NeRF models like iNGP.

In this work, we leverage ideas from multisampling, statistics, and signal processing to integrate iNGP's pyramid of grids into mip-NeRF 360's framework. We call our model "Zip-NeRF" due to its speed, its similarity with mip-NeRF, and its ability to fix zipper-like aliasing artifacts. On the mip-NeRF 360 benchmark [4], Zip-NeRF reduces error rates by as much as 18% and trains 22× faster than the previous state-of-the-art. On our multiscale variant of that benchmark, which more thoroughly measures aliasing and scale, Zip-NeRF reduces error rates by as much as 76%.

…edly casting rays corresponding to pixels in training images and minimizing (via gradient descent) the difference between each pixel's rendered and observed colors.

Mip-NeRF 360 and iNGP differ significantly in how coordinates along a ray are parameterized. In mip-NeRF 360, a ray is subdivided into a set of intervals [ti, ti+1], each of which represents a conical frustum whose shape is approximated with a multivariate Gaussian, and the expected positional encoding with respect to that Gaussian is used as input to a large MLP [3]. In contrast, iNGP trilinearly interpolates into a hierarchy of differently-sized 3D grids to produce feature vectors for a small MLP [23]. Our model combines mip-NeRF 360's overall framework with iNGP's featurization approach, but naively combining these two approaches introduces two forms of aliasing:

1. Instant NGP's feature grid approach is incompatible with mip-NeRF 360's scale-aware integrated positional encoding technique, so the features produced by iNGP are aliased with respect to spatial coordinates and thus produce aliased renderings. In Section 2, we address this by introducing a multisampling-like solution for computing prefiltered iNGP features.
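For reference, the integrated positional encoding that makes mip-NeRF scale-aware has a closed form for a diagonal Gaussian: the expected value of sin(2^l x) under x ~ N(μ, σ²) is sin(2^l μ)·exp(−(2^l)² σ²/2), and likewise for cosine, so high frequencies are smoothly attenuated as the Gaussian grows. A minimal NumPy sketch (the function name is ours, not the paper's):

```python
import numpy as np

def integrated_pos_enc(mean, var, num_freqs):
    """Expected positional encoding of x ~ N(mean, var), diagonal Gaussian:
    E[sin(2^l x)] = sin(2^l mean) * exp(-0.5 * 4^l * var), same for cos.
    mean/var are per-dimension arrays; returns a flat feature vector."""
    mean = np.asarray(mean, dtype=float)[..., None]   # [..., d, 1]
    var = np.asarray(var, dtype=float)[..., None]     # [..., d, 1]
    scales = 2.0 ** np.arange(num_freqs)              # [num_freqs]
    damp = np.exp(-0.5 * var * scales**2)             # high freqs decay
    enc = np.concatenate(
        [np.sin(mean * scales) * damp,
         np.cos(mean * scales) * damp], axis=-1)
    return enc.reshape(*enc.shape[:-2], -1)
```

With zero variance this reduces to NeRF's ordinary positional encoding; with large variance every frequency is driven toward zero, which is exactly the scale sensitivity that a single-point grid lookup lacks.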
Figure 5: Computing our anti-aliased loss requires that we smooth and resample a NeRF histogram (s, w) into the same set
of endpoints as a proposal histogram (ŝ, ŵ), which we outline here. (1) We divide w by the size of each interval in s to yield a
piecewise constant PDF that integrates to ≤ 1. (2) We convolve that PDF with a rectangular pulse to obtain a piecewise linear
PDF. (3) This PDF is integrated to produce a piecewise-quadratic CDF that is queried via piecewise quadratic interpolation
at each location in ŝ. (4) By taking the difference between adjacent interpolated values we obtain wŝ , which are the NeRF
histogram weights w resampled into the endpoints of the proposal histogram ŝ. (5) After resampling, we evaluate our loss
Lprop as an element-wise function of wŝ and ŵ, as they share a common coordinate space.
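Steps (1), (3), and (4) of this pipeline, without the blur of step (2), amount to resampling via a piecewise-linear CDF, which can be sketched in a few lines of NumPy (the blurred version instead queries a piecewise-quadratic CDF; the helper name here is ours):

```python
import numpy as np

def resample_histogram(s, w, s_hat):
    """Resample histogram weights w over bin endpoints s onto new endpoints
    s_hat: integrate the piecewise-constant PDF into a CDF, query the CDF
    at s_hat, and difference adjacent values. Because the PDF is constant
    within each bin, the CDF is piecewise linear and np.interp is exact."""
    cdf = np.concatenate([[0.0], np.cumsum(w)])  # CDF value at each endpoint
    cdf_hat = np.interp(s_hat, s, cdf)           # CDF queried at new endpoints
    return np.diff(cdf_hat)                      # resampled weights
```

This operation conserves total weight, and resampling a histogram onto its own endpoints returns the original weights, both useful sanity checks when implementing the full blurred version.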
…Figure 6 for a comparison of both loss functions.

Blurring a Step Function
To design a loss function that is smooth with respect to distance along the ray, we must first construct a technique for turning a piecewise constant step function into a continuous piecewise-linear function. Smoothing a discrete 1D signal is trivial, and requires just discrete convolution with a box filter (or a Gaussian filter, etc.). But this problem becomes difficult when dealing with piecewise constant functions whose endpoints are continuous, as the endpoints of each interval in the step function may be at any location. Rasterizing the step function and performing convolution is not a viable solution, as this fails in the common case of extremely narrow histogram bins. A new algorithm is therefore required.

Consider a step function (x, y) where (xi, xi+1) are the endpoints of interval i and yi is the value of the step function inside interval i. We want to convolve this step function with a rectangular pulse with radius r that integrates to 1: [|x| < r]/(2r), where [·] are Iverson brackets. Observe that the convolution of a single interval i of a step function with this rectangular pulse is a piecewise linear trapezoid with knots at (xi − r, 0), (xi + r, yi), (xi+1 − r, yi), (xi+1 + r, 0), and that a piecewise linear spline is the double integral of scaled delta functions located at each spline knot [13]. With this, and with the fact that summation commutes with integration, we can efficiently convolve a step function with a rectangular pulse as follows: we turn each endpoint xi of the step function into two signed delta functions located at xi − r and xi + r with values that are proportional to the change between adjacent y values, we interleave (via sorting) those delta functions, and we then integrate those sorted delta functions twice (see Algorithm 1 of the supplement for pseudocode). With this, we can construct our anti-aliased loss function.
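A minimal NumPy sketch of this signed-delta construction (an illustrative reimplementation, not the supplement's Algorithm 1):

```python
import numpy as np

def blur_step_function(x, y, r):
    """Convolve a piecewise-constant step function with the box filter
    [|t| < r] / (2r): each endpoint x_i emits two signed deltas at
    x_i - r and x_i + r scaled by the jump in y, the deltas are merged
    by sorting, and two cumulative integrations recover the result.
    Returns the knots (xr, yr) of the blurred piecewise-linear function."""
    x = np.asarray(x, dtype=float)   # endpoints, shape [n+1]
    y = np.asarray(y, dtype=float)   # bin values, shape [n]
    dy = np.diff(np.concatenate([[0.0], y, [0.0]]))  # jump at each endpoint
    # A jump of dy at x_i becomes a ramp of slope dy/(2r) over [x_i-r, x_i+r],
    # i.e. second-derivative deltas +dy/(2r) at x_i-r and -dy/(2r) at x_i+r.
    xr = np.concatenate([x - r, x + r])
    slopes = np.concatenate([dy, -dy]) / (2 * r)
    order = np.argsort(xr)           # interleave the deltas by sorting
    xr, slopes = xr[order], slopes[order]
    slope = np.cumsum(slopes)[:-1]   # first integral: slope between knots
    # Second integral: accumulate slope * knot spacing to get knot values.
    yr = np.concatenate([[0.0], np.cumsum(slope * np.diff(xr))])
    return xr, yr
```

For a single unit-height interval the output is exactly the trapezoid described above, with knots at (x0 − r, 0), (x0 + r, y0), (x1 − r, y0), (x1 + r, 0), and the blur conserves the function's integral.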
[Figure graphic: P(x, λ) and normalized-distance curves 1/t (tnear = 0.1), 1/t (tnear = 1), and Ours, P(2t, −1.5).]

…values, as coarse scales are penalized by orders of magnitude more than fine scales. This simple trick is extremely effective — it improves performance greatly compared to no weight decay, and significantly outperforms naive weight decay. We use a loss multiplier of 0.1 on this normalized weight decay in all experiments.
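The excerpt does not spell out the normalized weight decay itself; a plausible minimal sketch, assuming the penalty is the mean (rather than the sum) of squared grid values at each pyramid level, is:

```python
import numpy as np

def normalized_weight_decay(grids, multiplier=0.1):
    """Sketch of a normalized weight decay (an assumption; the exact form
    is not given in this excerpt): penalize the mean of squared grid
    values per pyramid level, summed over levels. Coarse levels have far
    fewer parameters, so each coarse parameter receives a much larger
    effective penalty than under naive sum-of-squares weight decay."""
    return multiplier * sum(np.mean(g ** 2) for g in grids)
```

Dividing by the per-level parameter count is what makes coarse scales penalized by orders of magnitude more than fine scales, matching the behavior described above.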
360 Dataset
Since our model is an extension of mip-NeRF 360, we evaluate on the "360" benchmark presented in that paper using the same evaluation procedure [4]. We evaluate NeRF [21], mip-NeRF [3], NeRF++ [38], mip-NeRF 360, iNGP [23] (using carefully tuned hyperparameters taken from recent work [34]), and a baseline in which we naively combine mip-NeRF 360 and iNGP without any of this paper's contributions. Average error metrics across

Figure 7: Many NeRF approaches require a function for converting metric distance t ∈ [0, ∞) into normalized distance s ∈ [0, 1]. Left: Our power transformation P(x, λ) lets us interpolate between common curves such as linear, logarithmic, and inverse by modifying λ, while retaining a linear-like shape near the origin. Right: This lets us construct a curve that transitions from linear to inverse/inverse-square, and supports scene content close to the camera.
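The power transformation P(x, λ) in Figure 7 is not defined in this excerpt; the sketch below assumes the form P(x, λ) = (|λ−1|/λ)((x/|λ−1| + 1)^λ − 1), which is linear near the origin for every λ and whose tail interpolates between linear, logarithmic, and inverse-like behavior as λ varies:

```python
import numpy as np

def power_transform(x, lam, eps=1e-12):
    """P(x, lam) = (|lam-1|/lam) * ((x/|lam-1| + 1)**lam - 1).
    This functional form is an assumption based on the paper, not stated
    in this excerpt. Its slope at x = 0 is 1 for every lam; lam = 1 is
    the linear limit, lam -> 0 gives log(1 + x), and lam < 0 gives a
    bounded, inverse-like tail."""
    x = np.asarray(x, dtype=float)
    if abs(lam - 1.0) < eps:
        return x                  # linear limit
    if abs(lam) < eps:
        return np.log1p(x)        # logarithmic limit
    a = abs(lam - 1.0)
    return (a / lam) * ((x / a + 1.0) ** lam - 1.0)
```

With λ = −1.5 the transformation saturates at |λ−1|/(−λ) as x grows, which is what allows the "Ours, P(2t, −1.5)" curve to behave linearly near the camera while compressing distant content.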
Scale Factor: 1× 2× 4× 8×
Error Metric: PSNR ↑ SSIM ↑ LPIPS ↓ PSNR ↑ SSIM ↑ LPIPS ↓ PSNR ↑ SSIM ↑ LPIPS ↓ PSNR ↑ SSIM ↑ LPIPS ↓ Time (hrs)
Instant NGP [23, 34] 24.36 0.642 0.366 25.23 0.712 0.251 26.84 0.809 0.142 28.42 0.877 0.092 0.15
mip-NeRF 360 [4, 22] 27.47 0.779 0.254 29.15 0.864 0.136 30.41 0.912 0.077 30.83 0.931 0.059 21.97
mip-NeRF 360 + iNGP 26.53 0.774 0.252 27.99 0.856 0.140 27.72 0.867 0.115 25.61 0.804 0.159 0.32
Our Model 28.24 0.822 0.201 29.97 0.891 0.099 31.51 0.932 0.056 32.44 0.954 0.038 0.97
A) Naive Sampling 27.90 0.796 0.234 29.67 0.880 0.115 29.20 0.887 0.095 26.51 0.820 0.144 0.56
B) Naive Supersampling (7×) 27.40 0.803 0.223 28.94 0.881 0.109 28.35 0.881 0.097 25.94 0.810 0.151 3.04
C) Jittered 27.91 0.797 0.233 29.60 0.879 0.116 29.45 0.893 0.090 27.58 0.855 0.120 0.55
D) Jittered Supersampling (7×) 27.50 0.810 0.212 28.99 0.884 0.105 28.91 0.896 0.086 27.65 0.870 0.109 3.04
E) No Multisampling 28.13 0.816 0.210 29.83 0.886 0.106 31.29 0.927 0.061 32.06 0.947 0.043 0.56
F) No Downweighting 28.20 0.817 0.208 29.91 0.888 0.103 31.05 0.926 0.062 31.17 0.941 0.051 0.93
G) No Appended Scale ω 28.21 0.819 0.205 29.91 0.889 0.102 31.30 0.929 0.059 31.92 0.949 0.043 0.95
H) Random Multisampling 28.21 0.820 0.204 29.93 0.890 0.102 31.45 0.932 0.058 32.31 0.953 0.040 1.04
I) Unscented Multisampling 28.21 0.821 0.200 29.94 0.891 0.100 31.50 0.932 0.056 32.40 0.953 0.039 1.12
J) No New Interlevel Loss 28.04 0.823 0.198 29.73 0.892 0.099 31.18 0.932 0.057 32.01 0.952 0.040 0.94
K) No Weight Decay 26.64 0.805 0.214 28.19 0.871 0.120 29.53 0.908 0.081 30.44 0.927 0.064 0.96
L) Un-Normalized Weight Decay 27.91 0.820 0.199 29.55 0.888 0.102 30.98 0.929 0.059 31.93 0.950 0.041 0.97
M) Small View-Dependent MLP 27.38 0.811 0.210 28.95 0.882 0.109 30.26 0.923 0.065 31.01 0.944 0.048 0.68
Table 1: Performance on our multiscale version of the 360 dataset [4], where we train and evaluate on multiscale images [3].
Red, orange, and yellow highlights indicate the 1st, 2nd, and 3rd-best performing technique for each metric. Our model
significantly outperforms our two baselines — especially the iNGP-based one, and especially at coarse scales, where our
errors are 54%–76% reduced. Rows A–M are ablations of our model; see the text for details.
all scenes are shown in Table 2 (see the supplement for per-scene metrics) and renderings of these four approaches are shown in Figure 1. Our model, mip-NeRF 360, and our "mip-NeRF 360 + iNGP" baseline were all trained on 8 NVIDIA Tesla V100-SXM2-16GB GPUs. The other baselines were trained on different accelerators (TPUs were used for NeRF, mip-NeRF, and NeRF++, and a single NVIDIA 3090 was used for iNGP), so to enable comparison their runtimes have been rescaled to approximate performance on our hardware. The render time of our model (which is not a focus of this work) is 3.9s, while mip-NeRF 360 takes 9.0s and our mip-NeRF 360 + iNGP baseline takes 2.4s.

Our model significantly outperforms mip-NeRF 360 (the previous state-of-the-art on this task) on this benchmark: we observe 8%, 15%, and 18% reductions in RMSE, DSSIM, and LPIPS respectively. Additionally, our model trains 22× faster than mip-NeRF 360. Compared to iNGP, our error reduction is even more significant: 26%, 41%, and 36% reductions in RMSE, DSSIM, and LPIPS, though our model is ∼6× slower to train than iNGP. Our combined "mip-NeRF 360 + iNGP" baseline trains significantly faster than mip-NeRF 360 (and is nearly 3× faster to train than our model) and produces error rates comparable to mip-NeRF 360's and better than iNGP's. Our model's improvement in image quality over these baselines is clear upon visual inspection, as shown in Figure 1 and the supplemental video.

PSNR ↑ SSIM ↑ LPIPS ↓ Time (hrs)
NeRF [21, 11] 23.85 0.605 0.451 12.65
mip-NeRF [3] 24.04 0.616 0.441 9.64
NeRF++ [38] 25.11 0.676 0.375 28.73
Instant NGP [23, 34] 25.68 0.705 0.302 0.15
mip-NeRF 360 [4, 22] 27.58 0.793 0.234 22.07
mip-NeRF 360 + iNGP 26.36 0.785 0.225 0.33
Our Model 28.29 0.825 0.192 0.96

Table 2: Performance on the 360 dataset [4]. Runtimes shown in gray are approximate; see text for details.

Multiscale 360 Dataset
Though the 360 dataset contains challenging scene content, it does not measure rendering quality as a function of scale — because this dataset was captured by orbiting a camera around a central object at a roughly constant distance, learned models need not generalize well across different image resolutions or different distances from the central object. We therefore use a more challenging evaluation procedure, similar to the multiscale Blender dataset used by mip-NeRF [3]: we turn each image into a set of four images that have been bicubically downsampled by scale factors [1, 2, 4, 8], where the downsampled images act as a proxy for additional training/test views in which the camera has been zoomed out from the center of the scene. During training we multiply the data term by the scale factor of each ray, and at test time we evaluate each scale separately. This significantly increases the difficulty of reconstruction by requiring the learned model to generalize across scales, and it causes aliasing artifacts to be highly salient, especially at coarse scales.

In Table 1 we evaluate our model against iNGP, mip-NeRF 360, our mip-NeRF 360 + iNGP baseline, and many ablations. Though mip-NeRF 360 performs reasonably (as it can reason about scale), our model still reduces RMSE by 8.5% at the finest scale and 17% at the coarsest scale, while being 22× faster. The mip-NeRF 360 + iNGP baseline, which has no mechanism for anti-aliasing or reasoning about scale, performs poorly: our RMSEs are 18% lower at the finest scale and 54% lower at the coarsest scale, and
[Figure 8: (a) Ground Truth (b) mip-NeRF 360 (c) Our Model (d) mip360 + iNGP]

both DSSIM and LPIPS at the coarsest scale are 76% lower. This improvement can be seen in Figure 8. Our mip-NeRF 360 + iNGP baseline generally outperforms iNGP (except at the coarsest scale), as we might expect from Table 2.

In Table 1 we also include an ablation study of our model. (A) Using "naive" sampling with our multisampling and downweighting disabled (which corresponds to the approach used in NeRF and iNGP) significantly degrades quality, and (B) supersampling [9] by a 7× factor to match our multisample budget does not improve image quality. (C, D) Randomly jittering these naive samples within the cone being cast improves performance at coarse scales, but only somewhat. Individually disabling (E) multisampling and (F) downweighting lets us measure the relative impact of both anti-aliasing approaches; they both contribute evenly. (G) Not using ω as a feature hurts performance slightly. Alternative multisampling approaches like (H) sampling n random points from each mip-NeRF Gaussian or (I) using n unscented transform [15] control points from each Gaussian as multisamples are competitive alternatives to our spiral-based approach, but both have slightly worse speeds and accuracies. (J) Disabling our anti-aliased interlevel loss decreases accuracy, especially at coarse scales. (K) Disabling our normalized weight decay reduces accuracy significantly, and (L) using non-normalized weight decay performs poorly. (M) Using iNGP's small view-dependent MLP decreases quality at all scales.

Sample Efficiency
Ablating our anti-aliased interlevel loss in our multiscale benchmark does not degrade quality significantly because mip-NeRF 360 (and our model, for the sake of fair comparison) uses a large number of samples¹ along each ray, which reduces z-aliasing. To reveal mip-NeRF 360's vulnerability to z-aliasing, in Figure 9 we plot test-set PSNR for our multiscale benchmark as we reduce the number of samples along the ray (both for the NeRF and the proposal network) by factors of 2, while also doubling r and the number of training iterations. Though our anti-aliased loss only slightly outperforms mip-NeRF 360's loss when the number of samples is large, as the sample count decreases the baseline loss fails catastrophically while our loss degrades gracefully.

That said, the most egregious z-aliasing artifact of missing scene content (shown in Figure 4) is hard to measure in small benchmarks of still images, as scene content only disappears at certain distances that the test set may not probe. See the supplemental video for demonstrations of z-aliasing, and for renderings from our model where that aliasing has been fixed.

5. Conclusion

We have presented Zip-NeRF, a model that integrates the progress made in the formerly divergent areas of scale-aware anti-aliased NeRFs and fast grid-based NeRF training. By leveraging ideas about multisampling and prefiltering, our model is able to achieve error rates that are 8% – 76% lower than prior techniques, while also training 22× faster than mip-NeRF 360 (the previous state-of-the-art on our benchmarks). We hope that the tools and analysis presented here concerning aliasing (both the spatial aliasing of NeRF's learned mapping from spatial coordinate to color and density, and the z-aliasing of the loss function used during online distillation along each ray) enable further progress towards improving the quality, speed, and sample-efficiency

¹Note that here we use "samples" to refer to sampled sub-frusta along
the ray, not the sampled multisamples within each sub-frustum. of NeRF-like inverse rendering techniques.
References

[1] Tomas Akenine-Möller. An extremely inexpensive multisampling scheme. Technical report, Ericsson Mobile Platforms AB, 2003.
[2] Jonathan T. Barron. A general and adaptive robust loss function. CVPR, 2019.
[3] Jonathan T. Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, and Pratul P. Srinivasan. Mip-NeRF: A multiscale representation for anti-aliasing neural radiance fields. ICCV, 2021.
[4] Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. CVPR, 2022.
[5] Piotr Bojanowski, Armand Joulin, David Lopez-Paz, and Arthur Szlam. Optimizing the latent space of generative networks. ICML, 2018.
[6] George E. P. Box and David R. Cox. An analysis of transformations. Journal of the Royal Statistical Society: Series B (Methodological), 1964.
[7] James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs, 2018. http://github.com/google/jax.
[8] Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. TensoRF: Tensorial radiance fields. ECCV, 2022.
[9] Robert L. Cook. Stochastic sampling in computer graphics. TOG, 1986.
[10] Franklin C. Crow. Summed-area tables for texture mapping. SIGGRAPH, 1984.
[11] Boyang Deng, Jonathan T. Barron, and Pratul P. Srinivasan. JaxNeRF: an efficient JAX implementation of NeRF, 2020. http://github.com/google-research/google-research/tree/master/jaxnerf.
[12] Ned Greene and Paul S. Heckbert. Creating raster omnimax images from multiple perspective views using the elliptical weighted average filter. IEEE Computer Graphics and Applications, 1986.
[13] Paul S. Heckbert. Filtering by repeated integration. SIGGRAPH, 1986.
[14] Xun Huang and Serge Belongie. Arbitrary style transfer in real-time with adaptive instance normalization. ICCV, 2017.
[15] Simon J. Julier and Jeffrey K. Uhlmann. Unscented filtering and nonlinear estimation. Proceedings of the IEEE, 2004.
[16] Animesh Karnewar, Tobias Ritschel, Oliver Wang, and Niloy Mitra. ReLU fields: The little non-linearity that could. SIGGRAPH, 2022.
[17] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. ICLR, 2015.
[18] Dale Kirkland, Bill Armstrong, Michael Gold, Jon Leech, and Paula Womack. ARB multisample. Technical report, Khronos OpenGL Registry, 1999.
[19] Ricardo Martin-Brualla, Noha Radwan, Mehdi S. M. Sajjadi, Jonathan T. Barron, Alexey Dosovitskiy, and Daniel Duckworth. NeRF in the Wild: Neural radiance fields for unconstrained photo collections. CVPR, 2021.
[20] Ben Mildenhall, Peter Hedman, Ricardo Martin-Brualla, Pratul P. Srinivasan, and Jonathan T. Barron. NeRF in the dark: High dynamic range view synthesis from noisy raw images. CVPR, 2022.
[21] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. ECCV, 2020.
[22] Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, Peter Hedman, Ricardo Martin-Brualla, and Jonathan T. Barron. MultiNeRF: A code release for Mip-NeRF 360, Ref-NeRF, and RawNeRF, 2022.
[23] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. SIGGRAPH, 2022.
[24] Thomas Neff, Pascal Stadlbauer, Mathias Parger, Andreas Kurz, Joerg H. Mueller, Chakravarty R. Alla Chaitanya, Anton S. Kaplanyan, and Markus Steinberger. DONeRF: Towards real-time rendering of compact neural radiance fields using depth oracle networks. Computer Graphics Forum, 2021.
[25] Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu. Semantic image synthesis with spatially-adaptive normalization. CVPR, 2019.
[26] Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, and Aaron C. Courville. FiLM: Visual reasoning with a general conditioning layer. AAAI, 2018.
[27] Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. DreamFusion: Text-to-3D using 2D diffusion. ICLR, 2023.
[28] Peter Shirley. Discrepancy as a quality measure for sample distributions. Eurographics, 1991.
[29] Cheng Sun, Min Sun, and Hwann-Tzong Chen. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. CVPR, 2022.
[30] Matthew Tancik, Pratul P. Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ramamoorthi, Jonathan T. Barron, and Ren Ng. Fourier features let networks learn high frequency functions in low dimensional domains. NeurIPS, 2020.
[31] John W. Tukey. Exploratory Data Analysis. Addison-Wesley, 1977.
[32] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. NeurIPS, 2017.
[33] Lance Williams. Pyramidal parametrics. Computer Graphics, 1983.
[34] Lior Yariv, Peter Hedman, Christian Reiser, Dor Verbin, Pratul P. Srinivasan, Richard Szeliski, Jonathan T. Barron, and Ben Mildenhall. BakedSDF: Meshing neural SDFs for real-time view synthesis. arXiv, 2023.
[35] Lin Yen-Chen, Pete Florence, Jonathan T. Barron, Tsung-Yi Lin, Alberto Rodriguez, and Phillip Isola. NeRF-Supervision: Learning dense object descriptors from neural radiance fields. ICRA, 2022.
[36] In-Kwon Yeo and Richard A. Johnson. A new family of power transformations to improve normality or symmetry. Biometrika, 2000.
[37] Alex Yu, Sara Fridovich-Keil, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. CVPR, 2022.
[38] Kai Zhang, Gernot Riegler, Noah Snavely, and Vladlen Koltun. NeRF++: Analyzing and improving neural radiance fields. arXiv:2010.07492, 2020.

A. Video Results

This supplement includes video results for scenes from the 360 benchmark. We also present video results for new, more-challenging scenes that we have captured to qualitatively demonstrate various kinds of aliasing and our model's ability to ameliorate that aliasing.

Affine Generative Latent Optimization    The video results presented in mip-NeRF 360 use the appearance-embedding approach of NeRF-W [19], which assigns short GLO vectors [5] to each image in the training set that are concatenated to the input of the view-dependent branch of the MLP. By jointly optimizing over those embeddings during training, optimization is able to explain away view-dependent effects such as variation in exposure, white balance, lighting, and cast shadows. On scenes with significant illumination variation we found this approach to be reasonably effective, but it is limited by the relatively small size of the view-dependent branch of our MLP compared to NeRF-W and mip-NeRF 360. We therefore use an approach similar to NeRF-W's GLO approach, but instead of optimizing over a short feature that is concatenated onto our bottleneck vector, we instead optimize over an affine transformation of our bottleneck vector itself. Furthermore, we express this affine transformation not just with a per-image latent embedding, but also with a small MLP that maps from a per-image latent embedding to an affine transformation. We found that optimizing over the large space of embeddings and MLP weights resulted in more rapid training than the GLO baseline, which allows the model to more easily explain per-image appearance variation without placing floaters in front of training cameras.

We allocate a 128-length vector for each image in the training set and use those vectors as input to a two-layer MLP with 128 hidden units whose output is two vectors (scale and shift) that are the same length as our bottleneck vector. The internal activation of the MLP is a ReLU, and the final activation that yields our scaling is exp. We then scale and shift each bottleneck vector by the two outputs of the MLP before evaluating the view-dependent MLP. This approach of optimizing an affine function of an internal activation resembles prior techniques in the literature for modulating activations to control "style" and appearance [14, 25, 26]. This technique is only used to produce our video results and is not used for the experiments mentioned in the paper, as it does not improve quantitative performance (unless an unusual evaluation procedure such as the one advocated for in NeRF-W is adopted).

B. Scale Featurization

Along with features f_ℓ we also average and concatenate a featurized version of {ω_{j,ℓ}} for use as input to our MLP:

    (2 · mean_j(ω_{j,ℓ}) − 1) · sqrt(V_init² + ∇(mean(V_ℓ²))) ,    (5)

where ∇ is a stop-gradient operator and V_init is the magnitude used to initialize each V_ℓ. This feature takes ω_j (shifted and scaled to [−1, 1]) and scales it by the standard deviation of the values in V_ℓ (padded slightly using V_init, which guards against the case where a V_ℓ's values shrink to zero during training). This scaling ensures that our featurized scales have roughly the same magnitude as the features themselves, regardless of how the features may grow or shrink during training. The stop-gradient prevents optimization from indirectly modifying this scale-feature by changing the values in V_ℓ.

When computing the downweighting factor ω, for the sake of speed we use an approximation for erf(x):

    erf(x) ≈ sign(x) · sqrt(1 − exp(−(4/π)x²)) .    (6)

This has no discernible impact on quality and a very marginal impact on performance.

C. Spatial Contraction

Mip-NeRF 360 [4] parameterizes unbounded scenes with bounded coordinates using a spatial contraction:

    C(x) =  x                      if ‖x‖ ≤ 1
            (2 − 1/‖x‖) (x/‖x‖)    if ‖x‖ > 1 .    (7)

This maps values in [−∞, ∞]^d to [−2, 2]^d such that resolution in the contracted domain is proportional to what is required by perspective projection — scene content near the origin is allocated significant model capacity, but distant scene content is allocated model capacity that is roughly proportional to disparity (inverse distance).

Given our multisampled isotropic Gaussians {x_j, σ_j}, we need a way to efficiently apply a scene contraction. Contracting the means of the Gaussians simply requires evaluating each C(x_j), but contracting the scale is non-trivial, and the approach provided by mip-NeRF 360 [4] for contracting a multivariate Gaussian is needlessly expressive and expensive for our purposes. Instead, we linearize the contraction around each x_j to produce J_C(x_j), the Jacobian of the contraction at x_j, and we produce an isotropic scale in the contracted space by computing the geometric mean of the eigenvalues of J_C(x_j). This can be done efficiently by simply computing the determinant of J_C(x_j) and taking its dth root (d = 3 in our case, as our coordinates are 3D):

    C(σ_j) = σ_j · det(J_C(x_j))^(1/d) .    (8)
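A minimal NumPy sketch of Eqs. (7) and (8): the function names are ours, and the Jacobian is estimated here by finite differences purely to keep the sketch self-contained (in practice it can be obtained analytically or via autodiff, e.g. in JAX).

```python
import numpy as np

def contract(x):
    # Eq. (7): identity inside the unit ball, (2 - 1/||x||) * x/||x|| outside.
    norm = np.linalg.norm(x)
    if norm <= 1:
        return x
    return (2 - 1 / norm) * (x / norm)

def contract_gaussian(x, sigma, eps=1e-3):
    # Eq. (8): contract an isotropic Gaussian (mean x, scale sigma) by
    # linearizing the contraction around x and scaling sigma by the
    # geometric mean of the Jacobian's eigenvalues, i.e. det(J_C(x))^(1/d).
    # The Jacobian is approximated by central finite differences.
    d = x.shape[-1]
    J = np.stack(
        [(contract(x + eps * e) - contract(x - eps * e)) / (2 * eps)
         for e in np.eye(d)],
        axis=-1)
    return contract(x), sigma * np.abs(np.linalg.det(J)) ** (1 / d)
```

As expected, points inside the unit ball are left untouched (the Jacobian is the identity, so sigma is unchanged), while distant Gaussians are shrunk roughly in proportion to inverse distance.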
D. Power Transformation Details

Here we expand upon the power transformation P(x, λ) presented in the paper. First, let's expand upon its definition to include its two removable singularities and its limits as λ approaches ±∞:

    P(x, λ) =  x                                        λ = 1
               log(1 + x)                               λ = 0
               exp(x) − 1                               λ = ∞
               1 − exp(−x)                              λ = −∞
               (|λ−1|/λ) · ((x/|λ−1| + 1)^λ − 1)        otherwise .    (9)

Note that the λ = −∞, 0, +∞ cases can and should be implemented using the log1p() and expm1() operations that are standard in numerical computing and deep learning libraries. As discussed, the slope of this function is 1 near the origin, but further from the origin λ can be tuned to describe a wide family of shapes: exponential, squared, logarithmic, inverse, inverse-square, and (negative) inverse-exponential. Scaling the x input to P lets us control the effective range of inputs where P(x, λ) is approximately linear. The second derivative of P at the origin is ±1, depending on whether λ is more or less than 1, and the output of P is bounded by (λ − 1)/λ when λ < 0.

This power transformation relates to the general robust loss ρ(x, α, c) [2], as ρ can be written as P applied to squared error: ρ(x, λ, 1) = P(x²/2, λ/2). P also resembles a reparameterized Yeo–Johnson transformation [36], but altered such that the transform always resembles a straight line near the origin (the second derivative of the Yeo–Johnson transformation at the origin is unbounded).

Distortion Loss    As discussed in the paper, our power transformation is used to curve metric distance into a normalized space where resampling and interlevel supervision can be performed effectively. We also use this transformation to curve metric distance within the distortion loss used in mip-NeRF 360. By tuning a transformation to have a steep gradient near the origin that tapers off into something resembling log-distance, we obtain a distortion loss that more aggressively penalizes "floaters" near the camera, which significantly improves the quality of videos rendered from our model. This does not significantly improve test-set metrics on the 360 dataset, as floaters do not tend to contribute much to error metrics computed on still images. In all of our experiments we set our model's multiplier on distortion loss to 0.005. See Figure 10 for a visualization of our curve and the impact it has on distortion loss.

Figure 10: The behavior of mip-NeRF 360's distortion loss can be significantly modified by applying a curve to metric distance, which we do with our power transformation. Top: using a linear curve (i.e., using metric distance t itself) results in a distortion loss that heavily penalizes distant scene content but ignores "floaters" close to the camera (the single large histogram bin near t = 0), as can be seen by visualizing the gradient magnitude of distortion loss as a heat-map over the NeRF histogram shown here. Bottom: curving metric distance with a tuned power transformation P(10000x, −0.25) before computing distortion loss causes distortion loss to correctly penalize histogram bins near the camera.

E. Blurring A Step Function

Algorithm 1 contains pseudocode for the algorithm described in the paper for convolving a step function with a rectangular pulse to yield a piecewise linear spline. This code is valid JAX/NumPy code except that we have overloaded sort() to include the behavior of argsort().

Algorithm 1: x_r, y_r = blur_stepfun(x, y, r)
    x_r, sortidx = sort(concatenate([x − r, x + r]))
    y′ = ([y, 0] − [0, y]) / (2r)
    y′′ = concatenate([y′, −y′])[sortidx[:−1]]
    y_r = [0, cumsum((x_r[1:] − x_r[:−1]) · cumsum(y′′))]
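As a concrete reference, P(x, λ) from Eq. (9) can be implemented directly (a sketch; the function name is ours), using log1p/expm1 for the singular and limiting cases as recommended:

```python
import numpy as np

def power_transform(x, lam):
    # Eq. (9): a curve with unit slope at the origin whose shape away from
    # the origin is controlled by lam; singular/limiting cases use
    # log1p/expm1 for numerical stability.
    x = np.asarray(x, dtype=np.float64)
    if lam == 1:
        return x
    if lam == 0:
        return np.log1p(x)          # log(1 + x)
    if lam == np.inf:
        return np.expm1(x)          # exp(x) - 1
    if lam == -np.inf:
        return -np.expm1(-x)        # 1 - exp(-x)
    a = abs(lam - 1)
    return (a / lam) * ((x / a + 1) ** lam - 1)
```

For example, λ = 2 gives a shifted quadratic x + x²/2, λ = 0 gives a logarithmic curve, and negative λ saturates at (λ − 1)/λ, matching the bound stated above.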
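For completeness, Algorithm 1 can be written as fully runnable NumPy by using an explicit argsort in place of the overloaded sort; the body otherwise follows the pseudocode:

```python
import numpy as np

def blur_stepfun(x, y, r):
    """Convolve a step function with a box filter of radius r.

    x: sorted knot locations, shape (n+1,); y: step values on the n
    intervals, shape (n,). Returns the knots xr and values yr of the
    resulting piecewise-linear spline.
    """
    # Each step edge splits into two slope changes at distance +/- r.
    xr_all = np.concatenate([x - r, x + r])
    sortidx = np.argsort(xr_all)
    xr = xr_all[sortidx]
    # Slope contribution of each edge: (jump in y) / (2r).
    yp = (np.concatenate([y, [0.0]]) - np.concatenate([[0.0], y])) / (2 * r)
    # Slope deltas in sorted order (+ at edge - r, - at edge + r).
    ypp = np.concatenate([yp, -yp])[sortidx[:-1]]
    # Integrate the running slope over interval widths to get spline values.
    yr = np.concatenate([[0.0], np.cumsum(np.diff(xr) * np.cumsum(ypp))])
    return xr, yr
```

Blurring a single unit step on [0, 1] with r = 0.25 yields the expected trapezoid: linear ramps on [−0.25, 0.25] and [0.75, 1.25] around a flat top.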
F. Model Details

Our model, our "mip-NeRF 360 + iNGP" baseline, and all ablations (unless otherwise stated) were trained for 25k iterations using a batch size of 2^16. We use the Adam optimizer [17] with β1 = 0.9, β2 = 0.99, ε = 10^−15, and we decay our learning rate logarithmically from 10^−2 to 10^−3 over training. We use no gradient clipping, and we use an aggressive warm-up for our learning rate: for the first 5k iterations we scale our learning rate by a multiplier that is cosine-decayed from 10^−8 to 1.

In our iNGP hierarchy of grids and hashes, we use 10 grid scales that are spaced by powers of 2 from 16 to 8192, and we use 4 channels per level (we found this slightly better and faster than iNGP's similar approach of spacing scales by √2 and having 2 channels per level). Grid sizes n_ℓ that are greater than 128 are parameterized identically to iNGP using a hash of size 128³.

We inherit the proposal sampling procedure used by mip-NeRF 360: two rounds of proposal sampling where we bound scene geometry and recursively generate new sample intervals, and one final NeRF round in which we render that final set of intervals into an image. We use a distinct NGP and MLP for each round of sampling, as this improves performance slightly. Because high-frequency content is largely irrelevant for earlier rounds of proposal sampling, we truncate the hierarchy of grids of each proposal NGP at maximum grid sizes of 512 and 2048. We additionally use only a single channel of features for each proposal NGP, as (unlike the NeRF NGP) these models only need to predict density, not density and color.

We found the small view-dependent MLP used by iNGP to be a significant bottleneck to performance, as it limits the model's ability to express and recover complicated view-dependent effects. This not only reduces rendering quality on shiny surfaces, but also causes more "floaters" by encouraging optimization to explain away view-dependent effects with small floaters in front of the camera. We therefore use a larger model: we have a view-dependent bottleneck size of 256, and we process those bottleneck vectors with a 3-layer MLP with 256 hidden units and a skip connection from the bottleneck to the second layer.

G. Results

An expanded version of our single-scale results on the mip-NeRF 360 dataset, where we list error metrics for each individual scene in the dataset, can be found in Table 3.
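The learning-rate schedule described in Appendix F (log-linear decay from 10^−2 to 10^−3, scaled by a cosine warm-up from 10^−8 to 1 over the first 5k iterations) can be sketched as follows; the function and its exact parameterization are our interpretation, not the released implementation:

```python
import math

def learning_rate(step, total=25000, warmup=5000, lr0=1e-2, lr1=1e-3):
    # Logarithmic (i.e., log-linear) decay from lr0 to lr1 over training.
    t = min(step / total, 1.0)
    lr = math.exp((1 - t) * math.log(lr0) + t * math.log(lr1))
    # Aggressive warm-up: a multiplier cosine-interpolated from 1e-8 to 1.
    if step < warmup:
        u = step / warmup
        lr *= 1e-8 + (1 - 1e-8) * 0.5 * (1 - math.cos(math.pi * u))
    return lr
```

Under this reading, the rate starts near 10^−10, ramps up to roughly 10^−2.2 by iteration 5k, and ends at exactly 10^−3.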
Table 3: Per-scene performance on the dataset of “360” indoor and outdoor scenes from mip-NeRF 360 [4].