
Springer Proceedings in Mathematics & Statistics

Ronald Cools
Dirk Nuyens Editors

Monte Carlo and Quasi-Monte Carlo Methods
MCQMC, Leuven, Belgium, April 2014

Springer Proceedings in Mathematics & Statistics
Volume 163

This book series features volumes composed of selected contributions from


workshops and conferences in all areas of current research in mathematics and
statistics, including operations research and optimization. In addition to an overall
evaluation of the interest, scientific quality, and timeliness of each proposal at the
hands of the publisher, individual contributions are all refereed to the high quality
standards of leading journals in the field. Thus, this series provides the research
community with well-edited, authoritative reports on developments in the most
exciting areas of mathematical and statistical research today.

More information about this series at http://www.springer.com/series/10533


Ronald Cools Dirk Nuyens

Editors

Monte Carlo and Quasi-Monte Carlo Methods
MCQMC, Leuven, Belgium, April 2014

Editors

Ronald Cools
Department of Computer Science
KU Leuven
Heverlee, Belgium

Dirk Nuyens
Department of Computer Science
KU Leuven
Heverlee, Belgium

ISSN 2194-1009 ISSN 2194-1017 (electronic)


Springer Proceedings in Mathematics & Statistics
ISBN 978-3-319-33505-6 ISBN 978-3-319-33507-0 (eBook)
DOI 10.1007/978-3-319-33507-0
Library of Congress Control Number: 2016937963

Mathematics Subject Classification (2010): 11K45, 11K38, 65-06, 65C05, 65D30, 65D18, 65C30,
65C35, 65C40, 91G60

© Springer International Publishing Switzerland 2016


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature


The registered company is Springer International Publishing AG Switzerland
Preface

This volume represents the refereed proceedings of the Eleventh International


Conference on Monte Carlo and Quasi-Monte Carlo Methods in Scientific
Computing which was held at the KU Leuven in Belgium from 6 to 11 April 2014.
It contains a limited selection of articles based on presentations given at the conference.
The conference program was arranged with the help of an international
committee consisting of the following members:
• Ronald Cools (Belgium, KU Leuven)—Chair
• Luc Devroye (Canada, McGill University)
• Josef Dick (Australia, University of New South Wales)
• Alain Dubus (Belgium, Université libre de Bruxelles)
• Philip Dutré (Belgium, KU Leuven)
• Henri Faure (France, Aix-Marseille Université)
• Alan Genz (USA, Washington State University)
• Mike Giles (UK, Oxford University)
• Paul Glasserman (USA, Columbia University)
• Michael Gnewuch (Germany, Universität Kaiserslautern)
• Stefan Heinrich (Germany, Universität Kaiserslautern)
• Fred Hickernell (USA, Illinois Institute of Technology)
• Aicke Hinrichs (Germany, Universität Rostock)
• Stephen Joe (New Zealand, University of Waikato)
• Aneta Karaivanova (Bulgaria, Bulgarian Academy of Sciences)
• Alexander Keller (Germany, NVIDIA)
• Dirk Kroese (Australia, The University of Queensland)
• Frances Kuo (Australia, University of New South Wales)
• Pierre L’Ecuyer (Canada, Université de Montréal)
• Gerhard Larcher (Austria, Johannes Kepler Universität Linz)
• Christiane Lemieux (Canada, University of Waterloo)
• Christian Lécot (France, Université de Savoie)
• Makoto Matsumoto (Japan, Hiroshima University)
• Thomas Müller-Gronbach (Germany, Universität Passau)


• Harald Niederreiter (Austria, Austrian Academy of Sciences)


• Erich Novak (Germany, Friedrich-Schiller-Universität Jena)
• Dirk Nuyens (Belgium, KU Leuven)
• Art Owen (USA, Stanford University)
• Gareth Peters (UK, University College London)
• Friedrich Pillichshammer (Austria, Johannes Kepler Universität Linz)
• Leszek Plaskota (Poland, University of Warsaw)
• Eckhard Platen (Australia, University of Technology Sydney)
• Klaus Ritter (Germany, Universität Kaiserslautern)
• Giovanni Samaey (Belgium, KU Leuven)
• Wolfgang Schmid (Austria, Universität Salzburg)
• Nikolai Simonov (Russia, Russian Academy of Sciences)
• Ian Sloan (Australia, University of New South Wales)
• Shu Tezuka (Japan, Kyushu University)
• Xiaoqun Wang (China, Tsinghua University)
• Grzegorz Wasilkowski (USA, University of Kentucky)
• Henryk Woźniakowski (Poland, University of Warsaw)
This conference continued the tradition of biennial MCQMC conferences initiated
by Harald Niederreiter, held previously at the following places:
1. Las Vegas, USA (1994)
2. Salzburg, Austria (1996)
3. Claremont, USA (1998)
4. Hong Kong (2000)
5. Singapore (2002)
6. Juan-Les-Pins, France (2004)
7. Ulm, Germany (2006)
8. Montreal, Canada (2008)
9. Warsaw, Poland (2010)
10. Sydney, Australia (2012)
The next conference will be held at Stanford University, USA, in August 2016.
The proceedings of these previous conferences were all published by
Springer-Verlag, under the following titles:
• Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing
(H. Niederreiter and P.J.-S. Shiue, eds.)
• Monte Carlo and Quasi-Monte Carlo Methods 1996 (H. Niederreiter,
P. Hellekalek, G. Larcher and P. Zinterhof, eds.)
• Monte Carlo and Quasi-Monte Carlo Methods 1998 (H. Niederreiter and
J. Spanier, eds.)
• Monte Carlo and Quasi-Monte Carlo Methods 2000 (K.-T. Fang,
F.J. Hickernell and H. Niederreiter, eds.)
• Monte Carlo and Quasi-Monte Carlo Methods 2002 (H. Niederreiter, ed.)
• Monte Carlo and Quasi-Monte Carlo Methods 2004 (H. Niederreiter and
D. Talay, eds.)

• Monte Carlo and Quasi-Monte Carlo Methods 2006 (A. Keller, S. Heinrich and
H. Niederreiter, eds.)
• Monte Carlo and Quasi-Monte Carlo Methods 2008 (P. L’Ecuyer and A. Owen,
eds.)
• Monte Carlo and Quasi-Monte Carlo Methods 2010 (L. Plaskota and
H. Woźniakowski, eds.)
• Monte Carlo and Quasi-Monte Carlo Methods 2012 (J. Dick, F.Y. Kuo,
G.W. Peters and I.H. Sloan, eds.)
The program of the conference was rich and varied with 207 talks. Highlights
were the invited plenary talks, the tutorials and a public lecture. The plenary talks
were given by Steffen Dereich (Germany, Westfälische Wilhelms-Universität
Münster), Peter Glynn (USA, Stanford University), Wenzel Jakob (Switzerland,
ETH Zürich), Makoto Matsumoto (Japan, Hiroshima University), Harald
Niederreiter (Austria, Austrian Academy of Sciences), Erich Novak (Germany,
Friedrich-Schiller-Universität Jena), Christian Robert (France, Université
Paris-Dauphine and UK, University of Warwick) and Raul Tempone (Saudi Arabia,
King Abdullah University of Science and Technology). The tutorials were given by
Mike Giles (UK, Oxford University) and Art Owen (USA, Stanford University),
and the public lecture was by Jos Leys.
The papers in this volume were carefully refereed and cover both theory and
applications of Monte Carlo and quasi-Monte Carlo methods. We thank the
reviewers for their extensive reports.
We gratefully acknowledge financial support from the KU Leuven, the city of
Leuven, the US National Science Foundation and the FWO Scientific Research
Community Stochastic Modelling with Applications in Financial Markets.

Leuven, December 2015
Ronald Cools
Dirk Nuyens
Contents

Part I Invited Papers


Multilevel Monte Carlo Implementation for SDEs Driven
by Truncated Stable Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Steffen Dereich and Sangmeng Li
Construction of a Mean Square Error Adaptive Euler–Maruyama
Method With Applications in Multilevel Monte Carlo . . . . . . . . . . . . . 29
Håkon Hoel, Juho Häppölä and Raúl Tempone
Vandermonde Nets and Vandermonde Sequences . . . . . . . . . . . . . . . . 87
Roswitha Hofer and Harald Niederreiter
Path Space Markov Chain Monte Carlo Methods in Computer
Graphics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Wenzel Jakob
Walsh Figure of Merit for Digital Nets: An Easy Measure
for Higher Order Convergent QMC . . . . . . . . . . . . . . . . . . . . . . . . . . 143
Makoto Matsumoto and Ryuichi Ohori
Some Results on the Complexity of Numerical Integration . . . . . . . . . . 161
Erich Novak
Approximate Bayesian Computation: A Survey on Recent Results . . . . 185
Christian P. Robert

Part II Contributed Papers


Multilevel Monte Carlo Simulation of Statistical Solutions
to the Navier–Stokes Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
Andrea Barth, Christoph Schwab and Jonas Šukys


Unbiased Simulation of Distributions with Explicitly


Known Integral Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
Denis Belomestny, Nan Chen and Yiwei Wang
Central Limit Theorem for Adaptive Multilevel Splitting
Estimators in an Idealized Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
Charles-Edouard Bréhier, Ludovic Goudenège and Loïc Tudela
Comparison Between LS-Sequences and b-Adic van der
Corput Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
Ingrid Carbone
Computational Higher Order Quasi-Monte Carlo Integration . . . . . . . 271
Robert N. Gantner and Christoph Schwab
Numerical Computation of Multivariate Normal Probabilities
Using Bivariate Conditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
Alan Genz and Giang Trinh
Non-nested Adaptive Timesteps in Multilevel Monte
Carlo Computations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
Michael B. Giles, Christopher Lester and James Whittle
On ANOVA Decompositions of Kernels and Gaussian
Random Field Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
David Ginsbourger, Olivier Roustant, Dominic Schuhmacher,
Nicolas Durrande and Nicolas Lenz
The Mean Square Quasi-Monte Carlo Error for Digitally
Shifted Digital Nets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
Takashi Goda, Ryuichi Ohori, Kosuke Suzuki and Takehito Yoshiki
Uncertainty and Robustness in Weather Derivative Models . . . . . . . . . 351
Ahmet Göncü, Yaning Liu, Giray Ökten and M. Yousuff Hussaini
Reliable Adaptive Cubature Using Digital Sequences . . . . . . . . . . . . . . 367
Fred J. Hickernell and Lluís Antoni Jiménez Rugama
Optimal Point Sets for Quasi-Monte Carlo Integration of Bivariate
Periodic Functions with Bounded Mixed Derivatives . . . . . . . . . . . . . . 385
Aicke Hinrichs and Jens Oettershagen
Adaptive Multidimensional Integration Based on Rank-1 Lattices . . . . . 407
Lluís Antoni Jiménez Rugama and Fred J. Hickernell
Path Space Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
Alexander Keller, Ken Dahm and Nikolaus Binder
Tractability of Multivariate Integration in Hybrid Function
Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
Peter Kritzer and Friedrich Pillichshammer

Derivative-Based Global Sensitivity Measures and Their Link


with Sobol’ Sensitivity Indices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455
Sergei Kucherenko and Shugfang Song
Bernstein Numbers and Lower Bounds for the Monte Carlo Error . . . . 471
Robert J. Kunsch
A Note on the Importance of Weak Convergence Rates for SPDE
Approximations in Multilevel Monte Carlo Schemes . . . . . . . . . . . . . . 489
Annika Lang
A Strategy for Parallel Implementations of Stochastic
Lagrangian Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507
Lionel Lenôtre
A New Rejection Sampling Method for Truncated Multivariate
Gaussian Random Variables Restricted to Convex Sets . . . . . . . . . . . . 521
Hassan Maatouk and Xavier Bay
Van der Corput and Golden Ratio Sequences Along the Hilbert
Space-Filling Curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
Colas Schretter, Zhijian He, Mathieu Gerber, Nicolas Chopin
and Harald Niederreiter
Uniform Weak Tractability of Weighted Integration . . . . . . . . . . . . . . 545
Paweł Siedlecki
Incremental Greedy Algorithm and Its Applications
in Numerical Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557
Vladimir Temlyakov
On “Upper Error Bounds for Quadrature Formulas
on Function Classes” by K.K. Frolov . . . . . . . . . . . . . . . . . . . . . . . . . . 571
Mario Ullrich
Tractability of Function Approximation with Product Kernels . . . . . . . 583
Xuan Zhou and Fred J. Hickernell
Discrepancy Estimates For Acceptance-Rejection Samplers
Using Stratified Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599
Houying Zhu and Josef Dick

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621
List of Participants

Nico Achtsis, KU Leuven, Belgium


Sergios Agapiou, University of Warwick, UK
Giacomo Albi, University of Ferrara, Italy
Martin Altmayer, Universität Mannheim, Germany
Anton Antonov, Saint Petersburg State University, Russia
Emanouil Atanassov, Bulgarian Academy of Sciences, Bulgaria
Yves Atchadé, University of Michigan, USA
Serge Barbeau, Montreal University, Canada
Andrea Barth, ETH Zürich, Switzerland
Kinjal Basu, Stanford University, USA
Tobias Baumann, University of Mainz, Germany
Christian Bayer, Weierstrass Institute, Germany
Benot Beck, Arxios sprl, Belgium
Denis Belomestny, Duisburg-Essen University, Germany
Francisco Bernal, Instituto Superior Técnico, Portugal
Debarati Bhaumik, CWI Amsterdam, The Netherlands
Dmitriy Bilyk, University of Minnesota, USA
Jose Blanchet, Columbia University, USA
Bastian Bohn, University of Bonn, Germany
Luke Bornn, Harvard University, USA
Bruno Bouchard, ENSAE-ParisTech, France
Luca Brandolini, University of Bergamo, Italy
Johann Brauchart, The University of New South Wales, Australia
Charles-Edouard Bréhier, Ecoles des Ponts, France
Tim Brereton, Universität Ulm, Germany
Glenn Byrenheid, University of Bonn, Germany
Ingrid Carbone, University of Calabria, Italy
Biagio Ciuffo, Joint Research Centre European Commission, Italy
Leonardo Colzani, Università di Milano-Bicocca, Italy
Ronald Cools, KU Leuven, Belgium
Simon Cotter, University of Manchester, UK


Radu Craiu, University of Toronto, Canada


Antonio Dalessandro, University College London, UK
Fred Daum, Raytheon, USA
Thomas Daun, Technische Universität Kaiserslautern, Germany
Lucia Del Chicca, Johannes Kepler University Linz, Austria
Steffen Dereich, Westfälische Wilhelms-Universität Münster, Germany
Josef Dick, The University of New South Wales, Australia
Giacomo Dimarco, University of Toulouse III, France
Ivan Dimov, Bulgarian Academy of Sciences, Bulgaria
Dũng Dinh, Vietnam National University, Vietnam
Benjamin Doerr, Ecole Polytechnique, France
Gonçalo dos Reis, Technical University Berlin, Germany
Alain Dubus, Université Libre de Bruxelles, Belgium
Pınar H. Durak, Yeditepe University, Turkey
Pierre Étoré, Grenoble University, France
Henri Faure, Aix-Marseille Université, France
Robert Gantner, ETH Zürich, Switzerland
Christel Geiss, University of Innsbruck, Austria
Stefan Geiss, University of Innsbruck, Austria
Alan Genz, Washington State University, USA
Iliyan Georgiev, Solid Angle Ltd., UK
Mathieu Gerber, University of Lausanne, Switzerland
Giacomo Gigante, University of Bergamo, Italy
Mike Giles, University of Oxford, UK
David Ginsbourger, University of Bern, Switzerland
Peter W. Glynn, Stanford University, USA
Michael Gnewuch, Technische Universität Kaiserslautern, Germany
Maciej Goćwin, AGH University of Science and Technology, Poland
Takashi Goda, The University of Tokyo, Japan
Ahmet Göncü, Xian Jiaotong Liverpool University, China
Peter Grabner, Graz University of Technology, Austria
Mathilde Grandjacques, Grenoble University, France
Andreas Griewank, Humboldt-University Berlin, Germany
Adrien Gruson, Rennes 1 University, France
Arnaud Guyader, University of Rennes, France
Toshiya Hachisuka, Aarhus University, Denmark
Georg Hahn, Imperial College London, UK
Abdul-Lateef Haji-Ali, King Abdullah University of Science and Technology,
Saudi Arabia
Hiroshi Haramoto, Ehime University, Japan
Shin Harase, Tokyo Institute of Technology, Japan
Carsten Hartmann, Freie Universität Berlin, Germany
Mario Hefter, Technische Universität Kaiserslautern, Germany
Stefan Heinrich, Technische Universität Kaiserslautern, Germany
Clemens Heitzinger, Arizona State University, USA

Peter Hellekalek, University of Salzburg, Austria


Fred J. Hickernell, Illinois Institute of Technology, USA
Aicke Hinrichs, University of Rostock, Germany
Håkon Hoel, King Abdullah University of Science and Technology, Saudi Arabia
Wanwan Huang, Roosevelt University, USA
Martin Hutzenthaler, University of Frankfurt, Germany
Mac Hyman, Tulane University, USA
Christian Irrgeher, Johannes Kepler University Linz, Austria
Pierre Jacob, University of Oxford, UK
Wenzel Jakob, ETH Zürich, Switzerland
Alexandre Janon, Université Paris Sud, France
Karl Jansen, Deutsches Elektronen Synchroton, Germany
Wojciech Jarosz, The Walt Disney Company, Switzerland
Arnulf Jentzen, ETH Zürich, Switzerland
Lan Jiang, Illinois Institute of Technology, USA
Lluís Antoni Jiménez Rugama, Illinois Institute of Technology, USA
Stephen Joe, The University of Waikato, New Zealand
Charles Joseph, Case Western Reserve University, USA
Lutz Kämmerer, TU Chemnitz, Germany
Anton S. Kaplanyan, Karlsruhe Institute of Technology, Germany
Alexander Keller, NVIDIA, Germany
Amirreza Khodadadian, TU Vienna, Austria
Anton Kostiuk, Technische Universität Kaiserslautern, Germany
Alexander Kreinin, IBM, Canada
Peter Kritzer, Johannes Kepler University Linz, Austria
Jaroslav Křivánek, Charles University in Prague, Czech Republic
Sergei Kucherenko, Imperial College London, UK
Thomas Kühn, Universität Leipzig, Germany
Arno Kuijlaars, KU Leuven, Belgium
Robert J. Kunsch, Friedrich Schiller University Jena, Germany
Frances Kuo, The University of New South Wales, Australia
Pierre L’Ecuyer, University of Montreal and INRIA Rennes, Canada
Céline Labart, Université de Savoie, France
William Lair, EDF R&D, France
Annika Lang, Chalmers University of Technology, Sweden
Gerhard Larcher, Johannes Kepler University Linz, Austria
Kody Law, King Abdullah University of Science and Technology, Saudi Arabia
Christian Lécot, Université de Savoie, France
Fabrizio Leisen, University of Kent, UK
Tony Lelièvre, Ecole des Ponts, France
Jérôme Lelong, Grenoble University, France
Lionel Lenôtre, INRIA Rennes Bretagne – Atlantique and Rennes 1, France
Gunther Leobacher, Johannes Kepler University Linz, Austria
Paul Leopardi, The Australian National University, Australia
Hernan Leövey, Humboldt-University Berlin, Germany

Chris Lester, University of Oxford, UK


Josef Leydold, Vienna University of Economics and Business, Austria
Sangmeng Li, Westfälische Wilhelms-Universität Münster, Germany
Binghuan Lin, Techila Technologies Ltd., Finland
Jingchen Liu, Columbia University, USA
Kai Liu, University of Waterloo, Canada
Yanchu Liu, The Chinese University of Hong Kong, China
Hassan Maatouk, Ecole des Mines de St-Etienne, France
Sylvain Maire, Université de Toulon, France
Lev Markhasin, University of Stuttgart, Germany
Luca Martino, University of Helsinki, Finland
Makoto Matsumoto, Hiroshima University, Japan
Charles Matthews, University of Edinburgh, UK
Roel Matthysen, KU Leuven, Belgium
Sebastian Mayer, University of Bonn, Germany
Błażej Miasojedow, University of Warsaw, Poland
Alvaro Moraes, King Abdullah University of Science and Technology, Saudi Arabia
Paweł Morkisz, AGH University of Science and Technology, Poland
Hozumi Morohosi, National Graduate Institute for Policy Studies, Japan
Eric Moulines, Télécom ParisTech, France
Chiranjit Mukhopadhyay, Indian Institute of Science, India
Thomas Müller-Gronbach, University of Passau, Germany
Tigran Nagapetyan, Fraunhofer ITWM, Germany
Andreas Neuenkirch, Universität Mannheim, Germany
Duy Nguyen, University of Wisconsin-Madison, USA
Nguyet Nguyen, Florida State University, USA
Thi Phuong Dong Nguyen, KU Leuven, Belgium
Harald Niederreiter, Austrian Academy of Sciences, Austria
Wojciech Niemiro, University of Warsaw, Poland
Takuji Nishimura, Yamagata University, Japan
Erich Novak, Friedrich Schiller University Jena, Germany
Dirk Nuyens, KU Leuven, Belgium
Jens Oettershagen, University of Bonn, Germany
Ryuichi Ohori, The University of Tokyo, Japan
Giray Ökten, Florida State University, USA
Steffen Omland, Technische Universität Kaiserslautern, Germany
Michela Ottobre, Imperial College London, UK
Daoud Ounaissi, Université Lille 1, France
Art Owen, Stanford University, USA
Angeliki Papana, University of Macedonia, Greece
Peter Parczewski, Universität Mannheim, Germany
Robert Patterson, Weierstrass Institute, Germany
Stefan Pauli, ETH Zürich, Switzerland
Jean-Philippe Péraud, Massachusetts Institute of Technology, USA

Magnus Perninge, Lund University, Sweden


Gareth William Peters, University College London, UK
Friedrich Pillichshammer, Johannes Kepler University Linz, Austria
Ísabel Piršić, Johannes Kepler University Linz, Austria
Leszek Plaskota, University of Warsaw, Poland
Jan Pospšil, University of West Bohemia, Czech Republic
Clémentine Prieur, Grenoble University, France
Antonija Pršlja, Arctur d.o.o., Slovenia
Paweł Przybyłowicz, AGH University of Science and Technology, Poland
Mykhailo Pupashenko, Technische Universität Kaiserslautern, Germany
Vilda Purutçuoğlu, Middle East Technical University, Turkey
Shaan Qamar, Duke University, USA
Christoph Reisinger, University of Oxford, UK
Lee Ricketson, University of California, Los Angeles, USA
Klaus Ritter, Technische Universität Kaiserslautern, Germany
Christian Robert, Université Paris-Dauphine, France
Werner Roemisch, Humboldt-University Berlin, Germany
Mathias Rousset, INRIA Paris, Rocquencourt, France
Raphaël Roux, Université Pierre et Marie Curie, France
Daniel Rudolf, Friedrich Schiller University Jena, Germany
Halis Sak, Yeditepe University, Turkey
Andrea Saltelli, Joint Research Centre European Commission, Italy
Giovanni Samaey, KU Leuven, Belgium
Wolfgang Ch. Schmid, University of Salzburg, Austria
Scott Schmidler, Duke University, USA
Colas Schretter, Vrije Universiteit Brussel, Belgium
Nikolaus Schweizer, Saarland University, Germany
Jean Michel Sellier, Bulgarian Academy of Sciences, Bulgaria
John Shortle, George Mason University, USA
Winfried Sickel, Friedrich Schiller University Jena, Germany
Paweł Siedlecki, University of Warsaw, Poland
Martin Simon, University of Mainz, Germany
Ian Sloan, The University of New South Wales, Australia
Alexey Stankovskiy, SCK-CEN, Belgium
Živa Stepančič, Arctur d.o.o., Slovenia
Jonas Šukys, ETH Zürich, Switzerland
Gowri Suryanarayana, KU Leuven, Belgium
Kosuke Suzuki, The University of Tokyo, Japan
David Swenson, Universiteit van Amsterdam, The Netherlands
Michaela Szölgyenyi, Johannes Kepler University Linz, Austria
Lukasz Szpruch, University of Edinburgh, UK
Tor Sørevik, University of Bergen, Norway
Stefano Tarantola, Joint Research Centre European Commission, Italy
Rodrigo Targino, University College London, UK
Aretha Teckentrup, Florida State University, USA

Vladimir Temlyakov, University of South Carolina, USA


Raúl Tempone, King Abdullah University of Science and Technology, Saudi Arabia
Tomáš Tichý, VSB-TU Ostrava, Czech Republic
Giancarlo Travaglini, Università di Milano-Bicocca, Italy
Benjamin Trendelkamp-Schroer, Freie Universität Berlin, Germany
Bruno Tuffin, INRIA Rennes Bretagne, Atlantique, France
Gerhard Tulzer, TU Vienna, Austria
Plamen Turkedjiev, Ecole Polytechnique, France
Mario Ullrich, Friedrich Schiller University Jena, Germany
Tino Ullrich, University of Bonn, Germany
Manolis Vavalis, University of Thessaly, Greece
Matti Vihola, University of Jyväskylä, Finland
Pedro Vilanova, King Abdullah University of Science and Technology, Saudi Arabia
Toni Volkmer, TU Chemnitz, Germany
Sebastian Vollmer, University of Oxford, UK
Jan Vybíral, Technical University Berlin, Germany
Wander Wadman, CWI Amsterdam, The Netherlands
Clément Walter, Université Paris Diderot—Paris 7, France
Xiaoqun Wang, Tsinghua University, China
Yiwei Wang, The Chinese University of Hong Kong, China
Markus Weimar, Philipps-University Marburg, Germany
Jakub Wojdyła, AGH University of Science and Technology, Poland
Kasia Wolny, University of Warwick, UK
Yijun Xiao, Université Paris-Ouest Nanterre-La Défense, France
Yuanwei Xu, University of Warwick, UK
Larisa Yaroslavtseva, University of Passau, Germany
Takehito Yoshiki, The University of Tokyo, Japan
Xuan Zhou, Illinois Institute of Technology, USA
Houying Zhu, The University of New South Wales, Australia
Part I
Invited Papers
Multilevel Monte Carlo Implementation
for SDEs Driven by Truncated Stable
Processes

Steffen Dereich and Sangmeng Li

Abstract In this article we present an implementation of a multilevel Monte Carlo


scheme for Lévy-driven SDEs introduced and analysed in (Dereich and Li, Multilevel
Monte Carlo for Lévy-driven SDEs: central limit theorems for adaptive Euler
schemes, Ann. Appl. Probab. 26, No. 1, 136–185, 2016 [12]). The scheme is based
on direct simulation of Lévy increments. We give an efficient implementation of the
algorithm. In particular, we explain direct simulation techniques for Lévy increments.
Further, we optimise over the involved parameters and, in particular, the refinement
multiplier. This article complements the theoretical considerations of the above
reference. We stress that we focus on the case where the frequency of small jumps is
particularly high, meaning that the Blumenthal–Getoor index is larger than one.

Keywords Multilevel Monte Carlo · Lévy-driven stochastic differential equation ·


Truncated stable distributions · Computation of expectations

1 Introduction

The numerical computation of expectations E[G(X )] for solutions X = (X t )t∈[0,T ]


of stochastic differential equations (SDE) is a classical problem in stochastic analy-
sis and numerous numerical schemes were developed and analysed within the last
twenty to thirty years, see for instance the textbooks by Kloeden and Platen [19]
and Glasserman [15]. Recently, a new very efficient class of Monte Carlo algorithms
was introduced by Giles [14], see also Heinrich [17] for an earlier variant of the
computational concept. Central to these multilevel Monte Carlo algorithms is the
use of whole hierarchies of approximations in numerical simulations.

S. Dereich (B) · S. Li
Institut für Mathematische Statistik, Westfälische Wilhelms-Universität Münster,
Orléans-Ring 10, 48149 Münster, Germany
e-mail: steffen.dereich@wwu.de
S. Li
e-mail: li.sangmeng@wwu.de

© Springer International Publishing Switzerland 2016


R. Cools and D. Nuyens (eds.), Monte Carlo and Quasi-Monte Carlo Methods,
Springer Proceedings in Mathematics & Statistics 163,
DOI 10.1007/978-3-319-33507-0_1

In this article, we focus on stochastic differential equations that are driven by


Lévy processes. That means the driving process is a sum of a Brownian motion and
a discontinuous process typically featuring infinitely many jumps in compact inter-
vals. Numerical methods for Lévy-driven SDEs have been introduced and analysed
by various authors, see e.g. [18, 27]. A common approach in the simulation of Lévy
processes is to simulate all discontinuities of the Lévy process that are larger than a
threshold and to ignore the remainder or to approximate the remainder by a Brownian
motion (Gaussian approximation), see [2]. Corresponding multilevel Monte Carlo
schemes are analysed in [10, 11]. In general the efficiency of such schemes depends
on the frequency of small jumps that is measured in terms of the Blumenthal-Getoor
index (BG index), a number in [0, 2] with a higher number referring to a higher fre-
quency. If the BG index is less than one, then the quadrature error of simple schemes
based on shot noise representations is of the same order as the one obtained for con-
tinuous diffusions. However, when the BG index is larger than one, schemes that are
based on the simulation of individual discontinuities slow down significantly and the
simulation of the Lévy process is the main bottleneck in the numerics. Introducing a
Gaussian approximation improves the order of convergence, but still such schemes
show worse orders of convergence than those obtained for diffusions. A remedy to obtain the
same order of convergence as for diffusions is to directly sample from the distribution
of Lévy increments. In this article, we consider an adaptive scheme introduced in
[12] that applies direct sampling techniques. Our focus lies on the implementation
of such algorithms with a particular emphasis on SDEs driven by truncated stable
processes. We conduct numerical tests concerning the accuracy of the sampling
algorithm and of the multilevel scheme.
In the following, (Ω, F , P) denotes a probability space that is sufficiently rich
to ensure existence of all random variables used in the exposition. We let Y =
(Yt)t∈[0,T] be a square integrable Lévy process and note that there exist b ∈ R (drift),
σ² ∈ [0, ∞) (diffusion coefficient) and a measure ν on R\{0} with ∫ x² ν(dx) < ∞
(Lévy measure) such that

$$\mathbb{E}\bigl[e^{izY_t}\bigr] = \exp\Bigl( t\Bigl( ibz - \tfrac{1}{2}\sigma^2 z^2 + \int (e^{izx} - 1 - izx)\,\nu(dx) \Bigr) \Bigr)$$

for t ∈ [0, T ] and z ∈ R. We call the unique triplet (b, σ 2 , ν) Lévy triplet, although
this notion slightly deviates from its original use. We refer the reader to the textbooks
by Applebaum [1], Bertoin [6] and Sato [28] for a concise treatment of Lévy processes.
The process X = (Xt)t∈[0,T] denotes the solution to the stochastic integral equation

$$X_t = x_0 + \int_0^t a(X_{s-})\, dY_s, \qquad t \in [0, T], \tag{1}$$

where a : R → R is a continuously differentiable Lipschitz function and x0 ∈ R. Both


processes Y and X attain values in the space of càdlàg functions, i.e. the space of right
continuous functions with left limits, on [0, T ] which we denote by D(R) and endow
with the Skorokhod topology. We will analyse multilevel algorithms for the computa-
tion of expectations E[G(X )], where G : D(R) → R is a measurable functional such

that G(x) depends on the marginals, integrals and/or supremum of the path x ∈ D(R).
Before we state the results we introduce the underlying numerical schemes.

1.1 Jump-Adapted Euler Scheme

In the context of Lévy-driven stochastic differential equations there are various Euler-
type schemes analysed in the literature. We consider jump-adapted Euler schemes.
For finite Lévy measures these were introduced by Platen [25] and analysed by
various authors, see, e.g., [7, 22]. For infinite Lévy measures an error analysis is
conducted in [9, 11] for two multilevel Monte Carlo schemes. Further, weak approx-
imation is analysed in [20, 24].
For the definition of the scheme we use the simple Poisson point process Π on
the Borel sets of (0, T ] × (R\{0}) that is associated to Y , that is

$$\Pi = \sum_{s \in (0,T]:\, \Delta Y_s \neq 0} \delta_{(s, \Delta Y_s)},$$

where δ_{(s,ΔY_s)} denotes the Dirac measure in (s, ΔY_s) and Δx_t = x_t − x_{t−} for x ∈ D(R) and
t ∈ (0, T]. It has intensity ℓ_{(0,T]} ⊗ ν, where ℓ_{(0,T]} denotes Lebesgue measure on
(0, T]. Further, let Π̄ be the compensated variant of Π, that is the random signed
measure on (0, T] × (R\{0}) given by

$$\bar\Pi = \Pi - \ell_{(0,T]} \otimes \nu.$$

The process (Yt)t∈[0,T] admits the representation

$$Y_t = bt + \sigma W_t + \lim_{\delta \downarrow 0} \int_{(0,t] \times B(0,\delta)^c} x\, d\bar\Pi(s, x), \tag{2}$$

where (Wt)t∈[0,T] is an appropriate standard Brownian motion (independent of Π)
and the limit is to be understood uniformly in L².
We introduce the numerical scheme from [12] that is based on direct simulation
of Lévy increments. We use a family of approximations indexed by three strictly
positive parameters h, ε and ε′ satisfying

$$T \in \varepsilon'\mathbb{N} \quad\text{and}\quad \varepsilon' \in \varepsilon\mathbb{N}.$$

We represent (Yt ) as a sum of two independent processes (Yth ) and (Ȳth ). The former
one is constituted by the drift, the diffusive part and the jumps bigger than h, that is

Yth = bt + σ Wt + x dΠ̄ (s, x), (3)
(0,t]×B(0,h)c
6 S. Dereich and S. Li

and the latter one by the (compensated) small jumps only, that is

$$\bar Y^h_t = \lim_{\delta \downarrow 0} \int_{(0,t] \times (B(0,h)\setminus B(0,\delta))} x\, d\bar\Pi(s, x). \tag{4}$$

We apply an Euler scheme with two sets of update times for the coefficient. We
enumerate the times

$$(\varepsilon\mathbb{Z} \cap [0, T]) \cup \{t \in (0, T] : |\Delta Y_t| \ge h\} = \{T_0, T_1, \dots\},$$

in increasing order and consider the Euler approximation X̄^{h,ε,ε′} = (X̄^{h,ε,ε′}_t)_{t∈[0,T]}
given as the unique process with X̄^{h,ε,ε′}_0 = x_0 that is piecewise constant on [T_{n−1}, T_n)
and satisfies

$$\bar X^{h,\varepsilon,\varepsilon'}_{T_n} = \bar X^{h,\varepsilon,\varepsilon'}_{T_{n-1}} + a\bigl(\bar X^{h,\varepsilon,\varepsilon'}_{T_{n-1}}\bigr)\bigl(Y^h_{T_n} - Y^h_{T_{n-1}}\bigr) + \mathbb{1}_{\varepsilon'\mathbb{Z}}(T_n)\, a\bigl(\bar X^{h,\varepsilon,\varepsilon'}_{T_n - \varepsilon'}\bigr)\bigl(\bar Y^h_{T_n} - \bar Y^h_{T_n - \varepsilon'}\bigr), \tag{5}$$

for n = 1, 2, . . . . Note that the coefficient in front of (Y^h_t) is updated at all times in
{T_0, T_1, . . .} and the coefficient in front of (Ȳ^h_t) at all times in {0, ε′, 2ε′, . . . , T} ⊂
{T_0, T_1, . . .}. Hence two kinds of updates are used, and we will consider schemes
where in the limit the second kind is in number negligible compared to the first kind. The
parameter h serves as a threshold for jumps being considered large, which entail immediate
updates on the fine scale. The parameters ε and ε′ control the regular updates on the
fine and coarse scale.

We call X̄^{h,ε,ε′} the piecewise constant approximation with parameter (h, ε, ε′). We
will also work with the continuous approximation X^{h,ε,ε′} = (X^{h,ε,ε′}_t)_{t∈[0,T]} defined
for n = 1, 2, . . . and t ∈ [T_{n−1}, T_n) by

$$X^{h,\varepsilon,\varepsilon'}_t = \bar X^{h,\varepsilon,\varepsilon'}_{T_{n-1}} + a\bigl(\bar X^{h,\varepsilon,\varepsilon'}_{T_{n-1}}\bigr)\bigl(Y^h_t - Y^h_{T_{n-1}}\bigr).$$

Note that for this approximation the evolution Y^h takes effect continuously.
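To fix ideas, the following is a minimal sketch (in Python) of the update rule (5) for the piecewise constant approximation. The increments of Y^h and Ȳ^h on the update-time skeleton are assumed to be given; their simulation for truncated stable drivers is described in Sect. 3.2. The function and argument names (euler_path, dYh, dYbar, is_coarse) are ours and purely illustrative.

```python
import numpy as np

def euler_path(x0, a, dYh, is_coarse, dYbar):
    """Piecewise constant jump-adapted Euler approximation, cf. (5).

    dYh[n]       : increment Y^h_{T_n} - Y^h_{T_{n-1}} on the update-time skeleton
    is_coarse[n] : True iff T_n lies on the coarse grid eps' * Z
    dYbar[n]     : increment Ybar^h_{T_n} - Ybar^h_{T_n - eps'} (used when is_coarse[n])
    Entry 0 corresponds to T_0 = 0 and is ignored.
    """
    x = np.empty(len(dYh))
    x[0] = x_coarse = x0      # x_coarse holds the state at the latest coarse grid time
    for n in range(1, len(dYh)):
        # coefficient of the big-jump/diffusive part is refreshed at every update time
        x[n] = x[n - 1] + a(x[n - 1]) * dYh[n]
        if is_coarse[n]:
            # coefficient of the small-jump part is frozen at the state of time T_n - eps'
            x[n] += a(x_coarse) * dYbar[n]
            x_coarse = x[n]
    return x
```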

1.2 Multilevel Monte-Carlo

In general a multilevel scheme makes use of a whole hierarchy of approximate


solutions and we choose decreasing sequences (h_k)_{k∈ℕ}, (ε_k)_{k∈ℕ} and (ε′_k)_{k∈ℕ} and
denote for each k ∈ ℕ by X^k := X^{h_k,ε_k,ε′_k} the corresponding Euler approximation as
introduced above, the so-called kth level.
Once this hierarchy of approximations has been fixed, a multilevel scheme S is
parametrised by an ℕ-valued vector (n_1, . . . , n_L) of arbitrary finite length L: for a
measurable function G : D(R) → R we approximate E[G(X )] by

E[G(X^1)] + E[G(X^2) − G(X^1)] + · · · + E[G(X^L) − G(X^{L−1})]


Multilevel Monte Carlo Implementation for SDEs Driven … 7

and denote by S(G) the random output that is obtained when estimating the individual
expectations E[G(X^1)], E[G(X^2) − G(X^1)], . . . , E[G(X^L) − G(X^{L−1})] independently
by classical Monte Carlo with n_1, . . . , n_L iterations and summing up the
individual estimates. More explicitly, a multilevel scheme S associates to each
measurable G a random variable

$$S(G) = \frac{1}{n_1}\sum_{i=1}^{n_1} G(X^{1,i}) + \sum_{k=2}^{L} \frac{1}{n_k}\sum_{i=1}^{n_k} \bigl( G(X^{k,i,f}) - G(X^{k-1,i,c}) \bigr), \tag{6}$$

where the pairs of random variables (X^{k,i,f}, X^{k−1,i,c}), resp. the random variables X^{1,i},
appearing in the sums are all independent with identical distribution as (X^k, X^{k−1}),
resp. X^1. Note that the entries of the pairs are not independent of each other and the
superscripts f and c refer to the fine and coarse simulation, respectively!
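As an illustration, a minimal sketch of the estimator (6). The callables sample_level, sample_pair and G are hypothetical; in particular sample_pair(k) must return a coupled pair distributed as (X^k, X^{k−1}) generated from the same randomness, as in the joint simulation described in Sect. 3.2.

```python
import numpy as np

def mlmc_estimate(sample_level, sample_pair, G, n):
    """Multilevel estimator (6) with iteration numbers n = (n_1, ..., n_L).

    sample_level(1) -> one path of the coarsest approximation X^1
    sample_pair(k)  -> one coupled pair (fine, coarse) distributed as (X^k, X^{k-1})
    """
    est = np.mean([G(sample_level(1)) for _ in range(n[0])])
    for k in range(2, len(n) + 1):
        diffs = [G(fine) - G(coarse)
                 for fine, coarse in (sample_pair(k) for _ in range(n[k - 1]))]
        est += np.mean(diffs)
    return est
```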

1.3 Error Analysis

In this section, we provide error estimates for multilevel Monte Carlo algorithms
based on the adaptive Euler scheme introduced before. We consider the quadrature
problem for functionals G : D(R) → R of the form

G(x) = g(Ax)

with g : Rd → R and linear functional A : D(R) → Rd both satisfying regularity


assumptions to be specified below. Further we will consider the case where d = 1
and Ax = supt∈[0,T ] xt .
The hierarchical scheme of approximations: The hierarchical scheme of approximate
solutions is described by a sequence of parameters ((h_k, ε_k, ε′_k) : k ∈ ℕ), each
triple describing an approximation as before. We assume that all three parameters
tend to zero and satisfy
(ML1) ε_k = M^{−k} T, where M ∈ ℕ\{1} is fixed,
(ML2) lim_{k→∞} ν(B(0, h_k)^c) ε_k = 0,
(ML3) ε′_k ∫_{B(0,h_k)} x² ν(dx) log²(1 + 1/ε′_k) = o(ε_k),
(ML4) h_k² log²(1 + 1/ε_k) = o(ε_k).
We note that (ML3) and (ML4) are conditions that entail that our approximations
have the same quality as the ones that one obtains when doing adapted Euler with
update times {T0 , T1 , . . . }. Condition (ML2) implies that the number of updates
caused by large jumps is negligible in comparison to the regular updates at times in
εN0 ∩ [0, T ]. This will be in line with our examples and entails that the error process
is of a particularly simple form.
Let (X^k : k ∈ ℕ) be a family of path approximations for X depending on
((h_k, ε_k, ε′_k) : k ∈ ℕ) and assume that α is a parameter greater or equal to 1/2 such that

$$\lim_{n \to \infty} \varepsilon_n^{-\alpha}\, \mathbb{E}\bigl[G(X^n) - G(X)\bigr] = 0. \tag{7}$$

The maximal level and iteration numbers: We specify the family of multilevel
schemes. For each δ ∈ (0, 1) we denote by Ŝ_δ the multilevel scheme which has
maximal level

$$L(\delta) = \Bigl\lceil \frac{\log \delta^{-1}}{\alpha \log M} \Bigr\rceil$$

and iteration numbers

$$n_k(\delta) = \bigl\lceil \delta^{-2} L(\delta)\, \varepsilon_{k-1} \bigr\rceil$$

for k = 1, 2, . . . , L(δ).
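A small helper encoding this choice of L(δ) and n_k(δ); it reflects our reading of the two displays above (with ε_0 := T for k = 1) and is not code taken from [12].

```python
import math

def mlmc_parameters(delta, alpha, M, T):
    """L(delta) and n_k(delta) as above; our reading uses eps_{k-1} = M**-(k-1) * T."""
    L = math.ceil(math.log(1.0 / delta) / (alpha * math.log(M)))
    n = [math.ceil(delta ** -2 * L * M ** -(k - 1) * T) for k in range(1, L + 1)]
    return L, n
```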
The error process: The error estimate will make use of an additional process
which can be interpreted as idealised description of the difference between two
consecutive levels, the so-called error process. We equip the points (s, ΔY_s) of the
Poisson point process Π with two independent marks σ_s² and ξ_s, the former one being
uniformly distributed on {0, 1/M, . . . , (M−1)/M} and the latter one being standard normal.
The error process U = (U_t)_{t∈[0,T]} is defined as the solution of the integral equation

$$U_t = \int_0^t a'(X_{s-}) U_{s-}\, dY_s + \sqrt{\tfrac{1}{2}\sigma^2\Bigl(1 - \tfrac{1}{M}\Bigr)} \int_0^t (a a')(X_{s-})\, dB_s + \sum_{s \in (0,t]:\, \Delta Y_s \neq 0} \sigma_s \xi_s\, (a a')(X_{s-})\, \Delta Y_s, \tag{8}$$

where B = (Bt )t∈[0,T ] is an additional independent standard Brownian motion.


Note that the above infinite sum has to be understood as an appropriate martingale
limit. More explicitly, denoting by Z = (Z_t)_{t∈[0,T]} the Lévy process

$$Z_t = \sqrt{\tfrac{1}{2}\sigma^2\Bigl(1 - \tfrac{1}{M}\Bigr)}\, B_t + \lim_{\delta \downarrow 0} \sum_{s \in (0,t]:\, |\Delta Y_s| \ge \delta} \sigma_s \xi_s\, \Delta Y_s$$

we can rewrite (8) as

$$U_t = \int_0^t a'(X_{s-}) U_{s-}\, dY_s + \int_0^t (a a')(X_{s-})\, dZ_s.$$

Central limit theorem: We cite an error estimate from [12]. We assume as before
that the driving process Y is a square integrable Lévy process and that the coefficient a
is a continuously differentiable Lipschitz function. Additionally we assume that σ 2
is strictly positive.
Suppose that G : D(R) → R is of the form

G(x) = g(Ax)

with A : D(R) → Rd and g : Rd → R satisfying the following assumptions:


1. A is a Lipschitz continuous functional A : D(R) → Rd (w.r.t. supremum norm)
that is continuous w.r.t. the Skorokhod topology in PU -almost every path (Case 1)
or
2. A is given by Ax = supt∈[0,T ] xt and in particular d = 1 (Case 2),
and g is Lipschitz continuous and differentiable in P AX -almost every point.
Further we assume that we are given a hierarchical scheme of approximations
as described above. In particular, we assume that assumptions (ML1)-(ML4) and
Eq. (7) are satisfied for a fixed parameter α ∈ [ 21 , ∞).

Theorem 1 Assume that Y is as introduced in this subsection and additionally


assume that the coefficient a : R → R does not attain zero in Case 2.
The multilevel schemes (Ŝ_δ : δ ∈ (0, 1)) as introduced above satisfy

$$\delta^{-1}\bigl(\hat S_\delta(G) - \mathbb{E}[G(X)]\bigr) \Rightarrow \mathcal{N}(0, \rho^2) \quad \text{as } \delta \downarrow 0,$$

where N(0, ρ²) is the normal distribution with mean zero and

1. variance ρ² = Var(∇g(AX) · AU) in Case 1 and
2. variance ρ² = Var(g′(AX) U_S), with S denoting the random time when X reaches
its supremum, in Case 2.

Further,

$$\lim_{\delta \downarrow 0} \delta^{-2}\, \mathbb{E}\bigl[\bigl(\hat S_\delta(G) - \mathbb{E}[G(X)]\bigr)^2\bigr] = \rho^2.$$

Remark 1 1. The theorem is a combination of Theorem 1.6, 1.8, 1.9, 1.10 of [12].
2. One of the assumptions requires a control on the bias, see (7). We note that the
assumptions imposed on G in the theorem imply validity of (7) for α = 1/2. In
general, research on weak approximation of SDEs suggests that (7) is typically
valid for α < 1, see [3] for a corresponding result concerning diffusions.
3. If

$$Ax = \Bigl( x_T, \int_0^T x_s\, ds \Bigr),$$

then the statement of the previous theorem remains true for the multilevel scheme
based on piecewise constant approximations with the same terms appearing in
the limit.
4. For k = 1, 2, . . . the expected number of Euler steps to generate X^k (at the
discrete time skeleton of update times) is T(ε_k^{−1} + ν(B(0, h_k)^c)). Taking as cost
for a joint simulation of G(X^k) − G(X^{k−1}) the expected number of Euler steps
we assign one simulation of Ŝ_δ(G) the cost

$$T\varepsilon_1^{-1} + T\nu(B(0, h_1)^c) + \sum_{k=2}^{L(\delta)} T\, n_k(\delta)\bigl(\varepsilon_k^{-1} + \varepsilon_{k-1}^{-1} + \nu(B(0, h_k)^c) + \nu(B(0, h_{k-1})^c)\bigr) \sim \frac{T(M+1)}{\alpha^2 (\log M)^2}\, \delta^{-2} (\log 1/\delta)^2,$$

as δ ↓ 0. In general we write, for two functions g and h, h ∼ g to indicate that
lim h/g = 1.
5. The supremum of the continuous approximation is simulatable. Between the
update times of the coefficient, the continuous approximation is a Brownian
motion plus drift and joint simulation of increments and suprema are feasible,
see [4].
6. In the original work the results are proved for one dimensional SDEs only to
keep the notation and presentation simple. However the proofs do not make use
of that fact and a generalisation to the multidimensional setting does not require
new techniques.
7. Error estimates as in the previous theorem that give only an upper bound
(of the same order) are known to hold under weaker assumptions. In particular,
the differentiability of g and a is not needed for such a result, see [21].
8. In the diffusion setting a similar result as Theorem 1 can be found in Ben Alaya
and Kebaier [5] for diffusions and a smaller class of functionals. The main effort
in the analysis is the representation of the variance in terms of the error process.
In general, the validity of a central limit theorem without control on the vari-
ance can be often easily deduced with the Lindeberg condition. This approach
has appeared at other places in the literature and we mention [24] as an early
reference.

In Theorem 1 the effect of the multiplier M on the variance ρ 2 is not completely


obvious. We cite Theorem 1.11 of [12].

Theorem 2 We assume the setting of Theorem 1. Further assume in Case 1 that A is


of integral type meaning that there exist finite signed measures μ1 , . . . , μd on [0, T ]
such that A = (A_1, . . . , A_d) with

$$A_j x = \int_0^T x_s\, d\mu_j(s), \quad \text{for } x \in D(\mathbb{R}) \text{ and } j = 1, \dots, d,$$

and generally suppose that a′(X_{s−})ΔY_s ≠ −1 for all s ∈ [0, T], almost surely. Then
there exists a constant κ depending on G and the underlying SDE, but not on M, such
that the variance ρ² satisfies

$$\rho = \kappa \sqrt{\tfrac{1}{2}\Bigl(1 - \tfrac{1}{M}\Bigr)}.$$

2 Direct Simulation of Lévy Increments

In this section, we explain how we achieve sampling of Lévy increments. In the


following, we denote by F the cumulative distribution function of the real infinitely
divisible distribution with characteristic function

$$\phi(z) = \exp\Bigl( \int_{\mathbb{R}\setminus\{0\}} (e^{izx} - 1 - izx)\, \nu(dx) \Bigr), \quad \text{for } z \in \mathbb{R}, \tag{9}$$

where ν is a measure on R\{0} with ∫ x² ν(dx) < ∞. In practice, the measure ν is
given and we need an effective algorithm for sampling from F.

2.1 Fourier Inversion

In a first precomputation we approximately invert the characteristic function φ with


the Hilbert transform method analysed in [8].
We consider a family of approximate cumulative distribution functions (cdf) that
is parametrised by two parameters δ > 0 and K ∈ ℕ. We set

$$F_{\delta,K}(x) = \frac{1}{2} + \frac{i}{2} \sum_{k=-K}^{K} e^{-ix(k-\frac{1}{2})\delta}\, \frac{\phi\bigl((k-\tfrac{1}{2})\delta\bigr)}{(k-\tfrac{1}{2})\pi}, \quad \text{for } x \in \mathbb{R}. \tag{10}$$

This approximation converges fast to the cdf, provided that φ satisfies certain assumptions.
We cite an error estimate from [8].
Theorem 3 Suppose there exist positive reals d_−, d_+ such that
• φ is analytic in the space {z ∈ ℂ : Im(z) ∈ (−d_−, d_+)},
• ∫_{−d_−}^{d_+} |φ(u + iy)| dy → 0, as u → ±∞,
• φ_± := lim_{ε↓0} ∫_ℝ |φ(u ± i(d_± − ε))| du < +∞.
If there exist constants κ, c, β > 0 such that

$$|\phi(z)| \le \kappa \exp(-c|z|^\beta), \quad \text{for } z \in \mathbb{R},$$

then

$$|F(x) - F_{\delta,K}(x)| \le \frac{e^{-2\pi d_-/\delta - x d_-}}{2\pi d_-\bigl(1 - e^{-2\pi d_-/\delta}\bigr)}\, \phi_- + \frac{e^{-2\pi d_+/\delta + x d_+}}{2\pi d_+\bigl(1 - e^{-2\pi d_+/\delta}\bigr)}\, \phi_+ + \frac{\kappa}{2\pi}\Bigl( \frac{1}{K} + \frac{4}{\beta c (K\delta)^\beta} \Bigr) e^{-c(K\delta)^\beta}$$

for x ∈ R.
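The approximation (10) is straightforward to evaluate. The following sketch assumes a callable phi for the characteristic function (for instance the closed form of Proposition 2 below with t = 1) and returns a complex number whose real part is the tabulated cdf value used in Sect. 2.2; the function name is ours.

```python
import numpy as np

def F_delta_K(x, phi, delta, K):
    """Approximate cdf F_{delta,K}(x) from (10); phi is the characteristic function."""
    k = np.arange(-K, K + 1)
    u = (k - 0.5) * delta
    phi_vals = np.array([complex(phi(v)) for v in u])
    terms = np.exp(-1j * x * u) * phi_vals / ((k - 0.5) * np.pi)
    # the cdf approximation used in Sect. 2.2 is the real part of this value
    return 0.5 + 0.5j * np.sum(terms)
```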
12 S. Dereich and S. Li

2.2 Sampling Algorithm

We aim at using a second order spline approximation to do sampling via an inverse


cdf method. We describe separately the precomputation and sampling algorithm.
Precomputation: In a precomputation, we compute second order approximations
for Fδ,K on N consecutive intervals of equal length. More explicitly, we fix an interval
[x_min, x_max], store for each k = 0, . . . , N the values

$$x_k = x_{\min} + k\, \frac{x_{\max} - x_{\min}}{N} \quad\text{and}\quad y_k = \operatorname{Re}\bigl(F_{\delta,K}(x_k)\bigr)$$

and, for each k = 1, . . . , N, the unique parabola p_k that coincides with Re F_{δ,K} in
the points xk−1 , (xk−1 + xk )/2, xk . We suppose that F is strictly increasing and note
that by choosing a sufficiently accurate approximation Fδ,K we can guarantee that
each parabola pk is strictly increasing on [xk−1 , xk ] and thus has a unique inverse
p_k^{−1} when restricted to the domain [x_{k−1}, x_k].
We assume that N is of the form 2^{d+1} − 1 with d ∈ ℕ and arrange the entries
y_0, . . . , y_N in a binary search tree of depth d.
Sampling: Sampling is achieved by carrying out the following steps (a code sketch follows the list):
• generation of a random number u uniformly distributed on [y_0, y_N],
• identification of an index k ∈ {1, . . . , N} with u ∈ [y_{k−1}, y_k] based on the binary
search tree,
• output of p_k^{−1}(u).
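A compact sketch of the precomputation and of these three steps. Here Fre stands for x ↦ Re F_{δ,K}(x), np.searchsorted plays the role of the binary search tree and np.roots inverts the local parabola; all names and float tolerances are ours.

```python
import numpy as np

def build_tables(Fre, xmin, xmax, N):
    """Precompute y_k = Re F_{delta,K}(x_k) on an equidistant grid and, for each of the
    N intervals, the parabola through the two endpoints and the midpoint."""
    xs = np.linspace(xmin, xmax, N + 1)
    ys = np.array([Fre(x) for x in xs])
    coeffs = []
    for k in range(1, N + 1):
        xm = 0.5 * (xs[k - 1] + xs[k])
        coeffs.append(np.polyfit([xs[k - 1], xm, xs[k]],
                                 [ys[k - 1], Fre(xm), ys[k]], 2))
    return xs, ys, coeffs

def sample(xs, ys, coeffs, rng):
    """Draw u uniformly on [y_0, y_N], locate the interval by binary search and
    invert the local parabola p_k."""
    u = rng.uniform(ys[0], ys[-1])
    k = min(max(int(np.searchsorted(ys, u)), 1), len(coeffs))
    a, b, c = coeffs[k - 1]
    roots = np.roots([a, b, c - u])                 # solve p_k(x) = u
    roots = roots[np.isreal(roots)].real
    inside = roots[(roots >= xs[k - 1] - 1e-12) & (roots <= xs[k] + 1e-12)]
    return float(inside[0])
```

Typical usage would be rng = np.random.default_rng(); xs, ys, coeffs = build_tables(Fre, xmin, xmax, N); draws = [sample(xs, ys, coeffs, rng) for _ in range(10**5)].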

3 Truncated Stable Processes

In this section we focus on truncated stable processes. Let c+ , c− , h > 0 and β ∈


(0, 2). A Lévy process Y = (Y_t)_{t≥0} is called a truncated stable process with parameters
β, h, c_+, c_−, if it has Lévy triplet (0, 0, ν^{c_+,c_−}_{h,β}) with

$$\nu^{c_+,c_-}_{h,\beta}(dx) = \frac{c_+ \mathbb{1}_{(0,h]}(x) + c_- \mathbb{1}_{[-h,0)}(x)}{|x|^{1+\beta}}\, dx. \tag{11}$$

This class of processes has a scaling property similar to stable processes which is
particularly useful in simulations. It will allow us to do the precomputation for one
infinitely divisible distribution only and use the scaling property to do simulations
of different levels. For applications of truncated stable processes, we refer the reader
to [23, 26].

3.1 Preliminaries

Proposition 1 Let κ > 0 and (Y_t) be a truncated stable process with parameters
β, h, c_+, c_−. The process (κ Y_{t/κ^β}) is a truncated stable process with parameters
β, κh, c_+, c_−.

Proof The process (κ Y_{t/κ^β}) is a Lévy process with

$$\mathbb{E}\bigl[e^{iz\kappa Y_{t/\kappa^\beta}}\bigr] = \exp\Bigl( \frac{t}{\kappa^\beta} \int (e^{i\kappa z x} - i\kappa z x - 1)\Bigl( \mathbb{1}_{(0,h]}(x)\,\frac{c_+}{|x|^{1+\beta}} + \mathbb{1}_{[-h,0)}(x)\,\frac{c_-}{|x|^{1+\beta}} \Bigr) dx \Bigr)$$
$$= \exp\Bigl( t \int (e^{izy} - izy - 1)\Bigl( \mathbb{1}_{(0,\kappa h]}(y)\,\frac{c_+}{|y|^{1+\beta}} + \mathbb{1}_{[-\kappa h,0)}(y)\,\frac{c_-}{|y|^{1+\beta}} \Bigr) dy \Bigr)$$

for t ≥ 0 and z ∈ R. □

In order to do a Fourier inversion via (10) we need the characteristic function of


a truncated stable distribution.

Proposition 2 Let Y be a truncated stable process with parameters β, h, c_+, c_−.
Then for t ≥ 0 and z ∈ R

$$\mathbb{E}[e^{izY_t}] = \exp\Bigl( \frac{t}{-\beta h^\beta}\bigl( c_+ e^{izh} + c_- e^{-izh} - (c_+ + c_-) - (c_+ - c_-)izh \bigr) - \frac{t}{\beta(\beta-1)h^{\beta-1}}\, iz\bigl( c_+ e^{izh} - c_- e^{-izh} - (c_+ - c_-) \bigr) - \frac{t\, h^{2-\beta}}{\beta(\beta-1)(2-\beta)}\, z^2\bigl( c_+\, {}_1F_1(2-\beta, 3-\beta, izh) + c_-\, {}_1F_1(2-\beta, 3-\beta, -izh) \bigr) \Bigr),$$

where {}_1F_1 denotes the confluent hypergeometric function. In the symmetric case where c :=
c_+ = c_−, we have

$$\mathbb{E}[e^{izY_t}] = \exp\Bigl( \frac{ct}{-\beta h^\beta}\bigl( e^{izh} + e^{-izh} - 2 \bigr) - \frac{ct}{\beta(\beta-1)h^{\beta-1}}\, iz\bigl( e^{izh} - e^{-izh} \bigr) - \frac{ct}{\beta(\beta-1)(2-\beta)h^{\beta-2}}\, z^2\bigl( {}_1F_1(2-\beta, 3-\beta, izh) + {}_1F_1(2-\beta, 3-\beta, -izh) \bigr) \Bigr).$$

Proof It suffices to prove the statement for c+ = 1 and c− = 0. All other cases can
be deduced from this case via scaling, reflection and superposition. Recall that
$$\mathbb{E}[e^{izY_t}] = \exp\Bigl( t \int_0^h (e^{izx} - 1 - izx)\, \frac{1}{x^{1+\beta}}\, dx \Bigr).$$

Applying partial integration we get

$$\int_0^h (e^{izx} - 1 - izx)\, \frac{1}{x^{1+\beta}}\, dx = -\frac{1}{\beta}\Bigl[ (e^{izx} - 1 - izx)\, \frac{1}{x^\beta} \Bigr]_{0+}^{h} + \frac{iz}{\beta} \int_0^h (e^{izx} - 1)\, \frac{1}{x^\beta}\, dx \tag{12}$$

and de l'Hôpital's rule implies that $\lim_{x \downarrow 0} \frac{e^{izx}-1-izx}{x^\beta} = 0$. Doing an additional partial
integration we get

$$\int_0^h (e^{izx} - 1)\, \frac{1}{x^\beta}\, dx = -\frac{1}{\beta-1}\Bigl[ (e^{izx} - 1)\, \frac{1}{x^{\beta-1}} \Bigr]_{0+}^{h} + \frac{iz}{\beta-1} \int_0^h e^{izx}\, \frac{1}{x^{\beta-1}}\, dx$$
$$= -\frac{1}{\beta-1}\, (e^{izh} - 1)\, \frac{1}{h^{\beta-1}} + \frac{iz}{\beta-1} \int_0^h \frac{\cos(zx)}{x^{\beta-1}}\, dx - \frac{z}{\beta-1} \int_0^h \frac{\sin(zx)}{x^{\beta-1}}\, dx. \tag{13}$$

Using the integral tables of [16, Sect. 3.761] we conclude that

$$\int_0^h \frac{\cos(zx)}{x^{\beta-1}}\, dx = h^{2-\beta} \int_0^1 \frac{\cos(zhx)}{x^{\beta-1}}\, dx = \frac{h^{2-\beta}}{2(2-\beta)}\bigl( {}_1F_1(2-\beta, 3-\beta, izh) + {}_1F_1(2-\beta, 3-\beta, -izh) \bigr)$$

and

$$\int_0^h \frac{\sin(zx)}{x^{\beta-1}}\, dx = h^{2-\beta} \int_0^1 \frac{\sin(zhx)}{x^{\beta-1}}\, dx = \frac{-i\, h^{2-\beta}}{2(2-\beta)}\bigl( {}_1F_1(2-\beta, 3-\beta, izh) - {}_1F_1(2-\beta, 3-\beta, -izh) \bigr).$$

Inserting this into (13) and then inserting the result into (12) finishes the proof. 
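For completeness, a sketch evaluating the symmetric-case formula of Proposition 2 (c := c_+ = c_−) with mpmath, whose hyp1f1 accepts complex arguments; the function name is ours.

```python
import mpmath as mp

def phi_sym_trunc_stable(z, t, c, beta, h):
    """E[exp(i z Y_t)] for a symmetric truncated stable process (c = c_+ = c_-),
    using the symmetric-case formula of Proposition 2."""
    i = mp.mpc(0, 1)
    e_p, e_m = mp.exp(i * z * h), mp.exp(-i * z * h)
    t1 = c * t / (-beta * h ** beta) * (e_p + e_m - 2)
    t2 = -c * t / (beta * (beta - 1) * h ** (beta - 1)) * i * z * (e_p - e_m)
    t3 = (-c * t / (beta * (beta - 1) * (2 - beta) * h ** (beta - 2)) * z ** 2
          * (mp.hyp1f1(2 - beta, 3 - beta, i * z * h)
             + mp.hyp1f1(2 - beta, 3 - beta, -i * z * h)))
    return mp.exp(t1 + t2 + t3)

# with t = 1 and h = 1 this is the characteristic function of Upsilon in Proposition 5
# below, e.g. phi = lambda z: phi_sym_trunc_stable(z, 1.0, 1.0, 1.5, 1.0),
# and can be passed to the F_{delta,K} sketch of Sect. 2.1
```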
Next we show that Theorem 3 is applicable for increments of truncated stable
processes. This implies that the distribution function of the increment can be effi-
ciently approximated with the techniques of the previous section.
Proposition 3 Let h, c_+, c_− ≥ 0 and β ∈ (1, 2) and let F be the distribution function
with characteristic function

$$\phi(z) = \exp\Bigl( \int_{\mathbb{R}\setminus\{0\}} (e^{izx} - izx - 1)\, \nu(dx) \Bigr).$$

Then the assumptions of Theorem 3 are satisfied for arbitrary d = d_+ = d_− > 0 and
one has

$$|\phi(z)| \le \exp\Bigl( -(c_+ + c_-)\,\Gamma(-\beta)\sin\Bigl(\frac{\pi(\beta-1)}{2}\Bigr) |z|^\beta + 2\,\frac{c_+ + c_-}{\beta}\, h^{-\beta} \Bigr)$$

for z ∈ R. Here Γ(−β) denotes the Γ-function evaluated at −β. Furthermore,

$$\phi_\pm \le \kappa_1\, e^{-\frac{\kappa_2}{2^\beta}|d|^\beta}\, 2\Bigl( 1 + \frac{2^\beta}{\kappa_2} \Bigr),$$

where

$$\kappa_1 := \exp\Bigl( \frac{1}{2} e^{hd} d^2\, \frac{c_+ + c_-}{2-\beta}\, h^{2-\beta} + 2\, \frac{c_+ + c_-}{\beta}\, h^{-\beta} e^{-hd} \Bigr) \quad\text{and}\quad \kappa_2 := \kappa e^{-dh} > 0,$$

with κ := (c_+ + c_-)Γ(−β) sin(π(β−1)/2) the constant appearing in the first estimate.

Proof Fix d > 0 and take z = u + iy with u ∈ ℝ and y ∈ [−d, d]. Using that

$$e^{i(u+iy)x} - 1 - i(u+iy)x = e^{iux - yx} - 1 - iux + yx = e^{-yx}(e^{iux} - 1 - iux) + e^{-yx} - 1 + yx + iux(e^{-yx} - 1)$$

we write φ(z) as the product

$$\phi(z) = \exp\Bigl( \int e^{-yx}(e^{iux} - 1 - iux)\, \nu(dx) \Bigr) \exp\Bigl( \int (e^{-yx} - 1 + yx)\, \nu(dx) \Bigr) \exp\Bigl( \int iux(e^{-yx} - 1)\, \nu(dx) \Bigr) =: \phi_1(z)\, \phi_2(z)\, \phi_3(z).$$

We will analyse φ₁, φ₂ and φ₃ separately.
Since y ∈ ℝ, the integral ∫ ux(e^{−yx} − 1) ν(dx) is real and hence |φ₃(z)| = 1. To
estimate φ₂ we note that as a consequence of the Taylor approximation one has

$$|e^\xi - 1 - \xi| \le \tfrac{1}{2} e^{|\xi|} \xi^2, \quad \text{for } \xi \in \mathbb{R}.$$

Together with |y| ≤ d and |x| ≤ h we get that

$$|\phi_2(z)| \le \exp\Bigl( \tfrac{1}{2} e^{hd} d^2 \int x^2\, \nu(dx) \Bigr) = \exp\Bigl( \tfrac{1}{2} e^{hd} d^2\, \frac{c_+ + c_-}{2-\beta}\, h^{2-\beta} \Bigr).$$

Finally we estimate φ₁(z). Note that Re(e^{iux} − 1 − iux) ≤ 0 and e^{−yx} ≥ e^{−dh} if
|x| ≤ h. Hence,

$$\int \operatorname{Re}\bigl( e^{-yx}(e^{iux} - 1 - iux) \bigr)\, \nu(dx) \le e^{-dh} \int \operatorname{Re}(e^{iux} - 1 - iux)\, \nu(dx)$$

so that

$$|\phi_1(z)| \le \exp\Bigl( e^{-dh} \int \operatorname{Re}(e^{iux} - 1 - iux)\, \nu(dx) \Bigr).$$

In terms of the measure ν* with ν*(dx) = ((c_+ + c_-)/2) 1_{[−h,h]}(x) |x|^{−(1+β)} dx we have

$$\int \operatorname{Re}(e^{iux} - 1 - iux)\, \nu(dx) = \int (e^{iux} - 1 - iux)\, \nu^*(dx)$$
$$= \underbrace{\int (e^{iux} - 1 - iux)\, \frac{c_+ + c_-}{2}\, |x|^{-(1+\beta)}\, dx}_{= -\kappa|u|^\beta \ \text{(symm. stable)}} - \int_{[-h,h]^c} (e^{iux} - 1 - iux)\, \frac{c_+ + c_-}{2}\, |x|^{-(1+\beta)}\, dx$$
$$\le -\kappa |u|^\beta + 2\, \frac{c_+ + c_-}{\beta}\, h^{-\beta},$$

where κ := (c_+ + c_-)Γ(−β) sin(π(β−1)/2) > 0. Combining the estimates yields, for
z = u + iy with u ∈ ℝ, y ∈ [−d, d], the estimate

$$|\phi(z)| \le \kappa_1\, e^{-\kappa_2 |u|^\beta} \tag{14}$$

where

$$\kappa_1 := \exp\Bigl( \frac{1}{2} e^{hd} d^2\, \frac{c_+ + c_-}{2-\beta}\, h^{2-\beta} + 2\, \frac{c_+ + c_-}{\beta}\, h^{-\beta} e^{-hd} \Bigr) \quad\text{and}\quad \kappa_2 := \kappa e^{-dh} > 0.$$

Equation (14) implies that all assumptions of Theorem 3 are satisfied. If additionally
the imaginary part y of z is zero, then φ2 (z) = φ3 (z) = 1 and using the estimate for
φ₁ gives that

$$|\phi(z)| \le \exp\Bigl( -\kappa |z|^\beta + 2\, \frac{c_+ + c_-}{\beta}\, h^{-\beta} \Bigr).$$

It remains to estimate φ_±. One has

$$\int_{\mathbb{R}} |\phi(u + i(d-\varepsilon))|\, du \le \kappa_1 \int_{\mathbb{R}} e^{-\kappa_2 (u^2 + |d-\varepsilon|^2)^{\beta/2}}\, du \le \kappa_1 \int_{\mathbb{R}} e^{-\kappa_2 \bigl(\frac{|u|+|d-\varepsilon|}{2}\bigr)^\beta}\, du$$
$$\le \kappa_1 \int_{\mathbb{R}} e^{-\frac{\kappa_2}{2^\beta}(|u|^\beta + |d-\varepsilon|^\beta)}\, du \le \kappa_1\, e^{-\frac{\kappa_2}{2^\beta}|d-\varepsilon|^\beta} \Bigl( \int_{B(0,1)} e^{-\frac{\kappa_2}{2^\beta}|u|^\beta}\, du + \int_{B(0,1)^c} e^{-\frac{\kappa_2}{2^\beta}|u|}\, du \Bigr)$$
$$\le \kappa_1\, e^{-\frac{\kappa_2}{2^\beta}|d-\varepsilon|^\beta} \Bigl( 2 + \frac{2^{\beta+1}}{\kappa_2} \Bigr),$$

and letting ε ↓ 0 we get that indeed φ+ satisfies the inequality of the statement. A
similar computation shows that also φ− satisfies the same inequality. 

3.2 Multilevel Monte Carlo for Truncated Stable Processes

In this section we introduce a particularly simple multilevel scheme for truncated
stable processes. We suppose that (Y_t) is an L²-integrable Lévy process with triplet
(b, σ², ν), where ν is of the form

$$\nu(dx) = \nu^{c_+,c_-}_{H,\beta}(dx) = \frac{c_+ \mathbb{1}_{(0,H]}(x) + c_- \mathbb{1}_{[-H,0)}(x)}{|x|^{1+\beta}}\, dx \tag{15}$$
with c+ , c− ≥ 0, β ∈ (1, 2) and H ≥ 1 (in order to simplify notation we assume that
H ≥ 1, although we could equally well allow any H > 0).
We choose the hierarchical scheme of approximations as follows. We fix M ∈
ℕ\{1} and a parameter M′ ∈ (M^{β/2}, M) and let
1. ε_k = M^{−k} T,
2. ε′_k ∈ ε_k ℕ ∩ (0, 1] with ε′_k ∈ ε′_{k+1} ℕ and ε′_k ≈ M′^{−k} as k → ∞,
3. h_k = ε′_k^{1/β}.
In general, we write for two functions g and h, g ≈ h, if 0 < lim inf g/h ≤ lim sup g/h < ∞.
For instance we may define (ε′_k) iteratively via ε′_1 = T and

$$\varepsilon'_{k+1} = \min\Bigl( \{\varepsilon'_k/m : m \in \mathbb{N}\} \cap \varepsilon_{k+1}\mathbb{N} \cap [M'^{-(k+1)} T, \infty) \Bigr) \tag{16}$$

for k ∈ ℕ.
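A sketch of this parameter ladder. It encodes our reading of (16), namely that ε′_{k+1} is the smallest element of {ε′_k/m : m ∈ ℕ} that is a multiple of ε_{k+1} and at least M′^{−(k+1)}T; Mp plays the role of M′, and the float tolerance is an implementation artefact.

```python
def parameter_ladder(T, M, Mp, beta, levels):
    """Level parameters (eps_k, eps'_k, h_k) for k = 1..levels; Mp plays the role of M'."""
    eps = [M ** -k * T for k in range(1, levels + 1)]    # eps[k-1] = eps_k
    eps_p = [T]                                          # eps'_1 = T
    for k in range(1, levels):
        target = Mp ** -(k + 1) * T
        cand, m = eps_p[-1], 1                           # m = 1 is always admissible
        while eps_p[-1] / (m + 1) >= target:
            m += 1
            q = eps_p[-1] / m
            if abs(q / eps[k] - round(q / eps[k])) < 1e-9:   # q lies on the grid eps_{k+1}*N
                cand = q
        eps_p.append(cand)
    h = [ep ** (1.0 / beta) for ep in eps_p]             # h_k = (eps'_k)**(1/beta)
    return eps, eps_p, h
```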

Proposition 4 If the parameters ((h_k, ε_k, ε′_k) : k ∈ ℕ) are chosen as above, then
properties (ML1)–(ML4) are satisfied.

Proof One has for k → ∞

$$\nu(B(0, h_k)^c)\, \varepsilon_k \le \frac{c_+ + c_-}{\beta}\, \varepsilon_k\, h_k^{-\beta} \approx \Bigl( \frac{M'}{M} \Bigr)^k \to 0$$

since M′ < M, and

$$\frac{h_k^2 \log^2(1 + 1/\varepsilon_k)}{\varepsilon_k} \approx \frac{M'^{-2k/\beta}}{M^{-k}}\, \log^2(1 + 1/\varepsilon_k) = \Bigl( \frac{M}{M'^{2/\beta}} \Bigr)^k \log^2(1 + 1/\varepsilon_k) \to 0,$$

since M′^{2/β} > M. Hence, (ML2) and (ML4) are satisfied. Property (ML3) follows
analogously:

$$\frac{\varepsilon'_k}{\varepsilon_k} \int_{B(0,h_k)} x^2\, \nu(dx)\, \log^2\Bigl(1 + \frac{1}{\varepsilon'_k}\Bigr) = \frac{1}{\varepsilon_k}\, \frac{c_+ + c_-}{2-\beta}\, h_k^{2-\beta}\, \varepsilon'_k\, \log^2\Bigl(1 + \frac{1}{\varepsilon'_k}\Bigr) \approx \frac{M'^{-2k/\beta}}{M^{-k}}\, \log^2\Bigl(1 + \frac{1}{\varepsilon'_k}\Bigr) \to 0,$$

and the proof is finished. □

As we show in the next proposition, the fact that h_k/ε′_k^{1/β} is constant in k allows
us to do the sampling of the process constituted by the small increments with the
help of only one additional distribution for which we have to do a precomputation
in advance.
Proposition 5 Suppose that Υ is a real random variable with

$$\mathbb{E}[e^{iz\Upsilon}] = \exp\Bigl( \int (e^{izx} - 1 - izx)\, \frac{c_+ \mathbb{1}_{(0,1]}(x) + c_- \mathbb{1}_{[-1,0)}(x)}{|x|^{1+\beta}}\, dx \Bigr).$$

For every k ∈ ℕ the increments of (Ȳ^{h_k}_t)_{t≥0} over intervals of length ε′_k are independent
and distributed as h_k Υ.

Proof We note that the increments of (Ȳ^1_t) over intervals of length one are equally
distributed as Υ. Furthermore, using that h_k = ε′_k^{1/β} we get with Proposition 1 that
the processes (h_k Ȳ^1_{t/ε′_k})_{t≥0} and (Ȳ^{h_k}_t)_{t≥0} are equally distributed. Hence, the increments
of Ȳ^{h_k} over intervals of length ε′_k are distributed as h_k Υ. □
Next we describe how we implement one joint simulation of two consecutive
levels (X̄^k_t) and (X̄^{k+1}_t). We will assume that we can sample from the distribution
of Υ with Υ as in the previous proposition. In practice we use the approximate
sampling algorithm introduced before.
One joint simulation of two levels: First we discuss how we simulate the fine
level (X̄^{k+1}_t). Once we know the values of

1. (Y^{h_{k+1}}_t) on the random set of times

$$(\varepsilon_{k+1}\mathbb{Z} \cap [0, T]) \cup \{s \in (0, T] : |\Delta Y_s| \ge h_{k+1}\} = \{T_0, T_1, \dots\}$$

2. (Ȳ^{h_{k+1}}_t) on the set of times ε′_{k+1}ℤ ∩ [0, T]

we can compute (X̄^{k+1}_t) via the Euler update rule (5). The increments of the process
in item 2 are independent and distributed as h_{k+1}Υ so that the simulation is straightforward.
To simulate the process in item 1 we first simulate the random set of discontinuities

$$\{(s, \Delta Y_s) : s \in (0, T], |\Delta Y_s| \ge h_{k+1}\} =: \{(S_1, D_1), (S_2, D_2), \dots\}.$$

Here the points are ordered in such a way that S_1, S_2, . . . is increasing. These points
constitute a Poisson point process with intensity ℓ_{(0,T]} ⊗ ν|_{B(0,h_{k+1})^c}. When considering
an infinite time horizon the random variables ((S_i − S_{i−1}, D_i) : i ∈ ℕ) (with
S_0 = 0) are independent and identically distributed with both components being
independent of each other, the first being exponentially distributed with parameter
ν({x : |x| ≥ h_{k+1}}) and the second having distribution

$$\frac{\mathbb{1}_{\{|x| \ge h_{k+1}\}}}{\nu(\{|x| \ge h_{k+1}\})}\, \nu(dx).$$

Hence, the sampling of the large discontinuities is achieved by drawing iid
samples of the previous distribution, adding the time increments and stopping once
one exceeds T. Once the discontinuities have been sampled we build the set of update
times via

       {T_0, T_1, . . . } = (ε'_{k+1} ℤ ∩ [0, T]) ∪ {S_1, S_2, . . . }

and simulate standard Brownian motion (B_t) on this set of times (using that the
increments are independent and conditionally N(0, T_k − T_{k−1})-distributed). Then

       Y_{T_k}^{h_{k+1}} = σ B_{T_k} + Σ_{i: S_i ≤ T_k} D_i + T_k ( b − ∫_{|x| ≥ h_{k+1}} x ν(dx) )

for the times T_0, T_1, . . . .
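The following Python sketch illustrates this fine-level construction. The helper functions nu_tail, sample_jump and comp (tail mass, normalized large-jump sampler and compensator integral of ν) are placeholder names that the user must supply; this is an illustration of the simulation steps just described, not the authors' implementation.

```python
# Sketch: simulate the large jumps of Y and evaluate Y^{h} on the update times.
import numpy as np

def fine_level_path(T, eps_prime, h, sigma, b, nu_tail, sample_jump, comp, rng):
    # 1) jump times S_i (Poisson process with rate nu_tail(h)) and sizes D_i
    S, D, t = [], [], 0.0
    rate = nu_tail(h)
    while True:
        t += rng.exponential(1.0 / rate)
        if t > T:
            break
        S.append(t)
        D.append(sample_jump(h))
    # 2) update times: the eps'-grid joined with the jump times
    grid = np.arange(0.0, T + 1e-12, eps_prime)
    times = np.unique(np.concatenate([grid, np.array(S)]))
    # 3) Brownian motion on the update times
    dB = rng.normal(0.0, np.sqrt(np.diff(times, prepend=0.0)))
    B = np.cumsum(dB)
    # 4) Y^{h}_{T_k} = sigma*B_{T_k} + jumps up to T_k + T_k*(b - comp(h))
    jumps = np.array([sum(d for s, d in zip(S, D) if s <= tk) for tk in times])
    Y = sigma * B + jumps + times * (b - comp(h))
    return times, Y
```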


To generate the coarse level (X̄_t^k) we do not need further random samples. It only
depends on the values of
1. (Y_t^{h_k}) on the random set of times

       (ε'_k ℤ ∩ [0, T]) ∪ {s ∈ (0, T] : |ΔY_s| ≥ h_k} = {T'_0, T'_1, . . . },

2. (Ȳ_t^{h_k}) on the set of times ε'_k ℤ ∩ [0, T].
Note that since ε'_k ∈ ε'_{k+1} ℕ we have ε'_k ℕ ⊂ ε'_{k+1} ℕ so that the update times for the
coarse level are also update times for the fine level. We use that

       Ȳ_t^{h_k} = Ȳ_t^{h_{k+1}} + Σ_{i: S_i ≤ t, |D_i| ∈ (h_{k+1}, h_k]} D_i − t ∫_{|x| ∈ (h_{k+1}, h_k]} x ν(dx)

to generate (Ȳ_t^{h_k}) on ε'_k ℤ ∩ [0, T]. To generate the former process we note that since
ε'_k ∈ ε'_{k+1} ℕ the set {T'_0, T'_1, . . . } is a subset of {T_0, T_1, . . . } so that we can use that

       Y_{T'_k}^{h_k} = σ B_{T'_k} + Σ_{i: S_i ≤ T'_k, |D_i| ≥ h_k} D_i + T'_k ( b − ∫_{|x| ≥ h_k} x ν(dx) ).

We stress that all integrals can be made explicit due to the particular choice of ν.
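Assuming ν has the truncated stable form suggested by the constants c_+, c_−, H and β used in this section (density (c_+ 1_{(0,H]} + c_− 1_{[−H,0)})(x)/|x|^{1+β}), these integrals reduce to elementary power functions; a small sketch with illustrative default parameters follows.

```python
# Sketch: closed-form tail mass and compensator for the assumed truncated
# stable Lévy measure nu(dx) = (c_plus*1_(0,H] + c_minus*1_[-H,0))(x) |x|^{-1-beta} dx.
def nu_tail(h, c_plus=1.0, c_minus=1.0, H=10.0, beta=1.2):
    # nu({|x| >= h}) for 0 < h <= H
    return (c_plus + c_minus) * (h**(-beta) - H**(-beta)) / beta

def comp(h, c_plus=1.0, c_minus=1.0, H=10.0, beta=1.2):
    # int_{|x| >= h} x nu(dx), for 0 < h <= H and beta != 1
    return (c_plus - c_minus) * (H**(1.0 - beta) - h**(1.0 - beta)) / (1.0 - beta)

def comp_band(h_lo, h_hi, c_plus=1.0, c_minus=1.0, H=10.0, beta=1.2):
    # int_{|x| in (h_lo, h_hi]} x nu(dx), used when coarsening from h_{k+1} to h_k
    return comp(h_lo, c_plus, c_minus, H, beta) - comp(h_hi, c_plus, c_minus, H, beta)
```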

3.3 Numerical Tests

In this section we do numerical tests for SDEs driven by truncated Lévy processes.
• In Sect. 3.3.1 we analyse the error of the approximate direct simulation algorithm
used. Here we discuss when the error is of the order of real machine precision.

• In Sect. 3.3.2 we optimise over the multiplier M appearing in the multilevel scheme.
The answer depends on a parameter that depends in a subtle way on the underlying
problem and the implementation. We conduct tests for a volatility model.
• In Sect. 3.3.3 we numerically analyse the error and runtime of the multilevel
schemes for the volatility model introduced there.

3.3.1 Error Analysis of the Sampling Algorithm

As error criterion on the space of real probability distributions we use the
L_p-Wasserstein metric. For two real distributions ξ and ζ we call a distribution
ρ on the Borel sets of ℝ² a coupling of ξ and ζ, if the first, resp. second, marginal
distribution of ρ is ξ, resp. ζ. For p ≥ 1 and two probability measures we denote
by W_p the Wasserstein metric on the space of real distributions defined by

       W_p(ξ, ζ) = inf{ ( ∫ |x − y|^p dρ(x, y) )^{1/p} : ρ is a coupling of ξ and ζ }
for two distributions ξ and ζ . For further details concerning the Wasserstein met-
ric we refer the reader to [13, 29]. For real distributions optimal couplings can
be given in terms of quantiles which leads to an alternative representation for the
Wasserstein metric that is particularly suited for explicit computations. We denote
by Fξ← : (0, 1) → R the generalised right continuous inverse of the cdf Fξ of ξ , that
is
Fξ← (u) = inf{t ∈ R : Fξ (t) ≥ u},

and we use analogous notation for the measure ξ replaced by ζ. One has

       W_p(ξ, ζ) = ( ∫_0^1 |F_ξ^←(u) − F_ζ^←(u)|^p du )^{1/p}.                            (17)
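In one dimension the optimal coupling pairs equal quantiles, so (17) can be estimated by sorting equally sized samples from the two schemes (or, if both schemes are inverse-CDF samplers, by evaluating them at common uniforms). A minimal Monte Carlo sketch, with the two samplers as placeholder callables, could read:

```python
# Sketch: Monte Carlo estimate of W_p between two real samplers via (17),
# using the empirical quantile coupling (sort both samples).
import numpy as np

def wasserstein_p(sampler_a, sampler_b, p=2, n=10**6, rng=None):
    rng = rng or np.random.default_rng(0)
    xa = np.sort(sampler_a(n, rng))   # empirical quantiles of the first scheme
    xb = np.sort(sampler_b(n, rng))   # empirical quantiles of the second scheme
    return (np.mean(np.abs(xa - xb) ** p)) ** (1.0 / p)
```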

We do a numerical test for fixed c+ = c− = H = 1 and β ∈ {1.2, 1.5, 1.8}. Our


sampling algorithm makes use of the following parameters
• δ: window width used in approximation (10)
• K : 2K + 1 summands used in approximation (10)
• xmin and xmax : the minimal and maximal point for which we precompute the dis-
tribution function, see Sect. 2.2
• N = 2d+1 − 1: the number of intervals used for the interpolation of the distribution
function, see Sect. 2.2.
To assess the quality of an approximation we numerically compute the Wasser-
stein metric between the sampling scheme with the given parameters and the one
with significantly refined parameters, namely δ/2, 4K, x_min − 2, x_max + 2 and d + 2.
The Wasserstein metric between the two sampling distributions is estimated by doing
a Monte Carlo simulation of the Wasserstein distance (17) for the second moment
Table 1  Dependence of the W_2-Wasserstein metric on the choice of d, computed with
double precision arithmetic

            β = 1.2                                   β = 1.5                                   β = 1.8
d = 11      8.1505 × 10^{-14} ± 5.2284 × 10^{-14}     1.2348 × 10^{-14} ± 4.9172 × 10^{-15}     3.7942 × 10^{-15} ± 1.5269 × 10^{-15}
d = 9       1.6056 × 10^{-12} ± 1.2793 × 10^{-12}     4.1492 × 10^{-14} ± 2.1661 × 10^{-14}     5.2974 × 10^{-14} ± 1.5046 × 10^{-14}
d = 7       4.7843 × 10^{-12} ± 2.9185 × 10^{-13}     9.4183 × 10^{-12} ± 3.8266 × 10^{-12}     1.0557 × 10^{-11} ± 4.0751 × 10^{-13}
d = 5       2.0309 × 10^{-8}  ± 7.7718 × 10^{-11}     2.2023 × 10^{-8}  ± 9.2660 × 10^{-11}     4.7766 × 10^{-8}  ± 2.2151 × 10^{-10}

with 10^6 iterations. Preliminary numerical tests showed that for the following parameters
the approximate distribution function has an error of about the machine precision
for reals on the supporting points of the distribution function: K = 400, δ = 0.02
and

       −x_min = x_max = 11 if β = 1.2,  13 if β = 1.5,  20 if β = 1.8.

Since these parameters only affect the precomputation we choose them as above and
only vary d (and N) in the following test that is depicted in Table 1. There the term
following ± is twice the estimated standard deviation. The results show that one
achieves machine precision for about d = 9.

3.3.2 Optimising the Multiplier M

When generating a pair of levels (X^k, X^{k+1}) we need to carry out an expected number
of T/ε_{k+1} + T ν(B(0, h_{k+1})^c) Euler steps for the simulation of X^{k+1} and an expected
number of T/ε_k + T ν(B(0, h_k)^c) Euler steps for the simulation of X^k. By assumption
ν(B(0, h_{k+1})^c) = o(ε_k^{-1}) is asymptotically negligible and hence it is natural to assign
one simulation of G(X^{k+1}) − G(X^k) the cost

       T/ε_{k+1} + T/ε_k = (M + 1) T/ε_k.



A corresponding minimisation of the parameter M is carried out in [14, Sect. 4.1] for
diffusions. The number of Euler steps is only an approximation to the real runtime
caused by the algorithm. In general, the dominating cost is caused by computations
being of order ε_k^{-1} and ε_{k+1}^{-1}, and we make the Ansatz that the computational cost
for one simulation of G(X^{k+1}) − G(X^k) is

       C_k = (1 + o(1)) κ_cost (M + γ)/ε_k,                                               (18)

where κcost and γ are positive constants that do not depend on the choice of M. The
case where one restricts attention to the number of Euler steps is the one with γ = 1.
We note that for the numerical schemes as in Theorem 2 one has for F as in the
latter theorem

       δ^{-1} ( Ŝ_δ(G) − E[G(X)] ) ⇒ N( 0, κ_err² (1 − 1/M) ),

where κ_err does not depend on the choice of M. Taking δ̄ := δ̄(δ) := δ/(κ_err √(1 − 1/M))
we get

       δ^{-1} ( Ŝ_δ̄(G) − E[G(X)] ) ⇒ N(0, 1).

Assigning computational cost (18) for one joint simulation of G(X^{k+1}) − G(X^k) we
end up with a cost for one simulation of Ŝ_δ̄(F) of

       (1 + o(1)) (κ_cost κ_err²/α²) ((M − 1)(M + γ))/(M (log M)²) δ^{-2} (log δ^{-1})².

For γ = 1 this equals the result of [14] for diffusions.


To determine the optimal M we need to estimate the parameter γ . For ε > 0 and
M ∈ N\{1} we denote by Rε,M the expected time to jointly simulate two levels, one
with step size εcoarse = ε and the other one with εfine = εcoarse /M, and to evaluate
G at the respective realisations. We note that for M1 , M2 ∈ {2, 3, . . . } our Ansatz
implies that
       R_{ε,M_1} / R_{ε,M_2} = (1 + o(1)) (M_1 + γ)/(M_2 + γ)

as ε ↓ 0. We estimate the runtimes Rε,M1 and Rε,M2 for two distinct M1 and M2
and for small ε > 0 and we conclude back on the parameter γ by using the latter
equation.
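Ignoring the (1 + o(1)) factor, the last relation can be inverted to read off γ from two measured runtimes; the runtimes in the small example below are made-up numbers for illustration only.

```python
# Sketch: estimate gamma from measured runtimes R1 = R_{eps,M1}, R2 = R_{eps,M2}.
# Solving R1/R2 = (M1 + gamma)/(M2 + gamma) for gamma (the o(1) term is ignored).
def estimate_gamma(R1, R2, M1, M2):
    r = R1 / R2
    return (M1 - r * M2) / (r - 1.0)

# Example with made-up runtimes (seconds) for M1 = 2, M2 = 4 at a small eps:
print(estimate_gamma(R1=3.1, R2=5.9, M1=2, M2=4))
```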
We test our theoretical findings in a four-dimensional problem which formally
does not satisfy some of the assumptions of our theorems. Still we remark that the
results are believed to hold in greater generality and we chose a higher-dimensional
example in order to decrease the relative computational cost of the direct simulation
of Lévy increments. We let (X_t) be a three-dimensional and (σ_t) be a one-dimensional
process solving the SDE

       dX_t = (1/10) Σ(X_t) σ_t dW_t + (1/10) dt,
                                                                                          (19)
       dσ_t = −(1/10) dt + (1/10) dY_t,

where (σ_t) is conceived as random volatility. As starting values we choose X_0 =
(0.8, 0.8, 0.8) and σ_0 = 0.2. Further, (Y_t) is a Lévy process with Lévy triplet (0, 0, ν),
(W_t) an independent three-dimensional standard Brownian motion and

       Σ((x_1, x_2, x_3)) = [ 4x_1    0.1x_1  0.1x_1
                              0.1x_2  3x_2    0.1x_2
                              0.1x_3  0.1x_3  2x_3 ],   for (x_1, x_2, x_3) ∈ ℝ³.

We aim at computing the expectation E[G(X_T)] for G(x) = max(x_1 − 1, x_2 − 1, x_3 − 1, 0),
x ∈ ℝ³.
We estimate γ in the case where ν is as in (15) with H = 10, c+ = c− = 1 and
β = 1.2. In the Fourier-based simulation of the increments we choose as parameters
xmax = 11, d = 11, K = 400, δ = 0.02. In order to verify that the computational
time used for the direct simulation of Lévy increments is indeed of minor relevance
for our choices of the stepsize ε we also estimate γ in the classical setting, where
(Yt ) is replaced by a Brownian motion.
For various choices of ε and pairs of parameters (M1 , M2 ) we estimate γ twice.
The results are depicted in Table 2 for the genuine SDE and in Table 3 for the sim-
plified diffusion model. One notices that γ lies around 0.3. In various other tests we

Table 2  Estimates for γ in the volatility model with adapted Euler scheme

ε^{-1}     M_1 = 2, M_2 = 4        M_1 = 2, M_2 = 8
2^{14}     0.2941   0.3215         0.3094   0.3087
2^{15}     0.3102   0.3039         0.3238   0.3386
2^{16}     0.3401   0.3286         0.3153   0.3217
2^{17}     0.3030   0.3029         0.3002   0.3110
2^{18}     0.3049   0.3169         0.3187   0.3220
2^{19}     0.3053   0.3162         0.3169   0.3169

Table 3  Estimates for γ in the simplified classical diffusion setting

ε^{-1}     M_1 = 2, M_2 = 4        M_1 = 2, M_2 = 8
2^{14}     0.3574   0.3582         0.3581   0.3588
2^{15}     0.3590   0.3576         0.3595   0.3604
2^{16}     0.3478   0.3545         0.3591   0.3656
2^{17}     0.3568   0.3573         0.3481   0.3610
2^{18}     0.3573   0.3562         0.3581   0.3563
2^{19}     0.3594   0.3592         0.3599   0.3600

Fig. 1 Estimates for bias and variance for β = 1.2

Fig. 2 Error versus runtime in the volatility model for β = 1.2



Fig. 3 Error versus runtime in the volatility model for β = 1.5

Fig. 4 Error versus runtime in the volatility model for β = 1.8

noticed that γ varies strongly with the implementation and the choice of the stochastic
differential equation. In most tests we observed γ to be between 0.2 and 0.6.

3.3.3 Numerical Tests of Error and Runtime

In this section we numerically test the error of our multilevel schemes in the volatility
model (19). We adopt the same setting as described in the lines following (19). Further
we choose M = 4 and M' = M^{β/3 + 1/3} in the calibration of the scheme.

Using Monte Carlo we estimate E[G(X k ) − G(X k−1 ))] and Var[G(X k ) −
G(X k−1 )] for k = 3, . . . , 7. The results for β = 1.2 are depicted in Fig. 1. They are
based on 1000 samples. Using interpolation we estimate that E[G(X k ) − G(X k−1 )]
is of order εk0.7812 and we choose α = 0.8 in the implementation of the algorithm.
We depict a log-log-plot of error versus runtime in Fig. 2. For comparison we also
treated the cases β = 1.5 and β = 1.8 similarly. The corresponding plots of error
versus runtime are depicted below, see Figs. 3 and 4.

References

1. Applebaum, D.: Lévy processes and stochastic calculus. Cambridge Studies in Advanced Math-
ematics, vol. 116. Cambridge University Press, Cambridge (2009)
2. Asmussen, S., Rosiński, J.: Approximations of small jumps of Lévy processes with a view
towards simulation. J. Appl. Probab. 38(2), 482–493 (2001)
3. Bally, V., Talay, D.: The law of the Euler scheme for stochastic differential equations. I. Con-
vergence rate of the distribution function. Probab. Theory Relat. Fields 104(1), 43–60 (1996)
4. Becker, M.: Exact simulation of final, minimal and maximal values of Brownian motion and
jump-diffusions with applications to option pricing. Comput. Manag. Sci. 7(1), 1–17 (2010)
5. Ben Alaya, M., Kebaier, A.: Central limit theorem for the multilevel Monte Carlo Euler method.
Ann. Appl. Probab. 25(1), 211–234 (2015)
6. Bertoin, J.: Lévy Processes. Cambridge University Press, Cambridge (1996)
7. Bruti-Liberati, N., Nikitopoulos-Sklibosios, C., Platen, E.: First order strong approximations
of jump diffusions. Monte Carlo Methods Appl. 12(3–4), 191–209 (2006)
8. Chen, Z.S., Feng, L.M., Lin, X.: Simulating Lévy processes from their characteristic functions
and financial applications. ACM Trans. Model. Comput. Simul. 22(3), 14 (2012)
9. Dereich, S.: The coding complexity of diffusion processes under supremum norm distortion.
Stoch. Process. Appl. 118(6), 917–937 (2008)
10. Dereich, S.: Multilevel Monte Carlo algorithms for Lévy-driven SDEs with Gaussian correc-
tion. Ann. Appl. Probab. 21(1), 283–311 (2011)
11. Dereich, S., Heidenreich, F.: A multilevel Monte Carlo algorithm for Lévy-driven stochastic
differential equations. Stoch. Process. Appl. 121(7), 1565–1587 (2011)
12. Dereich, S., Li, S.: Multilevel Monte Carlo for Lévy-driven SDEs: central limit theorems for
adaptive Euler schemes. Ann. Appl. Probab. 26(1), 136–185 (2016)
13. Dobrushin, R.L.: Prescribing a system of random variables by conditional distributions. Theory
Probab. Appl. 15(3), 458–486 (1970)
14. Giles, M.B.: Multilevel Monte Carlo path simulation. Oper. Res. 56(3), 607–617 (2008)
15. Glasserman, P.: Monte Carlo methods in financial engineering. Applications of Mathematics
(New York). Stochastic Modelling and Applied Probability, vol. 53. Springer, New York (2004)
16. Gradshteyn, I.S., Ryzhik, I.M.: Table of Integrals, Series, and Products. Academic, New York
(1980)
17. Heinrich, S.: Multilevel Monte Carlo methods. Lect. Notes Comput. Sci. 2179, 58–67 (2001)
18. Jacod, J., Kurtz, T.G., Méléard, S., Protter, P.: The approximate Euler method for Lévy driven
stochastic differential equations. Ann. Inst. H. Poincaré Probab. Statist. 41(3), 523–558 (2005).
doi:10.1016/j.anihpb.2004.01.007
19. Kloeden, P.E., Platen, E.: Numerical Solution of Stochastic Differential Equations, Applications
of Mathematics (New York), vol. 23. Springer, Berlin (1992)
20. Kohatsu-Higa, A., Tankov, P.: Jump-adapted discretization schemes for Lévy-driven SDEs.
Stoch. Process. Appl. 120(11), 2258–2285 (2010)
21. Li, S.: Multilevel Monte Carlo simulation for stochastic differential equations driven by Lévy
processes. Ph.D. dissertation, Westfälische Wilhelms-Universität (2015)

22. Maghsoodi, Y.: Mean square efficient numerical solution of jump-diffusion stochastic differ-
ential equations. Sankhyā Ser. A 58(1), 25–47 (1996)
23. Menn, C., Rachev, S.T.: Smoothly truncated stable distributions, GARCH-models, and option
pricing. Math. Methods Oper. Res. 69(3), 411–438 (2009)
24. Mordecki, E., Szepessy, A., Tempone, R., Zouraris, G.E.: Adaptive weak approximation of
diffusions with jumps. SIAM J. Numer. Anal. 46(4), 1732–1768 (2008)
25. Platen, E.: An approximation method for a class of Itô processes with jump component. Litovsk.
Mat. Sb. 22(2), 124–136 (1982)
26. Quek, T., De La Roche, G., Güvenç, I., Kountouris, M.: Small Cell Networks: Deployment,
PHY Techniques, and Resource Management. Cambridge University Press, Cambridge (2013)
27. Rubenthaler, S.: Numerical simulation of the solution of a stochastic differential equation driven
by a Lévy process. Stoch. Process. Appl. 103(2), 311–349 (2003)
28. Sato, K.: Lévy processes and infinitely divisible distributions. Cambridge Studies in Advanced
Mathematics, vol. 68. Cambridge University Press, Cambridge (1999)
29. Vasershtein, L.N.: Markov processes over denumerable products of spaces describing large
system of automata. Problemy Peredači Informacii 5(3), 64–72 (1969)
Construction of a Mean Square Error
Adaptive Euler–Maruyama Method
With Applications in Multilevel Monte Carlo

Håkon Hoel, Juho Häppölä and Raúl Tempone

Abstract A formal mean square error expansion (MSE) is derived for Euler–
Maruyama numerical solutions of stochastic differential equations (SDE). The error
expansion is used to construct a pathwise, a posteriori, adaptive time-stepping Euler–
Maruyama algorithm for numerical solutions of SDE, and the resulting algorithm is
incorporated into a multilevel Monte Carlo (MLMC) algorithm for weak approxima-
tions of SDE. This gives an efficient MSE adaptive MLMC algorithm for handling
a number of low-regularity approximation problems. In low-regularity numerical
example problems, the developed adaptive MLMC algorithm is shown to outperform
the uniform time-stepping MLMC algorithm by orders of magnitude, producing output
whose error with high probability is bounded by TOL > 0 at the near-optimal
MLMC cost rate O(TOL^{-2} log(TOL)^4) that is achieved when the cost of sample
generation is O(1).

Keywords Multilevel Monte Carlo · Stochastic differential equations · Euler–
Maruyama method · Adaptive methods · A posteriori error estimation · Adjoints

1 Introduction

SDE models are frequently applied in mathematical finance [12, 28, 29], where an
observable may, for example, represent the payoff of an option. SDE are also used
to model the dynamics of multiscale physical, chemical or biochemical systems

H. Hoel (B)
Department of Mathematics, University of Oslo, P.O. Box 1053,
0316 Blindern, Oslo, Norway
e-mail: haakonah@math.uio.no
H. Hoel · J. Häppölä · R. Tempone
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division,
King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
e-mail: juho.happola@kaust.edu.sa
R. Tempone
e-mail: raul.tempone@kaust.edu.sa


[11, 25, 30, 32], where, for instance, concentrations, temperature and energy may
be sought observables.
Given a filtered, complete probability space (Ω, F , (Ft )0≤t≤T , P), we consider
the Itô SDE

       dX_t = a(t, X_t) dt + b(t, X_t) dW_t,   t ∈ (0, T],
                                                                                          (1)
       X_0 = x_0,

where X : [0, T ] × Ω → Rd1 is a stochastic process with randomness generated by a


d2 -dimensional Wiener process, W : [0, T ] × Ω → Rd2 , with independent compo-
nents, W = (W (1) , W (2) , . . . , W (d2 ) ), and a : [0, T ] × Rd1 → Rd1 and b : [0, T ] ×
Rd1 → Rd1 ×d2 are the drift and diffusion coefficients, respectively. The initial con-
dition x0 is a random variable on (Ω, P, F ) independent of W . The considered
filtration Ft is generated from the history of the Wiener process W up to time t and
the possible outcomes of the initial data X0 , and succeedingly completed with all
P-outer measure zero sets of the sample space Ω. That is

Ft := σ ({Ws }0≤s≤t ) ∨ σ (X0 )

where the operation A ∨ B denotes the σ -algebra generated by the pair of σ -algebras
A and B, i.e., A ∨ B := σ (A , B), and A denotes the P-outer measure null-set
completion of A,

       Ā := A ∨ { A ⊂ Ω : inf_{Â ∈ {Ǎ ∈ A : Ǎ ⊃ A}} P(Â) = 0 }.

The contributions of this work are twofold. First, an a posteriori adaptive time-
stepping algorithm for computing numerical realizations of SDEs using the Euler–
Maruyama method is developed. And second, for a given observable g : Rd1 → R,
we construct a mean square error (MSE) adaptive time-stepping multilevel

Monte
Carlo (MLMC) algorithm for approximating the expected value, E g(XT ) , under
the following constraint:

 
P E g(XT) − A  ≤ TOL ≥ 1 − δ. (2)

Here, A denotes the algorithm’s approximation of E g(XT ) (examples of which


are given in Item (A.2) and Eq. (6) and TOL and δ > 0 are accuracy and confidence
constraints, respectively.
The rest of this paper is organized as follows: First, in Sect. 1.1, we review the
Monte Carlo methods and their use with the Euler–Maruyama integrator. This is fol-
lowed by a discussion of multilevel Monte Carlo methods and adaptivity for SDEs. The
theory, framework and numerical examples for the MSE adaptive algorithm are pre-
sented in Sect. 2. In Sect. 3, we develop the framework for the MSE adaptive MLMC
algorithm and present implementational details in algorithms with pseudocode. In

Sect. 4, we compare the performance of the MSE adaptive and uniform MLMC
algorithms in a couple of numerical examples, one of which is a low-regularity SDE
problem. Finally, we present brief conclusions followed by technical proofs and the
extension of the main result to higher-dimensional problems in the appendices.

1.1 Monte Carlo Methods and the Euler–Maruyama Scheme

Monte Carlo (MC) methods provide a robust and typically non-intrusive way to
compute weak approximations of SDE. The convergence rate of MC methods does
not depend on the dimension of the problem; for that reason, MC is particularly
effective on multi-dimensional problems. In its simplest form, an approximation by
the MC method consists of the following two steps:
(A.1) Make M independent and identically distributed numerical approximations,
      {X̄_{m,T}}_{m=1,2,...,M}, of the solution of the SDE (1).
(A.2) Approximate E[g(X_T)] by a realization of the sample average

       A := Σ_{m=1}^{M} g(X̄_{m,T}) / M.                                                  (3)

As for ordinary differential equations (ODE), the theory of numerical integrators
of different orders for scalar SDE is vast. Provided sufficient regularity, higher order
integrators generally yield higher convergence rates [22]. With MC methods it is
straightforward to determine that the goal (2) is fulfilled at the computational cost
O(TOL^{-2-1/α}), where α ≥ 0 denotes the weak convergence rate of the numerical
method, as defined in Eq. (5).
As a method of temporal discretization, the Euler–Maruyama scheme is given by

       X̄_{t_{n+1}} = X̄_{t_n} + a(t_n, X̄_{t_n}) Δt_n + b(t_n, X̄_{t_n}) ΔW_n,
                                                                                          (4)
       X̄_0 = x_0,

using time steps Δt_n = t_{n+1} − t_n and Wiener increments ΔW_n = W_{t_{n+1}} − W_{t_n} ∼
N(0, Δt_n I_{d_2}), where I_{d_2} denotes the d_2 × d_2 identity matrix. In this work, we will
focus exclusively on Euler–Maruyama time-stepping. The Euler–Maruyama scheme,
which may be considered the SDE-equivalent of the forward-Euler method for ODE,
has, under sufficient regularity, first-order weak convergence rate

       | E[ g(X_T) − g(X̄_T) ] | = O( max_n Δt_n ),                                        (5)

and also first-order MSE convergence rate

       E[ ( g(X_T) − g(X̄_T) )² ] = O( max_n Δt_n ),                                       (6)

cf. [22]. For multi-dimensional SDE problems, higher order schemes are generally
less applicable, as either the diffusion coefficient matrix has to fulfill a rigid commu-
tativity condition, or Lévy areas, required in higher order numerical schemes, have to
be accurately approximated to achieve better convergence rates than those obtained
with the Euler–Maruyama method [22].
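As an illustration of steps (A.1)–(A.2) combined with the scheme (4), a minimal single-level Monte Carlo sketch in Python is given below; the drift, diffusion and observable are placeholder choices, not the examples treated later in the paper.

```python
# Sketch: single-level Monte Carlo with uniform-step Euler-Maruyama for a scalar SDE.
import numpy as np

def euler_maruyama(a, b, x0, T, N, rng):
    dt = T / N
    x = x0
    for n in range(N):
        t = n * dt
        dW = rng.normal(0.0, np.sqrt(dt))
        x = x + a(t, x) * dt + b(t, x) * dW       # scheme (4)
    return x

def mc_estimate(g, a, b, x0, T, N, M, seed=0):
    rng = np.random.default_rng(seed)
    samples = [g(euler_maruyama(a, b, x0, T, N, rng)) for _ in range(M)]
    return np.mean(samples)                        # sample average (3)

# Placeholder problem: geometric Brownian motion, observable g(x) = x.
print(mc_estimate(g=lambda x: x, a=lambda t, x: x, b=lambda t, x: x,
                  x0=1.0, T=1.0, N=2**6, M=10**4))
```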

1.2 Uniform and Adaptive Time-Stepping MLMC

MLMC is a class of MC methods that uses a hierarchy of subtly correlated and


increasingly refined realization ensembles to reduce the variance of the sample esti-
mator. In comparison with single-level MC, MLMC may yield orders of magnitude
reductions in the computational cost of moment approximations. MLMC was first
introduced by Heinrich [14, 15] for approximating integrals that depend on random
parameters. For applications in SDE problems, Kebaier [21] introduced a two-level
MC method and demonstrated its potential efficiency gains over single-level MC.
Giles [8] thereafter developed an MLMC algorithm for SDE, exhibiting even higher
potential efficiency gains. Presently, MLMC is a vibrant and growing research topic,
(cf. [3, 4, 9, 10, 13, 26, 34], and references therein).

1.2.1 MLMC Notation

We define the multilevel estimator by

       A_ML := Σ_{ℓ=0}^{L} Σ_{m=1}^{M_ℓ} Δ_ℓ g_m / M_ℓ,                                    (7)

where

       Δ_ℓ g_m := g(X̄_{m,T}^{0})                      if ℓ = 0,
                  g(X̄_{m,T}^{ℓ}) − g(X̄_{m,T}^{ℓ−1})   otherwise.

Here, the positive integer, L, denotes the final level of the estimator, M_ℓ is the number
of sample realizations on the ℓth level, and the realization pair, X̄_{m,T}^{ℓ} and X̄_{m,T}^{ℓ−1},
are approximations of the SDE computed with the Euler–Maruyama method (4) using
the same Wiener path, W_m, sampled on the respective meshes, Δt^{ℓ} and Δt^{ℓ−1}
(cf. Fig. 1).
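The key point is that both members of each pair are driven by the same Wiener path on nested meshes; the sketch below illustrates one such coupled realization of Δ_ℓ g for uniform meshes with refinement factor 2, using the Ornstein–Uhlenbeck problem from the caption of Fig. 1 as a stand-in example.

```python
# Sketch: one coupled realization Delta_l g = g(Xbar^{l}_T) - g(Xbar^{l-1}_T),
# where the coarse path reuses the fine Wiener increments (summed pairwise).
import numpy as np

def coupled_difference(g, a, b, x0, T, N_coarse, rng):
    N_fine = 2 * N_coarse
    dt_f = T / N_fine
    dW_f = rng.normal(0.0, np.sqrt(dt_f), size=N_fine)   # fine Wiener increments
    dW_c = dW_f[0::2] + dW_f[1::2]                        # coarse increments (same path)
    xf, xc = x0, x0
    for n in range(N_fine):
        xf = xf + a(n * dt_f, xf) * dt_f + b(n * dt_f, xf) * dW_f[n]
    dt_c = T / N_coarse
    for n in range(N_coarse):
        xc = xc + a(n * dt_c, xc) * dt_c + b(n * dt_c, xc) * dW_c[n]
    return g(xf) - g(xc)

rng = np.random.default_rng(0)
diffs = [coupled_difference(lambda x: x, lambda t, x: 2 * (1 - x), lambda t, x: 0.2,
                            1.5, 1.0, 2**4, rng) for _ in range(1000)]
print(np.mean(diffs), np.var(diffs))
```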

Fig. 1 (Left) A sample Wiener path, W , generated on the coarse mesh, Δt {0} , with uniform step
size 1/10 (blue line). The path is thereafter Brownian bridge interpolated onto a finer mesh, Δt {1} ,
which has uniform step size of 1/20 (green line). (Right) Euler–Maruyama numerical solutions of the
Ornstein–Uhlenbeck SDE problem, dXt = 2(1 − Xt )dt + 0.2dWt , with initial condition X0 = 3/2,
are computed on the meshes Δt {0} (blue line) and Δt {1} (green line) using Wiener increments from
the respective path resolutions

1.2.2 Uniform Time-Stepping MLMC

In the uniform time-stepping MLMC introduced in [8], the respective SDE realiza-
tions {X̄_T^{ℓ}}_ℓ are constructed on a hierarchy of uniform meshes with geometrically
decaying step size, min Δt^{ℓ} = max Δt^{ℓ} = T/N_ℓ, and N_ℓ = c^ℓ N_0 with c ∈ ℕ\{1}
and N_0 an integer. For simplicity, we consider the uniform time-stepping MLMC
method with c = 2.

1.2.3 Uniform Time-Stepping MLMC Error and Computational Complexity

By construction, the multilevel estimator is telescoping in expectation, i.e.,
E[A_ML] = E[g(X̄_T^{L})]. Using this property, we may conveniently bound the mul-
tilevel approximation error:

       |E[g(X_T)] − A_ML| ≤ |E[g(X_T) − g(X̄_T^{L})]| + |E[g(X̄_T^{L})] − A_ML|
                            =: E_T                      =: E_S.

The approximation goal (2) is then reached by ensuring that the sum of the bias, ET ,
and the statistical error, ES , is bounded from above by TOL, e.g., by the constraints
ET ≤ TOL/2 and ES ≤ TOL/2, (see Sect. 3.2 for more details on the MLMC error
control). For the MSE error goal,

       E[ ( E[g(X_T)] − A_ML )² ] ≤ TOL²,

the following theorem states the optimal computational cost for MLMC:

Theorem 1 (Computational cost of deterministic MLMC; Cliffe et al. [4]) Suppose
there are constants α, β, γ such that α ≥ min(β, γ)/2 and

(i)   |E[ g(X̄_T^{ℓ}) − g(X_T) ]| = O(N_ℓ^{-α}),
(ii)  Var(Δ_ℓ g) = O(N_ℓ^{-β}),
(iii) Cost(Δ_ℓ g) = O(N_ℓ^{γ}).

Then, for any TOL < e^{-1}, there exists an L and a sequence {M_ℓ}_{ℓ=0}^{L} such that

       E[ ( A_ML − E[g(X_T)] )² ] ≤ TOL²,                                                  (8)

and

       Cost(A_ML) = O(TOL^{-2}),                        if β > γ,
                    O(TOL^{-2} log(TOL)²),              if β = γ,
                    O(TOL^{-2 + (β−γ)/α}),              if β < γ.                          (9)

In comparison, the computational cost of achieving the goal (8) with single-level
MC is O(TOL^{-2-γ/α}). Theorem 1 thus shows that for any problem with β > 0,
MLMC will asymptotically be more efficient than single-level MC. Furthermore,
the performance gain of MLMC over MC is particularly apparent in settings where
β ≥ γ . The latter property is linked to the contributions of this work. In low-regularity
SDE problems, e.g., Example 6 below and [1, 35], the uniform time-stepping Euler–
Maruyama results in convergence rates for which β < γ . More sophisticated inte-
grators can preserve rates such that β ≥ γ .
Remark 1 Similar accuracy versus complexity results to Theorem 1, requiring
slightly stronger moment bounds, have also been derived for the approximation
goal (2) in the asymptotic setting when TOL ↓ 0, cf. [5, 16].

1.2.4 MSE A Posteriori Adaptive Time-Stepping

In general, adaptive time-stepping algorithms seek to fulfill one of two equivalent
goals [2]:
(B.1) Provided a computational budget N and a norm ‖·‖, determine the possibly
      non-uniform mesh which minimizes the error ‖g(X_T) − g(X̄_T)‖.
(B.2) Provided an error constraint ‖g(X_T) − g(X̄_T)‖ ≤ TOL, determine the possibly
      non-uniform mesh which achieves the constraint at the minimum computa-
      tional cost.
Evidently, the refinement criterion of an adaptive algorithm depends on the error
one seeks to minimize. In this work, we consider adaptivity goal (B.1) with the error
measured in terms of the MSE. This error measure is suitable for MLMC algorithms

as it often will lead to improved convergence rates, β (since Var(Δ_ℓ g) ≤ E[(Δ_ℓ g)²]),
which by Theorem 1 may reduce the computational cost of MLMC. In Theorem 2,
we derive the following error expansion for the MSE of Euler–Maruyama numerical
solutions of the SDE (1):

       E[ ( g(X_T) − g(X̄_T) )² ] = E[ Σ_{n=0}^{N−1} ρ̄_n Δt_n² + o(Δt_n²) ],              (10)

where the error density, ρ n , is a function of the local error and sensitivities from the
dual solution of the SDE problem, as defined in (24). The error expansion (10) is an
a posteriori error estimate for the MSE, and in our adaptive algorithm, the mesh is
refined by equilibration of the expansion’s error indicators

       r̄_n := ρ̄_n Δt_n²,   for n = 0, 1, . . . , N − 1.                                   (11)

1.2.5 An MSE Adaptive MLMC Algorithm

Using the described MSE adaptive algorithm, we construct an MSE adaptive MLMC
{}
algorithm in Sect. 3. The MLMC algorithm generates SDE realizations, {X T } , on
a hierarchy of pathwise adaptively refined meshes, {Δt {} } . The meshes are nested,
i.e., for all realizations ω ∈ Ω,

Δt {0} (ω) ⊂ Δt {1} (ω) ⊂ . . . Δt {} (ω) ⊂ . . . ,


 
with the constraint that the number of time steps in Δt {} , Δt {} , is bounded by 2N :
 {} 
Δt  < 2N = 2+2 N−1 .

Here, N−1 denotes the pre-initial number of time steps; it is an integer set in advance
of the computations. This corresponds to the hierarchy setup for the uniform time-
stepping MLMC algorithm in Sect. 1.2.2.
The potential efficiency gain of adaptive MLMC is experimentally illustrated in
this work using the drift blow-up problem

       dX_t = ( r X_t / |t − ξ|^p ) dt + σ X_t dW_t,   X_0 = 1.

This problem is addressed in Example 6 for the three different singularity exponents
p = 1/2, 2/3 and 3/4, with a pathwise, random singularity point ξ ∼ U(1/4, 3/4),
an observable g(x) = x, and a final time T = 1. For the given singularity expo-
nents, we observe experimental deteriorating convergence rates, α = (1 − p) and
β = 2(1 − p), for the uniform time-stepping Euler–Maruyama integrator, while for

Table 1  Observed computational cost—disregarding log(TOL) multiplicative factors of finite
order—for the drift blow-up study in Example 6

Singularity exponent p     Observed computational cost
                           Adaptive MLMC     Uniform MLMC
1/2                        TOL^{-2}          TOL^{-2}
2/3                        TOL^{-2}          TOL^{-3}
3/4                        TOL^{-2}          TOL^{-4}

the adaptive time-step Euler–Maruyama we observe α ≈ 1 and β ≈ 1. Then, as


predicted by Theorem 1, we also observe an order of magnitude difference in com-
putational cost between the two algorithms (cf. Table 1).

1.2.6 Earlier Works on Adaptivity for SDE

Gaines’ and Lyons’ work [7] is one of the seminal contributions on adaptive algo-
rithms for SDE. They present an algorithm that seeks to minimize the pathwise error
of the mean and variation of the local error conditioned on the σ-algebra generated by
{W_{t_n}}_{n=1}^{N} (i.e., the values at which the Wiener path has been evaluated in order
to numerically integrate the SDE realization). The method may be used in combination
with different numerical integration methods, and an approach to approximations of
potentially needed Lévy areas is proposed, facilitated by a binary tree representation
of the Wiener path realization at its evaluation points. As for a posteriori adaptive
algorithms, the error indicators in Gaines’ and Lyons’ algorithm are given by prod-
ucts of local errors and weight terms, but, unlike in a posteriori methods, the weight
terms are computed from a priori estimates, making their approach a hybrid one.
Szepessy et al. [31] introduced a posteriori weak error based adaptivity for
the Euler–Maruyama algorithm with numerically computable error indicator terms.
Their development of weak error adaptivity took inspiration from Talay and Tubaro’s
seminal work [33], where an error expansion for the weak error was derived for the
Euler–Maruyama algorithm when uniform time steps were used. In [16], Szepessy
et al.’s weak error adaptive algorithm was used in the construction of a weak error
adaptive MLMC algorithm. To the best of our knowledge, the present work is the
first on MSE a posteriori adaptive algorithms for SDE both in the MC- and MLMC
setting.
Among other adaptive algorithms for SDE, many have refinement criterions based
only or primarily on estimates of the local error. For example in [17], where the step-
size depends on the size of the diffusion coefficient for a MSE Euler–Maruyama
adaptive algorithm; in [23], the step-size is controlled by the variation in the size of
the drift coefficient in the constructed Euler–Maruyama adaptive algorithm, which
preserves the long-term ergodic behavior of the true solution for many SDE problems;
and in [19], a local error based adaptive Milstein algorithm is developed for solving
multi-dimensional chemical Langevin equations.

2 Derivation of the MSE A Posteriori Adaptive Algorithm

In this section, we construct an MSE a posteriori adaptive algorithm for SDE whose
realizations are numerically integrated by the Euler–Maruyama algorithm (4). Our
goal is, in rough terms, to obtain an algorithm for solving the SDE problem (1) that,
for a fixed number of intervals N, determines the time-stepping Δt_0, Δt_1, . . . , Δt_{N−1}
such that the MSE, E[(g(X̄_T) − g(X_T))²], is minimized. That is,

       E[ ( g(X̄_T) − g(X_T) )² ] → min!,   N given.                                       (12)

The derivation of our adaptive algorithm consists of two steps. First, an error expan-
sion for the MSE is presented in Theorem 2. Based on the error expansion, we
thereafter construct a mesh refinement algorithm. At the end of the section, we apply
the adaptive algorithm to a few example problems.

2.1 The Error Expansion

Let us now present a leading-order error expansion for the MSE (12) of the SDE prob-
lem (1) in the one-dimensional (1D) setting, i.e., when Xt attains values in R and the
drift and diffusion coefficients are respectively of the form a : [0, T ] × R → R and
b : [0, T ] × R → R. An extension of the MSE error expansion to multi-dimensions
is given in Appendix “Error Expansion for the MSE in Multiple Dimensions”. To state
the error expansion theorem, some notation is needed. Let X_s^{x,t} denote the solution
of the SDE (1) at time s ≥ t, when the initial condition is X_t = x at time t, i.e.,

       X_s^{x,t} := x + ∫_t^s a(u, X_u) du + ∫_t^s b(u, X_u) dW_u,   s ∈ [t, T],           (13)

and in light of this notation, X_t is shorthand for X_t^{x_0,0}. For a given observable g, the
payoff-of-flow map function is defined by ϕ(t, x) = g(X_T^{x,t}). We also make use of
the following function space notation

C(U) := {f : U → R | f is continuous},
Cb (U) := {f : U → R | f is continuous and bounded},
 dj 
Cbk (R) := f : R → R | f ∈ C(R) and j f ∈ Cb (R) for all integers 1 ≤ j ≤ k ,
 dx
k1 ,k2
Cb ([0, T ] × R) := f : [0, T ] × R → R | f ∈ C([0, T ] × R) and

j
∂t 1 ∂xj2 f ∈ Cb ([0, T ] × R) for all integers j1 ≤ k1 and 1 ≤ j1 + j2 ≤ k2 .

We are now ready to present our mean square expansion result, namely,
Theorem 2 (1D MSE leading-order error expansion) Assume that drift and diffusion
coefficients and input data of the SDE (1) fulfill
(R.1) a, b ∈ Cb2,4 ([0, T ] × R),
(R.2) there exists a constant C > 0 such that

|a(t, x)|2 + |b(t, x)|2 ≤ C(1 + |x|2 ), ∀x ∈ R and ∀t ∈ [0, T ],

(R.3) The gradient of g, g : R → R satisfies g ∈ Cb3 (R),


(R.4) for the initial data, X0 is F0 -measurable and E[|X0 |p ] < ∞ for all p ≥ 1.
Assume further the mesh points 0 = t0 < t1 < . . . < tN = T
(M.1) are stopping times for which tn is Ftn−1 -measurable for n = 1, 2, . . . , N,
(M.2) there exists Ň ∈ N, and a c1 > 0 such that c1 Ň ≤ inf ω∈Ω N(ω) and supω∈Ω
N(ω) ≤ Ň holds for each realization. Furthermore, there exists a c2 > 0 such
that supω∈Ω maxn∈{0,1,...,N−1} Δtn (ω) < c2 Ň −1 ,
(M.3) and there exists a c_3 > 0 such that for all p ∈ [1, 8] and n ∈ {0, 1, . . . , Ň − 1}

       E[Δt_n^{2p}] ≤ c_3 ( E[Δt_n²] )^p.

Then, as Ň increases,

       E[ ( g(X_T) − g(X̄_T) )² ] = E[ Σ_{n=0}^{Ň−1} ϕ_x(t_n, X̄_{t_n})² ((b_x b)²/2)(t_n, X̄_{t_n}) Δt_n² + o(Δt_n²) ],   (14)

where we have defined t_n = T and Δt_n = 0 for all n ∈ {N, N + 1, . . . , Ň}. And
replacing the first variation, ϕ_x(t_n, X̄_{t_n}), by the numerical approximation, ϕ̄_{x,n}, as
defined in (23), yields the following to leading order all-terms-computable error
expansion:

       E[ ( g(X_T) − g(X̄_T) )² ] = E[ Σ_{n=0}^{Ň−1} ϕ̄_{x,n}² ((b_x b)²/2)(t_n, X̄_{t_n}) Δt_n² + o(Δt_n²) ].            (15)

We present the proof to the theorem in Appendix “Error Expansion for the MSE
in 1D”

Remark 2 In condition (M.2) of the above theorem we have introduced Ň to denote


the deterministic upper bound for the number of time steps in all mesh realizations.
Moreover, from this point on the mesh points {tn }n and time steps {Δtn }n are defined
for all indices {0, 1, . . . , Ň} with the natural extension tn = T and Δtn = 0 for all
n ∈ {N + 1, . . . , Ň}. In addition to ensuring an upper bound on the complexity of a

numerical realization and that maxn Δtn → 0 as Ň → ∞, replacing the random N


(the smallest integer value for which tN = T in a given mesh) with the deterministic
Ň in the MSE error expansion (15) simplifies our proof of Theorem 2.

Remark 3 For most SDE problems on which it is relevant to apply a posteriori


adaptive integrators, at least one of the regularity conditions (R.1), (R.2), and (R.3)
and the mesh adaptedness assumption (M.1) in Theorem 2 will not be fulfilled. In
our adaptive algorithm, the error expansion (15) is interpreted in a formal sense and
only used to facilitate the systematic construction of a mesh refinement criterion.
When applied to low-regularity SDE problems where some of the conditions (R.1),
(R.2), or (R.3), do not hold, the actual leading-order term of the error expansion (15)
may contain other or additional terms besides ϕ̄_{x,n}² ((b_x b)²/2)(t_n, X̄_{t_n}) in the error
density. Example 6 presents a problem where ad hoc additional terms are added to the
error density.
Example 6 presents a problem where ad hoc additional terms are added to the error
density.

2.1.1 Numerical Approximation of the First Variation

The first variation of the flow map, ϕ(t, x), is defined by

       ϕ_x(t, x) = ∂_x g(X_T^{x,t}) = g'(X_T^{x,t}) ∂_x X_T^{x,t},

and the first variation of the path itself, ∂_x X_s^{x,t}, is the solution of the linear SDE

       d(∂_x X_s^{x,t}) = a_x(s, X_s^{x,t}) ∂_x X_s^{x,t} ds + b_x(s, X_s^{x,t}) ∂_x X_s^{x,t} dW_s,   s ∈ (t, T],
       ∂_x X_t^{x,t} = 1,                                                                  (16)

where a_x denotes the partial derivative of a with respect to its spatial argument. To
describe conditions under which the terms g'(X_s^{x,t}) and ∂_x X_s^{x,t} are well defined, let
us first recall that if X_s^{x,t} solves the SDE (13) and

       E[ ∫_t^T |X_s^{x,t}|² ds ] < ∞,

then we say that there exists a solution to the SDE. If a solution X_s^{x,t} exists and all
solutions X̃_s^{x,t} satisfy

       P( sup_{s∈[t,T]} |X_s^{x,t} − X̃_s^{x,t}| > 0 ) = 0,

we say the solution X_s^{x,t} is pathwise unique.



Lemma 1 Assume the regularity assumptions (R.1), (R.2), (R.3), and (R.4) in Theo-
rem 2 hold, and that for any fixed t ∈ [0, T], x is F_t-measurable and E[|x|^{2p}] < ∞
for all p ∈ ℕ. Then there exist pathwise unique solutions X_s^{x,t} and ∂_x X_s^{x,t} to the
respective SDE (13) and (16) for which

       max{ E[ sup_{s∈[t,T]} |X_s^{x,t}|^{2p} ], E[ sup_{s∈[t,T]} |∂_x X_s^{x,t}|^{2p} ] } < ∞,   ∀p ∈ ℕ.

Furthermore, ϕ_x(t, x) is F_T-measurable and

       E[ |ϕ_x(t, x)|^{2p} ] < ∞,   ∀p ∈ ℕ.

We leave the proof of the Lemma to Appendix “Variations of the flow map”.

To obtain an all-terms-computable error expansion in Theorem 2, which will be
needed to construct an a posteriori adaptive algorithm, the first variation of the flow
map, ϕ_x, is approximated by the first variation of the Euler–Maruyama numerical
solution,

       ϕ̄_{x,n} := g'(X̄_T) ∂_{X̄_{t_n}} X̄_T.

Here, for k > n, (∂_x X̄^{X̄_{t_n},t_n})_{t_k} is the solution of the Euler–Maruyama scheme

       (∂_x X̄^{X̄_{t_n},t_n})_{t_{j+1}} = (∂_x X̄^{X̄_{t_n},t_n})_{t_j} + a_x(t_j, X̄_{t_j}) (∂_x X̄^{X̄_{t_n},t_n})_{t_j} Δt_j
                                          + b_x(t_j, X̄_{t_j}) (∂_x X̄^{X̄_{t_n},t_n})_{t_j} ΔW_j,        (17)

for j = n, n + 1, . . . , k − 1 and with the initial condition (∂_x X̄^{X̄_{t_n},t_n})_{t_n} = 1, which
is coupled to the numerical solution of the SDE, X̄_{t_j}.

Lemma 2 If the assumptions (R.1), (R.2), (R.3), (R.4), (M.1) and (M.2) in Theorem 2
hold, then the numerical solution X̄ of (4) converges in mean square sense to the
solution of the SDE (1),

       max_{1≤n≤Ň} ( E[ |X̄_{t_n} − X_{t_n}|^{2p} ] )^{1/2p} ≤ C Ň^{-1/2},                 (18)

and

       max_{1≤n≤Ň} E[ |X̄_{t_n}|^{2p} ] < ∞,   ∀p ∈ ℕ.                                     (19)

For any fixed instant of time t_n in the mesh, 1 ≤ n ≤ N, the numerical solution
∂_{X̄_{t_n}} X̄ of (17) converges in mean square sense to ∂_x X^{X_{t_n},t_n},

       max_{n≤k≤Ň} ( E[ |∂_x X̄_{t_k}^{X̄_{t_n},t_n} − ∂_x X_{t_k}^{X_{t_n},t_n}|^{2p} ] )^{1/2p} ≤ C Ň^{-1/2},   (20)

and

       max_{n≤k≤Ň} E[ |∂_x X̄_{t_k}^{X̄_{t_n},t_n}|^{2p} ] < ∞,   ∀p ∈ ℕ.                   (21)

Furthermore, ϕ̄_{x,n} is F_T-measurable and

       E[ |ϕ̄_{x,n}|^{2p} ] < ∞,   ∀p ∈ ℕ.                                                 (22)

From the SDE (16), it is clear that

       ∂_x X̄_T^{X̄_{t_n},t_n} = Π_{k=n}^{N−1} ( 1 + a_x(t_k, X̄_{t_k}) Δt_k + b_x(t_k, X̄_{t_k}) ΔW_k ),

and this implies that ϕ̄_{x,n} solves the backward scheme

       ϕ̄_{x,n} = c_x(t_n, X̄_{t_n}) ϕ̄_{x,n+1},   n = N − 1, N − 2, . . . , 0,             (23)

with the initial condition ϕ̄_{x,N} = g'(X̄_T) and the shorthand notation

       c(t_n, X̄_{t_n}) := X̄_{t_n} + a(t_n, X̄_{t_n}) Δt_n + b(t_n, X̄_{t_n}) ΔW_n.

The backward scheme (23) is convenient from a computational perspective since it
implies that the set of points, {ϕ̄_{x,n}}_{n=0}^{N}, can be computed at the same cost as that
of one path realization, {X̄_{t_n}}_{n=0}^{N}, which can be verified as follows:

       ϕ̄_{x,n} = g'(X̄_T) Π_{k=n}^{N−1} c_x(t_k, X̄_{t_k})
              = c_x(t_n, X̄_{t_n}) g'(X̄_T) Π_{k=n+1}^{N−1} c_x(t_k, X̄_{t_k})
              = c_x(t_n, X̄_{t_n}) g'(X̄_T) ∂_{X̄_{t_{n+1}}} X̄_T
              = c_x(t_n, X̄_{t_n}) ϕ̄_{x,n+1}.

2.2 The Adaptive Algorithm

Having derived computable expressions for all terms in the error expansion, we next
introduce the error density using a heuristic leading-order expansion

       ρ̄_n := ϕ̄_{x,n}² ((b_x b)²/2)(t_n, X̄_{t_n}),   n = 0, 1, . . . , N − 1,             (24)

and, for representing the numerical solution's error contribution from the time interval
(t_n, t_{n+1}), the error indicators

       r̄_n := ρ̄_n Δt_n²,   n = 0, 1, . . . , N − 1.                                       (25)

The error expansion (15) may then be written as

       E[ ( g(X_T) − g(X̄_T) )² ] = E[ Σ_{n=0}^{Ň−1} r̄_n + o(Δt_n²) ].                    (26)

The final goal of the adaptive algorithm is minimization of the leading order of the
MSE in (26), namely E[ Σ_{n=0}^{N−1} r̄_n ], which (for each realization) is approached by
minimization of the error expansion realization Σ_{n=0}^{N−1} r̄_n. An approximately optimal
choice for the refinement procedure can be derived by introducing the Lagrangian

       L(Δt, λ) = ∫_0^T ρ(s) Δt(s) ds + λ ( ∫_0^T (1/Δt(s)) ds − Ň ),                     (27)

for which we seek to minimize the pathwise squared error

       ( g(X_T) − g(X̄_T) )² = ∫_0^T ρ(s) Δt(s) ds

under the constraint that

       ∫_0^T (1/Δt(s)) ds = Ň,

for a fixed number of time steps, Ň, and the implicit constraint that the error indicators
are equilibrated,

       r̄_n = ρ̄_n Δt_n² = ( g(X_T) − g(X̄_T) )² / Ň,   n = 0, 1, . . . , Ň − 1.            (28)

Minimizing (27) yields

       Δt_n = √( ( g(X_T) − g(X̄_T) )² / ( Ň ρ(t_n) ) )   and   MSE_adaptive ≤ (1/Ň) E[ ( ∫_0^T √ρ(s) ds )² ],   (29)

where the above inequality follows from using Hölder's inequality,

       E[ ( g(X_T) − g(X̄_T) )² ] = E[ (1/√Ň) | g(X_T) − g(X̄_T) | ∫_0^T √ρ(s) ds ]
                                  ≤ (1/√Ň) √( E[ ( g(X_T) − g(X̄_T) )² ] ) √( E[ ( ∫_0^T √ρ(s) ds )² ] ).

In comparison, we notice that if a uniform mesh is used, the MSE becomes

       MSE_uniform = (T/Ň) E[ ∫_0^T ρ(s) ds ].                                             (30)

A consequence of observations (29) and (30) is that for many low-regularity prob-
lems, for instance, if ρ(s) = s−p with p ∈ [1, 2), adaptive time-stepping Euler–
Maruyama methods may produce more accurate solutions (measured in the MSE)
than are obtained using the uniform time-stepping Euler–Maruyama method under
the same computational budget constraints.

2.2.1 Mesh Refinement Strategy

To equilibrate the error indicators (28), we propose an iterative mesh refinement
strategy that identifies the largest error indicator and then refines the corresponding
time step by halving it.
To compute the error indicators prior to refinement, the algorithm first computes
the numerical SDE solution, X tn , and the corresponding first variation ϕx,n (using
Eqs. (4) and (23) respectively) on the initial mesh, Δt {0} . Thereafter, the error indi-
cators r n are computed by Eq. (25) and the mesh is refined a prescribed number of
times, Nrefine , as follows:
(C.1) Find the largest error indicator

       n* := arg max_n r̄_n,                                                               (31)

      and refine the corresponding time step by halving it,

       (t_{n*}, t_{n*+1}) → ( t_{n*}, (t_{n*} + t_{n*+1})/2, t_{n*+1} ),                   (32)

      where the midpoint becomes the new mesh point t_{n*+1}^{new} and the old right
      endpoint becomes t_{n*+2}^{new}, and increment the number of refinements by one.


(C.2) Update the values of the error indicators, either by recomputing the whole
problem or locally by interpolation, cf. Sect. 2.2.3.

(C.3) Go to step (C.4) if Nrefine mesh refinements have been made; otherwise, return
to step (C.1).
(C.4) (Postconditioning) Do a last sweep over the mesh and refine by halving every
      time step that is strictly larger than Δt_max, where Δt_max = O(Ň^{-1}) denotes
      the maximum allowed step size.
The postconditioning step (C.4) ensures that all time steps become infinitesimally
small as the number of time steps N → ∞ with such a rate of decay that condition
(M.2) in Theorem 2 holds and is thereby one of the necessary conditions from
Lemma 2 to ensure strong convergence for the numerical solutions of the MSE
adaptive Euler–Maruyama algorithm. However, the strong convergence result should
primarily be interpreted as a motivation for introducing the postconditioning step
(C.4) since Theorem 2’s assumption (M.1), namely that the mesh points are stopping
times tn measurable with respect to Ftn−1 , will not hold in general for our adaptive
algorithm.

2.2.2 Wiener Path Refinements

When a time step is refined, as described in (32), the Wiener path must be refined
correspondingly. The value of the Wiener path at the midpoint between W_{t_{n*}} and
W_{t_{n*+1}} can be generated by Brownian bridge interpolation,

       W_{t_{n*+1}^{new}} = ( W_{t_{n*}} + W_{t_{n*+1}} )/2 + ξ √(Δt_{n*}) / 2,            (33)

where ξ ∼ N(0, 1), cf. [27]. See Fig. 1 for an illustration of Brownian bridge inter-
polation applied to numerical solutions of an Ornstein–Uhlenbeck SDE.
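A minimal sketch of the halving step (32)–(33), assuming the mesh points and the Wiener path values are stored as plain arrays, is given below.

```python
# Sketch: halve the interval (t[i], t[i+1]) and insert the Brownian bridge midpoint (33).
import numpy as np

def refine_interval(t, W, i, rng):
    t_mid = 0.5 * (t[i] + t[i + 1])
    dt = t[i + 1] - t[i]
    W_mid = 0.5 * (W[i] + W[i + 1]) + rng.normal() * np.sqrt(dt) / 2.0   # Eq. (33)
    t_new = np.insert(t, i + 1, t_mid)
    W_new = np.insert(W, i + 1, W_mid)
    return t_new, W_new
```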

2.2.3 Updating the Error Indicators

After the refinement of an interval, (t_{n*}, t_{n*+1}), and its Wiener path, error indicators
must also be updated before moving on to determine which interval is next in line for
refinement. There are different ways of updating error indicators. One expensive but
more accurate option is to recompute the error indicators completely by first solving
the forward problem (4) and the backward problem (23). A less costly but also less
accurate alternative is to update only the error indicators locally at the refined time
step by one forward and backward numerical solution step, respectively:

       X̄_{t_{n*+1}}^{new} = X̄_{t_{n*}} + a(t_{n*}, X̄_{t_{n*}}) Δt_{n*}^{new} + b(t_{n*}, X̄_{t_{n*}}) ΔW_{n*}^{new},
                                                                                          (34)
       ϕ̄_{x,n*+1}^{new} = c_x( t_{n*+1}^{new}, X̄_{t_{n*+1}}^{new} ) ϕ̄_{x,n*+1}.

Thereafter, we compute the resulting error density, ρ̄_{n*+1}^{new}, by Eq. (24), and finally
update the error indicators locally by

       r̄_{n*} = ρ̄_{n*} (Δt_{n*}^{new})²,   r̄_{n*+1} = ρ̄_{n*+1}^{new} (Δt_{n*+1}^{new})².   (35)

As a compromise between cost and accuracy, we here propose the following mixed
approach to updating error indicators post refinement: with N_refine denoting the pre-
scribed number of refinement iterations of the input mesh, let all error indicators
be completely recomputed every Ñ = O(log(N_refine))-th iteration, whereas for the
remaining N_refine − Ñ iterations, only local updates of the error indicators are com-
puted. Following this approach, the computational cost of refining a mesh holding
N time steps into a mesh of 2N time steps becomes O(N log(N)²). Observe that
the asymptotically dominating cost is to sort the mesh's error indicators O(log(N))
times. To anticipate the computational cost for the MSE adaptive MLMC algo-
rithm, this implies that the cost of generating an MSE adaptive realization pair is
Cost(Δ_ℓ g) = O(2^ℓ ℓ²).

2.2.4 Pseudocode

The mesh refinement and the computation of error indicators are presented in Algo-
rithms 1 and 2, respectively.

Algorithm 1 meshRefinement
Input: Mesh Δt, Wiener path W, number of refinements N_refine, maximum time step Δt_max
Output: Refined mesh Δt and Wiener path W.
Set the number of complete re-computations of all error indicators to a number Ñ = O(log(N_refine))
and compute the refinement batch size N̂ = N_refine/Ñ.
for i = 1 to Ñ do
   Completely update the error density by applying
   [r̄, X̄, ϕ̄_x, ρ̄] = computeErrorIndicators(Δt, W).
   if N_refine > 2N̂ then
      Set the below for-loop limit to J = N̂.
   else
      Set J = N_refine.
   end if
   for j = 1 to J do
      Locate the largest error indicator r̄_{n*} using Eq. (31).
      Refine the interval (t_{n*}, t_{n*+1}) by the halving (32), add a midpoint value W_{n*+1}^{new}
      to the Wiener path by the Brownian bridge interpolation (33), and set N_refine = N_refine − 1.
      Locally update the error indicators r̄_{n*}^{new} and r̄_{n*+1}^{new} by the steps (34) and (35).
   end for
end for
Do a final sweep over the mesh and refine all time steps of the input mesh which are strictly larger
than Δt_max.

Algorithm 2 computeErrorIndicators
Input: mesh Δt, Wiener path W .
Output: error indicators r, path solutions X and ϕ x , error density ρ.
Compute the SDE path X using the Euler–Maruyama algorithm (4).
Compute the first variation ϕ x using the backward algorithm (23).
Compute the error density ρ and error indicators r by the formulas (24) and (25), respectively.
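For readers who prefer executable code, the following Python sketch mirrors Algorithm 2 for a scalar SDE under the notation of Sect. 2; the coefficient functions a, b, their spatial derivatives and g' are supplied by the user, and the sketch is an illustration rather than the authors' implementation.

```python
# Sketch of Algorithm 2 (scalar SDE): forward Euler-Maruyama path, backward dual
# phi (Eq. (23)), error density (24) and error indicators (25).
import numpy as np

def compute_error_indicators(t, W, x0, a, a_x, b, b_x, dg):
    N = len(t) - 1
    dt, dW = np.diff(t), np.diff(W)
    X = np.empty(N + 1)
    X[0] = x0
    for n in range(N):                                    # forward problem (4)
        X[n + 1] = X[n] + a(t[n], X[n]) * dt[n] + b(t[n], X[n]) * dW[n]
    phi = np.empty(N + 1)
    phi[N] = dg(X[N])                                     # phi_{x,N} = g'(Xbar_T)
    for n in range(N - 1, -1, -1):                        # backward problem (23)
        c_x = 1.0 + a_x(t[n], X[n]) * dt[n] + b_x(t[n], X[n]) * dW[n]
        phi[n] = c_x * phi[n + 1]
    rho = np.array([phi[n] ** 2 * (b_x(t[n], X[n]) * b(t[n], X[n])) ** 2 / 2.0
                    for n in range(N)])                   # error density (24)
    r = rho * dt ** 2                                     # error indicators (25)
    return r, X, phi, rho
```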

2.3 Numerical Examples

To illustrate the procedure for computing error indicators and the performance of the
adaptive algorithm, we now present four SDE example problems. To keep matters
relatively elementary, the dual solutions, ϕx (t), for these examples are derived not
from a posteriori but a priori analysis. This approach results in adaptively generated
mesh points which for all problems in this section will contain mesh points which are
stopping times for which tn is Ftn−1 -measurable for all n ∈ {1, 2, . . . , N}. In Exam-
ples 1–3, it is straightforward to verify that the other assumptions of the respective
single- and multi-dimensional MSE error expansions of Theorems 2 and 3 hold,
meaning that the adaptive approach produces numerical solutions whose MSE to
leading order are bounded by the respective error expansions (14) and (67).

Example 1 We consider the classical geometric Brownian motion problem

dXt = Xt dt + Xt dWt , X0 = 1,

for which we seek to minimize the MSE

       E[ (X_T − X̄_T)² ] = min!,   N given,                                               (36)

at the final time, T = 1, (cf. the goal (B.1)). One may derive that the dual solution
of this problem is of the form

       ϕ_x(X_t, t) = ∂_{X_t} X_T^{X_t,t} = X_T / X_t,

which leads to the error density

       ρ(t) = (b_x b)²(X_t, t) (ϕ_x(X_t, t))² / 2 = X_T² / 2.
We conclude that uniform time-stepping is optimal. A further reduction of the MSE
could be achieved by allowing the number of time steps to depend on the magnitude
of XT2 for each realization. This is however outside the scope of the considered
refinement goal (B.1), where we assume the number of time steps, N, is fixed for
all realizations and would be possible only to a very weak degree under the slight
generalization of (B.1) given in assumption (M.2) of Theorem 2.

Example 2 Our second example is the two-dimensional (2D) SDE problem

       dW_t = 1 dW_t,   W_0 = 0,
       dX_t = W_t dW_t,   X_0 = 0.

Here, we seek to minimize the MSE E[(X_T − X̄_T)²] for the observable

       X_T = ∫_0^T W_t dW_t

at the final time T = 1. With the diffusion matrix represented by

       b((W_t, X_t), t) = ( 1, W_t )^T,

and observing that

       ∂_{X_t} X_T^{X_t,t} = ∂_{X_t} ( X_t + ∫_t^T W_s dW_s ) = 1,

it follows from the error density in multi-dimensions in Eq. (65) that ρ(t) = 1/2. We
conclude that uniform time-stepping is optimal for this problem as well.

Example 3 Next, we consider the three-dimensional (3D) SDE problem

       dW_t^{(1)} = 1 dW_t^{(1)},   W_0^{(1)} = 0,
       dW_t^{(2)} = 1 dW_t^{(2)},   W_0^{(2)} = 0,
       dX_t = W_t^{(1)} dW_t^{(2)} − W_t^{(2)} dW_t^{(1)},   X_0 = 0,

where W_t^{(1)} and W_t^{(2)} are independent Wiener processes. Here, we seek to minimize
the MSE E[(X_T − X̄_T)²] for the Lévy area observable

       X_T = ∫_0^T ( W_t^{(1)} dW_t^{(2)} − W_t^{(2)} dW_t^{(1)} ),

at the final time, T = 1. Representing the diffusion matrix by

       b((W_t, X_t), t) = [  1           0
                             0           1
                            −W_t^{(2)}   W_t^{(1)} ],

and observing that

       ∂_{X_t} X_T^{X_t,t} = ∂_{X_t} ( X_t + ∫_t^T ( W_s^{(1)} dW_s^{(2)} − W_s^{(2)} dW_s^{(1)} ) ) = 1,

it follows from Eq. (65) that ρ(t) = 1. We conclude that uniform time-stepping is
optimal for computing Lévy areas.

Example 4 As the last example, we consider the 2D SDE

       dW_t = 1 dW_t,   W_0 = 0,
       dX_t = 3(W_t² − t) dW_t,   X_0 = 0.

We seek to minimize the MSE (36) at the final time T = 1. For this problem, it
may be shown by Itô calculus that the pathwise exact solution is X_T = W_T³ − 3W_T T.
Representing the diffusion matrix by

       b((W_t, X_t), t) = ( 1, 3(W_t² − t) )^T,

Equation (65) implies that ρ(t) = 18W_t². This motivates the use of discrete error
indicators, r̄_n = 18W_{t_n}² Δt_n², in the mesh refinement criterion. For this problem, we
may not directly conclude that the error expansion (67) holds since the diffusion
coefficient does not fulfill the assumption in Theorem 3. Although we will not include
the details here, it is easy to derive that ∂_x^j X_T^{x,t} = 0 for all j > 1 and to prove that
the MSE leading-order error expansion also holds for this particular problem by
following the steps of the proof of Theorem 2. In Fig. 2, we compare the uniform
and adaptive time-stepping Euler–Maruyama algorithms in terms of MSE versus the
Fig. 2 Comparison of the performance of uniform and adaptive time-stepping Euler–Maruyama


numerical integration for Example 4 in terms of MSE versus number of time steps

number of time steps, N. Estimates for the MSE for both algorithms are computed
by MC sampling using M = 106 samples. This is a sufficient sample size to render
the MC estimates’ statistical error negligible. For the adaptive algorithm, we have
used the following input parameter in Algorithm 1: uniform input mesh, Δt, with
step size 2/N (and Δtmax = 2/N). The number of refinements is set to Nrefine = N/2.
We observe that the algorithms have approximately equal convergence rates, but, as
expected, the adaptive algorithm is slightly more accurate than the uniform time-
stepping algorithm.

3 Extension of the Adaptive Algorithm to the Multilevel Setting

In this section, we incorporate the MSE adaptive time-stepping algorithm presented


in the preceding section into an MSE adaptive MLMC algorithm for weak approx-
imations. First, we shortly recall the approximation goal and important concepts
for the MSE adaptive MLMC algorithm, such as the structure of the adaptive mesh
hierarchy and MLMC error control. Thereafter, the MLMC algorithm is presented
in pseudocode form.

3.1 Notation and Objective

For a tolerance, TOL > 0, and confidence, 0 < 1 − δ < 1, we recall that our objec-
tive is to construct an adaptive time-stepping MLMC estimator, A_ML, which meets
the approximation constraint

       P( |E[g(X_T)] − A_ML| ≤ TOL ) ≥ 1 − δ.                                             (37)

We denote the multilevel estimator by

       A_ML := Σ_{ℓ=0}^{L} Σ_{m=1}^{M_ℓ} Δ_ℓ g_m / M_ℓ,

where the inner average is denoted A(Δ_ℓ g; M_ℓ) and

       Δ_ℓ g_m := g(X̄_{m,T}^{0})                      if ℓ = 0,
                  g(X̄_{m,T}^{ℓ}) − g(X̄_{m,T}^{ℓ−1})   else.
Section 1.2.5 presents further details on MLMC notation and parameters.



3.1.1 The Mesh Hierarchy


 
A realization, Δ g ωi, , is generated on a nested pair of mesh realizations

. . . ⊂ Δt {−1} (ωi, ) ⊂ Δt {} (ωi, ).

Subsequently, mesh realizations are generated step by step from a prescribed and
deterministic input mesh, Δt {−1} , holding N−1 uniform time steps. First, Δt {−1} is
refined into a mesh, Δt {0} , by applying Algorithm 1, namely
 
[Δt {0} , W {0} ] = meshRefinement Δt {−1} , W {−1} , Nrefine = N−1 , Δtmax = N0−1 .

The mesh refinement process is iterated until meshes Δt {−1} and Δt {−1} are pro-
duced, with the last couple of iterations being
 
−1
[Δt {−1} , W {−1} ] = meshRefinement Δt {−2} , W {−2} , Nrefine = N−2 , Δtmax = N−1 ,

and
 
[Δt {} , W {} ] = meshRefinement Δt {−1} , W {−1} , Nrefine = N−1 , Δtmax = N−1 .
 {}  {−1}
The output realization for the difference Δ gi = g X i − g X i is thereafter
generated on the output temporal mesh and Wiener path pairs, (Δt {−1} , W {−1} ) and
(Δt {} , W {} ).
For later estimates of the computational cost of the MSE adaptive MLMC algo-
rithm, it is useful to have upper bounds on the growth of the number of time steps
in the mesh hierarchy, {Δt^{ℓ}}_ℓ, as ℓ increases. Letting |Δt| denote the number of
time steps in a mesh, Δt (i.e., the cardinality of the set Δt = {Δt_0, Δt_1, . . .}), the
following bounds hold:

       N_ℓ ≤ |Δt^{ℓ}| < 2N_ℓ   ∀ℓ ∈ ℕ_0.

The lower bound follows straightforwardly from the mesh hierarchy refinement pro-
cedure described above. To show the upper bound, notice that the maximum number
of mesh refinements going from a level ℓ − 1 mesh, Δt^{ℓ−1}, to a level ℓ mesh, Δt^{ℓ},
is 2N_{ℓ−1} − 1. Consequently,

       |Δt^{ℓ}| ≤ |Δt^{−1}| + Σ_{j=0}^{ℓ} ( maximum number of refinements going from Δt^{j−1} to Δt^{j} )
                ≤ N_{−1} + 2 Σ_{j=0}^{ℓ} N_{j−1} − (ℓ + 1) < 2N_ℓ.


 {}
Remark 4 For the telescoping property E[A_ML] = E[g(X̄_T^{L})] to hold, it is not
required that the adaptive mesh hierarchy is nested, but non-nested meshes make it
more complicated to compute Wiener path pairs (W {−1} , W {} ). In the numerical
tests leading to this work, we tested both nested and non-nested adaptive meshes and
found both options performing satisfactorily.

3.2 Error Control

The error control for the adaptive MLMC algorithm follows the general framework
of a uniform time-stepping MLMC, but for the sake of completeness, we recall the
error control framework for the setting of weak approximations. By splitting

        
E g(XT) − A  ≤ E g(XT) − g X {L}  + E g X {L} − A 
ML T T ML 
     
=:ET =:ES

and

TOL = TOLT + TOLS , (38)

we seek to implicitly fulfill (37) by imposing the stricter constraints

ET ≤ TOLT , the time discretization error, (39)


 
P ES ≤ TOLS ≥ 1 − δ, the statistical error. (40)

3.2.1 The Statistical Error

Under the moment assumptions stated in [6], Lindeberg’s version of the Central
Limit Theorem yields that as TOL ↓ 0,
 {L}
AML − E g X T D
1   −
→ N(0, 1).
Var AML

D
Here, −
→ denotes convergence in distribution. By construction, we have

  L
Var(Δ g)
Var AML = .
=0
M
52 H. Hoel et al.

This asymptotic result motivates the statistical error constraint

  TOLS 2
Var AML ≤ , (41)
CC 2 (δ)

where CC (δ) is the confidence parameter chosen such that


 CC
1
(δ)e−x /2
2
1− √ dx = (1 − δ), (42)
2π −CC (δ)

for a prescribed confidence (1 − δ).


Another important question is how to distribute the number of samples, {M } ,
on the level hierarchy such that both the computational cost of the MLMC estimator
 Letting C denote the expected cost of
is minimized and the constraint (41) is met.
generating a numerical realization Δ g ωi, , the approximate total cost of generating
the multilevel estimator becomes


L
CML := C M .
=0

An optimization of the number of samples at each level can then be found through
minimization of the Lagrangian
# L $
Var(Δ g) TOLS 2
L
L (M0 , M1 , . . . , ML , λ) = λ − + C M ,
=0
M CC 2 (δ) =0

yielding
2 3 4
Var(Δ g) *
L
CC 2 (δ)
M = C Var(Δ g) ,  = 0, 1, . . . , L.
TOLS 2 C =0

 
Since the cost of adaptively refining a mesh, Δt {} , is O N log(N )2 , as noted in
 2.2.3, the cost of generating an SDE realization, is of the same order: C =
Sect.
O N log(N )2 . Representing the cost by its leading-order term and disregarding the
logarithmic factor, an approximation to the level-wise optimal number of samples
becomes
2 3 4
CC 2 (δ) Var(Δ g) *
L
M = N Var(Δ g) ,  = 0, 1, . . . , L. (43)
TOLS 2 N =0
Construction of a Mean Square Error Adaptive … 53

Remark 5 In our MLMC implementations, the variances, Var(Δ g), in Eq. (43)
are approximated by sample variances. To save memory in our parallel computer
implementation,
  the maximum permitted batch size for a set of realizations,
, samples,
{Δ g ωi, }i , is set to 100,000. For the initial batch consisting of M = M
the sample variance is computed by the standard approach,

1 M
 
V (Δ g; M ) = (Δ g ωi, − A (Δ g; M ))2 .
M − 1 i=1

   +M
Thereafter, for every new batch of realizations, {Δ g ωi, }M
i=M +1 (M here denotes
an arbitrary natural number smaller or equal to 100,000), we incrementally update
the sample variance,

M
V (Δ g; M + M) = × V (Δ g; M )
M + M
M  +M
1  
+ (Δ g ωi, − A (Δ g; M + M))2 ,
(M + M − 1) i=M +1


and update the total number of samples on level  accordingly, M = M + M.

3.2.2 The Time Discretization Error

To control the time discretization error, we assume that a weak order convergence
rate, α > 0, holds for the given SDE problem when solved with the Euler–Maruyama
method, i.e.,
  {L}   
 
E g(XT) − g X T  = O NL−α ,

and we assume that the asymptotic rate is reached at level L − 1. Then


 ∞  

  {L}  
 
 ∞ E ΔL g 
  
E g(XT) − g X T  =  E Δ g  ≤ E ΔL g  2 −α
= α .
  2 −1
=L+1 =1


the weak convergence rate, α, is known prior to
In our implementation, we assume
sampling and, replacing E ΔL g with a sample average approximation in the above
inequality, we determine L by the following stopping criterion:
54 H. Hoel et al.

 
max 2−α |A (ΔL−1 g; ML−1 )| , |A (ΔL g; ML )|
≤ TOLT , (44)
2α − 1

(cf. Algorithm 3). Here we implicitly assume that the statistical error in estimating
the bias condition is not prohibitively large.
A final level L of order log(TOLT −1 ) will thus control the discretization error.

3.2.3 Computational Cost

Under the convergence rate assumptions stated in Theorem 1, it follows that the cost
of generating
an adaptive
MLMC


estimator, AML , fulfilling the MSE approximation
goal E (AML − E g(XT) )2 ≤ TOL2 is bounded by
⎧  

⎪O TOL−2 , if β > 1,
L ⎨  
−2
M C ≤ O TOL log(TOL) ,  if β = 1,
4
CML = (45)

⎪ β−1
=0 ⎩O TOL−2+ α log(TOL)2 , if β < 1.

Moreover, under the additional higher moment approximation rate assumption


   2+ν 
 {}   
E g X T − g(XT) = O 2−β+ν/2 ,

the complexity bound (45) also holds for fulfilling criterion (2) asymptotically as
TOL ↓ 0, (cf. [5]).

3.3 MLMC Pseudocode

In this section, we present pseudocode for the implementation of the MSE adaptive
MLMC algorithm. In addition to Algorithms 1 and 2, presented in Sect. 2.2.4, the
implementation consists of Algorithms 3 and 4. Algorithm 3 describes how the stop-
ping criterion for the final level L is implemented and how the multilevel estimator
is generated, and Algorithm 4 describes the steps for generating a realization Δ g.
Construction of a Mean Square Error Adaptive … 55

Algorithm 3 mlmcEstimator
Input: TOLT , TOLS , confidence δ, initial mesh Δt {−1} , initial number of mesh steps N−1 , input
,
weak rate α, initial number of samples M.
Output: Multilevel estimator AML .
Compute the confidence parameter CC (δ) by (42).
Set L = −1.
while L < 2 or (44), using the input α for the weak rate, is violated do
Set L = L + 1.   L
Set ML = M, , generate a set of realizations {Δ g ωi, }M by applying
i=1
adaptiveRealizations(Δt {−1} ).
for  = 0 to L do
Compute the sample variance V (Δ g; Ml ).
end for
for  = 0 to L do
Determine the number of samples M by (43).
if new value of M is larger than the old value then
  Mnew
Compute additional realizations {Δ g ωi, }i=M  +1
by applying
adaptiveRealizations(Δt {−1} ).
end if
end for
end while
Compute AML from the generated samples by using formula (7).

Remark 6 For each increment of L in Algorithm 3, all realizations Δ g that have


been generated up to that point are reused in later computations of the multilevel
estimator. This approach, which is common in MLMC, (cf. [8]), seems to work fine
in practice although the independence between samples is then lost. Accounting for
the lack of independence complicates the convergence analysis.

4 Numerical Examples for the MLMC Algorithms

To illustrate the implementation of the MSE adaptive MLMC algorithm and to show
its robustness and potential efficiency gain over the uniform MLMC algorithm, we
present two numerical examples in this section. The first example considers a geo-
metric Brownian motion SDE problem with sufficient regularity, such that there is
very little (probably nothing) to gain by introducing adaptive mesh refinement. The
example is included to show that in settings where adaptivity is not required, the
MSE adaptive MLMC algorithm is not excessively more expensive than the uniform
MLMC algorithm. In the second example, we consider an SDE with a random time
drift coefficient blow-up of order t −p with p ∈ [0.5, 1). The MSE adaptive MLMC
algorithm performs progressively more efficiently than does the uniform MLMC
algorithm as the value of the blow-up exponent p increases. We should add, however,
that although we observe numerical evidence for the numerical solutions converg-
56 H. Hoel et al.

Algorithm 4 adaptiveRealization
Input: Mesh Δt {−1} .
Outputs: One realization Δ g(ω)
Generate a Wiener path W {−1} on the initial mesh Δt {−1} .
for j = 0 to  do
Refine the mesh by applying

[Δt {j} , W {j} ] = meshRefinement(Δt {j−1} , W {j−1} , Nrefine = Nj−1 , Δtmax = Nj−1 ).

end for
{−1} {}
Compute Euler–Maruyama realizations (X T , X T )(ω) using the mesh pair (Δt {−1} , Δt {} )(ω)
and Wiener path pair (W {−1} {}
, W )(ω), cf. (4), and return the output
   
{} {−1}
Δ g(ω) = g X T (ω) − g X T (ω) .

ing for both examples, all of the assumptions in Theorem 2 are not fulfilled for our
adaptive algorithm, when applied to either of the two examples. We are therefore not
able to prove theoretically that our adaptive algorithm converges in these examples.
For reference, the implemented MSE adaptive MLMC algorithm is described in
Algorithms 1–4, the standard form of the uniform time-stepping MLMC algorithm
that we use in these numerical comparisons is presented in Algorithm 5, Appendix “A
Uniform Time Step MLMC Algorithm”, and a summary of the parameter values used
in the examples is given in Table 2. Furthermore, all average properties derived from
the MLMC algorithms that we plot for the considered examples in Figs. 3, 4, 5, 6, 7,
8, 9, 10, 11 and 12 below are computed from 100 multilevel estimator realizations,
and, when plotted, error bars are scaled to one sample standard deviation.

Example 5 We consider the geometric Brownian motion

dXt = Xt dt + Xt dWt , X0 = 1,

where we seek to fulfill the weak approximation goal (2) for


the observable, g(x) = x,
at the final time, T = 1. The reference solution is E g(XT) = eT . From Example 1,
we recall that the MSE minimized in this problem by using uniform time steps.
However, our a posteriori MSE adaptive MLMC algorithm computes error indicators
from numerical solutions of the path and the dual solution, which may lead to slightly
non-uniform output meshes. In Fig. 3, we study how  close 
to uniform the MSE
adaptive meshes are by plotting the level-wise ratio, E Δt {}  /N , where we recall
 
that Δt {}  denotes the number of time steps in the mesh, Δt {} , and
that a uniform
mesh on level  has N time steps. As the level, , increases, E Δt {}  /N converges
to 1, and to interpret this result, we recall from the construction of the adaptive mesh
Construction of a Mean Square Error Adaptive … 57

Table 2 List of parameter values used by the MSE adaptive MLMC algorithm and (when required)
the uniform MLMC algorithm for the numerical examples in Sect. 4
Parameter Description of parameter Example 5 Example 6
δ Confidence parameter, cf. (37) 0.1 0.1
TOL Accuracy parameter, cf. (37) [10−3 , 10−1 ] [10−3 , 10−1 ]
TOLS Statistical error tolerance, cf. (38) TOL/2 TOL/2
TOLT Bias error tolerance, cf. (38) TOL/2 TOL/2
Δt {−1} Pre-initial input uniform mesh 1/2 1/2
having the following step size
N0 Number of time steps in the initial 4 4
mesh Δt {0}
5 6 5 6
log(+2) log(+2)
Ñ() The number of complete updates log(2) log(2)
of the error indicators in the MSE
adaptive algorithm, cf. Algorithm 1
Δtmax () Maximum permitted time step size N−1 N−1
Δtmin Minimum permitted time step size 2−51 2−51
(due to the used double-precision
binary floating-point format)
,
M Number of first batch samples for a 100 20
(first) estimate of the variance
Var(Δ g)
αU Input weak convergence rate used 1 (1 − p)
in the stopping rule (44) for
uniform time step
Euler–Maruyama numerical
integration
αA Input weak convergence rate used 1 1
in the stopping rule (44) for the
MSE adaptive time step
Euler–Maruyama numerical
integration

 
hierarchy in Sect. 3 that if Δt {}  = N , then the mesh, Δt {} , is uniform. We thus
conclude that for this problem, the higher the level, the more uniform the MSE
adaptive mesh realizations generally become.
Since adaptive mesh refinement is costly and since this problem has sufficient
regularity for the first-order weak and MSE convergence rates (5) and (6) to hold,
respectively, one might expect that MSE adaptive MLMC will be less efficient than
the uniform MLMC. This is verified in Fig. 5, which shows that the runtime of the
MSE adaptive MLMC algorithm grows slightly faster than the uniform MLMC algo-
rithm and that the cost ratio is at most roughly 3.5, in favor of uniform MLMC. In
Fig. 4, the accuracy of the MLMC algorithms is compared, showing that both algo-
rithms fulfill the goal (2) reliably. Figure 6 further shows that
both
 algorithms have
roughly first-order convergence rates for the weak error E Δ g  and the variance
Var(Δ g), and that the decay rates for Ml are close to identical. We conclude that
58 H. Hoel et al.

Number of time steps ratio E[|Δt{} |]/N


1.010

1.008

1.006

1.004

1.002

1.000
0 2 4 6 8 10 12
Level 
 

Fig. 3 The ratio of the level-wise mean number of time steps E Δt {}  /N , of MSE adaptive
mesh realizations to uniform mesh realizations for Example 6

Fig. 4 For a set of TOL values, 100 realizations of the MSE adaptive multilevel estimator are

computed using both MLMC algorithms for Example 5. The errors |AML (ωi ; TOL, δ) − E g(XT) |
are respectively plotted as circles (adaptive MLMC) and triangles (uniform MLMC), and the

number
of multilevel estimator realizations failing the constraint |AML (ωi ; TOL, δ) − E g(XT) | < TOL
is written above the (TOL−1 , TOL) line. Since the confidence parameter is set to δ = 0.1 and less
than 10 realizations fail for any of the tested TOL values, both algorithms meet the approximation
goal (37)

although MSE adaptive MLMC is slightly more costly than uniform MLMC, the
algorithms perform comparably in terms of runtime for this example.

Remark 7 The reason why we are unable to prove theoretically that the numerical
solution of this problem computed with our adaptive algorithm asymptotically con-
verges to the true solution is slightly subtle. The required smoothness conditions in
Theorem 2 are obviously fulfilled, but due to the local update of the error indicators
in our mesh refinement procedure, (cf. Sect. 2.2.3), we cannot prove that the mesh
points will asymptotically be stopping times for which tn is Ftn−1 -measurable for all
n ∈ {1, 2, . . . , N}. If we instead were to use the version of our adaptive algorithm
that recomputes all error indicators for each mesh refinement, the definition of the
error density (24) implies that, for this particular problem, it would take the same
Construction of a Mean Square Error Adaptive … 59

104
103 adaptive MLMC
2
uniform MLMC
10
c1 TOL−2 log(TOL)2
Runtime [s]

101
100
10−1
10−2
10−3
101 102 103
−1
TOL

Fig. 5 Average runtime versus TOL−1 for the two MLMC algorithms solving Example 5

Adaptive MLMC Uniform MLMC


0
10

10−1

10−2

10−3 A(g ; M )
A(Δ g; M )
10−4
c2−
10−5
101

100

10−1

10−2 V(g ; M )
10−3 V(Δ g; M )
−4
c2−
10
1010
E[M (TOL = 10−3 )]
109
E[M (TOL = 10−2.11 )]
108 c2−
7
10

106

105

104
0 2 4 6 8 10 12 0 2 4 6 8 10 12
Level  Level 

Fig. 6 Output for Example 5 solved



with the MSE adaptive and uniform time-stepping MLMC
algorithms. (Top) Weak error E Δ g  for solutions at TOL = 10−3 . (Middle) Variance Var(Δ g)
for solutions at TOL = 10−3 . (Bottom) Average number of samples E[Ml ]
60 H. Hoel et al.

7
value, ρ n = N−1k=0 cx (tk , X tk ) /2, for all indices, n ∈ {0, 1, . . . , N}. The resulting
2

adaptively refined mesh would then become uniform and we could verify conver-
gence, for instance, by using Theorem 2. Connecting this to the numerical results for
the adaptive algorithm that we have implemented
 
here, we notice that the level-wise
mean number of time steps ratio, E Δt {}  /N , presented in Fig. 3 seems to tend
towards 1 as  increases, a limit ratio that is achieved only if Δt {} is indeed a uniform
mesh.

Example 6 We next consider the two-dimensional SDE driven by a one-dimensional


Wiener process

dXt = a(t, Xt ; ξ )dt + b(t, Xt ; ξ )dWt


(46)
X0 = [1, ξ ]T ,

with the low-regularity drift coefficient, a(t, x) = [r|t − x (2) |−p , 0]T , interest rate,
r = 1/5, and volatility b(t, x) = [σ, 0]T with, σ = 0.5, and observable, g(x) = x, at
the final time T = 1. The ξ in the initial condition is distributed as ξ ∼ U(1/4, 3/4)
and it is independent from the Wiener process, W . Three different blow-up exponent
test cases are considered, p = (1/2, 2/3, 3/4), and to avoid blow-ups in the numerical
integration of the drift function component, f (·; ξ ), we replace the fully explicit
Euler–Maruyama integration scheme with the following semi-implicit scheme:

rf (tn ; ξ )X tn Δtn + σ X tn ΔWn , if f (tn ; ξ ) < 2f (tn+1 ; ξ ),
X tn+1 = X tn + (47)
rf (tn+1 ; ξ )X tn Δtn + σ X tn ΔWn , else,

where we have dropped the superscript for the first component of the SDE, writing
out only the first component, since the evolution of the second component is trivial.
For p ∈ [1/2, 3/4] it may be shown that for any singularity point, any path integrated
by the scheme (47) will have at most one drift-implicit integration step. The reference
mean for the exact solution is given by
 
3/4
r(x 1−p + (1 − x)1−p )
E[XT ] = 2 exp dx,
1/4 1−p

and in the numerical experiments, we approximate this integral value by quadrature


to the needed accuracy.

The MSE Expansion for the Adaptive Algorithm


Due to the low-regularity drift present in this problem, the resulting MSE expansion
will also contain drift-related terms that formally are of higher order. From the proof
of Theorem 2, Eq. (59), we conclude that, to leading order the MSE is bounded by
Construction of a Mean Square Error Adaptive … 61

Adaptive realization p=0.5 Adaptive realization p=0.67 Adaptive realization p=0.75


2.5
{2} {6}
X t (ω) X t (ω)
{4} {8}
X t (ω) X t (ω)
2.0

1.5

1.0

0.5
0.0 0.2 0.4 0.6 0.8 1.00.0 0.2 0.4 0.6 0.8 1.00.0 0.2 0.4 0.6 0.8 1.0
time t time t time t

Adaptive mesh p=0.5 Adaptive mesh p=0.67 Adaptive mesh p=0.75


10−1
10−2
10−3
10−4
10−5
10−6
10−7
10−8
10−9
10−10
10−11
10−12 Δt{2} (ω) Δt{6} (ω)
10−13
10−14 Δt{4} (ω) Δt{8} (ω)
10−15
0.0 0.2 0.4 0.6 0.8 1.00.0 0.2 0.4 0.6 0.8 1.00.0 0.2 0.4 0.6 0.8 1.0
time t time t time t

Fig. 7 (Top) One MSE adaptive numerical realization of the SDE problem (46) at different mesh
hierarchy levels. The blow-up singularity point is located at ξ ≈ 0.288473 and the realizations
are computed for three singularity exponent values. We observe that as the exponent, p, increases,
the more jump at t = ξ becomes more pronounced. (Bottom) Corresponding MSE adaptive mesh
realizations for the different test cases

N−1   
 2 N(at + ax a)2 Δtn2 + (bx b)2 (tn , X tn ; ξ ) 2
E X T − XT  ≤ E ϕx,n
2
Δtn .
n=0
2

This is the error expansion we use for the adaptive mesh refinement (in Algorithm 1)
in this example. In Fig. 7, we illustrate the effect that the singularity exponent, p, has
on SDE and adaptive mesh realizations.
Implementation Details and Observations
Computational tests for the uniform and MSE adaptive MLMC algorithms are imple-
mented with the input parameters summarized in Table 2. The weak convergence
rate, α, which is needed in the MLMC implementations’ stopping criterion (44), is
estimated experimentally as α(p) = (1 − p) when using the Euler–Maruyama inte-
grator with uniform time steps, and roughly α = 1 when using the Euler–Maruyama
integrator with adaptive time steps, (cf. Fig. 8). We further estimate the variance con-
vergence rate to β(p) = 2(1 − p), when using uniform time-stepping, and roughly
62 H. Hoel et al.

p = 0.5, TOL = 10−3 p = 0.67, TOL = 10−2 p = 0.75, TOL = 10−1.5


100

10−1

10−2

10−3
A(g ; M )
10−4 A(Δ g; M )
c2−
10−5
2 4 6 8 10 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10
Level  Level  Level 

p = 0.5, TOL = 10−3 p = 0.67, TOL = 10−2 p = 0.75, TOL = 10−1

100

10−1

10−2

A(g ; M )
10−3
A(Δ g; M )
−4
c2 TOL−0.5 c2−0.33 c2−0.25
10
5 10 15 20 5 10 15 20 25 5 10 15 20 25
Level  Level  Level 


Fig. 8 (Top) Average errors E Δ g  for Example 6 solved with the MSE adaptive MLMC algo-
rithm for three singularity exponent values. (Bottom) Corresponding average errors for the uniform
MLMC algorithm

to β = 1 when using MSE adaptive time-stepping, (cf. Fig. 9). The low weak con-
vergence rate for uniform MLMC implies that the number of levels L in the MLMC
estimator will be become very large, even with fairly high tolerances. Since compu-
tations of realizations on high levels are extremely costly, we have, for the sake of
computational feasibility, chosen a very low value, M , = 20, for the initial number
of samples in both MLMC algorithms. The respective estimators’ use of samples,
M , (cf. Fig. 10), shows that the low number of initial samples is not strictly needed
for the the adaptive MLMC algorithm, but for the sake of fair comparisons, we have
chosen to use the same parameter values in both algorithms.
From the rate estimates of α and β, we predict the computational cost of reaching
the approximation goal (37) for the respective MLMC algorithms to be
   
Costadp (AML ) = O log(TOL)4 TOL−2 and Costunf (AML ) = O TOL− 1−p ,
1
Construction of a Mean Square Error Adaptive … 63

p = 0.5, TOL = 10−3 p = 0.67, TOL = 10−2 p = 0.75, TOL = 10−1.5


101

100

10−1

10−2

10−3

10−4
V(g ; M )
V(Δ g; M )
−5
10
c2−
10−6
2 4 6 8 10 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10
Level  Level  Level 

p = 0.5, TOL = 10−3 p = 0.67, TOL = 10−2 p = 0.75, TOL = 10−1


101
100
10−1
10−2
10−3
10−4
10−5 V(g ; M )
10−6 c2−1 V(Δ g; M ) c2 TOL−0.5
−7
10 c2−0.67

10−8
5 10 15 20 5 10 15 20 25 5 10 15 20 25
Level  Level  Level 

Fig. 9 (Top) Variances Var(Δ g) for for Example 6 solved with the MSE adaptive MLMC algorithm
for three singularity exponent values. (Bottom) Corresponding variances for the uniform MLMC
algorithm. The more noisy data on the highest levels is due to the low number used for the initial
samples, M̂ = 20, and only a subset of the generated 100 multilevel estimator realizations reached
the last levels

by using the estimate (45) and Theorem 1 respectively. These predictions fit well
with the observed computational runtime for the respective MLMC algorithms,
(cf. Fig. 11). Lastly, we observe that the numerical results are consistent with both
algorithms fulfilling the goal (37) in Fig. 12.
Computer Implementation
The computer code for all algorithms was written in Java and used the “Stochastic
Simulation in Java” library to sample the random variables in parallel from thread-
independent MRG32k3a pseudo random number generators, [24]. The experiments
were run on multiple threads on Intel Xeon(R) CPU X5650, 2.67GHz processors
and the computer graphics were made using the open source plotting library Mat-
plotlib, [18].
64 H. Hoel et al.

p = 0.5 p = 0.67 p = 0.75


109

108 TOL = 10−2.11 TOL = 10−1.56 TOL = 10−1.06


TOL = 10−3 TOL = 10−2 TOL = 10−1.5
107
c2−
106
E[M ]

105

104

103

102

101
2 4 6 8 10 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10
Level  Level  Level 

p = 0.5 p = 0.67 p = 0.75


108
−2.11 −1.56
TOL = 10 TOL = 10 TOL = 10−1.06
107
TOL = 10−3 TOL = 10−2 TOL = 10−1.5
106 c2−1 c2−0.83 c2−0.75
105
E[M ]

104

103

102

101
5 10 15 20 5 10 15 20 25 5 10 15 20 25
Level  Level  Level 

Fig. 10 (Top) Average number of samples M for for Example 6 solved with the MSE adaptive
MLMC algorithm for three singularity exponent values. (Bottom) Corresponding average number of
samples for the uniform MLMC algorithm. The plotted decay rate reference lines, c2−((β(p)+1)/2) ,
for M follow implicitly from Eq. (43) (assuming that β(p) = 2(1 − p) is the correct variance decay
rate)

104 Adaptive MLMC c2 TOL−3 c2 TOL−4


c1 TOL−2
3
10
Uniform MLMC
Runtime [s]

102

101

100

10−1

10−2
101 102 103 101 102 100.50 101 101.50
TOL−1 TOL−1 TOL−1

Fig. 11 Average runtime versus TOL−1 for the two MLMC algorithms for three singularity expo-
nent values in Example 6
Construction of a Mean Square Error Adaptive … 65

|E[g(XT )] − AML (ωi ; TOL, δ)| p=0.5 p=0.67 p=0.75


TOL
100 3 1 2
5 3 Adaptive MLMC
0 1 0 0 1 0
10−1 1 2 1 1 0 2 1 0
0 0 0 1
0 1 0
10−2 0 0
0 0
10−3

10−4

10−5
101 102 103 101 102 100.50 101 101.50
TOL−1 TOL−1 TOL−1

p=0.5 p=0.67 p=0.75


|E[g(XT )] − AML (ωi ; TOL, δ)|

TOL
100 0 1 0 1 0 0 0
Uniform MLMC
7 7 2 0 3 5 0 0 0
10−1 8 4 2 4 1 2
4 6 0 2
10−2 2 4
5 3
10−3

10−4

10−5
101 102 103 101 102 100.50 101
TOL−1 TOL−1 TOL−1

Fig. 12 Approximation errors for both of the MLMC algorithms solving Example 6. At every
TOL value, circles and triangles represent the errors from 100 independent multilevel estimator
realizations of the respective algorithms

5 Conclusion

We have developed an a posteriori, MSE adaptive Euler–Maruyama time-stepping


algorithm and incorporated it into an MSE adaptive MLMC algorithm. The MSE
error expansion presented in Theorem 2 is fundamental to the adaptive algorithm.
Numerical tests have shown that MSE adaptive time-stepping may outperform uni-
form time-stepping, both in the single-level MC setting and in the MLMC setting,
(Examples 4 and 6). Due to the complexities of implementing adaptive time-stepping,
the numerical examples in this work were restricted to quite simple, low-regularity
SDE problems with singularities in the temporal coordinate. In the future, we aim
to study SDE problems with low-regularity in the state coordinate (preliminary tests
and analysis do however indicate that then some ad hoc molding of the adaptive
algorithm is required).
Although a posteriori adaptivity has proven to be a very effective method for
deterministic differential equations, the use of information from the future of the
numerical solution of the dual problem makes it a somewhat unnatural method to
extend to Itô SDE: It can result in numerical solutions that are not Ft -adapted,
which consequently may introduce a bias in the numerical solutions. [7] provides
an example of a failing adaptive algorithm for SDE. A rigorous analysis of the
convergence properties of our developed MSE adaptive algorithm would strengthen
the theoretical basis of the algorithm further. We leave this for future work.
66 H. Hoel et al.

Acknowledgments This work was supported by King Abdullah University of Science and Technol-
ogy (KAUST); by Norges Forskningsråd, research project 214495 LIQCRY; and by the University
of Texas, Austin Subcontract (Project Number 024550, Center for Predictive Computational Sci-
ence). The first author was and the third author is a member of the Strategic Research Initiative on
Uncertainty Quantification in Computational Science and Engineering at KAUST (SRI-UQ). The
authors would like to thank Arturo Kohatsu-Higa for his helpful suggestions for improvements in
the proof of Theorem 2.

Theoretical Results

Error Expansion for the MSE in 1D

In this section, we derive a leading-order error expansion for the MSE (12) in the 1D
setting when the drift and diffusion coefficients are respectively mappings of the form
a : [0, T ] × R → R and b : [0, T ] × R → R. We begin by deriving a representation
of the MSE in terms of products of local errors and weights.
Recalling the definition of the flow map, ϕ(x, t) := g(XTx,t ), and the first variation
of the flow map and the path itself given in Sect. 2.1.1, we use the Mean Value
Theorem to deduce that
 
g(XT) − g X T = ϕ(0, x0 ) − ϕ(0, X T )

N−1
= ϕ(tn , X tn ) − ϕ(tn+1 , X tn+1 )
n=0


N−1   (48)
X tn ,tn
= ϕ tn+1 , Xtn+1 − ϕ(tn+1 , X tn+1 )
n=0


N−1
 
= ϕx tn+1 , X tn+1 + sn Δen Δen ,
n=0

X ,t
where the local error is given by Δen := Xtn+1
tn n
− X tn+1 and sn ∈ [0, 1]. Itô expansion
of the local error gives the following representation:
 tn+1  tn+1
X tn ,tn X ,t
Δen = a(t, Xt ) − a(tn , X tn ) dt + b(t, Xt tn n ) − b(tn , X tn ) dWt
t tn
n     
Δan Δbn
 tn+1  t  tn+1  t
axx 2 X ,t X ,t
= (at + ax a + b )(s, Xs tn n ) ds dt + (ax b)(s, Xs tn n ) dWs dt
t tn 2 tn tn
n     
|n
=:Δa 8n
=:Δa
 tn+1  t  tn+1  t
bxx 2 X ,t X ,t
+ (bt + bx a + b )(s, Xs tn n )ds dWt + (bx b)(s, Xs tn n )dWs dWt .
t tn 2 tn tn
n     
|n
=:Δb 8n
=:Δb
(49)
Construction of a Mean Square Error Adaptive … 67

By Eq. (48) we may express the MSE by the following squared sum
⎡⎛ ⎞2 ⎤
  2 
Ň−1

E g(XT ) − g X T = E ⎣⎝ ϕx tn+1 , X tn+1 + sn Δen Δen ⎠ ⎦
n=0


Ň−1
   

= E ϕx tk+1 , X tk+1 + sk Δek ϕx tn+1 , X tn+1 + sn Δen Δek Δen .


n,k=0

This is the first step in deriving the error expansion in Theorem 2. The remaining
steps follow in the proof below.
Proof of Theorem 2. The main tools used in proving this theorem are Taylor and
Itô–Taylor expansions, Itô isometry, and truncation of higher order terms. For errors
attributed to the leading-order local error term, Δb 8 n , (cf. Eq. (49)), we do detailed
calculations, and the remainder is bounded by stated higher order terms.
We begin by noting that under the assumptions in Theorem 2 Lemmas 1 and 2
respectively verify then the existence and uniqueness of the solution of the SDE X
and the numerical solution X, and provide higher order moment bounds for both.
Furthermore, due to the assumption of the mesh points being stopping times for
which tn is Ftn−1 -measurable for all n, it follows also that the numerical solution is
adapted to the filtration, i.e., X tn is Ftn -measurable for all n.
We further need to extend the flow map and the first variation notation from
x,tk
Sect. 2.1.1. Let X tn for n ≥ k denote the numerical solution of the Euler–Maruyama
scheme
x,tk x,tk x,tk x,tk
X tj+1 = X tj + a(tj , X tj )Δtj + b(tj , X tj )ΔWj , j ≥ k, (50)

x,tk x,tk
with initial condition Xtk = x. The first variation of X tn is defined by ∂x X tn . Provided

that E |x|2p < ∞ for all p ∈ N, x is Ftk -measurable and provided the assumptions
of Lemma 2 hold, it is straightforward to extend the proof of the lemma to verify
x,tk x,tk
that (X , ∂x X ) converges strongly to (X x,tk , ∂x X x,tk ) for t ∈ [tk , T ],
#  $
 2p  1/2p
 x,tk k
max E X tn − Xtx,t
n
 ≤ C Ň −1/2 , ∀p ∈ N
k≤n≤Ň
#  $
 2p  1/2p
 x,tk x,tk 
max E ∂x X tn − ∂x Xtn  ≤ C Ň −1/2 , ∀p ∈ N
k≤n≤Ň

and        
 x,tk 2p  x,tk 2p
max max E X tn  , E ∂x X tn  < ∞, ∀p ∈ N. (51)
k≤n≤Ň
68 H. Hoel et al.

In addition to this, we will also make use of moment bounds for the second and
third variation of the flow map in the proof, i.e., ϕxx (t, x) and ϕxxx (t, x). The second
variation is described in Section “Variations of the flow map”,
where it is shown in
Lemma 3 that provided that x is Ft -measurable and E |x|2p < ∞ for all p ∈ N, then




max E |ϕxx (t, x)|2p , E |ϕxxx (t, x)|2p , E |ϕxxxx (t, x)|2p < ∞, ∀p ∈ N.

Considering the MSE error contribution from the leading order local error terms
8 n , i.e.,
Δb
   

8 k Δb
E ϕx tk+1 , X tk+1 + sk Δek ϕx tn+1 , X tn+1 + sn Δen Δb 8n , (52)

we have for k = n,
     2
E ϕx tn+1 , X tn+1 + ϕxx tn+1 , X tn+1 + ŝn Δen sn Δen Δb8 2n
 2  
= E ϕx tn+1 , X tn+1 Δb8 2n + o Δtn2 .

 
The above o Δtn2 follows from Young’s and Hölder’s inequalities,
   
8 2n
E 2ϕx tn+1 , X tn+1 ϕxx tn+1 , X tn+1 + ŝn Δen sn Δen Δb
#  $
    2 3 8 4n
Δe2n Δb
≤ C E ϕx tn+1 , X tn+1 ϕxx tn+1 , X tn+1 + ŝn Δen Δtn + E
Δtn3

    2  3
≤ C E E ϕx tn+1 , X tn+1 ϕxx tn+1 , X n+1 + ŝn Δen Ftn Δtn
   2 4  2 4  6
| 2n Δb
Δa 8 4n 8n
8 n Δb
Δa | Δb
Δb 8 8n
Δb
+E + E + E n n
+ E
Δtn3 Δtn3 Δtn3 Δtn3
! 3   3 

1 
≤ C E Δtn3 + E E Δa | 4n |Ftn 1 + E E Δa 8 4n |Ftn
Δtn Δtn
3   3  3
4 1   1 "
+ E E Δb | |Ft 1 + E E Δb 8 4
|F t E E 8
Δb
8
|F t
n n
Δtn n n
Δtn n n
Δtn5

= E o(Δtn2 )
(53)
where the last inequality is derived by applying the moment bounds for multiple
Itô integrals described in [22, Lemma 5.7.5] and under the assumptions (R.1), (R.2),
(M.1), (M.2) and (M.3). This yields
Construction of a Mean Square Error Adaptive … 69
 
 axx 2 4 
| 4n |Ftn ≤ CE  X tn ,tn 
E Δa sup at + ax a + b  (s, Xs )  Ftn Δtn8 ,
s∈[tn ,tn+1 ) 2
 

8 n |Ftn ≤ CE
4 ,t 
E Δa sup |ax b| (s, Xs tn n )  Ftn Δtn6 ,
4 X
s∈[tn ,tn+1 )
   
4  bxx 2 4 
| |Ft ≤ CE  X tn ,tn 
E Δb sup bt + bx a + b (s, Xs )  Ftn Δtn6 , (54)
n n
s∈[tn ,tn+1 ) 2 
 
4 
8 n |Ftn ≤ CE ,t 
E Δb sup |bx b| (s, Xs tn n )  Ftn Δtn4 ,
4 X
s∈[tn ,tn+1 )
 
8 
8 n |Ftn ≤ CE 
E Δb sup |bx b| 8
(s, XsX tn ,tn )  Ftn Δtn8 .
s∈[tn ,tn+1 )

And by similar reasoning,


 2

8 2n ≤ CE Δtn4 .
E ϕxx X tn+1 + ŝn Δen , tn+1 sn2 Δe2n Δb

For achieving independence between forward paths and dual solutions in the expec-
tations, an Itô–Taylor expansion of ϕx leads to the equality
 2    
E ϕx tn+1 , X tn+1 Δb8 2n = E ϕx tn+1 , X tn 2 Δb
8 2n + o Δtn2 .

Introducing the null set completed σ -algebra


 
,n = σ σ ({Ws }0≤s≤tn ) ∨ σ ({Ws − Wtn+1 }tn+1 ≤s≤T ) ∨ σ (X0 ),
F
 2
,n measurable by construction, (cf. [27, Appen-
we observe that ϕx tn+1 , X tn is F
dix B]). Moreover, by conditional expectation,
 2   2 n
E ϕx tn+1 , X tn Δb8 2n = E ϕx tn+1 , X tn 2 E Δb
8 n |F ,
 
 2 Δtn2  2
= E ϕx tn+1 , X tn (bx b) (tn , X tn )
2
+ o Δtn ,
2

where the last equality follows from using Itô’s formula,


 t # $
X ,t b2 2  X ,t
(bx b)2 (t, Xt tn n ) = (bx b)2 (tn , X tn ) + 2
∂t + a∂x + ∂x (bx b) (s, Xs tn n ) ds
tn 2
 t 
X ,t
+ b∂x (bx b)2 (s, Xs tn n ) dWs , t ∈ [tn , tn+1 ),
tn
70 H. Hoel et al.

to derive that
  
2 tn+1 t 2 
8 , 
E Δbn |F = E
n
(bx b)(s, XsX tn ,tn )dWs dWt X tn
tn tn

(bx b)2 (tn , X tn ) 2  


= Δtn + o Δtn2 .
2
 2
Here, the higher order o Δtn terms are bounded in a similar fashion as the terms in
inequality (53), by using [22, Lemma 5.7.5].
For the terms in (52) for which k < n, we will show that


Ň−1     
Ň−1 
E ϕx tk+1 , X tk+1 + sk Δek ϕx tn+1 , X tn+1 + sn Δen Δb 8n =
8 k Δb E o Δtn2 ,
k,n=0 n=0
(55)
which means that the contribution to the MSE from these terms is negligible to
leading order. For the use in later expansions, let us first observe by use of the chain
rule that for any Ftn -measurable y with bounded second moment,

ϕx (tk+1 , y) = g (XT
y,tk+1 y,tk+1
)∂x XT
y,tk+1
Xt +sm Δek ,tk+1 Xt ,tn+1
= g (XT k+1
y,t
)∂x XT n+1 ∂x Xtn+1k+1
 y,t  y,t
= ϕx tn+1 , Xtn+1k+1 ∂x Xtn+1k+1 ,

and that
Xt +sk Δek ,tk+1 Xt +sk Δek ,tk+1
∂x Xtn+1k+1 = ∂x Xtn k+1
 tn+1
X tk+1 +sk Δek ,tk+1 X tk+1 +sk Δek ,tk+1
+ ax (s, Xs )∂x Xs ds
tn
 tn+1
X tk+1 +sk Δek ,tk+1 X tk+1 +sk Δek ,tk+1
+ bx (s, Xs )∂x Xs dWs .
tn

We next introduce the σ -algebra

,k,n := σ ({Ws }0≤s≤t ) ∨ σ ({Ws − Wt }t ≤s≤t ) ∨ σ ({Ws − Wt }t ≤s≤T ) ∨ σ (X0 ),


F k k+1 k+1 n n+1 n+1

and Itô–Taylor expand the ϕx functions in (55) about center points that are F ,k,n -
measurable:

  X t +sk Δek ,tk+1 X t +sk Δek ,tk+1
ϕx tk+1 , X tk+1 + sk Δek = ϕx tn+1 , Xtn+1k+1 ∂x Xtn+1k+1
   
X tk ,tk+1 X tk ,tk+1 X t +sk Δek ,tk+1 X t ,tk+1
= ϕx tn+1 , Xtn + ϕxx tn+1 , Xtn Xtn+1k+1 − Xtn k
Construction of a Mean Square Error Adaptive … 71
 2
X t +sk Δek ,tk+1 X t ,tk+1
 Xtn+1k+1 − Xtn k
X t ,tk+1
+ ϕxxx tn+1 , Xtn k
2

X t ,tk+1 X t +sk Δek ,tk+1
+ ϕxxxx tn+1 , (1 − šn )Xtn k + šn Xtn+1k+1
Xt +sk Δek ,tk+1 X t ,tk+1 2 
(Xtn+1k+1 − Xtn k )
×
2

X t ,tk+1 X t ,tk+1
× ∂x Xtn k + ∂xx Xtn k (a(tk , X tk )Δtk + b(tk , X tk )ΔWk + sk Δek )

X t +s̀k (a(tk ,X tk )Δtk +(b(tk ,X tk )ΔWk +sk Δek ),tk+1


+ ∂xxx Xtn k
(a(tk , X tk )Δtk + b(tk , X tk )ΔWk + sk Δek )2
×
2
 tn+1
X tk+1 +sk Δek ,tk+1 X tk+1 +sk Δek ,tk+1
+ ax (s, Xs )∂x Xs ds
tn
 tn+1 
X t +sk Δek ,tk+1 X t +sk Δek ,tk+1
+ bx (s, Xs k+1 )∂x Xs k+1 dWs , (56)
tn

where
Xtk+1 +sk Δek ,tk+1 X t ,tk+1
Xtn+1 − Xtn k
 t  t
n+1 X tk+1 +sk Δek ,tk+1 n+1 X tk+1 +sk Δek ,tk+1
= a(s, Xs )ds + b(s, Xs )dWs
tn tn
X t +s̃k (a(tk ,X tk )Δtk +b(tk ,X tk )ΔWk +sk Δek ),tk+1
+ ∂x Xtn k (a(tk , X tk )Δtk + b(tk , X tk )ΔWk + sk Δek ),

and

  X t ,tk+1
ϕx tn+1 , X tn+1 + sn Δen = ϕx tn+1 , X tn k
 
X t ,tk+1 X k ,tk+1 Δνk,n
2
+ ϕxx tn+1 , X tn k Δνk,n + ϕxxx tn+1 , X n
2

X t ,tk+1 Δνk,n
3
+ ϕxxxx tn+1 , (1 − śn )X tn k + śn (X tn+1 + sn Δen ) , (57)
6

with

Δνk,n := a(tn , X tn )Δtn + b(tn , X tn )ΔWn + sn Δen


X t +ŝk (a(tk ,X tk )Δtk +b(tk ,X tk )ΔWk ),tk+1
+ ∂x X tn k (a(tk , X tk )Δtk + b(tk , X tk )ΔWk + sk Δek ).

Plugging the expansions (56) and (57) into the expectation


72 H. Hoel et al.
   

8 k Δb
E ϕx tk+1 , X k+1 + sk Δek ϕx tn+1 , X n+1 + sn Δen Δb 8n ,

the summands in the resulting expression that only contain products of the first
variations vanishes,
   
X t ,tk+1 X t ,tk+1 X t ,tk+1
E ϕx tn+1 , Xtn k ∂x Xtn k ϕx tn+1 , X tn k+1 8 k Δb
Δb 8n
   

= E E Δb 8 n Δb
8 k |F ,k,n ϕx tn+1 , XtX tk ,tk+1 ∂x XtX tk ,tk+1 ϕx tn+1 , X X tk ,tk+1 = 0.
n n tn

One can further deduce that all of the the summands in which the product of multiple
8 k and Δb
Itô integrals Δb 8 n are multiplied only with one additional Itô integral of
first-order vanish by using the fact that the inner product of the resulting multiple
Itô integrals is zero, cf. [22, Lemma 5.7.2], and by separating the first and second
variations from the Itô integrals by taking a conditional expectation with respect to
the suitable filtration. We illustrate this with a couple of examples,
   
X t ,tk+1 X t ,tk+1 X t ,tk+1
E ϕx tn+1 , Xtn k ∂xx Xtn k b(tk , X tk )ΔWk ϕx tn+1 , X tn k Δb 8n
8 k Δb
  
X tk ,tk+1 X tk ,tk+1 X t ,tk+1
= E ϕx tn+1 , Xtn ∂xx Xtn b(tk , X tk )ΔWk ϕx tn+1 , X tn k 8k
Δb


,n = 0,
8 n |F
× E Δb

and
   
X tk ,tk+1 X tk ,tk+1 X tk ,tk+1
E ϕx tn+1 , Xtn ∂x Xtn b(tn , X tn )ΔWn ϕx tn+1 , X tn 8 8
Δbk Δbn
   
X tk ,tk+1 X tk ,tk+1

= E ϕx tn+1 , Xtn+1 ϕx tn+1 , X tn 8 8 ,


Δbk b(tn , X tn )E Δbn ΔWn |F n
= 0.

From these observations, assumption (M.3), inequality (54), and, when necessary,
,k -
additional expansions of integrands to render the leading order integrand either F
,
or F -measurable and thereby sharpen the bounds (an example of such an expan-
n

sion is
 tn+1  t
8n =
Δb (bx b)(s, XsX tn ,tn )dWs dWt
tn tn
 tn+1  t  X t ,tk+1
X k
,tn
= (bx b) s, Xs tn dWs dWt + h.o.t.).
tn tn
Construction of a Mean Square Error Adaptive … 73

We derive after a laborious computation which we will not include here that
    

E ϕx tk+1 , X t 8n 
8 k Δb
+ sk Δek ϕx tn+1 , X tn+1 + sn Δen Δb
1
k+1

≤ C Ň −3/2 E Δtk2 E Δtn2 .

This further implies that


Ň−1
   

8 k Δb
E ϕx tk+1 , X tk+1 + sk Δek ϕx tn+1 , X tn+1 + sn Δen Δb 8n
k,n=0,k=n


Ň−1 1

≤ C Ň −3/2 E Δtk2 E Δtn2


k,n=0,k=n
⎛ ⎞2
1
Ň−1

≤ C Ň −3/2 ⎝ E Δtn2 ⎠
n=0


Ň−1

≤ C Ň −1/2 E Δtn2 ,
n=0

such that inequality (55) holds.


So far, we have shown that
⎡# $2 ⎤

N−1

E⎣ 8n ⎦
ϕx tn+1 , X tn+1 + sn Δen Δb
n=0
N−1 
 2 (bx b)2  
=E ϕx tn+1 , X tn (tn , X tn )Δtn2 + o Δtn2 . (58)
n=0
2

| n , Δa
The MSE contribution from the other local error terms, Δa | n , can also be
8 n and Δb
,
bounded using the above approach with Itô–Taylor expansions, F -conditioning
m,n

and Itô isometries. This yields that


   
E ϕx tk+1 , X tk+1 + sk Δek ϕx tn+1 , X tn+1 + sn Δen Δa | k Δa|n

     at + ax a + axx b2 /2 
= E ϕx X tk , tk ϕx tn , X tn (tk , X tk )× (59)
2
 a + a a + a b2 /2  
t x xx  
(tn , X tn )Δtk2 Δtn2 + o Δtk2 Δtn2 ,
2
74 H. Hoel et al.
   

E ϕx tk+1 , X tk+1 + sk Δek ϕx tn+1 , X tn+1 + sn Δen Δa 8n


8 k Δa
⎧    
⎨E ϕx tn , X t 2 (ax b)2 (tn , X t )Δt 3 + o Δt 3 , if k = n,
=  n

2


 
n n n
⎩O Ň −3/2 E Δt 3 E Δt 3 1/2 , if k = n,
k n

and
   
E ϕx tk+1 , X tk+1 + sk Δek ϕx tn+1 , X tn+1 + sn Δen Δb | k Δb
|n
⎧    
⎨E ϕx tn , X t 2 (bt +bx a+bxx b2 /2)2 (tn , X t )Δt 3 + o Δt 3 , if k = n,
=  n


3

 
n n n
⎩O Ň −3/2 E Δt 3 E Δt 3 1/2 , if k = n.
k n

Moreover, conservative bounds for error contributions involving products of different


| k Δb
local error terms, e.g., Δa 8 n , can be induced from the above bounds and Hölder’s
inequality. For example,
 ⎡ ⎤
 
 
Ň−1    
E ⎣ ϕ t , X + s Δe |
Δa ϕ t , X + s Δe 8
Δb ⎦
 x k+1 tk+1 k k k x n+1 tn+1 n n n 
 k,n=0 
 ⎡⎛ ⎞⎛ ⎞⎤
 
 
Ň−1  
Ň−1
 
= E ⎣⎝ ϕx tk+1 , X tk+1 + sk Δek Δa | k⎠ ⎝ 8 n ⎠⎦
ϕx tn+1 , X tn+1 + sn Δen Δb 
 k=0 k=0 
' ⎡
( ⎛
( ⎞2 ⎤
( ⎢ Ň−1  
(
≤ )E ⎣⎝ ϕx tk+1 , X t + sk Δek Δa | k⎠ ⎥ ⎦
k+1
k=0
' ⎡
( ⎛
( ⎞2 ⎤
( ⎢ Ň−1
 
×(
)E ⎣⎝ 8 n⎠ ⎥
ϕx tn+1 , X tn+1 + sn Δen Δb ⎦
n=0
⎛ ⎞

Ň−1
= O ⎝Ň −1/2 E Δtn2 ⎠ .
n=0

The proof is completed in two replacement


 steps
 applied
 to ϕx on the right-hand
side of equality (58). First, we replace ϕx tn+1 , X tn by ϕx tn , X tn . Under the regularity
assumed in this theorem, the replacement is possible without introducing additional
leading order error terms as
 
   
 X ,t X ,t X ,t X ,t 
E |ϕx tn+1 , X tn − ϕx tn , X tn | = E g (XT tn n+1 )∂x XT tn n+1 − g (XT tn n )∂x XT tn n 
 
 X ,t X ,t X ,t 
≤ E (g (XT tn n+1 ) − g (XT tn n ))∂x XT tn n+1 
 
 X ,t X ,t X ,t 
+ E g (XT tn n )(∂x XT tn n+1 − ∂x XT tn n )
 
= O Ň −1/2 .
Construction of a Mean Square Error Adaptive … 75

Here, the last equality follows from the assumptions (M.2), (M.3), (R.2), and (R.3),
and Lemmas 1 and 2,
  
 X ,t X ,t X ,t 
E  g (XT tn n+1 ) − g (XT tn n ) ∂x XT tn n+1 
' ⎡
(  2 ⎤ 
( 
( ⎣ X tn ,tn+1 X tn ,tn 
,tn+1 
 2 
)
Xtn+1
⎦  X tn ,tn+1 
≤ C E XT − XT  E ∂x XT 
 
⎛ ⎡ 4 ⎤⎞1/4
 X tn ,tn 
 (1−sn )X tn +sn Xtn+1 ,tn+1 
≤ C ⎝E ⎣∂x XT  ⎦⎠
 
#   4 $1/4
 tn+1 tn+1 
× E  a(s, XsX tn ,tn )ds + b(s, XsX tn ,tn )dWs 
tn tn
   1/4
≤ C E sup |a(s, XsX tn ,tn )|4 Δtn4 + sup |b(s, XsX tn ,tn )|4 Δtn2
tn ≤s≤tn+1 tn ≤s≤tn+1
 
= O Ň −1/2 ,

and that
' 
  ( 
(  2 
  X tn ,tn X tn ,tn  X ,t 
) ≤ C )E ∂x XT tn n+1 − ∂x XT tn n 
 X tn ,tn+1 X ,t
E g (XT )(∂x XT − ∂x XT

' ⎡
( 
(  2 ⎤
( 
X ,tn
Xt tn ,tn+1 
X ,t X tn ,tn  ⎦
= C )E ⎣∂x XT tn n+1 − ∂x XT n+1 ∂x Xtn+1 
 
'  
((  Xt tn ,tn+1 
X ,tn

≤ C )E ∂x XT tn n+1 − ∂x XT n+1


X ,t

 
' ⎡
( 
(   tn+1  tn+1 2 ⎤
( 
X ,tn
Xt tn ,tn+1 
X ,t X ,t  ⎦
+ )E ⎣∂x XT n+1 ax (s, Xs tn n )ds + bx (s, Xs tn n )dWs 
 tn tn 
' ⎡
( 
(   tn+1  tn+1 2 ⎤
( 
X tn ,tn
(1−ŝn )X tn +ŝn Xtn+1 ,tn+1 
X ,t X ,t  ⎦
≤ C )E ⎣∂xx XT ax (s, Xs tn n )ds + bx (s, Xs tn n )dWs 
 tn tn 
 
+ O Ň −1/2
 
= O Ň −1/2 .

 
The last step is to replace the first variation of the exact path ϕx tn , X tn with the
X t ,tn
first variation of the numerical solution ϕx,n = g (X T )∂x X T n . This is also possible
without introducing additional leading order error terms by the same assumptions
and similar bounding arguments as in the two preceding bounds as
76 H. Hoel et al.
 
  
 X t ,tn X ,t X ,t 
E ϕx,n − ϕx tn , X tn  = E g (X T )∂x X T n − g (XT tn n )∂x XT tn n 
     
 X t ,tn X ,t   X ,t   X ,t 
≤ E |g (X T )| ∂x X T n − ∂x XT tn n  + E g (X T ) − g (XT tn n ) ∂x XT tn n 
 
= O Ň −1/2 . 

Variations of the Flow Map

The proof of Theorem 2 relies on bounded moments of variations of order up to


four of the flow map ϕ. Furthermore, the error density depends explicitly on the
first variation. In this section, we we will verify that these variations are indeed well
defined random variables with all required moments bounded. First, we present the
proof of Lemma 1. Having proven Lemma 1, we proceed to present how essentially
the same technique can be used in an iterative fashion to prove the existence, pathwise
uniqueness and bounded moments of the higher order moments. The essentials of
this procedure are presented in Lemma 3.
First, let us define the following set of coupled SDE

(1) (1) (1)


dYu =a(u, Yu )du + b(u, Yu )dWu ,
(2) (1) (2) (1) (2)
dYu =ax (u, Yu )Yu du + bx (u, Yu )Yu dWu ,
  
(3) (1) (2) 2 (1) (3)
dYu = axx (u, Yu ) Yu + ax (u, Yu )Yu du
  
(1) (2) 2 (1) (3)
+ bxx (u, Yu ) Yu + bx (u, Yu )Yu dWu ,
  
(4) (1) (2) 3 (1) (2) (3) (1) (4)
dYu = axxx (u, Yu ) Yu + 3axx (u, Yu )Yu Yu + ax (u, Yu )Yu du
  
(1) (2) 3 (1) (2) (3) (1) (4)
+ bxxx (u, Yu ) Yu + 3bxx (u, Yu )Yu Yu + bx (u, Yu )Yu dWu ,
    
(5) (1) (2) 4 (1) (2) 2 (3)
dYu = axxxx (u, Yu ) Yu + 6axxx (u, Yu ) Yu Yu du
   
(1) (3) 2 (2) (4) (1) (5)
+ axx (u, Yu ) 3 Yu + 4Yu Yu + ax (u, Yu )Yu du
    
(1) (2) 4 (1) (2) 2 (3)
+ bxxxx (u, Yu ) Yu + 6bxxx (u, Yu ) Yu Yu dWu
   
(1) (3) 2 (2) (4) (1) (5)
+ bxx (u, Yu ) 3 Yu + 4Yu Yu + bx (u, Yu )Yu dWu ,
(60)
Construction of a Mean Square Error Adaptive … 77

defined for u ∈ (t, T ] with the initial condition Yt = (x, 1, 0, 0, 0). The first compo-
nent of the vector coincides with Eq. (13), whereas the second one is the first variation
of the path from Eq. (16). The last three components can be understood as the second,
third and fourth variations of the path, respectively.
Making use of the solution of SDE (60), we also define the second, third and
fourth variations as

ϕxx (t, x) = g (XTx,t )∂xx XTx,t + g (XTx,t )(∂x XTx,t )2 ,


ϕxxx (t, x) = g (XTx,t )∂xxx XTx,t + · · · + g (XTx,t )(∂x XTx,t )3 , (61)
 
ϕxxxx (t, x) = g (XTx,t )∂xxxx XTx,t + ··· + g (XTx,t )(∂x XTx,t )4 .

In the sequel, we prove that the solution to Eq. (60) when understood in the integral
sense that extends (13) is a well defined random variable with bounded moments.
Given sufficient differentiability of the payoff g, this results in the boundedness of
the higher order variations as required in Theorem 2.
Proof of Lemma 1. By writing (Ys(1) , Ys(2) ) := (Xsx,t , ∂x Xsx,t ), (13) and (16) together
form an SDE:
dYs(1) = a(s, Ys(1) )ds + b(s, Ys(1) )dWs
(62)
dYs(2) = ax (s, Ys(1) )Ys(2) ds + bx (s, Ys(1) )Ys(2) dWs

for s ∈ (t, T ] and with initial condition Yt = (x, 1). As before, ax stands for the
partial derivative of the drift function with respect to its spatial argument. We note
that (62) has such a structure that dynamics of Ys(2) depends on Ys(1) , that, in turn, is
independent of Ys(2) . By the Lipschitz continuity of a(s, Ys(1) ) and the linear growth
bound of the drift and diffusion coefficients a(s, Ys(1) ) and b(s, Ys(1) ), respectively,
there exists a pathwise unique solution of Ys(1) that satisfies
 
E sup |Ys(1) |2p < ∞, ∀p ∈ N,
s∈[t,T ]

(cf. [22, Theorems 4.5.3 and 4.5.4 and Exercise 4.5.5]). As a solution of an Itô SDE,
XTx,t is measurable with respect to FT it generates.
Note that Theorem [20, Theorem 5.2.5] establishes that the solutions of (62) are
pathwise unique. Kloeden and Platen [22, Theorems 4.5.3 and 4.5.4] note that the
existence and uniqueness theorems for SDEs they present can be modified in order
to account for looser regularity conditions, and the proof below is a case in point.
Our approach below follows closely presentation of Kloeden and Platen, in order to
prove the existence and moment bounds for Ys(2) .
(2)
Let us define Yu,n , n ∈ N by
 u  u
(2)
Yu,n+1 = ax (s, Ys(2) )Ys,n
(2)
ds + bx (s, Ys(2) )Ys,n
(2)
dWs ,
t t
78 H. Hoel et al.

(2)
with Yu,1 = 1, for all u ∈ [t, T ]. We then have, using Young’s inequality, that
    
2  2 
   u  u 
 (2) 2
E Yu,n+1  ≤ 2E  (2) 
+ 2E 
ax (s, Ys(1) )Ys,n ds bx (s, Ys )Ys,n dWs 
(1) (2)
t t
 u  2   u  2 
 (2)   (2) 
≤ 2(u − t)E ax (s, Ys(1) )Ys,n  ds + 2E bx (s, Ys(1) )Ys,n  ds .
t t

Boundedness of the partial derivatives of the drift and diffusion terms in (62) gives
     
 (2) 2
u  (2) 2 
E Yu,n+1  ≤ C(u − t + 1)E 1 + Ys,n  ds .
t

By induction, we consequently obtain that


 
sup E Yu,n
(2) 2
< ∞, ∀n ∈ N.
t≤u≤T

(2) (2) (2)


Now, set ΔYu,n = Yu,n+1 − Yu,n . Then
  2   2 
   u   u 
 (2) 2 (2) (2)
E ΔYu,n  ≤ 2E  ax (s, Ys(1) )ΔYs,n−1 ds + 2E  bx (s, Ys(1) )ΔYs,n−1 dWs 
t t

 u    u   
 (2) 2  (2) 2
≤ 2(u − t) E ax (s, Ys(1) )ΔYs,n−1  ds + 2 E bx (s, Ys(1) )ΔYs,n−1  ds
t t
 u   
 (2) 2
≤ C1 E ΔYs,n−1  ds.
t

Thus, by Grönwall’s inequality,

    
2 C1n−1 u
 (2) 2
E ΔY (2)  ≤ (u − s) n−1
E ΔYs,1  ds.
u,n
(n − 1)! t

  
 (2) 2
Next, let us show that E ΔYs,1  is bounded. First,

   2 
   u  u 
 (2) 2
E ΔYu,1  = E  (2)
ax (s, Ys(1) )Ys,2 ds + (3)
bx (s, Ys(1) )Yu,2 dWs 
t t
  
 (2) 2
≤ C(u − t + 1) sup E Ys,2  .
s∈[t,u]

Consequently, there exists a C ∈ R such that


  C n (u − t)n   C n (T − t)n
(2) 2
E ΔYu,n ≤ , (2) 2
sup E ΔYu,n ≤ .
n! u∈[t,T ] n!
Construction of a Mean Square Error Adaptive … 79

Define
 (2) 
Zn = sup ΔYu,n ,
t≤u≤T

and note that


  T 
 (2) (2) 
Zn ≤ ax (s, Ys(1) )Ys,n+1 − ax (s, Ys(1) )Ys,n  ds
t
 u 
 
+ sup  bx (s, Ys )Ys,n+1 − bx (s, Ys )Ys,n dWs  .
(1) (2) (1) (2)
t≤u≤T t

Using Doob’s and Schwartz’s inequalities, as well as the boundedness of ax and bx ,


  2 


T
(2) (2) 
E |Zn |2 ≤ 2(T − t) E ax (s, Ys(1) )Ys,n+1 − ax (s, Ys(1) )Ys,n  ds
t
 T  2 
 (2) (2) 
+8 E bx (s, Ys(1) )Ys,n+1 − bx (s, Ys(1) )Ys,n  ds
t
C n (T − t)n
≤ ,
n!

for some C ∈ R. Using the Markov inequality, we get

∞ ∞
  n4 C n (T − t)n
P Zn > n−2 ≤ .
n=1 n=1
n!

The right-hand side of the equation above converges by the ratio test, whereas the
Borel–Cantelli Lemma guarantees the (almost sure) existence of K ∗ ∈ N, such that
Zk < k 2 , ∀k > K ∗ . We conclude that Yu,n
(2)
converges uniformly in L 2 (P) to the limit
(2)
&∞ (2) (2)
Yu = n=1 ΔYu,n and that since {Yu,n }n is a sequence of continuous and Fu -adapted
processes, Yu(2) is also continuous and Fu -adapted. Furthermore, as n → ∞,
 u  u   u 
 (1) (3) (1) (3)   (3) (3) 
 a (s, Y )Y ds − a (s, Y )Y ds ≤ C Ys,n − Ys  ds → 0, a.s.,
 x s s,n x s s 
t t t

and, similarly,
  
 u u 
 bx (s, Ys(1) )Ys,n
(3)
dWs − bx (s, Ys(1) )Ys(3) dWs  → 0, a.s.

t t

This implies that (Yu(1) , Yu(2) ) is a solution to the SDE (62).


80 H. Hoel et al.

Having established that Yu(2) solves the relevant SDE and that it has a finite second
moment, we may follow the principles laid out in [22, Theorem 4.5.4] and show that
all even moments of
 u  u
Yu(2) = + ax (t, Ys(1) )Ys(2) ds + bx (t, Ys(1) )Ys(2) dWs
t t

are finite. By Itô’s Lemma, we get that for any even integer l,

 (3) l u  (2) l−2 (2)
Y  = Y  Y ax (s, Y (1) )Y (2) ds
u s s s s

t
l(l − 1)  (2) l−2 
u 2
+ Ys bx (s, Ys(1) )Ys(2) ds
2
t u
 (2) l−2 (2)  
+ Y  Y bx (s, Ys(1) )Ys(2) dWs .
s s
t

Taking expectations, the Itô integral vanishes,


  
l u  (2) l−2 (2)  
E Ys(2)  = E Y  Y
s ax (s, Y (1) )Y (2) ds
s s s
t
  l−2 
u l(l − 1) Ys(2)   (1)

(2) 2
+E bx (s, Ys )Ys ds .
t 2

Using Young’s inequality and exploiting the boundedness of ax , we have that


 
 u


E Y (2) l
≤C E |Y2,u |l ds
u
t
  l−2 
u l(l − 1) Ys(2)    (1)
 (2) 2
+E bx s, Ys Ys ds .
t 2

By the same treatment for the latter integral, using that bx is bounded,
 
 u  l

E Y (2) l
≤C E Yu(2)  ds.
u
t
 l
Thus, by Grönwall’s inequality, E Y (2)  < ∞. u 

Lemma 3 Assume that (R.1), (R.2), and (R.3) in Theorem



2 hold and that for any
fixed t ∈ [0, T ] and x is Ft -measurable such that E |x|2p < ∞ for all p ∈ N. Then,
Eq. (60) has pathwise unique solutions with finite moments. That is,
  2p
max sup E Yu(i)  < ∞, ∀p ∈ N.
i∈{1,2,...,5} u∈[t,T ]
Construction of a Mean Square Error Adaptive … 81

Furthermore, the higher variations as defined by Eq. (61) satisfy are FT -measurable
and for all p ∈ N,
?



@
max E |ϕx (t, x)|2p , E |ϕxx (t, x)|2p , E |ϕxxx (t, x)|2p , E |ϕxxxx (t, x)|2p < ∞.

Proof We note that (60) shares with (62) the triangular dependence structure. That
(j) 1
is, the truncated SDE for {Yu }dj=1 for d1 < 5 has drift and diffusion functions â :
(j)
[0, T ] × Rd1 → Rd1 and b̂ : [0, T ] × Rd1 → Rd1 ×d2 that do not depend on Yu for
j ≥ d1 .
This enables verifying existence of solutions for the SDE in stages: first for
(Y (1) , Y (2) ), thereafter for (Y (1) , Y (2) , Y (3) ), and so forth, proceeding iteratively to
add the next component Y (d1 +1) of the SDE. We shall also exploit this structure
for proving the result of bounded moments for each component. The starting point
for our proof is Lemma 1, which guarantees existence, uniqueness and the needed
moment bounds for the first two components Y (1) , and Y (2) . As one proceeds to Y (i) ,
i > 2, the relevant terms in (64) feature derivatives of a and b of increasingly high
order. The boundedness of these derivatives is guaranteed by assumption (R.1).
(3)
Defining a successive set of approximations Yu,n , n ∈ N by

(3)
u  2
Yu,n+1 = axx (s, Ys(1) ) Ys(2) + ax (s, Ys(2) )Ys,n
(3)
ds
t
 u
 2
+ bxx (s, Ys(1) ) Ys(2) + bx (s, Ys(2) )Ys,n
(3)
dWs ,
t

(3)
with the initial approximation defined by Yu,1 = 0, for all u ∈ [t, T ]. Let us denote by
 
u  2 u  2
Q= axx (s, Ys(1) ) Ys(1) ds + bxx (s, Ys(1) ) Ys(2) dWs (63)
t t

(3)
the terms that do not depend on the, highest order variation Yu,n . We then have, using
Young’s inequality, that
  
2  2 
  2
 u  u 
 (3) 2
E Yu,n+1  ≤ 3E |Q| + 3E  (3) 
ax (s, Ys(1) )Ys,n
+ 3E 
ds bx (s, Ys )Ys,n dWs 
(1) (3)
t t
 u  2   u  2 

 (3)   (3) 
≤ 3E |Q|2 + 3(u − t)E ax (s, Ys(1) )Ys,n  ds + 3E bx (s, Ys(1) )Ys,n  ds .
t t

The term Q is bounded by Lemma 1 and the remaining terms can be bounded by
the same methods as in the proof of 1. Using the same essential tools: Young’s
and Doob’s inequalities, Grönwall’s lemma, Markov inequality and Borel–Cantelli
(3)
Lemma, we can establish the existence of a limit to which Yu,n converges. This limit
(3)
is the solution of of Yu , and has bounded even moments through arguments that are
straightforward generalisations of those already presented in the proof of Lemma 1.
82 H. Hoel et al.

Exploiting the moment bounds of Yu(3) and the boundedness of derivatives of g,


we can establish the measurability of the second order variation ϕx (t, x). Repeating
the same arguments in an iterative fashion, we can establish the same properties for
Yu(4) and Yu(5) as well as ϕxx (t, x), ϕxxx (t, x), ϕxxxx (t, x). 

Error Expansion for the MSE in Multiple Dimensions

In this section, we extend the 1D MSE error expansion presented in Theorem 2 to


the multi-dimensional setting.
Consider the SDE

dXt = a (t, Xt ) dt + b (t, Xt ) dWt , t ∈ (0, T ]


(64)
X0 = x0 ,

where X : [0, T ] → Rd1 , W : [0, T ] → Rd2 , a : [0, T ] × Rd1 → Rd1 and


b : [0, T ] × Rd1 → Rd1 ×d2 . Let further xi denote the ith component of x ∈ Rd1 , a(i) ,
the ith component of a drift coefficient and b(i,j) and bT denote the (i, j)th element
and the transpose of the diffusion matrix b, respectively. (To avoid confusion, this
derivation does not make use of any MLMC notation, particularly not the multilevel
superscript ·{} .)
Using the Einstein summation convention to sum over repeated indices, but not
over the time index n, the 1D local error terms in Eq. (49) generalize into
 tn+1  t
| (i) 1
Δa n = at(i) + ax(i)j a(j) + ax(i)j xk (bbT )(j,k) ds dt,
tn tn 2
 tn+1  t
8 (i)
Δa n = ax(i)j b(j,k) dWs(k) dt,
tn tn
 tn+1  t
| (i) =
Δb
(i,j)
bt + bx(i,j)
1
a(k) + bx(i,j)
(j)
(bbT )(k,) ds dWt ,
n
tn tn
k
2 k x
 tn+1  t
8 (i)
Δb n = bx(i,j) b(k,) dWs() dWt ,
(j)
k
tn tn

where all the above integrand functions in all equations implicitly depend on the
X ,t X ,t
state argument Xs tn n . In flow notation, at(i) is shorthand for at(i) (s, Xs tn n ).
Under sufficient regularity, a tedious calculation similar to the proof of Theorem 2
verifies that, for a given smooth payoff, g : Rd1 → R,
N−1 
  2  2
E g(XT) − g X T ≤E ρ n Δtn + o Δtn ,
2

n=0
Construction of a Mean Square Error Adaptive … 83

where
1  (i,j)
ρ n := ϕxi ,n (bbT )(k,) (bxk bxT ) (tn , X tn )ϕxj ,n . (65)
2
In the multi-dimensional setting, the ith component of first variation of the flow map,
ϕx = (ϕx1 , ϕx2 , . . . , ϕxd1 ), is given by

y,t  y,t (j)


ϕxi (t, y) = gxj (XT )∂xi XT .

The first variation is defined as the second component to the solution of the SDE,
   
dYs(1,i) = a(i) s, Ys(1) ds + b(i,j) s, Ys(1) dWs(j)
   
dYs(2,i,j) = ax(i)k s, Ys(1) Ys(2,k,j) ds + bx(i,)
k
s, Ys(1) Ys(2,k,j) dWs() ,

where s ∈ (t, T ] and the initial conditions are given by Yt(1) = x ∈ Rd1 , Yt(2) = Id1 ,
with Id1 denoting the d1 × d1 identity matrix. Moreover, the extension of the numer-
ical method for solving the first variation of the 1D flow map (23) reads

ϕ xi ,n = cx(j)i (tn , X tn )ϕ xj ,n+1 , n = N − 1, N − 2, . . . 0. (66)


ϕ xi ,N = gxi (X T ),

with the jth component of c : [0, T ] × Rd1 → Rd1 defined by


  (j)
c(j) tn , X tn = X tn + a(j) (tn , X tn )Δtn + b(j,k) (tn , X tn )ΔWn(k) .

Let U and V denote subsets of Euclidean spaces and let us introduce the
& ν = (ν1 , ν2 , . . . , νd ) to represent spatial
multi-index 7 partial derivatives of order
|ν| := dj=1 νj on the following short form ∂xν := dj=1 ∂xνj . We further introduce
the following function spaces.

C(U; V ) := {f : U → V | f is continuous},
Cb (U; V ) := {f : U → V | f is continuous and bounded},
 dj
Cbk (U; V ) := f : U → V | f ∈ C(U; V ) and j f ∈ Cb (U; V )
 dx
for all integers 1 ≤ j ≤ k ,

Cbk1 ,k2 ([0, T ] × U; V ) := f : [0, T ] × U → V | f ∈ C([0, T ] × U; V ), and

j
∂t ∂ν f ∈ Cb ([0, T ] × U; V ) for all integers j ≤ k1 and 1 ≤ j + |ν| ≤ k2 .

Theorem 3 (MSE leading order error expansion in the multi-dimensional setting)


Assume that drift and diffusion coefficients and input data of the SDE (64) fulfill
84 H. Hoel et al.

(R.1) a ∈ Cb2,4 ([0, T ] × Rd1 ; Rd1 ) and b ∈ Cb2,4 ([0, T ] × Rd1 ; Rd1 ×d2 ),
(R.2) there exists a constant C > 0 such that

|a(t, x)|2 + |b(t, x)|2 ≤ C(1 + |x|2 ), ∀x ∈ Rd1 and ∀t ∈ [0, T ],

(R.3) g ∈ Cb4 (Rd1 ),


(R.4) for the initial data, X0 is F0 -measurable and E[|X0 |p ] < ∞ for all p ≥ 1.
Assume further the mesh points 0 = t0 < t1 < · · · < tN = T
(M.1) are stopping times such that tn is Ftn−1 -measurable for n = 1, 2, . . . , N,
(M.2) there exists Ň ∈ N, and a c1 > 0 such that c1 Ň ≤ inf ω∈Ω N(ω) and supω∈Ω
N(ω) ≤ Ň holds for each realization. Furthermore, there exists a c2 > 0 such
that supω∈Ω maxn∈{0,1,...,N−1} Δtn (ω) < c2 Ň −1 ,
(M.3) and there exists a c3 > 0 such that for all p∈[1, 8] and n∈{0, 1, . . . , Ň − 1},


p
E Δtn2p ≤ c3 E Δtn2 .

Then, as Ň increases,
  2
E g(XT ) − g X T
⎡   (i,j)  ⎤
ϕxi (bbT )(k,) (bxk bxT )
N−1 ϕxj (tn , X tn )
= E⎣ Δtn2 + o(Δtn2 )⎦,
n=0
2

where we have dropped the arguments of the first variation as well as the diffusion
matrices for clarity.  
Replacing the first variation ϕxi tn , X n by the numerical approximation ϕxi ,n ,
as defined in (66) and using the error density notation ρ from (65), we obtain the
following to leading order all-terms-computable error expansion:
N−1 
  2
E g(XT ) − g X T =E ρ n Δtn + o(Δtn ) .
2 2
(67)
n=0

A Uniform Time Step MLMC Algorithm

The uniform time step MLMC algorithm for MSE approximations of SDE was
proposed in [8]. Below, we present the version of that method that we use in the
numerical tests in this work for reaching the approximation goal (2).
Construction of a Mean Square Error Adaptive … 85

Algorithm 5 mlmcEstimator
Input: TOL_T, TOL_S, confidence δ, input mesh Δt^{−1}, input mesh intervals N_{−1}, initial number of samples M̂, weak convergence rate α, SDE problem.
Output: Multilevel estimator A_{ML}.
Compute the confidence parameter C_C(δ) by (42).
Set L = −1.
while L < 3 or (44), using the input rate α, is violated do
  Set L = L + 1.
  Set M_L = M̂ and generate a set of (Euler–Maruyama) realizations {Δ_L g(ω_{i,L})}_{i=1}^{M_L} on mesh and Wiener path pairs (Δt^{L−1}, Δt^{L}) and (W^{L−1}, W^{L}), where the uniform mesh pairs have step sizes Δt^{L−1} = T/N_{L−1} and Δt^{L} = T/N_L, respectively.
  for ℓ = 0 to L do
    Compute the sample variance V(Δ_ℓ g; M_ℓ).
  end for
  for ℓ = 0 to L do
    Determine the number of samples by

      M_ℓ = ⌈ (C_C²(δ)/TOL_S²) · sqrt(Var(Δ_ℓ g)/N_ℓ) · Σ_{ℓ'=0}^{L} sqrt(N_{ℓ'} Var(Δ_{ℓ'} g)) ⌉.

    (The equation for M_ℓ is derived by Lagrangian optimization, cf. Sect. 3.2.1.)
    if the new value of M_ℓ is larger than the old value then
      Compute additional (Euler–Maruyama) realizations {Δ_ℓ g(ω_{i,ℓ})}_{i=M_ℓ+1}^{M_{ℓ,new}} on mesh and Wiener path pairs (Δt^{ℓ−1}, Δt^{ℓ}) and (W^{ℓ−1}, W^{ℓ}), where the uniform mesh pairs have step sizes Δt^{ℓ−1} = T/(2^{ℓ} N_{−1}) and Δt^{ℓ} = T/(2^{ℓ+1} N_{−1}), respectively.
    end if
  end for
end while
Compute A_{ML} using the generated samples by the formula (7).
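For readers who prefer to see the mechanics in code, the following is a minimal Python sketch of a fixed-depth, uniform-time-step MLMC estimator in the spirit of Algorithm 5. The SDE (a geometric Brownian motion), the payoff, the tolerance TOL_S and the confidence constant C_C below are illustrative stand-ins rather than the paper's settings, and the stopping test (44) on the weak error and the bias control via TOL_T are omitted; only the Lagrangian-optimal sample allocation is reproduced.

import numpy as np

def euler_path(a, b, x0, T, dW):
    """Euler-Maruyama path on the uniform mesh implied by the increments dW."""
    N = len(dW)
    dt = T / N
    x, t = x0, 0.0
    for n in range(N):
        x = x + a(t, x) * dt + b(t, x) * dW[n]
        t += dt
    return x

def level_sample(g, a, b, x0, T, level, N0, rng):
    """One sample of Delta_l g = g(X^l_T) - g(X^{l-1}_T) on a coupled mesh pair."""
    N_fine = N0 * 2 ** level
    dW = rng.normal(0.0, np.sqrt(T / N_fine), N_fine)
    fine = g(euler_path(a, b, x0, T, dW))
    if level == 0:
        return fine
    coarse = g(euler_path(a, b, x0, T, dW.reshape(-1, 2).sum(axis=1)))
    return fine - coarse

def mlmc(g, a, b, x0, T, L, tol_s, N0=4, M_init=200, C_C=2.0, seed=0):
    """Fixed maximal level L; pilot samples give the level variances."""
    rng = np.random.default_rng(seed)
    samples = [np.array([level_sample(g, a, b, x0, T, l, N0, rng)
                         for _ in range(M_init)]) for l in range(L + 1)]
    var = np.array([s.var(ddof=1) for s in samples])
    N = np.array([N0 * 2 ** l for l in range(L + 1)], dtype=float)
    # Lagrangian-optimal sample allocation, cf. the formula in Algorithm 5
    M = np.ceil(C_C ** 2 / tol_s ** 2 * np.sqrt(var / N)
                * np.sum(np.sqrt(var * N))).astype(int)
    for l in range(L + 1):
        extra = M[l] - len(samples[l])
        if extra > 0:
            new = [level_sample(g, a, b, x0, T, l, N0, rng) for _ in range(extra)]
            samples[l] = np.concatenate([samples[l], new])
    return sum(s.mean() for s in samples)   # multilevel estimator A_ML

if __name__ == "__main__":
    # Illustrative problem: geometric Brownian motion with payoff max(x - 1, 0)
    a = lambda t, x: 0.05 * x
    b = lambda t, x: 0.2 * x
    print(mlmc(lambda x: max(x - 1.0, 0.0), a, b, x0=1.0, T=1.0, L=4, tol_s=0.01))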

References

1. Avikainen, R.: On irregular functionals of SDEs and the Euler scheme. Financ. Stoch. 13(3),
381–401 (2009)
2. Bangerth, W., Rannacher, R.: Adaptive Finite Element Methods for Differential Equations.
Lectures in Mathematics ETH Zürich. Birkhäuser, Basel (2003)
3. Barth, A., Lang, A.: Multilevel Monte Carlo method with applications to stochastic partial
differential equations. Int. J. Comput. Math. 89(18), 2479–2498 (2012)
4. Cliffe, K.A., Giles, M.B., Scheichl, R., Teckentrup, A.L.: Multilevel Monte Carlo methods and
applications to elliptic PDEs with random coefficients. Comput. Vis. Sci. 14(1), 3–15 (2011)
5. Collier, N., Haji-Ali, A.-L., Nobile, F., von Schwerin, E., Tempone, R.: A continuation multilevel Monte Carlo algorithm. BIT Numer. Math. 55(2), 399–432 (2014)
6. Durrett, R.: Probability: Theory and Examples, 2nd edn. Duxbury Press, Belmont (1996)
7. Gaines, J.G., Lyons, T.J.: Variable step size control in the numerical solution of stochastic
differential equations. SIAM J. Appl. Math. 57, 1455–1484 (1997)
8. Giles, M.B.: Multilevel Monte Carlo path simulation. Oper. Res. 56(3), 607–617 (2008)
9. Giles, M.B.: Multilevel Monte Carlo methods. Acta Numerica 24, 259–328 (2015)
10. Giles, M.B., Szpruch, L.: Antithetic multilevel Monte Carlo estimation for multi-dimensional
SDEs without Lévy area simulation. Ann. Appl. Probab. 24(4), 1585–1620 (2014)

11. Gillespie, D.T.: The chemical Langevin equation. J. Chem. Phys. 113(1), 297–306 (2000)
12. Glasserman, P.: Monte Carlo Methods in Financial Engineering. Applications of Mathematics
(New York), vol. 53. Springer, New York (2004). Stochastic Modelling and Applied Probability
13. Haji-Ali, A.-L., Nobile, F., von Schwerin, E., Tempone, R.: Optimization of mesh hierarchies
in multilevel Monte Carlo samplers. Stoch. Partial Differ. Equ. Anal. Comput. 1–37 (2015)
14. Heinrich, S.: Monte Carlo complexity of global solution of integral equations. J. Complex.
14(2), 151–175 (1998)
15. Heinrich, S., Sindambiwe, E.: Monte Carlo complexity of parametric integration. J. Complex.
15(3), 317–341 (1999)
16. Hoel, H., von Schwerin, E., Szepessy, A., Tempone, R.: Implementation and analysis of an
adaptive multilevel Monte Carlo algorithm. Monte Carlo Methods Appl. 20(1), 1–41 (2014)
17. Hofmann, N., Müller-Gronbach, T., Ritter, K.: Optimal approximation of stochastic differential
equations by adaptive step-size control. Math. Comp. 69(231), 1017–1034 (2000)
18. Hunter, J.D.: Matplotlib: a 2d graphics environment. Comput. Sci. Eng. 9(3), 90–95 (2007)
19. Ilie, S.: Variable time-stepping in the pathwise numerical solution of the chemical Langevin
equation. J. Chem. Phys. 137(23), 234110 (2012)
20. Karatzas, I., Shreve, S.E.: Brownian Motion and Stochastic Calculus. Graduate Texts in Math-
ematics, vol. 113, 2nd edn. Springer, New York (1991)
21. Kebaier, A.: Statistical Romberg extrapolation: a new variance reduction method and applica-
tions to option pricing. Ann. Appl. Probab. 15(4), 2681–2705 (2005)
22. Kloeden, P.E., Platen, E.: Numerical Solution of Stochastic Differential Equations. Applications
of Mathematics (New York). Springer, Berlin (1992)
23. Lamba, H., Mattingly, J.C., Stuart, A.M.: An adaptive Euler-Maruyama scheme for SDEs:
convergence and stability. IMA J. Numer. Anal. 27(3), 479–506 (2007)
24. L’Ecuyer, P., Buist, E.: Simulation in Java with SSJ. In: Proceedings of the 37th conference on
Winter simulation, WSC ’05, pages 611–620. Winter Simulation Conference (2005)
25. Milstein, G.N., Tretyakov, M.V.: Quasi-symplectic methods for Langevin-type equations. IMA
J. Numer. Anal. 23(4), 593–626 (2003)
26. Mishra, S., Schwab, C.: Sparse tensor multi-level Monte Carlo finite volume methods for
hyperbolic conservation laws with random initial data. Math. Comp. 81(280), 1979–2018
(2012)
27. Øksendal, B.: Stochastic Differential Equations. Universitext, 5th edn. Springer, Berlin (1998)
28. Platen, E., Heath, D.: A Benchmark Approach to Quantitative Finance. Springer Finance.
Springer, Berlin (2006)
29. Shreve, S.E.: Stochastic Calculus for Finance II. Springer Finance. Springer, New York (2004).
Continuous-time models
30. Skeel, R.D., Izaguirre, J.A.: An impulse integrator for Langevin dynamics. Mol. Phys. 100(24),
3885–3891 (2002)
31. Szepessy, A., Tempone, R., Zouraris, G.E.: Adaptive weak approximation of stochastic differ-
ential equations. Comm. Pure Appl. Math. 54(10), 1169–1214 (2001)
32. Talay, D.: Stochastic Hamiltonian systems: exponential convergence to the invariant measure,
and discretization by the implicit Euler scheme. Markov Process. Relat. Fields 8(2), 163–198
(2002). Inhomogeneous random systems (Cergy-Pontoise, 2001)
33. Talay, D., Tubaro, L.: Expansion of the global error for numerical schemes solving stochastic
differential equations. Stoch. Anal. Appl. 8(4), 483–509 (1990)
34. Teckentrup, A.L., Scheichl, R., Giles, M.B., Ullmann, E.: Further analysis of multilevel Monte
Carlo methods for elliptic PDEs with random coefficients. Numer. Math. 125(3), 569–600
(2013)
35. Yan, L.: The Euler scheme with irregular coefficients. Ann. Probab. 30(3), 1172–1194 (2002)
Vandermonde Nets and Vandermonde
Sequences

Roswitha Hofer and Harald Niederreiter

Abstract A new family of digital nets called Vandermonde nets was recently
introduced by the authors. We generalize the construction of Vandermonde nets
with a view to obtain digital nets that serve as stepping stones for new constructions
of digital sequences called Vandermonde sequences. Another new family of Van-
dermonde sequences is built from global function fields, and this family of digital
sequences has asymptotically optimal quality parameters for a fixed prime-power
base and increasing dimension.

Keywords Low-discrepancy point sets and sequences · (t, m, s)-nets · (t, s)-
sequences · Digital point sets and sequences

1 Introduction and Basic Definitions

Low-discrepancy point sets and sequences are basic ingredients of quasi-Monte Carlo
methods for numerical integration. The most powerful known methods for the con-
struction of low-discrepancy point sets and sequences are based on the theory of
(t, m, s)-nets and (t, s)-sequences, which are point sets, respectively sequences,
satisfying strong uniformity properties with regard to their distribution in the s-
dimensional unit cube [0, 1]s . Various methods for the construction of (t, m, s)-nets
and (t, s)-sequences have been developed, and we refer to the monograph [1] for an
excellent survey of these methods. We follow the recent handbook article [9] in the
notation and terminology. First we recall the definition of a (t, m, s)-net.

R. Hofer
Institute of Financial Mathematics and Applied Number Theory,
Johannes Kepler University Linz, Altenbergerstr. 69, 4040 Linz, Austria
e-mail: roswitha.hofer@jku.at
H. Niederreiter (B)
Johann Radon Institute for Computational and Applied Mathematics,
Austrian Academy of Sciences, Altenbergerstr. 69, 4040 Linz, Austria
e-mail: ghnied@gmail.com


Definition 1 Let b ≥ 2 and s ≥ 1 be integers and let t and m be integers with 0 ≤


t ≤ m. A (t, m, s)-net in base b is a point set P consisting of bm points in the s-
dimensional half-open unit cube [0, 1)s such that every subinterval J of [0, 1)s of
the form

J = ∏_{i=1}^{s} [a_i b^{−d_i}, (a_i + 1) b^{−d_i})

with integers di ≥ 0 and 0 ≤ ai < bdi for 1 ≤ i ≤ s and with volume bt−m contains
exactly bt points of P.

The number t is called the quality parameter of a (t, m, s)-net in base b and it
should be as small as possible in order to get strong uniformity properties of the net.
It was shown in [7] (see also [8, Theorem 4.10]) that in the nontrivial case m ≥ 1,
the star discrepancy D*_N(P) of a (t, m, s)-net P in base b with N = b^m satisfies

D*_N(P) ≤ B(b, s) b^t N^{−1} (log N)^{s−1} + O(b^t N^{−1} (log N)^{s−2}),   (1)

where B(b, s) and the implied constant in the Landau symbol depend only on b and
s. The currently best values of B(b, s) are due to Kritzer [6] for odd b and to Faure
and Kritzer [3] for even b.
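Definition 1 can be checked directly for small parameters by enumerating all elementary intervals of volume b^{t−m} and counting points. The following Python sketch does this and reports the smallest admissible t; the test point set (a two-dimensional Hammersley-type set in base 2) is only an illustration.

import numpy as np
from itertools import product

def is_net(points, b, m, s, t):
    """Check the defining property of a (t, m, s)-net in base b for the given t."""
    for d in product(range(m - t + 1), repeat=s):
        if sum(d) != m - t:                      # only intervals of volume b^{t-m}
            continue
        counts = {}
        for p in points:
            key = tuple(int(p[i] * b ** d[i]) for i in range(s))
            counts[key] = counts.get(key, 0) + 1
        if any(c != b ** t for c in counts.values()):
            return False
    return True

def quality_parameter(points, b, m, s):
    """Smallest t for which the point set is a (t, m, s)-net in base b."""
    for t in range(m + 1):
        if is_net(points, b, m, s, t):
            return t
    return m

if __name__ == "__main__":
    b, m = 2, 4
    n = np.arange(b ** m)
    # second coordinate: base-2 radical inverse (bit reversal)
    vdc = np.array([int(format(i, f"0{m}b")[::-1], 2) for i in n]) / b ** m
    pts = np.column_stack([n / b ** m, vdc])
    print("t =", quality_parameter(pts, b, m, s=2))   # this classical set gives t = 0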
Most of the known constructions of (t, m, s)-nets are based on the digital method
which was introduced in [7]. Although the digital method works for any base b ≥ 2,
we focus in the present paper on the case where b is a prime power. In line with
standard notation, we write q for a prime-power base. The construction of a digital
net over Fq proceeds as follows. Given a prime power q, a dimension s ≥ 1, and an
integer m ≥ 1, we let Fq be the finite field of order q and we choose m × m matrices
C (1) , . . . , C (s) over Fq . We write Z q = {0, 1, . . . , q − 1} ⊂ Z for the set of digits in
base q. Then we define the map Ψ_m : F_q^m → [0, 1) by

Ψ_m(b) = Σ_{j=1}^{m} ψ(b_j) q^{−j}

for any column vector b = (b1 , . . . , bm ) ∈ Fqm , where ψ : Fq → Z q is a chosen


bijection. With a fixed column vector b ∈ Fqm , we associate the point
 
Ψm (C (1) b ), . . . , Ψm (C (s) b ) ∈ [0, 1)s . (2)

By letting b range over all q m column vectors in Fqm , we arrive at a digital net
consisting of q m points in [0, 1)s .
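The digital construction itself is easy to reproduce for a prime base. The sketch below restricts to a prime p (so that arithmetic in F_p is integer arithmetic mod p) and takes ψ to be the identity on {0, . . . , p − 1}; the generating matrices in the example are arbitrary illustrative choices, not matrices with a guaranteed quality parameter.

import numpy as np

def digital_net(C, p):
    """Return the p^m points of the digital net over F_p with m x m generating
    matrices C = [C^(1), ..., C^(s)] (integer entries taken mod p)."""
    s, m = len(C), C[0].shape[0]
    radix = np.array([float(p) ** -(j + 1) for j in range(m)])   # q^{-1}, ..., q^{-m}
    points = np.empty((p ** m, s))
    for idx in range(p ** m):
        b = np.array([(idx // p ** j) % p for j in range(m)])    # digits of idx = column vector b
        for i in range(s):
            points[idx, i] = (C[i].dot(b) % p).dot(radix)        # Psi_m(C^(i) b) with psi = identity
    return points

if __name__ == "__main__":
    # p = 2, m = 3, s = 2; the identity matrix reproduces the van der Corput points
    # in the first coordinate, the second matrix is just an illustrative choice.
    C1 = np.eye(3, dtype=int)
    C2 = np.array([[1, 1, 1], [0, 1, 1], [0, 0, 1]])
    print(digital_net([C1, C2], 2))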

Definition 2 If the digital net P over Fq consisting of the q m points in (2) with b
ranging over Fqm is a (t, m, s)-net in base q for some value of t, then P is called a
digital (t, m, s)-net over Fq . The matrices C (1) , . . . , C (s) are the generating matrices
of P.

This construction of digital nets can be generalized somewhat by employing fur-


ther bijections between Fq and Z q (see [8, p. 63]), but this is not needed for our
purposes since our results depend only on the generating matrices of a given digital
net. Note that a digital net over Fq consisting of q m points in [0, 1)s is always a digital
(t, m, s)-net over Fq with t = m.
A new family of digital nets called Vandermonde nets was recently introduced
by the authors in [5]. In the present paper, we extend the results in [5] in several
directions. Most importantly, we show how to obtain not only new (t, m, s)-nets, but
also new (t, s)-sequences from our approach. It seems reasonable to give the name
Vandermonde sequences to these (t, s)-sequences.
The rest of the paper is organized as follows. In Sect. 2, we briefly review the con-
struction of digital nets in [5]. We generalize this construction in Sect. 3, as a prepa-
ration for the construction of Vandermonde sequences. Finally, the constructions of
new (t, s)-sequences and more generally of (T, s)-sequences called Vandermonde
sequences are presented in Sects. 4 and 5.

2 Vandermonde Nets via Extension Fields

We recall that the construction of an s-dimensional digital net over Fq with q m


points requires m × m generating matrices C (1) , . . . , C (s) over Fq . The row vectors
of these generating matrices belong to the vector space Fqm over Fq , and according
to a suggestion in [10, Remark 6.3] we can view these row vectors as elements of
the extension field Fq m . Instead of choosing generating matrices C (1) , . . . , C (s) , we
may thus set up a single s × m matrix C = (γ j(i) )1≤i≤s, 1≤ j≤m over Fq m . By taking a
vector space isomorphism φ : F_{q^m} → F_q^m, we obtain the jth row vector c_j^{(i)} ∈ F_q^m of C^{(i)} as

c_j^{(i)} = φ(γ_j^{(i)})   for 1 ≤ i ≤ s, 1 ≤ j ≤ m.   (3)

The crucial idea of the paper [5] is to consider a matrix C with a Vandermonde-
type structure. Concretely, we choose an s-tuple a = (α_1, . . . , α_s) ∈ F_{q^m}^s and we then set up the s × m matrix C = (γ_j^{(i)})_{1≤i≤s, 1≤j≤m} over F_{q^m} defined by γ_j^{(1)} = α_1^{j−1} for 1 ≤ j ≤ m and (if s ≥ 2) γ_j^{(i)} = α_i^j for 2 ≤ i ≤ s and 1 ≤ j ≤ m. We use the standard convention 0^0 = 1 ∈ F_q. The digital net over F_q whose generating matrices are obtained from C and (3) is called a Vandermonde net over F_q.
We need some notation in order to state, in Proposition 1 below, the formula for
the quality parameter of a Vandermonde net given in [5]. Let Fq [X ] be the ring of
polynomials over Fq in the indeterminate X . For any integer m ≥ 1, we put

G q,m = {h ∈ Fq [X ] : deg(h) < m},


Hq,m = {h ∈ Fq [X ] : deg(h) ≤ m, h(0) = 0}.

For the zero polynomial 0 ∈ Fq [X ] we use the convention deg(0) = 0. We define a


second degree function on Fq [X ] by deg∗(h) = deg(h) for h ∈ Fq [X ] with h ≠ 0 and
deg∗ (0) = −1. We write h = (h 1 , . . . , h s ) ∈ Fq [X ]s for a given dimension s ≥ 1.
For every s-tuple a = (α_1, . . . , α_s) ∈ F_{q^m}^s, we put

L(a) = { h ∈ G_{q,m} × H_{q,m}^{s−1} : Σ_{i=1}^{s} h_i(α_i) = 0 }

and L∗(a) = L(a)\{0}. The following figure of merit was defined in [5, Definition 2.1]. We use the standard convention that an empty sum is equal to 0.

Definition 3 If L∗(a) is nonempty, we define the figure of merit

ρ(a) = min_{h ∈ L∗(a)} ( deg∗(h_1) + Σ_{i=2}^{s} deg(h_i) ).

Otherwise, we define ρ(a) = m.

Proposition 1 Let q be a prime power, let s, m ∈ N, and let a ∈ Fqs m . Then the
Vandermonde net determined by a is a digital (t, m, s)-net over Fq with

t = m − ρ(a).

A nonconstructive existence theorem for large figures of merit was shown in


[5, Corollary 2.7] and is stated in the proposition below. The subsequent corollary
follows from this proposition and from Proposition 1.

Proposition 2 Let q be a prime power and let s, m ∈ N. Then there exists an a ∈ Fqs m
with 
ρ(a) ≥ m − s logq m − 3 ,

where logq denotes the logarithm to the base q.

Corollary 1 For any prime power q and any s, m ∈ N, there exists a Vandermonde
net over Fq which is a digital (t, m, s)-net over Fq with

t ≤ m − m − s logq m − 3 .

If we combine Corollary 1 with the discrepancy bound in (1), then we see that the
Vandermonde net P over Fq in Corollary 1 satisfies
 
D ∗N (P) = O N −1 (log N )2s−1 ,

where N = q m and where the implied constant depends only on q and s. If q is


a prime and s ≥ 3, then the exponent 2s − 1 of log N can be improved to s + 1

by an averaging argument (see [5, Sect. 3]). Again for a prime q, suitable s-tuples
a ∈ Fqs m yielding this improved discrepancy bound can be obtained by a component-
by-component algorithm (see [5, Sect. 5]).
We comment on the relationship between Vandermonde nets and other known
families of digital nets. A broad class of digital nets, namely that of hyperplane
nets, was introduced in [17] (see also [1, Chap. 11]). Choose α1 , . . . , αs ∈ Fq m not
all 0. Then for the corresponding hyperplane net relative to a fixed ordered basis
ω1 , . . . , ωm of Fq m over Fq , the matrix C = (γ j(i) )1≤i≤s, 1≤ j≤m at the beginning of this
section is given by γ j(i) = αi ω j for 1 ≤ i ≤ s and 1 ≤ j ≤ m (see [1, Theorem 11.5]
and [10, Remark 6.4]). Thus, this matrix C is also a structured matrix, but the structure
is in general not a Vandermonde structure. Consequently, Vandermonde nets are in
general not hyperplane nets relative to a fixed ordered basis of Fq m over Fq . The well-
known family of polynomial lattice point sets (see [1, Chap. 10] and [15]) belongs
to the family of hyperplane nets by [16, Theorem 2] (see also [1, Theorem 11.7]),
and so Vandermonde nets are in general not polynomial lattice point sets.

3 Vandermonde Nets with General Moduli

It was already pointed out in [5, Remark 2.3] that the construction of Vandermonde
nets over Fq in [5], which is described also in Sect. 2 of the present paper, can
be presented in the language of polynomials over Fq . There is then an analogy
with polynomial lattice point sets with irreducible moduli. This analogy was carried
further in [5, Remark 2.4] where a construction of Vandermonde nets with general
moduli was sketched. Such a generalization of the theory of Vandermonde nets is
needed for the construction of Vandermonde sequences in Sect. 4.
For a prime power q and an integer m ≥ 1, we choose a polynomial f ∈ Fq [X ]
with deg( f ) = m which serves as the modulus. We consider the residue class ring
Fq [X ]/( f ) which can be viewed as a vector space over Fq isomorphic to Fqm . Let
B be an ordered basis of the vector space Fq [X ]/( f ) over Fq . We set up the map
κ f : Fq [X ] → Fqm as follows: for every h ∈ Fq [X ], let h ∈ Fq [X ]/( f ) be the residue
class of h modulo f and let κ f (h) ∈ Fqm be the coordinate vector of h relative to the
ordered basis B. It is obvious that κ f is an Fq -linear transformation.
Now we construct an s-dimensional digital net over F_q with m × m generating matrices C^{(1)}, . . . , C^{(s)} over F_q in the following way. We choose an s-tuple g = (g_1, . . . , g_s) ∈ G_{q,m}^s. The first generating matrix C^{(1)} has the row vectors c_1^{(1)}, . . . , c_m^{(1)} with c_j^{(1)} = κ_f(g_1^{j−1}) for 1 ≤ j ≤ m. If s ≥ 2, then for 2 ≤ i ≤ s the jth row vector c_j^{(i)} of C^{(i)} is given by c_j^{(i)} = κ_f(g_i^j) for 1 ≤ j ≤ m. The digital net over F_q with generating matrices C^{(1)}, . . . , C^{(s)} is called the Vandermonde net V(g, f).
If the modulus f ∈ F_q[X] is irreducible over F_q, then F_q[X]/(f) and F_{q^m} are isomorphic as fields, and so it is clear that the present construction of Vandermonde nets reduces to that in Sect. 2.

In order to determine the quality parameter of V (g, f ), we have to generalize


Definition 3. We write h ◦ g for the composition of two polynomials h, g ∈ Fq [X ],
that is, (h ∘ g)(X) = h(g(X)). Then for g = (g_1, . . . , g_s) ∈ G_{q,m}^s and f ∈ F_q[X] with deg(f) = m ≥ 1, we put

L(g, f) = { h ∈ G_{q,m} × H_{q,m}^{s−1} : Σ_{i=1}^{s} (h_i ∘ g_i) ≡ 0 (mod f) }

and L∗(g, f) = L(g, f)\{0}.

Definition 4 Let q be a prime power and let s, m ∈ N. Let f ∈ F_q[X] with deg(f) = m and let g ∈ G_{q,m}^s. If L∗(g, f) is nonempty, we define the figure of merit

ρ(g, f) = min_{h ∈ L∗(g, f)} ( deg∗(h_1) + Σ_{i=2}^{s} deg(h_i) ).

Otherwise, we define ρ(g, f) = m.

Remark 1 It is trivial that we always have ρ(g, f) ≥ 0. For s = 1 it is clear that ρ(g, f) ≤ m. For s ≥ 2 the m + 1 vectors κ_f(1), κ_f(g_1), κ_f(g_1²), . . . , κ_f(g_1^{m−1}), κ_f(g_2) in F_q^m must be linearly dependent over F_q. Hence for some b_0, b_1, . . . , b_m ∈ F_q, not all 0, we have

Σ_{j=0}^{m−1} b_j κ_f(g_1^j) + b_m κ_f(g_2) = 0 ∈ F_q^m.

Since κ_f is an F_q-linear transformation, this can also be written as

κ_f( Σ_{j=0}^{m−1} b_j g_1^j + b_m g_2 ) = 0 ∈ F_q^m.

The definition of κ_f implies that Σ_{j=0}^{m−1} b_j g_1^j + b_m g_2 ≡ 0 (mod f). If we put

h_1(X) = Σ_{j=0}^{m−1} b_j X^j,  h_2(X) = b_m X,  h_i(X) = 0 for 3 ≤ i ≤ s,

then h = (h_1, . . . , h_s) ∈ G_{q,m} × H_{q,m}^{s−1} is a nonzero s-tuple belonging to L(g, f). Hence L∗(g, f) is nonempty and ρ(g, f) ≤ m by Definition 4.

Theorem 1 Let q be a prime power and let s, m ∈ N. Let f ∈ F_q[X] with deg(f) = m and let g ∈ G_{q,m}^s. Then the Vandermonde net V(g, f) is a digital (t, m, s)-net over F_q with

t = m − ρ(g, f).

Proof The case ρ(g, f) = 0 is trivial, and so in view of Remark 1 we can assume that 1 ≤ ρ(g, f) ≤ m. According to a well-known result for digital nets (see [1, Theorem 4.52]), it suffices to show the following: for any nonnegative integers d_1, . . . , d_s with Σ_{i=1}^{s} d_i = ρ(g, f), the row vectors c_j^{(i)} ∈ F_q^m, 1 ≤ j ≤ d_i, 1 ≤ i ≤ s, of the generating matrices of V(g, f) are linearly independent over F_q. Suppose, on the contrary, that we had a linear dependence relation

Σ_{i=1}^{s} Σ_{j=1}^{d_i} b_{i,j} c_j^{(i)} = 0 ∈ F_q^m,

where all b_{i,j} ∈ F_q and not all of them are 0. By the definition of the c_j^{(i)} and the F_q-linearity of κ_f we obtain

κ_f( Σ_{j=1}^{d_1} b_{1,j} g_1^{j−1} + Σ_{i=2}^{s} Σ_{j=1}^{d_i} b_{i,j} g_i^j ) = 0 ∈ F_q^m.

This means that Σ_{i=1}^{s} (h_i ∘ g_i) ≡ 0 (mod f), where

h_1(X) = Σ_{j=1}^{d_1} b_{1,j} X^{j−1} ∈ G_{q,m},  h_i(X) = Σ_{j=1}^{d_i} b_{i,j} X^j ∈ H_{q,m} for 2 ≤ i ≤ s,

and so h = (h_1, . . . , h_s) ∈ L∗(g, f). Furthermore, by the definitions of the degree functions deg∗ and deg in Sect. 2, we have deg∗(h_1) < d_1 and deg(h_i) ≤ d_i for 2 ≤ i ≤ s. It follows that

deg∗(h_1) + Σ_{i=2}^{s} deg(h_i) < Σ_{i=1}^{s} d_i = ρ(g, f),

which is a contradiction to the definition of ρ(g, f). □
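For small q, m and s, the figure of merit of Definition 4, and hence the quality parameter t = m − ρ(g, f) of Theorem 1, can be evaluated by exhaustive search. The following Python sketch does this for a prime q using naive polynomial arithmetic over F_q; the modulus f and the tuple g in the example are illustrative only, and the cost grows like q^{ms}.

from itertools import product

def trim(h):
    while h and h[-1] == 0:
        h = h[:-1]
    return h

def poly_add(u, v, q):
    n = max(len(u), len(v))
    return trim([((u[i] if i < len(u) else 0) + (v[i] if i < len(v) else 0)) % q
                 for i in range(n)])

def poly_mul(u, v, q):
    w = [0] * (len(u) + len(v) - 1) if u and v else []
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            w[i + j] = (w[i + j] + ui * vj) % q
    return trim(w)

def poly_mod(u, f, q):
    u = list(u)
    inv_lead = pow(f[-1], q - 2, q)          # q prime: Fermat inverse of the leading coefficient
    while len(u) >= len(f):
        c = (u[-1] * inv_lead) % q
        shift = len(u) - len(f)
        for i, fi in enumerate(f):
            u[shift + i] = (u[shift + i] - c * fi) % q
        u = trim(u)
    return u

def compose_mod(h, g, f, q):
    """(h o g) mod f via Horner's rule."""
    res = []
    for c in reversed(h):
        res = poly_mod(poly_add(poly_mul(res, g, q), [c], q), f, q)
    return res

def deg(h):      return len(h) - 1 if h else 0    # deg(0) = 0, as in the text
def deg_star(h): return len(h) - 1 if h else -1   # deg*(0) = -1

def rho(g_tuple, f, q, m):
    """Brute-force figure of merit rho(g, f) of Definition 4."""
    s = len(g_tuple)
    G = [trim(list(c)) for c in product(range(q), repeat=m)]        # deg < m
    H = [trim([0] + list(c)) for c in product(range(q), repeat=m)]  # deg <= m, h(0) = 0
    best = m
    for h in product(G, *([H] * (s - 1))):
        if not any(h):                                              # skip h = 0
            continue
        total = []
        for hi, gi in zip(h, g_tuple):
            total = poly_add(total, compose_mod(hi, gi, f, q), q)
        if not total:                                               # sum of compositions = 0 mod f
            best = min(best, deg_star(h[0]) + sum(deg(hi) for hi in h[1:]))
    return best

if __name__ == "__main__":
    q, m = 2, 3
    f = [1, 1, 0, 1]            # X^3 + X + 1 over F_2, an illustrative modulus of degree m
    g = ([0, 1], [1, 1, 1])     # g_1 = X and an arbitrary g_2 in G_{q,m}
    r = rho(g, f, q, m)
    print("rho =", r, " t =", m - r)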

Now we generalize the explicit construction of Vandermonde nets in [5, Sect. 4].
Let q be a prime power and let s and m be integers with 1 ≤ s ≤ q + 1 and m ≥ 2. Put
g1 (X ) = X ∈ G q,m . If s ≥ 2, then we choose s − 1 distinct elements c2 , . . . , cs of Fq ;
this is possible since s − 1 ≤ q. Furthermore, let f ∈ Fq [X ] be such that deg( f ) =
m. If s ≥ 2, then suppose that f (ci ) ≠ 0 for 2 ≤ i ≤ s (for instance, this condition
is automatically satisfied if f is a power of a nonlinear irreducible polynomial over
Fq ). For each i = 2, . . . , s, we have gcd(X − ci , f (X )) = 1, and so there exists a
uniquely determined gi ∈ G q,m with

gi (X )(X − ci ) ≡ 1 (mod f (X )). (4)



In this way, we arrive at the Vandermonde net V(g, f) with g = (g_1, . . . , g_s) ∈ G_{q,m}^s.
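For a prime q, the polynomials g_i in (4) can be computed without a general extended Euclidean algorithm: writing f(X) − f(c_i) = (X − c_i) u(X) by synthetic division gives g_i = −f(c_i)^{−1} u(X). The short Python sketch below uses an illustrative modulus f and points c_i; for small parameters, the resulting tuple g can be fed to the brute-force search above to confirm ρ(g, f) = m, as asserted by Theorem 2.

def inverse_of_linear(c, f, q):
    """Return g with g(X)*(X - c) = 1 (mod f), as a coefficient list of degree < deg f.
    Coefficients are listed from the constant term upward; q must be prime."""
    # synthetic division of f by (X - c): f(X) = (X - c) u(X) + f(c)
    u, rem = [], 0
    for coeff in reversed(f):                 # from the leading coefficient down
        rem = (rem * c + coeff) % q
        u.append(rem)
    fc = u.pop()                              # remainder f(c); u holds the quotient, highest degree first
    assert fc != 0, "the construction requires f(c) != 0"
    scale = (-pow(fc, q - 2, q)) % q          # -f(c)^{-1} in F_q
    return [(scale * x) % q for x in reversed(u)]

if __name__ == "__main__":
    q, m = 5, 3
    f = [2, 0, 1, 1]              # f(X) = X^3 + X^2 + 2 over F_5 (illustrative; f(c_i) != 0 below)
    cs = [1, 2]                   # c_2, ..., c_s: distinct elements of F_q
    g = [[0, 1]] + [inverse_of_linear(c, f, q) for c in cs]   # g_1(X) = X
    print(g)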

Theorem 2 Let q be a prime power and let s and m be integers with 1 ≤ s ≤ q + 1


and m ≥ 2. Let f ∈ Fq [X ] be such that deg( f ) = m. If s ≥ 2, then let c2 , . . . , cs ∈
Fq be distinct and suppose that f (ci ) ≠ 0 for 2 ≤ i ≤ s. Then the Vandermonde net
V (g, f ) constructed above is a digital (t, m, s)-net over Fq with t = 0.

Proof According to Theorem 1, it suffices to show that ρ(g, f) = m. This is trivial for s = 1 since then L∗(g, f) is empty. Therefore we can assume that s ≥ 2.
We proceed by contradiction and assume that ρ(g, f) ≤ m − 1. Then by Definition 4, there exists an s-tuple h = (h_1, . . . , h_s) ∈ L∗(g, f) with Σ_{i=1}^{s} d_i ≤ m − 1, where d_1 = deg∗(h_1) and d_i = deg(h_i) for 2 ≤ i ≤ s. Now h ∈ L∗(g, f) implies that Σ_{i=1}^{s} (h_i ∘ g_i) ≡ 0 (mod f), and multiplying this congruence by ∏_{k=2}^{s} (X − c_k)^{d_k} we get

h_1(X) ∏_{k=2}^{s} (X − c_k)^{d_k} + Σ_{i=2}^{s} (h_i ∘ g_i)(X) ∏_{k=2}^{s} (X − c_k)^{d_k} ≡ 0 (mod f(X)).

If we write h_i(X) = Σ_{j=1}^{d_i} h_{i,j} X^j for 2 ≤ i ≤ s with all h_{i,j} ∈ F_q, then

(h_i ∘ g_i)(X) ∏_{k=2}^{s} (X − c_k)^{d_k} = Σ_{j=1}^{d_i} h_{i,j} g_i(X)^j ∏_{k=2}^{s} (X − c_k)^{d_k}
  = Σ_{j=1}^{d_i} h_{i,j} g_i(X)^j (X − c_i)^{d_i} ∏_{k=2, k≠i}^{s} (X − c_k)^{d_k}
  ≡ Σ_{j=1}^{d_i} h_{i,j} (X − c_i)^{d_i − j} ∏_{k=2, k≠i}^{s} (X − c_k)^{d_k} (mod f(X))

by (4), and so

h_1(X) ∏_{k=2}^{s} (X − c_k)^{d_k} + Σ_{i=2}^{s} Σ_{j=1}^{d_i} h_{i,j} (X − c_i)^{d_i − j} ∏_{k=2, k≠i}^{s} (X − c_k)^{d_k} ≡ 0 (mod f(X)).

Let f_0 ∈ F_q[X] denote the left-hand side of the preceding congruence. The first term of f_0 has degree ≤ Σ_{i=1}^{s} d_i ≤ m − 1. In the sum Σ_{i=2}^{s} in the expression for f_0, a term appears only if d_i ≥ 1, and such a term has degree ≤ Σ_{i=2}^{s} d_i − 1 ≤ Σ_{i=1}^{s} d_i ≤ m − 1 since d_1 = deg∗(h_1) ≥ −1. Altogether we have deg(f_0) ≤ m − 1 < deg(f). But f divides f_0 according to the congruence above, and so f_0 = 0 ∈ F_q[X]. If we assume that d_r ≥ 1 for some r ∈ {2, . . . , s}, then substituting X = c_r in f_0(X) we obtain

0 = f_0(c_r) = Σ_{j=1}^{d_r} h_{r,j} (c_r − c_r)^{d_r − j} ∏_{k=2, k≠r}^{s} (c_r − c_k)^{d_k} = h_{r,d_r} ∏_{k=2, k≠r}^{s} (c_r − c_k)^{d_k}.

Since the last product is nonzero, we deduce that h_{r,d_r} = 0. This is a contradiction to deg(h_r) = d_r. Thus we have shown that d_i = 0 for 2 ≤ i ≤ s, and so h_i = 0 ∈ F_q[X] for 2 ≤ i ≤ s. Since f_0 = 0 ∈ F_q[X], it follows that also h_1 = 0 ∈ F_q[X]. This is the final contradiction since h ∈ L∗(g, f) means in particular that h is a nonzero s-tuple. □

In the case where f ∈ Fq [X ] with deg( f ) = m ≥ 2 is irreducible over Fq , this


construction of Vandermonde (0, m, s)-nets over Fq is equivalent to that in [5,
Sect. 4]. The construction is best possible in terms of the condition on s since it is well
known that if m ≥ 2, then s ≤ q + 1 is a necessary condition for the existence of a
(0, m, s)-net in base q (see [8, Corollary 4.21]). The fact that we can explicitly con-
struct Vandermonde (0, m, s)-nets over Fq for all dimensions s ≤ q + 1 represents
an advantage over polynomial lattice point sets since explicit constructions of good
polynomial lattice point sets are known only for s = 1 and s = 2 (see [8, Sect. 4.4]
and also [1, p. 305]).

4 Vandermonde Sequences from Polynomials

We now extend the work in the previous sections from (finite) point sets to (infinite)
sequences, and thus we arrive at new digital (t, s)-sequences and more generally
digital (T, s)-sequences which we call Vandermonde sequences. We first provide
the necessary background (see [1, Chap. 4] and [9]). For integers b ≥ 2, s ≥ 1, and
m ≥ 1, let [x]b,m denote the coordinatewise m-digit truncation in base b of a point
x ∈ [0, 1]s (compare with [13, p. 194]). We write N0 for the set of nonnegative
integers.

Definition 5 Let b ≥ 2 and s ≥ 1 be integers and let T : N → N0 be a function


with T(m) ≤ m for all m ∈ N. Then a sequence x0 , x1 , . . . of points in [0, 1]s is a
(T, s)-sequence in base b if for all integers k ≥ 0 and m ≥ 1, the points [xn ]b,m with
kbm ≤ n < (k + 1)bm form a (T(m), m, s)-net in base b. If for some integer t ≥ 0 we
have T(m) = t for m > t, then a (T, s)-sequence in base b is called a (t, s)-sequence
in base b.

Every (t, s)-sequence S in base b is a low-discrepancy sequence, in the sense that D*_N(S) = O(N^{−1}(log N)^s) for all N ≥ 2, where D*_N(S) is the star discrepancy
of the first N terms of S (see [8, Theorem 4.17]). The currently best values of the
implied constant can be found in [6] for odd b and in [3] for even b.
The digital method for the construction of (t, m, s)-nets (see Sect. 1) can be
extended to the digital method for the construction of (T, s)-sequences. As in Sect. 1,

we restrict the attention to the case of a prime-power base b = q. For a given dimen-
sion s ≥ 1, the generating matrices are now ∞ × ∞ matrices C (1) , . . . , C (s) over
Fq , where by an ∞ × ∞ matrix we mean a matrix with denumerably many rows
and columns. Let Fq∞ be the sequence space over Fq , viewed as a vector space of
column vectors over F_q of infinite length. We define the map Ψ_∞ : F_q^∞ → [0, 1] by

Ψ_∞(e) = Σ_{j=1}^{∞} ψ(e_j) q^{−j}

for all e = (e1 , e2 , . . .) ∈ Fq∞ , where ψ : Fq → Z q is a chosen bijection. For n =


0, 1, . . ., let

n = Σ_{j=1}^{∞} a_j(n) q^{j−1},

with all a j (n) ∈ Z q and a j (n) = 0 for all sufficiently large j, be the unique digit
expansion of n in base q. With n we associate the column vector

n = (η(a1 (n)), η(a2 (n)), . . .) ∈ Fq∞ ,

where η : Z q → Fq is a given bijection with η(0) = 0. Now we define the sequence


S by
 
xn = Ψ∞ (C (1) n), . . . , Ψ∞ (C (s) n) ∈ [0, 1]s for n = 0, 1, . . . .

Note that the matrix-vector products C (i) n for i = 1, . . . , s are meaningful since
n has only finitely many nonzero coordinates. The sequence S is called a digital
sequence over Fq .
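The construction parallels the finite case and is easy to prototype. The following Python sketch generates the first terms of a digital sequence over F_p for a prime p from a function returning the entries of the generating matrices, truncating each coordinate after m_digits digits in analogy with the truncation [x]_{b,m}; with C^{(1)} the identity matrix it reproduces the van der Corput sequence in base p. The choices in the example are illustrative only.

import numpy as np

def digital_sequence(C_entry, s, p, n_points, m_digits=32):
    """First n_points terms of a digital sequence over F_p (p prime).  C_entry(i, j, k)
    returns the (j, k) entry of the generating matrix C^(i) (0-based indices); only the
    left upper m_digits x m_digits corner is ever used."""
    pts = np.zeros((n_points, s))
    for n in range(n_points):
        digits, nn = [], n
        while nn:                              # base-p digit expansion of n
            digits.append(nn % p)
            nn //= p
        for i in range(s):
            for j in range(m_digits):
                cj = sum(C_entry(i, j, k) * digits[k] for k in range(len(digits))) % p
                pts[n, i] += cj * float(p) ** -(j + 1)
    return pts

if __name__ == "__main__":
    # C^(1) = infinite identity matrix: the van der Corput sequence in base 2.
    ident = lambda i, j, k: 1 if j == k else 0
    print(digital_sequence(ident, s=1, p=2, n_points=8))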

Definition 6 If the digital sequence S over Fq is a (T, s)-sequence in base q for


some function T : N → N0 with T(m) ≤ m for all m ∈ N, then S is called a digital
(T, s)-sequence over Fq . Similarly, if S is a (t, s)-sequence in base q for some
integer t ≥ 0, then S is called a digital (t, s)-sequence over Fq .

For i = 1, . . . , s and any integer m ≥ 1, we write Cm(i) for the left upper m × m
submatrix of the generating matrix C (i) of a digital sequence over Fq . The follow-
ing well-known result serves to determine a suitable function T for a given digital
sequence over Fq (see [1, Theorem 4.84]).

Lemma 1 Let S be a digital sequence over Fq with generating matrices C (1) , . . . ,


C (s) and let T : N → N0 with T(m) ≤ m for all m ∈ N. Then S is a digital (T, s)-
sequence over Fq if the following property holds: for any integer m ≥ 1 and any integers d_1, . . . , d_s ≥ 0 with Σ_{i=1}^{s} d_i = m − T(m), the vectors c_{j,m}^{(i)} ∈ F_q^m, 1 ≤ j ≤ d_i, 1 ≤ i ≤ s, are linearly independent over F_q, where c_{j,m}^{(i)} denotes the jth row vector of C_m^{(i)}.

In our construction of digital (T, s)-sequences over Fq in this section, we will


initially determine the values of T(m) for m from a proper subset of N. The values
of T(m) for any m ∈ N can then be derived from the following general principle.

Lemma 2 Let S be a digital (T0 , s)-sequence over Fq for some function T0 : N →


N0 with T0 (m) ≤ m for all m ∈ N. Then S is also a digital (T, s)-sequence over
Fq for a suitably defined function T : N → N0 which satisfies T(m) ≤ T0 (m) for all
m ∈ N and
T(m + r ) ≤ T(m) + r for all m, r ∈ N.

Proof Let T : N → N_0 be such that T(m) is the least possible value for any m ∈ N to make S a digital (T, s)-sequence over F_q or, in the language of [1, Definition 4.31], such that S is a strict (T, s)-sequence in base q. Then it is trivial that T(m) ≤ T_0(m) for all m ∈ N. For given m ∈ N, the fact that S is a digital sequence over F_q and a strict (T, s)-sequence in base q implies, according to [1, Theorem 4.84], the following property with the notation in Lemma 1: for any integers d_1, . . . , d_s ≥ 0 with Σ_{i=1}^{s} d_i = m − T(m), the vectors c_{j,m}^{(i)} ∈ F_q^m, 1 ≤ j ≤ d_i, 1 ≤ i ≤ s, are linearly independent over F_q. In order to verify that T(m + r) ≤ T(m) + r for all r ∈ N, it suffices to show by Lemma 1 that for any integers d_1, . . . , d_s ≥ 0 with

Σ_{i=1}^{s} d_i = (m + r) − (T(m) + r) = m − T(m),

the vectors c_{j,m+r}^{(i)} ∈ F_q^{m+r}, 1 ≤ j ≤ d_i, 1 ≤ i ≤ s, are linearly independent over F_q. But this is obvious since any nontrivial linear dependence relation between the latter vectors would yield, by projection onto the first m coordinates of these vectors, a nontrivial linear dependence relation between the vectors c_{j,m}^{(i)} ∈ F_q^m, 1 ≤ j ≤ d_i, 1 ≤ i ≤ s. □

Now we show how to obtain digital (T, s)-sequences over Fq from the Vander-
monde nets in Theorem 2. Let k and s be integers with k ≥ 2 and 1 ≤ s ≤ q + 1. Let
f ∈ Fq [X ] be such that deg( f ) = k. If s ≥ 2, then let c2 , . . . , cs ∈ Fq be distinct
and suppose that f (ci ) ≠ 0 for 2 ≤ i ≤ s. For any integer e ≥ 1, we consider the
modulus f^e ∈ Fq [X ]. We have again f^e(ci ) ≠ 0 for 2 ≤ i ≤ s, and so Theorem 2
yields a Vandermonde net V (g e , f e ) which is a digital (0, ek, s)-net over Fq . We
write
g e = (g1,e , . . . , gs,e ) ∈ G q,ek
s
for all e ∈ N.

Then we have the compatibility property

g e+1 ≡ g e (mod f e ) for all e ∈ N, (5)

where a congruence between s-tuples of polynomials is meant coordinatewise. The


congruence for the first coordinates is trivial since g1,e (X ) = X for all e ∈ N. For the

other coordinates, the congruence follows from the fact that gi ∈ G q,m is uniquely
determined by (4).
Recall that V (g e , f e ) depends also on the choice of an ordered basis Be of the
vector space Fq [X ]/( f e ) over Fq (see Sect. 3). We make these ordered bases Be
for e ∈ N compatible by choosing them as follows. Let B1 consist of the residue
classes of 1, X, . . . , X k−1 modulo f (X ), let B2 consist of the residue classes of
1, X, . . . , X k−1 , f (X ), X f (X ), . . . , X k−1 f (X ) modulo f 2 (X ), and so on in an obvi-
ous manner. For the maps κ f , κ f 2 , . . . in Sect. 3, this has the pleasant effect that for
any e ∈ N and any h ∈ Fq [X ] we have

κ f e (h) = π(e+1)k,ek (κ f e+1 (h)), (6)

where π(e+1)k,ek : Fq(e+1)k → Fqek is the projection onto the first ek coordinates of a
vector in Fq(e+1)k .
Finally, we construct the ∞ × ∞ generating matrices C^{(1)}, . . . , C^{(s)} of an s-dimensional digital sequence over F_q. We do this by defining certain left upper square submatrices of each C^{(i)} and by showing that these submatrices are compatible. Concretely, for i = 1, . . . , s and any e ∈ N, the left upper (ek) × (ek) submatrix C_{ek}^{(i)} of C^{(i)} is defined as the ith generating matrix of the Vandermonde net V(g_e, f^e). For this to make sense, we have to verify the compatibility condition that for each i = 1, . . . , s and e ∈ N, the left upper (ek) × (ek) submatrix of C_{(e+1)k}^{(i)} is equal to C_{ek}^{(i)}. In the notation of Lemma 1, this means that we have to show that

c_{j,ek}^{(i)} = π_{(e+1)k,ek}(c_{j,(e+1)k}^{(i)})

for e ∈ N, 1 ≤ i ≤ s, and 1 ≤ j ≤ ek. For 2 ≤ i ≤ s, we have

π_{(e+1)k,ek}(c_{j,(e+1)k}^{(i)}) = π_{(e+1)k,ek}(κ_{f^{e+1}}(g_{i,e+1}^j)) = κ_{f^e}(g_{i,e+1}^j) = κ_{f^e}(g_{i,e}^j) = c_{j,ek}^{(i)}

by (6) and (5), and obvious modifications show the analogous identity for i = 1. This
completes the construction of the Vandermonde digital sequence S over Fq with
generating matrices C (1) , . . . , C (s) .

Theorem 3 Let q be a prime power and let k and s be integers with k ≥ 2 and 1 ≤
s ≤ q + 1. Let f ∈ Fq [X ] be such that deg( f ) = k. If s ≥ 2, then let c2 , . . . , cs ∈
Fq be distinct and suppose that f (ci ) ≠ 0 for 2 ≤ i ≤ s. Then the Vandermonde
sequence S constructed above is a digital (T, s)-sequence over Fq with T(m) =
rk (m) for all m ∈ N, where rk (m) is the least residue of m modulo k.

Proof It suffices to show that S is a digital (T0 , s)-sequence over Fq with T0 (m) = 0
if m ≡ 0 (mod k) and T0 (m) = m otherwise. The rest follows from Lemma 2.
Now let m ≡ 0 (mod k), say m = ek with e ∈ N. Then for m = ek, we have to verify the linear independence property in Lemma 1 for the left upper (ek) × (ek) submatrices C_{ek}^{(1)}, . . . , C_{ek}^{(s)} of the generating matrices C^{(1)}, . . . , C^{(s)} of S, with the condition Σ_{i=1}^{s} d_i = ek in Lemma 1. By the construction of the latter generating matrices, the submatrices C_{ek}^{(1)}, . . . , C_{ek}^{(s)} are the generating matrices of the Vandermonde net V(g_e, f^e). Now V(g_e, f^e) is a digital (0, ek, s)-net over F_q by Theorem 2, and this implies the desired linear independence property in Lemma 1 for C_{ek}^{(1)}, . . . , C_{ek}^{(s)}. □

Example 1 Let q be a prime power and let s = q + 1. Let c2 , . . . , cq+1 be the q


distinct elements of Fq and let f be an irreducible quadratic polynomial over Fq .
Then Theorem 3 provides a digital (T, q + 1)-sequence over Fq with T(m) = 0 for
even m ∈ N and T(m) = 1 for odd m ∈ N. A digital sequence with these parameters
was also constructed in [11], but the present construction is substantially simpler
than that in [11]. Note that there cannot exist a digital (U, q + 1)-sequence over Fq
with U(m) = 0 for all m ∈ N, because of the well-known necessary condition s ≤ q
for (0, s)-sequences in base q (see [8, Corollary 4.24]).

5 Vandermonde Sequences from Global Function Fields

The construction of Vandermonde sequences in Sect. 4 can be described also in the


language of valuations and Riemann-Roch spaces of the rational function field Fq (X )
over Fq (see Example 2 below and [4, Sect. 2]). This description is the starting point
for a generalization of the construction by using arbitrary global function fields.
The generalized construction allows us to overcome the restriction s ≤ q + 1 in the
construction in Sect. 4. It is well known that global function fields are powerful tools
for constructing (t, m, s)-nets and (t, s)-sequences; see [1, Chap. 8], [13, Chap. 8],
and [14, Sect. 5.7] for expository accounts of constructions based on global function
fields. The construction in the present section serves as another illustration for the
power of global function fields in this area.
Concerning global function fields, we follow the notation and terminology in
the book [14]. Another good reference for global function fields is the book of
Stichtenoth [19]. We briefly review some basic notions from the theory of global
function fields and we refer to [14] and [19] for more detailed information. For a
finite field Fq , a global function field F over Fq is an algebraic function field of one
variable with constant field Fq , i.e., F is a finite extension (in the sense of field theory)
of the rational function field Fq (X ) over Fq . We assume without loss of generality
that Fq is the full constant field of F, which means that Fq is algebraically closed in F.
An important concept is that of a valuation ν of F, which is a map ν : F → R ∪ {∞}
satisfying the following four axioms: (i) ν( f ) = ∞ if and only if f = 0; (ii)
ν( f 1 f 2 ) = ν( f 1 ) + ν( f 2 ) for all f 1 , f 2 ∈ F; (iii) ν( f 1 + f 2 ) ≥ min (ν( f 1 ), ν( f 2 ))
for all f 1 , f 2 ∈ F; (iv) ν(F∗) ≠ {0} for F∗ := F\{0}. Two valuations of F are equiv-
alent if one is a constant multiple of the other. An equivalence class of valuations of
F is called a place of F. Each place P of F contains a unique normalized valuation
ν P for which ν P (F ∗ ) = Z. The residue class field of P is a finite extension of Fq
and the degree of this extension is the degree deg(P) of P. A place P of F with
deg(P) = 1 is called a rational place of F. Let P F denote the set of all places of F.

A divisor D of F is a formal sum

D = Σ_{P∈P_F} n_P P

with n_P ∈ Z for all P ∈ P_F and all but finitely many n_P = 0. We write also n_P = ν_P(D). The finite set of all places P of F with ν_P(D) ≠ 0 is called the support of D. The degree deg(D) of a divisor D is defined by

deg(D) = Σ_{P∈P_F} n_P deg(P) = Σ_{P∈P_F} ν_P(D) deg(P).

Divisors are added and subtracted term by term. We say that a divisor D of F is positive if ν_P(D) ≥ 0 for all P ∈ P_F. The principal divisor div(f) of f ∈ F∗ is defined by

div(f) = Σ_{P∈P_F} ν_P(f) P.

For any divisor D of F, the Riemann-Roch space

L (D) = { f ∈ F ∗ : div( f ) + D ≥ 0} ∪ {0}

associated with D is a finite-dimensional vector space over F_q. We write ℓ(D) for the dimension of this vector space. If the integer g ≥ 0 is the genus of F, then the celebrated Riemann-Roch theorem [14, Theorem 3.6.14] says that ℓ(D) ≥ deg(D) + 1 − g, with equality whenever deg(D) ≥ 2g − 1. We quote the following result from
[14, Corollary 3.4.4].
Lemma 3 If the divisor D of the global function field F satisfies deg(D) < 0, then
L (D) = {0}.
We are now ready to describe a construction of s-dimensional Vandermonde
sequences based on the global function field F of genus g. We avoid trivial cases by
assuming that s ≥ 2 and g ≥ 1. We suppose that F has at least s + 1 rational places.
Let P1 , . . . , Ps , P∞ be distinct rational places of F and let D be a positive divisor
of F with deg(D) = 2g such that P2 , . . . , Ps , P∞ are not in the support of D (for
instance D = 2g P1 ).
Lemma 4 For every integer j ≥ 1, there exist

β_j^{(1)} ∈ L(D + (j − 1)P_1 − (j − 1)P_2) \ L(D + (j − 2)P_1 − (j − 1)P_2),
β_j^{(i)} ∈ L(D + j P_i − j P_1) \ (L(D + j P_i − (j + 1)P_1) ∪ L(D + (j − 1)P_i − j P_1))

for 2 ≤ i ≤ s. Furthermore, we have:

(i) ν_{P_1}(β_j^{(1)}) = −(j − 1) − ν_{P_1}(D),
(ii) ν_{P_1}(β_j^{(i)}) = j − ν_{P_1}(D),
(iii) ν_{P_i}(β_j^{(i)}) = −j,
(iv) ν_{P_h}(β_j^{(l)}) ≥ 0

for j ≥ 1, for 2 ≤ i ≤ s, and for 2 ≤ h ≤ s and 1 ≤ l ≤ s with h ≠ l.

Proof We first observe that obviously

deg (D + ( j − 1)P1 − ( j − 1)P2 ) = 2g, (7)


deg (D + ( j − 2)P1 − ( j − 1)P2 ) = 2g − 1, (8)
deg (D + j Pi − j P1 ) = 2g, (9)
deg (D + j Pi − ( j + 1)P1 ) = 2g − 1, (10)
deg (D + ( j − 1)Pi − j P1 ) = 2g − 1, (11)
(L (D + j Pi − ( j + 1)P1 ) ∩ L (D + ( j − 1)Pi − j P1 )) ⊇ {0}. (12)

The existence of the β (1)


j for j ≥ 1 follows directly from the Riemann-Roch theorem
together with (7) and (8). The existence of the β (i)
j for 2 ≤ i ≤ s and j ≥ 1 follows
from

|L (D + j Pi − j P1 )| − |L (D + j Pi − ( j + 1)P1 )| − |L (D + ( j − 1)Pi − j P1 )|
+ |L (D + j Pi − ( j + 1)P1 ) ∩ L (D + ( j − 1)Pi − j P1 )| ≥ q g+1 − q g − q g + 1 ≥ 1,

where we used the Riemann-Roch theorem together with (9), (10), (11), and (12).
The results (i), (ii), (iii), and (iv) are now obtained from the choice of the β (i)
j for
1 ≤ i ≤ s and j ≥ 1 and from the given properties of the divisor D. 

Example 2 If F is the rational function field F_q(X), then the elements β_j^{(i)} ∈ F in Lemma 4 can be given explicitly. For this F we have the so-called infinite place (which is a rational place of F), and the remaining places of F are in one-to-one correspondence with the monic irreducible polynomials over F_q (see [14, Sect. 1.5]). For an integer s with 2 ≤ s ≤ q + 1, let P_1 be the infinite place of F and for i = 2, . . . , s let P_i be the rational place of F corresponding to the polynomial X − c_i ∈ F_q[X], where c_2 = 0, c_3, . . . , c_s are distinct elements of F_q. Let D be the zero divisor of F. Then the elements β_j^{(1)} = X^{j−1} for j ≥ 1 and β_j^{(i)} = (X − c_i)^{−j} for 2 ≤ i ≤ s and j ≥ 1 satisfy all properties in Lemma 4 (note that no choice of P_∞ is needed for Lemma 4). There is an obvious relationship between these elements β_j^{(i)} and the construction of Vandermonde sequences in Sect. 4 (compare also with the construction leading to Theorem 2).

A trick that was used in [20] for the construction of good digital sequences comes in handy now. We first determine a basis {w_1, . . . , w_g} of the vector space L(D − P_1) with dimension ℓ(D − P_1) = g as follows. By the Riemann-Roch theorem and Lemma 3, we know the dimensions ℓ(D − P_1) = g and ℓ(D − P_1 − 2g P_∞) = 0. Hence there exist integers 0 ≤ n_1 < · · · < n_g < 2g such that

ℓ(D − P_1 − n_r P_∞) = ℓ(D − P_1 − (n_r + 1)P_∞) + 1   for 1 ≤ r ≤ g.

Now we choose w_r ∈ L(D − P_1 − n_r P_∞) \ L(D − P_1 − (n_r + 1)P_∞) to obtain the basis {w_1, . . . , w_g} of L(D − P_1). Note that ν_{P_∞}(w_r) = n_r, ν_{P_1}(w_r) ≥ 1 − ν_{P_1}(D), and ν_{P_i}(w_r) ≥ 0 for all 2 ≤ i ≤ s, 1 ≤ r ≤ g.

Lemma 5 With the notation above, the system {w1 , . . . , wg } ∪ {β (i)


j }1≤i≤s, j≥1 is lin-
early independent over Fq .

Proof The linear independence of {β_j^{(i)}}_{j≥1} for every fixed i = 1, . . . , s is obvious from the known values of valuations in Lemma 4. Suppose that

Σ_{r=1}^{g} a_r w_r + Σ_{i=1}^{s} Σ_{j=1}^{v} b_j^{(i)} β_j^{(i)} = 0

for some integer v ≥ 1 and a_r, b_j^{(i)} ∈ F_q. For a fixed h = 2, . . . , s, we consider

Σ_{j=1}^{v} b_j^{(h)} β_j^{(h)} = − Σ_{r=1}^{g} a_r w_r − Σ_{i=1, i≠h}^{s} Σ_{j=1}^{v} b_j^{(i)} β_j^{(i)}.

We abbreviate the left-hand side by β. Now if we had β ≠ 0, then we know from Lemma 4 that ν_{P_h}(β) < 0. But the right-hand side satisfies ν_{P_h}(β) ≥ 0. Hence β = 0 and all coefficients b_j^{(h)} on the left-hand side have to be 0. We arrive at the identity

Σ_{j=1}^{v} b_j^{(1)} β_j^{(1)} = − Σ_{r=1}^{g} a_r w_r.

We abbreviate the left-hand side by γ. If there were a b_j^{(1)} ≠ 0 for at least one j ≥ 1, then by Lemma 4 the left-hand side yields ν_{P_1}(γ) ≤ −ν_{P_1}(D), but the right-hand side shows that ν_{P_1}(γ) ≥ −ν_{P_1}(D) + 1. Hence all b_j^{(1)}, and therefore also all a_r by the basis property of {w_1, . . . , w_g}, have to be 0. □
Now we construct the generating matrices C (1) , . . . , C (s) of a digital sequence over
Fq . For 1 ≤ i ≤ s and j ≥ 1, the jth row of C (i) is determined by the coefficients of
the local expansion of β_j^{(i)} at the rational place P_∞. Since ν_{P_∞}(β_j^{(i)}) ≥ 0 by Lemma 4, this local expansion has the form

β_j^{(i)} = Σ_{k=0}^{∞} a_{j,k}^{(i)} z_k

with coefficients a (i)


j,k ∈ Fq for k ≥ 0, j ≥ 1, 1 ≤ i ≤ s. The sequence (z k )k≥0 of
elements of F satisfies ν P∞ (z k ) = k for k ∈ N0 \{n 1 , . . . , n g }, and for k = n r with

r ∈ {1, . . . , g} we put z_k = w_r, so that ν_{P_∞}(z_k) = n_r. This preliminary construction yields the sequence

(a_{j,0}^{(i)}, . . . , â_{j,n_1}^{(i)}, a_{j,n_1+1}^{(i)}, . . . , â_{j,n_g}^{(i)}, a_{j,n_g+1}^{(i)}, . . .)

of elements of F_q for any j ≥ 1 and 1 ≤ i ≤ s, where a hat marks the entries with index in {n_1, . . . , n_g}. After deleting the terms with the hat, we arrive at a sequence of elements of F_q which serves as the jth row c_j^{(i)} of C^{(i)}, and we write

c_j^{(i)} = (c_{j,0}^{(i)}, c_{j,1}^{(i)}, . . .).

Theorem 4 Let q be a prime power and let s ≥ 2 be an integer. Let F be a global


function field with full constant field Fq and with genus g ≥ 1. Suppose that F has at
least s + 1 rational places. Then the digital sequence with the generating matrices
C (1) , . . . , C (s) constructed above is a digital (t, s)-sequence over Fq with t = g.
Proof We proceed by Lemma 1 and prove that for any integer m > g and any integers d_1, . . . , d_s ≥ 0 with Σ_{i=1}^{s} d_i = m − g, the vectors

c_{j,m}^{(i)} := (c_{j,0}^{(i)}, . . . , c_{j,m−1}^{(i)}) ∈ F_q^m

with 1 ≤ j ≤ d_i, 1 ≤ i ≤ s, are linearly independent over F_q. Choose b_j^{(i)} ∈ F_q for 1 ≤ j ≤ d_i, 1 ≤ i ≤ s, satisfying

Σ_{i=1}^{s} Σ_{j=1}^{d_i} b_j^{(i)} c_{j,m}^{(i)} = 0 ∈ F_q^m.   (13)

We can assume without loss of generality that d_1, . . . , d_s ≥ 1. The linearity of the local expansion implies that

β := Σ_{i=1}^{s} Σ_{j=1}^{d_i} b_j^{(i)} β_j^{(i)} − Σ_{i=1}^{s} Σ_{j=1}^{d_i} Σ_{r=1}^{g} b_j^{(i)} a_{j,n_r}^{(i)} w_r = Σ_{k=0, k≠n_1,…,n_g}^{∞} ( Σ_{i=1}^{s} Σ_{j=1}^{d_i} b_j^{(i)} a_{j,k}^{(i)} ) z_k,

where the second sum on the left-hand side is abbreviated by α. In view of the construction algorithm and (13), we obtain

β = Σ_{k=m+g}^{∞} ( Σ_{i=1}^{s} Σ_{j=1}^{d_i} b_j^{(i)} a_{j,k}^{(i)} ) z_k.

Therefore ν_{P_∞}(β) ≥ m + g. We observe that

α ∈ L(D − P_1) ⊆ L(D + (d_1 − 1)P_1 + d_2 P_2 + · · · + d_s P_s),
β_j^{(1)} ∈ L(D + (j − 1)P_1 − (j − 1)P_2) ⊆ L(D + (d_1 − 1)P_1 + d_2 P_2 + · · · + d_s P_s)

for 1 ≤ j ≤ d_1, and

β_j^{(i)} ∈ L(D + j P_i − j P_1) ⊆ L(D + d_i P_i − P_1) ⊆ L(D + (d_1 − 1)P_1 + d_2 P_2 + · · · + d_s P_s)

for 1 ≤ j ≤ d_i, 2 ≤ i ≤ s. This together with ν_{P_∞}(β) ≥ m + g implies that

β ∈ L(D + (d_1 − 1)P_1 + d_2 P_2 + · · · + d_s P_s − (m + g)P_∞) =: L(D_{m,d_1,...,d_s}).

We compute the degree of D_{m,d_1,...,d_s} and obtain

deg(D_{m,d_1,...,d_s}) = 2g + m − g − 1 − (m + g) = −1,

which entails β = 0 by Lemma 3. Finally, the linear independence over F_q of the system {w_1, . . . , w_g} ∪ {β_j^{(i)}}_{1≤i≤s, j≥1} shown in Lemma 5 yields b_j^{(i)} = 0 for 1 ≤ j ≤ d_i, 1 ≤ i ≤ s. □

For the missing case g = 0 in Theorem 4, we have the Faure-Niederreiter


sequences obtained from the rational function field Fq (X ) which yield digital (0, s)-
sequences over Fq for every dimension s ≤ q (see [2, 7], and [8, Remark 4.52]). It
follows from this and Theorem 4 that for every prime power q and every integer s ≥ 1,
there exists a digital (Vq (s), s)-sequence over Fq , where Vq (s) is the least value of
g ≥ 0 for which there is a global function field with full constant field Fq and genus
g containing at least s + 1 rational places. For fixed q, we have Vq (s) = O(s) as
s → ∞ with an absolute implied constant by [12, Theorem 4]. The (t, s)-sequences
obtained from Theorem 4 are asymptotically optimal in terms of the quality parame-
ter since it is known that for any fixed base b ≥ 2, the values of t for (t, s)-sequences
in base b must grow at least linearly as a function of s as s → ∞. The currently best
version of the latter result can be found in [18].
There is also a construction of Vandermonde sequences for the case where
P1 , . . . , Ps are again s ≥ 2 distinct rational places of the global function field F,
but where P∞ is a place of F with degree k ≥ 2 (see [4, Sect. 3]). This construction
yields a digital (T, s)-sequence over Fq with

T(m) = min (m, 2g + rk (m)) for all m ∈ N,

where g is the genus of F and rk (m) is as in Theorem 3.

Acknowledgments The first author was supported by the Austrian Science Fund (FWF), Project
F5505-N26, which is a part of the Special Research Program “Quasi-Monte Carlo Methods: Theory
and Applications”.

References

1. Dick, J., Pillichshammer, F.: Digital Nets and Sequences: Discrepancy Theory and Quasi-Monte
Carlo Integration. Cambridge University Press, Cambridge (2010)
2. Faure, H.: Discrépance de suites associées à un système de numération (en dimension s). Acta
Arith. 41, 337–351 (1982)
3. Faure, H., Kritzer, P.: New star discrepancy bounds for (t, m, s)-nets and (t, s)-sequences.
Monatsh. Math. 172, 55–75 (2013)
4. Hofer, R., Niederreiter, H.: Explicit constructions of Vandermonde sequences using global
function fields, preprint available at http://arxiv.org/abs/1311.5739
5. Hofer, R., Niederreiter, H.: Vandermonde nets. Acta Arith. 163, 145–160 (2014)
6. Kritzer, P.: Improved upper bounds on the star discrepancy of (t, m, s)-nets and (t, s)-
sequences. J. Complex. 22, 336–347 (2006)
7. Niederreiter, H.: Point sets and sequences with small discrepancy. Monatsh. Math. 104, 273–
337 (1987)
8. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. SIAM,
Philadelphia (1992)
9. Niederreiter, H.: (t, m, s)-nets and (t, s)-sequences. In: Mullen, G.L., Panario, D. (eds.) Hand-
book of Finite Fields, pp. 619–630. CRC Press, Boca Raton (2013)
10. Niederreiter, H.: Finite fields and quasirandom points. In: Charpin, P., Pott, A., Winterhof, A.
(eds.) Finite Fields and Their Applications: Character Sums and Polynomials, pp. 169–196. de
Gruyter, Berlin (2013)
11. Niederreiter, H., Özbudak, F.: Low-discrepancy sequences using duality and global function
fields. Acta Arith. 130, 79–97 (2007)
12. Niederreiter, H., Xing, C.P.: Low-discrepancy sequences and global function fields with many
rational places. Finite Fields Appl. 2, 241–273 (1996)
13. Niederreiter, H., Xing, C.P.: Rational Points on Curves over Finite Fields: Theory and Appli-
cations. Cambridge University Press, Cambridge (2001)
14. Niederreiter, H., Xing, C.P.: Algebraic Geometry in Coding Theory and Cryptography. Prince-
ton University Press, Princeton (2009)
15. Pillichshammer, F.: Polynomial lattice point sets. In: Plaskota, L., Woźniakowski, H. (eds.)
Monte Carlo and Quasi-Monte Carlo Methods 2010, pp. 189–210. Springer, Berlin (2012)
16. Pirsic, G.: A small taxonomy of integration node sets. Österreich. Akad. Wiss. Math.-Naturw.
Kl. Sitzungsber. II(214), 133–140 (2005)
17. Pirsic, G., Dick, J., Pillichshammer, F.: Cyclic digital nets, hyperplane nets, and multivariate
integration in Sobolev spaces. SIAM J. Numer. Anal. 44, 385–411 (2006)
18. Schürer, R.: A new lower bound on the t-parameter of (t, s)-sequences. In: Keller, A., Heinrich,
S., Niederreiter, H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2006, pp. 623–632.
Springer, Berlin (2008)
19. Stichtenoth, H.: Algebraic Function Fields and Codes, 2nd edn. Springer, Berlin (2009)
20. Xing, C.P., Niederreiter, H.: A construction of low-discrepancy sequences using global function
fields. Acta Arith. 73, 87–102 (1995)
Path Space Markov Chain Monte Carlo
Methods in Computer Graphics

Wenzel Jakob

Abstract The objective of a rendering algorithm is to compute a photograph of a


simulated reality, which entails finding all the paths along which light can flow from a
set of light sources to the camera. The purpose of this article is to present a high-level
overview of the underlying physics and analyze how this leads to a high-dimensional
integration problem that is typically handled using Monte Carlo methods. Following
this, we survey recent work on path space Markov Chain Monte Carlo (MCMC)
methods that compute the resulting integrals using proposal distributions defined on
sets of light paths.

Keywords Rendering · Path space · Specular manifold · MCMC

1 Introduction

The central goal of light transport algorithms in computer graphics is the generation
of renderings, two-dimensional images that depict a simulated environment as if
photographed by a virtual camera. Driven by the increasing demand for photorealism,
computer graphics is currently undergoing a substantial transition to physics-based
rendering techniques that compute such images while accurately accounting for the
interaction of light and matter.
These methods require a detailed model of the scene including the shape and
optical properties of all objects including light sources; the final rendering is then
generated by a simulation of the relevant physical laws, specifically transport and
scattering, i.e., the propagation of light and its interaction with the materials that
comprise the objects. In this article, we present a high-level overview of the under-
lying physics and analyze how this leads to a high-dimensional integration problem
that is typically handled using Monte Carlo methods.

W. Jakob (B)
Realistic Graphics Lab, EPFL, Lausanne, Switzerland
e-mail: wenzel.jakob@epfl.ch


Section 2 begins with a discussion of the geometric optics framework used in com-
puter graphics. After defining the necessary notation and physical units, we state
the energy balance equation that characterizes the interaction of light and matter.
Section 3 presents a simple recursive Monte Carlo estimator that solves this equa-
tion, though computation time can be prohibitive if accurate solutions are desired.
Section 4 introduces path space integration, which offers a clearer view of the under-
lying light transport problem. This leads to a large class of different estimators that
can be combined to improve convergence. Section 5 introduces MCMC methods in
rendering. Section 6 covers an MCMC method that explores a lower-dimensional
manifold of light paths, and Sect. 7 discusses extensions to cases involving inter-
reflection between glossy objects. Section 8 concludes with a discussion of limita-
tions and unsolved problems.
This article is by no means a comprehensive treatment of rendering; the selection
of topics is entirely due to the author’s personal preference. It is intended that the
discussion will be helpful to readers who are interested in obtaining an understand-
ing of recent work on path-space methods and applications of MCMC methods in
rendering.

2 Geometric Optics and Light Transport on Surfaces

Light transport simulations in computer graphics are generally conducted using a


simplified variant of geometric optics. In this framework, light moves along a straight
line until an interaction (i.e., a scattering event) occurs, which involves a change of
direction and potentially some absorption. The wave-like nature of light is not sim-
ulated, which leads to a simpler computation and is an excellent approximation in
general (the wavelength of visible light is minuscule compared to the sizes of every-
day objects). Light is also assumed to be incoherent and unpolarized, and although it
moves at a finite speed, this motion is not modeled explicitly. More complex theories
without these assumptions are available but ultimately not needed since the phenom-
ena described by them are in most cases too subtle to be observed by humans. For
the sake of simplicity, we only discuss monochromatic rendering in this article; the
generalization to the full color spectrum poses no fundamental difficulties.
In the following sections, we review relevant background material, starting with
the standard light transport model used in computer graphics and leading up to the
path space framework proposed by Veach [28].
In geometric optics, light is usually quantified using radiance, which has units of
W · sr −1 · m−2 . Given a point x ∈ R3 and a direction ω ∈ S 2 , the radiance L(x, ω) is a
density function that describes how much illumination flows through the point, in this
direction. Radiance can be measured by registering the amount of energy arriving
on a small surface patch dA at x that is perpendicular to ω and sensitive to a small
cone of directions dω around ω, and then letting the surface and solid angle tend to
zero. For a thorough review of radiance and many related radiometric quantities, we
refer the reader to Preisendorfer [25].

An important property of radiance is that it remains invariant along rays when


there are no obstructions (e.g., in vacuum),

L(x, ω) = L(x + tω, ω), t ∈ [0, tobstr ).

Due to this property, a complete model of a virtual environment can be obtained


simply by specifying how L behaves in places where an obstruction interacts with
the illumination, i.e., at the boundaries of objects or inside turbid substances like fog
or milk. In this article, we only focus on the boundary case for simplicity. For a more
detailed discussion including volumetric scattering, we refer to [10].
We assume that the scene to be rendered is constructed from a set of surfaces that
all lie inside a bounded domain Ω ⊆ R3 . The union of these surfaces is denoted
M ⊂ Ω and assumed to be a differentiable manifold, i.e., it is parameterized by a set
of charts with differentiable transition maps.
Furthermore, let N : M → S 2 denote the Gauss map, which maps surface
positions to normal directions on the unit sphere.
Because boundaries of objects introduce discontinuities in the radiance function
L, we must take one-sided limits to distinguish between the exterior radiance function
L + (x, ω) and the interior radiance function L − (x, ω) at surface locations x ∈ M as
determined by the normal N(x) (Fig. 1). Based on these limits, intuitive incident and
outgoing radiance functions can then be defined as

    L_i(x, \omega) := \begin{cases} L^+(x, -\omega), & \omega \cdot N(x) > 0 \\ L^-(x, -\omega), & \omega \cdot N(x) < 0 \end{cases} \qquad \text{and} \qquad L_o(x, \omega) := \begin{cases} L^+(x, \omega), & \omega \cdot N(x) > 0 \\ L^-(x, \omega), & \omega \cdot N(x) < 0. \end{cases}

With the help of these definitions, we can introduce the surface energy balance
equation that describes the relation between the incident and outgoing radiance based
on the material properties at x:

 
    L_o(x, \omega) = \int_{S^2} L_i(x, \omega')\, f(x, \omega' \to \omega)\, \omega' \cdot N(x)\, d\omega' + L_e(x, \omega), \qquad x \in \mathcal{M}. \qquad (1)

The integration domain S 2 is the unit sphere and f is the bidirectional scattering dis-
tribution function (BSDF) of the surface, which characterizes the surface’s response

radiance function L from above and below

Fig. 2 Illustration of the energy balance Eq. (1) on surfaces. Here, it is used to compute the pixel
color of the surface location highlighted in white (only the top hemisphere is shown in the figure)

to illumination from different directions. Given illumination reaching a point x from


a direction ω′, the BSDF expresses how much of this illumination is scattered into
the direction ω. For a detailed definition of the concept of a BSDF as well as other
types of scattering functions, we refer the reader to Nicodemus [22]. The function
Le (x, ω) is the source term which specifies how much light is emitted from position
x into direction ω; it is zero when the position x is not located on a light source.
Figure 2 visualizes the different terms in Eq. (1) over the top hemisphere. The
example shows a computation of the radiance traveling from the surface location
marked with a white dot towards the camera. The first term is an integral over the
incident radiance as seen from the surface location. The integral also contains the
BSDF and a cosine foreshortening term which models the effect that a beam of light
arriving at a grazing angle spreads out over a larger region on the receiving surface
and thus deposits less energy per unit area. The “ceiling” of the scene is made of
rough metal; its reflectance function effectively singles out a small portion of the
incident illumination, which leads to a fairly concentrated reflection compared to
the other visible surfaces. The emission term is zero, since the highlighted surface
position is not located on a light source.
Considerable research has been conducted on characterizing the reflectance prop-
erties of different materials, and these works have proposed a wide range of BSDF
functions f that reproduce their appearance in renderings. Figure 3 shows several
commonly used BSDF models, along with the resulting material appearance. The
illustrations left of the renderings show polar plots of the BSDF f (ω′ → ω) where
the surface receives illumination from a fixed incident direction ω′ highlighted in
red. The primary set of reflected directions is shown in blue, and the transmitted
directions (if any) are shown in green.
Specular materials shown in the top row are characterized by having a “degen-
erate” BSDF f that is described by a Dirac delta distribution. For instance, a mirror
reflects light arriving from ω′ into only a single direction ω = 2N(x)(ω′ · N(x)) − ω′.
In comparison, rough materials usually have a smooth function f . BSDFs based on

Smooth conducting material Smooth dielectric material

Rough conducting material Rough dielectric material

Smooth diffuse material

Fig. 3 An overview of common material types. The left side of each example shows a 2D illustration
of the underlying scattering process for light arriving from the direction highlighted in red. The
right side shows a corresponding rendering of a material test object

microfacet theory [4, 27, 32] are a popular choice in particular—they model the inter-
action of light with random surfaces composed of tiny microscopic facets that are
oriented according to a statistical distribution. Integration over this distribution then
leads to simple analytic expressions that describe the expected reflection and refrac-
tion properties at a macroscopic scale. In this article, we assume that the BSDFs are
provided as part of the input scene description and will not discuss their definitions
in detail.

3 Path Tracing

We first discuss how Eq. (1) can be solved using Monte Carlo integration, which leads
to a simple method known as Path Tracing [12]. For this, it will be convenient to
establish some further notation: we define the distance to the next surface encountered
by the ray (x, ω) ∈ R3 × S 2 as

dM (x, ω) := inf {d > 0 | x + dω ∈ M }

where inf ∅ = ∞. Based on this distance, we can define a ray-casting function r:

r(x, ω) := x + dM (x, ω)ω. (2)



Due to the preservation of radiance along unoccluded rays, the ray-casting function
can be used to relate the quantities Li and Lo :

Li (x, ω) = Lo (r(x, ω), −ω).

In other words, to find the incident radiance along a ray (x, ω), we must only deter-
mine the nearest surface visible in this direction and evaluate its outgoing radiance
into the opposite direction. Using this relation, we can eliminate Li from the energy
balance Eq. (1):

 
    L_o(x, \omega) = \int_{S^2} L_o(r(x, \omega'), -\omega')\, f(x, \omega' \to \omega)\, \omega' \cdot N(x)\, d\omega' + L_e(x, \omega) \qquad (3)

Although the answer is still not given explicitly, the equation is now in a form
that is suitable for standard integral equation solution techniques. However, this is
made difficult by the ill-behaved nature of the integrand, which is generally riddled
with singularities and discontinuities caused by visibility changes in the ray-casting
function r. Practical solution methods often rely on a Neumann series expansion of the
underlying integral operator, in which case the resulting high number of dimensions
rules out standard deterministic integration rules requiring an exponential number
of function evaluations. Monte Carlo methods are resilient to these issues and hence
see significant use in rendering.
To obtain an unbiased MC estimator based on Eq. (3), we replace the integral with
a single sample of the integrand at a random direction ω′ and divide by its probability
density p(ω′), i.e.

    \hat{L}_o(x, \omega) = \frac{L_o(r(x, \omega'), -\omega')\, f(x, \omega' \to \omega)\, \omega' \cdot N(x)}{p(\omega')} + L_e(x, \omega) \qquad (4)

In this case, Ep[L̂o] = Lo, and by averaging many estimates L̂o, we obtain an approximation
of the original integral. Typically, some form of importance sampling is
employed, e.g. by choosing a sampling density function p(ω′) ∝ f(x, ω′ → ω).
Algorithm 1 shows the pseudo-code of the resulting recursive method. Based on the
underlying sequence of spherical sampling steps, path tracing can also be interpreted
as a method that generates trajectories along which light is carried from the light
source to the camera; we refer to these trajectories as light paths and will revisit
this concept in more detail later. In practice, the path tracing algorithm is combined
with additional optimizations that lead to better convergence, but this is beyond the
scope of this article.
Due to their simplicity and ability to produce photorealistic images, optimized
path tracing methods have seen increased use in research and industrial applications.
The downside of these methods is that they converge very slowly given challenging
input, sometimes requiring days or even weeks to compute a single image on state-
of-the-art computers. Problems arise whenever complete light paths are found with
too low a probability—a typical example is shown in Fig. 5a.

Algorithm 1 Pseudocode of a simple Path Tracer

function L̂o(x, ω)
    Return zero with probability α ∈ (0, 1).
    Sample a direction ω′ proportional to f(x, ω′ → ω),
        and let the factor of proportionality be denoted as fprop.
    Set x′ = r(x, ω′).
    Return 1/(1 − α) · ( Le(x, ω) + fprop · L̂o(x′, −ω′) ).
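To make the control flow of Algorithm 1 concrete, the following Python sketch implements the same recursive estimator under simplifying assumptions. The names scene.ray_cast, scene.sample_bsdf, and scene.emitted_radiance are hypothetical stand-ins for the ray-casting function r, BSDF sampling, and the source term Le; they are not part of the pseudocode above, and a real renderer would organize these queries differently.

import random
import numpy as np   # positions and directions are assumed to be numpy arrays

ALPHA = 0.1          # Russian-roulette termination probability, alpha in (0, 1)

def estimate_Lo(scene, x, omega):
    # One unbiased sample of the outgoing radiance L_o(x, omega), cf. Algorithm 1.
    # `scene` is a hypothetical object providing:
    #   ray_cast(x, omega)         -> nearest surface point r(x, omega), or None
    #   sample_bsdf(x, omega)      -> (omega_prime, f_prop): a direction drawn
    #                                 proportionally to f(x, omega' -> omega) and
    #                                 the factor of proportionality
    #   emitted_radiance(x, omega) -> L_e(x, omega)
    if random.random() < ALPHA:                 # terminate the recursion
        return 0.0
    omega_prime, f_prop = scene.sample_bsdf(x, omega)
    x_prime = scene.ray_cast(x, omega_prime)    # x' = r(x, omega')
    if x_prime is None:                         # the ray escaped the scene
        scattered = 0.0
    else:
        scattered = f_prop * estimate_Lo(scene, x_prime, -omega_prime)
    # Dividing by (1 - alpha) keeps the estimator unbiased despite early termination.
    return (scene.emitted_radiance(x, omega) + scattered) / (1.0 - ALPHA)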

4 The Path Space Formulation of Light Transport

In this section, we discuss the path space formulation of light transport, which pro-
vides a clearer view of the sampling operations performed by Algorithm 1. This
framework can be used to develop other types of integration methods, including
ones based on MCMC proposals that we discuss afterwards.
The main motivation for using path space is that it provides an explicit expression
for the value of the radiance function as an integral over light paths, as opposed to
the unwieldy recursive estimations on spherical domains in Algorithm 1. This allows
for considerable freedom in developing and comparing sampling strategies. The path
space framework was originally developed by Veach [28] and builds on a theoretical
analysis of light transport operators by Arvo [1]. Here, we only present a high-level
sketch.
Let us define an integral operator T

 
    (Th)(x, \omega) := \int_{S^2} h(r(x, \omega'), -\omega')\, f(x, \omega' \to \omega)\, \omega' \cdot N(x)\, d\omega', \qquad (5)

and use it to rewrite Eq. (3) as


Lo = TLo + Le .

An explicit solution for Lo can be found by inverting the operator so that

Lo = (1 − T )−1 Le .

Let ‖·‖L be a norm on the space of radiance functions,

    \|h\|_L := \int_{\mathcal{M}} \int_{S^2} h(x, \omega)\, |\omega \cdot N(x)|\, d\omega\, dA(x),

which induces a corresponding operator norm ‖T‖op = sup_{‖h‖L ≤ 1} ‖Th‖L. Veach
proved that physically realizable scenes satisfy ‖T^l‖op < 1 for some fixed l ∈ N.
Given this property, it is not only guaranteed that the inverse operator (1 − T )−1
exists, but it can also be computed using a Neumann series expansion:

(1 − T )−1 = I + T + T 2 + . . . ,

which intuitively expresses the property that the outgoing radiance is equal to the
emitted radiance plus radiance that has scattered one or more times (the sum con-
verges since the energy of the multiply scattered illumination tends to zero).

Lo = Le + TLe + T 2 Le + · · · . (6)

Rather than explicitly computing the radiance function Lo , the objective of ren-
dering is usually to determine the response of a simulated camera to illumination that
reaches its aperture. Suppose that the sensitivity of a pixel j in the camera is given by a
sensitivity profile function We(j) : M × S² → R defined on ray space. The intensity
Ij of the pixel is given by

    I_j = \int_{\mathcal{M}} \int_{S^2} W_e^{(j)}(x, \omega)\, L_o(r(x, \omega), -\omega)\, |\omega \cdot N(x)|\, d\omega\, dA(x), \qquad (7)

which integrates over its sensitivity function weighted by the outgoing radiance
on surfaces that are observed by the camera. The spherical integral in the above
expression involves an integrand that is evaluated at the closest surface position as
seen from the ray (x, ω). It is convenient to switch to a different domain involving
only area integrals. We can transform the above integral into this form using the
identity

    \int_{S^2} q(r(x, \omega))\, |\omega \cdot N(x)|\, d\omega = \int_{\mathcal{M}} q(y)\, G(x \leftrightarrow y)\, dA(y), \qquad (8)

where x, y ∈ M , and q : M → R is any integrable function defined on surfaces,


and G is the geometric term [24] defined as

    G(x \leftrightarrow y) := V(x \leftrightarrow y) \cdot \frac{\left| N(x) \cdot \overrightarrow{xy} \right| \left| N(y) \cdot \overrightarrow{xy} \right|}{\|x - y\|^2}. \qquad (9)

The double arrows emphasize the symmetric nature of this function, →xy is the
normalized direction from x to y, and V is a visibility function defined as

    V(x \leftrightarrow y) := \begin{cases} 1, & \text{if } \{\alpha x + (1 - \alpha) y \mid 0 < \alpha < 1\} \cap \mathcal{M} = \emptyset \\ 0, & \text{otherwise} \end{cases} \qquad (10)
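As an illustration only, Eqs. (9) and (10) translate almost directly into code. The sketch below assumes hypothetical scene.normal and scene.occluded queries for the Gauss map N and the visibility test V, since neither is prescribed by the framework itself.

import numpy as np

def geometric_term(scene, x, y):
    # Geometric term G(x <-> y) of Eq. (9); x and y are numpy arrays.
    # `scene.normal(p)` and `scene.occluded(x, y)` are assumed helpers.
    d = y - x
    dist2 = float(np.dot(d, d))
    if dist2 == 0.0:
        return 0.0
    if scene.occluded(x, y):                 # V(x <-> y) = 0, Eq. (10)
        return 0.0
    w = d / np.sqrt(dist2)                   # normalized direction from x to y
    cos_x = abs(float(np.dot(scene.normal(x), w)))
    cos_y = abs(float(np.dot(scene.normal(y), w)))
    return cos_x * cos_y / dist2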

Applying the change of variables (8) to Eq. (7) yields


 
    I_j = \int_{\mathcal{M}} \int_{\mathcal{M}} W_e^{(j)}(x, \overrightarrow{xy})\, L_o(y, \overrightarrow{yx})\, G(x \leftrightarrow y)\, dA(x, y). \qquad (11)

We can now substitute Lo given by Eq. (6) into the above integral, which is a power
series of the T operator (i.e. increasingly nested spherical integrals). Afterwards, we
apply the change of variables once more to convert all nested spherical integrals into

nested surface integrals. This is tedious but straightforward and leads to an explicit
expression of Ij in terms of an infinite series of integrals over increasing Cartesian
powers of M .
These nested integrals over surfaces are due to the propagation of light along
straight lines and changes of direction at surfaces, which leads to the concept of a light
path. This can be thought of as the trajectory of a particle carrying an infinitesimal
portion of the illumination. It is a piecewise linear curve x̄ = x1 · · · xn with endpoints
x1 and xn and intermediate scattering vertices x2 , . . . , xn−1 . The space of all possible
light paths is a union consisting of paths with just the endpoints, paths that have one
intermediate scattering event, and so on. More formally, we define path space as


    \mathcal{P} := \bigcup_{n=2}^{\infty} \mathcal{P}_n, \quad \text{and} \quad \mathcal{P}_n := \{x_1 \cdots x_n \mid x_1, \ldots, x_n \in \mathcal{M}\}. \qquad (12)

The nested integrals which arose from our manipulation of Eq. (11) are simply inte-
grals over light paths of different lengths, i.e.
 
    I_j = \int_{\mathcal{P}_2} \varphi(x_1 x_2)\, dA(x_1, x_2) + \int_{\mathcal{P}_3} \varphi(x_1 x_2 x_3)\, dA(x_1, x_2, x_3) + \ldots. \qquad (13)

Because some paths carry more illumination from the light source to the camera
than others, the integrand ϕ : P → R is needed to quantify their “light-carrying
capacity”; its definition varies based on the number of input arguments and is given
by Eq. (15). The total illumination Ij arriving at the camera is often written more
compactly as an integral of ϕ over the entire path space, i.e.:

    I_j = \int_{\mathcal{P}} \varphi(\bar{x})\, dA(\bar{x}). \qquad (14)

The definition of the weighting function ϕ consists of a product of terms—one for
each vertex and edge of the path:

    \varphi(x_1 \cdots x_n) = L_e(x_1 \to x_2) \left[ \prod_{k=2}^{n-1} G(x_{k-1} \leftrightarrow x_k)\, f(x_{k-1} \to x_k \to x_{k+1}) \right] G(x_{n-1} \leftrightarrow x_n)\, W_e^{(j)}(x_{n-1} \to x_n). \qquad (15)

The arrows in the above expression symbolize the symmetry of the geometric terms
as well as the flow of light at vertices. xi → xi+1 can also be read as a spatial argument
xi followed by a directional argument →xi xi+1. Figure 4 shows an example light path
and the different weighting terms. We summarize their meaning once more:

Fig. 4 Illustration of a simple light path with four vertices and its corresponding weighting function

• Le (x1 → x2 ) is the emission profile of the light source. This term expresses the
amount of radiance emitted from position x1 traveling towards x2 . It is equal to
zero when x1 is not located on a light source.
• We(j)(xn−1 → xn) is the sensitivity profile of pixel j of the camera; we can think of
the pixel grid as an array of sensors, each with its own profile function.
• G(x ↔ y) is the geometric term (Eq. 9), which specifies the differential amount
of illumination carried along segments of the light path. Among other things, it
accounts for visibility: when there is no unobstructed line of sight between x and
y, G evaluates to zero.
• f (xk−1 → xk → xk+1 ) is the BSDF, which specifies how much of the light
that travels from xk−1 to xk is then scattered towards position xk+1 . This function
essentially characterizes the material appearance of an object (e.g., whether it is
made of wood, plastic, concrete, etc.).
Over the last 40 years, considerable research has investigated realistic expressions
for the Le , We , and f terms. In this article, we do not discuss their definition and prefer
to think of them as black box functions that can be queried by the rendering algo-
rithm. This is similar to how rendering software is implemented in practice: a scene
description might reference a particular material (e.g., car paint) whose correspond-
ing function f is provided by a library of material implementations. The algorithm
accesses it through a high-level interface shared by all materials, but without specific
knowledge about its internal characteristics.
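Under this black-box view, evaluating the weighting function of Eq. (15) is nothing more than a loop over the vertices and edges of a path. The sketch below is purely illustrative: emission, bsdf, and sensitivity stand for Le, f, and We(j), and geometric_term is any callable taking two vertices (for instance the earlier sketch with its scene argument bound); none of these interfaces are prescribed by the text.

def path_contribution(path, emission, bsdf, sensitivity, geometric_term):
    # Evaluate phi(x_1 ... x_n) of Eq. (15) for a list of vertex positions `path`.
    #   emission(x1, x2)     ~ L_e(x1 -> x2)
    #   bsdf(xp, xc, xn)     ~ f(xp -> xc -> xn)
    #   sensitivity(xa, xb)  ~ W_e^(j)(xa -> xb)
    # All four callables are assumed black boxes, as in the text.
    n = len(path)
    value = emission(path[0], path[1])
    for k in range(1, n - 1):                             # interior vertices x_2 .. x_{n-1}
        value *= geometric_term(path[k - 1], path[k])     # G(x_{k-1} <-> x_k)
        value *= bsdf(path[k - 1], path[k], path[k + 1])  # f(x_{k-1} -> x_k -> x_{k+1})
    value *= geometric_term(path[n - 2], path[n - 1])     # G(x_{n-1} <-> x_n)
    value *= sensitivity(path[n - 2], path[n - 1])        # W_e^(j)(x_{n-1} -> x_n)
    return value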

4.1 Regular Expressions to Select Sets of Light Paths

Different materials can interact with light in fundamentally different ways, which
has important implications on the design of rendering algorithms. It is helpful to
distinguish between interactions using a 1-letter classification for each vertex type:
S (ideal specular): specular surfaces indicate boundaries between materials with
different indices of refraction (e.g., air and water). Ideal specular boundaries have
no roughness and cause an incident ray of light to be scattered into a discrete set
of outgoing directions (Fig. 3). Examples of specular materials include polished
glass and metal surfaces and smooth coatings.
G (glossy): glossy surfaces also mark an index of refraction transition, but in this
case the surface is affected by small-scale roughness. This causes the same ray
to scatter into a continuous distribution of directions which concentrates around
the same directions as the ideally smooth case.
D (diffuse): diffuse surfaces reflect light into a directional distribution that is either
uniform or close to uniform; examples include clay and plaster.
We additionally assign the labels L and E to light source and camera (“eye”) vertices,
respectively, allowing for the classification of entire light paths using a sequence of
symbols (e.g., “LSDSE”). Larger classes of paths can be described using Heckbert’s
path regular expressions [8], which add convenient regular expression rules such as
the Kleene star “*” and plus “+” operators. For instance, LD+E refers to light that
has been scattered only by diffuse surfaces before reaching the camera. We will use
this formalism shortly.
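Because Heckbert's notation coincides with ordinary regular expressions over the vertex labels, a path class can be tested with any standard regex engine. The following snippet is only meant to illustrate the notation; the concrete class LS+DS*E is the caustic family discussed later in Sect. 4.3.

import re

def matches_class(path_labels, heckbert_expr):
    # True if a vertex classification such as "LSDSE" belongs to a path class
    # such as "LS+DS*E"; Heckbert's operators map directly to regex syntax.
    return re.fullmatch(heckbert_expr, path_labels) is not None

assert matches_class("LSDSE", "LS+DS*E")       # a swimming-pool caustic path
assert not matches_class("LDE", "LS+DS*E")     # purely diffuse transport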

4.2 Path Tracing Variants

The path tracing algorithm discussed in Sect. 3 constructs complete light paths by
randomly sampling them one vertex at a time (we refer to this as sequential sam-
pling). In each iteration, it randomly chooses an additional light path vertex xi−1
using a probability density that is proportional to the (partial) weighting function
ϕ(· · · xi−1 xi xi+1 · · · ) involving only factors that depend on the previous two ver-
tices, i.e. xi and xi+1 (this is a variant of the Markov property). The indices decrease
because the algorithm constructs paths in reverse; intuitively, it searches for the tra-
jectory of an idealized light “particle” that moves backwards in time until its emission
point on the light source is found.
Path tracing performs poorly when the emission point of a light path is challenging
to find, so that complete light paths are constructed with low probability. This occurs
in a wide range of situations; Fig. 5 shows an example where the light sources are
encased, making it hard to reach them by chance. The path tracing rendering has
unacceptably high variance at 32 samples per pixel.
The path space view makes it possible to construct other path tracing variants with
better behavior. For instance, we can reverse the direction of the random walk and

Fig. 5 A bidirectional path tracer finds light paths by generating partial paths starting at the camera
and light sources and connecting them in every possible way. The resulting statistical estimators
tend to have lower variance than unidirectional techniques. Modeled after a scene by Eric Veach. a
Path tracer, 32 samples/pixel. b Bidirectional path tracer, 32 samples/pixel

generate vertex xi+1 from xi and xi−1 , which leads to a method referred to as light
tracing or particle tracing. This method sends out particles from the light source
(thus avoiding problems with the enclosure) and records the contribution to rendered
pixels when they hit the aperture of the camera.

4.2.1 Bidirectional Path Tracing (BDPT)

The bidirectional path tracing method (BDPT) [17, 29] computes radiance estimates
via two separate random walks from the light sources and the camera. The resulting
two partial paths are connected for every possible vertex pair, creating many complete
paths of different lengths, which supplies this method with an entire family of path
sampling strategies. A path with n vertices can be created in n + 1 different ways,
which is illustrated by Fig. 6 for a simple path with 3 vertices (2 endpoints and 1
scattering event). The captions s and t indicate the number of sampling steps from
the camera and light source. In practice, each of the strategies is usually successful at
dealing with certain types of light paths, while being a poor choice for others (Fig. 7).

4.2.2 Multiple Importance Sampling (MIS)

Because all strategies are defined on the same space (i.e. path space), and because
each has a well-defined density function on this space, it is possible to evaluate
and compare these densities to determine the most suitable strategy for sampling
particular types of light paths. This is the key insight of multiple importance sampling

(a) s=0, t=3 (b) s=1, t=2

(c) s=2, t=1 (d) s=3, t=0

Fig. 6 The four different ways in which bidirectional path tracing can create a path with one
scattering event: a Standard path tracing, b Path tracing variant: connect to sampled light source
positions, c Standard light tracing, d Light tracing variant: connect to sampled camera positions.
Solid lines indicate sampled rays which are intersected with the geometry, whereas dashed lines
indicate deterministic connection attempts which must be validated by a visibility test

(MIS) [30] which BDPT uses to combine multiple sampling strategies in a provably
good way to minimize variance in the resulting rendering (bottom of Fig. 7).
Suppose two statistical estimators of the pixel intensity Ij are available. These
estimators can be used to generate two light paths x̄1 and x̄2 , which have path space
probability densities p1 (x̄1 ) and p2 (x̄2 ), respectively. The corresponding MC esti-
mates are given by

    \langle I_j^{(1)} \rangle = \frac{\varphi(\bar{x}_1)}{p_1(\bar{x}_1)} \qquad \text{and} \qquad \langle I_j^{(2)} \rangle = \frac{\varphi(\bar{x}_2)}{p_2(\bar{x}_2)}.

To obtain a combined estimator, we could simply average these estimators, i.e.:

    \langle I_j^{(3)} \rangle := \frac{1}{2} \left( \langle I_j^{(1)} \rangle + \langle I_j^{(2)} \rangle \right).
However, this is not a good idea, since the combination is affected by the variance
of the worst ingredient estimator (BDPT generally uses many estimators, including
ones that have very high variance). Instead, MIS combines estimators using weights
that are related to the underlying sample density functions:

    \langle I_j^{(4)} \rangle := w_1(\bar{x}_1) \langle I_j^{(1)} \rangle + w_2(\bar{x}_2) \langle I_j^{(2)} \rangle,

where

    w_i(\bar{x}) := \frac{p_i(\bar{x})}{p_1(\bar{x}) + p_2(\bar{x})}. \qquad (16)
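In code, combining estimators with the weights of Eq. (16) (and its many-strategy generalization) only requires that every strategy can both sample a path and evaluate its path-space density for an arbitrary path. The sketch below assumes such hypothetical (sample, density) callables and is not tied to any particular renderer.

def mis_estimate(phi, strategies):
    # One sample of the combined estimator: each strategy draws a path, and its
    # contribution is weighted by w_i = p_i / sum_k p_k as in Eq. (16).
    # `strategies` is a list of (sample, density) pairs; `phi` evaluates the
    # weighting function of Eq. (15). All three are assumed callables.
    total = 0.0
    for sample, density in strategies:
        path = sample()
        p_own = density(path)
        if p_own == 0.0:
            continue
        p_sum = sum(d(path) for _, d in strategies)   # denominator of Eq. (16)
        total += (p_own / p_sum) * phi(path) / p_own  # w_i * phi / p_i
    return total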

Fig. 7 The individual sampling strategies that comprise the previous BDPT rendering, both a
without and b with multiple importance sampling. Each row corresponds to light paths of a certain
length, and the top row matches the four strategies from Fig. 6. Almost every strategy has deficiencies
of some kind; multiple importance sampling re-weights samples to use strategies where they perform
well

While not optimal, Veach proves that no other choice of weighting functions can
significantly improve on Eq. (16). He goes on to propose a set of weighting heuris-
tics that combine many estimators (i.e., more than two), and which yield perceptually
better results. The combination of BDPT and MIS often yields an effective method
that addresses many of the flaws of the path tracing algorithm. Yet, even this combi-
nation can fail in simple cases, as we will discuss next.

4.3 Limitations of Monte Carlo Path Sampling

Ultimately, all Monte Carlo path sampling techniques can be seen to compute inte-
grals of the weighting function ϕ using a variety of importance sampling techniques
that evaluate ϕ at many randomly chosen points throughout the integration domain,
i.e., path space P.
Certain input, particularly scenes containing metal, glass, or other shiny surfaces,
can lead to integrals that are difficult to evaluate. Depending on the roughness of the
surfaces, the integrand can take on large values over small regions of the integration
domain. Surfaces of lower roughness lead to smaller and higher-valued regions,
which eventually collapse to lower-dimensional sets with singular integrands as the
surface roughness tends to zero. This case where certain paths cannot be sampled at
all is known as the problem of insufficient techniques [16].
Convergence problems arise whenever high-valued regions receive too few sam-
ples. Depending on the method used, this manifests as objectionable noise or other
visual artifacts in the output image that gradually disappear as the sample count N
tends to infinity. However, due to the slow convergence rate of MC integration (typ-
ical error is O(N −0.5 )), it may not be an option to wait for the error to average out.
Such situations can force users of rendering software to make unrealistic scene mod-
ifications (e.g., disabling certain light interactions), thereby compromising realism in
exchange for obtaining converged-looking results within a reasonable time. Biased
estimators can achieve lower errors in some situations—however, these methods are
beyond the scope of this article; we refer the reader to Pharr et al. [24] for an overview.
Figure 8 illustrates the behavior of several path sampling methods when rendering
caustics, which we define as light paths matching the regular expression LS+DS*E.
They form interesting light patterns at the bottom of the swimming pool due to
the focusing effect of ripples in the water surface.
In Fig. 8a, light tracing is used to emit particles proportional to the light source
emission profile Le . The highlighted path is the trajectory of a particle that encoun-
ters the water surface and refracts into the pool. The refraction is an ideal specular
interaction described by Snell’s law and the Fresnel equations. The diffuse concrete
surface at the pool bottom then reflects the particle upwards into a direction drawn
from a uniform distribution, where it is refracted once more by the water surface.
Ultimately, the particle never hits the camera aperture and thus cannot contribute to
the output image.

(a) Path tracing from the light source (b) Path tracing from the camera (c) Bidirectional path tracing

Fig. 8 Illustration of the difficulties of sequential path sampling methods when rendering LSDSE
caustic patterns at the bottom of a swimming pool. a, b Unidirectional techniques sample light paths
by executing a random walk consisting of alternating transport and scattering steps. The only way to
successfully complete a path in this manner is to randomly “hit” the light source or camera, which
happens with exceedingly low probability. c Bidirectional techniques trace paths from both sides,
but in this case they cannot create a common vertex at the bottom of the pool to join the partial light
paths

Figure 8b shows the behavior of the path tracing method, which generates paths
in the reverse direction but remains extremely inefficient: in order to construct a
complete light path x̄ with ϕ(x̄) > 0, the path must reach the “other end” by chance,
which happens with exceedingly low probability. Assuming for simplicity that rays
leave the pool with a uniform distribution in Fig. 8b, the probability of hitting the
sun with an angular diameter of ∼ 0.5◦ is on the order of 10−5 .
BDPT traces paths from both sides, but even this approach is impractical here:
vertices on the water surface cannot be used to join two partial paths, since the
resulting pair of incident and outgoing directions would not satisfy Snell’s law. It is
possible to generate two vertices at the bottom of the pool as shown in the figure,
but these cannot be connected: the resulting path edge would be fully contained in a
surface rather than representing transport between surfaces.
In this situation, biased techniques would connect the two vertices at the bottom
of the pool based on a proximity criterion, which introduces systematic errors into
the solution. We will only focus on unbiased techniques that do not rely on such
approximations.
The main difficulty in scenes like this is that caustic paths are tightly constrained:
they must start on the light source, end on the aperture, and satisfy Snell’s law in two
places. Sequential sampling approaches are able to satisfy all but one constraint and
run into issues when there is no way to complete the majority of paths.
Paths like the one examined in Fig. 8 lead to poor convergence in other settings
as well; they are collectively referred to as specular–diffuse–specular (SDS) paths
due to the occurrence of this sequence of interactions in their path classification.
SDS paths occur in common situations such as a tabletop seen through a drinking
glass standing on it, a bottle containing shampoo or other translucent liquid, a shop
window viewed and illuminated from outside, as well as scattering inside the eye of
a virtual character. Even in scenes where these paths do not cause dramatic effects,
their presence can lead to excessively slow convergence in rendering algorithms that
attempt to account for all transport paths. It is important to note that while the SDS
class of paths is a well-studied example case, other classes (e.g., involving glossy

Algorithm 2 Pseudocode of an MCMC-based rendering algorithm

function Metropolis-Light-Transport
    x̄0 ← an initial light path
    for i = 1 to N do
        x̄i ← Mutate(x̄i−1)
        With probability min(1, ϕ(x̄i) T(x̄i, x̄i−1) / (ϕ(x̄i−1) T(x̄i−1, x̄i))) keep x̄i;
            otherwise set x̄i ← x̄i−1.
        Record(x̄i)
    end for

interactions) can lead to many similar issues. It is desirable that rendering methods
are robust to such situations. Correlated path sampling techniques based on MCMC
offer an attractive way to approach such challenges. We review these methods in the
remainder of this article.

5 Markov Chain Monte Carlo (MCMC) Rendering Techniques

In 1997, Veach and Guibas proposed an unusual rendering technique named Metropo-
lis Light Transport [31], which applies the Metropolis–Hastings algorithm to the path
space integral in Eq. (14). Using correlated samples and highly specialized mutation
rules, their approach enables more systematic exploration of the integration domain,
avoiding many of the problems encountered by methods based on standard Monte
Carlo and sequential path sampling.
Later, Kelemen et al. [14] showed that a much simpler approach can be used
to combine MCMC sampling with existing MC rendering algorithms, making it
possible to side-step the difficulties of the former method. The downside of their
approach is the reduced flexibility in designing custom mutation rules. An extension
by Hachisuka et al. [7] further improves the efficiency of this method.
Considerable research has built on these two approaches, including extensions
to participating media [23], combinations of MCMC and BDPT [7], specialized
techniques for specular [11] and glossy [13] materials, gradient-domain rendering
[18, 19], and MCMC variants which perform a localized non-ergodic exploration of
path space [3].
In this section, we provide an overview of the initial three methods, starting first
with the Primary Sample Space approach by Kelemen et al., followed by the extension by
Hachisuka et al. and finally the Metropolis Light Transport algorithm by Veach and
Guibas. All variants are based on a regular MCMC iteration shown in Algorithm 2.
Starting with an initial light path x̄0 , the methods simulate N steps of a Markov Chain.
In each step, a mutation is applied to the path x̄i−1 to obtain a proposal path x̄i , where
it is assumed that the proposal density is known and given by T (x̄i−1 , x̄i ). After a
standard Metropolis–Hastings acceptance/rejection step, the algorithm invokes the

function Record(x̄i ), which first determines the pixel associated with the current
iteration’s light path xi and then increases its brightness by a fixed amount.
These MCMC methods all sample light paths proportional to the amount they
contribute to the pixels of the final rendering; by increasing the pixel brightness in
this way during each iteration, these methods effectively compute a 2D histogram of
the marginal distribution of ϕ over pixel coordinates. This is exactly the image to be
rendered up to a global scale factor, which can be recovered using a traditional MC
sampling technique such as BDPT. The main difference among these algorithms is
the underlying state space, as well as the employed set of mutation rules.
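A minimal sketch of this shared histogram-based iteration is given below. The components mutate (returning a proposal together with the forward and reverse transition densities T), contribution (the function ϕ, or Ψ in primary sample space), and pixel_of (mapping a state to an image pixel) are hypothetical; their concrete form is exactly what distinguishes the variants discussed next.

import random
import numpy as np

def mcmc_render(x0, n_steps, mutate, contribution, pixel_of, resolution):
    # Metropolis-Hastings iteration of Algorithm 2: the image is a 2D histogram
    # of the visited states and equals the rendering up to a global scale factor.
    image = np.zeros(resolution)
    x, phi_x = x0, contribution(x0)
    for _ in range(n_steps):
        y, t_xy, t_yx = mutate(x)              # proposal and densities T(x, y), T(y, x)
        phi_y = contribution(y)
        if phi_x * t_xy > 0.0:
            a = min(1.0, (phi_y * t_yx) / (phi_x * t_xy))
        else:
            a = 1.0                            # always leave a zero-contribution state
        if random.random() < a:                # acceptance/rejection step
            x, phi_x = y, phi_y
        image[pixel_of(x)] += 1.0              # Record(x): splat a constant amount
    return image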

5.1 Primary Sample Space Metropolis Light Transport (PSSMLT)

Primary Sample Space Metropolis Light Transport (PSSMLT) [14] combines tra-
ditional MC sampling techniques with a MCMC iteration. The approach is very
flexible and can also be applied to integration problems outside of computer graph-
ics. PSSMLT always operates on top of an existing MC sampling technique; we
assume for simplicity that path tracing is used, but many other techniques are also
admissible. The details of this method are easiest to explain from an implementation-
centric viewpoint.
Recall the path tracing pseudo-code shown earlier in Algorithm 1. Its first two steps
perform random sampling, but the rest of the procedure is fully deterministic. In
practice, these sampling steps are often realized using a pseudorandom number
generator such as Mersenne Twister [20] or a suitable quasi-Monte Carlo scheme
[6], potentially using the inversion method or a similar technique to warp uniform
variates to desired distributions as needed. For more details, we refer the reader to a
tutorial by Keller [15].
Let us consider a small adjustment to the implementation of this method: instead
of generating univariate samples during the recursive sampling steps, we can also
generate them ahead of time and supply them to the implementation as an additional
argument, in which case the algorithm can be interpreted as a fully deterministic
function of its (random or pseudorandom) arguments. Suppose that we knew (by
some way) that the maximum number of required random variates was equal to n,
and that the main computation was thus implemented by a function with signature
Ψ : [0, 1]n → R, which maps a vector of univariate samples to a pixel intensity
estimate. By taking many estimates and averaging them to obtain a converged pixel
intensity, path tracing is effectively integrating the estimator over an n-dimensional
unit hypercube of “random numbers” denoted as primary sample space:

    I_j = \int_{[0,1]^n} \Psi(\xi)\, d\xi. \qquad (17)

Fig. 9 Primary sample space MLT performs mutations in an abstract random number space. A deter-
ministic mapping Ψ induces corresponding mutations in path space. a Primary sample space view.
b Path space view

The key idea of PSSMLT is to compute Eq. (17) using MCMC integration on primary
sample space, which leads to a trivial implementation, as all complications involving
light paths and other rendering-specific details are encapsulated in the “black box”
mapping Ψ (Fig. 9).
One missing detail is that the primary sample space dimension n is unknown
ahead of time. This can be solved by starting with a low-dimensional integral and
extending the dimension on demand when additional samples are requested by Ψ .
PSSMLT uses two types of Mutate functions. The first is an independence
sampler, i.e., it forgets the current state and switches to a new set of pseudorandom
variates. This is needed to ensure that the Markov Chain is ergodic. The second is a
local (e.g. Gaussian or similar) proposal centered around a current state ξi ∈ [0, 1]n .
Both are symmetric so that the proposal density T cancels in the acceptance ratio
(the acceptance step in Algorithm 2).
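Both proposal types are easy to sketch because they act on nothing but a vector of uniform variates. The snippet below is a minimal illustration; the perturbation size 1/64 is an arbitrary placeholder rather than a value from the original paper, and the wrapped Gaussian step is one possible choice of symmetric local proposal.

import random

def large_step(n):
    # Independence sampler: a fresh point in primary sample space [0, 1]^n.
    return [random.random() for _ in range(n)]

def small_step(xi, scale=1.0 / 64.0):
    # Symmetric local proposal around the current state xi: each coordinate is
    # perturbed and wrapped back into [0, 1), so T(x, y) = T(y, x) and the
    # proposal density cancels in the acceptance ratio.
    return [(u + random.gauss(0.0, scale)) % 1.0 for u in xi]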
PSSMLT uses independent proposals to find important light paths that cannot be
reached using local proposals. When it finds one, local proposals are used to explore
neighboring light paths which amortizes the cost of the search. This can significantly
improve convergence in many challenging situations and is an important advantage
of MCMC methods in general when compared to MC integration.
Another advantage of PSSMLT is that it explores light paths through a black box
mapping Ψ that already makes internal use of sophisticated importance sampling
techniques for light paths, which in turn leads to an easier integration problem in
primary sample space. The main disadvantage of this method is that its interaction
with Ψ is limited to a stream of pseudorandom numbers. It has no direct knowledge
of the generated light paths, which prevents the design of more efficient mutation
rules based on the underlying physics.

5.2 Multiplexed Metropolis Light Transport (MMLT)

PSSMLT is commonly implemented in conjunction with the BDPT technique: in this


setting, the rendering algorithm generates paths using a large set of BDPT connection
strategies and then re-weights them using MIS. In most cases, only a subset of the
strategies is truly effective, and MIS will consequently assign a large weight to this
subset. One issue with the combination of BDPT and PSSMLT is that the algorithm
still spends a considerable portion of its time generating connections with strategies
that have low weights and thus contribute very little to the rendered image. Hachisuka
et al. [7] recently presented an extension of PSSMLT named Multiplexed Metropolis
Light Transport (MMLT) to address this problem.
They propose a simple but effective modification to the inner BDPT sampler;
the outer Metropolis–Hastings iteration remains unchanged: instead of generating
a sample from all BDPT connection strategies, the algorithm (pseudo-)randomly
chooses a single strategy and returns its contribution scaled by the inverse discrete
probability of the choice. This (pseudo-)random sample is treated in the same way as
other sampling operations in PSSMLT and exposed as an additional state dimension
that can be mutated using small or large steps. The practical consequence is that the
Markov Chain will tend to spend more computation on effective strategies, which
further improves the statistical efficiency of the underlying estimator (Fig. 10).
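In implementation terms, the change can be thought of as one extra primary-sample-space coordinate that selects a single connection strategy, whose contribution is then divided by the discrete probability of that choice. The sketch below illustrates this idea with a hypothetical list bdpt_strategies of per-strategy estimators and a plain uniform choice; the actual MMLT selection scheme is more refined.

def mmlt_inner_sample(u_strategy, u_rest, bdpt_strategies):
    # Evaluate one randomly chosen BDPT strategy instead of all of them.
    # `u_strategy` in [0, 1) is the extra mutated coordinate, `u_rest` drives the
    # chosen strategy, and `bdpt_strategies` is a hypothetical list of callables,
    # one per (s, t) connection strategy, each returning an MIS-weighted estimate.
    k = len(bdpt_strategies)
    index = min(int(u_strategy * k), k - 1)     # uniform discrete choice
    return k * bdpt_strategies[index](u_rest)   # compensate for probability 1/k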

Fig. 10 Analysis of the Multiplexed MLT (MMLT) technique [7] (used with permission): the
top row shows weighted contributions from different BDPT strategies in a scene with challenging
indirect illumination [18, 28]. The intensities in the middle row visualize the time spent on each
strategy using the MMLT technique: they are roughly proportional to the weighted contribution
in the first row. The rightmost column visualizes the dominant strategies (3,4), (4, 3), and (5, 2)
using RGB colors. PSSMLT (third row) cannot target samples in this way and thus produces almost
uniform coverage

5.3 Path Space Metropolis Light Transport (MLT)

Path Space Metropolis Light Transport, or simply Metropolis Light Transport (MLT)
[31] was the first application of MCMC to the problem of light transport. Doucet
et al. [5] proposed a related method in applied mathematics, which focuses on a more
general class of integral equations.
The main difference as compared to PSSMLT is that MLT operates directly on path
space and does not use a black-box mapping Ψ . Its mutation rules are considerably
more involved than those of PSSMLT, but this also provides substantial freedom
to design custom rules that are well-suited for rendering specific physical effects.
MLT distinguishes between mutations that change the structure of the path and
perturbations that move the vertices by small distances while preserving the path
structure, both using the building blocks of bidirectional path tracing to sample paths.
One of the following operations is randomly selected in each iteration:
1. Bidirectional mutation: This mutation replaces a segment of an existing path
with a new segment (possibly of different length) generated by a BDPT-like
sampling strategy. This rule generally has a low acceptance ratio but it is essential
to guarantee ergodicity of the resulting Markov Chain.
2. Lens subpath mutation: The lens subpath mutation is similar to the previous
mutation but only replaces the lens subpath, which is defined as the trailing portion
of the light path matching the regular expression [^S]S*E.
3. Lens perturbation: This transition rule shown in Fig. 11a only perturbs the lens
subpath rather than regenerating it from scratch. In the example, it slightly rotates
the outgoing ray at the camera and propagates it until the first non-specular mate-
rial is encountered. It then attempts to create a connection (dashed line) to the
unchanged remainder of the path.
4. Caustic perturbation: The caustic perturbation (Fig. 11b) works just like the
lens perturbation, except that it proceeds in reverse starting at the light source. It
is well-suited for rendering caustics that are directly observed by the camera.
5. Multi-chain perturbation: This transition rule (Fig. 11c) is used when there
are multiple separated specular interactions, e.g., in the swimming pool example
encountered before. After an initial lens perturbation, a cascade of additional
perturbations follows until a connection to the remainder of the path can finally
be established.

The main downside of MLT is the severe effort needed to implement this method:
several of the mutation and perturbation rules (including their associated proposal
densities) are challenging to reproduce. Another problem is that a wide range of
different light paths generally contribute to the output image. The MLT perturbations
are designed to deal with specific types of light paths, but it can be difficult to foresee
every kind in order to craft a suitable set of perturbation rules. In practice, the included
set is insufficient.

Fig. 11 MLT operates on top of path space, which permits the use of a variety of mutation rules that
are motivated by important physical scattering effects. The top row illustrates ones that are useful
when rendering a scene involving a glass object on top of a diffuse table. The bottom row is the
swimming pool example from Fig. 8. In each example, the original path is black, and the proposal
is highlighted in blue. a Lens perturbation. b Caustic perturbation. c Multi-chain perturbation. d
Manifold perturbation

6 Specular Manifolds and Manifold Exploration (ME)

In this section, we discuss the principles of Manifold Exploration (ME) [11], which
leads to the manifold perturbation (Fig. 11d). This perturbation provides local explo-
ration for large classes of different path types and subsumes MLT’s original set of
perturbations. We begin with a discussion of the concept of a specular manifold.
When a scene contains ideal specular materials, these materials require certain phys-
ical laws to be satisfied (e.g. Snell’s law or the law of reflection). Mathematically,
these act like constraint equations that remove some dimensions of the space of light
paths, leaving behind a lower-dimensional manifold embedded in path space.
We illustrate this using a simple example in two dimensions, in which a camera
observes a planar light source through an opposing mirror (Fig. 12). We will refer to
a light path joining two endpoints through a sequence of k ideal specular scattering


Fig. 12 A motivating example in two dimensions: specular reflection in a mirror



events as a specular chain of length k. A specular chain of length 1 from the light
source to the camera is shown in the figure.
Reflections in the mirror must satisfy the law of specular reflection. Assuming
that the space of all specular chains in this simple scene can be parameterized using
the horizontal coordinates x1 , x2 , and x3 , it states that

    x_2 = \frac{x_1 + x_3}{2}, \qquad (18)
i.e., the x coordinate of the second vertex must be exactly half-way between the
endpoints. Note that this equation can also be understood as the implicit definition
of a plane in R3 (x1 − 2x2 + x3 = 0).
When interpreting the set of all candidate light paths as a three-dimensional space
P3 of coordinate tuples (x1 , x2 , x3 ), this constraint then states that the subset of
relevant paths has one dimension less and is given by the intersection of P3 and the
plane Eq. (18). With this extra knowledge, it is now easy to sample valid specular
chains, e.g. by generating x1 and x3 and solving for x2 .
Given general non-planar shapes, the problem becomes considerably harder, since
the equations that have to be satisfied are nonlinear and may admit many solutions.
Prior work has led to algorithms that can find solutions even in such cases [21, 33]
but these methods are closely tied to the representation of the underlying geometry,
and they become infeasible for specular chains with lengths greater than one. Like
these works, ME finds valid specular chains—but because it does so within the
neighborhood of a given path, it avoids the complexities of a full global search and
does not share these limitations.
ME is also related to the analysis of reflection geometry presented by Chen and
Arvo [2], who derived a second-order expansion of the neighborhood of a path. The
main difference is that ME solves for paths exactly and is used as part of an unbiased
MCMC rendering algorithm.

6.1 Integrals Over Specular Manifolds

Let us return to our previous example of the swimming pool involving the family
of light paths LSDSE. These paths belong to the P5 component of the path space
P (Eq. 12), which is a 10-dimensional space with two dimensions for each surface
position. As we will see shortly, the paths that contribute have to satisfy two constraint
equations involving unit directions in R3 (which each have 2 degrees of freedom).
This constrains a total of four dimensions of the path, meaning that all contributing
paths lie on a manifold S of dimension 6 embedded in P5 .
The corresponding integral Eq. (13) is more naturally expressed as an integral
over this specular manifold S , rather than as an integral over the entire path space:

    \int_{\mathcal{S}} \varphi(x_1 \cdots x_5)\, dA(x_1, x_3, x_5).

Note the absence of the specular vertices x2 and x4 in the integral’s area product
measure. The contribution function ϕ still has the same form: a product of terms cor-
responding to vertices and edges of the path. However, singular reflection functions
at specular vertices are replaced with (unitless) specular reflectance values, and the
geometric terms are replaced by generalized geometric terms over specular chains
that we will denote G(x1 ↔ x2 ↔ x3 ) and G(x3 ↔ x4 ↔ x5 ).
The standard geometric term G(x ↔ y) for a non-specular edge computes the area
ratio of an (infinitesimally) small surface patch at one vertex and its projection onto
projected solid angles as seen from the other vertex. The generalized geometry factor
is defined analogously: the ratio of solid angle at one end of the specular chain with
respect to area at the other end of the chain, considering the path as a function of the
positions of the endpoints.

6.2 Constraints for Reflection and Refraction

Equation (18) introduced a simple specular reflection constraint for axis-aligned


geometry in two dimensions. This constraint easily generalizes to arbitrary geometry
in three dimensions and to both specular reflection and refraction.
Recall the law of specular reflection, which states that incident and outgoing direc-
tions make the same angle with the surface normal. Furthermore, all three vectors
must be coplanar (Fig. 13). We use an equivalent reformulation of this law, which
states that the half direction vector of the incident and outgoing direction ωi and ωo ,
defined as
    h(\omega_i, \omega_o) := \frac{\omega_i + \omega_o}{\|\omega_i + \omega_o\|}, \qquad (19)

is equal to the surface normal, i.e., h(ωi , ωo ) = n. In the case of refraction, the
relationship of these directions is explained by Snell’s law. Using a generalized
definition of the half direction vector which includes weighting by the incident and
outgoing indices of refraction [32]; i.e.,

Specular reflection Specular refraction

Fig. 13 In-plane view of the surface normal n and incident and outgoing directions ωi and ωo at a
surface marking a transition between indices of refraction ηi and ηo

    h(\omega_i, \omega_o) := \frac{\eta_i \omega_i + \eta_o \omega_o}{\|\eta_i \omega_i + \eta_o \omega_o\|}, \qquad (20)

we are able to use a single constraint h(ωi , ωo ) = ±n which subsumes both Snell’s
law and the law of specular reflection (in which case ηi equals ηo ). Each specular
vertex xi of a path x̄ must satisfy this generalized constraint involving its own position
and the positions of the preceding and following vertices. Note that this constraint
involves unit vectors with only two degrees of freedom. We can project (20) onto a
two-dimensional subspace to reflect its dimensionality:

    c_i(\bar{x}) = T(x_i)^T\, h(\overrightarrow{x_i x_{i-1}}, \overrightarrow{x_i x_{i+1}}), \qquad (21)

The functions ci : P → R2 compute the generalized half-vector at vertex xi and


project it onto the tangent space of the underlying scene geometry at this position,
which is spanned by the columns of the matrix T (xi ) ∈ R3×2 ; the resulting 2-vector
is zero when h(ωi , ωo ) is parallel to the normal. Then the specular manifold is simply
the set
S = {x̄ ∈ P | ci (x̄) = 0 if vertex xi is specular} . (22)
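A direct transcription of Eqs. (20)-(21) is short. In the sketch below, the tangent-frame matrix T(xi) and the refractive indices are assumed to be supplied by the caller, and all points and directions are numpy arrays; none of these interfaces are prescribed by the text.

import numpy as np

def generalized_half_vector(w_i, w_o, eta_i=1.0, eta_o=1.0):
    # Generalized half direction vector of Eq. (20); eta_i = eta_o recovers the
    # ordinary half vector of Eq. (19) used for ideal specular reflection.
    h = eta_i * w_i + eta_o * w_o
    return h / np.linalg.norm(h)

def specular_constraint(x_prev, x_i, x_next, tangent_frame, eta_i=1.0, eta_o=1.0):
    # Projected constraint c_i of Eq. (21): a 2-vector that vanishes exactly when
    # the generalized half vector at x_i is parallel to the surface normal.
    # `tangent_frame` is the 3x2 matrix T(x_i) whose columns span the tangent
    # space at x_i (assumed to be provided by the scene representation).
    w_prev = x_prev - x_i
    w_prev = w_prev / np.linalg.norm(w_prev)
    w_next = x_next - x_i
    w_next = w_next / np.linalg.norm(w_next)
    h = generalized_half_vector(w_prev, w_next, eta_i, eta_o)
    return tangent_frame.T @ h                  # T(x_i)^T h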

6.3 Local Manifold Geometry

The complex nonlinear behavior of S severely limits our ability to reason about its
geometric structure globally. In this section, we therefore focus on local properties,
leading to an explicit expression for the tangent space at any point on the mani-
fold. This constitutes the key geometric information needed to construct a numerical
procedure that is able to move between points on the manifold.
For simplicity, let us restrict ourselves to the case of a single specular chain
x̄ = x1 · · · xk with k − 2 specular vertices and non-specular endpoints x1 and xk ,
matching the path regular expression DS+D. This suffices to cover most cases by
separate application to each specular chain along a path. To analyze the geometry
locally, we require a point in S , i.e., a light path x̄ satisfying all specular constraints,
to be given.
We assume that local parameterizations of the surfaces in the scene on small
neighborhoods around every vertex are provided via functions x̂i (ui , vi ) : R2 → M ,
where x̂i (0, 0) = xi . We can then express the constraints ci in terms of these local
coordinates and stack them on top of each other to create a new function ĉ with
signature ĉ : R2k → R2k−4 , which maps 2k local coordinate values to 2k − 4 =
2(k − 2) projected half direction vector coordinates—two for each of the specular
vertices of the chain. The set
 
    \mathcal{S}_{loc} = \left\{ (u_1, v_1, \ldots, u_k, v_k) \in \mathbb{R}^{2k} \mid \hat{c}(\ldots) = 0 \right\} \qquad (23)

then describes the (four-dimensional) specular manifold in terms of local coordinates


around the path x̄, which is identified with the origin. Under the assumption that the
Jacobian of ĉ has full rank (more on this shortly), the Implicit Function Theorem [26]
states that the implicitly defined manifold (23) can be converted into the (explicit)
graph of a function q : R4 → R2k−4 on an epsilon ball B4 (ε) around the origin.
Different functions q are possible—in our case, the most useful variant determines
the positions of all the specular vertices from the positions of the non-specular end-
points, i.e.
  

    \mathcal{S}_{loc} = \left\{ (u_1, v_1, q(u_1, v_1, u_k, v_k), u_k, v_k) \mid (u_1, v_1, u_k, v_k) \in B_4(\varepsilon) \right\}. \qquad (24)

Unfortunately, the theorem does not specify how to compute q—it only guarantees
the existence of such a function. It does, however, provide an explicit expression for
the derivative of q, which contains all information we need to compute a basis for
the tangent space at the path x̄, which corresponds to the origin in local coordinates.
This involves the Jacobian of the constraint function ∇ ĉ(0), which is a matrix of
k − 2 by k 2-by-2 blocks with a block tridiagonal structure (Fig. 14).

(a) (b)

where

(c) (d)

Fig. 14 The linear system used to compute the tangent space and its interpretation as a derivative
of a specular chain. a An example path. b Associated constraints. c Constraint Jacobian. d Tangent
space

If we block the derivative ∇ ĉ, as shown in the figure, into 2-column matrices B1
and Bk for the first and last vertices and a square matrix A for the specular chain, the
tangent space to the manifold in local coordinates is
 
    T_{\mathcal{S}}(\bar{x}) = -A^{-1} \begin{bmatrix} B_1 & B_k \end{bmatrix}. \qquad (25)

This matrix is k − 2 by 2 blocks in size, and each block represents the derivative of
one vertex with respect to one endpoint.
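Numerically, Eq. (25) amounts to a single linear solve. The dense sketch below ignores the block-tridiagonal structure of A for brevity and assumes the Jacobian blocks have already been assembled into numpy arrays.

import numpy as np

def tangent_space(A, B1, Bk):
    # Tangent space of Eq. (25): A is the (2k-4) x (2k-4) block of the constraint
    # Jacobian belonging to the specular vertices, B1 and Bk are the (2k-4) x 2
    # blocks belonging to the two endpoints. Each 2x2 block of the result is the
    # derivative of one specular vertex with respect to one endpoint. A practical
    # implementation would exploit the block-tridiagonal structure of A.
    rhs = np.hstack([B1, Bk])                 # [B1  Bk]
    return -np.linalg.solve(A, rhs)           # -A^{-1} [B1  Bk]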
This construction computes tangents with respect to a graph parameterization
of the manifold, which is guaranteed to exist for a suitable choice of independent
variables. Because we always use the endpoint vertices for this purpose, difficulties
arise when one of the endpoints is located exactly at the fold of a caustic wavefront,
in which case ∇ ĉ becomes rank-deficient and A fails to be invertible. This happens
rarely in practice and is not a problem for our method, which allows for occasional
parameterization failures. In other contexts where this is not acceptable, the chain
could be parameterized by a different pair of vertices when a non-invertible matrix
is detected.
These theoretical results about the structure of the specular manifold can be used
in an algorithm to solve for specular paths, which we discuss next.

6.4 Walking on the Specular Manifold

In practice, we always keep one endpoint fixed (e.g., x1 ), while parameterizing the
remaining two-dimensional set. Figure 15 shows a conceptual sketch of the manifold
of a specular chain that is parameterized by the last vertex xk. This vertex is initially
located at xk^start, and we search for a valid configuration where it is at position xk^target.
The derivation in Sect. 6.3 provides a way of extrapolating the necessary change of
x2 , . . . , xk−1 to first order, but this is not enough: an expansion, no matter to what
order, will generally not be able to find a valid path that is located on S .
To address this issue, we combine the extrapolation with a simple projection oper-
ation, which maps approximate paths back onto S by intersecting the extrapolated
ray x1 → x2 with the scene geometry and using the appropriate laws of reflection and
refraction to compute the remaining vertex locations. The combination of extrapola-
tion and projection behaves like Newton’s method, exhibiting quadratic convergence
near the solution; details on this iteration can be found in the original paper [11].
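In outline, one manifold walk alternates a first-order prediction with a projection back onto S until the free endpoint reaches its target. The sketch below is schematic: tangent_step (using the tangents of Sect. 6.3), project (re-tracing the specular chain), and distance are hypothetical helpers, and the iteration budget and tolerance are arbitrary placeholders.

def manifold_walk(chain, target, tangent_step, project, distance,
                  max_iterations=20, tolerance=1e-7):
    # Newton-like iteration of Sect. 6.4: move the free endpoint of a specular
    # chain toward `target` while keeping all specular constraints satisfied.
    #   tangent_step(chain, target) -> chain extrapolated to first order
    #   project(chain)              -> chain mapped back onto the specular
    #                                  manifold, or None if the projection fails
    #   distance(chain, target)     -> remaining endpoint error
    for _ in range(max_iterations):
        if distance(chain, target) < tolerance:
            return chain                      # converged to a valid configuration
        proposal = project(tangent_step(chain, target))
        if proposal is None:                  # e.g. the projected ray was occluded
            return None
        chain = proposal
    return None                               # no convergence within the budget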
Figure 16 shows a sketch of how manifold walks can be used in an MLT-like itera-
tion: a proposal begins to modify a light path by perturbing the outgoing direction at
vertex xa . Propagating this direction through a specular reflection leads to a modified
position xb on a diffuse surface. To complete the partial path, it is necessary to find a
specular chain connecting xb to the light source. Here, we can simply apply a mani-
fold walk to the existing specular chain xb · · · xc to solve for an updated configuration

Fig. 15 Manifold walks use a Newton-like iteration to locally parameterize the specular manifold.
The extrapolation operation takes first-order steps based on the local manifold tangents, which are
subsequently projected back onto the manifold

Fig. 16 Example of a manifold-based path perturbation

xb · · · xc . The key observation is that MCMC explores the space of light paths using
localized steps, which is a perfect match for the local parameterization of the path
manifold provided by Manifold Exploration.

6.5 Results

Figures 17 and 18 show the comparisons of several MCMC rendering techniques for
an interior scene containing approximately 2 million triangles with shading normals
and a mixture of glossy, diffuse, and specular surfaces and some scattering volumes.
One hour of rendering time was allotted to each technique; the results are intentionally
unconverged to permit a visual analysis of the convergence behavior. By reasoning
about the geometry of the specular and offset specular manifolds for the paths it
encounters, the ME perturbation strategy is more successful at rendering certain
paths—such as illumination that refracts from the bulbs into the butter dish, then to
the camera (6 specular vertices)—that the other methods struggle with.

Fig. 17 This interior scene shows chinaware, a teapot containing an absorbing medium, and a butter
dish on a glossy silver tray. Illumination comes from a complex chandelier with glass-enclosed bulbs.
Prior methods have difficulty in finding and exploring relevant light paths, which causes noise and
other convergence artifacts. Equal-time renderings on an eight-core Intel Xeon X5570 machine at
1280 × 720 pixels in 1 h. a MLT [28]. b ERPT [3]. c PSSMLT [14]. d ME [11]


Fig. 18 This view of a different part of the room, now lit through windows using a spheri-
cal environment map surrounding the scene, contains a scattering medium inside the glass egg.
Equal-time renderings at 720 × 1280 pixels in 1 h. a MLT [28]. b ERPT [3]. c PSSMLT [14].
d ME [11]

7 Perturbation Rules for Glossy Transport

Realistic scenes contain a diverse set of materials and are usually not restricted to
specular or diffuse BSDFs. It is important for the rendering method used to generalize
to such cases. All derivations thus far focused on ideal specular materials,
but it is possible to extend manifold walks to glossy materials as well. Jakob and
Marschner proposed a simple generalization of ME, which works for moderately
rough materials, and Kaplanyan et al. [13] recently developed a natural constraint
representation of light paths. They proposed a novel half vector-based perturbation
rule as well as numerous enhancements including better tolerance to non-smooth
geometry and sample stratification in image space based on a frequency analysis of
the scattering operator. We provide a high level overview of both approaches here.

7.1 Glossy Materials in the Manifold Perturbation

Figure 19 shows a sketch of this generalization. In the ideal specular case, there is a
single specular chain (or discrete set) connecting xb and xc (top left), and all energy
is concentrated on a lower-dimensional specular manifold defined by c(x̄) = 0 (top
right). In the glossy case, there is a continuous family of chains connecting xb and xc
(bottom left), and the space of light paths has its energy concentrated in a thin “band”
near the specular manifold. The key idea of how ME handles glossy materials is to
take steps along a family of parallel offset manifolds c(x̄) = k (bottom right) so that


Fig. 19 Sketch of the generalization of Manifold Exploration to glossy materials



path space near the specular manifold can be explored without stepping out of this
thin band of near-specular light transport.

7.2 The Natural Constraint Formulation

The method by Kaplanyan et al. [13] takes a different approach to explore glossy
transport paths (Fig. 20): instead of parameterizing a glossy chain by fixing its half
vectors and moving the chain endpoints, their method parameterizes complete paths
starting at the light source and ending at the camera. The underlying manifold walk
keeps the path endpoints fixed and computes a nearby light path as a function of its
half vectors. The set of all half vectors along a path can be interpreted as a type of
generalized coordinate system for light paths: its dimension equals the path’s degrees
of freedom, while capturing the relevant constraints (reflection and refraction) in a
convenient explicit form. For this reason, the resulting parameterization is referred to
as the natural constraint representation, and the method is called half vector space
light transport (HSLT); loosely speaking, its perturbation can be seen to explore
“orthogonal” directions as compared to the “parallel” manifold walks of ME.


Fig. 20 In the above example, ME (top) constrains the half vectors of two glossy chains x1 . . . x4
and x4 . . . x6 and solves for an updated configuration after perturbing the position of x4 . HSLT
(bottom) instead adjusts all half vectors at once and solves for suitable vertex positions with this
configuration. This proposal is effective for importance sampling the material terms and leads to
superior convergence when dealing with transport between glossy surfaces. Based on a figure by
Kaplanyan et al. [13] (used with permission)

The underlying approach is motivated by the following interesting observation:


when parameterizing light paths in terms of their half vectors, the influence of material
terms on the integrand ϕ approximately decouples (Fig. 21). The reason for this
effect is that the dominant terms in glossy reflectance models (which are factors
of ϕ) depend on the angle between the half vector and the surface normal. The
change of variables from path space to the half vector domain furthermore cancels out
the geometry terms G, leading to additional simplifications. As a consequence, this
parameterization turns ϕ into a much simpler function resembling a separate Gaussian
in each half vector dimension, which is related to the roughness of the associated
surface. Kaplanyan et al. also demonstrate how frequency-space information about
the scattering operator can be used to better spread out samples in image space,
which is important to accelerate convergence of the histogram generation method
that creates the final rendering.
Figure 22 shows a rendering comparison of a kitchen scene rendered by ME and
HSLT, where most of the illumination is due to caustics paths involving a reflection
by the glossy floor. After 30 min, the ME rendering is noticeably less converged and
suffers from stripe artifacts, which are not present in the HSLT result.


Fig. 21 The natural constraint formulation [13] is a parameterization of path space in the half vector
domain. It has the interesting property of approximately decoupling the influence of the individual
scattering events on ϕ. The figure shows a complex path where the half vector h3 is perturbed at
vertex x3 . The first column shows a false-color plot of ϕ over the resulting paths for different values
of h3 and two roughness values. The second column shows a plot of the BSDF value at this vertex,
which is approximately proportional to ϕ. Based on a figure by Kaplanyan et al. [13] (used with
permission)


Fig. 22 Equal-time rendering of an interior kitchen scene with many glossy reflections. Based on
a figure by Kaplanyan et al. [13] (used with permission)

8 Conclusion

This article presented an overview of the physics underlying light transport simu-
lations in computer graphics. After introducing relevant physical quantities and the
main energy balance equation, we showed how to compute approximate solutions
using a simple Monte Carlo estimator. Following this, we introduced the concept of
path space and examined the relation of path tracing, light tracing, and bidirectional
path tracing—including their behavior given challenging input that causes these
methods to become impracticably slow. The second part of this article reviewed sev-
eral MCMC methods that compute path space integrals using proposal distributions
defined on sets of light paths. To efficiently explore light paths involving specular
materials, we showed how to implicitly define and locally parameterize the asso-
ciated paths using a root-finding iteration. Finally, we reviewed recent work that
aims to generalize this approach to glossy scattering interactions. Most of the meth-
ods that were discussed are implemented in the Mitsuba renderer [9], which is a
research-oriented open source rendering framework.
MCMC methods in rendering still suffer from issues that limit their usefulness
in certain situations. Most importantly, they require an initialization or mutation
rule that provides well distributed seed paths to the perturbations, as they can only
explore connected components of path space. Bidirectional Path Tracing and the
Bidirectional Mutation are reasonably effective but run into issues when path space
has many disconnected components, and this becomes increasingly problematic as
their number grows. Ultimately, as the number of disconnected components
exceeds the number of samples that can be generated, local exploration of path space
becomes ineffective; future algorithms could be designed to attempt exploration only
in sufficiently large path space components.
Furthermore, all of the perturbation rules discussed made assumptions about specific path
configurations or material properties, which limits their benefit when rendering
scenes that contain a wide range of material types. To efficiently deal with light paths

involving arbitrary materials, camera models, and light sources, a fundamentally
different construction will be needed.

Acknowledgments This research was conducted in conjunction with the Intel Science and Tech-
nology Center for Visual Computing. Additional funding was provided by the National Science
Foundation under grant IIS-1011919 and an ETH/Marie Curie fellowship. The author is indebted
to Olesya Jakob, who crafted several of the example scenes in this article.

References

1. Arvo, J.R.: Analytic methods for simulated light transport. Ph.D. thesis, Yale University (1995)
2. Chen, M., Arvo, J.: Theory and application of specular path perturbation. ACM Trans. Graph.
19(4), 246–278 (2000)
3. Cline, D., Talbot, J., Egbert, P.: Energy redistribution path tracing. ACM Trans. Graph. 24(3),
1186–1195 (2005)
4. Cook, R.L., Torrance, K.E.: A reflectance model for computer graphics. ACM Trans. Graph.
1(1), 7–24 (1982)
5. Doucet, A., Johansen, A., Tadic, V.: On solving integral equations using Markov Chain Monte
Carlo methods. Appl. Math. Comput. 216(10), 2869–2880 (2010)
6. Grünschloß, L., Raab, M., Keller, A.: Enumerating quasi-Monte Carlo point sequences in
elementary intervals. In: Plaskota, L., Woźniakowski, H. (eds.) Monte Carlo and Quasi-Monte
Carlo Methods 2010. Springer Proceedings in Mathematics and Statistics, vol. 23, pp. 399–408.
Springer, Berlin (2012)
7. Hachisuka, T., Kaplanyan, A.S., Dachsbacher, C.: Multiplexed metropolis light transport. ACM
Trans. Graph. 33(4), 100:1–100:10 (2014)
8. Heckbert, P.S.: Adaptive radiosity textures for bidirectional ray tracing. In: Proceedings of
SIGGRAPH 90 on Computer Graphics. (1990)
9. Jakob, W.: Mitsuba renderer. http://www.mitsuba-renderer.org (2010)
10. Jakob, W.: Light transport on path-space manifolds. Ph.D. thesis, Cornell University (2013)
11. Jakob, W., Marschner, S.: Manifold exploration: a Markov Chain Monte Carlo technique for
rendering scenes with difficult specular transport. ACM Trans. Graph. 31(4), 58:1–58:13 (2012)
12. Kajiya, J.T.: The rendering equation. In: Proceedings of SIGGRAPH 86 on Computer Graphics,
pp. 143–150 (1986)
13. Kaplanyan, A.S., Hanika, J., Dachsbacher, C.: The natural-constraint representation of the path
space for efficient light transport simulation. ACM Trans. Graph. (Proc. SIGGRAPH) 33(4),
1–13 (2014)
14. Kelemen, C., Szirmay-Kalos, L., Antal, G., Csonka, F.: A simple and robust mutation strategy
for the Metropolis light transport algorithm. Comput. Graph. Forum 21(3), 531–540 (2002)
15. Keller, A.: Quasi-Monte Carlo Image Synthesis in a Nutshell. Springer, Heidelberg (2014)
16. Kollig, T., Keller, A.: Efficient Bidirectional Path Tracing by Randomized Quasi-Monte Carlo
Integration. Springer, Heidelberg (2002)
17. Lafortune, E.P., Willems, Y.D.: Bi-directional path tracing. In: Proceedings of the Compu-
graphics 93. Alvor, Portugal (1993)
18. Lehtinen, J., Karras, T., Laine, S., Aittala, M., Durand, F., Aila, T.: Gradient-domain Metropolis
light transport. ACM Trans. Graph. 32(4), 1 (2013)
19. Manzi, M., Rousselle, F., Kettunen, M., Lehtinen, J., Zwicker, M.: Improved sampling for
gradient-domain Metropolis light transport. ACM Trans. Graph. 33(6), 1–12 (2014)
20. Matsumoto, M., Nishimura, T.: Mersenne twister: a 623-dimensionally equidistributed uniform
pseudo-random number generator. ACM Trans. Model. Comput. Simul. 8(1), 3–30 (1998)
21. Mitchell, D.P., Hanrahan, P.: Illumination from curved reflectors. In: Proceedings of the SIG-
GRAPH 92 on Computer Graphics, pp. 283–291 (1992)

22. Nicodemus, F.E.: Geometrical Considerations and Nomenclature for Reflectance, vol. 160. US
Department of Commerce, National Bureau of Standards, Washington (1977)
23. Pauly, M., Kollig, T., Keller, A.: Metropolis light transport for participating media. In: Rendering
Techniques 2000: 11th Eurographics Workshop on Rendering, pp. 11–22 (2000)
24. Pharr, M., Humphreys, G., Jakob, W.: Physically Based Rendering: From Theory to Imple-
mentation, 3rd edn. Morgan Kaufmann Publishers Inc., San Francisco (2016)
25. Preisendorfer, R.: Hydrologic optics. US Department of Commerce, Washington (1976)
26. Spivak, M.: Calculus on Manifolds. Addison-Wesley, Boston (1965)
27. Torrance, K.E., Sparrow, E.M.: Theory for off-specular reflection from roughened surfaces. JOSA
57(9), 1105–1112 (1967)
28. Veach, E.: Robust Monte Carlo methods for light transport simulation. Ph.D. thesis, Stanford
University (1997)
29. Veach, E., Guibas, L.: Bidirectional estimators for light transport. In: Proceedings of the Fifth
Eurographics Workshop on Rendering (1994)
30. Veach, E., Guibas, L.J.: Optimally combining sampling techniques for Monte Carlo rendering.
In: Proceedings of the 22nd annual conference on Computer graphics and interactive techniques,
SIGGRAPH ’95, pp. 419–428. ACM (1995)
31. Veach, E., Guibas, L.J.: Metropolis light transport. In: Proceedings of the SIGGRAPH 97 on
Computer Graphics, pp. 65–76 (1997)
32. Walter, B., Marschner, S.R., Li, H., Torrance, K.E.: Microfacet models for refraction through
rough surfaces. In: Rendering Techniques 2007: 18th Eurographics Workshop on Rendering,
pp. 195–206 (2007)
33. Walter, B., Zhao, S., Holzschuch, N., Bala, K.: Single scattering in refractive media with triangle
mesh boundaries. ACM Trans. Graph 28(3), 92 (2009)
Walsh Figure of Merit for Digital
Nets: An Easy Measure for Higher Order
Convergent QMC

Makoto Matsumoto and Ryuichi Ohori

Abstract Fix an integer s. Let f : [0, 1)^s → R be an integrable function. Let
P ⊂ [0, 1]^s be a finite point set. Quasi-Monte Carlo integration of f by P is the average
value of f over P, which approximates the integral of f over the s-dimensional
cube. The Koksma–Hlawka inequality tells us that, by a smart choice of P, one may expect
the error to decrease roughly as O(N^{−1}(log N)^s). For any α ≥ 1, J. Dick gave a
construction of point sets such that for α-smooth f the convergence rate O(N^{−α}(log N)^{sα})
is assured. As a coarse version of his theory, M-Saito-Matoba introduced the Walsh figure
of merit (WAFOM), which gives the convergence rate O(N^{−C log N/s}). WAFOM
is efficiently computable. By a brute-force search for low-WAFOM point sets, we
observe a convergence rate of order N^{−α} with α > 1 for several test integrands for
s = 4 and 8.

Keywords Quasi-Monte Carlo · Walsh figure of merit · Numerical integration · Digital nets

1 Quasi-Monte Carlo and Higher Order Convergence

Fix an integer s. Let f : [0, 1)s → R be an integrable function. Our goal is to have
a good approximation of the value

    I(f) := ∫_{[0,1)^s} f(x) dx.

M. Matsumoto (B)
Graduate School of Sciences, Hiroshima University, Hiroshima 739-8526, Japan
e-mail: m-mat@math.sci.hiroshima-u.ac.jp
R. Ohori
Fujitsu Laboratories Ltd., Kanagawa 211-8588, Japan
e-mail: ohori.ryuichi@jp.fujitsu.com


We choose a finite point set P ⊂ [0, 1)s , whose cardinality is called the sample size
and denoted by N . The quasi-Monte Carlo (QMC) integration of f by P is the value

    I(f; P) := (1/N) ∑_{x∈P} f(x),

i.e., the average of f over the finite points P that approximates I ( f ). The QMC
integration error is defined by

Error( f ; P) := |I ( f ) − I ( f ; P)|.

If P consists of N independently and uniformly chosen random points, the QMC
integration is nothing but the classical Monte Carlo (MC) integration, where the
integration error is expected to decrease with the order of N^{−1/2} when N increases,
if f has a finite variance.
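As a small illustration of the two estimators (our own toy setup, not tied to any of the constructions discussed later), the following sketch compares plain MC with the average over a regular midpoint grid for a smooth integrand on [0, 1)^2.

    import numpy as np

    rng = np.random.default_rng(1)
    f = lambda x: np.exp(-np.sum(x, axis=-1))       # test integrand on [0,1)^2
    exact = (1.0 - np.exp(-1.0)) ** 2               # exact value of the integral

    N = 4096                                        # sample size (= 64^2)

    # Monte Carlo: average of f over N i.i.d. uniform points.
    mc = np.mean(f(rng.random((N, 2))))

    # A simple "QMC-like" rule: average of f over a regular midpoint grid.
    m = 64
    g = (np.arange(m) + 0.5) / m
    grid = np.array([(x, y) for x in g for y in g])
    qmc = np.mean(f(grid))

    print(abs(mc - exact), abs(qmc - exact))        # the grid error is much smaller here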
The main purpose of QMC integration is to choose good point sets so that the
integration error decreases faster than MC. There are enormous studies in diverse
directions, see for examples [7, 19].
In applications we often know little about the integrand f, so we want point sets
which work well for a wide class of f. An inequality of the form

    Error(f; P) ≤ V(f) D(P),    (1)

called an inequality of Koksma–Hlawka type, is often useful. Here, V(f) is a value independent
of P which measures some kind of variance of f, and D(P) is a value independent
of f which measures some kind of discrepancy of P from an “ideal” uniform
distribution. Under such an inequality, we may prepare point sets with small values
of D(P), and use them for QMC integration if V(f) is expected to be not too large.
In the case of the original Koksma–Hlawka inequality [19, Chaps. 2 and 3],
V(f) is the total variation of f in the sense of Hardy and Krause, and D(P)
is the star discrepancy of the point set. In this case the inequality is known to
be sharp. It is a conjecture that there is a constant c_s depending only on s such
that D*(P) > c_s (log N)^{s−1}/N, and there are constructions of point sets with
D*(P) < C_s (log N)^s/N. Thus, to obtain a better convergence rate, one needs to
assume some restriction on f. If for a function class F there are V(f) (f ∈ F)
and D(P) satisfying the inequality (1), together with a sequence of point sets P1, P2, . . . such that
D(Pi) decreases faster than the order 1/Ni, then it is natural to call these point sets
higher order QMC point sets for the function class F.
It is known that this is possible if we assume some smoothness on f . Dick [2, 4,
7] showed that for any positive integer α, there is a function class named α-smooth
such that the inequality

Error( f ; P) ≤ C(α, s)|| f ||α Wα (P)



holds, where point sets with Wα(P) = O(N^{−α}(log N)^{sα}) are constructible from
(t, m, s)-nets (called higher order digital nets). The definition of Wα(P) is given
later in Sect. 5.3. We omit the definition of ||f||_α, which depends on all partial
mixed derivatives up to the αth order in each variable; when s = 1, it is defined by

    ||f||_α^2 := ∑_{i=0}^{α} ( ∫_0^1 f^{(i)}(x) dx )^2 + ∫_0^1 ( f^{(α)}(x) )^2 dx.

2 Digital Net, Discretization and WAFOM

In [16], Saito, Matoba and the first author introduced the Walsh figure of merit (WAFOM)
WF(P) of a digital net¹ P. This may be regarded as a simplified special case of Dick’s
Wα with some discretization. WAFOM satisfies a Koksma–Hlawka type inequality,
and the value WF(P) decreases in the order O(N^{−C(log₂ N)/s + D}) for some constants
C, D > 0 independent of s, N. Thus, the order of convergence is faster than
O(N^{−α}) for any α > 0.

2.1 Discretization

Although the following notions are naturally extended to Z/b or even to any finite
abelian group [29], we treat only the case of base b = 2 for simplicity.
Let F2 := {0, 1} = Z/2 be the two-element field. Take n large enough, and approximate
the unit interval I = [0, 1) by the set of n-bit integers In := F2^n through the
inclusion In → I, x (considered as an n-bit integer) ↦ x/2^n + 1/2^{n+1}.
More precisely, we identify the finite set In with the set of half-open intervals
obtained by partitioning [0, 1) into 2^n pieces; namely

    In := {[i·2^{−n}, (i + 1)·2^{−n}) | 0 ≤ i ≤ 2^n − 1}.

Example 1 In the case n = 3 and I3 = {0, 1}^3, I3 is the set of 8 intervals in Fig. 1.
The s-dimensional hypercube I^s is approximated by the set In^s of 2^{ns} hypercubes,
which is identified with In^s = (F2^n)^s = M_{s,n}(F2) =: V. In sum,

Fig. 1 {0, 1}^3 is identified with the set of 8 segments I3

1 See Sect. 2.3 for a definition of digital nets; there we use the italic
P instead of P for a digital net,
to stress that actually P is a subspace of a discrete space, while P is in a continuous space I s .
146 M. Matsumoto and R. Ohori

Definition 1 Let V := M_{s,n}(F2) be the set of (s × n)-matrices with coefficients in
F2 = {0, 1}. An element B = (b_ij) ∈ V is identified with an s-dimensional hypercube
in In^s, consisting of the elements (x1, . . . , xs) ∈ R^s where, for each i, the binary
expansion of xi coincides with 0.b_i1 b_i2 · · · b_in up to the nth digit below the decimal
point. By abuse of language, the notation B is used for the corresponding
hypercube.
Example 2 In the case n = 3 and s = 2, for example,
 
    B = ( 1 0 0
          0 1 1 )   corresponds to   [0.100, 0.101) × [0.011, 0.100)   (in binary).

As an approximation of f : I s → R, define

    f_n : In^s = V → R,   B ↦ f_n(B) := (1/Vol(B)) ∫_B f dx

by mapping a small hypercube B of edge length 2−n to the average of f over this
small hypercube. Thus, f n is the discretization (with n-bit precision) of f by taking
the average over each small hypercube.
In the following, we do not compute f_n, but proceed as if we were given f_n.
More precisely, let x_B denote the midpoint of the hypercube B, and we
approximate f_n(B) by f(x_B). For sufficiently large n, say n = 32, the approximation
error |f_n(B) − f(x_B)| (which we call the discretization error of f at B) would be
small enough: if f is Lipschitz continuous, then the error has order √s · 2^{−n} (see footnote 2).
From now on, we assume that n is taken large enough, so that this discretization
error is negligible in practice for the QMC integration considered. A justification is
that we have only finite precision computation in digital computers, so a function
f has discretized domain with some finite precision. This assumption is somewhat
cheating, but seems to work well in many practical uses.
By definition of the above discretization, we have an equality

    ∫_{[0,1)^s} f(x) dx = (1/|V|) ∑_{B∈V} f_n(B).

2.2 Discrete Fourier Transform


For A, B ∈ V, we define their inner product by

    (A, B) := trace(ᵗA·B) = ∑_{1≤i≤s, 1≤j≤n} a_ij b_ij ∈ F2 (mod 2).

2 Iff √has Lipschitz constant C, namely, satisfies f (x − y) < C|x − y|, then the error is bounded
by C s2−n [16, Lemma 2.1].
Walsh Figure of Merit for Digital Nets: An Easy Measure … 147

For a function g : V → R, its discrete Fourier transform ĝ : V → R is defined by

    ĝ(A) := (1/|V|) ∑_{B∈V} g(B) (−1)^{(B,A)}.

Thus

    f̂_n(0) = (1/|V|) ∑_{B∈V} f_n(B) = I(f).

Remark 1 The value f̂_n(A) coincides with the Ath Walsh coefficient of the function
f, defined as follows. Let A = (a_ij). Define an integer c_i := ∑_{j=1}^{n} a_ij 2^{j−1} for each
i = 1, . . . , s. Then the Ath Walsh coefficient of f is defined as the standard multi-indexed
Walsh coefficient f̂_{c1,...,cs}.

2.3 Digital Nets, and QMC-Error in Terms


of Walsh Coefficients

Definition 2 Let P ⊂ V be an F2-linear subspace (namely, P is closed under componentwise
addition modulo 2). Then P can be regarded as a set of small hypercubes
in In^s, or as a finite point set P ⊂ I^s by taking the midpoint of each hypercube. Such
a point set (or even P itself) is called a digital net with base 2.
This notion goes back to Sobol and Niederreiter; see for example [7, Defini-
tion 4.47]. For such an F2 -subspace P, let us define its perpendicular space3 by

P ⊥ := {A ∈ V | (B, A) = 0 (∀B ∈ P)}.

QMC integration of f n by P is by definition


    I(f_n; P) := (1/|P|) ∑_{B∈P} f_n(B) = ∑_{A∈P^⊥} f̂_n(A),    (2)

where the right equality (called the Poisson summation formula) follows from

    ∑_{A∈P^⊥} f̂_n(A) = ∑_{A∈P^⊥} (1/|V|) ∑_{B∈V} f_n(B)(−1)^{(B,A)}
                      = (1/|V|) ∑_{B∈V} f_n(B) ∑_{A∈P^⊥} (−1)^{(B,A)}
                      = (1/|V|) ∑_{B∈P} f_n(B) |P^⊥|
                      = (1/|P|) ∑_{B∈P} f_n(B).
= |P| B∈P f n (B).

3 The perpendicular space is called “the dual space” in most of the literature on QMC and coding theory.
However, in pure algebra, the dual space of a vector space V over a field k means V* := Hom_k(V, k),
which is defined without using an inner product. In this paper, we use the term “perpendicular”, going
against the tradition in this area.
148 M. Matsumoto and R. Ohori

2.4 Koksma–Hlawka Type Inequality by Dick

From (2), we have a QMC integration error bound by Walsh coefficients


 
 
   

    Error(f_n; P) = |I(f_n; P) − f̂_n(0)| = | ∑_{A∈P^⊥−{0}} f̂_n(A) | ≤ ∑_{A∈P^⊥−{0}} |f̂_n(A)|.    (3)

Thus, to bound the error, it suffices to bound | fˆn (A)|.

Theorem 1 (Decay of Walsh coefficients, [3]) For an n-smooth function f, there
is a notion of n-norm ||f||_n and a constant C(s, n) independent of f and A with

    |f̂_n(A)| ≤ C(s, n) ||f||_n 2^{−μ(A)}.

(See [7, Theorem 14.23] for a general statement.) Here, μ(A) is defined as follows:

Definition 3 For A = (ai j )1≤i≤s,1≤ j≤n ∈ V , its Dick weight μ(A) is defined by

    μ(A) := ∑_{1≤i≤s, 1≤j≤n} j · a_ij,

where ai j ∈ {0, 1} are considered as integers (without modulo 2).

Example 3 In the case of s = 3, n = 4, for example,


the matrix A with rows (1 0 0 1), (0 1 1 1), (0 0 1 0) is mapped entrywise to (j·a_ij)
with rows (1 0 0 4), (0 2 3 4), (0 0 3 0), hence

    μ(A) = (1 + 0 + 0 + 4) + (0 + 2 + 3 + 4) + (0 + 0 + 3 + 0) = 17.
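Definition 3 translates directly into code; the small function below (names are ours) reproduces Example 3.

    def dick_weight(A):
        """Dick weight mu(A) of a 0/1 matrix A = (a_ij): sum of j*a_ij,
        where columns are indexed j = 1, ..., n."""
        return sum(j * a
                   for row in A
                   for j, a in enumerate(row, start=1))

    A = [[1, 0, 0, 1],
         [0, 1, 1, 1],
         [0, 0, 1, 0]]
    print(dick_weight(A))   # 17, as in Example 3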

Walsh figure of merit of P is defined as follows [16]:


Definition 4 (WAFOM) Let P ⊂ V . WAFOM of P is defined by

    WF(P) := ∑_{A∈P^⊥−{0}} 2^{−μ(A)}.

By plugging this definition and Dick’s Theorem 1 into (3), we have an inequality
of Koksma–Hlawka type:

Error( f n ; P) ≤ C(s, n)|| f ||n WF(P). (4)


Walsh Figure of Merit for Digital Nets: An Easy Measure … 149

2.5 A Toy Experiment on WF(P)

We shall see how WAFOM works for a toy case of n = 3-digit precision and s = 1
dimension. In Fig. 1, the unit interval I is divided into 8 intervals, each of which
corresponds to a (1 × 3)-matrix in F2^3 = V. Table 1 lists the seven subspaces of
dimension 2, a selection of four of them, and their WAFOM and QMC errors for the
integrands f(x) = x, x^2 and x^3. The first line in Table 1 shows the 8-element set
V = F2 3 , corresponding to the 8 intervals in Fig. 1. The next line (100)⊥ denotes
the 2-dimensional subspace of V consisting of the elements perpendicular to (100),
that is, the four vectors whose first digit is 0. In the same manner, all 2-dimensional
subspaces of V are listed. The last one is (111)⊥ , consisting of the four vectors
(x1 , x2 , x3 ) with x1 + x2 + x3 = 0(mod2).
Our aim is to decide which is the best (or most “uniform”) among the seven 2-
dimensional sub-vector spaces for QMC integration. Intuitively, (100)⊥ is not a good
choice since all the four intervals cluster in [0, 1/2]. Similarly, we exclude (010)⊥
and (110)⊥ . We compare the remaining four candidates by two methods: computing
WAFOM, and computing QMC integration errors with test integrand functions x, x 2
and x 3 .
The results are shown in the latter part of Table 1. The first line corresponds
to the case of P = V . Since P ⊥ − {0} is empty, WF(P) = 0. For the remaining
four cases P = (x1 , x2 , x3 )⊥ , note that {(x1 , x2 , x3 )⊥ }⊥ = {(000), (x1 , x2 , x3 )} and
P^⊥ − {0} = {(x1, x2, x3)}, thus we have WF(P) = 2^{−μ((x1,x2,x3))}. The third column
in the latter table shows WAFOM for five different choices of P. The three columns
“Error for x i ” with i = 1, 2, 3 show the QMC integration error by P for integrating
x i over [0, 1]. We used the mid point of each segment (of length 1/8) to evaluate f .

Table 1 Toy examples for WAFOM for 3-digit discretization, for the integrands x, x^2 and x^3

V       = {000 001 010 011 100 101 110 111}
(100)^⊥ = {000 001 010 011}
(010)^⊥ = {000 001 100 101}
(110)^⊥ = {000 001 110 111}
(001)^⊥ = {000 010 100 110}
(101)^⊥ = {000 010 101 111}
(011)^⊥ = {000 011 100 111}
(111)^⊥ = {000 011 101 110}

P        μ(A) for A ∈ P^⊥\{0}   WF(P)    Error for x   Error for x^2   Error for x^3
V        ∅                      0         0            −0.0013         −0.0020
(001)^⊥  0+0+3                  2^{−3}   −0.0625       −0.0638         −0.0637
(101)^⊥  1+0+3                  2^{−4}    0            −0.0299         −0.0449
(011)^⊥  0+2+3                  2^{−5}    0            +0.0143         +0.0215
(111)^⊥  1+2+3                  2^{−6}    0            −0.0013         −0.0137
150 M. Matsumoto and R. Ohori

Thus, the listed errors include both the discretization errors and QMC-integration
errors for f n . For the first line, P = V implies no QMC integration error for f n
(n = 3), so the values show the discretization error exactly. The error bound (4) is
proportional to WF(P) for a fixed integrand. The table shows that, for these test
functions, the actual errors are well reflected in WAFOM values.
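The toy example can be recomputed directly from the definitions. The sketch below (our own code) recomputes the WAFOM column and the error magnitudes of Table 1, using the midpoint convention x_B = B/8 + 1/16 from Sect. 2.1.

    import itertools

    n = 3
    V = [tuple(b) for b in itertools.product((0, 1), repeat=n)]

    def dot(a, b):                       # inner product over F_2
        return sum(x * y for x, y in zip(a, b)) % 2

    def mu(a):                           # Dick weight of a single row vector
        return sum(j * x for j, x in enumerate(a, start=1))

    def midpoint(b):                     # midpoint of the interval coded by b
        return (b[0] * 4 + b[1] * 2 + b[2]) / 8 + 1 / 16

    for v in [(0, 0, 1), (1, 0, 1), (0, 1, 1), (1, 1, 1)]:
        P = [b for b in V if dot(b, v) == 0]          # P = v^perp, a 2-dimensional subspace
        wafom = 2.0 ** (-mu(v))                       # P^perp \ {0} = {v}, so WF(P) = 2^(-mu(v))
        errs = [abs(sum(midpoint(b) ** k for b in P) / len(P) - 1 / (k + 1))
                for k in (1, 2, 3)]                   # |QMC - exact| for x, x^2, x^3
        print(v, wafom, [round(e, 4) for e in errs])

The printed magnitudes agree with Table 1 row by row.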
Here is a loose interpretation of WF(P). For an F2 -linear P,
• A ∈ P ⊥ \{0} is a linear relation satisfied by P.
• μ(A) measures the “complexity” of A.
• WF(P) = ∑_{A∈P^⊥\{0}} 2^{−μ(A)} is small if all relations have high complexity, and
hence P is close to “uniform.”

The weight j in the sum ∑ j·a_ij in the definition of μ(A) denotes that the jth digit
below the decimal point is counted with complexity 2^{−j}.

3 Point Sets with Low WAFOM Values

3.1 Existence and Non-existence of Low WAFOM Point Sets

Theorem 2 There are absolute (i.e. independent of s, n and d) positive constants
C, D, E such that for any positive integers s, n and d ≥ 9s, there exists a P ⊂ V of
F2-dimension d (hence cardinality N = 2^d) satisfying

    WF(P) ≤ E · 2^{−Cd²/s + Dd} = E · N^{−C log₂ N/s + D}.

Since the exponent −C log₂ N/s + D goes to −∞ as N → ∞, this shows that
there exist point sets with “higher order convergence” having this order of WAFOM.
There are two independent proofs: M-Yoshiki [17] shows the positivity of the probability
of obtaining low-WAFOM point sets under a random choice of the basis (hence
non-constructive), and K. Suzuki [28] gives a construction using Dick’s interleaving
method [7, Sect. 15] for the Niederreiter-Xing sequence [21]. Suzuki [29] generalizes
[17] and [31] to arbitrary base b. Theorem 2 is similar to Dick’s construction
of point sets with Wα(P) = O(N^{−α}(log N)^{sα}) for arbitrarily high α ≥ 1, but there
seems to be no implication between his result and this theorem.
On the other hand, Yoshiki [31] proved the following theorem showing that the order of the
exponent d²/s is sharp, namely, WAFOM cannot be too small:

Theorem 3 Let C > 1/2 be any constant. For any positive integers s, n and d ≥
s × (√(C + 1/16) + 3/4)/(C − 1/2), any linear subspace P ⊂ V of F2-dimension
d satisfies

    WF(P) ≥ 2^{−C d²/s}.

3.2 An Efficient Computation Method of WAFOM

Since P is intended for a QMC integration where the enumeration of P is necessary,
|P| = 2^{dim_{F2} P} cannot be huge. On the other hand, |V| = 2^{ns} would be huge, say,
for n = 32 and s > 2. Since dim_{F2} P + dim_{F2} P^⊥ = dim_{F2} V, |P^⊥| must be huge.
Thus, a direct computation of WF(P) using Definition 4 would be too costly. In [16],
the following formula is given by a Fourier inversion. Put B = (b_{i,j}); then we have

    WF(P) = (1/|P|) ∑_{B∈P} { [ ∏_{1≤i≤s, 1≤j≤n} (1 + (−1)^{b_{i,j}} 2^{−j}) ] − 1 }.

This is computable in O(nsN) steps of arithmetic operations on real numbers, where
N = |P|. Compared with most other discrepancies, this is relatively easy to compute.
This allows us to do a random search for low-WAFOM point sets.
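A minimal sketch of this formula (our own names), with the toy example of Sect. 2.5 as a check:

    import numpy as np

    def wafom(P, n):
        """WAFOM of an F_2-linear point set P, given as a list of s x n 0/1 matrices,
        via WF(P) = (1/|P|) * sum_B { prod_{i,j} (1 + (-1)^{b_ij} 2^{-j}) - 1 }."""
        weights = 2.0 ** -np.arange(1, n + 1)        # 2^{-j}, j = 1..n
        total = 0.0
        for B in P:
            signs = 1.0 - 2.0 * np.asarray(B)        # (-1)^{b_ij}
            total += np.prod(1.0 + signs * weights) - 1.0
        return total / len(P)

    # Toy check (s = 1, n = 3): P = (001)^perp = {000, 010, 100, 110}.
    P = [[[0, 0, 0]], [[0, 1, 0]], [[1, 0, 0]], [[1, 1, 0]]]
    print(wafom(P, 3))   # 0.125 = 2^{-3}, matching Sect. 2.5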

Remark 2 1. The above equality holds only for an F2 -linear P. Since the left hand
side is non-negative, so is the right sum in this case. It seems impossible to define
WAFOM for a general point set by using this formula, since for a general (i.e.
non-linear) P, the sum at the right hand side is sometimes negative and thus will
never give a bound on the integration error.
2. The right sum may be interpreted as the QMC integration of a function (whose
definition is given in the right hand side of the equality) by P. The integration of
the function over total space V is zero. Hence, the above equality indicates that,
to have a best F2 -linear P from the viewpoint of WAFOM, it suffices to have a
best P for QMC integration for a single specified function. This is in contrast to
the definition of star-discrepancy, where all the rectangle characteristic functions
are used as the test functions, and the supremum of their QMC integration errors
is taken.
3. Harase-Ohori [11] gives a method to accelerate this computation by a factor of
30, using a look-up table. Ohori-Yoshiki [25] gives a faster and simpler method
to compute a good approximation of WAFOM, using the fact that the Walsh coefficients of
an exponential function approximate the Dick weight μ. More precisely, WF(P)
is well approximated by the QMC error of the function exp(−2 ∑_{i=1}^{s} x_i), whose
value is easy to evaluate on modern CPUs.

4 Experimental Results

4.1 Random Search for Low WAFOM Point Sets

We fix the precision n = 30. We consider two cases of the dimension, s = 4 and
s = 8. For each d = 8, 9, 10, . . . , 16, we generate a d-dimensional subspace P ⊂ V =
(F2^{30})^s 10000 times, by a uniformly random choice of d elements as its basis. Let
P_{d,s} be the point set with the lowest WAFOM among them. For comparison, let Q_{d,s}
be the point set with the 100th lowest WAFOM.

Fig. 2 WAFOM values for: (1) best WAFOM among 10000, (2) the 100th best WAFOM, (3)
Niederreiter-Xing, (4) Sobol', of size 2^d with d = 8, 9, . . . , 16. The vertical axis is log2 of their
WAFOM, and the horizontal axis log2 of the size of the point sets. The left figure is for dimension s = 4,
the right for s = 8

4.2 Comparison of QMC Rules by WAFOM

For comparison, we use two other QMC quadrature rules, namely the Sobol' sequence
improved by Joe and Kuo [13], and the Niederreiter-Xing sequence (NX) implemented
by Pirsic [27] and by Dirk Nuyens [23, item nxmats] (downloaded from the latter).
Figure 2 shows the WAFOM values for these four kinds of point sets, with sizes 2^8
to 2^16. For s = 4, Sobol' has the largest WAFOM values, while NX has small WAFOM
comparable to the 100th best Q_{d,s} selected by WAFOM. For d = 14, NX has much
larger WAFOM than Q_{14,s}, while for d = 15 the converse occurs. Note that
this seems to be reflected in the following experiments. For s = 8, the four kinds of
point sets show small differences in their WAFOM values. Indeed, NX has smaller
WAFOM values than the best point set among the 10000 randomly generated ones for each
d, while Sobol' has larger WAFOM values. A mathematical analysis of this good
behavior of NX would be interesting.

4.3 Comparison by Numerical Integration

In addition to the above four kinds of QMC rules, the Monte Carlo method is used for
comparison (using the Mersenne Twister [15] pseudorandom number generator). For the
test functions, we use 6 Genz functions [8]:
Oscillatory:    f1(x) = cos(2π u1 + ∑_{i=1}^{s} ai xi)
Product Peak:   f2(x) = ∏_{i=1}^{s} [1/(ai^{−2} + (xi − ui)^2)]
Corner Peak:    f3(x) = (1 + ∑_{i=1}^{s} ai xi)^{−(s+1)}
Gaussian:       f4(x) = exp(−∑_{i=1}^{s} ai^2 (xi − ui)^2)
Continuous:     f5(x) = exp(−∑_{i=1}^{s} ai |xi − ui|)
Discontinuous:  f6(x) = 0 if x1 > u1 or x2 > u2, and exp(∑_{i=1}^{s} ai xi) otherwise.

Fig. 3 QMC integration errors for (1) best WAFOM among 10000, (2) the 100th best WAFOM,
(3) Niederreiter-Xing, (4) Sobol', (5) Monte Carlo, using six Genz functions on the 4-dimensional
unit cube. The vertical axis is log2 of the errors, and the horizontal axis log2 of the size of the point
sets. The error is the mean square error over 100 randomly digitally shifted point sets
This selection is copied from [22, p. 91] [11]. The parameters a1 , . . . , as are selected
so that (1) they are in an arithmetic progression (2) as = 2a1 (3) the average of
a1 , . . . , as coincides with the average of c1 , . . . , c10 in [22, Eq. (10)] for each test
function. The parameters u i are generated randomly by [15].
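For reference, a sketch of four of the Genz families in the standard forms given above (our own code); the target mean of the a_i used below is a placeholder, not the value matched to [22, Eq. (10)].

    import numpy as np

    def genz_params(s, mean_a, rng):
        """Parameters a_1, ..., a_s in arithmetic progression with a_s = 2*a_1 and a
        prescribed mean (mean_a is a placeholder here), plus random shifts u_i."""
        a1 = 2.0 * mean_a / 3.0                      # then a_s = 2*a1 and mean = 3*a1/2
        a = np.linspace(a1, 2.0 * a1, s)
        u = rng.random(s)
        return a, u

    def oscillatory(x, a, u):
        return np.cos(2.0 * np.pi * u[0] + np.dot(x, a))

    def product_peak(x, a, u):
        return np.prod(1.0 / (a ** -2.0 + (x - u) ** 2.0), axis=-1)

    def corner_peak(x, a, u):
        return (1.0 + np.dot(x, a)) ** -(x.shape[-1] + 1.0)

    def gaussian(x, a, u):
        return np.exp(-np.sum(a ** 2 * (x - u) ** 2, axis=-1))

    rng = np.random.default_rng(0)
    a, u = genz_params(4, mean_a=1.0, rng=rng)
    x = rng.random((5, 4))                           # five sample points in [0,1)^4
    print(oscillatory(x, a, u), gaussian(x, a, u))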
Figure 3 shows the QMC integration errors for the six test functions with the five methods,
for dimension s = 4. The error for Monte Carlo is of order N^{−1/2}. The best WAFOM
point sets (WAFOM) and Niederreiter-Xing (NX) are comparable. For the function
Oscillatory, whose higher derivatives grow relatively slowly, WAFOM point sets
perform better than NX and Sobol', and the convergence rate seems to be of order N^{−2}. For
Product Peak and Gaussian, WAFOM and NX are comparable; this coincides with the
fact that the higher derivatives of these test functions grow rapidly, but still we observe
a convergence rate of N^{−1.6}. For Corner Peak, WAFOM performs better than NX. It is
somewhat surprising that the convergence rate is almost N^{−1.8} for WAFOM point sets.
For Continuous, NX performs better than WAFOM. Since this test function is not
differentiable, ||f||_n is unbounded and hence the inequality (4) has no meaning. Still,
for Continuous, the convergence rate of WAFOM is almost N^{−1.2}. For Discontinuous,
NX and Sobol' perform better than WAFOM. Note that, except for Discontinuous, the
large/small WAFOM values of NX for d = 14, 15 observed on the left of Fig. 2
seem to be reflected in the five graphs.
We conducted similar experiments for dimension s = 8, but we omit the results,
since the differences in WAFOM are small and the QMC rules show little
difference. We report that we still observe a convergence rate of N^{−α} with α > 1.05
for the five test functions other than Discontinuous, for WAFOM-selected point sets and
NX.
Remark 3 The convergence rate of the integration error is even faster than that of
the WAFOM values, for WAFOM-selected point sets and NX for s = 4, while the Sobol'
sequence converges with rate N^{−1}. We felt that this goes against our intuition, so we
checked the code and compared with MC. We do not know why NX and WAFOM
work so well.

5 WAFOM Versus Other Figure of Merits

Niederreiter’s t-value [19] is the most established figure of merit for a digital net. Using
test functions, we compare the effect of t-value and WAFOM on QMC integration.

5.1 t-Value

Let P ⊂ I^s = [0, 1)^s be a finite set of cardinality 2^m. Let n1, n2, . . . , ns ≥ 0 be
integers. Recall that I_{ni} is the set of 2^{ni} intervals partitioning I. Then ∏_{i=1}^{s} I_{ni}
is a set of 2^{n1+n2+···+ns} intervals. We want to make the QMC integration error 0 in
computing the volume of every such interval. A trivial bound is n1 + n2 + · · · + ns ≤
m, since at least one point must fall in each interval. The point set P is called a
(t, m, s)-net if the QMC integration error for each interval is zero, for any tuple
(n1, . . . , ns) with

    n1 + n2 + · · · + ns ≤ m − t.

Thus, smaller t-value is more preferable.



Fig. 4 Left: Hellekalek’s function f(x) = (x1^{1.1} − 1/(1+1.1))(x2^{1.7} − 1/(1+1.7))(x3^{2.3} − 1/(1+2.3))(x4^{2.9} − 1/(1+2.9));
right: Hamukazu’s function f(x) = 2^4 {5x1}{7x2}{11x3}{13x4}, where {x} := x − [x]. Horizontal
axis for category, vertical for log2 of the error. •: WAFOM, ×: t-value

5.2 Experiments on WAFOM Versus t-Value

We fix the dimension s = 4 and the precision n = 32, and generate 10^6 (F2-linear)
point sets of cardinality 2^12 by uniform random choices of their F2-basis consisting
of 12 vectors. We sort these 10^6 point sets according to their t-values. It turns out
that 3 ≤ t ≤ 12, and the frequency of the point sets for a given t-value is as follows.

    t      3    4     5      6      7      8     9     10    11   12
    freq.  63   6589  29594  32403  18632  8203  2994  1059  365  98

Then, we sort the same 10^6 point sets by WAFOM. We categorize them into 10
classes from the smallest WAFOM, so that the ith class has the same frequency as the
ith class by t-value. Thus, the same 10^6 point sets are categorized in two ways. For
a given test integrand, we compute the mean square error of the QMC integral in
each category, for those graded by t-value and those graded by WAFOM.
Figure 4 shows log2 of the mean square integration error, for each category corresponding
to 3 ≤ t ≤ 12 for the t-value (×), and for the categories sorted by WAFOM
value (•). The smooth test function on the left-hand side comes from Hellekalek
[12], and the non-continuous function on the right-hand side was communicated
by Kimikazu Kato (referred to as “Hamukazu” according to his established Twitter
handle). From the left figure, for t = 3, the average error for the best 63 point sets
with the smallest t-value 3 is much larger than the average for the best 63 point
sets selected by WAFOM. Thus, the experiments show that for this test function,
WAFOM seems to work better than the t-value in selecting good point sets. We have no
explanation why the error decreases for t ≥ 9. In the right figure, for Hamukazu’s
non-continuous test function, the t-value works better in selecting good points.
Thus, it is expected that digital nets that have both a small t-value and small WAFOM
would work well for smooth functions and be robust for non-smooth functions. Harase
[10] noticed that Owen linear scrambling [7, Sect. 13] [26] preserves the t-value, but
changes WAFOM. Starting from a Niederreiter-Xing sequence with small t, he
applied Owen linear scrambling to find a point set with low WAFOM and small
t-value. He obtained good results for a wide range of integrands.

5.3 Dick’s μα , and Non-discretized Case

Let α > 0 be an integer. For A ∈ M_{s,n}(F2), Dick’s α-weight μα(A) is defined as
follows. It is a part of the summation appearing in Definition 3 of μ(A): the sum is taken
over at most α nonzero entries from the right in each row.
Example 4 Suppose α = 2.

the matrix A with rows (1 0 0 1), (0 1 1 1), (0 0 1 0) is mapped entrywise to (j·a_ij)
with rows (1 0 0 4), (0 2 3 4), (0 0 3 0); keeping only the (at most) two rightmost
nonzero entries in each row,

    μα(A) = (1 + 0 + 0 + 4) + (0 + 0 + 3 + 4) + (0 + 0 + 3 + 0) = 15.
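A direct transcription of μα (names ours), checked against Example 4:

    def dick_weight_alpha(A, alpha):
        """Dick's alpha-weight: in each row, sum j*a_ij over only the alpha
        rightmost nonzero entries (columns indexed j = 1, ..., n)."""
        total = 0
        for row in A:
            nonzero = [j for j, a in enumerate(row, start=1) if a == 1]
            total += sum(nonzero[-alpha:])      # alpha rightmost nonzero positions
        return total

    A = [[1, 0, 0, 1],
         [0, 1, 1, 1],
         [0, 0, 1, 0]]
    print(dick_weight_alpha(A, 2))   # 15, as in Example 4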

For F2-linear P ⊂ M_{s,n}(F2),

    Wα(P) := ∑_{A∈P^⊥−{0}} 2^{−μα(A)}.    (5)

To be precise, we need to take n → ∞, as follows. We identify I = [0, 1] with the
product W := F2^N via binary fractional expansion (neglecting a measure-zero set).
Let K := F2^{⊕N} ⊂ W be the subspace consisting of vectors with a finite number of
nonzero components (this is usually identified with N ∪ {0} via binary expansion
and reversing the digits). We define the inner product W × K → F2 as usual. Then,
for a finite subgroup P ⊂ W^s, its perpendicular space P^⊥ ⊂ K^s is defined and is
countable. For A ∈ K^s, μα(A) is defined analogously, and the right-hand side of (5)
is absolutely convergent. Dick [3] proved

Error( f ; P) ≤ C(s, α)|| f ||α Wα (P),

and constructed a sequence of P with Wα(P) = O(N^{−α}(log N)^{sα}), called higher
order digital nets. (See [7] for a comprehensive explanation.) Existence results and
search algorithms for higher order polynomial lattice rules are studied in [1, 5].
WAFOM is an n-digit discretized version of Wα where α = n. WAFOM loses the
freedom to choose α, but this might be a merit, since we do not need to choose α.

Remark 4 In Dick’s theory, α is fixed. In fact, setting α = log N does not yield a
useful bound, since C(s, log N) W_{log N}(P) → ∞ as N → ∞.

The above experiments show that, to have a small QMC error with low-WAFOM
point sets, the integrand should have high order partial derivatives with small norms
(see also the preceding research [11]). However, WAFOM seems to work for some
non-differentiable functions (such as Continuous in the previous section).

5.4 t-Value Again

Niederreiter-Pirsic [20] showed that for a digital net P, the strict t-value of P as a
(t, m, s)-net is expressed as

    m − t + 1 = min_{A∈P^⊥−{0}} μ1(A).    (6)

Here μ1 is Dick’s α-weight for α = 1, which is known as the Niederreiter-
Rosenbloom-Tsfasman weight.
There is a strong resemblance between (6) and Definition 4. Again in (6), high
complexity of all elements in P ⊥ − {0} gives strong uniformity (i.e., small t-value).
The right hand side of (6) is efficiently computable by a MacWilliams-type identity
in O(s N log N ) steps of integer operation [6].
Remark 5 The formula (6) for the t-value uses the minimum over P^⊥ − {0}, while Definition 4
of WAFOM and (5) use a summation over P^⊥ − {0}. Can we connect the t-value in (6) with
WAFOM in Definition 4? It may perhaps be related to ultra-discretization [14].

6 Randomization by Digital Shift

Let P ⊂ M_{s,n}(F2) be a linear subspace. Choose σ ∈ M_{s,n}(F2). The point set P +
σ := {B + σ | B ∈ P} is called the digital shift of P by σ. Since P + σ is not an
F2-linear subspace, one cannot define WF(P + σ). Nevertheless, the same error
bound holds as for P. Under a uniformly random choice of σ, the estimator based on P + σ becomes unbiased.
Moreover, the mean square error is bounded as follows:
Theorem 4 (Goda-Ohori-Suzuki-Yoshiki [9])

    Error(f_n; P + σ) ≤ C(s, n) ||f||_n WF(P),  and

    √( E(Error(f_n; P + σ)²) ) ≤ C(s, n) ||f||_n WF_{r.m.s.}(P),

    where WF_{r.m.s.}(P) := √( ∑_{A∈P^⊥−{0}} 2^{−2μ(A)} ).
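A sketch of this randomization (our own toy setup): digitally shift a small F2-linear point set by a uniformly random σ and average the resulting QMC estimates. Since the shift acts on the discretized points, the estimator is unbiased for the discretized integrand f_n; the basis matrices below are random and purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    s, n = 2, 8

    def to_points(P):
        """Map s x n bit matrices to midpoints of their hypercubes in [0,1)^s."""
        w = 2.0 ** -np.arange(1, n + 1)
        return np.array([B @ w + 2.0 ** -(n + 1) for B in P])

    # A tiny F_2-linear P spanned by two random basis matrices (for illustration only).
    basis = rng.integers(0, 2, size=(2, s, n))
    P = np.array([(c0 * basis[0] + c1 * basis[1]) % 2 for c0 in (0, 1) for c1 in (0, 1)])

    f = lambda x: np.prod(x, axis=-1)          # test integrand with exact integral 2^{-s}
    exact = 0.5 ** s

    estimates = []
    for _ in range(100):                       # 100 independent digital shifts
        sigma = rng.integers(0, 2, size=(s, n))
        shifted = (P + sigma) % 2              # digital shift P + sigma
        estimates.append(np.mean(f(to_points(shifted))))

    est = np.array(estimates)
    print(est.mean() - exact, est.std())       # unbiased (for f_n) estimate and its spread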

7 Variants of WAFOM

As mentioned in the previous section, [9] defined WF_{r.m.s.}(P). As another direction,
the following generalization of WAFOM is proposed by Yoshiki [30] and Ohori [24]:
in Definition 3, the function μ(A) might be generalized by

    μδ(A) := ∑_{1≤i≤s, 1≤j≤n} (j + δ) a_ij

for any (even negative) real number δ (note that this definition is different from that of
μα, but we could not find a better notation). Then Definition 4 gives WF_δ(P). The case
δ = 1 is dealt with in [30]. A weak point of the original WAFOM is that its value
does not vary enough, and consequently it is not useful for grading point sets
for large s; see Fig. 2, the s = 8 case. By choosing a suitable δ, we obtain WF_δ(P),
which varies for large s (even for s = 16) and is useful for choosing a good point set [24].
A table of bases of such point sets is available from Ohori’s GitHub Pages: http://
majiang.github.io/qmc/index.html. These point sets are obtained by Ohori from NX
sequences, using Harase’s method based on linear scrambling. Thus, they have
small t-values and small WAFOM values. Experiments show their good performance
[18].

8 Conclusion

We discussed the Walsh figure of merit (WAFOM) [16] for F2-linear point sets as a quality measure for
QMC rules. Since WAFOM satisfies a Koksma–Hlawka type inequality
(4), its effectiveness for very smooth functions is assured. Through the experiments
on QMC integration, we observed that low-WAFOM point sets show higher order
convergence such as O(N^{−1.2}) for several test functions (including a non-smooth one)
in dimension four, and O(N^{−1.05}) in dimension eight.

Acknowledgments The authors are deeply indebted to Josef Dick, who patiently and generously
informed us of beautiful researches in this area, and to Harald Niederreiter for leading us to this
research. They thank the members of the Komaba Applied Algebra Seminar (KAPALS) for their
indispensable help: Takashi Goda, Shin Harase, Shinsuke Mori, Syoiti Ninomiya, Mutsuo Saito, Kosuke
Suzuki, and Takehito Yoshiki. We are thankful to the referees, who suggested numerous improvements
to the manuscript. The first author is partially supported by JST CREST, JSPS/MEXT Grant-
in-Aid for Scientific Research No.21654017, No.23244002, No.24654019, and No.15K13460. The
second author is partially supported by the Program for Leading Graduate Schools, MEXT, Japan.

References

1. Baldeaux, J., Dick, J., Leobacher, G., Nuyens, D., Pillichshammer, F.: Efficient calculation of
the worst-case error and (fast) component-by-component construction of higher order polyno-
mial lattice rules. Numer. Algorithm. 59, 403–431 (2012)
2. Dick, J.: Walsh spaces containing smooth functions and quasi-Monte Carlo rules of arbitrary
high order. SIAM J. Numer. Anal. 46, 1519–1553 (2008)

3. Dick, J.: The decay of the Walsh coefficients of smooth functions. Bull. Austral. Math. Soc.
80, 430–453 (2009)
4. Dick, J.: On quasi-Monte Carlo rules achieving higher order convergence. In: Monte Carlo and
Quasi-Monte Carlo Methods 2008, pp. 73–96. Springer, Berlin (2009)
5. Dick, J., Kritzer, P., Pillichshammer, F., Schmid, W.: On the existence of higher order polyno-
mial lattices based on a generalized figure of merit. J. Complex 23, 581–593 (2007)
6. Dick, J., Matsumoto, M.: On the fast computation of the weight enumerator polynomial and
the t value of digital nets over finite abelian groups. SIAM J. Discret. Math. 27, 1335–1359
(2013)
7. Dick, J., Pillichshammer, F.: Digital Nets and Sequences. Discrepancy Theory and Quasi-Monte
Carlo Integration. Cambridge University Press, Cambridge (2010)
8. Genz, A.: A package for testing multiple integration subroutines. In: Numerical Integration:
Recent Developments, Software and Applications, pp. 337–340. Springer, Berlin (1987)
9. Goda, T., Ohori, R., Suzuki, K., Yoshiki, T.: The mean square quasi-Monte Carlo error for
digitally shifted digital nets. In: Cools, R., Nuyens, D. (eds.) Monte Carlo and Quasi-Monte
Carlo Methods 2014, vol. 163, pp. 331–350. Springer, Heidelberg (2016)
10. Harase, S.: Quasi-Monte Carlo point sets with small t-values and WAFOM. Appl. Math. Com-
put. 254, 318–326 (2015)
11. Harase, S., Ohori, R.: A search for extensible low-WAFOM point sets. arXiv:1309.7828
12. Hellekalek, P.: On the assessment of random and quasi-random point sets. In: Random and
Quasi-Random Point Sets, pp. 49–108. Springer, Berlin (1998)
13. Joe, S., Kuo, F.: Constructing Sobol sequences with better two-dimensional projections. SIAM
J. Sci. Comput. 30, 2635–2654 (2008). http://web.maths.unsw.edu.au/~fkuo/sobol/new-joe-
kuo-6.21201
14. Kakei, S.: Development in Discrete Integrable Systems - Ultra-discretization, Quantization.
RIMS, Kyoto (2001)
15. Matsumoto, M., Nishimura, T.: Mersenne twister: a 623-dimensionally equidistributed uniform
pseudorandom number generator. ACM Trans. Model.Comput. Simul. 8(1), 3–30 (1998). http://
www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html
16. Matsumoto, M., Saito, M., Matoba, K.: A computable figure of merit for quasi-Monte Carlo
point sets. Math. Comput. 83, 1233–1250 (2014)
17. Matsumoto, M., Yoshiki, T.: Existence of higher order convergent quasi-Monte Carlo rules via
Walsh figure of merit. In: Monte Carlo and Quasi-Monte Carlo Methods 2012, pp. 569–579.
Springer, Berlin (2013)
18. Mori, S.: A fast QMC computation by low-WAFOM point sets. In preparation
19. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. CBMS-NSF,
Philadelphia (1992)
20. Niederreiter, H., Pirsic, G.: Duality for digital nets and its applications. Acta Arith. 97, 173–182
(2001)
21. Niederreiter, H., Xing, C.P.: Low-discrepancy sequences and global function fields with many
rational places. Finite Fields Appl. 2, 241–273 (1996)
22. Novak, E., Ritter, K.: High-dimensional integration of smooth functions over cubes. Numer.
Math. 75, 79–97 (1996)
23. Nuyens, D.: The magic point shop of qmc point generators and generating vectors. http://
people.cs.kuleuven.be/~dirk.nuyens/qmc-generators/. Home page
24. Ohori, R.: Efficient quasi-Monte Carlo integration by adjusting the derivation-sensitivity
parameter of Walsh figure of merit. Master’s Thesis (2015)
25. Ohori, R., Yoshiki, T.: Walsh figure of merit is efficiently approximable. In preparation
26. Owen, A.B.: Randomly permuted (t, m, s)-nets and (t, s)-sequences. In: Monte Carlo and
Quasi-Monte Carlo Methods 1994, pp. 299–317. Springer, Berlin (1995)
27. Pirsic, G.: A software implementation of Niederreiter-Xing sequences. In: Monte Carlo and
quasi-Monte Carlo methods, 2000 (Hong Kong), pp. 434–445 (2002)
28. Suzuki, K.: An explicit construction of point sets with large minimum Dick weight. J. Complex.
30, 347–354 (2014)

29. Suzuki, K.: WAFOM on abelian groups for quasi-Monte Carlo point sets. Hiroshima Math. J.
45, 341–364 (2015)
30. Yoshiki, T.: Bounds on Walsh coefficients by dyadic difference and a new Koksma-Hlawka
type inequality for quasi-Monte Carlo integration. arXiv:1504.03175
31. Yoshiki, T.: A lower bound on WAFOM. Hiroshima Math. J. 44, 261–266 (2014)
Some Results on the Complexity
of Numerical Integration

Erich Novak

Abstract We present some results on the complexity of numerical integration. We
start with the seminal paper of Bakhvalov (1959) and end with new results on the
curse of dimensionality and on the complexity of oscillatory integrals. This survey
paper consists of four parts:
1. Classical results till 1971
2. Randomized algorithms
3. Tensor product problems, tractability and weighted norms
4. Some recent results: C^k functions and oscillatory integrals

Keywords Complexity of integration · Randomized algorithms · Tractability · Curse of dimensionality

1 Classical Results Till 1971

I start with a warning: We do not discuss the complexity of path integration
and infinite-dimensional integration on R^N or other domains, although there are exciting
new results in that area, see [7, 15, 21, 22, 29, 41, 43, 44, 56, 70, 76, 88, 95,
120, 122]. For parametric integrals see [16, 17]; for quantum computers, see [48,
49, 79, 114].
We mainly study the problem of numerical integration, i.e., of approximating the
integral

    Sd(f) = ∫_{Dd} f(x) dx    (1)

over an open subset Dd ⊂ R^d of Lebesgue measure λd(Dd) = 1 for integrable functions
f : Dd → R. The main interest is in the behavior of the minimal number of
function values that are needed in the worst case setting to achieve an error at most

E. Novak (B)
Mathematisches Institut, University Jena, Ernst-Abbe-Platz 2, 07743 Jena, Germany
e-mail: erich.novak@uni-jena.de

ε > 0. Note that classical examples of domains Dd are the unit cube [0, 1]d and the
normalized Euclidean ball (with volume 1), which are closed. However, we work
with their interiors for definiteness of certain derivatives.
We state our problem. Let Fd be a class of integrable functions f : Dd → R.
For f ∈ Fd , we approximate the integral Sd ( f ), see (1), by algorithms of the form

An ( f ) = φn ( f (x1 ), f (x2 ), . . . , f (xn )),

where xj ∈ Dd can be chosen adaptively and φn : R^n → R is an arbitrary mapping.
Adaption means that the selection of xj may depend on the already computed values
f(x1), f(x2), . . . , f(x_{j−1}). We define N : Fd → R^n by N(f) = (f(x1), . . . , f(xn)).

e(An ) = sup |Sd ( f ) − An ( f )|,


f ∈Fd

the optimal error bounds are given by

e(n, Fd ) = inf e(An ).


An

The information complexity n(ε, Fd ) is the minimal number of function values which
is needed to guarantee that the error is at most ε, i.e.,

n(ε, Fd ) = min{n | ∃ An such that e(An ) ≤ ε}.

We minimize n over all choices of adaptive sample points x j and mappings φn .


In this paper we give an overview on some of the basic results that are known about
the numbers e(n, Fd ) and n(ε, Fd ). Hence we concentrate on complexity issues and
leave aside other important questions such as implementation issues.
It was proved by Smolyak and Bakhvalov that as long as the class Fd is convex and
balanced we may restrict the minimization of e(An ) by considering only nonadaptive
choices of x j and linear mappings φn , i.e., it is enough to consider An of the form


    An(f) = ∑_{i=1}^{n} ai f(xi).    (2)

Theorem 0 (Bakhvalov [6]) Assume that the class Fd is convex and balanced. Then

    e(n, Fd) = inf_{x1,...,xn} sup_{f∈Fd, N(f)=0} Sd(f)    (3)

and for the infimum in the definition of e(n, Fd ) it is enough to consider linear and
nonadaptive algorithms An of the form (2).

In this paper we only consider convex and balanced Fd and then we can use the
last formula for e(n, Fd ).
Remark 0 (a) For a proof of Theorem 0 see, for example, [87, Theorem 4.7]. This
result is not really about complexity (hence it got its number), but it helps to prove
complexity results.
(b) A linear algorithm An is called a quasi Monte Carlo (QMC) algorithm if ai =
1/n for all i and is called a positive quadrature formula if ai > 0 for all i. In general
it may happen that optimal quadrature formulas have some negative weights and, in
addition, we cannot say much about the position of good points xi .
(c) More on the optimality of linear algorithms and on the power of adaption can
be found in [14, 77, 87, 112, 113]. There are important classes of functions that are
not balanced and convex, and where Theorem 0 can not be applied, see also [13, 94].
The optimal order of convergence plays an important role in numerical analysis.
We start with a classical result of Bakhvalov (1959) for the class

Fdk = { f : [0, 1]d → R | D α f ∞ ≤ 1, |α| ≤ k},


d
where k ∈ N and |α| = i=1 αi for α ∈ Nd0 and D α f denotes the respective partial
derivative. For two sequences an and bn of positive numbers we write an bn if
there are positive numbers c and C such that c < an /bn < C for all n ∈ N.
Theorem 1 (Bakhvalov [5])

e(n, Fdk ) n −k/d . (4)

Remark 1 (a) For such a complexity result one needs to prove an upper bound (for a
particular algorithm) and a lower bound (for all algorithms). For the upper bound one
can use tensor product methods based on a regular grid, i.e., one can use the n = m d
points xi with coordinates from the set {1/(2m), 3/(2m), . . . , (2m − 1)/(2m)}.
The lower bound can be proved with the technique of “bump functions”: One can
construct 2n functions f1, . . . , f2n with disjoint supports such that all 2^{2n} functions
of the form Σ_{i=1}^{2n} δi fi are contained in Fdk, where δi = ±1 and Sd(fi) ≥ c_{d,k} n^{−k/d−1}.
Since an algorithm An can only compute n function values, there are two functions
f+ = Σ_{i=1}^{2n} fi and f− = f+ − 2 Σ_{k=1}^{n} f_{ik} such that f+, f− ∈ Fdk and An(f+) =
An(f−), but |Sd(f+) − Sd(f−)| ≥ 2n c_{d,k} n^{−k/d−1}. Hence the error of An must be at
least c_{d,k} n^{−k/d}. For the details see, for example, [78].
(b) Observe that we cannot conclude much on n(ε, Fdk) if ε is fixed and d is large,
since Theorem 1 contains hidden factors that depend on k and d. Actually the lower
bound is of the form

e(n, Fdk) ≥ c_{d,k} n^{−k/d},

where the constants c_{d,k} decrease and tend to zero as d → ∞.


(c) The proof of the upper bound (using tensor product algorithms) is easy since
we assumed that the domain is Dd = [0, 1]d . The optimal order of convergence is

known for much more general spaces (such as Besov and Triebel–Lizorkin spaces)
and arbitrary bounded Lipschitz domains, see [85, 115, 118]. Then the proof of the
upper bounds is more difficult, however.
(d) Integration on fractals was recently studied by Dereich and Müller-Gron-
bach [18]. These authors also obtain an optimal order of convergence n −k/α . The
definition of Sd must be modified and α coincides, under suitable conditions, with
the Hausdorff dimension of the fractal.

By the curse of dimensionality we mean that n(ε, Fd ) is exponentially large in d.


That is, there are positive numbers c, ε0 and γ such that

n(ε, Fd) ≥ c (1 + γ)^d   for all ε ≤ ε0 and infinitely many d ∈ N.   (5)

If, on the other hand, n(ε, Fd) is bounded by a polynomial in d and ε^{−1} then we say
that the problem is polynomially tractable. If n(ε, Fd) is bounded by a polynomial in
ε^{−1} alone, i.e., n(ε, Fd) ≤ C ε^{−α} for ε < 1, then we say that the problem is strongly
polynomially tractable.
From the proof of Theorem 1 we cannot conclude whether the curse of dimen-
sionality holds for the classes Fdk or not; see Theorem 11. Maung Zho Newn
and Sharygin [124] were possibly the first to publish (in 1971) a complexity result for
arbitrary d with explicit constants and thereby prove the curse of dimensionality for
Lipschitz functions.

Theorem 2 (Maung Zho Newn and Sharygin [124]) Consider the class

Fd = { f : [0, 1]^d → R | |f(x) − f(y)| ≤ max_i |xi − yi| }.

Then

e(n, Fd) = (d / (2d + 2)) · n^{−1/d}

for n = m^d with m ∈ N.

Remark 2 One can show that for n = m^d the regular grid (points xi with coordi-
nates from the set {1/(2m), 3/(2m), . . . , (2m − 1)/(2m)}) and the midpoint rule
An(f) = n^{−1} Σ_{i=1}^{n} f(xi) are optimal. See also [3, 4, 12, 107] for this result and for
generalizations to similar function spaces.
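For illustration, here is a minimal Python sketch of this midpoint rule on the regular grid with n = m^d points; the integrand f is a user-supplied black box on [0, 1]^d and the test integrand at the end is only an example.

import numpy as np
from itertools import product

def midpoint_rule(f, m, d):
    # regular grid with coordinates {1/(2m), 3/(2m), ..., (2m-1)/(2m)} in each direction
    coords = (2 * np.arange(m) + 1) / (2 * m)
    # n = m^d points, all with equal weight 1/n
    return sum(f(np.array(x)) for x in product(coords, repeat=d)) / m**d

# example: f(x) = x_1 x_2 x_3 on [0,1]^3 has integral 1/8
print(midpoint_rule(lambda x: np.prod(x), m=4, d=3))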

2 Randomized Algorithms

The integration problem is difficult for all deterministic algorithms if the classes Fd
of inputs are too large, see Theorem 2. One may hope that randomized algorithms
make this problem much easier.

Randomized algorithms can be formalized in various ways leading to slightly


different models. We do not explain the technical details and only give a reason why
it makes sense to study different models for upper and lower bounds, respectively;
see [87] for more details.

• Assume that we want to construct and to analyze concrete algorithms that yield
upper bounds for the (total) complexity of given problems including the arithmetic
cost and the cost of generating random numbers. Then it is reasonable to consider
a rather restrictive model of computation where, for example, only the standard
arithmetic operations are allowed. One may also restrict the use of random numbers
and study so-called restricted Monte Carlo methods, where only random bits are
allowed; see [52].
• For the proof of lower bounds we take the opposite view and allow general ran-
domized mappings and a very general kind of randomness. This makes the lower
bounds stronger.

It turns out that the results are often very robust with respect to changes of the com-
putational model. For the purpose of this paper, it might be enough that a randomized
algorithm A is a random variable (Aω)ω∈Ω with a random element ω where, for each
fixed ω, the algorithm Aω is a (deterministic) algorithm as before. We denote by μ
the distribution of ω. In addition one needs rather weak measurability assumptions,
see also the textbook [73]. Let n̄(f, ω) be the number of function values used
for fixed ω and f .
The number

ñ(A) = sup_{f ∈ F} ∫_Ω n̄(f, ω) dμ(ω)

is called the cardinality of the randomized algorithm A and

eran(A) = sup_{f ∈ F} ( ∫*_Ω |S(f) − φω(Nω(f))|² dμ(ω) )^{1/2}

is the error of A. By ∫* we denote the upper integral. For n ∈ N, define

eran (n, Fd ) = inf{eran (A) : ñ(A) ≤ n}.

If A : F → G is a (measurable) deterministic algorithm then A can also be treated


as a randomized algorithm with respect to a Dirac (atomic) measure μ. In this sense
we can say that deterministic algorithms are special randomized algorithms. Hence
the inequality
eran (n, Fd ) ≤ e(n, Fd ) (6)

is trivial.
The number eran (0, Fd ) is called the initial error in the randomized setting. For
n = 0, we do not sample f , and Aω ( f ) is independent of f , but may depend on ω.

It is easy to check that for a linear S and a balanced and convex set F, the best we
can do is to take Aω = 0 and then

eran (0, Fd ) = e(0, Fd ).

This means that for linear problems the initial errors are the same in the worst case
and randomized setting.
The main advantage of randomized algorithms is that the curse of dimensionality
is not present even for certain large classes of functions. With the standard Monte
Carlo method we obtain
eran(n, Fd) ≤ 1/√n,

when Fd is the unit ball of L_p([0, 1]^d) and 2 ≤ p ≤ ∞. Mathé [72] proved that this
is almost optimal and the optimal algorithm is

A_n^ω(f) = (1/(n + √n)) Σ_{i=1}^{n} f(Xi)

with i.i.d. random variables Xi that are uniformly distributed on [0, 1]^d. It also follows
that

eran(n, Fd) = 1/(1 + √n),

when Fd is the unit ball of L_p([0, 1]^d) and 2 ≤ p ≤ ∞. In the case 1 ≤ p < 2 one
can only achieve the rate n^{−1+1/p}; for a discussion see [50].
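For illustration, a minimal Python sketch of the standard Monte Carlo method and of Mathé's rule just described; f is a user-supplied integrand on [0, 1]^d, and nothing below is specific to the unit ball of L_p.

import numpy as np

def standard_mc(f, n, d, seed=None):
    # standard Monte Carlo: average of f at n i.i.d. points, uniform on [0,1]^d
    rng = np.random.default_rng(seed)
    return float(np.mean([f(x) for x in rng.random((n, d))]))

def mathe_rule(f, n, d, seed=None):
    # Mathé's rule: same sample, but divide the sum by n + sqrt(n) instead of n
    rng = np.random.default_rng(seed)
    return sum(f(x) for x in rng.random((n, d))) / (n + np.sqrt(n))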
Bakhvalov [5] found the optimal order of convergence already in 1959 for the
class
Fdk = { f : [0, 1]^d → R | ‖D^α f‖_∞ ≤ 1, |α| ≤ k},

where k ∈ N and |α| = Σ_{i=1}^{d} αi for α ∈ N_0^d.

Theorem 3 (Bakhvalov [5])

eran(n, Fdk) ≍ n^{−k/d−1/2}.   (7)

Remark 3 A proof of the upper bound can be given with a technique that is often
called separation of the main part or also control variates. For n = 2m use m func-
tion values to construct a good L 2 approximation f m of f ∈ Fdk by a deterministic
algorithm. The optimal order of convergence is

‖f − fm‖_2 ≍ m^{−k/d}.

Then use the unbiased estimator

A_n^ω(f) = Sd(fm) + (1/m) Σ_{i=1}^{m} (f − fm)(Xi)

with i.i.d. random variables X i that are uniformly distributed on [0, 1]d . See, for
example, [73, 78] for more details. We add in passing that the optimal order of con-
vergence can be obtained for many function spaces (Besov spaces, Triebel–Lizorkin
spaces) and for arbitrary bounded Lipschitz domains Dd ⊂ Rd ; see [85], where the
approximation problem is studied. To obtain an explicit randomized algorithm with
the optimal rate of convergence one needs a random number generator for the set
Dd . If it is not possible to obtain efficiently random samples from the uniform dis-
tribution on Dd one can work with Markov chain Monte Carlo (MCMC) methods,
see Theorem 5.
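A minimal Python sketch of this separation-of-the-main-part estimator, assuming that a deterministic approximation f_m together with its exact integral S_fm is already available (constructing f_m with the optimal L_2 error is the hard part and is not shown):

import numpy as np

def control_variate_mc(f, f_m, S_fm, m, d, seed=None):
    # unbiased estimator S_d(f_m) + (1/m) * sum_i (f - f_m)(X_i), X_i uniform on [0,1]^d
    rng = np.random.default_rng(seed)
    X = rng.random((m, d))
    return S_fm + float(np.mean([f(x) - f_m(x) for x in X]))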
All known proofs of lower bounds use the idea of Bakhvalov (also called Yao’s
Minimax Principle): study the average case setting with respect to a probability
measure on F and use the theorem of Fubini. For details see [45–47, 73, 78, 88].

We describe a problem that was studied by several colleagues and solved by


Hinrichs [58] using deep results from functional analysis. Let H (K d ) be a reproduc-
ing kernel Hilbert space of real functions defined on a Borel measurable set Dd ⊆ Rd .
Its reproducing kernel Kd : Dd × Dd → R is assumed to be integrable,

Cd^init := ( ∫_{Dd} ∫_{Dd} Kd(x, y) ρd(x) ρd(y) dx dy )^{1/2} < ∞.

Here, ρd is a probability density function on Dd . Without loss of generality we


assume that Dd and ρd are chosen such that there is no subset of Dd with positive
measure such that all functions from H (K d ) vanish on it.
The inner product and the norm of H(Kd) are denoted by ⟨·, ·⟩_{H(Kd)} and ‖·‖_{H(Kd)}.
Consider multivariate integration

Sd(f) = ∫_{Dd} f(x) ρd(x) dx   for all f ∈ H(Kd),

where it is assumed that Sd : H (K d ) → R is continuous.


We approximate Sd ( f ) in the randomized setting using importance sampling.
That is, for a positive probability density function τd on Dd we choose n random
sample points x1 , x2 , . . . , xn which are independent and distributed according to τd
and take the algorithm

A_{n,d,τd}(f) = (1/n) Σ_{j=1}^{n} f(xj) ρd(xj) / τd(xj).

The error of A_{n,d,τd} is then

eran(A_{n,d,τd}) = sup_{‖f‖_{H(Kd)} ≤ 1} ( E_{τd} |Sd(f) − A_{n,d,τd}(f)|² )^{1/2},

where the expectation is with respect to the random choice of the sample points x j .
For n = 0 we formally take A0,d,τd = 0 and then

eran (0, H (K d )) = Cdinit .
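A minimal Python sketch of the importance sampling estimator just defined; the density τd achieving the bound of Theorem 4 below is not constructed here, and a sampler for τd is assumed to be given.

import numpy as np

def importance_sampling(f, rho_d, tau_d, sample_tau, n, seed=None):
    # A_{n,d,tau_d}(f) = (1/n) * sum_j f(x_j) rho_d(x_j) / tau_d(x_j), with x_j ~ tau_d i.i.d.
    rng = np.random.default_rng(seed)
    xs = [sample_tau(rng) for _ in range(n)]
    return float(np.mean([f(x) * rho_d(x) / tau_d(x) for x in xs]))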

Theorem 4 (Hinrichs [58]) Assume additionally that K d (x, y) ≥ 0 for all


x, y ∈ Dd. Then there exists a positive density function τd such that

eran(A_{n,d,τd}) ≤ (π/2)^{1/2} · (1/√n) · eran(0, H(Kd)).

Hence, if we want to achieve eran(A_{n,d,τd}) ≤ ε eran(0, H(Kd)) then it is enough to
take

n = ⌈ (π/2) (1/ε)² ⌉.

Remark 4 In particular, such problems are strongly polynomially tractable (for the
normalized error) if the reproducing kernels are pointwise nonnegative and inte-
grable. In [89] we prove that the exponent 2 of ε^{−1} is sharp for tensor product Hilbert
spaces whose univariate reproducing kernel is decomposable and univariate inte-
gration is not trivial for the two parts of the decomposition. More specifically we
have

n^ran(ε, H(Kd)) ≥ (1/8) (1/ε)²   for all ε ∈ (0, 1) and d ≥ (2 ln ε^{−1} − ln 2) / ln α^{−1},

where α ∈ [1/2, 1) depends on the particular space.


We stress that these estimates hold independently of the smoothness of functions
in a Hilbert space. Hence, even for spaces of very smooth functions the exponent of
strong polynomial tractability is 2.

Sometimes one cannot sample easily from the “target distribution” π if one wants
to compute an integral

S(f) = ∫_D f(x) π(dx).

Then Markov chain Monte Carlo (MCMC) methods are a very versatile and widely
used tool.
We use an average of a finite Markov chain sample as approximation of the mean,
i.e., we approximate S( f ) by

S_{n,n0}(f) = (1/n) Σ_{j=1}^{n} f(X_{j+n0}),

where (X_n)_{n∈N0} is a Markov chain with stationary distribution π. The number n


determines the number of function evaluations of f . The number n 0 is the burn-in or
warm up time. Intuitively, it is the number of steps of the Markov chain to get close
to the stationary distribution π .
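A minimal Python sketch of the estimator S_{n,n0}; the transition function step(x, rng), which simulates one step of the chain with kernel K, is assumed to be supplied by the user, and the choice of the burn-in n0 is discussed below.

import numpy as np

def mcmc_mean(f, step, x0, n, n0, seed=None):
    # S_{n,n0}(f) = (1/n) * sum_{j=1}^{n} f(X_{j+n0}); step(x, rng) simulates one transition
    rng = np.random.default_rng(seed)
    x = x0
    for _ in range(n0):      # burn-in: move the chain towards the stationary distribution pi
        x = step(x, rng)
    total = 0.0
    for _ in range(n):
        x = step(x, rng)
        total += f(x)
    return total / n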
We study the mean square error of Sn,n 0 , given by

eν(S_{n,n0}, f) = ( E_{ν,K} |S_{n,n0}(f) − S(f)|² )^{1/2},

where ν and K indicate the initial distribution and the transition kernel of the chain;
we work with the spaces L p = L p (π ). For the proof of the following error bound
we refer to [98, Theorem 3.34 and Theorem 3.41].

Theorem 5 (Rudolf [98]) Let (X n )n∈N be a Markov chain with reversible transition
kernel K , initial distribution ν, and transition operator P. Further, let

Λ = sup{α : α ∈ spec(P − S)},

where spec(P − S) denotes the spectrum of the operator (P − S) : L 2 → L 2 , and


assume that Λ < 1. Then
sup_{‖f‖_p ≤ 1} eν(S_{n,n0}, f)² ≤ 2 / (n(1 − Λ)) + 2 Cν γ^{n0} / (n²(1 − γ)²)   (8)

holds for p = 2 and for p = 4 under the following conditions:


• for p = 2, dν/dπ ∈ L∞ and a transition kernel K which is L1-exponentially conver-
gent with (γ, M) where γ < 1, i.e.,

‖P^n − S‖_{L1→L1} ≤ M γ^n

for all n ∈ N and Cν = M ‖dν/dπ − 1‖_∞;
• for p = 4, dν/dπ ∈ L2 and γ = ‖P − S‖_{L2→L2} < 1, where Cν = 64 ‖dν/dπ − 1‖_2.

Remark 5 Let us discuss the results. First observe that we assume that the so-called
spectral gap 1 − Λ is positive; in general we only know that |Λ| ≤ 1. If the transition
kernel is L1-exponentially convergent, then we have an explicit error bound for inte-
grands f ∈ L2 whenever the initial distribution has a density dν/dπ ∈ L∞. However,
in general it is difficult to provide explicit values γ and M such that the transition
kernel is L1-exponentially convergent with (γ, M). This motivates considering tran-
sition kernels which satisfy a weaker convergence property, such as the existence of
an L2-spectral gap, i.e., ‖P − S‖_{L2→L2} < 1. In this case we have an explicit error
bound for integrands f ∈ L4 whenever the initial distribution has a density dν/dπ ∈ L2.

Thus, by assuming a weaker convergence property of the transition kernel we obtain


a weaker result in the sense that f must be in L 4 rather than L 2 .
If we want to have an error of ε ∈ (0, 1) it is still not clear how to choose n and n 0
to minimize the total amount of steps n + n 0 . How should we choose the burn-in n 0 ?
One can prove in this setting, see [98], that the choice n∗ = ⌈ log Cν / (1 − γ) ⌉ is a reasonable
and almost optimal choice for the burn-in.
More details can be found in [83]. For a full discussion with all the proofs
see [98].

3 Tensor Product Problems and Weights

We know from the work of Bakhvalov already done in 1959 that the optimal order of
convergence is n −k/d for functions from the class C k ([0, 1]d ). To obtain an order of
convergence of roughly n −k for every dimension d, one needs stronger smoothness
conditions. This is a major reason for the study of functions with bounded mixed
derivatives, or dominating mixed smoothness, such as the classes

W_p^{k,mix}([0, 1]^d) = { f : [0, 1]^d → R | ‖D^α f‖_p ≤ 1 for ‖α‖_∞ ≤ k }.

Observe that functions from this class have, in particular, the high order derivative
D (k,k,...,k) f ∈ L p and one may hope that the curse of dimensionality can be avoided
or at least moderated by this assumption. For k = 1 these spaces are closely related
to various notions of discrepancy, see, for example, [23, 29, 71, 88, 111].
The optimal order of convergence is known for all k ∈ N and 1 < p < ∞ due
to the work of Roth [96, 97], Frolov [39, 40], Bykovskii [10], Temlyakov [109]
and Skriganov [101], see the survey Temlyakov [111]. The cases p ∈ {1, ∞} are
still unsolved. The case p = 1 is strongly related to the star discrepancy, see also
Theorem 10.

Theorem 6 Assume that k ∈ N and 1 < p < ∞. Then

e(n, W_p^{k,mix}([0, 1]^d)) ≍ n^{−k} (log n)^{(d−1)/2}.

Remark 6 The upper bound was proved by Frolov [39] for p = 2 and by
Skriganov [101] for all p > 1. The lower bound was proved by Roth [96] and
Bykovskii [10] for p = 2 and by Temlyakov [109] for all p < ∞. Hence it took
more than 30 years to prove Theorem 6 completely.
For functions in W_p^{k,mix}([0, 1]^d) with compact support in (0, 1)^d one can take
algorithms of the form

An(f) = (|det A| / a^d) Σ_{m ∈ Z^d} f(Am/a),

where A is a suitable matrix that does not depend on k or n, and a > 0. Of course
the sum is finite since we use only the points Am/a in (0, 1)^d.
This algorithm is similar to a lattice rule but is not quite a lattice rule since the
points do not build an integration lattice. The sum of the weights is roughly 1, but
not quite. Therefore this algorithm is not really a quasi-Monte Carlo algorithm. The
algorithm An can be modified to obtain the optimal order of convergence for the
whole space W pk,mix ([0, 1]d ). The modified algorithm uses different points xi but
still positive weights ai . For a tutorial on this algorithm see [116]. Error bounds
for Besov spaces are studied in [35]. Triebel–Lizorkin spaces and the case of small
smoothness are studied in [117] and [74].
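To illustrate the structure of such a rule, here is a minimal Python sketch that enumerates the points Am/a lying in (0, 1)^d and forms the weighted sum. The matrix A below is only a placeholder: the actual Frolov construction of a good A (and the resulting error bounds, which also require compact support of f) is not shown.

import numpy as np
from itertools import product

def frolov_type_rule(f, A, a):
    # An(f) = |det A| / a^d * sum over m in Z^d with A m / a in (0,1)^d of f(A m / a)
    d = A.shape[0]
    Ainv = np.linalg.inv(A)
    # all admissible m lie in the box spanned by a * A^{-1} applied to the corners of [0,1]^d
    corners = np.array(list(product([0.0, 1.0], repeat=d)))
    M = a * corners @ Ainv.T
    lo, hi = np.floor(M.min(axis=0)).astype(int), np.ceil(M.max(axis=0)).astype(int)
    total = 0.0
    for m in product(*(range(lo[i], hi[i] + 1) for i in range(d))):
        x = A @ np.array(m) / a
        if np.all(x > 0) and np.all(x < 1):
            total += f(x)
    return abs(np.linalg.det(A)) / a**d * total

# placeholder matrix (NOT a good Frolov matrix), d = 2, scale a = 8; the test integrand
# is only there to exercise the code, it is not compactly supported in (0,1)^2
A = np.array([[1.0, 0.3], [0.3, 1.0]])
print(frolov_type_rule(lambda x: np.prod(x), A, a=8.0))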
For the Besov–Nikolskii classes S_{p,q}^r B(T^d) with 1 ≤ p, q ≤ ∞ and 1/p < r < 2,
the optimal rate is

n^{−r} (log n)^{(d−1)(1−1/q)}

and can be obtained constructively with QMC algorithms, see [63]. The lower bound
was proved by Triebel [115].
The Frolov algorithm can be used as a building block for a randomized algorithm
that is universal in the sense that it has the optimal order of convergence (in the
randomized setting as well as in the worst case setting) for many different function
spaces, see [65].

A famous algorithm for tensor product problems is the Smolyak algorithm, also
called sparse grids algorithm. We can mention just a few papers and books that deal
with this topic: The algorithm was invented by Smolyak [106] and, independently,
by several other colleagues and research groups. Several error bounds were proved
by Temlyakov [108, 110]; explicit error bounds (without unknown constants) were
obtained by Wasilkowski and Woźniakowski [121, 123]. Novak and Ritter [80–82]
studied the particular Clenshaw-Curtis Smolyak algorithm. A survey is Bungartz
and Griebel [9] and another one is [88, Chap. 15]. For recent results on the order of
convergence see Sickel and T. Ullrich [99, 100] and Dinh Dũng and T. Ullrich [36].
The recent paper [62] contains a tractability result for the Smolyak algorithm applied
to very smooth functions. We display only one recent result on the Smolyak algorithm.

Theorem 7 (Sickel and T. Ullrich [100]) For the classes W2k,mix ([0, 1]d ) one can
construct a Smolyak algorithm with the order of the error

n^{−k} (log n)^{(d−1)(k+1/2)}.   (9)

Remark 7 (a) The bound (9) is valid even for L 2 approximation instead of integra-
tion, but it is not known whether this upper bound is optimal for the approximation
problem. Using the technique of control variates one can obtain the order

n^{−k−1/2} (log n)^{(d−1)(k+1/2)}



for the integration problem in the randomized setting. This algorithm is not often used
since it is not easy to implement and its arithmetic cost is rather high. In addition,
the rate can be improved by the algorithm of [65] to n^{−k−1/2} (log n)^{(d−1)/2}.
(b) It is shown in Dinh Dũng and T. Ullrich [36] that the order (9) can not be
improved when restricting to Smolyak grids.
(c) We give a short description of the Clenshaw–Curtis Smolyak algorithm for the
computation of integrals ∫_{[−1,1]^d} f(x) dx that often leads to “almost optimal” error
bounds, see [81].
We assume that for d = 1 a sequence of formulas

U^i(f) = Σ_{j=1}^{m_i} a_j^i f(x_j^i)

is given. In the case of numerical integration the a_j^i are just numbers. The method
U^i uses m_i function values and we assume that U^{i+1} has smaller error than U^i and
m_{i+1} > m_i. Define then, for d > 1, the tensor product formulas
(U^{i1} ⊗ · · · ⊗ U^{id})(f) = Σ_{j1=1}^{m_{i1}} · · · Σ_{jd=1}^{m_{id}} a_{j1}^{i1} · · · a_{jd}^{id} f(x_{j1}^{i1}, . . . , x_{jd}^{id}).

A tensor product formula clearly needs

m_{i1} · m_{i2} · · · m_{id}

function values, sampled on a regular grid. The Smolyak formulas A(q, d) are clever
linear combinations of tensor product formulas such that

• only tensor products with a relatively small number of knots are used;
• the linear combination is chosen in such a way that an interpolation property for
d = 1 is preserved for d > 1.

The Smolyak formulas are defined by

A(q, d) = Σ_{q−d+1 ≤ |i| ≤ q} (−1)^{q−|i|} · (d−1 choose q−|i|) · (U^{i1} ⊗ · · · ⊗ U^{id}),

where q ≥ d. Specifically, we use, for d > 1, the Smolyak construction and start,
for d = 1, with the classical Clenshaw–Curtis formula with

m_1 = 1 and m_i = 2^{i−1} + 1 for i > 1.



The Clenshaw–Curtis formulas

U^i(f) = Σ_{j=1}^{m_i} a_j^i f(x_j^i)

use the knots

x_j^i = −cos( π(j − 1) / (m_i − 1) ),   j = 1, . . . , m_i

(and x_1^1 = 0). Hence we use nonequidistant knots. The weights a_j^i are defined in such
a way that U^i is exact for all (univariate) polynomials of degree at most m_i.
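The following Python sketch implements this Clenshaw–Curtis Smolyak construction directly from the formulas above. The Clenshaw–Curtis weights are computed here by solving the moment (exactness) conditions, which is adequate for small m_i only, and the test integrand is an arbitrary illustration.

import numpy as np
from itertools import product
from math import comb

def cc_rule(i):
    # univariate Clenshaw-Curtis rule U^i on [-1,1] with m_1 = 1, m_i = 2^{i-1} + 1
    m = 1 if i == 1 else 2**(i - 1) + 1
    if m == 1:
        return np.array([0.0]), np.array([2.0])
    j = np.arange(1, m + 1)
    x = -np.cos(np.pi * (j - 1) / (m - 1))
    V = np.vander(x, m, increasing=True).T                 # V[k, l] = x_l^k
    mom = np.array([2.0 / (k + 1) if k % 2 == 0 else 0.0 for k in range(m)])  # int_{-1}^{1} x^k dx
    return x, np.linalg.solve(V, mom)                      # interpolatory (CC) weights

def smolyak_cc(f, q, d):
    # A(q, d) = sum over q-d+1 <= |i| <= q of (-1)^{q-|i|} (d-1 choose q-|i|) U^{i_1} x ... x U^{i_d}
    total = 0.0
    for i in product(range(1, q - d + 2), repeat=d):
        s = sum(i)
        if s < q - d + 1 or s > q:
            continue
        coeff = (-1)**(q - s) * comb(d - 1, q - s)
        nodes, weights = zip(*(cc_rule(ik) for ik in i))
        for idx in product(*(range(len(nd)) for nd in nodes)):
            x = np.array([nodes[k][idx[k]] for k in range(d)])
            w = np.prod([weights[k][idx[k]] for k in range(d)])
            total += coeff * w * f(x)
    return total

# illustration: integrate exp(sum x_i / 4) over [-1,1]^3
print(smolyak_cc(lambda x: np.exp(np.sum(x) / 4), q=6, d=3))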
It turns out that many tensor product problems are still intractable and suf-
fer from the curse of dimensionality, for a rather exhaustive presentation see
[87, 88, 90]. Sloan and Woźniakowski [103] describe a very interesting idea that
was further developed in hundreds of papers; the paper [103] is most important and
influential. We can describe here only the very beginnings of a long ongoing story;
we present just one example instead of the whole theory.
The rough idea is that f : [0, 1]d → R may depend on many variables, d is large,
but some variables or groups of variables are more important than others. Consider,
for d = 1, the inner product

⟨f, g⟩_{1,γ} = ∫_0^1 f dx ∫_0^1 g dx + (1/γ) ∫_0^1 f′(x) g′(x) dx,

where γ > 0. If γ is small then f must be “almost constant” if it has small norm.
A large γ means that f may have a large variation and still the norm is relatively
small. Now we take tensor products of such spaces and weights γ1 ≥ γ2 ≥ . . . and
consider the complexity of the integration problem for the unit ball Fd with respect
to this weighted norm. The kernel K of the tensor product space H(K) is of the form

K(x, y) = Π_{i=1}^{d} K_{γi}(xi, yi),

where K γ is the kernel of the respective space Hγ of univariate functions.


Theorem 8 (Sloan and Woźniakowski [103]) Assume that Σ_{i=1}^{∞} γi < ∞. Then the
problem is strongly polynomially tractable.
Remark 8 The paper [103] contains also a lower bound which is valid for all quasi-
Monte Carlo methods. The proof of the upper bound is very interesting and an
excellent example for the probabilistic method. Compute the mean of the quadratic
worst case error of QMC algorithms over all (x1, . . . , xn) ∈ [0, 1]^{nd} and obtain

(1/n) ( ∫_{[0,1]^d} K(x, x) dx − ∫_{[0,1]^{2d}} K(x, y) dx dy ).

This expectation is of the form Cd n^{−1} and the sequence Cd is bounded if and only if
Σ_i γi < ∞. The lower bound in [103] is based on the fact that the kernel K is always
non-negative; this leads to lower bounds for QMC algorithms or, more generally, for
algorithms with positive weights.
As already indicated, Sloan and Woźniakowski [103] was continued in many
directions. Much more general weights and many different Hilbert spaces were stud-
ied. By the probabilistic method one only obtains the existence of good QMC
algorithms but, in the meanwhile, there exist many results about the construction
of good algorithms. In this paper the focus is on the basic complexity results and
therefore we simply list a few of the most relevant papers: [8, 11, 26–28, 53–55,
66–69, 92, 93, 102, 104, 105]. See also the books [23, 71, 75, 88] and the excellent
survey paper [29].

In complexity theory we want to study optimal algorithms and it is not clear


whether QMC algorithms or quadrature formulas with positive coefficients ai are
optimal. Observe that the Smolyak algorithm uses also negative ai and it is known
that in certain cases positive quadrature formulas are far from optimal; for examples
see [84] or [88, Sects. 10.6 and 11.3]. Therefore it is not clear whether the conditions
on the weights in Theorem 8 can be relaxed if we allow arbitrary algorithms. The
next result shows that this is not the case.

Theorem 9 ([86]) The integration problem from Theorem 8 is strongly polynomially
tractable if and only if Σ_{i=1}^{∞} γi < ∞.

Remark 9 Due to the known upper bound of Theorem 8, to prove Theorem 9 it is


enough to prove a lower bound for arbitrary algorithms. This is done via the technique
of decomposable kernels that was developed in [86], see also [88, Chap. 11].
We do not describe this technique here and only remark that we need for this
technique many non-zero functions f i in the Hilbert space Fd with disjoint supports.
Therefore this technique usually works for functions with finite smoothness, but not
for analytic functions.

Tractability of integration can be proved for many weighted spaces and one
may ask whether there are also unweighted spaces where tractability holds as well.
A famous example for this are integration problems that are related to the star dis-
crepancy.
For x1, . . . , xn ∈ [0, 1]^d define the star discrepancy by

D*_∞(x1, . . . , xn) = sup_{t ∈ [0,1]^d} | t1 · · · td − (1/n) Σ_{i=1}^{n} 1_{[0,t)}(xi) |,

the respective QMC quadrature formula is Qn(f) = (1/n) Σ_{i=1}^{n} f(xi).
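Exact computation of the star discrepancy is itself a hard problem (see the remarks below); the following Python sketch only evaluates the local discrepancy at boxes [0, t) whose corners are built from the point coordinates and 1, which yields a lower bound on D*_∞ and is feasible only for small n and d.

import numpy as np
from itertools import product

def star_discrepancy_lower_bound(points):
    # evaluate |t_1...t_d - (1/n) #{x_i in [0, t)}| only at grid corners t whose
    # coordinates are taken from the point coordinates or 1
    P = np.asarray(points)
    n, d = P.shape
    grids = [np.unique(np.concatenate([P[:, j], [1.0]])) for j in range(d)]
    best = 0.0
    for t in product(*grids):
        t = np.array(t)
        local = abs(np.prod(t) - np.sum(np.all(P < t, axis=1)) / n)
        best = max(best, local)
    return best

rng = np.random.default_rng(1)
print(star_discrepancy_lower_bound(rng.random((32, 2))))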
Consider the Sobolev space

Fd = { f ∈ W_1^{1,mix} | ‖f‖ ≤ 1, f(x) = 0 if there exists an i with xi = 1 }



with the norm

‖f‖ := ‖ ∂^d f / (∂x1 ∂x2 · · · ∂xd) ‖_1.

Then the Hlawka–Zaremba equality yields

D*_∞(x1, . . . , xn) = sup_{f ∈ Fd} |Sd(f) − Qn(f)|,

hence the star discrepancy is a worst case error bound for integration. We define

n(ε, Fd) = min{n | ∃ x1, . . . , xn with D*_∞(x1, . . . , xn) ≤ ε}.

The following result shows that this integration problem is polynomially tractable
and the complexity is linear in the dimension.

Theorem 10 ([51])
n(ε, Fd) ≤ C d ε^{−2}   (10)

and
n(1/64, Fd ) ≥ 0.18 d.

Remark 10 This result was modified and improved in various ways and we mention
some important results. Hinrichs [57] proved the lower bound

n(ε, Fd) ≥ c d ε^{−1}   for ε ≤ ε0.

Aistleitner [1] proved that the constant C in (10) can be taken as 100. Aistleitner
and Hofer [2] proved more on upper bounds. Already the proof in [51] showed that
an upper bound D*_∞(x1, . . . , xn) ≤ C √(d/n) holds with high probability if the points
x1, . . . , xn are taken independently and uniformly distributed. Doerr [30] proved the
respective lower bound, hence

E(D*_∞(x1, . . . , xn)) ≍ √(d/n)   for n ≥ d.

Since the upper bounds are proved with the probabilistic method, we only know
the existence of points with small star discrepancy. The existence results can be
transformed into (more or less explicit) constructions and the problem is, of course,
to minimize the computing time as well as the discrepancy. One of the obstacles is
that already the computation of the star discrepancy of given points x1 , x2 , . . . , xn is
very difficult. We refer the reader to [19, 24, 25, 31–34, 42, 59].
Recently Dick [20] proved a tractability result for another unweighted space that
is defined via an L 1 -norm and consists of periodic functions; we denote Fourier
coefficients by f˜(k), where k ∈ Zd . Let 0 < α ≤ 1 and 1 ≤ p ≤ ∞ and
F_{α,p,d} = { f : [0, 1]^d → R | Σ_{k ∈ Z^d} |f̃(k)| + sup_{x,h} |f(x + h) − f(x)| / ‖h‖_p^α ≤ 1 }.

Dick proved the upper bound

e(n, F_{α,p,d}) ≤ max( (d − 1)/√n , d^{α/p}/n^α )

for any prime number n. Hence the complexity is at most quadratic in d.


The proof is constructive; a suitable algorithm is the following. Use the points
xk = ({k/n}, {k²/n}, . . . , {k^d/n}), where {·} denotes the fractional part and
k = 0, 1, . . . , n − 1, and take the respective QMC algorithm.
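A minimal Python sketch of this QMC rule; it assumes that n is prime and that the coordinates k^j/n are reduced modulo 1 (computed below as (k^j mod n)/n).

def korobov_pset_qmc(f, n, d):
    # QMC rule with the points x_k = ({k/n}, {k^2/n}, ..., {k^d/n}), k = 0, ..., n-1
    total = 0.0
    for k in range(n):
        x = [pow(k, j, n) / n for j in range(1, d + 1)]   # k^j mod n, divided by n
        total += f(x)
    return total / n

# illustration with n = 101 (prime) and d = 5
print(korobov_pset_qmc(lambda x: sum(x), n=101, d=5))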

4 Some Recent Results

We end this survey with two results that were still unpublished at the time of the
conference, April 2014. First we return to the classes C k ([0, 1]d ), see Theorem 1.
We want to be a little more general and consider the computation of

Sd(f) = ∫_{Dd} f(x) dx   (11)

up to some error ε > 0, where Dd ⊂ Rd has Lebesgue measure 1. The results hold for
arbitrary sets Dd , the standard example of course is Dd = [0, 1]d . For convenience
we consider functions f : Rd → R. This makes the function class a bit smaller and
the result a bit stronger, since our emphasis is on lower bounds.
It was not known whether the curse of dimensionality is present for probably
the most natural class, the unit ball of k times continuously differentiable
functions,

Fdk = { f ∈ C^k(R^d) | ‖D^α f‖_∞ ≤ 1 for all |α| ≤ k},

where k ∈ N.

Theorem 11 ([60]) The curse of dimensionality holds for the classes Fdk with the
super-exponential lower bound

n(ε, Fdk) ≥ ck (1 − ε) d^{d/(2k+3)}   for all d ∈ N and ε ∈ (0, 1),

where ck > 0 depends only on k.



Remark 11 In [60, 61] we also prove that the curse of dimensionality holds for
even smaller classes of functions Fd for which the norms of arbitrary directional
derivatives are bounded proportionally to 1/√d.
We start with the fooling function

f0(x) = min{ 1, (1/(δ√d)) dist(x, Pδ) }   for all x ∈ R^d,

where

Pδ = ∪_{i=1}^{n} B_δ^d(xi)

and B_δ^d(xi) is the ball with center xi and radius δ√d. The function f0 is Lipschitz.
By a suitable smoothing via convolution we construct a smooth fooling function
f k ∈ Fd with f k |P0 = 0.
Important elements of the proof are volume estimates (in the spirit of Elekes [38]
and Dyer, Füredi and McDiarmid [37]), since we need that the volume of a neigh-
borhood of the convex hull of n arbitrary points is exponentially small in d.
Also classes of C ∞ -functions were studied recently. We still do not know whether
the integration problem suffers from the curse of dimensionality for the classes

Fd = { f : [0, 1]^d → R | ‖D^α f‖_∞ ≤ 1 for all α ∈ N_0^d },

this is Open Problem 2 from [87]. We know from Vybíral [119] and [61] that the
curse is present for somewhat larger spaces and that weak tractability holds for
smaller classes; this can be proved with the Smolyak algorithm, see [62].

We now consider univariate oscillatory integrals for the standard Sobolev spaces
H s of periodic and non-periodic functions with an arbitrary integer s ≥ 1. We study
the approximate computation of Fourier coefficients

Ik(f) = ∫_0^1 f(x) e^{−2π i k x} dx,   i = √−1,

where k ∈ Z and f ∈ H s .
There are several recent papers about the approximate computation of highly
oscillatory univariate integrals with the weight exp(2π i kx), where x ∈ [0, 1] and
k is an integer (or k ∈ R) which is assumed to be large in absolute value, see
Huybrechs and Olver [64] for a survey.
We study the Sobolev space H s for a finite s ∈ N, i.e.,

H^s = { f : [0, 1] → C | f^{(s−1)} is abs. cont., f^{(s)} ∈ L2 }   (12)



with the inner product

⟨f, g⟩_s = Σ_{ℓ=0}^{s−1} ∫_0^1 f^{(ℓ)}(x) dx ∫_0^1 g^{(ℓ)}(x) dx + ∫_0^1 f^{(s)}(x) g^{(s)}(x) dx   (13)
        = Σ_{ℓ=0}^{s−1} ⟨f^{(ℓ)}, 1⟩_0 ⟨g^{(ℓ)}, 1⟩_0 + ⟨f^{(s)}, g^{(s)}⟩_0,

where ⟨f, g⟩_0 = ∫_0^1 f(x) g(x) dx, and the norm is ‖f‖_{H^s} = ⟨f, f⟩_s^{1/2}.
For the periodic case, an algorithm that uses n function values at equally spaced
points is nearly optimal, and its worst case error is bounded by Cs (n + |k|)^{−s} with
Cs exponentially small in s. For the non-periodic case, we first compute successive
derivatives up to order s − 1 at the end-points x = 0 and x = 1. These derivative
values are used to periodize the function, and this allows us to obtain error bounds
similar to those for the periodic case. Asymptotically in n, the worst case error of the
algorithm is of order n^{−s} independently of k, for both the periodic and non-periodic cases.
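The nearly optimal algorithms of [91] are not reproduced here. As a simple illustration for the periodic case, the following Python sketch applies the rectangle rule at n equally spaced points to f(x) e^{−2πikx}; this uses exactly n function values of f, but it is only a natural candidate and not necessarily the algorithm analysed in [91].

import numpy as np

def fourier_coefficient(f, k, n):
    # rectangle rule at equally spaced points for I_k(f) = int_0^1 f(x) e^{-2 pi i k x} dx
    x = np.arange(n) / n              # f is assumed to accept NumPy arrays
    return np.mean(f(x) * np.exp(-2j * np.pi * k * x))

# illustration: f(x) = exp(2 pi i * 3 x) has I_3(f) = 1
print(fourier_coefficient(lambda x: np.exp(2j * np.pi * 3 * x), k=3, n=64))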
Theorem 12 ([91]) Consider the integration problem Ik defined over the space H^s
of non-periodic functions with s ∈ N. Then

cs / (n + |k|)^s ≤ e(n, k, H^s) ≤ (3/(2π))^s · 2 / (n + |k| − 2s + 1)^s,
for all k ∈ Z and n ≥ 2s.


Remark 12 The minimal errors e(n, k, H^s) for the non-periodic case have a peculiar
property for s ≥ 2 and large k. Namely, for n = 0 we obtain the initial error which is
of order |k|^{−1}, whereas for n ≥ 2s it becomes of order |k|^{−s}. Hence, the dependence
on |k|^{−1} is short-lived and disappears quite quickly. For instance, take s = 2. Then
e(n, k, H^s) is of order |k|^{−1} only for n = 0 and maybe for n = 1, 2, 3, and then
becomes of order |k|^{−2}.

Acknowledgments I thank the following colleagues and friends for valuable remarks: Michael
Gnewuch, Aicke Hinrichs, Robert Kunsch, Thomas Müller-Gronbach, Daniel Rudolf, Tino Ullrich,
and Henryk Woźniakowski. I also thank two referees for carefully reading my manuscript.

References

1. Aistleitner, Ch.: Covering numbers, dyadic chaining and discrepancy. J. Complex. 27, 531–
540 (2011)
2. Aistleitner, Ch., Hofer, M.: Probabilistic discrepancy bounds for Monte Carlo point sets. Math.
Comput. 83, 1373–1381 (2014)
3. Babenko, V.F.: Asymptotically sharp bounds for the remainder for the best quadrature formulas
for several classes of functions. Mathematics Notes 19(3), 187–193 (1976) (English translation)

4. Babenko, V.F.: Exact asymptotics of the error of weighted cubature formulas optimal for
certain classes of functions. English translation: Mathematics Notes 20(4), 887–890 (1976)
5. Bakhvalov, N.S.: On the approximate calculation of multiple integrals. Vestnik MGU, Ser.
Math. Mech. Astron. Phys. Chem. 4, 3–18 (1959, in Russian). English translation: J. Complex.
31, 502–516 (2015)
6. Bakhvalov, N.S.: On the optimality of linear methods for operator approximation in convex
classes of functions. USSR Comput. Math. Math. Phys. 11, 244–249 (1971)
7. Baldeaux, J., Gnewuch, M.: Optimal randomized multilevel algorithms for infinite-
dimensional integration on function spaces with ANOVA-type decomposition. SIAM J.
Numer. Anal. 52, 1128–1155 (2014)
8. Baldeaux, J., Dick, J., Leobacher, G., Nuyens, D., Pillichshammer, F.: Efficient calculation
of the worst-case error and (fast) component-by-component construction of higher order
polynomial lattice rules. Numer. Algorithms 59, 403–431 (2012)
9. Bungartz, H.-J., Griebel, M.: Sparse grids. Acta Numer. 13, 147–269 (2004)
10. Bykovskii, V.A.: On the correct order of the error of optimal cubature formulas in spaces with
dominant derivative, and on quadratic deviations of grids. Akad. Sci. USSR, Vladivostok,
Computing Center Far-Eastern Scientific Center (preprint, 1985)
11. Chen, W.W.L., Skriganov, M.M.: Explicit constructions in the classical mean squares problem
in irregularities of point distribution. J. für Reine und Angewandte Mathematik (Crelle) 545,
67–95 (2002)
12. Chernaya, E.V.: Asymptotically exact estimation of the error of weighted cubature formulas
optimal in some classes of continuous functions. Ukr. Math. J. 47(10), 1606–1618 (1995)
13. Clancy, N., Ding, Y., Hamilton, C., Hickernell, F.J., Zhang, Y.: The cost of deterministic,
adaptive, automatic algorithms: cones, not balls. J. Complex. 30, 21–45 (2014)
14. Creutzig, J., Wojtaszczyk, P.: Linear vs. nonlinear algorithms for linear problems. J. Complex.
20, 807–820 (2004)
15. Creutzig, J., Dereich, S., Müller-Gronbach, Th, Ritter, K.: Infinite-dimensional quadrature
and approximation of distributions. Found. Comput. Math. 9, 391–429 (2009)
16. Daun, T., Heinrich, S.: Complexity of Banach space valued and parametric integration. In:
Dick, J., Kuo, F.Y., Peters, G.W., Sloan, I.H. (eds.) Monte Carlo and Quasi-Monte Carlo
Methods 2012, pp. 297–316. Springer (2013)
17. Daun, T., Heinrich, S.: Complexity of parametric integration in various smoothness classes.
J. Complex. 30, 750–766, (2014)
18. Dereich, S., Müller-Gronbach, Th.: Quadrature for self-affine distributions on Rd . Found.
Comput. Math. 15, 1465–1500, (2015)
19. Dick, J.: A note on the existence of sequences with small star discrepancy. J. Complex. 23,
649–652 (2007)
20. Dick, J.: Numerical integration of Hölder continuous, absolutely convergent Fourier-, Fourier
cosine-, and Walsh series. J. Approx. Theory 183, 14–30 (2014)
21. Dick, J., Gnewuch, M.: Optimal randomized changing dimension algorithms for infinite-
dimensional integration on function spaces with ANOVA-type decomposition. J. Approx.
Theory 184, 111–145 (2014)
22. Dick, J., Gnewuch, M.: Infinite-dimensional integration in weighted Hilbert spaces: anchored
decompositions, optimal deterministic algorithms, and higher order convergence. Found.
Comput. Math. 14, 1027–1077 (2014)
23. Dick, J., Pillichshammer, F.: Digital Nets and Sequences: Discrepancy Theory and Quasi-
Monte Carlo Integration. Cambridge University Press, Cambridge (2010)
24. Dick, J., Pillichshammer, F.: Discrepancy theory and quasi-Monte Carlo integration. In: Chen,
W., Srivastav, A., Travaglini, G., (eds) Panorama in Discrepancy Theory. Lecture Notes in
Mathematics 2107, pp. 539–619. Springer (2014)
25. Dick, J., Pillichshammer, F.: The weighted star discrepancy of Korobov’s p-sets. Proc. Am.
Math. Soc. 143, 5043–5057, (2015)
26. Dick, J., Sloan, I.H., Wang, X., Woźniakowski, H.: Liberating the weights. J. Complex. 20,
593–623 (2004)

27. Dick, J., Sloan, I.H., Wang, X., Woźniakowski, H.: Good lattice rules in weighted Korobov
spaces with general weights. Numer. Math. 103, 63–97 (2006)
28. Dick, J., Larcher, G., Pillichshammer, F., Woźniakowski, H.: Exponential convergence and
tractability of multivariate integration for Korobov spaces. Math. Comput. 80, 905–930 (2011)
29. Dick, J., Kuo, F.Y., Sloan, I.H.: High-dimensional integration: the quasi-Monte Carlo way.
Acta Numer. 22, 133–288 (2013)
30. Doerr, B.: A lower bound for the discrepancy of a random point set. J. Complex. 30, 16–20
(2014)
31. Doerr, B., Gnewuch, M.: Construction of low-discrepancy point sets of small size by bracket-
ing covers and dependent randomized rounding. In: Keller, A., Heinrich, S., Niederreiter, H.,
(eds.), Monte Carlo and Quasi-Monte Carlo Methods 2006, pp. 299–312. Springer (2008)
32. Doerr, B., Gnewuch, M., Kritzer, P., Pillichshammer, F.: Component-by-component construc-
tion of low-discrepancy point sets of small size. Monte Carlo Methods Appl. 14, 129–149
(2008)
33. Doerr, B., Gnewuch, M., Wahlström, M.: Algorithmic construction of low-discrepancy point
sets via dependent randomized rounding. J. Complex. 26, 490–507 (2010)
34. Doerr, C., Gnewuch, M., Wahlström, M.: Calculation of discrepancy measures and applica-
tions. In: Chen, W.W.L., Srivastav, A., Travaglini, G., (eds.) Panorama of Discrepancy Theory.
Lecture Notes in Mathematics 2107, pp. 621–678. Springer (2014)
35. Dubinin, V.V.: Cubature formulas for Besov classes. Izvestija Mathematics 61(2), 259–283
(1997)
36. Dũng, D., Ullrich, T.: Lower bounds for the integration error for multivariate functions with
mixed smoothness and optimal Fibonacci cubature for functions on the square. Mathematische
Nachrichten 288, 743–762 (2015)
37. Dyer, M.E., Füredi, Z., McDiarmid, C.: Random volumes in the n-cube. DIMACS Ser. Discret.
Math. Theor. Comput. Sci. 1, 33–38 (1990)
38. Elekes, G.: A geometric inequality and the complexity of computing volume. Discret. Comput.
Geom. 1, 289–292 (1986)
39. Frolov, K.K.: Upper bounds on the error of quadrature formulas on classes of functions. Dok-
lady Akademy Nauk USSR 231, 818–821 (1976). English translation: Soviet Mathematics
Doklady 17, 1665–1669, 1976
40. Frolov, K.K.: Upper bounds on the discrepancy in L_p, 2 ≤ p < ∞. Doklady Akademy Nauk
USSR 252, 805–807 (1980). English translation: Soviet Mathematics Doklady 18(1), 37–41
(1977)
41. Gnewuch, M.: Infinite-dimensional integration on weighted Hilbert spaces. Math. Comput.
81, 2175–2205 (2012)
42. Gnewuch. M.: Entropy, randomization, derandomization, and discrepancy. In: Plaskota, L.,
Woźniakowski, H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2010, pp. 43–78.
Springer (2012)
43. Gnewuch, M.: Lower error bounds for randomized multilevel and changing dimension algo-
rithms. In: Dick, J., Kuo, F.Y., Peters, G.W., Sloan, I.H. (eds.) Monte Carlo and Quasi-Monte
Carlo Methods 2012, pp. 399–415. Springer (2013)
44. Gnewuch, M., Mayer, S., Ritter, K.: On weighted Hilbert spaces and integration of functions
of infinitely many variables. J. Complex. 30, 29–47 (2014)
45. Heinrich, S.: Lower bounds for the complexity of Monte Carlo function approximation. J.
Complex. 8, 277–300 (1992)
46. Heinrich, S.: Random approximation in numerical analysis. In: Bierstedt, K.D., et al. (eds.)
Functional Analysis, pp. 123–171. Dekker (1994)
47. Heinrich, S.: Complexity of Monte Carlo algorithms. In: The Mathematics of Numerical
Analysis, Lectures in Applied Mathematics 32, AMS-SIAM Summer Seminar, pp. 405–419.
Park City, American Mathematical Society (1996)
48. Heinrich, S.: Quantum Summation with an Application to Integration. J. Complex. 18, 1–50
(2001)
49. Heinrich, S.: Quantum integration in Sobolev spaces. J. Complex. 19, 19–42 (2003)

50. Heinrich, S., Novak, E.: Optimal summation and integration by deterministic, randomized,
and quantum algorithms. In: Fang, K.-T., Hickernell, F.J., Niederreiter, H. (eds.) Monte Carlo
and Quasi-Monte Carlo Methods 2000, pp. 50–62. Springer (2002)
51. Heinrich, S., Novak, E., Wasilkowski, G.W., Woźniakowski, H.: The inverse of the star-
discrepancy depends linearly on the dimension. Acta Arithmetica 96, 279–302 (2001)
52. Heinrich, S., Novak, E., Pfeiffer, H.: How many random bits do we need for Monte Carlo
integration? In: Niederreiter, H. (ed.) Monte Carlo and Quasi-Monte Carlo Methods 2002,
pp. 27–49. Springer (2004)
53. Hickernell, F.J., Woźniakowski, H.: Integration and approximation in arbitrary dimension.
Adv. Comput. Math. 12, 25–58 (2000)
54. Hickernell, F.J., Woźniakowski, H.: Tractability of multivariate integration for periodic func-
tions. J. Complex. 17, 660–682 (2001)
55. Hickernell, F.J., Sloan, I.H., Wasilkowski, G.W.: On strong tractability of weighted multivari-
ate integration. Math. Comput. 73, 1903–1911 (2004)
56. Hickernell, F.J., Müller-Gronbach, Th, Niu, B., Ritter, K.: Multi-level Monte Carlo algorithms
for infinite-dimensional integration on RN . J. Complex. 26, 229–254 (2010)
57. Hinrichs, A.: Covering numbers, Vapnik-Cervonenkis classes and bounds for the star discrep-
ancy. J. Complex. 20, 477–483 (2004)
58. Hinrichs, A.: Optimal importance sampling for the approximation of integrals. J. Complex.
26, 125–134 (2010)
59. Hinrichs, A.: Discrepancy, integration and tractability. In: Dick, J., Kuo, F.Y., Peters, G.W.,
Sloan, I.H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2012, pp. 129–172. Springer
(2013)
60. Hinrichs, A., Novak, E., Ullrich, M., Woźniakowski, H.: The curse of dimensionality for
numerical integration of smooth functions. Math. Comput. 83, 2853–2863 (2014)
61. Hinrichs, A., Novak, E., Ullrich, M., Woźniakowski, H.: The curse of dimensionality for
numerical integration of smooth functions II. J. Complex. 30, 117–143 (2014)
62. Hinrichs, A., Novak, E., Ullrich, M.: On weak tractability of the Clenshaw Curtis Smolyak
algorithm. J. Approx. Theory 183, 31–44 (2014)
63. Hinrichs, A., Markhasin, L., Oettershagen, J., Ullrich, T.: Optimal quasi-Monte Carlo rules
on order 2 digital nets for the numerical integration of multivariate periodic functions, Numer.
Math. 1–34, (2015)
64. Huybrechs, D., Olver, S.: Highly oscillatory quadrature. Lond. Math. Soc. Lect. Note Ser.
366, 25–50 (2009)
65. Krieg, D., Novak, E.: A universal algorithm for multivariate integration. Found. Comput.
Math. Available at arXiv:1507.06853 [math.NA]
66. Kritzer, P., Pillichshammer, F., Woźniakowski, H.: Multivariate integration of infinitely many
times differentiable functions in weighted Korobov spaces. Math. Comput. 83, 1189–1206
(2014)
67. Kritzer, P., Pillichshammer, F., Woźniakowski, H.: Tractability of multivariate analytic prob-
lems. In: Uniform distribution and quasi-Monte Carlo methods, pp. 147–170. De Gruyter
(2014)
68. Kuo, F.Y.: Component-by-component constructions achieve the optimal rate of convergence
for multivariate integration in weighted Korobov and Sobolev spaces. J. Complex. 19, 301–
320 (2003)
69. Kuo, F.Y., Wasilkowski, G.W., Waterhouse, B.J.: Randomly shifted lattice rules for unbounded
integrands. J. Complex. 22, 630–651 (2006)
70. Kuo, F.Y., Sloan, I.H., Wasilkowski, G.W., Woźniakowski, H.: Liberating the dimension. J.
Complex. 26, 422–454 (2010)
71. Leobacher, G., Pillichshammer, F.: Introduction to Quasi-Monte Carlo Integration and Appli-
cations. Springer, Berlin (2014)
72. Mathé, P.: The optimal error of Monte Carlo integration. J. Complex. 11, 394–415 (1995)
73. Müller-Gronbach, Th., Novak, E., Ritter, K.: Monte-Carlo-Algorithmen. Springer, Berlin
(2012)

74. Nguyen, V.K., Ullrich, M., Ullrich, T.: Change of variable in spaces of mixed smoothness and
numerical integration of multivariate functions on the unit cube (In preparation)
75. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. SIAM (1992)
76. Niu, B., Hickernell, F., Müller-Gronbach, Th, Ritter, K.: Deterministic multi-level algorithms
for infinite-dimensional integration on RN . J. Complex. 27, 331–351 (2011)
77. Novak, E.: On the power of adaption. J. Complex. 12, 199–237 (1996)
78. Novak, E.: Deterministic and Stochastic Error Bounds in Numerical Analysis. Lecture Notes
in Mathematics 1349. Springer, Berlin (1988)
79. Novak, E.: Quantum complexity of integration. J. Complex. 17, 2–16 (2001)
80. Novak, E., Ritter, K.: High dimensional integration of smooth functions over cubes. Numer.
Math. 75, 79–97 (1996)
81. Novak, E., Ritter, K.: The curse of dimension and a universal method for numerical integration.
In: Nürnberger, G., Schmidt, J.W., Walz, G. (eds.) Multivariate Approximation and Splines,
vol. 125, pp. 177–188. ISNM, Birkhäuser (1997)
82. Novak, E., Ritter, K.: Simple cubature formulas with high polynomial exactness. Constr.
Approx. 15, 499–522 (1999)
83. Novak, E., Rudolf, D.: Computation of expectations by Markov chain Monte Carlo methods.
In: Dahlke, S., et al. (ed.) Extraction of quantifiable information from complex systems.
Springer, Berlin (2014)
84. Novak, E., Sloan, I.H., Woźniakowski, H.: Tractability of tensor product linear operators. J.
Complex. 13, 387–418 (1997)
85. Novak, E., Triebel, H.: Function spaces in Lipschitz domains and optimal rates of convergence
for sampling. Constr. Approx. 23, 325–350 (2006)
86. Novak, E., Woźniakowski, H.: Intractability results for integration and discrepancy. J. Com-
plex. 17, 388–441 (2001)
87. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems, vol. I, Linear Informa-
tion. European Mathematical Society (2008)
88. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems, vol. II, Standard Infor-
mation for Functionals. European Mathematical Society (2010)
89. Novak, E., Woźniakowski, H.: Lower bounds on the complexity for linear functionals in the
randomized setting. J. Complex. 27, 1–22 (2011)
90. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems, vol. III, Standard Infor-
mation for Operators. European Mathematical Society (2012)
91. Novak, E., Ullrich, M., Woźniakowski, H.: Complexity of oscillatory integration for univariate
Sobolev spaces. J. Complex. 31, 15–41 (2015)
92. Nuyens, D., Cools, R.: Fast algorithms for component-by-component construction of rank-1
lattice rules in shift invariant reproducing kernel Hilbert spaces. Math. Comput. 75, 903–920
(2006)
93. Nuyens, D., Cools, R.: Fast algorithms for component-by-component construction of rank-1
lattice rules with a non-prime number of points. J. Complex. 22, 4–28 (2006)
94. Plaskota, L., Wasilkowski, G.W.: The power of adaptive algorithms for functions with singu-
larities. J. Fixed Point Theory Appl. 6, 227–248 (2009)
95. Plaskota, L., Wasilkowski, G.W.: Tractability of infinite-dimensional integration in the worst
case and randomized settings. J. Complex. 27, 505–518 (2011)
96. Roth, K.F.: On irregularities of distributions. Mathematika 1, 73–79 (1954)
97. Roth, K.F.: On irregularities of distributions IV. Acta Arithmetica 37, 67–75 (1980)
98. Rudolf, D.: Explicit error bounds for Markov chain Monte Carlo. Dissertationes Mathematicae
485, (2012)
99. Sickel, W., Ullrich, T.: Smolyak’s algorithm, sampling on sparse grids and function spaces of
dominating mixed smoothness. East J. Approx. 13, 387–425 (2007)
100. Sickel, W., Ullrich, T.: Spline interpolation on sparse grids. Appl. Anal. 90, 337–383 (2011)
101. Skriganov, M.M.: Constructions of uniform distributions in terms of geometry of numbers.
St. Petersburg Math. J. 6, 635–664 (1995)

102. Sloan, I.H., Reztsov, A.V.: Component-by-component construction of good lattice rules. Math.
Comput. 71, 263–273 (2002)
103. Sloan, I.H., Woźniakowski, H.: When are quasi-Monte Carlo algorithms efficient for high
dimensional integrals? J. Complex. 14, 1–33 (1998)
104. Sloan, I.H., Kuo, F.Y., Joe, S.: On the step-by-step construction of quasi-Monte Carlo inte-
gration rules that achieves strong tractability error bounds in weighted Sobolev spaces. Math.
Comput. 71, 1609–1640 (2002)
105. Sloan, I.H., Wang, X., Woźniakowski, H.: Finite-order weights imply tractability of multi-
variate integration. J. Complex. 20, 46–74 (2004)
106. Smolyak, S.A.: Quadrature and interpolation formulas for tensor products of certain classes
of functions. Doklady Akademy Nauk SSSR 4, 240–243 (1963)
107. Sukharev, A.G.: Optimal numerical integration formulas for some classes of functions. Sov.
Math. Dokl. 20, 472–475 (1979)
108. Temlyakov, V.N.: Approximate recovery of periodic functions of several variables. Mathe-
matics USSR Sbornik 56, 249–261 (1987)
109. Temlyakov, V.N.: On a way of obtaining lower estimates for the error of quadrature formulas.
Mat. Sb. 181, 1403–1413 (1990, in Russian). English translation: Math. USSR Sb. 71, 247–257
(1992)
110. Temlyakov, V.N.: On approximate recovery of functions with bounded mixed derivative. J.
Complex. 9, 41–59 (1993)
111. Temlyakov, V.N.: Cubature formulas, discrepancy, and nonlinear approximation. J. Complex.
19, 352–391 (2003)
112. Traub, J.F., Woźniakowski, H.: A General Theory of Optimal Algorithms. Academic Press,
Cambridge (1980)
113. Traub, J.F., Wasilkowski, G.W., Woźniakowski, H.: Information-Based Complexity. Acad-
emic Press, Cambridge (1988)
114. Traub, J.F., Woźniakowski, H.: Path integration on a quantum computer. Q. Inf. Process. 1,
365–388 (2003)
115. Triebel, H.: Bases in Function Spaces, Sampling, Discrepancy, Numerical Integration. Euro-
pean Mathematical Society (2010)
116. Ullrich, M.: On Upper error bounds for quadrature formulas on function classes by K.K.
Frolov. In: Cools, R., Nuyens, D., (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2014,
vol. 163, pp. 571–582. Springer, Heidelberg (2016)
117. Ullrich, M., Ullrich, T.: The role of Frolov’s cubature formula for functions with bounded
mixed derivative. SIAM J. Numer. Anal. 54(2), 969–993 (2016)
118. Vybíral, J.: Sampling numbers and function spaces. J. Complex. 23, 773–792 (2007)
119. Vybíral, J.: Weak and quasi-polynomial tractability of approximation of infinitely differen-
tiable functions. J. Complex. 30, 48–55 (2014)
120. Wasilkowski, G.W.: Average case tractability of approximating ∞-variate functions. Math.
Comput. 83, 1319–1336 (2014)
121. Wasilkowski, G.W., Woźniakowski, H.: Explicit cost bounds of algorithms for multivariate
tensor product problems. J. Complex. 11, 1–56 (1995)
122. Wasilkowski, G.W., Woźniakowski, H.: On tractability of path integration. J. Math. Phys. 37,
2071–2088 (1996)
123. Wasilkowski, G.W., Woźniakowski, H.: Weighted tensor-product algorithms for linear mul-
tivariate problems. J. Complex. 15, 402–447 (1999)
124. Zho Newn, M., Sharygin, I.F.: Optimal cubature formulas in the classes D21,c and D21,l1 . In:
Problems of Numerical and Applied Mathematics, pp. 22–27. Institute of Cybernetics, Uzbek
Academy of Sciences (1991, in Russian)
Approximate Bayesian
Computation: A Survey on Recent Results

Christian P. Robert

Abstract Approximate Bayesian Computation (ABC) methods have become a


“mainstream” statistical technique in the past decade, following the realisation by
statisticians that they are a special type of non-parametric inference. In this survey
of ABC methods, we focus on the recent literature, building on the previous sur-
vey of Marin et al. Stat Comput 21(2):279–291, 2011, [39]. Given the importance
of model choice in the applications of ABC, and the associated difficulties in its
implementation, we also give emphasis to this aspect of ABC techniques.

Keywords Approximate Bayesian computation · Likelihood-free methods ·


Bayesian model choice · Sufficiency · Monte Carlo methods · Summary statistics

1 ABC Basics

Bayesian statistics and Monte Carlo methods are ideally suited to the task of passing many
models over one dataset. D. Rubin 1984

Although it now covers a wide range of application domains, approximate Bayesian


computation (ABC) was first introduced in population genetics [48, 62] to handle
models with intractable likelihoods [3]. By intractable, we mean models where the
likelihood function ℓ(θ|y)
• is completely defined by the probabilistic model, y ∼ f (y|θ );
• is available neither in closed form, nor by numerical derivation;
• cannot easily be completed or demarginalised by the introduction of latent or
auxiliary variables [53, 61];
• cannot be estimated by an unbiased estimator [2].

C.P. Robert (B)


CEREMADE, Université Paris-Dauphine, Paris, France
e-mail: xian@ceremade.dauphine.fr
C.P. Robert
Department of Statistics, University of Warwick, Coventry, UK


This intractability prohibits the direct implementation of a generic MCMC


algorithm like a Gibbs or a Metropolis–Hastings scheme. Examples of intractable
models associated with latent variable structures of high dimension abound, primarily
in population genetics, but more generally models including combinatorial structures
(e.g., trees, graphs), intractable normalising constants as in f(y|θ) = g(y|θ)/Z(θ)
(e.g. Markov random fields, exponential graphs), and missing (or latent) variables,
i.e. when

f(y|θ) = ∫_G f(y|G, θ) f(G|θ) dG

cannot produce a likelihood function in a manageable way (while f (y|G, θ ) and


f (G|θ ) are easily available).
The idea of the approximation behind ABC is both surprisingly simple and fun-
damentally related to the very nature of statistics, namely the resolution of an inverse
problem. Indeed, ABC relies on the feasibility of producing simulated data from the
inferred model or models, as it evaluates the unavailable likelihood by the proximity
of this simulated data to the observed data. In other words, it relies on the natural
assumption that the forward step—from model to data—is reasonably easy in con-
trast with the backward step—from data to model. ABC involves three levels of
approximation of the original Bayesian inference problem: if y0 denotes the actual
observation,
• ABC degrades the data precision down to a tolerance level ε, replacing the event
Y = y0 with the event d(Y, y0 ) ≤ ε, where d(·, ·) is a distance (or deviance) mea-
sure;
• ABC substitutes for the likelihood ℓ(θ|y0) a non-parametric approximation, for
instance I(d(z(θ), y0) ≤ ε), where z(θ) ∼ f(z|θ);
• ABC most often summarises the original data y0 by an (almost always) insufficient
statistic, S(y0 ), and aims to approximate the posterior π(θ |S(y0 )), instead of the
original π(θ |y0 ).
Not so coincidentally, [56], quoted above, used this representation in a non-
algorithmic perspective as a motivation for conducting Bayesian analysis (as opposed
to other forms of statistical inference). Rubin indeed details the accept–reject algo-
rithm [53] at the core of the ABC algorithm. Namely, the following algorithm

Algorithm 1 Accept–reject for Bayesian analysis


Given an observation x 0
for t = 1 to N do
repeat
Generate θ ∗ from the prior π(·)
Generate x from the model f (·|θ ∗ )
Accept θ ∗ if x = x 0
until acceptance
end for
return the N accepted values of θ ∗

returns as an accepted value a draw generated exactly from the posterior distribution,
π(θ |x0 ).
When compared with Rubin’s representation, ABC produces an approximate solu-
tion, replacing the above acceptance step with the tolerance condition

d(x, x0 ) < ε

in order to handle both continuous and large finite sampling spaces,1 X, but this
early occurrence in [56] is definitely worth signalling. It is also relevant that Rubin
does not promote this simulation method in situations where the likelihood is not
available but rather as an intuitive way to understand posterior distributions from
a frequentist perspective, because θ’s from the posterior are those that could have
generated the observed data. (The issue of the zero probability of the exact equality
between simulated and observed data is not addressed in Rubin’s paper, maybe
because the notion of a “match” between simulated and observed data is not clearly
defined.) Another (just as) early occurrence of an ABC-like algorithm was proposed
by [19].

Algorithm 2 ABC (basic version)


Given an observation x 0
for t = 1 to N do
repeat
Generate θ ∗ from the prior π(·)
Generate x ∗ from the model f (·|θ ∗ )
Compute the distance ρ(x0 , x∗ )
Accept θ ∗ if ρ(x0 , x∗ ) < ε
until acceptance
end for
return the N accepted values of θ ∗

The ABC method is formally implemented as in Algorithm 2, which requires
calibrating the objects ρ(·, ·), called the distance or divergence measure, N, the number
of accepted simulations, and ε, called the tolerance. Algorithm 2 is exact (in the
sense of Algorithm 1) when ε = 0. This algorithm can be easily implemented to test
the performances of the ABC methods on toy examples where the exact posterior
distribution is known, in order to visualise the impact of the algorithm parameters
like the tolerance level ε or the choice of the distance function ρ. However, in realistic
settings, it is almost never used as such, due to the curse of dimensionality. Indeed,
the data x 0 is generally complex enough for the proximity ρ(x0 , x∗ ) to be large, even
when both x 0 and x ∗ are generated from the same distribution. As illustrated on the

1 As detailed below, the distance may depend solely on an insufficient statistic S(x) and hence not
be a distance from a formal perspective, while introducing a second level of approximation into the
ABC scheme.

time series (toy) example of [39], the signal-to-noise2 ratio produced by selecting
θ ∗ ’s such that ρ(x0 , x∗ ) < ε falls dramatically as the dimension of x 0 increases for
a fixed value of ε. This means a corresponding increase in either the total number of
simulations N or in the tolerance ε is required to preserve a positive acceptance rate.
In practice, it is thus paramount to first summarise the data in a so-called summary
statistic before computing a proximity index. Thus enters the notion of summary
statistics that is central to operational ABC algorithms, as well as the subject of
much debate, as discussed in [12, 39] and below. A more realistic version of the ABC
algorithm is produced in Algorithm 3, where S(·) denotes the summary statistic.

Algorithm 3 ABC (version with summary)


Given an observation x 0
for t = 1 to N do
Generate θ (t) from the prior π(·)
Generate x (t) from the model f (·|θ (t) )
Compute dt = ρ(S(x0 ), S(x(t) ))
end for
Order distances d(1) ≤ d(2) ≤ . . . ≤ d(N )
return the values θ (t) associated with the k smallest distances
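
As a concrete illustration, the following Python sketch implements Algorithm 3 on a toy normal-mean model where the exact posterior is available for comparison; the model, the prior, the choice of the sample mean as summary statistic and all tuning values are assumptions made for the sketch, not part of the algorithm itself.

```python
# Minimal sketch of Algorithm 3 on a toy model: x | theta ~ N(theta, 1),
# theta ~ N(0, 10^2), summary statistic S(x) = sample mean (an assumption).
import numpy as np

rng = np.random.default_rng(0)
n_obs = 50
x_obs = rng.normal(1.0, 1.0, size=n_obs)      # observed data, "true" theta = 1

def S(x):
    return x.mean()

N, k = 100_000, 1_000                         # reference-table size, accepted draws
theta = rng.normal(0.0, 10.0, size=N)         # theta^(t) from the prior
dist = np.array([abs(S(rng.normal(t, 1.0, size=n_obs)) - S(x_obs))
                 for t in theta])             # d_t = rho(S(x^0), S(x^(t)))
accepted = theta[np.argsort(dist)[:k]]        # keep the k smallest distances
print("ABC posterior mean:", accepted.mean()) # close to the exact posterior mean
```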

For a general introduction to ABC methods, I refer the reader to our earlier survey
[39] and to [60], the latter constituting the original version of the Wikipedia page on
ABC [69], first published in PLoS Computational Biology. The presentation made in that page is compre-
hensive and correct, rightly putting stress on the most important aspects of the method.
The authors also include the proper level of warning about the need to assess assump-
tions behind and calibrations of the method. For concision’s sake, I will not cover here
recent computational advances, like those linked with sequential Monte Carlo [4, 65]
and the introduction of Gaussian processes in the approximation [72].
An important question that arises in the wake of defining this approximate algo-
rithm is whether or not it constitutes a valid approximation to the posterior distribution
π(θ |S(y0 )), if not of the original π(θ |y0 ). (This is what we will call consistency of
the ABC algorithm in the following section, meaning that the Monte Carlo approx-
imation provided by the algorithm converges to the posterior when the number of
simulations grows to infinity. The more standard notion of statistical consistency
will also be invoked to justify the approximation.) In case it does not converge to
the posterior, a subsequent question is whether or not the ABC output constitutes
a proper form of Bayesian inference. Answers to the latter vary according to one’s
perspective:
• asymptotically, an infinite computing power allows for a zero tolerance, hence for
a proper posterior conditioning on S(y0 );
• the outcome of Algorithm 3 is an exact posterior distribution when assuming an
error-in-variable model with scale ε [70];

2 Or, more accurately, posterior-to-prior.



• it is also an exact posterior distribution once data has been randomised at scale ε [24];
• it remains a formal Bayesian procedure albeit applied to an estimated likelihood.
Those answers are not fully satisfactory, in particular because using ABC implies
an ad hoc modification to the sampling model, but they are also illuminating about
the tension that exists between information and precision in complex models. ABC
indeed provides a worse approximation of the posterior distribution when the dimen-
sion of the summary statistics increases, at a given computational cost. This may
sound paradoxical from a purely statistical perspective but it is in fine a consequence
of the curse of dimensionality and of the fact that the signal-to-noise ratio may be
higher in a low dimension statistic than in the raw data. While π(θ |S(y0 )) is less
concentrated than the original π(θ |y0 ), the ABC versions of these two posteriors,

π(θ |d(S(Y ), S(y0 )) ≤ εη ) and π(θ |d(Y, y0 ) ≤ ε)

may exhibit the opposite feature. (In the above, we introduce the tolerance εη to stress
the dependence of the choice of the tolerance on the summary statistics.) A related
difficulty with ABC is that the approximation error—of using π(θ |d(S(Y ), S(y0 )) ≤
εη ) instead of π(θ |S(y0 )) or the original π(θ |y0 )—is unknown unless one is ready
to run costly simulation.

2 ABC Consistency

ABC was first treated with suspicion by the mainstream statistical community (as
well as some population geneticists, see the fierce debate between [63, 64] and [5, 8])
because it sounded like a rudimentary version of standard Monte Carlo methods like
MCMC algorithms [53]. However, the perspective later changed, due to representa-
tions of the ABC posterior distribution as (i) a genuine posterior distribution [71] and
of ABC as an auxiliary variable method [71], (ii) a non-parametric technique [10,
11], connected with both indirect inference [20] and k-nearest neighbour estimation
[9]. This array of interpretations helped to turn ABC into an acceptable (if not fully
accepted) component of Bayesian computational methods, albeit requiring caution
and calibration [69]. The following entries cover some of the advances made in the
statistical analysis of the method. While some of the earlier justifications are about
computational consistency, namely a converging approximation when the computing
power grows to infinity, the more recent analyses are mostly focused on statistical
consistency. This perspective shift signifies that ABC is increasingly considered as
an inference method per se.

2.1 ABC as Knn


In [9], the authors made a significant contribution to the statistical foundations of
ABC. It analyses the convergence properties of the ABC algorithm in accordance

with the way it is truly implemented. In practice, as in the DIYABC software [16],
the tolerance bound ε is determined as in Algorithm 3: a quantile of the simulated
distances, say the 10 % or the 1 % quantile, is chosen as ε. This means in particular
that the interpretation of ε as a non-parametric density estimation bandwidth, while
interesting and prevalent in the literature (see, e.g., [10, 24]), is only an approximation
of the actual practice.
The focus of [9] is on the mathematical foundations of this practice, an advance
obtained by (re)analysing ABC as a k-nearest neighbour (knn) method. Using generic
knn results, they derive a consistency property for the ABC algorithm by imposing
some constraints upon the rate of decrease of the quantile k as a function of n. More
specifically, provided

k_N / log log N −→ ∞ and k_N / N −→ 0

when N → ∞, for almost all s0 (with respect to the distribution of S(Y )), with
probability 1, convergence occurs, i.e.


(1/k_N) ∑_{j=1}^{k_N} ϕ(θ_j) −→ E[ϕ(θ)|S = s_0]

The setting is restricted to the use of sufficient statistics or, equivalently, to a dis-
tance over the whole sample. The issue of summary statistics is not addressed in the
paper. The paper also contains a rigorous proof of the convergence of ABC when
the tolerance ε goes to zero. The mean integrated square error consistency of the
conditional kernel density estimate is established for a generic kernel (under usual
assumptions). Further assumptions (both on the target and on the kernel) allow the
authors to obtain precise convergence rates (as a power of the sample size), derived
from classical k-nearest neighbour regression, like
k_N ≈ N^{(p+4)/(m+p+4)}

in dimensions m larger than 4 (where N is the simulation size). The paper [9] is
theoretical and highly mathematical; however, this work clearly constitutes a major
reference for the justification of ABC. In addition, it creates a link with machine-
learning techniques, where ABC is yet at an early stage of development.
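
As a quick numerical illustration of these calibration conditions, the snippet below tabulates k_N ≈ N^{(p+4)/(m+p+4)} for a few simulation sizes and checks that k_N/log log N grows while k_N/N vanishes; the dimensions p and m are arbitrary values chosen for the example.

```python
# Illustrative check of the knn calibration of the number of accepted draws;
# p = 2 and m = 6 are illustrative assumptions, not values from the paper.
import numpy as np

p, m = 2, 6
for N in [10**3, 10**5, 10**7, 10**9]:
    k_N = N ** ((p + 4) / (m + p + 4))
    print(f"N={N:.0e}  k_N={k_N:.0f}  "
          f"k_N/loglogN={k_N / np.log(np.log(N)):.1f}  k_N/N={k_N / N:.2e}")
```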

2.2 Convergence Rates

In [17], the authors address ABC consistency in the special setting of hidden Markov
models. It relates to [24] discussed below in that those authors also establish ABC
consistency for the noisy ABC, given in Algorithm 4, where h(·) is a kernel bounded
by one (as for instance the unnormalised normal density).

Algorithm 4 ABC (noisy version)


Given an observation x 0
Generate S̃ 0 ∼ h({ S̃ − S(x 0 )}/ε)
for t = 1 to N do
repeat
Generate θ ∗ from the prior π(·)
Generate x ∗ from the model f (·|θ ∗ )
Accept θ ∗ with probability h({ S̃ 0 − S(x ∗ )}/ε)
until acceptance
end for
return N accepted values of θ ∗
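
The following Python sketch spells out the noisy acceptance step of Algorithm 4 with the unnormalised normal density as kernel, on the same toy normal-mean model as above; the tolerance, the prior and the sample sizes are illustrative assumptions.

```python
# Minimal sketch of noisy ABC (Algorithm 4) with a Gaussian kernel bounded by one.
import numpy as np

rng = np.random.default_rng(1)
n_obs, eps, N = 50, 0.1, 2_000
x_obs = rng.normal(1.0, 1.0, size=n_obs)

def S(x):
    return x.mean()

def h(u):                                     # unnormalised normal density, <= 1
    return np.exp(-0.5 * u**2)

S_tilde = S(x_obs) + eps * rng.normal()       # randomised observed summary S_tilde^0

accepted = []
while len(accepted) < N:
    theta = rng.normal(0.0, 10.0)             # theta* from the prior
    x = rng.normal(theta, 1.0, size=n_obs)    # x* from the model
    if rng.uniform() < h((S_tilde - S(x)) / eps):
        accepted.append(theta)
print("noisy-ABC posterior mean:", np.mean(accepted))
```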

In [17], an ABC scheme is derived in such a way that the ABC simulated sequence
remains an HMM, the conditional distribution of the observables given the latent
Markov chain being modified by the ABC acceptance ball. This means that con-
ducting maximum likelihood (or Bayesian) estimation based on the ABC sample is
equivalent to exact inference under the perturbed HMM scheme. In this sense, this
equivalence also connects with [24, 71] perspectives on “exact ABC”. While the
paper provides asymptotic bias for a fixed value of the tolerance ε, it also proves that
an arbitrary level of accuracy can be attained with enough data and a small enough ε.
The authors of the paper show in addition (as in [24]) that ABC inference based on
noisy observations y1 + εζ1 , . . . , yn + εζn with the same tolerance ε, is equivalent
to a regular inference based on the original data y1 , . . . , yn , hence the consistency of
Algorithm 4. Furthermore, the asymptotic variance of the ABC version is shown to
always be larger than the asymptotic variance of the standard MLE, and decreasing
as ε2 . The paper also contains an illustration on an HMM with α-stable observables.
Notice that the restriction to summary statistics that preserve the HMM structure
is paramount for the results in the paper to apply, hence prevents the use of truly
summarising statistics that would not grow linearly in dimension with the size of the
HMM series.

2.3 Checking ABC Convergence

The authors of [47] evaluate several approaches to the validation of ABC through coverage
diagnostics. Getting valid approximation diagnostics for ABC is of obvious importance,
while being under-represented in the literature. When simulation time remains
manageable, the DIYABC [16] software does implement a limited coverage assessment
by computing the type I error, i.e. through simulating pseudo-data under the
null model and evaluating the number of times it is rejected at the 5 % level (see
Sects. 2.11.3 and 3.8 in the DIYABC documentation).
The core idea advanced by [47] is that a Bayesian credible interval on the para-
meter θ at a given credible level α should have a similar confidence level (at least
asymptotically and even more for matching priors). Furthermore, they support the

notion that simulating pseudo-data (à la ABC) with a known parameter value θ allows
for a Monte Carlo evaluation of the credible interval genuine coverage, hence for
a calibration of the tolerance ε. The delicate issue is about the generation of these
“known” parameters. For instance, if the pair (θ, y) is generated from the joint dis-
tribution made of prior times likelihood, and if the credible region is also based on
the true posterior, the average coverage is the nominal one. On the other hand, if
the credible interval is based on a poor (ABC) approximation to the posterior, the
average coverage should differ from the nominal one. Given that ABC is only an
approximation, however, this approach may fail to return a powerful diagnostic. In
their implementation, the authors end up approximating the p-value P(θ0 < θ ) and
checking for uniformity.
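
A rough version of this coverage idea can be sketched as follows: pairs (θ0, y) are simulated from the prior and the model, an ABC approximation is computed for each pseudo-observation, and the resulting p-values P(θ0 < θ) are tested for uniformity. The toy model, the tolerance choice and the uniformity test below are illustrative assumptions, a caricature of the diagnostics actually studied in [47].

```python
# Minimal sketch of a coverage diagnostic: near-uniform p-values indicate a
# reasonably calibrated ABC approximation on this toy normal-mean model.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_obs, N, k = 20, 5_000, 100

def abc_posterior(x0, rng):
    theta = rng.normal(0.0, 3.0, size=N)                 # prior draws
    s = rng.normal(theta, 1.0 / np.sqrt(n_obs))          # summary (sample mean) simulated directly
    return theta[np.argsort(np.abs(s - x0.mean()))[:k]]  # quantile-type tolerance

pvals = []
for _ in range(200):
    theta0 = rng.normal(0.0, 3.0)               # "known" parameter drawn from the prior
    x0 = rng.normal(theta0, 1.0, size=n_obs)    # pseudo-observation
    pvals.append(np.mean(theta0 < abc_posterior(x0, rng)))
print(stats.kstest(pvals, "uniform"))           # check uniformity of the p-values
```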

3 Improvements, Implementations, and Applications

3.1 ABC for State-Space Models

As stressed in the survey written by [30] on the use of ABC methods in a rather
general class of time-series models, these methods allow us to handle settings where
the likelihood of the current observation conditional on the past observations and on a
latent (discrete-time) process cannot be computed. The author makes the preliminary
useful remark that, in most cases, the probabilistic structure of the model (e.g.,
a hidden Markov type of dependence) is lost within the ABC representation. An
exception he and others [14, 17, 21, 31, 33, 41, 42] exploit quite thoroughly is when
the difference between the observed data and the simulated pseudo-data is operated
time step by time step, as in

∏_{t=1}^{T} I(d(y_t, y_t^0) ≤ ε)

where y 0 = (y10 , . . . , yT0 ) is the actual observation. The ABC approximation indeed
retains the same likelihood structure and allows for derivations of consistency prop-
erties (in the number of observations) of the ABC estimates. In particular, using such
a distance in the algorithm allows for the approximation to converge to the genuine
posterior when the tolerance ε goes to zero [9]. This is the setting where [24] (see also
17) show that noisy ABC is well-calibrated, i.e. has asymptotically proper conver-
gence properties. Most of the results obtained by Jasra and co-authors are dedicated
to specific classes of models, from iid models [17, 24, 31] to “observation-driven
times-series” [31] to other forms of HMM (17, 21, 41) mostly for MLE consistency
results. The constraint mentioned above leads to computational difficulties as the
acceptance rate quickly decreases with n (unless the tolerance ε is increasing with n).
The authors of [31] then suggest raising the number of pseudo-observations to aver-
age indicators in the above product and to make it random in order to ensure a
fixed number of acceptances. Moving to ABC-SMC (for sequential Monte Carlo,

see [4] and Algorithm 5), [32] establish unbiasedness and convergence within this
framework, in connection with the alive particle filter [35].

Algorithm 5 ABC-SMC
Given an observation x 0 , 0 < α < 1, and a proposal distribution q0 (·)
Set ε0 = +∞ and i = 0
repeat
for t = 1 to N do
Generate θt from the proposal qi (·)
Generate x ∗ from the model f (·|θt )
Compute the distance dt = ρ(x ∗ , x 0 ) and the weight ωt = π(θt )/qi (θt )
end for
Set i = i + 1
Update εi as the weighted α-quantile of the dt ’s and qi based on the weighted θt ’s
until ε is stable
return N weighted values θt
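
A minimal Python rendering of Algorithm 5 on the toy normal-mean model is given below; the Gaussian proposal family, the variance inflation and the stopping rule chosen for "until ε is stable" are illustrative assumptions.

```python
# Minimal sketch of ABC-SMC (Algorithm 5): the tolerance is set to the
# weighted alpha-quantile of the distances and the proposal is refitted to
# the weighted particles at each stage.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_obs, N, alpha = 50, 2_000, 0.5
x_obs = rng.normal(1.0, 1.0, size=n_obs)
prior = stats.norm(0.0, 10.0)

def weighted_quantile(x, w, q):
    order = np.argsort(x)
    cdf = np.cumsum(w[order])
    return x[order][np.searchsorted(cdf, q * cdf[-1])]

q, eps = prior, np.inf
for i in range(20):
    theta = q.rvs(size=N, random_state=rng)
    s = np.array([rng.normal(t, 1.0, size=n_obs).mean() for t in theta])
    d = np.abs(s - x_obs.mean())
    w = prior.pdf(theta) / q.pdf(theta)           # omega_t = pi(theta_t)/q_i(theta_t)
    w = w / w.sum()
    new_eps = weighted_quantile(d, w, alpha)      # weighted alpha-quantile of distances
    keep = d <= new_eps
    mu = np.average(theta[keep], weights=w[keep])
    sd = np.sqrt(np.average((theta[keep] - mu)**2, weights=w[keep]))
    q = stats.norm(mu, 2.0 * sd)                  # refitted, inflated Gaussian proposal
    converged = np.isfinite(eps) and abs(eps - new_eps) < 0.05 * eps
    eps = new_eps
    if converged:                                 # illustrative reading of "eps is stable"
        break
print("final tolerance:", eps, " weighted posterior mean:", mu)
```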

3.2 ABC with Empirical Likelihood

In [43], an ABC algorithm based on an empirical likelihood (EL) approximation is


introduced. The foundations of empirical likelihood are provided in the comprehen-
sive book of [45]. The core idea of empirical likelihood is to use a maximum entropy
discrete distribution supported by the data and constrained by estimating equations
related with the parameters of interest or of the whole model. Given a dataset x
comprising n independent replicates x = (x1 , . . . , xn ) of a random variable X ∼ F,
and a collection of generalised moment conditions that identify the parameter (of
interest) θ
E F [h(X, θ )] = 0

where h is a known function, the induced empirical likelihood [44] is defined as


L_el(θ|x) = max_p ∏_{i=1}^{n} p_i

where the maximum is taken over all p on the simplex of R^n such that

∑_i p_i h(x_i, θ) = 0

As such, empirical likelihood is a non-parametric approach in the sense that the dis-
tribution of the data does not need to be specified, only some of its characteristics.
Econometricians have developed this kind of approach over the years, see e.g. [26].
However, this empirical likelihood technique can also be seen as a convergent

approximation to the likelihood and hence able to be exploited for cases when the
exact likelihood cannot be derived. For instance, [43] propose using it as a substitute
to the exact likelihood in Bayes’ formula, as sketched in Algorithm 6.

Algorithm 6 ABC (with empirical likelihood)


Given an observation x 0
for i = 1 → N do
Generate θi from the prior π(·)
Set the weight ωi = L el (θi |x 0 )
end for
return (θi , ωi ), i = 1, . . . , N
Use weighted sample as in importance sampling
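
To make the ABCel idea concrete, the sketch below computes the empirical-likelihood weight of Algorithm 6 for the simplest estimating equation, E[X − θ] = 0 (θ being the mean of an iid sample), by solving the one-dimensional dual problem; the model, the prior and this particular constraint are illustrative assumptions.

```python
# Minimal sketch of ABC with empirical likelihood (Algorithm 6) for the mean
# of an iid sample, i.e. the estimating equation E[X - theta] = 0.
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(4)
x0 = rng.normal(1.0, 1.0, size=100)              # observed iid sample

def log_el(theta, x):
    """Log empirical likelihood ratio for the mean constraint (dual problem)."""
    z = x - theta
    if z.min() >= 0 or z.max() <= 0:             # theta outside the convex hull of the data
        return -np.inf
    lo = -(1.0 - 1e-8) / z.max()                 # admissible range for the multiplier
    hi = -(1.0 - 1e-8) / z.min()
    lam = brentq(lambda l: np.sum(z / (1.0 + l * z)), lo, hi)
    return -np.sum(np.log1p(lam * z))            # sum_i log(n p_i)

N = 5_000
theta = rng.normal(0.0, 10.0, size=N)            # draws from the prior
logw = np.array([log_el(t, x0) for t in theta])  # log of omega_i = L_el(theta_i | x0)
w = np.exp(logw - logw.max())
print("EL-weighted posterior mean:", np.average(theta, weights=w))
```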

Furthermore, [43] examine the consequences of using an empirical likelihood


in ABC contexts through a collection of examples. Note that the (ABCel) method
differs from genuine ABC algorithms in that it does not simulate pseudo-data. (Sim-
ulated data versions produced poor performances.) The principal difficulty with this
method is in connecting the parameter θ of the distribution with some moments of the
(iid) data. While this link operates rather straightforwardly for quantile distributions
[1], since theoretical quantiles are available in closed form, implementing empiri-
cal likelihood is less clear for times-series models like ARCH and GARCH [13].
Those models actually relate to hidden Markov structures, meaning that the under-
lying iid generating process is latent and has to be recovered by simulation. Inde-
pendence is indeed paramount when defining the empirical likelihood. Through a
range of simulation models and experiments, [43] demonstrates that ABCel clearly
improves upon ABC for the GARCH(1, 1) model but also that it remains less infor-
mative than a regular MCMC analysis. The difficulty in implementing the principle
is steeper for population genetic models, where parameters like divergence dates,
effective population sizes, mutation rates, cannot be expressed in terms of moments
of the distribution of the sample at a given locus. In particular, the data-points are not
iid. To bypass this difficulty, [43] resort instead to a composite likelihood formulation
[57], approximating for instance a likelihood by a product of pairwise likelihoods
over all pairs of genes. In Kingman’s coalescent theory [58], the pairwise likelihoods
can indeed be expressed in closed form. However, instead of using this composite
likelihood per se, since it constitutes a rather poor substitute to the genuine likeli-
hood, [43] rely on the associated pairwise composite score functions ∇ log L(θ ) to
build their generalised moment conditions as E[∇ log L(θ )] = 0. The comparison
with optimal standard ABC outcomes shows an improvement brought by ABCel in
the approximation, at an overall computing cost that is negligible against the cost of
ABC (in the sense that it takes minutes to produce the ABCel outcome, compared
with hours for ABC.)
The potential for use of the empirical likelihood approximation is much less
widespread than the possibility of simulating pseudo-data in regular ABC, since
EL essentially relies on an iid sample structure, plus the availability of parameter

defining moments. While the composite likelihood alternative provided an answer


in the important case of population genetic models, there are in fact many instances
where one simply cannot come up with a regular EL approximation. However, the
range of applications of straight EL remains wide enough to be of interest, as it
includes most dynamical models like hidden Markov models. In cases when it is
available, ABCel provides an almost free benchmark against which regular ABC
can be tested.

4 Summary Statistics, the ABC Conundrum

The main focus in the recent ABC literature has been on the selection and evaluation
of summary statistics, including a Royal Statistical Society Read Paper [24] that set a
reference and gave prospective developments in the discussion section. Transforming
the data into a statistic of small dimension but nonetheless sufficiently informative
constitutes a fundamental difficulty with ABC. Indeed, it is most often the case that
there is no non-trivial sufficient statistic and that summary statistics are not already
provided by the software (like DIYABC, [16]) or predetermined by practitioners
from the field. This choice has to balance a loss of statistical information against a
gain in ABC precision, with little available on the amounts of error and information
loss involved in the ABC substitution.

4.1 The Read Paper

In what is now a reference paper, [24] proposed an original approach to ABC, where
ABC is considered from a purely inferential viewpoint and calibrated for estimation
purposes. Quite logically, Fearnhead and Prangle (2012) do not follow the more
traditional perspective of representing ABC as a converging approximation to the
true posterior density. Like [71], they take instead a randomised or noisy version of
the observed summary statistic and then derive a calibrated version of ABC, i.e. an
algorithm that gives proper predictions, the drawback being that it is for the posterior
given this randomised version of the summary statistics. The paper also contains an
important result in the form of a consistency theorem which shows that noisy ABC is
a convergent estimation method when the number of observations or datasets grows
to infinity. The most interesting aspect in this switch of perspective is that the kernel
h used in the acceptance probability

h((s − sobs )/ h)

does not have to act as an estimate of the true sampling density, since it appears in
the (randomised) pseudo-model. (Everything collapses to the true model when the
bandwidth h goes to zero.) The Monte Carlo error is taken into account through the

average acceptance probability, which converges to zero with h, demonstrating it is


a suboptimal choice.
A form of tautology stems from the comparison of ABC posteriors via a loss
function
(θ0 − θ̂ )T A(θ0 − θ̂ )

that ends up with the “best” asymptotic summary statistic being the Bayes estimate
itself
E[θ |yobs ].

This result indeed follows from the very choice of the loss function rather than from an
intrinsic criterion. Using the posterior expectation as the summary statistic still makes
sense, especially when the calibration constraint implies that the ABC approxima-
tion has the same posterior mean as the true (randomised) posterior. Unfortunately
this result is parameterisation dependent and unlikely to be available in settings
where ABC is necessary. In the semi-automatic implementation proposed by [24],
the authors suggest using a pilot run of ABC to approximate the above statistics.
The simplification in the paper follows from a linear regression on the parameters,
thus linking the approach with [6]. The paper also accounts for computing costs and
stresses the relevance of the indirect inference literature [20, 27].
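
A stripped-down version of this semi-automatic construction can be sketched as follows: a pilot set of (θ, data) simulations is regressed linearly on a vector of candidate summaries, and the fitted value, a rough estimate of E[θ|y], is then used as a one-dimensional summary in a second ABC run. The candidate summaries and the toy model below are assumptions made for the sketch.

```python
# Minimal sketch of the semi-automatic construction: a pilot regression of
# theta on candidate summaries yields an estimate of E[theta | y] that is
# then used as the summary statistic in a second ABC run.
import numpy as np

rng = np.random.default_rng(5)
n_obs = 50
x_obs = rng.normal(1.0, 1.0, size=n_obs)

def candidates(x):                  # redundant candidate summaries (assumption)
    return np.array([x.mean(), np.median(x), x.std(), (x**2).mean()])

M = 20_000
theta_pilot = rng.normal(0.0, 5.0, size=M)                       # pilot run
Z = np.array([candidates(rng.normal(t, 1.0, size=n_obs)) for t in theta_pilot])
X = np.column_stack([np.ones(M), Z])
beta, *_ = np.linalg.lstsq(X, theta_pilot, rcond=None)           # linear regression

def S(x):                            # fitted E[theta | y] used as the summary
    return beta[0] + candidates(x) @ beta[1:]

theta = rng.normal(0.0, 5.0, size=M)                             # second-stage ABC
d = np.abs(np.array([S(rng.normal(t, 1.0, size=n_obs)) for t in theta]) - S(x_obs))
accepted = theta[np.argsort(d)[:500]]
print("semi-automatic ABC posterior mean:", accepted.mean())
```
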
As exposed in my discussion [52], I remain skeptical about the “optimality”
resulting from the choice of summary statistics in the paper, partly because practice
shows that proper approximation to genuine posterior distributions stems from using
a (much) larger number of summary statistics than the dimension of the parameter
(albeit un-achievable at a given computing cost), partly because the validity of the
approximation to the optimal summary statistics depends on the quality of the pilot
run, and partly because there are some imprecisions in the mathematical derivation
of the results [52]. Furthermore, important inferential issues like model choice are
not covered by this approach. But, nonetheless, the paper provides a way to construct
default summary statistics that should come as a supplement to summary statistics
provided by the experts, or even as a substitute.

4.2 A Review of Dimension Reduction Techniques

In [12], the authors offer a detailed review of dimension reduction methods in ABC,
along with a comparison of three specific models. Given that, as put above, the
choice of the vector of summary statistics is presumably the most important single
step in an ABC algorithm and keeping in mind that selecting too large a vector is
bound to fall victim to the curse of dimensionality, this constitutes a reference for the
ABC literature. Therein, the authors compare regression adjustments à la [6], subset
selection methods, as in [34], and projection techniques, as in [24]. They add to this
impressive battery of methods the potential use of AIC and BIC.

The paper also suggests a further regularisation of [6] by ridge regression, although
L 1 penalty à la Lasso would be more appropriate in my opinion for removing extra-
neous summary statistics. Unsurprisingly, ridge regression does better than plain
regression in the comparison experiment when there are many almost collinear sum-
mary statistics, but an alternative conclusion could be that regression analysis is not
that appropriate with many summary statistics. Indeed, summary statistics are not
quantities of interest but data summarising tools towards a better approximation of
the posterior at a given computational cost.
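
For reference, the sketch below shows the kind of ridge-regularised local-linear regression adjustment discussed above: accepted parameters are corrected by a regression of θ on the accepted summaries, θ*_i = θ_i − (s_i − s_obs)'β̂. The toy model, the near-collinear summaries and the ridge penalty are illustrative assumptions.

```python
# Minimal sketch of a ridge-regularised regression adjustment of the
# accepted ABC draws (in the spirit of the methods reviewed in [12]).
import numpy as np

rng = np.random.default_rng(6)
n_obs = 50
x_obs = rng.normal(1.0, 1.0, size=n_obs)

def S(x):                                      # deliberately near-collinear summaries
    return np.array([x.mean(), np.median(x), np.quantile(x, 0.4)])

N, k, ridge = 50_000, 1_000, 1e-3
theta = rng.normal(0.0, 5.0, size=N)
s = np.array([S(rng.normal(t, 1.0, size=n_obs)) for t in theta])
d = np.linalg.norm(s - S(x_obs), axis=1)
idx = np.argsort(d)[:k]                        # plain rejection step

D = s[idx] - S(x_obs)                          # summaries centred at s_obs
Dc = D - D.mean(axis=0)
tc = theta[idx] - theta[idx].mean()
beta = np.linalg.solve(Dc.T @ Dc + ridge * np.eye(D.shape[1]), Dc.T @ tc)
theta_adj = theta[idx] - D @ beta              # theta*_i = theta_i - (s_i - s_obs)' beta
print("adjusted ABC posterior mean:", theta_adj.mean())
```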

4.3 ABC with Score Functions

In connection with [43] and their application in population genetics, [57] advocate
the use of composite score functions for ABC. While the paper provides a survey of
composite likelihood methods, the core idea of the paper is to use the score function
(of the composite likelihood) as the summary statistic,

∂c(θ; y)/∂θ,

evaluated at the maximum composite likelihood estimate for the observed data point.
In the specific (but unrealistic) case of an exponential family, an ABC based on the
score is asymptotically (i.e., as the tolerance ε goes to zero) exact. Working with a
composite likelihood thus leads to a natural summary statistic. As with the empirical
likelihood approach, the composite likelihoods that are available for computation are
usually restricted in number, thus leading to an almost automated choice of a summary
statistic.
An interesting (common) feature in most examples found in this paper is that
comparisons are made between ABC using the (truly) sufficient statistic and ABC
based on the pairwise score function, which essentially relies on the very same
statistics. So the difference, when there is a difference, pertains to the choice of
a different combination of the summary statistics or, somewhat equivalently, to the
choice of a different distance function. One of the examples starts from the MA(2)
toy-example of [39]. The composite likelihood is then based on the consecutive triplet
marginal densities.
In a related vein, [40] offer a new perspective on ABC based on pseudo-scores.
For one thing, it concentrates on the selection of summary statistics from a more
econometrics than usual point of view, defining asymptotic sufficiency in this con-
text and demonstrating that both asymptotic sufficiency and Bayes consistency can
be achieved when using maximum likelihood estimators of the parameters of an
auxiliary model as summary statistics. In addition, the proximity to (asymptotic)
sufficiency yielded by the MLE is replicated by the score vector. Using the score
instead of the MLE as a summary statistic allows for huge gains in terms of speed.
The method is then applied to a continuous time state space model, using as auxiliary

model an augmented unscented Kalman filter. The various state space models tested
therein demonstrate that the ABC approach based on the marginal [likelihood] score
performs quite well, including against the approach of [24]. It strongly supports the idea
of using such a generic object as the unscented Kalman filter for state space models,
even when it is not a particularly accurate representation of the true model. Another
appealing feature is found in the connections made with indirect inference.
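
The following sketch illustrates the generic score-as-summary idea in a deliberately simple form, using a Gaussian auxiliary model rather than the composite likelihoods or the unscented Kalman filter considered in [40, 57]: the auxiliary MLE is computed once from the observed data, and the auxiliary score at that value, which vanishes at the observed data, serves as summary statistic for the simulated datasets. All modelling choices below are assumptions made for the sketch.

```python
# Minimal sketch of score-based ABC with a Gaussian auxiliary model: the
# auxiliary score, evaluated at the auxiliary MLE fitted to the observed
# data, is the summary statistic (it is zero at the observed data).
import numpy as np

rng = np.random.default_rng(7)
n_obs = 200
x_obs = rng.gamma(2.0, 1.0, size=n_obs) + 0.5        # data with location theta = 0.5

mu_hat, var_hat = x_obs.mean(), x_obs.var()          # auxiliary Gaussian MLE

def aux_score(x):
    """Per-observation score of N(mu_hat, var_hat), evaluated at the fixed MLE."""
    d_mu = np.sum(x - mu_hat) / var_hat
    d_var = -len(x) / (2 * var_hat) + np.sum((x - mu_hat)**2) / (2 * var_hat**2)
    return np.array([d_mu, d_var]) / len(x)

N, k = 20_000, 500
theta = rng.uniform(-2.0, 3.0, size=N)               # prior on the location shift
d = np.array([np.linalg.norm(aux_score(rng.gamma(2.0, 1.0, size=n_obs) + t))
              for t in theta])
accepted = theta[np.argsort(d)[:k]]
print("score-based ABC posterior mean:", accepted.mean())
```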

5 ABC Model Choice

While ABC is a substitute for a proper—possibly MCMC-based—Bayesian infer-


ence, and thus pertains to all aspects of Bayesian inference, including testing and
model checking, the special issue of comparing models via ABC is highly delicate
and has attracted most of the criticisms addressed against ABC [63, 64]. The imple-
mentation of ABC model choice follows by treating the model index m as an extra
parameter with an associated prior, as detailed in the following algorithm:

Algorithm 7 ABC (model choice)


Given an observation x 0
for i = 1 to N do
repeat
Generate m from the prior π(M = m)
Generate θm from the prior πm (θm )
Generate x from the model f m (x|θm )
until ρ{S(x), S(x 0 )} ≤ ε
Set m(i) = m and θ (i) = θm
end for
return the values m(i) associated with the k smallest distances
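
A Python sketch of Algorithm 7, in its k-smallest-distances variant, is given below for a toy comparison of a Poisson model against a geometric model for count data, with the sample sum as summary statistic; the priors and tuning values are illustrative assumptions. The raw frequency it returns is exactly the quantity that the regression regularisation of [23], mentioned next, post-processes.

```python
# Minimal sketch of ABC model choice between Poisson (m = 1) and geometric
# (m = 2) count models, with the sample sum as summary statistic.
import numpy as np

rng = np.random.default_rng(8)
n_obs = 30
x_obs = rng.poisson(3.0, size=n_obs)          # data actually generated from model 1

def simulate(m, rng):
    if m == 1:                                # Poisson(lambda), lambda ~ Exp(1)
        return rng.poisson(rng.exponential(1.0), size=n_obs)
    return rng.geometric(rng.uniform(), size=n_obs) - 1   # geometric(p), p ~ U(0,1)

N, k = 100_000, 1_000
models = rng.integers(1, 3, size=N)           # uniform prior on the model index
d = np.array([abs(simulate(m, rng).sum() - x_obs.sum()) for m in models])
closest = models[np.argsort(d)[:k]]           # k smallest distances
print("ABC estimate of P(M = 1 | data):", np.mean(closest == 1))
```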

Improvements upon returning raw model index frequencies as ABC estimates


have been proposed in [23], via a regression regularisation. In this approach, indices
are processed as categorical variables in a formal multinomial regression, using for
instance logistic regression. Rejection-based approaches as in Algorithm 7 were
introduced in [16, 28, 65], in a Monte Carlo perspective simulating model indices as
well as model parameters. Those versions are widely used by the population genetics
community, as exemplified by [7, 15, 22, 25, 29, 37, 46, 49, 67, 68]. As described
in the following sections, this adoption may be premature or over-optimistic, since
caution and cross-checking are necessary to completely validate the output.

5.1 ABC Model Criticism


The approach in [51] is very original in its view of ABC model criticism and thus
indirectly ABC model choice. It is about the use of the ABC approximation error ε

in an altogether different way, namely as a tool for assessing the goodness of fit of a
given model. The fundamental idea is to process ε as an additional parameter of the
model, simulating from a joint posterior distribution

f (θ, ε|x0 ) ∝ ξ(ε|x0 , θ ) × πθ (θ ) × πε (ε)

where x0 is the data and ξ(ε|x0 , θ ) plays the role of the likelihood. (The π ’s are
obviously the priors on θ and ε.) In fact, ξ(ε|x0 , θ ) is the prior predictive density of
ρ(S(x), S(x0 )) given θ and x0 when x is distributed from f (x|θ ). The authors then
derive an ABC algorithm they call ABCμ to simulate an MCMC chain targeting this
joint distribution, replacing ξ(ε|x0 , θ ) with a non-parametric kernel approximation.
For each model under comparison, the marginal posterior distribution on the error
ε is then used to assess the fit of the model, the logic of it being that this posterior
should include 0 in a reasonable credible interval. (Contrary to other ABC papers, ε
can be negative and multidimensional in this paper.)
Given the wealth of innovations contained in the paper, let me add here that,
while the authors stress they use the data once (a point always uncertain to me), they
also define the above target by using simultaneously a prior distribution on ε and a
conditional distribution on the same ε, which they interpret as the likelihood in (ε, θ).
The product being most often defined as a density in (ε, θ ), it can be simulated from,
but this is hardly a regular Bayesian problem, especially because it seems the prior
on ε significantly contributes to the final assessment. Further and more developed
criticisms are published as [55], along with a reply by the authors [50]. Let me stress
one more time how original this paper is and deplore a lack of follow-up in the
subsequent literature for a practical method that should be implemented on existing
ABC software.

5.2 A Clear Lack of Confidence

The analysis in [54] leads to the conclusion that ABC approximations to posterior
probabilities cannot be blindly and uniformly trusted. Posterior probabilities are there
approximated as in Algorithm 7, i.e. by the frequencies of acceptances of simulations
from the respective models (assuming the use of a common summary statistic to
define the distance to the observations). Rather obviously, the limiting behaviour of
the procedure is ruled by a true Bayes factor, except that it is the Bayes factor based
on the distributions of the summary statistics under both models.
While this does not sound like a particularly fundamental remark, given that all
ABC approximations rely on posterior distributions based on these statistics, rather
than on the whole dataset, and while this approximation only has consequences in
terms of inferential precision for most inferential purposes, it induces a dramatic arbi-
trariness in the Bayes factor. To illustrate this arbitrariness, consider the case of using
a sufficient statistic S(x) for both models. Then, by the factorisation theorem [36],
the true likelihoods factorise as

ℓ_1(θ_1|x) = g_1(x) p_1(θ_1|S(x))   and   ℓ_2(θ_2|x) = g_2(x) p_2(θ_2|S(x))

resulting in a true Bayes factor equal to

B_12(x) = (g_1(x)/g_2(x)) B_12^S(x)

where the last term is the limiting ABC Bayes factor. Therefore, in the favourable
case of the existence of a sufficient statistic, using only the sufficient statistic induces
a difference in the result that fails to converge with the number of observations or
simulations. Quite the opposite, it may diverge one way or another as the number
of observations increases. Again, this is in the favourable case of sufficiency. In the
realistic setting of insufficient statistics, things deteriorate even further. This practical
situation implies a wider loss of information compared with the exact inferential
approach, hence a wider discrepancy between the exact Bayes factor and the quantity
produced by an ABC approximation. The paper is thus intended as a warning to the
community about the dangers of this approximation, especially when considering the
rapidly increasing number of applications using ABC for conducting model choice
and hypothesis testing.
This paper stresses a fundamental and even foundational distinction between ABC
point (and confidence) estimation, and ABC model choice, namely that the problem
stands at another level for Bayesian model choice (using posterior probabilities).
When doing point estimation with insufficient statistics, the information content is
poorer, but unless one uses very degraded (i.e., ancillary) summary statistics, infer-
ence is converging. The posterior distribution stays different from the true posterior
in this case but, at least, increasing the number of observations brings more information
about the parameter (and convergence when this number goes to infinity). For model
choice, this is not guaranteed if we use summary statistics that are not inter-model
sufficient, as shown by the Poisson and normal examples in [54]. Furthermore, except
for very specific cases such as Gibbs random fields [28], it is almost always impos-
sible to derive inter-model sufficient statistics, beyond the raw sample. The paper
includes a realistic and computationally costly population genetic illustration, where
it exhibits a clear divergence in the numerical values of both approximations of the
posterior probabilities. The error rates in using the ABC approximation to choose
between two scenarios, labelled 1 and 2, are 14.5 and 12.5 % (under scenarios 1 and 2),
respectively.
A quite related if less pessimistic paper is [18], also concerned with the limiting
behaviour for the ratio,
B_12(x) = (g_1(x)/g_2(x)) B_12^S(x).

Indeed, the authors reach the opposite conclusion from ours, namely that the problem
can be solved by a sufficiency argument. Their point is that, when comparing models
within exponential families (which is the natural realm for sufficient statistics), it
is always possible to build an encompassing model with a sufficient statistic that

remains sufficient across models. This construction is correct from a mathematical


perspective, as seen for instance in the Poisson versus geometric example we first
mentioned in [28]: adding
∏_{i=1}^{n} x_i!

to the sum of the observables into a large sufficient statistic produces a ratio g1 /g2 that
is equal to 1, hence avoids any discrepancy. However, this encompassing property
only applies for exponential families. Looking at what happens in the limiting case
when one is relying on a common sufficient statistic is a formal study that sheds no
light on the (potentially huge) discrepancy between the ABC-based Bayes factor and
the true Bayes factor in the typical case.

5.3 Validating Summaries for ABC Model Choice

The subsequent paper [38] deals with the contrasted performances and the resulting eval-
uation of summary statistics for Bayesian model choice (and not solely in ABC
settings). The central result in this paper is that the summary statistic should enjoy a
different range of means (as a vector) under different models for the corresponding
Bayes factor to be consistent (as the number of observations goes to zero). Oth-
erwise, the model with the least parameters will be asymptotically selected. Even
though the idea of separating the mean behaviour of the summary statistic under
both models is intuitive, establishing a complete theoretical framework that validates
this intuition requires assumptions borrowed from the asymptotic Bayesian
literature [66]. The main theorem in [38] states that, under such assumptions, when
the “true” mean E[S(Y )] of the summary statistic can be recovered for both models
under comparison, then the Bayes factor is of order
O(n^{−(d_1 − d_2)/2}),

where di is the intrinsic dimension of the parameters driving the summary statistic
in model i = 1, 2, irrespective of which model is true. (Precisely, the dimensions
di are the dimensions of the asymptotic mean of the summary statistic under both
models.) Therefore, the Bayes factor always asymptotically selects the model having
the smallest effective dimension and cannot be consistent. If, instead, the “true”
mean E[S(Y )] cannot be represented in the wrong model, then the Bayes factor is
consistent. This implies that the best statistics to be used in ABC model choice
are ancillary statistics with different mean values under both models. Otherwise, the
summary statistic must have enough components to prohibit a parameter value under
the wrong model meeting the true mean of the summary statistic.
The paper remains quite theoretical, with the mathematical assumptions required
to obtain the convergence theorems being rather overwhelming and difficult to check

in practical cases. Nonetheless, this paper comes as a third if not last step in a series
of papers on the issue of ABC model choice. Indeed, we first identified a sufficiency
property [28], then realised that this property was a quite rare occurrence, and we
finally made the theoretical advance in [38]. This last step characterises when a
statistic is good enough to conduct model choice, with a clear answer that the ranges
of the mean of the summary statistic under each model should not intersect. From a
methodological point of view, only the conclusion should be taken into account, as
it is then straightforward to come up with quick simulation devices to check whether
a summary statistic behaves differently under both models, taking advantage of the
reference table already available (instead of having to run Monte Carlo experiments
with ABC steps). The paper [38] includes a χ² check about the relevance of a given
summary statistic.
In [59], the authors consider summary statistics for ABC model choice in hidden
Gibbs random fields. The move to a hidden Markov random field means that the
original approach of [28] does not apply: there is no dimension-reduction sufficient
statistics in that case. The authors introduce a small collection of (four!) focused
statistics to discriminate between Potts models. They further define a novel misclas-
sification rate, conditional on the observed value and derived from the ABC reference
table. It is the predictive error rate

P_ABC(m̂(Y) ≠ m | S(y^obs))

integrating out both the model index m and the corresponding random variable Y (and
the hidden intermediary parameter) given the observations–more precisely given the
transform of the observations by the summary statistic S. In a simulation experiment,
this paper shows that the predictive error rate significantly decreases by including
2 or 4 geometric summary statistics on top of the no-longer-sufficient concordance
statistics.
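
The following sketch gives a rough, self-contained version of such a conditional error rate computed from an ABC reference table: pseudo-datasets whose summaries fall near the observed one are reclassified by a nearest-neighbour vote among the remaining simulations, and the frequency of disagreement with their true model index estimates P_ABC(m̂(Y) ≠ m | S(y^obs)). The toy Poisson/geometric setting and the classification rule are assumptions, not the models or statistics of [59].

```python
# Minimal sketch of a conditional misclassification rate estimated from an
# ABC reference table (toy Poisson vs geometric setting, sample sum summary).
import numpy as np

rng = np.random.default_rng(9)
n_obs = 30
x_obs = rng.poisson(3.0, size=n_obs)

def simulate(m, rng):
    if m == 1:
        return rng.poisson(rng.exponential(1.0), size=n_obs)
    return rng.geometric(rng.uniform(), size=n_obs) - 1

table_m = rng.integers(1, 3, size=50_000)                   # reference table
table_s = np.array([float(simulate(m, rng).sum()) for m in table_m])
s_obs = float(x_obs.sum())

k_local, k_vote = 500, 50
local = np.argsort(np.abs(table_s - s_obs))[:k_local]       # simulations near s_obs

errors = 0
for i in local:
    others = np.delete(np.arange(len(table_s)), i)
    nn = others[np.argsort(np.abs(table_s[others] - table_s[i]))[:k_vote]]
    m_hat = 1 if np.mean(table_m[nn] == 1) > 0.5 else 2     # nearest-neighbour vote
    errors += int(m_hat != table_m[i])
print("estimated conditional misclassification rate:", errors / k_local)
```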

6 Conclusion

This survey reflects upon the diversity and the many directions of progress in this
field of ABC research. The overall message is that the on-going research has led both
to consider ABC as part of the statistical tool-kit and to envision different approaches
to statistical modelling, where a complete representation of the whole world is no
longer feasible. Over the evolution of ABC in the past fifteen years we have thus
moved from approximate methods to approximate models, which is a positive move
in my opinion.

Acknowledgments The author is most grateful to an anonymous referee for her or his help with
the syntax and grammar of this survey. He also thanks the organisers of MCqMC 2014 in Leuven
for their kind invitation.

References

1. Allingham, D., King, R., Mengersen, K.: Bayesian estimation of quantile distributions. Stat.
Comput. 19, 189–201 (2009)
2. Andrieu, C., Roberts, G.: The pseudo-marginal approach for efficient Monte Carlo computa-
tions. Ann. Stat. 37(2), 697–725 (2009)
3. Beaumont, M.: Approximate Bayesian computation in evolution and ecology. Annu. Rev. Ecol.
Evol. Syst. 41, 379–406 (2010)
4. Beaumont, M., Cornuet, J.-M., Marin, J.-M., Robert, C.: Adaptive approximate Bayesian com-
putation. Biometrika 96(4), 983–990 (2009)
5. Beaumont, M., Nielsen, R., Robert, C., Hey, J., Gaggiotti, O., Knowles, L., Estoup, A., Mahesh,
P., Coranders, J., Hickerson, M., Sisson, S., Fagundes, N., Chikhi, L., Beerli, P., Vitalis, R.,
Cornuet, J.-M., Huelsenbeck, J., Foll, M., Yang, Z., Rousset, F., Balding, D., Excoffier, L.: In
defense of model-based inference in phylogeography. Mol. Ecol. 19(3), 436–446 (2010)
6. Beaumont, M., Zhang, W., Balding, D.: Approximate Bayesian computation in population
genetics. Genetics 162, 2025–2035 (2002)
7. Belle, E., Benazzo, A., Ghirotto, S., Colonna, V., Barbujani, G.: Comparing models on the
genealogical relationships among Neandertal, Cro-Magnoid and modern Europeans by serial
coalescent simulations. Heredity 102(3), 218–225 (2008)
8. Berger, J., Fienberg, S., Raftery, A., Robert, C.: Incoherent phylogeographic inference. Proc.
Natl. Acad. Sci. 107(41), E57 (2010)
9. Biau, G., Cérou, F., Guyader, A.: New insights into approximate Bayesian computation.
Annales de l’IHP (Probab. Stat.) 51, 376–403 (2015)
10. Blum, M.: Approximate Bayesian computation: a non-parametric perspective. J. Am. Stat.
Assoc. 105(491), 1178–1187 (2010)
11. Blum, M., François, O.: Non-linear regression models for approximate Bayesian computation.
Stat. Comput. 20, 63–73 (2010)
12. Blum, M.G.B., Nunes, M.A., Prangle, D., Sisson, S.A.: A comparative review of dimension
reduction methods in approximate Bayesian computation. Stat. Sci. 28(2), 189–208 (2013)
13. Bollerslev, T., Chou, R., Kroner, K.: ARCH modeling in finance. A review of the theory and
empirical evidence. J. Econom. 52, 5–59 (1992)
14. Calvet, C., Czellar, V.: Accurate methods for approximate Bayesian computation filtering. J.
Econom. (2014, to appear)
15. Cornuet, J.-M., Ravigné, V., Estoup, A.: Inference on population history and model check-
ing using DNA sequence and microsatellite data with the software DIYABC (v1.0). BMC
Bioinform. 11, 401 (2010)
16. Cornuet, J.-M., Santos, F., Beaumont, M., Robert, C., Marin, J.-M., Balding, D., Guillemaud, T.,
Estoup, A.: Inferring population history with DIYABC: a user-friendly approach to approximate
Bayesian computation. Bioinformatics 24(23), 2713–2719 (2008)
17. Dean, T., Singh, S., Jasra, A., Peters, G.: Parameter inference for hidden Markov models with
intractable likelihoods. Scand. J. Stat. (2014, to appear)
18. Didelot, X., Everitt, R., Johansen, A., Lawson, D.: Likelihood-free estimation of model evi-
dence. Bayesian Anal. 6, 48–76 (2011)
19. Diggle, P., Gratton, R.: Monte Carlo methods of inference for implicit statistical models. J. R.
Stat. Soc. Ser. B 46, 193–227 (1984)
20. Drovandi, C., Pettitt, A., Fddy, M.: Approximate Bayesian computation using indirect infer-
ence. J. R. Stat. Soc. Ser. A 60(3), 503–524 (2011)
21. Ehrlich, E., Jasra, A., Kantas, N.: Gradient free parameter estimation for hidden markov models
with intractable likelihoods. Method. Comp. Appl. Probab. (2014, to appear)
22. Excoffier, L., Leuenberger, C., Wegmann, D.: Bayesian computation and model selection in
population genetics (2009)
23. Fagundes, N., Ray, N., Beaumont, M., Neuenschwander, S., Salzano, F., Bonatto, S., Excoffier,
L.: Statistical evaluation of alternative models of human evolution. Proc. Natl. Acad. Sci.
104(45), 17614–17619 (2007)

24. Fearnhead, P., Prangle, D.: Constructing summary statistics for Approximate Bayesian com-
putation: semi-automatic approximate Bayesian computation. J. R. Stat. Soc.: Ser. B (Stat.
Method.) 74(3), 419–474 (2012). (With discussion)
25. Ghirotto, S., Mona, S., Benazzo, A., Paparazzo, F., Caramelli, D., Barbujani, G.: Inferring
genealogical processes from patterns of bronze-age and modern DNA variation in Sardinia.
Mol. Biol. Evol. 27(4), 875–886 (2010)
26. Gouriéroux, C., Monfort, A.: Simulation Based Econometric Methods. CORE Lecture Series.
CORE, Louvain (1995)
27. Gouriéroux, C., Monfort, A., Renault, E.: Indirect inference. J. Appl. Econom. 8, 85–118 (1993)
28. Grelaud, A., Marin, J.-M., Robert, C., Rodolphe, F., Tally, F.: Likelihood-free methods for
model choice in Gibbs random fields. Bayesian Anal. 3(2), 427–442 (2009)
29. Guillemaud, T., Beaumont, M., Ciosi, M., Cornuet, J.-M., Estoup, A.: Inferring introduction
routes of invasive species using approximate Bayesian computation on microsatellite data.
Heredity 104(1), 88–99 (2009)
30. Jasra, A.: Approximate Bayesian Computation for a Class of Time Series Models. e-prints
(2014)
31. Jasra, A., Kantas, N., Ehrlich, E.: Approximate inference for observation driven time series
models with intractable likelihoods. TOMACS (2014, to appear)
32. Jasra, A., Lee, A., Yau, C., Zhang, X.: The Alive Particle Filter. e-prints (2013)
33. Jasra, A., Singh, S., Martin, J., McCoy, E.: Filtering via approximate Bayesian computation.
Stat. Comp. 22, 1223–1237 (2012)
34. Joyce, P., Marjoram, P.: Approximately sufficient statistics and Bayesian computation. Stat.
Appl. Genet. Mol. Biol. 7(1), Article 26 (2008)
35. Le Gland, F., Oudjane, N.: A Sequential Particle Algorithm that Keeps the Particle System
Alive. Lecture Notes in Control and Information Sciences, vol. 337, pp. 351–389. Springer,
Berlin (2006)
36. Lehmann, E., Casella, G.: Theory of Point Estimation, revised edn. Springer, New York (1998)
37. Leuenberger, C., Wegmann, D.: Bayesian computation and model selection without likelihoods.
Genetics 184(1), 243–252 (2010)
38. Marin, J., Pillai, N., Robert, C., Rousseau, J.: Relevant statistics for Bayesian model choice. J.
R. Stat. Soc. Ser. B 76(5), 833–859 (2014)
39. Marin, J., Pudlo, P., Robert, C., Ryder, R.: Approximate Bayesian computational methods.
Stat. Comput. 21(2), 279–291 (2011)
40. Martin, G.M., McCabe, B.P.M., Maneesoonthorn, W., Robert, C.P. Approximate Bayesian
Computation in State Space Models. e-prints (2014)
41. Martin, J., Jasra, A., Singh, S., Whiteley, N., Del Moral, P., McCoy, E.: Approximate Bayesian
computation for smoothing. Stoch. Anal. Appl. 32(3), (2014)
42. McKinley, T., Ross, J., Deardon, R., Cook, A.: Simulation-based Bayesian inference for epi-
demic models. Comput. Stat. Data Anal. 71, 434–447 (2014)
43. Mengersen, K., Pudlo, P., Robert, C.: Bayesian computation via empirical likelihood. Proc.
Natl. Acad. Sci. 110(4), 1321–1326 (2013)
44. Owen, A.B.: Empirical likelihood ratio confidence intervals for a single functional. Biometrika
75, 237–249 (1988)
45. Owen, A.B.: Empirical Likelihood. Chapman & Hall, Boca Raton (2001)
46. Patin, E., Laval, G., Barreiro, L., Salas, A., Semino, O., Santachiara-Benerecetti, S., Kidd, K.,
Kidd, J., Van Der Veen, L., Hombert, J., et al.: Inferring the demographic history of African
farmers and pygmy hunter-gatherers using a multilocus resequencing data set. PLoS Genet.
5(4), e1000448 (2009)
47. Prangle, D., Blum, M.G.B., Popovic, G., Sisson, S.A.: Diagnostic tools of approximate
Bayesian computation using the coverage property. e-prints (2013)
48. Pritchard, J., Seielstad, M., Perez-Lezaun, A., Feldman, M.: Population growth of human Y
chromosomes: a study of Y chromosome microsatellites. Mol. Biol. Evol. 16, 1791–1798
(1999)

49. Ramakrishnan, U., Hadly, E.: Using phylochronology to reveal cryptic population histories:
review and synthesis of 29 ancient DNA studies. Mol. Ecol. 18(7), 1310–1330 (2009)
50. Ratmann, O., Andrieu, C., Wiuf, C., Richardson, S.: Reply to Robert et al.: Model criticism
informs model choice and model comparison. Proc. Natl. Acad. Sci. 107(3), E6–E7 (2010)
51. Ratmann, O., Andrieu, C., Wiujf, C., Richardson, S.: Model criticism based on likelihood-free
inference, with an application to protein network evolution. Proc. Natl. Acad. Sci. USA 106,
1–6 (2009)
52. Robert, C.: Discussion of “constructing summary statistics for Approximate Bayesian Com-
putation” by Fearnhead, P., Prangle, D. J. R. Stat. Soc. Ser. B 74(3), 447–448 (2012)
53. Robert, C., Casella, G.: Monte Carlo Statistical Methods, 2nd edn. Springer, New York (2004)
54. Robert, C., Cornuet, J.-M., Marin, J.-M., Pillai, N.: Lack of confidence in ABC model choice.
Proc. Natl. Acad. Sci. 108(37), 15112–15117 (2011)
55. Robert, C., Mengersen, K., Chen, C.: Model choice versus model criticism. Proc. Natl. Acad.
Sci. 107(3), E5 (2010)
56. Rubin, D.: Bayesianly justifiable and relevant frequency calculations for the applied statistician.
Ann. Stat. 12, 1151–1172 (1984)
57. Ruli, E., Sartori, N., Ventura, L.: Approximate Bayesian Computation with composite score
functions. e-prints (2013)
58. Stephens, M., Donnelly, P.: Inference in molecular population genetics. J. R. Stat. Soc.: Ser. B
(Stat. Method.) 62(4), 605–635 (2000)
59. Stoehr, J., Pudlo, P., Cucala, L.: Adaptive ABC model choice and geometric summary statistics
for hidden Gibbs random fields. Stat. Comput. pp. 1–13 (2014)
60. Sunnåker, M., Busetto, A., Numminen, E., Corander, J., Foll, M., Dessimoz, C.: Approximate
Bayesian computation. PLoS Comput. Biol. 9(1), e1002803 (2013)
61. Tanner, M., Wong, W.: The calculation of posterior distributions by data augmentation. J. Am.
Stat. Assoc. 82, 528–550 (1987)
62. Tavaré, S., Balding, D., Griffith, R., Donnelly, P.: Inferring coalescence times from DNA
sequence data. Genetics 145, 505–518 (1997)
63. Templeton, A.: Statistical hypothesis testing in intraspecific phylogeography: nested clade
phylogeographical analysis vs. approximate Bayesian computation. Mol. Ecol. 18(2), 319–
331 (2008)
64. Templeton, A.: Coherent and incoherent inference in phylogeography and human evolution.
Proc. Natl. Acad. Sci. 107(14), 6376–6381 (2010)
65. Toni, T., Welch, D., Strelkowa, N., Ipsen, A., Stumpf, M.: Approximate Bayesian computation
scheme for parameter inference and model selection in dynamical systems. J. R. Soc. Interface
6(31), 187–202 (2009)
66. van der Vaart, A.: Asymptotic Statistics. Cambridge University Press, Cambridge (1998)
67. Verdu, P., Austerlitz, F., Estoup, A., Vitalis, R., Georges, M., Théry, S., Froment, A., Le Bomin,
S., Gessain, A., Hombert, J.-M., Van der Veen, L., Quintana-Murci, L., Bahuchet, S., Heyer,
E.: Origins and genetic diversity of pygmy hunter-gatherers from Western Central Africa. Curr.
Biol. 19(4), 312–318 (2009)
68. Wegmann, D., Excoffier, L.: Bayesian inference of the demographic history of chimpanzees.
Mol. Biol. Evol. 27(6), 1425–1435 (2010)
69. Wikipedia (2014). Approximate Bayesian computation — Wikipedia, The Free Encyclopedia
70. Wilkinson, R.L: Approximate Bayesian computation (ABC) gives exact results under the
assumption of model error. Technical Report (2008)
71. Wilkinson, R.: Approximate Bayesian computation (ABC) gives exact results under the
assumption of model error. Stat. Appl. Genet. Mol. Biol. 12(2), 129–141 (2013)
72. Wilkinson, R.D.: Accelerating ABC methods using Gaussian processes. e-prints (2014)
Part II
Contributed Papers
Multilevel Monte Carlo Simulation
of Statistical Solutions to the Navier–Stokes
Equations

Andrea Barth, Christoph Schwab and Jonas Šukys

Abstract We propose Monte Carlo (MC), single level Monte Carlo (SLMC) and
multilevel Monte Carlo (MLMC) methods for the numerical approximation of sta-
tistical solutions to the viscous, incompressible Navier–Stokes equations (NSE) on
a bounded, connected domain D ⊂ Rd , d = 1, 2 with no-slip or periodic boundary
conditions on the boundary ∂D. The MC convergence rate of order 1/2 is shown
to hold independently of the Reynolds number with constant depending only on
the mean kinetic energy of the initial velocity ensemble. We discuss the effect of
space-time discretizations on the MC convergence. We propose a numerical MLMC
estimator, based on finite samples of numerical solutions with finite mean kinetic
energy in a suitable function space and give sufficient conditions for mean-square
convergence to a (generalized) moment of the statistical solution. We provide in
particular error bounds for MLMC approximations of statistical solutions to the vis-
cous Burgers equation in space dimension d = 1 and to the viscous, incompressible
Navier-Stokes equations in space dimension d = 2 which are uniform with respect
to the viscosity parameter. For a more detailed presentation and proofs we refer the
reader to Barth et al. (Multilevel Monte Carlo approximations of statistical solutions
of the Navier–Stokes equations, 2013, [6]).

Keywords Multilevel Monte Carlo method · Navier–Stokes equations · Statistical


solutions · Finite volume

A. Barth (B)
SimTech, University of Stuttgart, Pfaffenwaldring 5a, 70569 Stuttgart, Germany
e-mail: andrea.barth@mathematik.uni-stuttgart.de
C. Schwab
Seminar Für Angewandte Mathematik, ETH Zürich, Rämistrasse 101,
8092 Zurich, Switzerland
e-mail: schwab@math.ethz.ch
J. Šukys
Computational Science Laboratory, ETH Zürich, Clausiusstrasse 33,
8092 Zurich, Switzerland
e-mail: jonas.sukys@mavt.ethz.ch


1 Navier–Stokes Equations and Statistical Solutions

In the connected bounded domain D ⊂ Rd , for d = 1, 2, with boundary ∂D and in the


finite time interval J̄ := [0, T ], for T < ∞, we consider a viscous, incompressible
flow, subject to a prescribed divergence-free initial velocity field u0 : D → Rd and to
a body force f acting on the fluid particles in D. The NSE for viscous, incompressible
flow of a Newtonian fluid are given in terms of the velocity field u : J̄ × D → Rd , and
the pressure p : J̄ × D → R. The pressure takes the role of a Lagrange multiplier,
enforcing the divergence-free constraint. The NSE in J̄ × D, for d = 2, read (see,
e.g., [16]),

∂u/∂t − νΔu + (u · ∇)u + ∇p = f ,   ∇ · u = 0,       (1)
with the kinematic viscosity ν ≥ 0 and with a given initial velocity field u(0) = u0 .
In space dimension d = 1, i.e. for D = (0, 1), the NSE reduce to the (viscous for
ν > 0) Burgers’ equation. We focus here on Eq. (1) with periodic or no-slip boundary
conditions. We provide numerical examples for periodic boundary conditions, but
emphasize that the theory of statistical solutions extends also to other boundary con-
ditions (see [7]). Apart from not exhibiting viscous boundary layers, homogeneous
statistical solutions to the NSE with periodic boundary conditions appear in certain
physical models [7, Chaps. IV and V].
Statistical solutions aim at describing the evolution of ensembles of solutions
through their probability distribution. In space dimension d ≥ 2, for no-slip boundary
conditions we define the function space

Hnsp = {v ∈ L 2 (D)d : ∇ · v = 0 in H −1 (D), v · n|∂D = 0 in H −1/2 (∂D)},

where n is the unit outward-pointing normal vector from D. For D = (0, 1)2 and
periodic boundary conditions, we denote the corresponding space of functions with
vanishing average over D by Hper . We remark that Hper coincides with the space
Ḣ(L) in [7, Chap. V.1.2] of L-periodic functions with vanishing average, with period
L = 1. Whenever we discuss generic statements valid for either boundary conditions,
we write H ∈ {Hnsp , Hper }.
We assume given a probability measure on H, where H is equipped with the Borel-
σ -algebra B(H). Statistical solutions to the NSE as defined in [7, 8] are parametric
families of probability measures on H. Rather than being restricted to one single
initial condition, a (Foiaş–Prodi) statistical solution to the NSE is a one-parameter
family of probability measures which describes the evolution of statistics of initial
velocity ensembles. Individual solutions of Eq. (1) are special cases of statistical
solutions, for initial measure μ0 charging one initial velocity u0 ∈ H. In general, the
initial distribution μ0 is defined via an underlying probability space (Ω, F , P). The
distribution of initial velocities is assumed to be given as image measure under a
H-valued random variable with distribution μ0 . This random variable X is defined as
a mapping from the measurable space (Ω, F ) into the measurable space (H, B(H))
such that μ0 = X ◦ P. Consider the NSE (1) in space dimension d = 2 with viscosity
ν > 0 without forcing, i.e. with f ≡ 0. In this case, the solution to the NSE is unique
and the initial-data-to-solution map is a semigroup S ν = (S ν (t, 0), t ∈ J) on H [7,
Chap. III.3.1]. Then, a (unique) time-dependent family of probability measures μν =
(μνt , t∈ J) on H is given by [7, Chap. IV.1.2]

μνt (E) = μ0 (S ν (t, 0)−1 E), E ∈ B(H), (2)

i.e., for every t ≥ 0, and every E ∈ B(H), P({u(t) ∈ E}) = P({u0 ∈ S ν (t, 0)−1 E}) =
μ0 ((S ν (t, 0))−1 E). We remark that for nonzero, time-dependent forcing f , S ν defines
in general not a semigroup on H [7, Chap. V.1.1]. For any time t ∈ J, we may then
define the generalized moment

∫_H Φ(w) dμ_t^ν(w)

for a suitable, μνt -integrable function Φ on H. The time-evolution of generalized


moments of the Navier-Stokes flow is formally given by
 
d/dt ∫_H Φ(v) dμ_t^ν(v) = ∫_H (F(t, v), Φ'(v))_H dμ_t^ν(v),   (3)

for suitable test functionals Φ. Here, F is given by F(t, u) = f − νAu − B(u, u),
where A denotes the Stokes operator and B(u, u) the quadratic momentum advection
term (see [7, Eq. IV.1.10] for details). For the functional setting in space dimension
d = 2, in the no-slip case, we define Vnsp := {v ∈ H01 (D)d : ∇ · v = 0 in L 2 (D)} ⊂
Hnsp and in an analogous fashion Vper ⊂ Hper . Again, we write V ∈ {Vnsp , Vper } for
generic statements valid in either case.
A suitable class of test functionals Φ is given by the following:
Definition 1 [7, Definitions V.1.1, V.1.3] Let C be the space of cylindrical test
functionals Φ on H which are real-valued and depend only on a finite number of
components of v ∈ H, i.e. there exists k < ∞, such that

Φ(v) = φ((v, g1 )H , . . . , (v, gk )H ), (4)

where φ is a compactly supported C 1 scalar function on Rk and g1 , . . . , gk ∈ V .


Provided that the support of μ0 in H is bounded, the condition of compact support
of φ in Eq. (4) can be relaxed; we refer to [7, Appendix V.A] for details.
For Φ ∈ C we denote by Φ' its differential in H, which is given by

Φ'(v) = Σ_{i=1}^{k} ∂_i φ((v, g_1)_H, . . . , (v, g_k)_H) g_i.

As a linear combination of elements in V, Φ'(v) belongs to V.



Energy equalities are central for statistical solutions to Eq. (1); we integrate Eq. (3),
which leads, in space dimension d = 2 and for all t ∈ J, to (cp. [7, Eq. V.1.9])
∫_H ‖v‖_H² dμ_t^ν(v) + 2ν ∫_0^t ∫_V ‖v‖_V² dμ_s^ν(v) ds = ∫_0^t ∫_H (f(s), v)_H dμ_s^ν(v) ds + ∫_H ‖v‖_H² dμ_0(v).   (5)

Equations (3) and (5) motivate the definition of statistical solutions to Eq. (1).
Definition 2 [7, Definitions V.1.2, V.1.4] In space dimension d = 1, 2, a one-
parametric family μν = (μνt , t ∈ J) of Borel probability measures on H is a statistical
solution to Eq. (1) on the time interval J if
1. the initial Borel probability measure μ0 on H has finite mean kinetic energy, i.e.,

∫_H ‖v‖_H² dμ_0(v) < ∞,

2. f ∈ L 2 (J; H) and the Borel probability measures μt satisfy Eq. (3) for all Φ ∈ C
and Eq. (5) holds.
We note that in space dimension d = 3 the notion of statistical solution is more
delicate, cp. [8]. We recall an existence (and, in space dimension d = 2, uniqueness) result (see [7, Theorems V.1.2, V.1.3, V.1.4] and [8]): if μ_0 is supported in B_H(R) for
some 0 < R < ∞, and if the forcing term f ∈ H is time-independent, the statistical
solution is unique and given by Eq. (2).

2 Discretization Methods

Our goal is the numerical approximation of (generalized) moments of the statistical


solution (μνt , t ∈ J) for a given initial distribution μ0 on H. We achieve this by
approximating, for given Φ ∈ C (with C as in Definition 1) and for given μ0 with
finite mean kinetic energy on H, the expression

e_{μ_t^ν}(Φ) = ∫_H Φ(w) dμ_t^ν(w),   t ∈ J.

As a first approach, we assume that we can sample from the exact initial distribution
μ0 . Since μ0 is a distribution on the infinite-dimensional space H, this is, in general,
a simplifying assumption. However, if the probability measure μ0 is supported on
a finite-dimensional subspace of H, the assumption is no constraint. We discuss an
appropriate approximation of the initial distribution in Sect. 3. We generate M ∈ N
independent copies (wi , i = 1, . . . , M) of u0 , where u0 is μ0 -distributed. Assume for
now that for each draw wi ∈ H, distributed according to μ0 , we can solve ui (t) =
S ν (t, 0)wi exactly, and that we can evaluate the (real-valued) functional Φ(ui (t))
exactly. Then

e_{μ_t^ν}(Φ) ≈ E^M_{μ_t^ν}(Φ(u(t))) := (1/M) Σ_{i=1}^{M} Φ(u^i(t)) = (1/M) Σ_{i=1}^{M} Φ(S^ν(t, 0) w^i),   (6)

where we denoted by (EμMνt , M ∈ N) the sequence of MC estimators which approx-


imate the (generalized) expectation eμνt (Φ) for Φ ∈ C . To state the error bound on
the variance of the MC estimator, given in Eq. (6), we assume for simplicity that the
right hand side of Eq. (1) is equal to zero, i.e., f ≡ 0 (all results that follow have an
analog for nonzero forcing f ∈ L 2 (D)).
Proposition 1 Let Φ ∈ C be a test function. Then, an error bound on the mean-square error of the Monte Carlo estimator E^M_{μ_t^ν}, for M ∈ N, is given by

‖e_{μ_t^ν}(Φ) − E^M_{μ_t^ν}(Φ(u(t)))‖_{L²((H,μ_t^ν);R)} = (1/√M) (Var_{μ_t^ν}(Φ))^{1/2}
   ≤ C (1/√M) (1 + ∫_H ‖w‖_H² dμ_0(w))^{1/2}.

For ν > 0, the latter inequality is strict. Here, we used the notation Var_P(X) = e_P(‖e_P(X) − X‖_E²) for a square-integrable, E-valued random variable X under the measure P. We define, further, L²((Ω, P); E) as the set of square-summable (with respect to the measure P) random variables taking values in the separable Banach space E, and equip it with the norm ‖X‖_{L²((Ω,P);E)} := (e_P(‖X‖_E²))^{1/2}. Test functions in C fulfill, for some constant C > 0, the linear growth condition |Φ(w)| ≤ C(1 + ‖w‖_H),
for all w ∈ H. We remark that the MC error estimate in Proposition 1 is uniform with
respect to ν > 0 (see [6]). With EμMt being a convex combination of individual Leray–
Hopf solutions, by [8, Theorem 4.2] the MC estimator EμMt converges as M → ∞ (in
the sense of sequential convergence of measures, and uniformly on bounded time
intervals) to a Višik–Foursikov statistical solution as defined in [8].
Space and Time Discretization
The MC error bounds in Proposition 1 are semi-discrete in the sense that they assume
the availability of an exact Leray–Hopf solution to the NSE for each initial velocity
sample drawn from μ0 , and they pertain to bulk properties of the flow in the sense
that they depend on the H-norm of the individual flows. We have, therefore, to
perform additional space and time discretizations in order to obtain computationally
feasible approximations of (generalized) moments of statistical solutions. In MLMC
sampling strategies such as those proposed subsequently, we consider a sequence of (space and time) discretizations which are indexed by a level index ℓ ∈ N_0. We consider a dense, nested family of finite dimensional subspaces V = (V_ℓ, ℓ ∈ N_0) of V and therefore of H. Associated to the subspaces V_ℓ, we have the refinement
levels ℓ ∈ N_0, the refinement sizes (h_ℓ, ℓ ∈ N_0) and the H-orthogonal projections (P_ℓ, ℓ ∈ N_0). Furthermore, we endow the finite dimensional spaces in V with the norm induced by H. For ℓ ∈ N_0, the sequence is supposed to be dense in H in the sense that, for every v ∈ H, lim_{ℓ→+∞} ‖v − P_ℓ v‖_H = 0. In order to obtain a computationally feasible method, we introduce a sequence of time discretizations Θ = (Θ_ℓ, ℓ ∈ N_0) of the time interval J̄, each with equidistant/maximum time steps of size Δ_ℓ t. The time discretization at level ℓ ∈ N_0, Θ_ℓ, is the partition of [0, T] given by

Θ_ℓ = {t^i ∈ [0, T] : t^i = i · Δ_ℓ t, i = 0, . . . , T/(Δ_ℓ t)}.

We view the fully-discrete solution to Eq. (1) as the solution to a nonlinear dynamical system according to

D_t(u_ℓ) = F_ℓ(t, u_ℓ),

where D_t denotes the weak derivative with respect to time and the right-hand side is

F_ℓ(t, v) = f − νA_ℓ v − B_ℓ(v, v).

Here, A_ℓ denotes the discrete Stokes operator and B_ℓ the associated bilinear form. We denote by S_ℓ^ν = (S_ℓ^ν(t^i, 0), i = 0, . . . , T/Δ_ℓ t) the fully-discrete solution operator that maps u_0 into u_ℓ = (u_ℓ(t^i), i = 0, . . . , T/Δ_ℓ t). We assume that the spaces in V and the time discretizations Θ_ℓ are chosen such that the following error bound holds.
Assumption 1 The sequence of fully-discrete solutions (u_ℓ, ℓ ∈ N_0) converges to the solution u of Eq. (1). The space and time discretization error is bounded, for ℓ ∈ N and t ∈ Θ_ℓ, with h_ℓ ∼ Δ_ℓ t, either by

1.  ‖u(t) − u_ℓ(t)‖_H = ‖S^ν(t, 0)u_0 − S_ℓ^ν(t, 0)u_0‖_H ≤ C (h_ℓ^s + (Δ_ℓ t)^s) ≤ C h_ℓ^s,   (7)

for some s ∈ [0, 1], or by

2.  ‖u(t) − u_ℓ(t)‖_H = ‖S^ν(t, 0)u_0 − S_ℓ^ν(t, 0)u_0‖_H ≤ C (h_ℓ^σ/ν + (Δ_ℓ t)^σ/ν) ≤ C h_ℓ^σ/ν,   (8)

for some σ > 0. Equation (8) implies the scale-resolution requirement ℓ > ℓ*, where ℓ* ∈ N_0 is such that h_{ℓ*}^σ ≤ ν.
Let us comment on Assumption 1. The convergence estimates are explicit in the
discretization parameter h (equal to, for example, a mesh width of a Finite Volume
mesh, or to N −1 where N denotes the spectral order of a spectral method) and in the
kinematic viscosity ν. Finite Element based space-time discretizations of the NSE in


space dimension d = 2, such as those in [9, 16] will, in general, not satisfy Eq. (7).
In spatial dimension d = 1, it is shown in [10, Main Corollary, p. 373] that Eq. (7)
holds, with s = 1/2 and for some constant C > 0 independent of ν. The rate is limited to s = 1/2 since solutions to the inviscid limit problem form shocks in finite time.
In space dimension d = 2, for small data and with periodic boundary conditions,
for ν = 0 the equations of inviscid, incompressible flow are well-posed and for
sufficiently regular initial data, the unique solutions do not form shocks ([3, 17]).
First order convergent Finite Volume discretizations for the corresponding problem
of inviscid, incompressible flow which satisfy the error bound Eq. (7) with s = 1
are available in [11], based on [3, Chap. 7]. Finite Element discretizations based
on the heat equation result in discretization error bounds as in Eq. (8) with, to our
knowledge, constants C > 0 which implicitly depend (exponentially) on T /ν and
which are, therefore, not suitable to infer statements on the performance of the MLMC
approximation of statistical solutions for small values of ν.

2.1 Single Level Monte Carlo Method

With the discretization in hand we can combine the error in the spatial and temporal
domain with the statistical sampling by the MC method, leading to what we shall
refer to as the single level Monte Carlo (SLMC) approach.
We define, for ℓ ∈ N_0 and t ∈ Θ_ℓ, the SLMC estimator with M independent and identically according to μ_0 distributed samples w^i ∈ H by

E^M_{μ_t^ν}(Φ(u_ℓ(t))) := (1/M) Σ_{i=1}^{M} Φ(u_ℓ^i(t)) = (1/M) Σ_{i=1}^{M} Φ(S_ℓ^ν(t, 0) w^i).

Here, S_ℓ^ν denotes the fully-discrete solution operator defined above, and Φ ∈ C. We assume that Φ ∈ C satisfies a Lipschitz condition:

for all v, w ∈ H :  |Φ(v) − Φ(w)| ≤ C ‖v − w‖_H.   (9)

We remark that Eq. (9) follows from φ being continuously differentiable and with
compact support. The constant C depends on the maximum of φ and on the H norms
of g1 , . . . , gk . Under Eq. (9), the SLMC estimator admits the following mean-square
error bound (see [6]).
Proposition 2 If, for Φ ∈ C fulfilling Eq. (9) and ℓ ∈ N_0, the generalized moment of the statistical solution fulfills Assumption 1, for some s ∈ [0, 1] or some σ > 0 and h_ℓ ∼ Δ_ℓ t, then the fully-discrete single level Monte Carlo estimator E^M_{μ_t^ν}(Φ(u_ℓ)) admits, for t ∈ Θ_ℓ, the bound

‖e_{μ_t^ν}(Φ) − E^M_{μ_t^ν}(Φ(u_ℓ))‖_{L²((H,μ_t^ν);R)}
   ≤ (1/√M) (Var_{μ_t^ν}(Φ))^{1/2} + ‖e_{μ_t^ν}(Φ − Φ(u_ℓ))‖_{L²((H,μ_t^ν);R)}
   ≤ C (1/√M + ρ(h_ℓ)).

For robust discretizations, ρ(z) = z^s, with C > 0 independent of ℓ, h_ℓ and of ν.


The error bound for the SLMC estimator consists of two additive components, the
approximation of the spatial and temporal discretization and of the MC sampling.
Although we only established an upper bound, one can show that this error is, indeed,
of additive nature. This, in turn, indicates that the lack of scale-resolution in the
spatial and temporal approximation, i.e. if the discretization underresolves the scale
of viscous cut-off, can partly (in a mean-square sense) be offset by increasing the
number of samples, on the mesh-level in the MC approximation. This is in line
with similar findings for MLMC Galerkin discretizations for elliptic homogenization
problems in [2]. To ensure that the total error in Proposition 2 is smaller than a
prescribed tolerance ε > 0, we require

(1/√M) (Var_{μ_t^ν}(Φ))^{1/2} + ‖e_{μ_t^ν}(Φ − Φ(u_ℓ))‖_{L²((H,μ_t^ν);R)} ≤ ε.

A sufficient condition for this is for some η ∈ (0, 1)

(1/√M) (Var_{μ_t^ν}(Φ))^{1/2} ≤ η · ε   and   ‖e_{μ_t^ν}(Φ − Φ(u_ℓ))‖_{L²((H,μ_t^ν);R)} ≤ (1 − η) ε.

2.2 Multilevel Monte Carlo Method

The idea of the MLMC estimator is to expand the expectation of the approximation of
the solution on some discretization level L ∈ N0 as the expectation of the solution on
the (initial) discretization level 0 and a sum of correcting terms on all discretization
levels ℓ = 1, . . . , L, i.e., for Φ ∈ C,

e_{μ_t^ν}(Φ(u_L)) = e_{μ_t^ν}(Φ(u_0)) + Σ_{ℓ=1}^{L} e_{μ_t^ν}(Φ(u_ℓ) − Φ(u_{ℓ−1})).

Then we approximate the expectation in each term on the right-hand side with a SLMC estimator with a level-dependent number of samples, so that we may write

E^L_{μ_t^ν}(Φ(u_L)) = E^{M_0}_{μ_t^ν}(Φ(u_0)) + Σ_{ℓ=1}^{L} E^{M_ℓ}_{μ_t^ν}(Φ(u_ℓ) − Φ(u_{ℓ−1})).
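In code, the estimator above only needs a level solver and a sampler for μ_0. The following Python sketch is purely illustrative: the callables phi_on_level and sample_mu0 are hypothetical placeholders for a fully-discrete solve S_ℓ^ν(t, 0) composed with Φ and for a draw from the initial measure; it is not the ALSVID-UQ implementation used in the experiments below.

```python
def mlmc_estimate(phi_on_level, sample_mu0, M, L, rng):
    """Assemble the MLMC estimator
        E^L[Phi(u_L)] = E^{M_0}[Phi(u_0)] + sum_{l=1..L} E^{M_l}[Phi(u_l) - Phi(u_{l-1})].
    phi_on_level(l, w) : Phi applied to the fully-discrete solution on level l,
                         started from the initial-velocity sample w,
    sample_mu0(rng)    : one draw from the initial measure mu_0,
    M                  : list of sample numbers M_0, ..., M_L."""
    estimate = 0.0
    for level in range(L + 1):
        acc = 0.0
        for _ in range(M[level]):
            w = sample_mu0(rng)                       # same draw feeds both levels
            fine = phi_on_level(level, w)
            coarse = phi_on_level(level - 1, w) if level > 0 else 0.0
            acc += fine - coarse
        estimate += acc / M[level]
    return estimate
```

Within each correction term the same initial-data sample w is propagated on the fine and the coarse level; this coupling is what keeps the level differences, and hence their variances, small.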

We call E^L_{μ_t^ν} the MLMC estimator for discretization level L ∈ N_0. The MLMC estimator has the following mean-square error bound.

Proposition 3 If, for Φ ∈ C fulfilling Eq. (9) and L ∈ N_0, the generalized moment of the statistical solution fulfills Assumption 1, for ℓ = 0, . . . , L, with s ∈ [0, 1) or σ > 0 and h_ℓ ∼ Δ_ℓ t, the error of the fully-discrete multilevel Monte Carlo estimator E^L_{μ_t^ν}(Φ(u_L)) admits, for t ∈ Θ_L, the bound

‖e_{μ_t^ν}(Φ) − E^L_{μ_t^ν}(Φ(u_L))‖_{L²((H,μ_t^ν);R)}
   ≤ ‖e_{μ_t^ν}(Φ − Φ(u_L))‖_{L²((H,μ_t^ν);R)} + Σ_{ℓ=0}^{L} (1/√M_ℓ) (Var_{μ_t^ν}(Φ(u_ℓ) − Φ(u_{ℓ−1})))^{1/2}
   ≤ C ( ρ(h_L) + (1/√M_0)(1 + ρ(h_0)) + Σ_{ℓ=1}^{L} (1/√M_ℓ)(ρ(h_ℓ) + ρ(h_{ℓ−1})) ),

where Φ(u_{−1}) ≡ 0, ρ(z) = z^s or ρ(z) = z^σ/ν, and z ∈ [0, 1]. If, further, for all ℓ = 1, . . . , L, it holds that h_ℓ ∼ Δ_ℓ t and that ϑ h_{ℓ−1} ≤ h_ℓ, with some reduction factor 0 < ϑ < 1 independent of ℓ, then there exists C(ϑ) > 0, independent of L, such that the error bound

‖e_{μ_t^ν}(Φ) − E^L_{μ_t^ν}(Φ(u_L))‖_{L²((H,μ_t^ν);R)} ≤ C(ϑ) ( ρ(h_L) + 1/√M_0 + Σ_{ℓ=0}^{L} (1/√M_ℓ) ρ(h_ℓ) )

holds.

A proof can be found in [6]. This result leads again to the question of how to choose the sample numbers (M_ℓ, ℓ = 1, . . . , L) that yield a given (mean kinetic energy) error threshold ε. We have, if we assume that η_L ∈ (0, 1), the requirement

‖e_{μ_t^ν}(Φ − Φ(u_L))‖_{L²((H,μ_t^ν);R)} ≤ (1 − η_L) ε   and   Σ_{ℓ=0}^{L} (1/√M_ℓ) (Var_{μ_t^ν}(Φ(u_ℓ) − Φ(u_{ℓ−1})))^{1/2} ≤ η_L ε.

If we have that, for some θ_ℓ > 0, for ℓ = 0, . . . , L,

(Var_{μ_t^ν}(Φ(u_ℓ) − Φ(u_{ℓ−1})))^{1/2} ≤ θ_ℓ,

then, to equilibrate the error for each level ℓ = 1, . . . , L, we choose the sample sizes

M_ℓ = θ_ℓ² α_ℓ^{−2} (η_L ε)^{−2}   (10)



for a sequence (α_ℓ, ℓ = 1, . . . , L) with α_ℓ ∈ [0, 1], subject to the constraint Σ_{ℓ=1}^{L} α_ℓ = 1. We determine the required number M_ℓ of SLMC samples on each discretization level ℓ = 0, . . . , L based on equilibration of the errors arising from each term Var_{μ_t^ν}(Φ(u_ℓ) − Φ(u_{ℓ−1})) such that the total mean-square error from Proposition 3
is bounded by the prescribed tolerance ε > 0. This is only possible if the convergence requirement is fulfilled for level L, since then we can choose η_L accordingly to satisfy a preset error bound. However, the convergence requirement might not be fulfilled for all ℓ < L; hence, for those levels we have to sample accordingly. In particular, denote by ℓ* ≥ 0 the first level on which the solution is scale-resolved. Then Var_{μ_t^ν}(Φ(u_{ℓ*}) − Φ(u_{ℓ*−1})) might be large, as might be θ_{ℓ*}; thus α_{ℓ*} has to be chosen accordingly. Since it is infeasible to determine the values Var_{μ_t^ν}(Φ(u_ℓ) − Φ(u_{ℓ−1})), we estimate sample numbers from the second (more general) bound in Proposition 3. We refer to [4] for an analysis of the computational complexity of MLMC estimators in the case of weak or strong errors of SPDEs.

We proceed to determine the numbers M_ℓ of SLMC samples. To this end, we continue to work under Assumption 1. We either assume Eq. (7) or we work with Eq. (8) under the assumption that at least on the finest level the scale resolution requirement is fulfilled, i.e., h_L^σ < ν. For the latter, we consider the case where the scale resolution requirement is not fulfilled for all levels up to level ℓ*(ν). In this case, for 0 ≤ ℓ*(ν) < L (meaning h_{ℓ*(ν)}^σ ≥ ν and h_{ℓ*(ν)+1}^σ < ν), we choose on the first level the sample number

M_0 = O((ρ(h_L))^{−2})   (11)

to equilibrate the statistical and the discretization error contributions. Here, and in what follows, all constants implied in the Landau symbols O(·) are independent of ν. According to this convergence analysis, the SLMC sample numbers M_ℓ, for discretization levels ℓ = 1, . . . , ℓ*(ν), . . . , L, should be chosen according to

M_ℓ = O((ρ(h_ℓ)(ρ(h_L))^{−1})² ℓ^{2(1+γ)}),   (12)

for γ > 0 arbitrary (with the constant implied in O depending on γ). Note that ρ(h_ℓ) might be large for underresolved discretization levels. This choice of sample numbers is in line with Eq. (10) for one particular sequence (α_ℓ, ℓ = 1, . . . , L). A minimal tabulation of this choice is sketched below.
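The following Python sketch tabulates the sample numbers according to Eqs. (11)–(12); the proportionality constant c, the value of γ, and the use of ρ(z) = z^s are assumptions of the illustration, not prescriptions of the analysis above.

```python
import math

def mlmc_sample_numbers(h, s=1.0, gamma=0.1, c=1.0):
    """Level-dependent sample numbers following Eqs. (11)-(12), rho(z) = z**s.
    h : list of mesh widths h_0 > h_1 > ... > h_L (coarsest to finest)."""
    L = len(h) - 1
    rho = [hl ** s for hl in h]
    M = [max(1, math.ceil(c * rho[L] ** (-2)))]          # Eq. (11), level 0
    for level in range(1, L + 1):                        # Eq. (12), levels 1..L
        M.append(max(1, math.ceil(
            c * (rho[level] / rho[L]) ** 2 * level ** (2 * (1 + gamma)))))
    return M
```

For geometrically refined meshes h_ℓ = h_0 2^{-ℓ} and s = 1 this reproduces, up to the factor ℓ^{2(1+γ)} and the constant c, the proportionality M_ℓ ∝ 2^{2(L−ℓ)} used in the experiments of Sect. 3.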

3 Numerics

We describe numerical experiments in the unit interval D = (0, 1) in space dimension


d = 1, i.e. for the viscous Burgers’ equation, and in space dimension d = 2, in
D = (0, 1)2 , with periodic boundary conditions, and with stochastic initial data. As
indicated in Sect. “Space and Time Discretization”, in space dimension d = 1, i.e. for
scalar problems, the bound in Assumption 1 holds with s = 1/2 and with a constant
C > 0 independent of ν (see [10]). If the mesh used for the space discretization
resolves the viscous scale, the first order Finite Volume method even converges with
rate s = 1 in L 1 (D) due to the high spatial regularity of the solution u, albeit with
constants which blow up as the viscosity ν tends to zero. Specifically, we consider
Eq. (1) with periodic boundary conditions in the physical domain D = [0, 1], i.e.

∂_t u + (1/2) ∂_x(u²) = ν ∂_x² u + f,   for all x ∈ D, t ∈ [0, T], ω ∈ Ω,   (13)

which is completed with the random initial condition u(0) = u_0 ∈ L²(Ω, L¹(D) ∩ L^∞(D)), inducing an initial measure μ_0 on L¹(D) ∩ L^∞(D) ⊂ H = L²(D) with finite second moments.
The numerical simulation of a statistical solution requires sampling from the
measure μ0 defined on the generally infinite dimensional space H. To give a conver-
gence result for finite dimensional, “principal component” approximations of this
initial measure μ0 , we follow closely the approach in [5].
The initial distribution μ0 is defined on a probability space (Ω, F , P) and is
assumed to be given as an image measure under an H-valued random variable with
distribution μ0 . This random variable is defined as a mapping from the measur-
able space (Ω, F ) into the measurable space (H, B(H)) such that μ0 = X ◦ P.
We assume throughout the numerical experiments that μ0 is a Gaussian measure
supported on H or on a subspace of H. Gaussian measures on a separable, infinite-
dimensional Hilbert space H are completely characterized by the mean m ∈ H and
covariance operator Q defined on H, being a symmetric, nuclear trace-class oper-
ator. Any Gaussian random variable X ∈ L 2 (Ω; H) can then be represented by its
Karhunen–Loève expansion

X =m+ λi βi wi ,
i∈N

where ((λi , wi ), i ∈ N) is a complete orthonormal system in H and consists of eigen-


values and eigenfunctions of Q. The sequence (βi , i ∈ N) consists of real-valued,
independent, (standard) normally distributed random variables. With κ-term truncations of the Karhunen–Loève expansion we define a sequence of random variables (X^κ, κ ∈ N) given by X^κ = m + Σ_{i=1}^{κ} √λ_i β_i w_i, with mean m ∈ H and covariance operator Q^κ. The sequence of truncated sums X^κ converges P-a.s. to X in the H-norm as κ → +∞. Then, we have the following lemma (see [5] for a proof).
Lemma 1 ([5]) If the eigenvalues (λ_i, i ∈ N) of the covariance operator Q of the Gaussian random variable X on H have a rate of decay λ_i ≤ C i^{−γ} for some γ > 1, then the sequence (X^κ, κ ∈ N) converges to X in L²(Ω; H) and the error is bounded by

‖X − X^κ‖_{L²(Ω;H)} ≤ C (1/√(γ − 1)) κ^{−(γ−1)/2}.

For the numerical realization of the MLMC method, and in particular for the
numerical experiments ahead, we need to draw samples from the initial distribution.
As an example we therefore introduce a Gaussian distribution on H = L²_per(D), where D = (0, 1). In the univariate case, the condition ∇ · u = 0 in (1) becomes void and L²_per(D) = {u ∈ L²(D) : ∫_D u = 0}. A basis of L²_per(D) is given by (w_i, i ∈ N), where
w_i(x) = sin(2iπx). Then, by Mercer's theorem, the covariance operator Q is defined, for φ ∈ L²_per(D), as

Qφ(x) = ∫_D q(x, y) φ(y) dy,

where the kernel is q(x, y) = Σ_{i∈N} λ_i w_i(x) w_i(y) = Σ_{i∈N} λ_i sin(2iπx) sin(2iπy). Now, we may choose any sequence (λ_i, i ∈ N) with Σ_{i∈N} √λ_i < ∞ to define a covariance operator Q on H which is trace class. One possible choice would be λ_i ≍ i^{−α}, for α > 2. In our numerical experiments, we choose as eigenvalues λ_i = i^{−2.5} for i ≤ 8 and zero otherwise, and the mean field m ≡ 0, i.e.

u_0(x, ω) = Σ_{i=1}^{8} i^{−5/4} sin(2πix) Y_i(ω).   (14)
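The initial ensemble (14) is straightforward to sample; the sketch below (the grid size, the number of draws and the cell-center evaluation are illustrative choices) draws realizations with Y_i i.i.d. standard normal.

```python
import numpy as np

def sample_initial_data(n_cells, n_samples, rng):
    """Draw realizations of the 8-term Karhunen-Loeve initial velocity (14),
    u_0(x, omega) = sum_{i=1}^{8} i**(-5/4) * sin(2*pi*i*x) * Y_i(omega),
    evaluated at the cell centers of a uniform grid on D = (0, 1)."""
    x = (np.arange(n_cells) + 0.5) / n_cells            # cell centers
    i = np.arange(1, 9)
    basis = np.sin(2.0 * np.pi * np.outer(i, x))        # shape (8, n_cells)
    amplitudes = i ** (-1.25)                            # sqrt(lambda_i) = i**(-5/4)
    Y = rng.standard_normal((n_samples, 8))
    return (Y * amplitudes) @ basis                      # shape (n_samples, n_cells)

u0 = sample_initial_data(n_cells=64, n_samples=4, rng=np.random.default_rng(0))
```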

The kinematic viscosity is chosen to be ν = 10−3 and the source term is set to f ≡ 0.
All simulations reported below were performed with the recently developed massively parallel code ALSVID-UQ [1, 13, 15] on the Cray XE6 at CSCS [14], which has 1496 AMD Interlagos 2 × 16-core 64-bit CPUs (2.1 GHz), 32 GB DDR3 memory per node, and a 10.4 GB/s Gemini 3D torus interconnect, with a theoretical peak performance of 402 TFlops.
The initial data in Eq. (14) and the reference solution uref at time t = 2 are depicted
in Fig. 1. The solid line represents the mean Eμνt (uref ) and the dashed lines represent
the mean plus/minus the standard deviation (Var μνt (uref ))1/2 of the (random) solution
u_ref at every point x ∈ D. The variance and therefore the standard deviation can easily be calculated by Var_{μ_0}(u_0(x)) = Σ_{i=1}^{8} (i^{−5/4} sin(2πix))², for x ∈ D. The solution is
computed with a standard first-order Finite Volume scheme using the Rusanov HLL
solver on a spatial grid in D of size 32768 cells and the explicit forward Euler
time stepping (see [12]) with the CFL number set to 0.9. The number of levels of
refinement is 9 (the coarsest level has 64 cells). The number of samples is chosen
according to the analysis in Sect. “Space and Time Discretization” with s = 1, i.e.

Fig. 1 Reference solution computed using the MLMC finite volume method

M_ℓ = M_L 2^{2(L−ℓ)}, for ℓ = 0, . . . , L, where the number of samples on the finest mesh is set to M_L = 4 (this leads to M_0 = 262144). The simulation took 50 min (wall-clock
time) on 256 cores.
Next, following Definition 1 and the remarks thereafter, for k = 1, φ(ξ ) = ξ
and for a given kernel g1 ∈ L ∞ (D), we define a continuous, linear functional Φ on
L 1 (D) ∩ L ∞ (D) by

Φ(u)(t, ω) = ∫_D u(x, t, ω) g_1(x) dx,   for all t ∈ [0, T], ω ∈ Ω.   (15)

Note that formally the function φ is not compactly supported. However, for one-dimensional problems, there holds an energy bound (we refer to the results in [12]) with respect to the initial data u_0(·, ω), i.e. ‖u(·, t, ω)‖_{L²(D)} ≤ ‖u_0(·, ω)‖_{L²(D)}. Since the values of the inner product can be bounded for every t and P-a.e. ω by

|(u(·, t, ω), g_1)_H| ≤ ‖u(·, t, ω)‖_{L²(D)} ‖g_1‖_{L²(D)} ≤ ‖u_0(·, ω)‖_{L²(D)} ‖g_1‖_{L²(D)} < ∞,

the function φ(·) may be modified for large values, enforcing the required compact support of φ in Definition 1. We note that such a modification is ω-dependent, and hence a more stringent bound on the L^∞(Ω; L²(D))-norm of the initial data is required instead, i.e. we require that ‖u_0(·, ω)‖_{L²(D)} < C holds P-a.s. for some constant C < ∞. Such a bound holds for a uniformly distributed initial condition; however, it does not hold for the Gaussian distributed initial condition considered here. In the following numerical experiment, we choose the function g_1 in Eq. (15) to be g_1(x) = (x − 0.5)³. With this choice it can be easily verified that Φ in Eq. (15) fulfills the Lipschitz condition in Eq. (9).
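On finite volume data, the functional (15) with g_1(x) = (x − 0.5)³ reduces to a weighted sum of cell averages; a minimal sketch, assuming a uniform grid on D = (0, 1) and a midpoint quadrature, is:

```python
import numpy as np

def phi_functional(u_cells):
    """Approximate Phi(u) = int_D u(x) * g1(x) dx with g1(x) = (x - 0.5)**3,
    where u_cells holds cell averages on a uniform grid over D = (0, 1)."""
    n = u_cells.shape[-1]
    x = (np.arange(n) + 0.5) / n          # cell centers
    g1 = (x - 0.5) ** 3
    return (u_cells * g1).sum(axis=-1) / n
```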
Using MLMC Finite Volume approximations of the mean E_{μ_t^ν}(Φ(u_ref)) and the variance Var_{μ_t^ν}(Φ(u_ref)) from Fig. 1 as a reference solution, we compute approximate solutions u_ℓ using both SLMC Finite Volume and MLMC Finite Volume methods, on a family of meshes with spatial resolutions ranging from n_0 = 64 cells up to n_L = 2048 cells. We monitor the convergence of the errors in E^L_{μ_t^ν}(Φ(u_L)) and Var^L_{μ_t^ν}(Φ(u_L)),

ε_L^E = |E_{μ_t^ν}(Φ(u_ref)) − E^L_{μ_t^ν}(Φ(u_L))|,   ε_L^V = |Var_{μ_t^ν}(Φ(u_ref)) − Var^L_{μ_t^ν}(Φ(u_L))|.

The number of samples on the finest mesh is set to M_L = 4. The number of levels for the MLMC Finite Volume method is chosen so that the coarsest level contains 64 cells. Since 1/64 ≈ 0.015 < √ν = 10^{−1.5} ≈ 0.03, the “viscous cut-off” scale (which, in the present problem, coincides with the scale of the viscous shock profile) of the solution u is resolved on every mesh resolution level ℓ = 0, . . . , L.
Since the solution is a random field, the discretization error ε_L^· is a random quantity as well. For the error convergence analysis we therefore compute a statistical estimator by averaging estimated discretization errors from several independent runs. We compute the error in Proposition 3 by approximating the L²(H; R)-norm by MC
sampling. Let Φ(u_ref) denote the reference solution and ((Φ(u_L))^{(k)}, k = 1, . . . , K) be a sequence of independent approximate solutions obtained by running the SLMC Finite Volume or MLMC Finite Volume solver K ∈ N times. The L²(Ω; H)-based relative percentage error estimator is defined to be

Rε_L^E = 100 · ( E_K[ (ε_L^{E,(k)} / |E_{μ_t^ν}(Φ(u_ref))|)² ] )^{1/2},   Rε_L^V = 100 · ( E_K[ (ε_L^{V,(k)} / |Var_{μ_t^ν}(Φ(u_ref))|)² ] )^{1/2}.
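The statistic Rε_L^E is simply a root-mean-square over the K repetitions; a small helper (the inputs are hypothetical arrays of per-run estimates and the reference value) could look as follows.

```python
import numpy as np

def relative_percentage_error(estimates, reference):
    """100 * sqrt( E_K[ ((estimate_k - reference) / |reference|)^2 ] ),
    i.e. the L^2(Omega)-based relative percentage error over K runs."""
    e = np.asarray(estimates, dtype=float)
    return 100.0 * np.sqrt(np.mean(((e - reference) / abs(reference)) ** 2))
```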

In order to obtain an accurate estimate of RεLE and RεLV , the number K must be
large enough to ensure a sufficiently small (<0.1) relative variance σ 2 (RεLE ) and
σ 2 (RεLV ). We found K = 30 to be sufficient for our numerical experiments. Next,
we analyze the relative percentage error convergence plots of mean and variance.
In Fig. 2, we plot the error εLE against the number of cells on the finest discretiza-
tion level L in the left subplot and versus the computational work (runtime) in the
right subplot. The coarse level stays the same when we increase the finest discretiza-
tion level L to obtain a convergence plot. Both SLMC and MLMC methods give
similar relative percentage errors for the same spatial resolution. However, there is a
significant difference in the runtime: MLMC methods are two orders of magnitude
faster than plain SLMC methods. The lower dashed line in the top-right corner of
each plot in Fig. 2 (and all subsequent figures) indicates the expected convergence
rate of the MLMC method obtained in Proposition 3. These expected convergence
rates coincide with the observations in the numerical experimental data. In Fig. 3,
we plot the error εLV versus the number of cells on the finest discretization level L
in the left subplot and versus the computational work (runtime) in the right subplot.
Analogously as in the plots for the expectation, both SLMC and MLMC methods
give similar errors for the same spatial resolution. In terms of the required computa-
tional work for one percent error, MLMC methods are, in this example, two orders
of magnitude faster than plain SLMC methods.
We repeat the error convergence analysis for Burgers’ equation, but this time
with much fewer cells on the coarsest mesh resolution in the MLMC estimator. In

Fig. 2 Convergence of the error εLE of the mean Eμνt (Φ) of the viscous Burgers’ equation

Fig. 3 Convergence of the error εLV of the variance Var μνt (Φ) of the viscous Burgers’ equation

particular, instead of taking 64 cells on the coarsest mesh resolution, we will take
only 8 cells, i.e. adding three more levels of mesh refinement. Since in this case 1/8 > √ν = 10^{−1.5} ≈ 0.03, the viscous cut-off length scale of the solution u is not resolved on every mesh resolution level; in particular, it is resolved only on the mesh resolution levels ℓ = 3, . . . , L, and it is under-resolved on ℓ = 0, 1, 2. Notice that the number of cells on the finer mesh resolutions stays the same as in the previous experiment, where n_3 = 64, . . . , n_L = 2048. Note also that, by the theory in [10], the presently used numerical scheme converges robustly in H with order s = 1/2,
meaning that the constant in the convergence bound is independent of ν. In Fig. 4,
we plot the error εLE against the number of cells nL in the left subplot and versus
computational work (runtime) in the right subplot for the case of 8 cells on the
coarsest resolution. Even in the presence of multiple under-resolved levels, the error
convergence of the MLMC Finite Volume method is faster than the previous setup
(compared to Fig. 2). In Fig. 5, we plot the error εLV versus the number of cells nL in
the left subplot and versus the computational work (runtime) in the right subplot for
the case of 8 cells on the coarsest resolution. Again, even in the presence of multiple
under-resolved levels, the error convergence of the MLMC Finite Volume method is
faster than the previous setup (compared to Fig. 3).

Fig. 4 Convergence of the error εLE of the mean Eμνt (Φ) of the viscous Burgers’ equation

Fig. 5 Convergence of the error εnV of the variance Var μνt (Φ) of the viscous Burgers’ equation

We conclude with preliminary numerical experiments in space dimension d = 2,


from [11]. We consider Eq. (1) in the physical domain D = [0, 1]2 , with periodic
boundary conditions. For d = 2 and ν > 0, individual and statistical solutions exist
and are unique. Moreover, in this setting Eq. (1) admits equivalent vorticity reformula-
tions in terms of a scalar vorticity η obtained from the velocity u(t) = (u1 (t), u2 (t))
by
η(t) := rot u(t) = ∂2 u1 (t) − ∂1 u2 (t) (16)

which maps Sobolev spaces of divergence-free velocity fields isomorphically to


spaces of (scalar) vorticities η. The relation in Eq. (16) is invertible via the Biot-
Savart law:

u(t) = curl ◦ (−Δ)−1 η(t) = (∂2 (−Δ)−1 η, −∂1 (−Δ)−1 η) =: rot−1 η(t). (17)
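On the periodic domain, the inversion (17) can be carried out spectrally; the sketch below is a plain FFT illustration of the Biot–Savart reconstruction (it is not the FTCS solver of [11]), assuming vorticity samples on a uniform periodic grid.

```python
import numpy as np

def velocity_from_vorticity(eta, L=1.0):
    """Spectral Biot-Savart reconstruction following Eq. (17):
    psi = (-Laplace)^{-1} eta,  u = (d_2 psi, -d_1 psi),
    for vorticity values eta on a uniform periodic grid over [0, L)^2."""
    n1, n2 = eta.shape
    k1 = 2.0 * np.pi / L * n1 * np.fft.fftfreq(n1)    # wavenumbers along axis 0
    k2 = 2.0 * np.pi / L * n2 * np.fft.fftfreq(n2)    # wavenumbers along axis 1
    K1, K2 = np.meshgrid(k1, k2, indexing="ij")
    lap = K1 ** 2 + K2 ** 2
    lap[0, 0] = 1.0                                   # avoid 0/0 for the mean mode
    psi_hat = np.fft.fft2(eta) / lap
    psi_hat[0, 0] = 0.0                               # fix the (irrelevant) mean of psi
    u1 = np.real(np.fft.ifft2(1j * K2 * psi_hat))     # d_2 psi
    u2 = np.real(np.fft.ifft2(-1j * K1 * psi_hat))    # -d_1 psi
    return u1, u2
```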

In terms of the (scalar, in space dimension d = 2) vorticity η(t), Eq. (1) becomes the viscous vorticity equation: in the periodic setting, for s ≥ 0 and given ν > 0, find η ∈ X^s := L²(J; H^{s+1}_per(D)) ∩ H¹(J; H^{s−1}_per(D)) such that Eq. (17) holds and

∂_t η + u · ∇η = νΔη   in L²(J; H^{s−1}_per(D)),
−Δψ = η   in L²(J; H^{s+1}_per(D)),   (18)
η|_{t=0} = η_0   in H^s_per(D).

The relations Eqs. (16) and (17) are bijective in certain scales of (Sobolev) spaces
of D-periodic functions so that Eqs. (16)–(18) and (1) are equivalent. Moreover,
the isomorphisms rot and rot^{−1} in Eqs. (16) and (17) allow us to transfer the statistical solutions μ^ν = (μ_t^ν, t ≥ 0) equivalently to a one-parameter family π^ν = (π_t^ν, t ≥ 0)
of probability measures on sets of admissible vorticities, defined for every ensemble
F of π0 -measurable initial vorticities η0 by

πtν (F) = π0 ((T ν (t))−1 (F)), T ν (t)η0 := (rot ◦ S ν (t, 0) ◦ rot−1 )η0 .

Fig. 6 L2 error of the mean for different viscosities with SLMC and MLMC, with respect to the
mesh width h and wall clock time

Here, we defined π0 (F) := (μ0 ◦ rot−1 )(F). Existence and uniqueness of the velocity
statistical solutions μν imply existence and uniqueness of the vorticity statistical
solutions π^ν. We refer to [11] for further details, and also for a detailed description of the Finite Volume discretization and convergence analysis of Eq. (18) (Fig. 6).
In the ensuing numerical experiments, we consider a probability measure π0
concentrated on initial vorticities of the form:

η_0(x; ω) = η̄_0(x) + Y_1(ω) η_1(x)

with Y_1 ∼ U(−1, 1), where η̄_0(x) ∈ H¹_per(D) denotes the mean initial vorticity, and the fluctuation is given by η_1(x) := sin(2πx_1) sin(2πx_2) ∈ H¹_per(D). We choose as the “mean vorticity” η̄_0(x) := x_1(1 − x_1) x_2(1 − x_2). Note that then η_0(·) ∈ H¹_per(D) P-a.s.
The ensuing numerical results are obtained using a forward-in-time, central-in-space (FTCS) vorticity solver, described in detail in [11]. In this case, for small data, the
individual Leray-Hopf solutions converge, as ν → 0, to the unique incompressible,
inviscid Euler flow (see [3, Chap. 13], [17]) in C([0, T ]; L 2 (D)). Contrary to the
one-dimensional setting, in space dimension d = 2 and for sufficiently regular initial
data, incompressible, inviscid Euler flow solutions do not form shocks. To construct
a reference solution, we approximate the ensemble average by 1-dimensional Gauss–
Legendre quadrature (using 20 nodes) and a fine discretization in space and time. This
is sufficient to accurately resolve the mean of the statistical solution. This solution,
computed with a space discretization on 10242 equal sized cells, is used as a reference
solution for the error convergence analysis of the SLMC and MLMC Finite Volume
discretization error for the 1-parametric random initial data. Simulations of individual
solutions are performed up to final time T = 1. We compare SLMC and MLMC
approximations. We select the sample numbers on the discretization levels so that
the sampling error and the discretization errors remain balanced. Due to the absence
of boundary layers, for periodic boundary conditions, and of shocks in solutions of the
limiting problem, we are in the setting of Assumption 1, with s = 1. Then, the SLMC error behaves like O(M^{−1/2}) + O(h_ℓ) with O(·) independent of ν. A sufficient choice of the sample numbers for a first-order numerical scheme on individual solutions is M = h_ℓ^{−2}. For MLMC, with the choice M_ℓ = 2^{2s(L−ℓ)} we achieve an asymptotic error bound of O(h_L log(h_L)). On the finest meshes we choose M_L = 10 samples in order to remove sampling fluctuations. Concerning the computational work, the computational cost of a single deterministic simulation behaves like W_DET ∼ h_L^{−3} (in two spatial dimensions and one temporal dimension). We remark that Multigrid methods allow for implicit time-stepping for the viscous part and for the velocity reconstruction in work and memory of O(h_L^{−2}) per time step. For SLMC, we perform O(h_L^{−2}) deterministic runs. This yields a scaling of the overall work of W_SLMC ∼ h_L^{−5}. With MLMC we require M_ℓ = O(h_ℓ^{2s}/h_L^{2s}) simulations per level, for a total work of

W_MLMC ∼ Σ_{ℓ=0}^{L} h_ℓ^{−3} h_ℓ^{2s}/h_L^{2s} = h_L^{−2} Σ_{ℓ=0}^{L} h_ℓ^{−1} ≈ h_L^{−3},

neglecting the logarithmic term. That is, for SLMC with the mentioned choices of sample numbers M, we obtain W_SLMC^{−1/5} ∼ Err_SLMC, whereas for MLMC, W_MLMC^{−1/3} ∼ Err_MLMC (see Fig. 6). From the discussion above and from the numerical results, SLMC has prohibitive complexity for small space and time steps. As predicted by the
theoretical analysis, MLMC exhibits, in terms of work vs. accuracy, a performance
which is comparable to that of one individual numerical solution on the finest mesh.
As in the one-dimensional setting, for the computation of the error, a sample of
K = 10 experiments was generated and the error is estimated by the sample average.
The number K of repetitions of experiments is chosen in such a way that the variance
of the relative error is sufficiently small.

Acknowledgments The research of Ch. S. and A. B. is partially supported under ERC AdG 247277.
The research of J. Š. was supported by ETH CHIRP1-03 10-1 and CSCS production project ID
S366. The research of A.B. leading to these results has further received funding from the German
Research Foundation (DFG) as part of the Cluster of Excellence in Simulation Technology (EXC
310/2) at the University of Stuttgart, and it is gratefully acknowledged. The research of A. B. and
J. Š. partially took place at the Seminar für Angewandte Mathematik, ETH Zürich. The authors
thank S. Mishra and F. Leonardi for agreeing to cite numerical tests from [11] in space dimension
d = 2.

References

1. ALSVID-UQ. Version 3.0. http://www.sam.math.ethz.ch/alsvid-uq


2. Abdulle, A., Barth, A., Schwab, Ch.: Multilevel Monte Carlo methods for stochastic elliptic
multiscale PDEs. Multiscale Model. Simul. 11(4), 1033–1070 (2013)
3. Bahouri, H., Chemin, J.-Y., Danchin, R.: Fourier Analysis and Nonlinear Partial Differen-
tial Equations. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of
Mathematical Sciences], vol. 343. Springer, Heidelberg (2011)
4. Barth, A., Lang, A.: Multilevel Monte Carlo method with applications to stochastic partial
differential equations. Int. J. Comput. Math. 89(18), 2479–2498 (2012)

5. Barth, A., Lang, A.: Simulation of stochastic partial differential equations using finite element
methods. Stochastics 84(2–3), 217–231 (2012)
6. Barth, A., Schwab, Ch., Šukys, J.: Multilevel Monte Carlo approximations of statistical solu-
tions of the Navier–Stokes equations. Research report 2013-33, Seminar for Applied Mathe-
matics, ETH Zürich (2013)
7. Foiaş, C., Manley, O., Rosa, R., Temam, R.: Navier-Stokes equations and turbulence. Encyclo-
pedia of Mathematics and its Applications, vol. 83. Cambridge University Press, Cambridge
(2001)
8. Foiaş, C., Rosa, R., Temam, R.: Properties of time-dependent statistical solutions of the three-
dimensional Navier-Stokes equations. Annales de l’Institute Fourier 63(6), 2515–2573 (2013)
9. Heywood, J.G., Rannacher, R.: Finite element approximation of the nonstationary Navier-
Stokes problem. I. Regularity of solutions and second-order error estimates for spatial dis-
cretization. SIAM J. Numer. Anal. 19(2), 275–311 (1982)
10. Karlsen, K.H., Koley, U., Risebro, N.H.: An error estimate for the finite difference approxima-
tion to degenerate convection-diffusion equations. Numer. Math. 121(2), 367–395 (2012)
11. Leonardi, F., Mishra, S., Schwab, Ch.: Numerical Approximation of Statistical Solutions of
Incompressible Flow. Research report 2015-27, Seminar for Applied Mathematics, ETH Zürich
(2015)
12. LeVeque, R.: Numerical Solution of Hyperbolic Conservation Laws. Cambridge Press, Cam-
bridge (2005)
13. Mishra, S., Schwab, Ch., Šukys, J.: Multi-level Monte Carlo Finite Volume methods for non-
linear systems of conservation laws in multi-dimensions. J. Comput. Phys. 231(8), 3365–3388
(2012)
14. Rosa (Cray XE6). Swiss National Supercomputing Center (CSCS), Lugano. http://www.cscs.ch
15. Šukys, J., Mishra, S., Schwab, Ch.: Static load balancing for Multi-Level Monte Carlo finite
volume solvers. PPAM 2011, Part I, LNCS, vol. 7203, pp. 245–254. Springer, Heidelberg
(2012)
16. Temam, R.: Navier-Stokes equations and nonlinear functional analysis. CBMS-NSF Regional
Conference Series in Applied Mathematics, vol. 41. Society for Industrial and Applied Math-
ematics (SIAM), Philadelphia (1983)
17. Yudovič, V.I.: A two-dimensional non-stationary problem on the flow of an ideal incompressible
fluid through a given region. Mat. Sb. (N.S.) 64(106), 562–588 (1964)
Unbiased Simulation of Distributions
with Explicitly Known Integral Transforms

Denis Belomestny, Nan Chen and Yiwei Wang

Abstract In this paper, we propose an importance-sampling based method to obtain


unbiased estimators to evaluate expectations involving random variables whose prob-
ability density functions are unknown while their Fourier transforms have explicit
forms. We give a general principle for how to choose an appropriate importance sampling density under various Lévy processes. Compared with existing methods,
our method avoids time-consuming numerical Fourier inversion and can be applied
effectively to high dimensional option pricing under different models.

Keywords Monte Carlo · Unbiased simulation · Fourier transform · Lévy processes · Importance sampling

1 Introduction

Nowadays, Monte Carlo (MC) simulation has become an influential tool in financial applications such as derivative pricing and risk management; see Glasserman [12] for a
comprehensive overview, Staum [25] and Chen and Hong [8] for introductory tuto-
rials of the topic. A standard MC procedure typically starts with using some general
methods of random number generation, such as inverse transform and acceptance-
rejection, to sample from descriptive probabilistic distributions of market variables.

D. Belomestny (B)
Duisburg-Essen University, Thea-Leymann-Str. 9, Essen, Germany
e-mail: denis.belomestny@uni-due.de
D. Belomestny
IITP RAS, Moscow, Russia
N. Chen · Y. Wang
The Chinese University of Hong Kong, Hong Kong, China
e-mail: nchen@se.cuhk.edu.hk
Y. Wang
e-mail: ywwang@se.cuhk.edu.hk


Therefore, explicit knowledge about the functional forms of the underlying distributions is a prerequisite for the application of the MC technique.
However, a growing literature of Lévy-driven processes and their applications in
finance calls for research to investigate how to simulate from a distribution whose
cumulative probability function or probability density function may not be avail-
able in explicit form. As an important building block of asset price modeling, Lévy
processes can capture well discontinuous price changes and thus are widely used to
model the skewness/smile of implied volatility curves in the option market; see, e.g.
Cont and Tankov [26], for the modeling issues of Lévy processes. According to the
celebrated Lévy-Khintchine representation, the joint distribution of the increments
of a Lévy process is analytically characterized by its Fourier transform. Utilizing this
fact, we can evaluate the price function of options written on the underlying assets
modelled by a Lévy process in two steps. First, we apply the Fourier transform (with
some suitable adjustments) on the risk-neutral presentation of option prices in order
to obtain an explicit form of the transformed price function. Second, we numerically
invert the transform to recover the original option price. This research line can be
traced back to Carr and Madan [5], which proposed Fast Fourier Transform (FFT) to
accelerate the computational speed of the method. One may also refer to Lewis [20],
Lee [19], Lord and Kahl [21] and Kwok et al. [17] for more detailed discussion and
extension of FFT. Kou et al. [16] used a trapezoidal rule approximation developed
by Abate and Whitt [1] to invert Laplace transforms for the purpose of option pricing
under the double exponential jump diffusion model, a special case of Lévy process.
Feng and Linetsky [11] introduced Hilbert transform to simplify Fourier transform
of discretely monitored barrier option by backward induction. More recently, Biagini
et al. [4] and Hurd and Zhou [14] extended the Fourier-transform based method to
price options on several assets, including basket options, spread options and catastro-
phe insurance derivatives.
The numerical inversion of Fourier transforms turns out to be the computational
bottleneck of the above approach. It essentially involves using a variety of numer-
ical discretization schemes to evaluate one- or multi-dimensional integrals. Hence,
such methods will suffer seriously from the “curse of dimensionality” as the prob-
lem dimension increases when we try to price options written on multiple assets.
Monte Carlo method, as a competitive alternative for calculating integrals in a high
dimensional setting, thus becomes a natural choice in addressing this difficulty. To
overcome the barrier that explicit forms of the distribution functions for Lévy driven
processes are absent, some early literature relies on somewhat ad hoc techniques to
derive upper bounds for the underlying distribution for the purpose of applying the
acceptance-rejection principle (see, e.g. Glynn [2] and Devroye [18]). More recently,
some scholars, such as Glasserman and Liu [13] and Chen et al. [9], proposed to
numerically invert the transformed distributions to tabulate the original distribution on a uniform grid so that they can simulate from it. Both directions work well in one dimension. Nevertheless, they are difficult to extend to the simulation of high-dimensional distributions.
In this paper we propose a novel approach for computing high-dimensional inte-
grals with respect to distributions with explicitly known Fourier transforms based on
a genuine combination of Fourier and Monte Carlo techniques. In order to illustrate


the main idea of our approach, let us first consider a simple problem of computing
expectations with respect to one-dimensional stable distributions. Let pα (x) be the
density of a random variable X having a symmetric stable law with the stability
parameter α ∈ (1, 2), i.e., its Fourier transform is
F[p_α](u) := ∫_{−∞}^{∞} e^{iux} p_α(x) dx = exp(−|u|^α).

Suppose we want to compute the expectation Q = E[g(X)] for some nonnegative function g. Since there are several algorithms for sampling from a stable distribution (see, e.g. Chambers et al. [7]), we could use Monte Carlo to construct the estimate

Q_n = (1/n) Σ_{i=1}^{n} g(X_i),

where X1 , . . . , Xn is an i.i.d. sample from the corresponding α-stable distribution.


Recall that in the theory of Fourier transform, we have Parseval’s identity (see,
e.g. Rudin [23]) such that for ∀g and p,
 
1
g(x)p(x)dx = F [g](−u)F [p](u)du.
Rd (2π )d Rd

Take, for example, g(x) = (max{x, 0})^β with some β ∈ (0, α); then Parseval's identity implies

Q = ∫_{−∞}^{∞} (g(x)/x) [x · p_α(x)] dx = (1/2π) ∫_{−∞}^{∞} F[x · p_α(x)](−u) F[g(x)/x](u) du.

According to

F[x · p_α(x)](−u) = i (d/du) F[p_α](−u) = −i sign(u) α |u|^{α−1} exp(−|u|^α)

and

F[g(x)/x](u) = (Γ(β)/|u|^β) (cos(βπ/2) + i sign(u) sin(βπ/2)),

we have

Q = (α Γ(β) sin(βπ/2)/π) ∫_0^{∞} u^{α−β−1} exp(−u^α) du.   (1)

Consider a new random variable X' with the power exponential distribution density

f_α(x) = (1/(2Γ(1 + 1/α))) exp(−|x|^α),   −∞ < x < +∞,

and a new function g'(x) such that

g'(x) = (Γ(1/α) Γ(β) sin(βπ/2)/π) |x|^{α−β−1},   −∞ < x < +∞;

then we can easily show from (1) that Q = E[g'(X')]. If β = α − 1, noting that g' is in fact a constant function, we have Var[g'(X')] = 0. On the other hand, Var[g(X)] > B(2 − α)^{−1} for some constant B > 0 (not depending on α).
This shows that even in the above very simple situation, moving to the Fourier
domain can significantly reduce the variance of Monte Carlo estimates. More impor-
tantly, by using our approach, we replace the problem of sampling from the stable
distribution pα by a much simpler problem of drawing from the exponential power
distribution fα . Of course, the main power of Monte Carlo methods can be observed in
high-dimensional integration problems, which will be considered in the next section.
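The comparison is easy to reproduce numerically. The sketch below (parameter values, sample size, and the use of scipy's stable sampler are illustrative assumptions) contrasts plain Monte Carlo under p_α with the Fourier-domain estimator drawing from f_α; with β = α − 1 the latter has zero variance, as noted above.

```python
import numpy as np
from scipy.special import gamma as Gamma
from scipy.stats import levy_stable

alpha, beta, n = 1.5, 0.5, 10_000        # beta = alpha - 1, so g' is constant
rng = np.random.default_rng(0)

# Plain Monte Carlo: Q ~ mean of g(X_i) = (max(X_i, 0))**beta, X_i symmetric
# alpha-stable with Fourier transform exp(-|u|**alpha) (scipy: beta=0, scale=1).
x = levy_stable.rvs(alpha, 0.0, size=n, random_state=rng)
g = np.maximum(x, 0.0) ** beta
print("plain MC:  ", g.mean(), "+/-", g.std(ddof=1) / np.sqrt(n))

# Fourier-domain estimator: X' ~ f_alpha, sampled via |X'|**alpha ~ Gamma(1/alpha, 1)
# and an independent random sign; g'(x) as in the text.
w = rng.gamma(shape=1.0 / alpha, scale=1.0, size=n)
xp = np.where(rng.random(n) < 0.5, -1.0, 1.0) * w ** (1.0 / alpha)
gp = (Gamma(1.0 / alpha) * Gamma(beta) * np.sin(beta * np.pi / 2.0) / np.pi
      ) * np.abs(xp) ** (alpha - beta - 1.0)
print("Fourier MC:", gp.mean(), "+/-", gp.std(ddof=1) / np.sqrt(n))
```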

2 General Framework

Let g be a real-valued function on R^d and let p be a probability density on R^d. Our aim is to compute the integral of g with respect to p:

V = ∫_{R^d} g(x) p(x) dx.

Suppose that there is a vector R ∈ R^d such that

g(x) e^{−⟨x,R⟩} ∈ L¹(R^d),   p(x) e^{⟨x,R⟩} ∈ L¹(R^d);

then we have, by Parseval's formula,

V = ∫_{R^d} g(x) e^{−⟨x,R⟩} · p(x) e^{⟨x,R⟩} dx = (1/(2π)^d) ∫_{R^d} F[g](iR − u) F[p](u − iR) du.   (2)

Let q be a probability density function with the property that q(u) > 0 whenever |F[p](u − iR)| ≠ 0, with | · | denoting the complex modulus; that is, q has the same support as |F[p](u − iR)|. Then we can write

V = (1/(2π)^d) ∫_{R^d} F[g](iR − u) (F[p](u − iR)/q(u)) q(u) du = E_q[h(X)],   (3)

where

h(x) = (1/(2π)^d) F[g](iR − x) F[p](x − iR)/q(x),

and X is a random variable distributed according to q.


The variance of the corresponding Monte Carlo estimator is given by
 2d 
1 |F [p](u − iR)|2
Varq [h(X)] = |F [g](iR − u)|2 du − V 2 .
2π Rd q(u)

Note that the function |F[p](u − iR)| is, up to a constant, a probability density, and in order to minimize the variance we need to find a density q that minimizes the ratio

|F[p](u − iR)| / q(u)
and that we are able to simulate from. In the next section, we discuss how to get a
tight upper bound for |F [p](iR − u)| in the case of an infinitely divisible distribution
p, corresponding to the marginal distributions of Lévy processes. Such a bound can then be used to find a density q leading to small values of the variance Var_q[h(X)].
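The estimator (3) needs only the two transforms, the density q and a sampler for it. A generic one-dimensional sketch follows; the callables and the convention of reporting the real part of h are assumptions of the illustration, not part of the framework above.

```python
import numpy as np

def fourier_is_estimate(fg, fp, q, sample_q, R, n, rng, d=1):
    """Monte Carlo evaluation of Eq. (3): V = E_q[h(X)] with
    h(x) = (2*pi)**(-d) * fg(1j*R - x) * fp(x - 1j*R) / q(x).
    fg, fp    : callables returning F[g] and F[p] at complex arguments,
    q         : importance sampling density (vectorized),
    sample_q(n, rng) : draws n points from q.
    The real part is reported since V is real while h may be complex."""
    x = sample_q(n, rng)
    h = np.real(fg(1j * R - x) * fp(x - 1j * R) / q(x)) / (2.0 * np.pi) ** d
    return h.mean(), h.std(ddof=1) / np.sqrt(n)
```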

3 Lévy Processes

Let (Z_t) be a pure-jump d-dimensional Lévy process with characteristic exponent ψ, that is,

E[e^{i⟨u,Z_t⟩}] = e^{−tψ(u)},   u ∈ R^d.

Consider the process X_t = ΛZ_t, where Λ is a real m × d matrix. Let a vector R ∈ R^m be such that ν_R(dz) := e^{⟨Λ*R, z⟩} ν(dz) is again a Lévy measure, i.e.

∫ (|z|² ∧ 1) ν_R(dz) < ∞.

Suppose that there exist a constant C_ν > 0 and a real number α ∈ (0, 2) such that, for sufficiently small ρ > 0, the following estimate holds:

∫_{{z ∈ R^d : |⟨z,h⟩| ≤ ρ}} ⟨z, h⟩² ν_R(dz) ≥ C_ν ρ^{2−α},   h ∈ R^d, |h| = 1.   (4)

The above condition is known as Orey's condition in the literature (see Sato [24]). It is usually used to ensure that the process admits continuous transition densities. The value α is called the Blumenthal–Getoor index of the process. Under this condition, we have

Lemma 1 Suppose that (4) holds. Then there exists a constant A_R > 0 such that, for any u ∈ R^m and sufficiently large |Λ*u|,
 
|F[p_t](u − iR)| ≤ A_R exp(−(2tC_ν/π²) |Λ*u|^α),   (5)

where p_t is the density of X_t.


Proof For any u ∈ Rm , we have
  
Λ∗ R,z
 ∗ ∗
|F [pt ](u − iR)| = exp −t 1−e cos Λ u, z + Λ R, z1{|z|≤1} ν(dz)
Rd
  

= exp −t 1 − eΛ R,z + Λ∗ R, z1{|z|≤1} ν(dz)
Rd
  
∗  
× exp −t eΛ R,z 1 − cos Λ∗ u, z ν(dz)
Rd
  
 
= AR exp −t 1 − cos Λ∗ u, z νR (dz) ,
Rd

where
  
 Λ∗ R,z
AR = exp t e − 1 − Λ∗ R, z1{|z|≤1} ν(dz) < ∞,
Rd

since


Λ R,z


e − 1 − Λ∗ R, z1{|z|≤1}
≤ C1 (Λ∗ R) |z|2 1{|z|≤1} + C2 (Λ∗ R)eΛ R,z 1{|z|>1} .

First, note that the condition (4) is equivalent to the following one

z, k2 νR (dz) ≥ Cν |k|α ,
{z∈R:|z,k|≤1}

for sufficiently large k ∈ Rd , say |k| ≥ c0 . To see this, it is enough to change in (4)
the vector h to the vector ρk. Fix u ∈ Rm with |u| ≥ 1 and |Λ∗ u| ≥ c0 , then using
the inequality 1 − cos(x) ≥ π22 |x|2 , |x| ≤ π, we find
 
  2
1 − cos Λ∗ u, z νR (dz) ≥ 2 Λ∗ u, z2 νR (dz)
Rd π {z∈R:|Λ∗ u,z|≤1}
2Cν

α
≥ 2
Λ∗ u
.
π


Lemma 1 provides a general guideline for how to choose the importance sampling density q used in our unbiased simulation. Note that, after a proper rescaling, the function on the right-hand side of the inequality (5) gives us the probability density of a power exponential distribution. Hence, letting
 
q(u) := C exp(−(2tC_ν/π²) |Λ*u|^α),

we know from Lemma 1 that our simulation scheme will have a finite variance.
Discussion
The condition (4) is not very restrictive. We can show that it is true for many com-
monly used Lévy models in financial applications, such as CGMY, NIG and α-stable
models. Below we discuss a special case, which can be viewed as a generalization
of α-stable processes.
For simplicity we take R = 0. Clearly, if (Zt ) is a d-dimensional α-stable process
which is rotation invariant (ψ(h) = cα |h|α , for h ∈ Rd ), then (4) holds. Consider now
general α-stable processes. It is known that Z is α-stable if and only if its components
Z¹, . . . , Z^d are α-stable and if the Lévy copula C of Z is homogeneous of order 1 (see Cont and Tankov [26]), i.e.

C (r · ξ1 , . . . , r · ξd ) = r C (ξ1 , . . . , ξd )

for all ξ = (ξ_1, . . . , ξ_d) ∈ R^d and r > 0. As an example of such a homogeneous Lévy copula one can consider

C(ξ_1, . . . , ξ_d) = 2^{2−d} (Σ_{j=1}^{d} |ξ_j|^{−θ})^{−1/θ} (η 1_{ξ_1·...·ξ_d ≥ 0} − (1 − η) 1_{ξ_1·...·ξ_d < 0}),

where θ > 0 and η ∈ [0, 1]. If the marginal tail integrals given by

Π_j(x_j) = ν(R, . . . , I(x_j), . . . , R) sgn(x_j)

with

I(x) = (x, ∞) for x ≥ 0,   I(x) = (−∞, x] for x < 0,

are absolutely continuous, we can compute the Lévy measure ν for the Lévy copula C by differentiation as follows:

ν(dx_1, . . . , dx_d) = ∂_1 · · · ∂_d C|_{ξ_1=Π_1(x_1), . . . , ξ_d=Π_d(x_d)} ν_1(dx_1) · . . . · ν_d(dx_d),

where ν1 (dx1 ), . . . , νd (dxd ) are the marginal Lévy measures.


Suppose that the marginal Lévy measures are absolutely continuous with a stable-like behaviour:

ν_j(dx_j) = k_j(x_j) dx_j = (l_j(|x_j|)/|x_j|^{1+α}) dx_j,   j = 1, . . . , d,

where l_1, . . . , l_d are some nonnegative, bounded, nonincreasing functions on [0, ∞) with l_j(0) > 0 and α ∈ [0, 2]. Then

ν(dx_1, . . . , dx_d) = G(Π_1(x_1), . . . , Π_d(x_d)) k_1(x_1) · . . . · k_d(x_d) dx_1 . . . dx_d

with G(ξ_1, . . . , ξ_d) = ∂_1 · · · ∂_d C|_{ξ_1, . . . , ξ_d}. Note that for any r > 0,

k_j(r x_j) = r^{−1−α} k̄_j(x_j, r),   Π_j(r x_j) = r^{−α} Π̄_j(x_j, r),   j = 1, . . . , d,

where

k̄_j(x_j, r) = l_j(r x_j)/|x_j|^{1+α},   Π̄_j(x_j, r) = 1_{x_j ≥ 0} ∫_{x_j}^{∞} k̄_j(s, r) ds + 1_{x_j < 0} ∫_{−∞}^{x_j} k̄_j(s, r) ds.

Since the function G is homogeneous of order 1 − d, we get, for ρ ∈ (0, 1),

∫_{{z : |⟨z,h⟩| ≤ ρ}} ⟨z, h⟩² ν(dz)
   = ρ^{2−α} ∫_{{y : |⟨y,h⟩| ≤ 1}} ⟨y, h⟩² G(Π̄_1(y_1, ρ), . . . , Π̄_d(y_d, ρ)) k̄_1(y_1, ρ) · . . . · k̄_d(y_d, ρ) dy_1 . . . dy_d
   ≥ ρ^{2−α} ∫_{{y : |⟨y,h⟩| ≤ 1}} ⟨y, h⟩² G(Π̄_1(y_1, 1), . . . , Π̄_d(y_d, 1)) k̄_1(y_1, 1) · . . . · k̄_d(y_d, 1) dy_1 . . . dy_d,

and the condition (4) holds, provided

inf_{h : |h|=1} ∫_{{z : |⟨z,h⟩| ≤ 1}} ⟨z, h⟩² ν(dz) > 0.

If for some R = (R_1, . . . , R_d) the functions e^{xR_i} l_i(x), i = 1, . . . , d, are bounded, then the condition (4) holds for ν_R(dz) := e^{⟨R,z⟩} ν(dz).
Of course, the power exponential distribution may not be a proper candidate for
q(u) if the condition (4) fails to hold. Nevertheless, we need to stress that the principle
behind Parseval’s identity still applies here and thus our unbiased simulation should
work in that case.
For example, for the variance gamma process X_t with parameters θ and σ (the drift and volatility of the Brownian motion) and κ (the variance of the subordinator), the Fourier transform is

E[e^{iuX_t}] = (1 + u²σ²κ/2 − iθκu)^{−t/κ}.

There exists some constant 1 < α < 2t/κ, provided 2t/κ > 1, such that

|E[e^{iuX_t}]| < 1/(1 + |u|)^α

for sufficiently large |u|, so we can use the power density

q(u) = (α − 1)/(2(1 + |u|)^α)

as our importance sampling density.


We leave the investigation of the variance properties of the estimator when the condition (4) is not satisfied to future research.

4 Positive Definite Densities

Let p be a probability density on R^d which is positive definite. For example, all symmetric, infinitely divisible, absolutely continuous distributions have positive definite densities. Let furthermore g be a nonnegative integrable function on R^d. Suppose that we want to compute the expectation

V = E_p[g(X)] = ∫_{R^d} g(x) p(x) dx.

We have, by Parseval's identity,

V = (1/(2π)^d) ∫_{R^d} F[g](−x) F[p](x) dx.

Note that p*(x) = F[p](x)/((2π)^d p(0)) is a probability density and therefore we have another “dual” representation for V:

V = E_{p*}[g*(X)]

with g*(x) = p(0) F[g](−x).


Let us compare the variances of the random variables g(X) under X ∼ p and g∗ (X)
under X ∼ p∗ . It holds

Var p [g(X)] = g2 (x)p(x) dx − V 2
Rd

and

p(0)
Var p∗ [g∗ (X)] = |F [g](x)|2 F [p](x) dx − V 2
(2π )d Rd

= p(0) (g  g)(x)p(x) dx − V 2 ,
Rd

where

(g ⋆ g)(x) = ∫ g(x − y) g(y) dy.

As a result,

Var_p[g(X)] − Var_{p*}[g*(X)] = ∫_{R^d} (g²(x) − p(0)(g ⋆ g)(x)) p(x) dx.

Note that if p(0) > 0 is small, then it is likely that Var_p[g(X)] > Var_{p∗}[g∗(X)]. This means that estimating V under p∗ with Monte Carlo can be viewed as a variance reduction method in this case. Apart from the variance reduction effect, the density p∗ may have in many cases (for example, for infinitely divisible distributions) a much simpler form than p and is therefore easy to simulate from.
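As a minimal illustration of this dual representation (not part of the original text), the following Python sketch compares the two estimators in dimension one with p the standard normal density, which is positive definite, and g(x) = e^{−|x|}, whose Fourier transform F[g](u) = 2/(1 + u²) is known in closed form. For this particular p one has p∗ = p, so both estimators sample from the same law but average different integrands; the choice of g and the reference value are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6

# Target: V = E_p[g(X)] with p the standard normal density and g(x) = exp(-|x|).
g = lambda x: np.exp(-np.abs(x))
fourier_g = lambda u: 2.0 / (1.0 + u**2)   # F[g](u), known in closed form
p0 = 1.0 / np.sqrt(2.0 * np.pi)            # p(0)

x = rng.standard_normal(n)                 # samples from p (here p* = p)

# Primal estimator: average of g(X) under p.
v_primal = g(x).mean()

# Dual estimator: average of g*(X) = p(0) * F[g](-X) under p*.
v_dual = (p0 * fourier_g(-x)).mean()

print(v_primal, v_dual)   # both should be close to E[exp(-|X|)] ~ 0.523
```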

5 Numerical Examples

5.1 European Put Option Under CGMY Model

The CGMY process {Xt }t≥0 with drift μ is a pure jump process with the Lévy measure
(see Carr et al. [6])
 
νCGMY(x) = C ( exp(Gx)/|x|^{1+Y} 1_{x<0} + exp(−Mx)/x^{1+Y} 1_{x>0} ),   C, G, M > 0, 0 < Y < 2.

As can be easily seen, the Lévy measure νCGMY satisfies the condition (4) with α = Y .
The characteristic function of XT is given by

φ(u) = E[e^{iuXT}] = exp( iμuT + TCΓ(−Y)[(M − iu)^Y − M^Y + (G + iu)^Y − G^Y] ),

where
μ = r − CΓ(−Y)[(M − 1)^Y − M^Y + (G + 1)^Y − G^Y]

ensures that {e−rt eXt }t≥0 is a martingale. Suppose the stock price follows the model

St = S0 eXt ,

then due to (2), for any R < 0, the price of the European put option is given by

e^{−rT} E[(K − ST)^+] = e^{−rT}/(2π) ∫_R F[g](iR − u)F[p](u − iR) du,   (6)

where

F[g](iR − u) = K^{1−R} e^{−iu ln K} / ((iu + R − 1)(iu + R)),   F[p](u − iR) = e^{i(u−iR) ln S0} · E[e^{i(u−iR)XT}].

To ensure the finiteness of F [p](u − iR), we have to select an R such that −G <
R < 0. In fact, under such R,

∫_{|x|≥1} e^{Rx} νCGMY(x) dx < +∞,

which is equivalent to E[eRXT ] < +∞ (see Sato [24], Theorem 25.17). Therefore,

|F [p](u − iR)| ≤ eR ln S0 E[eRXT ] < +∞.

Lemma 1 implies that we can find constants α, A, and θ such that α ≤ Y , A > 0,
θ > 0, and
|F[p](u − iR)| ≤ A e^{−|u|^α/θ}

for sufficiently large u. So the following exponential power density

q(u) = 1/(2θ^{1/α} Γ(1 + 1/α)) e^{−|u|^α/θ}

can be used as the sampling density in (3).


We choose the values of α, θ , and R to minimize the second moment of our
estimator, i.e., we solve the following optimization problem

min_{−G<R<0, θ, α} E_q[ |F[g](iR − U)|² |F[p](U − iR)|² / q²(U) ],   U ∼ q(·).

Since the expectation usually does not have an explicit form, we propose the following stochastic optimization algorithm to solve the problem.
Step 1 Noting that W := |U|^α is gamma distributed with the density

qW(w) = 1/(θ^{1/α} Γ(1/α)) w^{1/α − 1} e^{−w/θ},   w > 0,

we first generate n i.i.d. samples Wi ∼ Γ(1/α, θ) and i.i.d. samples Ri which have equal probability to be 1 or −1. Then Ui = Wi^{1/α} Ri have the common distribution function q.
Step 2 Obtain the optimal parameters by

arg min_{−G<R<0, θ, α} (1/N) Σ_{i=1}^{N} |F[g](iR − Ui)|² |F[p](Ui − iR)|² / q²(Ui).
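The following Python sketch (an illustration only, not the authors' code) shows how Step 1 can be implemented and how the empirical second moment of Step 2 can be evaluated for one candidate (R, α, θ), using the CGMY transforms F[g] and F[p] given above; the parameter values are taken from the example below, and the outer minimization over (R, α, θ) could then be handed to any standard optimizer.

```python
import numpy as np
from scipy.special import gamma as Gamma

# CGMY parameters from the numerical example below
C, G, M, Y, r, S0, K, T = 1.0, 5.0, 5.0, 0.5, 0.1, 100.0, 100.0, 1.0
mu = r - C * Gamma(-Y) * ((M - 1)**Y - M**Y + (G + 1)**Y - G**Y)

def phi(v):
    # characteristic function E[exp(i v X_T)] of the CGMY process with drift mu
    return np.exp(1j * mu * v * T
                  + T * C * Gamma(-Y) * ((M - 1j*v)**Y - M**Y + (G + 1j*v)**Y - G**Y))

def F_g(u, R):   # F[g](iR - u) for the put payoff
    return K**(1 - R) * np.exp(-1j * u * np.log(K)) / ((1j*u + R - 1) * (1j*u + R))

def F_p(u, R):   # F[p](u - iR)
    return np.exp(1j * (u - 1j*R) * np.log(S0)) * phi(u - 1j*R)

def q_density(u, alpha, theta):
    return np.exp(-np.abs(u)**alpha / theta) / (2 * theta**(1/alpha) * Gamma(1 + 1/alpha))

def sample_q(n, alpha, theta, rng):
    # Step 1: |U|^alpha ~ Gamma(1/alpha, theta); the sign is a fair coin flip
    w = rng.gamma(shape=1/alpha, scale=theta, size=n)
    return w**(1/alpha) * rng.choice([-1.0, 1.0], size=n)

def second_moment(R, alpha, theta, n=10**5, seed=0):
    # Step 2: empirical second moment of the integrand under q
    rng = np.random.default_rng(seed)
    u = sample_q(n, alpha, theta, rng)
    ratio = np.abs(F_g(u, R))**2 * np.abs(F_p(u, R))**2 / q_density(u, alpha, theta)**2
    return ratio.mean()

print(second_moment(R=-4.6, alpha=0.49, theta=0.36))
```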

We use the parameters C = 1, G = 5, M = 5, Y = 0.5, r = 0.1, S0 = K =


100, T = 1 from Feng and Lin [10] to calculate the price of the European put option.
The option price obtained via numerical integration was 10.2967. All numerical
experiments were conducted on a PC equipped with Intel Core i5 CPU at 2.50 GHz
with 8 GB RAM. Our numerical results are shown in Table 1 and compared with the
results obtained by P–T method (given by Poirot and Tankov [22]) and K–M method
(given by Kawai and Masuda [15]). The results show that our proposed scheme is
more efficient in option pricing.
Here we choose initial point R = −5, α = Y , θ = −1/TCΓ (−Y ) and repeat
the above optimization scheme until the optimal solution converges. To assess how
sensitive the simulation efficiency is with respect to the choice of (R, α, θ ), we also
run two more arbitrarily chosen value sets for the parameters, as shown in Tables 2
and 3. The performance is still better than existing methods in terms of RMSE.
5.2 European Put Option Under NIG Model

The NIG (Normal Inverse Gaussian) Lévy process can be constructed by subordinating Brownian motion with an Inverse Gaussian process (see Barndorff-Nielsen [3]):

Table 1 Put option in CGMY model (R = −4.6, α = 0.49, θ = 0.36)

No. of simulation   Price     95 %-interval        RMSE     Time (s)
100,000             10.3073   [10.2896, 10.3251]   0.0091   0.06
400,000             10.2999   [10.2910, 10.3088]   0.0045   0.27
1,600,000           10.2970   [10.2926, 10.3014]   0.0023   1.05
100,000 (P–T)       11.6421   [9.3455, 13.9387]    3.5172   0.03
100,000 (K–M)       10.2938   [10.2016, 10.3861]   0.0471   13096.13

Table 2 Put option in CGMY model (R = −1.5, α = 0.49, θ = 0.36)

No. of simulation   Price     95 %-interval        RMSE     Time (s)
100,000             10.3074   [10.2705, 10.3444]   0.0188   0.07
400,000             10.2990   [10.2805, 10.3175]   0.0094   0.30
1,600,000           10.2958   [10.2866, 10.3050]   0.0047   1.07

Table 3 Put option in CGMY model (R = −3, α = 0.4, θ = 0.6)

No. of simulation   Price     95 %-interval        RMSE     Time (s)
100,000             10.3062   [10.2614, 10.3511]   0.0229   0.07
400,000             10.2925   [10.2701, 10.3150]   0.0114   0.29
1,600,000           10.2961   [10.2849, 10.3073]   0.0057   1.08

Xt(a, β, δ) = βTt(ν, δ) + W(Tt(ν, δ)),

where a = √(β² + ν²) and Tt(ν, δ) is the Inverse Gaussian Lévy process defined by Tt(ν, δ) = inf{s > 0 : νs + Bs = δt}. We have

E[e^{iuXt}] = exp( δt ( √(a² − β²) − √(a² − (β + iu)²) ) )

and the corresponding Lévy measure νNIG fulfils the condition (4) with α = 1. Sup-
pose the stock price is modelled by

St = S0 eμt+Xt ,

where
μ = r − q − δ( √(a² − β²) − √(a² − (β + 1)²) )

ensures the martingale condition. Then for any −a − β < R < 0, the price of the European put option is given by

e^{−rT} E[(K − ST)^+] = e^{−rT}/(2π) ∫_R F[g](iR − u)F[p](u − iR) du,

where

F[g](iR − u) = K^{1−R} e^{−iu ln K} / ((iu + R − 1)(iu + R)),   F[p](u − iR) = e^{i(u−iR)(ln S0 + μT)} · E[e^{i(u−iR)XT}].

Lemma 1 implies that one can use the Laplace density

q(u) = 1/(2θ) e^{−|u|/θ}

as the importance sampling density, where the parameter θ can be chosen by mini-
mizing the simulated second moment.
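As a hedged illustration (not the authors' code), the following Python sketch samples from the Laplace density above and assembles the NIG put price estimator from the transforms just given; the parameter values are those of Chen et al. [9] quoted below, and q is assumed to be zero (no dividend yield).

```python
import numpy as np

# NIG parameters used in the example below (assumed test values, dividend yield q = 0)
a, beta, delta, r, q_div, S0, K, T = 15.0, -5.0, 0.5, 0.03, 0.0, 100.0, 100.0, 0.5
mu = r - q_div - delta * (np.sqrt(a**2 - beta**2) - np.sqrt(a**2 - (beta + 1)**2))

def phi(v):
    # E[exp(i v X_T)] for the NIG process (principal branch of the square root)
    return np.exp(delta * T * (np.sqrt(a**2 - beta**2)
                               - np.sqrt(a**2 - (beta + 1j*v)**2 + 0j)))

def F_g(u, R):
    return K**(1 - R) * np.exp(-1j * u * np.log(K)) / ((1j*u + R - 1) * (1j*u + R))

def F_p(u, R):
    return np.exp(1j * (u - 1j*R) * (np.log(S0) + mu * T)) * phi(u - 1j*R)

def put_price(R=-9.3, theta=2.4, n=10**5, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.laplace(loc=0.0, scale=theta, size=n)          # samples from q(u) = e^{-|u|/theta}/(2 theta)
    q = np.exp(-np.abs(u) / theta) / (2 * theta)
    integrand = F_g(u, R) * F_p(u, R) / q
    return np.exp(-r * T) / (2 * np.pi) * integrand.mean().real

print(put_price())   # should be close to the reference value 4.5898 quoted below
```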

Table 4 Put option in NIG model (R = −9.3, θ = 2.4)

No. of simulation   Price    95 %-interval      RMSE     Time (s)   RMSE (direct)   Time (s)
100,000             4.5900   [4.5879, 4.5922]   0.0011   0.06       0.0238          0.04
400,000             4.5896   [4.5886, 4.5907]   0.0006   0.23       0.0119          0.13
1,600,000           4.5897   [4.5891, 4.5902]   0.0003   0.92       0.0059          0.49

Chen et al. [9] used parameters a = 15, β = −5, δ = 0.5, r = 0.03, S0 = K = 100, T = 0.5 to calculate the price of the European put option and obtained the value 4.5898. Table 4 shows numerical results with the same parameters and compares the RMSE and the computational time of our method with those of the method that directly simulates the subordinator.
5.3 Barrier Option Under CGMY Model
The payoff function of a barrier option with m monitoring time points is

(ST − K)^+ 1_{L ≤ St1 , . . . , Stm ≤ U},

where St = S0 e^{Xt} and 0 < t1 < . . . < tm < T. According to (2), the option price is equal to

e^{−rT}/(2π)^{m+1} ∫_{R^{m+1}} F[g](iR − u)F[p](u − iR) du,

where u = (u1 , . . . , um+1 ), R = (0, . . . , 0, R), 1 < R < M,

F[g](iR − u) = e^{(−iu_{m+1} − R + 1) ln K} / ((iu_{m+1} + R − 1)(iu_{m+1} + R)) · (e^{(iu_{m+1} − iu_m + R) ln U} − e^{(iu_{m+1} − iu_m + R) ln L}) / (iu_{m+1} − iu_m + R)
  · (e^{(iu_m − iu_{m−1}) ln U} − e^{(iu_m − iu_{m−1}) ln L}) / (iu_m − iu_{m−1}) · · · (e^{(iu_2 − iu_1) ln U} − e^{(iu_2 − iu_1) ln L}) / (iu_2 − iu_1)

and F[p](u − iR) = e^{iu_1 ln S0} ( ∏_{j=1}^{m} φ_Δ(u_j) ) φ_Δ(u_{m+1} − iR).
We use parameters C = 1, G = 5, M = 5, Y = 1.5, r = 0.1, S0 = 100, K = 100, T = 2, U = 105, L = 95 and calculate the price of the barrier option when there is only one monitoring time at t = 1. The benchmark price calculated by numerical integration is 1.2266. We use the method described in Sect. 2 with the sampling density
density

h(u_1, u_2) = 1/(2θ_1^{1/α_1} Γ(1 + 1/α_1)) e^{−|u_1|^{α_1}/θ_1} · 1/(2θ_2^{1/α_2} Γ(1 + 1/α_2)) e^{−|u_2|^{α_2}/θ_2},

Table 5 Barrier option in CGMY model (R = 1.1, α1 = 1.4, θ1 = 0.9, α2 = 0.7, θ2 = 0.2)
No. of simulation Price 95 %-interval RMSE Time (s)
100,000 1.2235 [1.2164, 1.2305] 0.0036 0.18
400,000 1.2260 [1.2225, 1.2295] 0.0018 0.61
1,600,000 1.2264 [1.2247, 1.2282] 0.0009 2.34

where α1 , α2 ≤ Y . The numerical results are presented in Table 5.

Acknowledgments The research by Denis Belomestny was made in IITP RAS and supported by
Russian Scientific Foundation grant (project N 14-50-00150). The second and third authors are grate-
ful for the financial support of a GRF grant from HK SAR government (Grant ID: CUHK411113).

References

1. Abate, J., Whitt, W.: The Fourier-series method for inverting transforms of probability distributions. Queueing Syst. 10(1–2), 5–87 (1992)
2. Asmussen, S., Glynn, P.W.: Stochastic Simulation: Algorithms and Analysis, vol. 57. Springer
Science & Business Media, New York (2007)
3. Barndorff-Nielsen, O.E.: Processes of normal inverse gaussian type. Financ. Stoch. 2(1), 41–68
(1997)
4. Biagini, F., Bregman, Y., Meyer-Brandis, T.: Pricing of catastrophe insurance options written
on a loss index with reestimation. Insur.: Math. Econ. 43(2), 214–222 (2008)
5. Carr, P., Madan, D.: Option valuation using the fast Fourier transform. J. Comput. Financ. 2(4),
61–73 (1999)
6. Carr, P., Geman, H., Madan, D.B., Yor, M.: The fine structure of asset returns: an empirical
investigation. J. Bus. 75(2), 305–333 (2002)
7. Chambers, J.M., Mallows, C.L., Stuck, B.: A method for simulating stable random variables.
J. Am. Stat. Assoc. 71(354), 340–344 (1976)
8. Chen, N., Hong, L.J.: Monte Carlo simulation in financial engineering. In: Proceedings of the
39th Conference on Winter Simulation, pp. 919–931. IEEE Press (2007)
9. Chen, Z., Feng, L., Lin, X.: Simulating Lévy processes from their characteristic functions and
financial applications. ACM Trans. Model. Comput. Simul. (TOMACS) 22(3), 14 (2012)
10. Feng, L., Lin, X.: Inverting analytic characteristic functions and financial applications. SIAM
J. Financ. Math. 4(1), 372–398 (2013)
11. Feng, L., Linetsky, V.: Pricing discretely monitored barrier options and defaultable bonds in
Lévy process models: a fast Hilbert transform approach. Math. Financ. 18(3), 337–384 (2008)
12. Glasserman, P.: Monte Carlo Methods in Financial Engineering, vol. 53. Springer, New York
(2004)
13. Glasserman, P., Liu, Z.: Sensitivity estimates from characteristic functions. Oper. Res. 58(6),
1611–1623 (2010)
14. Hurd, T.R., Zhou, Z.: A Fourier transform method for spread option pricing. SIAM J. Financ.
Math. 1(1), 142–157 (2010)
15. Kawai, R., Masuda, H.: On simulation of tempered stable random variates. J. Comput. Appl.
Math. 235(8), 2873–2887 (2011)
16. Kou, S., Petrella, G., Wang, H.: Pricing path-dependent options with jump risk via Laplace
transforms. Kyoto Econ. Rev. 74(1), 1–23 (2005)

17. Kwok, Y.K., Leung, K.S., Wong, H.Y.: Efficient options pricing using the fast Fourier transform.
Handbook of Computational Finance, pp. 579–604. Springer, Heidelberg (2012)
18. L’Ecuyer, P.: Non-uniform random variate generations. International Encyclopedia of Statisti-
cal Science, pp. 991–995. Springer, New York (2011)
19. Lee, R.W., et al.: Option pricing by transform methods: extensions, unification and error control.
J. Comput. Financ. 7(3), 51–86 (2004)
20. Lewis, A.L.: A simple option formula for general jump-diffusion and other exponential Lévy
processes. Available at SSRN 282110 (2001)
21. Lord, R., Kahl, C.: Optimal Fourier inversion in semi-analytical option pricing (2007)
22. Poirot, J., Tankov, P.: Monte Carlo option pricing for tempered stable (CGMY) processes.
Asia-Pac. Financ. Mark. 13(4), 327–344 (2006)
23. Rudin, W.: Real and Complex Analysis. Tata McGraw-Hill Education, New York (1987)
24. Sato, K.I.: Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press,
Cambridge (1999)
25. Staum, J.: Monte Carlo computation in finance. Monte Carlo and Quasi-Monte Carlo Methods
2008, pp. 19–42. Springer, New York (2009)
26. Tankov, P.: Financial Modelling with Jump Processes, vol. 2. CRC Press, Boca Raton (2004)
Central Limit Theorem for Adaptive
Multilevel Splitting Estimators
in an Idealized Setting

Charles-Edouard Bréhier, Ludovic Goudenège and Loïc Tudela

Abstract The Adaptive Multilevel Splitting (AMS) algorithm is a powerful and


versatile iterative method to estimate the probabilities of rare events. We prove a
new central limit theorem for the associated AMS estimators, introduced in [5] and recently revisited in [3]—the main result there being (non-
asymptotic) unbiasedness of the estimators. To prove asymptotic normality, we rely
on and extend the technique presented in [3]: the (asymptotic) analysis of an integral
equation. Numerical simulations illustrate the convergence and the construction of
Gaussian confidence intervals.

Keywords Monte-Carlo simulation · Rare events · Multilevel splitting · Central


limit theorem

Mathematics Subject Classification: 65C05 · 65C35 · 60F05

1 Introduction
Many models from physics, chemistry or biology involve stochastic systems for
different purposes: taking into account uncertainty with respect to data parameters,

C.-E. Bréhier (B)


Université Paris-Est, CERMICS (ENPC), 6-8-10 Avenue Blaise Pascal,
Cité Descartes, 77455 Marne-la-vallée, France
e-mail: brehierc@cermics.enpc.fr
C.-E. Bréhier
INRIA Paris-Rocquencourt, Domaine de Voluceau - Rocquencourt,
B.P. 105, 78153 Le Chesnay, France
L. Goudenège
Fédération de Mathématiques de l’École Centrale Paris, CNRS,
Grande voie des Vignes, 92295 Châtenay-Malabry, France
e-mail: goudenege@math.cnrs.fr
L. Tudela
Ensae ParisTech, 3 Avenue Pierre Larousse, 92240 Malakoff, France
e-mail: loic.tudela@ensae-paristech.fr


or allowing for dynamical phase transitions between different configurations of the


system. This phenomenon, often referred to as metastability, is observed, for instance,
when one studies a d-dimensional overdamped Langevin dynamics:

dX_t = −∇V(X_t) dt + √(2β^{−1}) dW_t,

associated with a potential function V with several local minima. Here W denotes a
d-dimensional standard Wiener process. When the inverse temperature β increases,
the transitions become rare events (their probability decreases exponentially fast).
In this paper, we adopt a numerical point of view, and analyze a method which
outperforms a pure Monte-Carlo method for a given computational effort in the small
probability regime (in terms of relative error). Two important families of methods
have been introduced in the 1950s and next have been extensively developed, in order
to efficiently address this rare event estimation problem: importance sampling, and
importance/multilevel splitting—see [11], and [9] for a more recent treatment. We
refer for instance to [12] for a more general presentation.
The method we study in this work is a multilevel splitting algorithm. The main
advantage of this kind of methods is that they are non-intrusive: the model does not
need to be modified in order to obtain a more efficient Monte-Carlo method. The
method we study has an additional feature: adaptive computations (of levels) are
made on-the-fly. To explain more precisely the algorithm and its properties, from
now on we only focus on a simpler, generic setting for the rare event estimation
problem.
Let X be a real random variable, and a be a given threshold. We want to estimate
the tail probability p := P(X > a). The splitting strategy, in the regime when
a becomes large, consists in introducing the following decomposition of p, as a
product of conditional probabilities:

P(X > a) = P(X > an |X > an−1 ) . . . P(X > a2 |X > a1 )P(X > a1 ),

for a sequence of levels a1 < . . . < an−1 < an = a. The common interpretation of
this formula is that the event that X > a is split into n conditional probabilities for X,
which are each much larger than p, and are thus easier to estimate.
To optimize the variance, the levels must be chosen such that all the conditional
probabilities are equal to p 1/n , with n as large as possible. However, levels satisfying
this condition are not known a priori in practical cases.
Notice that, in principle, to apply this splitting strategy, one needs to know how to
sample according to the conditional distributions appearing in the splitting formula.
If this condition holds, we say that we are in an idealized setting.
Adaptive techniques based on multilevel splitting, where the levels are computed
on-the-fly, have been introduced in the 2000s in various contexts, under different
names: Adaptive Multilevel Splitting (AMS) [5–7], Subset simulation [2] and Nested
sampling [13] for instance.

In this paper, we focus on the versions of AMS algorithms studied in [3], following
[5]. Such algorithms depend on two parameters: a number of (interacting) replicas
n, and a fixed integer k ∈ {1, . . . , n − 1}, such that a proportion k/n of replicas are
killed and resampled at each iteration. The version with k = 1 has been studied in
[10], and is also (in the idealized setting) a special case of the Adaptive Last Particle
Algorithm of [14].
A family of estimators ( p̂ n,k )n≥2,1≤k≤n−1 is introduced in [3]—see (2) and (3).
The main property established there is unbiasedness: for all values n and k the
equality E[ p̂ n,k ] = p holds true—note that this statement is not an asymptotic
result. Moreover, an analysis of the computational cost is provided there, in the
regime n → +∞, with fixed k. However, comparisons, when k changes, are made
using a cumbersome procedure: M independent realizations of the algorithm are
necessary to define a new estimator, as an empirical mean of p̂_1^{n,k}, . . . , p̂_M^{n,k}, and
finally one studies the limit when M → +∞. The aim of this paper is to remove this
procedure: we prove directly an asymptotic normality result for the estimator p̂ n,k ,
when n → +∞, with fixed k. Such a result allows to directly rely on asymptotic
Gaussian confidence intervals.
Note that other Central Limit Theorems for Adaptive Multilevel Splitting
estimators (in different parameter regimes for n and k) have been obtained in
[4, 5, 8].
The main result of this paper is Theorem 1: if k and a are fixed, under the assumption that the cumulative distribution function of X is continuous, when n → +∞, the random variable √n( p̂^{n,k} − p) converges in law to a centered Gaussian random variable, with variance −p² log(p) (independent of k).
The main novelty of the paper is the treatment of the case k > 1: indeed when
k = 1 (see [10]) the law of the estimator is explicitly known (it involves a Poisson
random variable with parameter −n log( p)): the asymptotic normality of log( p̂ n,1 ) is
a consequence of straightforward computation, and the central limit theorem for p̂ n,1
easily follows using the delta-method. When k > 1, the law is more complicated
and not explicitly known; the key idea is to prove that the characteristic function
of log( p̂ n,k ) satisfies a functional equation, following the strategy in [3]; the basic
ingredient is a decomposition according to the first step of the algorithm.
One of the main messages of this paper is thus that the functional equation tech-
nique is a powerful tool in order to prove several key properties of the AMS algorithm
in the idealized setting: unbiasedness and asymptotic normality.
The paper is organized as follows. In Sect. 2, we introduce the main objects:
the idealized setting (Sect. 2.1) and the AMS algorithm (Sect. 2.2). Our main result
(Theorem 1) is stated in Sect. 2.3. Section 3 is devoted to the detailed proof of this
result. Finally Sect. 4 contains a numerical illustration of the Theorem.

2 Adaptive Multilevel Splitting Algorithms

2.1 Setting

Let X be a real random variable. We assume that X > 0 almost surely. The aim is
the estimation of the probability p = P(X > a), where a > 0 is a threshold. When
a goes to +∞, p goes to 0. More generally, we introduce the conditional probability
for 0 ≤ x ≤ a
P(x) = P(X > a|X > x). (1)

Note that the quantity of interest satisfies p = P(0); moreover P(a) = 1.


Let F denote the cumulative distribution function of X : F(x) = P(X ≤ x)
∀x ∈ R.
The following standard assumption [3, 5] is crucial for the study in this paper.
Assumption 1 The function F is assumed to be continuous.

2.2 The AMS Algorithm

The algorithm depends on two parameters:


• the number of replicas n ≥ 2;
• the number k ∈ {1, . . . , n − 1} of replicas that are resampled at each iteration.
The other necessary parameters are the stopping threshold a and the initial con-
dition x ∈ [0, a]. On the one hand, in practice, one applies the algorithm with x = 0
to estimate p. On the other hand, introducing an additional variable x for the initial
condition is a key tool for the theoretical analysis of the algorithm.
In the sequel, when a random variable X_i^j is written, the subscript i denotes the index in {1, . . . , n} of a particle, and the superscript j denotes the iteration of the algorithm.
In the algorithm below and in the following, we use classical notations for kth
order statistics. For Y = (Y1 , . . . , Yn ) independent and identically distributed (i.i.d.)
real valued random variables with continuous cumulative distribution function, there
exists almost surely a unique (random) permutation σ of {1, . . . , n} such that Yσ (1) <
. . . < Yσ (n) . For any k ∈ {1, . . . , n}, we then use the classical notation Y(k) = Yσ (k)
to denote the kth order statistics of the sample Y .
We are now in position to describe the Adaptive Multilevel Splitting (AMS)
algorithm.

Algorithm 1 (Adaptive Multilevel Splitting)


Initialization: Define Z 0 = x. Sample n i.i.d. realizations X 10 , . . . , X n0 , with the law
L (X |X > x).

Define Z^1 = X^0_{(k)}, the kth order statistics of the sample X^0 = (X_1^0, . . . , X_n^0), and σ^1 the (a.s.) unique associated permutation: X^0_{σ^1(1)} < . . . < X^0_{σ^1(n)}.
Set j = 1.
Iterations (on j ≥ 1): While Z j < a:
• Conditionally on Z^j, sample k new independent random variables (Y_1^j, . . . , Y_k^j), according to the conditional distribution L(X | X > Z^j).
• Set
    X_i^j = Y^j_{(σ^j)^{−1}(i)}   if (σ^j)^{−1}(i) ≤ k,
    X_i^j = X_i^{j−1}             if (σ^j)^{−1}(i) > k.
In other words, the particle with index i is killed and resampled according to the law L(X | X > Z^j) if X_i^{j−1} ≤ Z^j, and remains unchanged if X_i^{j−1} > Z^j. Notice that the condition (σ^j)^{−1}(i) ≤ k is equivalent to i ∈ {σ^j(1), . . . , σ^j(k)}.
• Define Z^{j+1} = X^j_{(k)}, the kth order statistics of the sample X^j = (X_1^j, . . . , X_n^j), and σ^{j+1} the (a.s.) unique associated permutation: X^j_{σ^{j+1}(1)} < . . . < X^j_{σ^{j+1}(n)}.
• Finally increment j ← j + 1.
End of the algorithm: Define J^{n,k}(x) = j − 1 as the (random) number of iterations. Notice that J^{n,k}(x) is such that Z^{J^{n,k}(x)} < a and Z^{J^{n,k}(x)+1} ≥ a.

For a schematic representation of the algorithm, we refer for instance to [5].


We are now in position to define the estimator p̂^{n,k}(x) of the probability P(x):

p̂^{n,k}(x) = C^{n,k}(x) (1 − k/n)^{J^{n,k}(x)},   (2)

with
C^{n,k}(x) = (1/n) Card{ i ; X_i^{J^{n,k}(x)} ≥ a }.   (3)

When x = 0, to simplify notations we set p̂^{n,k} = p̂^{n,k}(0).
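A minimal Python sketch (not from the paper) of Algorithm 1 and the estimator (2)–(3), specialized to the idealized exponential case used later in Sect. 3: by memorylessness, the conditional law L(X | X > z) is sampled as z plus an independent Exp(1) variable. Function names and parameter choices are illustrative only.

```python
import numpy as np

def ams_estimator(a, n=100, k=10, x=0.0, rng=None):
    """One realization of the AMS estimator p_hat^{n,k}(x) for p = P(X > a),
    with X ~ Exp(1); L(X | X > z) is sampled as z + Exp(1)."""
    rng = np.random.default_rng() if rng is None else rng
    X = x + rng.exponential(size=n)        # n i.i.d. replicas from L(X | X > x)
    J = 0
    Z = np.sort(X)[k - 1]                  # k-th order statistic
    while Z < a:
        # kill and resample the k replicas at or below the current level Z
        idx = np.argsort(X)[:k]
        X[idx] = Z + rng.exponential(size=k)
        J += 1
        Z = np.sort(X)[k - 1]
    C = np.mean(X >= a)                    # proportion of replicas above the threshold
    return C * (1.0 - k / n) ** J

rng = np.random.default_rng(1)
a = 6.0                                    # so that p = exp(-6)
estimates = [ams_estimator(a, n=1000, k=10, rng=rng) for _ in range(200)]
print(np.mean(estimates), np.exp(-a))      # empirical mean vs exact probability
```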

2.3 The Central Limit Theorem

The main result of the paper is the following asymptotic normality statement.
Theorem 1 Under Assumption 1, for any fixed k ∈ N∗ and a ∈ R+ , the following
convergence in distribution holds true:
√n ( p̂^{n,k} − p ) →_{n→+∞} N(0, −p² log(p)).   (4)

Notice that the asymptotic variance does not depend on k. As a consequence of this
result, one can define asymptotic Gaussian confidence intervals, for one realization
of the algorithm and n → +∞. However, the speed of convergence is not known
and may depend on the estimated probability p, and on the parameter k.
Thanks to Theorem 1, we can study the cost of the use of one realization of
the AMS algorithm to obtain a given accuracy when n → +∞. In [3], the cost was
analyzed when using a sample of M independent realizations of the algorithm, giving
an empirical estimator, and the analysis was based on an asymptotic analysis of the
variance in the large n limit.
Let ε be some fixed tolerance error, and α > 0. Denote rα such that P(Z ∈
[−rα , rα ]) = 1 − α, where Z is a standard Gaussian random variable.
Then for n large, an asymptotic confidence interval with level 1 − α, centered around p, is [ p − r_α√(−p² log(p))/√n , p + r_α√(−p² log(p))/√n ]. Then the ε-error criterion | p̂^{n,k} − p| ≤ ε is achieved for n of size −p² log(p) r_α²/ε².
However, on average one realization of the AMS algorithm requires a number of
steps of the order −n log( p)/k, with k random variables sampled at each iteration
(see [3]). Another source of cost is the sorting of the replicas at initialization, and
the insertion at each iteration of the k new sampled replicas in the sorted ensemble
of the non-resampled ones. Thus the cost to achieve an accuracy of size ε is in the large n regime of size n log(n)(−p² log(p)), which does not depend on k.
This cost can be compared with the one when using a pure Monte-Carlo approxi-
mation, with an ensemble of non-interacting replicas of size n: thanks to the Central Limit Theorem, the tolerance criterion error ε is satisfied for n of size p(1 − p)r_α²/ε².
Despite the log(n) factor in the AMS case, the performance is improved since
p 2 log( p) = o( p) when p → 0.

Remark 1 In [3], the authors are able to analyze the effect of the change of k on the
asymptotic variance of the estimator. Here, we do not observe significant differences
when k changes, theoretically and numerically.

3 Proof of the Central Limit Theorem

The proof is divided into the following steps. First, thanks to Assumption 1, we
explain why, in order to theoretically study the statistical behavior of the algorithm,
it is sufficient to study the case when X is distributed according to the exponential
law with parameter 1: P(X > z) = exp(−z) for any z > 0. The second step is
the introduction of the characteristic function of log( p̂ n,k (x)); then, following the
definition of the algorithm, we prove that it is solution of a functional equation with
respect to x, which can be transformed into a linear ODE of order k. Finally, we
study the solution of this ODE in the limit n → +∞.

3.1 Reduction to the Exponential Case

We first recall arguments from [3] which prove that it is sufficient to study the statis-
tical behavior of Algorithm 1 and of the estimator (2) in a special case (Assump-
tion 2 below); the more general result, Theorem 1 (valid under Assumption 1), is
deduced from that special case.
It is sufficient to study the case when the random variable X is exponentially
distributed with parameter 1. This observation is based on a change of variable with
the following function:
Λ(x) = − log(1 − F(x)).   (5)

It is well-known that F(X ) is uniformly distributed on (0, 1) (thanks to the continuity


Assumption 1), and thus Λ(X ) is exponentially distributed with parameter 1. Thanks
to Corollary 3.4 in [3], this property has the following consequence for the study
of the AMS algorithm: the law of the estimator p̂ n,k is equal to the law of q̂ n,k ,
which is the estimator defined, with (2), using the same values of the parameters
n and k, but with two differences. First, the law of the underlying random variable
is the exponential distribution with parameter 1; second, the stopping level a is
replaced
with Λ(a), where
 Λ is defined by (5). Note the following consistency:
E q̂ n,k = exp −Λ(a) = 1 − F(a) = p (by the unbiasedness result of [3]).
Since the arguments are intricate, we do not repeat them here and we refer the
interested reader to [3]; from now on, we thus assume the following.
Assumption 2 Assume that X is exponentially distributed with parameter 1: we
denote L (X ) = E (1).
When Assumption 2 is satisfied, the analysis is simpler and the rest of the paper
is devoted to the proof of the following Proposition 1.

Proposition 1 Under Assumption 2, the following convergence in distribution holds true:

√n ( p̂^{n,k} − p ) →_{n→+∞} N(0, a exp(−2a)).   (6)

We emphasize again that even if the exponential case appears as a specific example
(Assumption 2 obviously implies Assumption 1), giving a detailed proof of Proposi-
tion 1 is sufficient, thanks to Corollary 3.4 in [3], to obtain our main general result
Theorem 1. Since the exponential case is more convenient for the computations below,
in the sequel we work under Assumption 2. Moreover, we abuse notation: we use the
general notations from Sect. 2, even under Assumption 2.
The following notations will be useful:
 
• f (z) = exp(−z)1z>0 (resp. F(z) = 1 − exp(−z) 1z>0 ) is the density (resp. the
cumulative distribution function) of the exponential law E (1) with parameter 1.
  n−k
• f n,k (z) = k nk F(z)k−1 f (z) 1 − F(z) is the density of the kth order statistics
X (k) of a sample (X 1 , . . . , X n ), where the X i are independent and exponentially
distributed, with parameter 1.
252 C.-E. Bréhier et al.

Finally, in order to deal with the conditional distributions L (X |X > x) (which


thanks to Assumption 2 is a shifted exponential distribution x+E (1)) in the algorithm,
we set for any x ≥ 0 and any y ≥ 0

f(y; x) = f(y − x),   F(y; x) = F(y − x),   f_{n,k}(y; x) = f_{n,k}(y − x),
F_{n,k}(y) = ∫_{−∞}^{y} f_{n,k}(z) dz,   F_{n,k}(y; x) = F_{n,k}(y − x).   (7)

Straightforward computations (see also [3]) yield the following useful formulae:

d/dx f_{n,1}(y; x) = n f_{n,1}(y; x),
for k ∈ {2, . . . , n − 1},  d/dx f_{n,k}(y; x) = (n − k + 1)( f_{n,k}(y; x) − f_{n,k−1}(y; x) ).   (8)

3.2 Proof of the Proposition 1

The first important idea is to prove Proposition 1 for all possible initial conditions
x ∈ [0, a], even if the value of interest is x = 0: in fact we prove the convergence
√n ( p̂^{n,k}(x) − P(x) ) →_{n→+∞} N(0, (a − x) exp(−2(a − x))).   (9)

A natural idea is to introduce the characteristic function of p̂ n,k (x), and to fol-
low the strategy developed in [3]. Nevertheless, we are not able to derive a use-
ful functional equation with respect to the x variable. The strategy we adopt is to
study the asymptotic normality of the logarithm log( p̂ n,k (x)) of the estimator, and
to use a particular case of the delta-method (see for instance [15], Sect. 3): if for a sequence of real random variables (θ_n)_{n∈N} and a real number θ ∈ R one has √n(θ_n − θ) →_{n→∞} N(0, σ²), then √n(exp(θ_n) − exp(θ)) →_{n→∞} N(0, exp(2θ)σ²), where convergence is in distribution.
We thus introduce for any t ∈ R and any 0 ≤ x ≤ a

φ_{n,k}(t, x) := E[ exp( it√n ( log( p̂^{n,k}(x)) − log(P(x)) ) ) ].   (10)

We also introduce an additional auxiliary function (using P(x) = exp(x − a))

χ_{n,k}(t, x) := E[ exp( it√n log( p̂^{n,k}(x)) ) ] = exp( it√n(x − a) ) φ_{n,k}(t, x),   (11)

for which Lemma 1 states a functional equation, with respect to the variable x ∈
[0, a]. By Lévy’s Theorem, Proposition 1 is a straightforward consequence (choosing
x = 0) of Proposition 2 below.
Proposition 2 For any k ∈ N∗ , any 0 ≤ x ≤ a and any t ∈ R

φ_{n,k}(t, x) →_{n→+∞} exp( t²(x − a)/2 ).   (12)

The rest of this section is devoted to the statement and the proof of four lemmas,
and finally to the proof of Proposition 2.
Lemma 1 (Functional Equation) For any n ∈ N and any k ∈ {1, . . . , n − 1}, and for any t ∈ R, the function x → χ_{n,k}(t, x) is a solution of the following functional equation (with unknown χ): for any 0 ≤ x ≤ a

χ(t, x) = e^{it√n log(1 − k/n)} ∫_x^a χ(t, y) f_{n,k}(y; x) dy   (13)
        + Σ_{l=0}^{k−1} e^{it√n log(1 − l/n)} P( S(x)^n_{(l)} < a ≤ S(x)^n_{(l+1)} ),   (14)

where (S(x)^n_j)_{1≤j≤n} are i.i.d. with law L(X | X > x) and where S(x)^n_{(l)} is the lth order statistics of this sample (with convention S(x)^n_{(0)} = x).
Proof The idea (like in the proof of Proposition 4.2 in [3]) is to decompose the expectation according to the value of the first level Z^1 = X^0_{(k)}. On the event {Z^1 > a} = {J^{n,k}(x) = 0}, the algorithm stops and p̂^{n,k}(x) = (n − l)/n for the unique l ∈ {0, . . . , k − 1} such that S(x)^n_{(l)} < a ≤ S(x)^n_{(l+1)}. Thus

E[ e^{it√n log( p̂^{n,k}(x))} 1_{J^{n,k}(x)=0} ] = Σ_{l=0}^{k−1} e^{it√n log(1 − l/n)} P( S(x)^n_{(l)} < a ≤ S(x)^n_{(l+1)} ).   (15)

If Z^1 < a, for the next iteration the algorithm restarts from Z^1, and

E[ e^{it√n log( p̂^{n,k}(x))} 1_{J^{n,k}(x)>0} ]
  = E[ e^{it√n log(1 − k/n)} E[ e^{it√n log( C^{n,k}(x)(1 − k/n)^{J^{n,k}(x)−1} )} | Z^1 ] 1_{Z^1<a} ]
  = e^{it√n log(1 − k/n)} E[ E[ e^{it√n log( p̂^{n,k}(Z^1))} | Z^1 ] 1_{Z^1<a} ]
  = e^{it√n log(1 − k/n)} E[ χ_{n,k}(t, Z^1) 1_{Z^1<a} ]
  = e^{it√n log(1 − k/n)} ∫_x^a χ_{n,k}(t, y) f_{n,k}(y; x) dy.   (16)

Then (13) follows from (15), (16) and the definition (11) of χn,k . 

We exploit the functional equation (13) for x → χn,k (t, x), to prove that this
function is solution of a Linear Ordinary Differential Equation (ODE).
Lemma 2 (ODE) Let n and k ∈ {1, . . . , n − 2} be fixed. There exist real numbers μ^{n,k} and (r_m^{n,k})_{0≤m≤k−1}, depending only on n and k, such that for all t ∈ R, the function x → χ_{n,k}(t, x) satisfies the following Linear Ordinary Differential Equation (ODE) of order k: for x ∈ [0, a]

d^k/dx^k χ_{n,k}(t, x) = e^{it√n log(1 − k/n)} μ^{n,k} χ_{n,k}(t, x) + Σ_{m=0}^{k−1} r_m^{n,k} d^m/dx^m χ_{n,k}(t, x).   (17)

The coefficients μ^{n,k} and (r_m^{n,k})_{0≤m≤k−1} satisfy the following properties:

μ^{n,k} = (−1)^k n · · · (n − k + 1),
λ^k − Σ_{m=0}^{k−1} r_m^{n,k} λ^m = (λ − n) · · · (λ − n + k − 1)  for all λ ∈ R.   (18)

Observe that the ODE (17) is linear and that the coefficients are constant (with
respect to the variable x ∈ [0, a], for fixed parameters n, k and t). This nice property
is the main reason why we consider the function χn,k (given by (11)) instead of
φn,k (given by (10)); moreover it is also the reason why we study the characteristic
function of log( p̂ n,k (x)), instead of the one of p̂ n,k (x).

Proof The proof follows the same lines as Proposition 6.4 in [3]. We introduce

Θ_{n,k}(t, x) := Σ_{l=0}^{k−1} e^{it√n log(1 − l/n)} P( S(x)^n_{(l)} < a ≤ S(x)^n_{(l+1)} ).

Then by recursion, using the second line in (8), for 0 ≤ l ≤ k − 1 and for any x ≤ a and t ∈ R

d^l/dx^l ( χ_{n,k}(t, x) − Θ_{n,k}(t, x) ) = μ_l^{n,k} e^{it√n log(1 − k/n)} ∫_x^a χ_{n,k}(t, y) f_{n,k−l}(y; x) dy + Σ_{m=0}^{l−1} r_{m,l}^{n,k} d^m/dx^m ( χ_{n,k}(t, x) − Θ_{n,k}(t, x) ),   (19)

with the associated recursion

μ_0^{n,k} = 1,   μ_{l+1}^{n,k} = −(n − k + l + 1) μ_l^{n,k};
r_{0,l+1}^{n,k} = −(n − k + l + 1) r_{0,l}^{n,k},  if l > 0,
r_{m,l+1}^{n,k} = r_{m−1,l}^{n,k} − (n − k + l + 1) r_{m,l}^{n,k},   1 ≤ m ≤ l,
r_{l,l}^{n,k} = −1.   (20)

Using (19) for l = k − 1 and the first line of (8), one eventually obtains, by differentiation, an ODE of order k:

d^k/dx^k ( χ_{n,k}(t, x) − Θ_{n,k}(t, x) ) = μ^{n,k} e^{it√n log(1 − k/n)} χ_{n,k}(t, x) + Σ_{m=0}^{k−1} r_m^{n,k} d^m/dx^m ( χ_{n,k}(t, x) − Θ_{n,k}(t, x) ),   (21)

with μ^{n,k} := μ_k^{n,k} and r_m^{n,k} := r_{m,k}^{n,k}.

It is key to observe that the coefficients μn,k and (rmn,k )0≤m≤k−1 are defined by the
same recursion as in [3]. In particular, they do not depend on the parameter t ∈ R.
To see a proof of (18), we refer to Sect. 6.4 in [3].
It is clear that the polynomial equality in (18) is equivalent to the following
identity: for all j ∈ {0, . . . , k − 1}

d^k/dx^k exp((n − k + j + 1)(x − a)) = Σ_{m=0}^{k−1} r_m^{n,k} d^m/dx^m exp((n − k + j + 1)(x − a)).

Due to the definition of the cumulative distribution functions of order statistics (7),
one easily checks that Θn,k (t, .) is a linear combination of the exponential functions
x → exp(nx), . . . , exp((n − k + 1)x); therefore

d^k/dx^k Θ_{n,k}(t, x) = Σ_{m=0}^{k−1} r_m^{n,k} d^m/dx^m Θ_{n,k}(t, x).

Thus the terms depending on Θn,k in (21) cancel out, and thus (17) holds true. 

The next steps are to give an explicit expression of the solution of (17) as a linear
combination of exponential functions, and to study the coefficients and the modes in
the asymptotic regime n → +∞. Since the ODE is of order k, in order to uniquely
determine the solution, more information is required: we need to know the derivatives
of order 0, 1, . . . , k − 1 of x → χn,k (t, x) at some point. We choose the terminal
point x = a (notice that by the change of variable x → a − x the ODE (17) can then
be seen as an ODE with an initial condition). This is the content of Lemma 3 below.

Lemma 3 (Terminal condition) For any fixed k ∈ {1, 2, . . . } and any t ∈ R, we have

χ_{n,k}(t, a) = 1,
d^m/dx^m χ_{n,k}(t, x)|_{x=a} = O( n^m/√n )  as n → ∞, if m ∈ {1, . . . , k − 1}.   (22)

Proof The equality χ_{n,k}(t, a) = 1 is trivial, since p̂^{n,k}(a) = 1. Equations (19) and (21) immediately imply (by recursion) that for 1 ≤ m ≤ k − 1

d^m/dx^m χ_{n,k}(t, x)|_{x=a} = d^m/dx^m Θ_{n,k}(t, x)|_{x=a}.

Introduce the following decomposition

Θ_{n,k}(t, x) = Σ_{l=0}^{k−1} e^{it√n log(1 − l/n)} P( S(x)^n_{(l)} < a ≤ S(x)^n_{(l+1)} )
  = Σ_{l=0}^{k−1} ( e^{it√n log(1 − l/n)} − 1 )( F_{n,l}(a; x) − F_{n,l+1}(a; x) ) + Σ_{l=0}^{k−1} P( S(x)^n_{(l)} < a ≤ S(x)^n_{(l+1)} )
  =: Ω_{n,k}(t, x) + 1 − F_{n,k}(a; x),

where Fn,l denotes the cumulative distribution function of the lth order statistics
(with the convention Fn,0 (a; x) = 1 for x ≤ a), see (7).
Thanks to (8) and a simple recursion on l, it is easy to prove that for any 0 ≤ l ≤ k and any m ≥ 1

d^m/dx^m F_{n,l}(a; x)|_{x=a} = O(n^m);   (23)

this immediately yields

d^m/dx^m Ω_{n,k}(t, x)|_{x=a} = O( n^m/√n )  as n → ∞.

In fact, it is possible to prove a stronger result: if 1 ≤ l ≤ k and 0 ≤ m < l then

d^m/dx^m F_{n,l}(a; x)|_{x=a} = 0,

by recursion on l and using (8) recursively on m. We thus obtain for 1 ≤ m ≤ k − 1

d^m/dx^m ( 1 − F_{n,k}(a; x) )|_{x=a} = 0.

This concludes the proof of Lemma 3. 

The last result we require is given by Lemma 4.


Lemma 4 (Asymptotic expansion) Let k ∈ {1, 2, . . . } and t ∈ R be fixed. Then for n large enough, we have

χ_{n,k}(t, x) = Σ_{l=1}^{k} η_{n,k}^l(t) e^{λ_{n,k}^l(t)(x−a)},   (24)

for complex coefficients satisfying:

λ_{n,k}^1(t) = it√n + t²/2 + o(1)  as n → ∞,
η_{n,k}^1(t) →_{n→∞} 1;   (25)

and for 2 ≤ l ≤ k

λ_{n,k}^l(t) ∼_{n→∞} n(1 − e^{i2π(l−1)/k}),
η_{n,k}^l(t) →_{n→∞} 0.   (26)

Proof We denote by (λ_{n,k}^l(t))_{1≤l≤k} the roots of the characteristic equation associated with the linear ODE with constant coefficients (17) (with unknown λ ∈ C): thanks to (18)

(n − λ) · · · (n − k + 1 − λ) / ( n · · · (n − k + 1) ) − e^{it√n log(1 − k/n)} = 0.

By the continuity property of the roots of a complex polynomial of degree k with


respect to its coefficients, we have

l λln,k (t) l
λn,k (t) := → λ∞ ,
n n→∞

l
where (λ∞ (t))1≤l≤k are the roots of (1 − λ)k = 1: thus λ1n,k (t) and = o(n),
n→∞

i2π(l−1)
λln,k (t) ∼ n(1 − e k ).
n→∞

To study more precisely the asymptotic behavior of λ_{n,k}^1(t), we postulate an ansatz

λ_{n,k}^1(t) = c_t√n + d_t + o(1)  as n → ∞.

We then identify the coefficients c_t = it and d_t = t²/2 thanks to the expansions

( 1 − c_t/√n − d_t/n + o(1/n) )^k = 1 − (c_t k)/√n − ( d_t k − \binom{k}{2} c_t² )/n + o(1/n),
e^{it√n log(1 − k/n)} = 1 − (itk)/√n − (t²k²)/(2n) + o(1/n).

In particular, for n large enough, (λ_{n,k}^l(t))_{1≤l≤k} are pairwise distinct, and (24) follows.
Then the coefficients (η_{n,k}^l(t))_{1≤l≤k} are solutions of the following linear system of equations of order k:

η_{n,k}^1(t) + . . . + η_{n,k}^k(t) = χ_{n,k}(t, a),
η_{n,k}^1(t) λ̄_{n,k}^1(t) + . . . + η_{n,k}^k(t) λ̄_{n,k}^k(t) = (1/n) d/dx χ_{n,k}(t, a),
. . .
η_{n,k}^1(t) (λ̄_{n,k}^1(t))^{k−1} + . . . + η_{n,k}^k(t) (λ̄_{n,k}^k(t))^{k−1} = (1/n^{k−1}) d^{k−1}/dx^{k−1} χ_{n,k}(t, a).   (27)

Using Cramer's rule, we express each η_{n,k}^l(t) as a ratio of determinants (the denominator is a Vandermonde determinant and is nonzero when n is large enough). For l ∈ {2, . . . , k}, we have

η_{n,k}^l(t) = det(M_{n,k}^l(t)) / V( λ̄_{n,k}^1(t), . . . , λ̄_{n,k}^k(t) ) →_{n→+∞} 0,

where the matrix

                ⎛ 1                      1                      . . .   1          . . .   1                      ⎞
                ⎜ λ̄_{n,k}^1(t)           λ̄_{n,k}^2(t)           . . .   O(1/√n)    . . .   λ̄_{n,k}^k(t)           ⎟
M_{n,k}^l(t) =  ⎜ . . .                                                                                          ⎟
                ⎝ (λ̄_{n,k}^1(t))^{k−1}   (λ̄_{n,k}^2(t))^{k−1}   . . .   O(1/√n)    . . .   (λ̄_{n,k}^k(t))^{k−1}   ⎠

(the O(1/√n) entries standing in the lth column) is such that det(M_{n,k}^l(t)) →_{n→+∞} 0 (since λ̄_{n,k}^1(t) → 0), while the denominator is the Vandermonde determinant V( λ̄_{n,k}^1(t), . . . , λ̄_{n,k}^k(t) ) →_{n→+∞} V( λ̄_∞^1, . . . , λ̄_∞^k ) ≠ 0.
Finally, η_{n,k}^1(t) = 1 − Σ_{l=2}^{k} η_{n,k}^l(t) →_{n→+∞} 1. This concludes the proof of Lemma 4. □

We are now in position to prove Proposition 2. Indeed, recall that φ_{n,k}(t, x) = exp(−it√n(x − a)) χ_{n,k}(t, x) thanks to (10) and (11). Then taking the limit n → +∞ thanks to Lemma 4 gives the convergence of the characteristic function φ_{n,k}.

4 Numerical Results

In this section, we provide a numerical illustration of the Central Limit Theorem 1. We apply the algorithm with an exponentially distributed random variable with parameter 1—this is justified by the discussion in Sect. 3.1.
In the simulations below, the estimated probability is e^{−6} (≈ 2.48 · 10^{−3}).
In Fig. 1, we fix the value k = 10, and we show histograms for n = 10², 10³, 10⁴, with different values for the number M of independent realizations of the algorithm, such that nM = 10⁸ (we thus have empirical variance of the same order for all cases). In Fig. 2, we give the associated Q-Q plots, where the empirical quantiles of

Fig. 1 Histograms for k = 10 and p = exp(−6): n = 10², 10³, 10⁴ from left to right

Fig. 2 Q-Q plot for k = 10 and p = exp(−6): n = 10², 10³, 10⁴ from left to right

Fig. 3 Histograms for n = 10⁴ and p = exp(−6): k = 1, 10, 100 from left to right

the sample are compared with the exact quantiles of the standard Gaussian random
variable (after normalization).
In Fig. 3, we show histograms for M = 10⁴ independent realizations of the AMS algorithm with n = 10⁴ and k ∈ {1, 10, 100}; we also provide associated Q-Q plots
in Fig. 4.
From Figs. 1 and 2, we observe that when n increases, the normality of the esti-
mator is confirmed. Moreover, from Figs. 3 and 4, no significant difference when k
varies is observed.

Fig. 4 Q-Q plot for n = 10⁴ and p = exp(−6): k = 1, 10, 100 from left to right

Acknowledgments C.-E. B. would like to thank G. Samaey, T. Lelièvre and M. Rousset for the
invitation to give a talk on the topic of this paper at the 11th MCQMC Conference, in the special
session on Mathematical aspects of Monte Carlo methods for molecular dynamics. We would also
like to thank the referees for suggestions which improved the presentation of the paper.

References

1. Asmussen, S., Glynn, P.W.: Stochastic Simulation: Algorithms and Analysis. Springer, New
York (2007)
2. Au, S.K., Beck, J.L.: Estimation of small failure probabilities in high dimensions by subset
simulation. J. Probab. Eng. Mech. 16, 263–277 (2001)
3. Bréhier, C.E., Lelièvre, T., Rousset, M.: Analysis of adaptive multilevel splitting algorithms in
an idealized case. ESAIM Probab. Stat., to appear
4. Cérou, F., Del Moral, P., Furon, T., Guyader, A.: Sequential Monte Carlo for rare event esti-
mation. Stat. Comput. 22(3), 795–808 (2012)
5. Cérou, F., Guyader, A.: Adaptive multilevel splitting for rare event analysis. Stoch. Anal. Appl.
25(2), 417–443 (2007)
6. Cérou, F., Guyader, A.: Adaptive particle techniques and rare event estimation. In: Conference
Oxford sur les méthodes de Monte Carlo séquentielles, ESAIM Proceedings, vol. 19, pp. 65–72.
EDP Sci., Les Ulis (2007)
7. Cérou, F., Guyader, A., Lelièvre, T., Pommier, D.: A multiple replica approach to simulate
reactive trajectories. J. Chem. Phys. 134, 054108 (2011)
8. Cérou, F., Guyader, A., Del Moral, P., Malrieu, F.: Fluctuations of adaptive multilevel splitting.
e-preprints (2014)
9. Glasserman, P., Heidelberger, P., Shahabuddin, P., Zajic, T.: Multilevel splitting for estimating
rare event probabilities. Oper. Res. 47(4), 585–600 (1999)
10. Guyader, A., Hengartner, N., Matzner-Løber, E.: Simulation and estimation of extreme quan-
tiles and extreme probabilities. Appl. Math. Optim. 64(2), 171–196 (2011)
11. Kahn, H., Harris, T.E.: Estimation of particle transmission by random sampling. Natl. Bur.
Stand. Appl. Math. Ser. 12, 27–30 (1951)
12. Rubino, G., Tuffin, B.: Rare Event Simulation using Monte Carlo Methods. Wiley, Chichester
(2009)
13. Skilling, J.: Nested sampling for general Bayesian computation. Bayesian Anal. 1(4), 833–859
(2006)
14. Simonnet, E.: Combinatorial analysis of the adaptive last particle method. Stat. Comput. (2014)
15. van der Vaart, A.W.: Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic
Mathematics, vol. 3. Cambridge University Press, Cambridge (1998)
Comparison Between L S-Sequences
and β-Adic van der Corput Sequences

Ingrid Carbone

Abstract In 2011 the author introduced a generalization of van der Corput


sequences, the so called L S-sequences defined for integers L , S such that L ≥ 1,
S ≥ 0, L + S ≥ 2, and γ ∈ ]0, 1[ is the positive solution of Sγ 2 + Lγ = 1. These
sequences coincide with the classical van der Corput sequences whenever S = 0,
are uniformly distributed for all L , S and have low discrepancy when L ≥ S. In
this paper we compare the L S-sequences and the β-adic van der Corput sequences
where β > 1 is the Pisot root of x 2 − L x − L. Using a suitable numeration sys-
tem G = {G n }n≥0 , where the base sequence is the linear recurrence of order two,
G n+2 = LG n+1 + LG n , with initial conditions G 0 = 1 and G 1 = L + 1, we prove
that when L = S the (L , L)-sequence with Lγ 2 + Lγ = 1 and the β-adic van der
Corput sequence with β = 1/γ and β 2 = Lβ + L can be obtained from each other
by a permutation. In particular for β = Φ, the golden ratio, the β-adic van der Corput
sequence coincides with the Kakutani–Fibonacci sequence obtained for L = S = 1,
which has been already studied.

Keywords Uniform distribution · Discrepancy · Numeration systems · van der


Corput sequences

1 Introduction

In this paper we compare two classes of low discrepancy sequences which have been
introduced relatively recently and which have an interesting overlap.
We are interested in β-adic van der Corput sequences, which have been introduced
in [3, 22, 23]. Their motivation stems from algebraic arguments and is related to the
β-adic representation of real numbers introduced by [24]. They have been studied

I. Carbone (B)
Department of Mathematics and Informatics, University of Calabria,
Ponte P. Bucci Cubo 30B, 87036, Arcavacata di Rende, Cosenza, Italy
e-mail: i.carbone@unical.it


quite extensively. For good references on the subject we suggest [3, 18, 22, 23]. For
the original definition of van der Corput sequence see [25].
The other, more recent, class is represented by the L S-sequences which have been
introduced in [4] and have been object of several papers. These sequences have a
more geometric motivation and are related to a generalization of an idea of Kakutani
([20]), which appeared in [26]. For another generalization of the Kakutani splitting
procedure to the multidimensional setting see [8]; for two possible generalizations
of L S-sequences to dimension 2, see [6]. Other papers dedicated to the subject are
[9–11, 19].
As it has been shown in [4], each L S-sequence of points corresponds to the
reordering by a suitable algorithm of the points defining a specific sequence of
partitions of [0, 1[, which depends on two nonnegative integers L and S, such that
L + S ≥ 2, L ≥ 1 and S ≥ 0. These sequences will be defined in the next section.
An interesting role is played by the (1, 1)-sequence, called in [4] the Kakutani–
Fibonacci sequence of points, which has also been studied from an ergodic point of
view in [7, 18].
Each β-adic van der Corput sequence is associated to a characteristic equation
x d = a0 x d−1 + a1 x d−2 + · · · + ad−1 , with some restrictions on the coefficients.
These restrictions imply, when d = 2, that a1 = a0 or a1 = a0 + 1. In the latter case
β-adic sequences are nothing else but the classical van der Corput sequences with
base b = a0 − 1.
This paper is concerned with the study of the interesting overlap between β-adic
sequences of order two (with a0 = a1 ) and the corresponding L S-sequences for
L = S.
It should be noted that both families of sequences are much richer: the β-adic
sequences can be defined for any order d ≥ 2 (with appropriate restrictions on the
coefficients), while the L S-sequences can be defined for any pair of positive integers
L , S and have low discrepancy whenever L ≥ S.
The main result is Theorem 2, which states that for L = S the (L , L)-sequence
and the β-adic van der Corput sequence, which corresponds to the positive root of
x 2 − L x − L, can be obtained from each other by a permutation. In particular, when
L = S = 1, the Kakutani–Fibonacci (1, 1)-sequence coincides with the β-adic
van der Corput sequence where β = Φ is the golden ratio, i.e. the positive root of
x 2 − x − 1 (see also [18] for more details).
β-adic sequences and L S-sequences provide, in dimension 1, low-discrepancy
sequences.
Having new low-discrepancy sequences at our disposal, it is important to obtain
a more complete understanding of their behavior in order to use them in the Quasi
Monte Carlo method, pairing them à la Halton (as it has been done in [16] and in
[17]). This is the main motivation of this paper.
For the L S-sequences the problem has been posed for the first time in [6]. It should
be noted that partial negative results have been obtained quite recently by [2]. This is
one of the most interesting open problems concerning L S-sequences. On the other
hand, a recent result ([18]) proved uniform distribution of the Halton type sequences
for β-adic van der Corput sequences.

2 Preliminaries and Results

We recall that, given a sequence {xn }n≥1 ⊂ [0, 1[, its discrepancy is defined by the
sequence {D_N}_{N≥1}, where

D_N = D(x_1, . . . , x_N) = sup_{0≤a<b≤1} | (1/N) Σ_{j=1}^{N} 1_{[a,b[}(x_j) − (b − a) |.

We say that {xn }n≥1 has low discrepancy if there exists a constant C such that
N D N ≤ C log N .
We recall that a sequence {xn }n≥1 is uniformly distributed if D N → 0 as N tends to
infinity. For extensive accounts on uniform distribution, discrepancy and applications
see [12] and [21].
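As a small illustration (not part of the original text), the following Python sketch evaluates D_N for a finite point set, using the fact that the function t ↦ (1/N)#{x_j < t} − t is piecewise linear and only needs to be examined at the points themselves; the test set (a centered equispaced grid, for which D_N = 1/N) is an assumed example.

```python
import numpy as np

def discrepancy(points):
    """Extreme discrepancy D_N of a finite point set in [0, 1[ (illustrative sketch)."""
    x = np.sort(np.asarray(points))
    N = len(x)
    i = np.arange(1, N + 1)
    g_sup = max(0.0, np.max(i / N - x))          # sup of (1/N)#{x_j < t} - t (right limits at the points)
    g_inf = min(0.0, np.min((i - 1) / N - x))    # inf of (1/N)#{x_j < t} - t (left limits at the points)
    return g_sup - g_inf

pts = (np.arange(1, 101) - 0.5) / 100   # centered grid: D_N should equal 1/N
print(discrepancy(pts))                 # expected 0.01
```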
If we consider a sequence {π_n}_{n≥1} of finite partitions of [0, 1[, with π_n = {[y_i^{(n)}, y_{i+1}^{(n)}[, 1 ≤ i ≤ t_n}, where y_1^{(n)} = 0 and y_{t_n+1}^{(n)} = 1, according to the definition above we say that its discrepancy D(π_n) is the discrepancy of Q_n, where Q_n = {y_1^{(n)}, . . . , y_{t_n}^{(n)}}, and that {π_n}_{n≥1} is uniformly distributed if D(π_n) → 0 as n tends to infinity. Moreover, if there exists a constant C such that t_n D(π_n) ≤ C, we say that {π_n}_{n≥1} has low discrepancy.

Definition 1 (Kakutani, [20]) Given a finite partition π of [0, 1[ and given α ∈


]0, 1[, its α-refinement, denoted by απ , is the partition obtained by subdividing
all the intervals of π having maximal length proportionally to α and 1 − α. The
so-called Kakutani’s α-sequence of partitions {α n π }n≥1 is obtained by successive
α-refinements of π .

When π is the trivial partition ω = {[0, 1[} of [0, 1[, the sequence {α n ω}n≥1 is
uniformly distributed ([20]).
In [26] the following generalization of Kakutani’s splitting procedure is given.

Definition 2 (Volčič, [26]) For a given non-trivial finite partition ρ of [0, 1[, the
ρ-refinement of a partition π of [0, 1[ (denoted by ρπ ) is obtained by subdividing all
the intervals of π having maximal length positively (or directly) homothetically to
ρ. If for any n ∈ N we denote by ρ n π the ρ-refinement of ρ n−1 π , we get a sequence
of partitions {ρ n π }n≥1 , called the sequence of successive ρ-refinements of π .

Obviously, if ρ = {[0, α[, [α, 1[}, then the ρ-refinement is just Kakutani’s α-
refinement.
In [26] Volčič proved that the sequence {ρ n ω}n≥1 is uniformly distributed for
any partition ρ, and in [1] the authors, solving a problem posed in [26], provided
necessary and sufficient conditions on π and ρ under which the sequence {ρ n π }n≥1
is uniformly distributed.
The L S-sequences of partitions are a special case.

Definition 3 (Carbone, [4]) Let us fix two nonnegative integers L ≥ 1 and S ≥


0 with L + S ≥ 2 and let γ be the positive solution of the quadratic equation
Sx 2 + L x = 1 (if S = 0, the equation is linear). Denote by ρ L ,S the partition defined
by L “long” intervals having length γ followed by S “short” intervals having length
γ 2 (if S = 0, all the L intervals have the same length γ = 1/L). The sequence
of successive ρ L ,S -refinements of the trivial partition ω is called L S-sequence of
partitions and is denoted by {ρ Ln ,S ω}n≥1 (or {ρ Ln ,S } for short).

Whenever L = b and S = 0, we simply get the b-adic sequence of partitions in


base b.
If we denote by tn the total number of intervals of ρ Ln ,S , by ln the number of its
long intervals and by sn the number of its short intervals, it is very simple to see that
tn = ln + sn , ln = L ln−1 + sn−1 and sn = S ln−1 .
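The following Python sketch (an illustration only, with L = 2, S = 1 as an assumed example) builds the successive ρ_{L,S}-refinements of the trivial partition and prints t_n, l_n, s_n, so that the recursions above can be checked numerically.

```python
import math

def ls_refine(intervals, L, S, gamma):
    """One rho_{L,S}-refinement: split every interval of maximal length into
    L long parts (length gamma * len) followed by S short parts (length gamma^2 * len)."""
    max_len = max(b - a for a, b in intervals)
    refined = []
    for a, b in intervals:
        if not math.isclose(b - a, max_len):
            refined.append((a, b))
            continue
        left, step_long, step_short = a, gamma * (b - a), gamma**2 * (b - a)
        for _ in range(L):
            refined.append((left, left + step_long)); left += step_long
        for _ in range(S):
            refined.append((left, left + step_short)); left += step_short
    return refined

L, S = 2, 1
gamma = (-L + math.sqrt(L**2 + 4*S)) / (2*S)   # positive root of S x^2 + L x = 1
parts = [(0.0, 1.0)]
for n in range(1, 6):
    parts = ls_refine(parts, L, S, gamma)
    lengths = [b - a for a, b in parts]
    l_n = sum(math.isclose(x, max(lengths)) for x in lengths)   # long intervals
    s_n = len(parts) - l_n                                      # short intervals
    print(n, len(parts), l_n, s_n)   # t_n, l_n, s_n obeying the recursions above
```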
The sequence {ρ_{1,1}^n}, called in [4] the Kakutani–Fibonacci sequence of partitions, corresponds to L = S = 1 and γ = (√5 − 1)/2, which is the inverse of the golden ratio Φ. It is a Kakutani α-sequence, with α = γ.
In Theorem 2.2 of [4] we gave explicit and very precise estimates of the dis-
crepancy of L S-sequences of partitions, proving in particular that they have low
discrepancy when L ≥ S.
To each L S-sequence of partitions {ρ Ln ,S } we associate the L S-sequence of points,
denoted by {ξ Ln ,S }. Actually, the underlying geometric construction is strongly based
on the partitions. We refer to [5] for further details. For the original definition based
on the reordering the points of each partition ρ Ln ,S , see [4].
We define for every 0 ≤ i ≤ L − 1 the functions ψi (x) = γ x + iγ restricted
to 0 ≤ x < 1, and for every L ≤ i ≤ L + S − 1 (and S > 0) the functions
ψi (x) = γ x + Lγ + (i − L)γ 2 restricted to 0 ≤ x < γ .
We denote by E L ,S the set consisting of all the pairs of indices which correspond
to the “forbidden” compositions ψi, j = ψi ◦ ψ j , i.e.

E L ,S = {L , L + 1, . . . , L + S − 1} × {1, . . . , L + S − 1}. (1)

If S = 0, the first factor is empty, so E L ,0 = ∅.


We recall that any natural number n ≥ 1 can be expressed in base b ≥ 1 as


M
n= ak (n) bk , (2)
k=0

with ak (n) ∈ {0, 1, . . . , b − 1} for all 0 ≤ k ≤ M, and M = logb n (here and in the
sequel  ·  denotes the integer part). The expression (2) leads to the representation
in base b of n
[n]b = a M (n)a M−1 (n) . . . a0 (n) . (3)

If n = 0, we write [0]b = 0. The representation of n in base b given by (2) is used


to define the radical-inverse function φb on N which associates the number


M
φb (n) = ak (n)b−k−1 (4)
k=0

to the string of digits (3), whose representation in base b is 0.a0 (n)a1 (n) . . . a M (n).
Of course 0 ≤ φb (n) < 1 for all n ≥ 0.
The sequence {φb (n)}n≥0 is the van der Corput sequence in base b.

Definition 4 (Carbone, [5]) Let γ be the positive solution of the equation Sx² + Lx = 1. We denote by N_{L,S} the set of all positive integers n, ordered by magnitude, with [n]_{L+S} = a_M(n) a_{M−1}(n) . . . a_0(n) such that (a_k(n), a_{k+1}(n)) ∉ E_{L,S} for all 0 ≤ k ≤ M − 1. If S = 0, we have N_{L,S} = N. For all n ∈ N_{L,S} we define the L S-radical inverse function as follows:

φ_{L,S}(n) = Σ_{k=0}^{M} ã_k(n) γ^{k+1},   (5)

where ã_k(n) = a_k(n) if 0 ≤ a_k(n) ≤ L − 1 and ã_k(n) = L + γ(a_k(n) − L) if L ≤ a_k(n) ≤ L + S − 1. If S = 0, (5) coincides with the radical inverse function (4).

Definition 5 (Carbone, [5]) The sequence {φ L ,S (n)} defined on N L ,S is the L S-


sequence of points.
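A minimal Python sketch (not part of the paper) of Definitions 4 and 5: it enumerates n ∈ N_{L,S} in increasing order, applies the L S-radical inverse function, and returns the first points of the sequence; the function name and the printed case L = S = 1 are illustrative assumptions.

```python
import math

def ls_points(L, S, how_many):
    """First points of the LS-sequence {phi_{L,S}(n)}, n running through N_{L,S}
    in increasing order (illustrative sketch of Definitions 4 and 5)."""
    base = L + S
    gamma = 1.0 / L if S == 0 else (-L + math.sqrt(L**2 + 4*S)) / (2*S)
    forbidden = {(i, j) for i in range(L, L + S) for j in range(1, L + S)}
    points, n = [], 1
    while len(points) < how_many:
        digits, m = [], n
        while m > 0:                       # digits a_0(n), a_1(n), ... in base L+S
            m, d = divmod(m, base)
            digits.append(d)
        if all((digits[k], digits[k+1]) not in forbidden for k in range(len(digits) - 1)):
            value = sum((d if d <= L - 1 else L + gamma * (d - L)) * gamma**(k + 1)
                        for k, d in enumerate(digits))
            points.append(value)
        n += 1
    return points

print(ls_points(1, 1, 8))   # first values: gamma, gamma^2, gamma^3, gamma + gamma^3, ...
```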

If the L S-sequence of partitions {ρ Ln ,S } has low discrepancy, the corresponding


L S-sequence of points {φ L ,S (n)} has low discrepancy, too. In fact, if we denote by
Ξ_{L,S}^N = {ξ_{L,S}^1, ξ_{L,S}^2, . . . , ξ_{L,S}^N} the first N elements of the sequence {φ_{L,S}(n)}, we
have the following result.

Theorem 1 (Carbone, [4])
(i) If S ≤ L there exists k_1 > 0 such that N D(Ξ_{L,S}^N) ≤ k_1 log N for any N ∈ N.
(ii) If S = L + 1 there exist k_2, k_2′ > 0 such that k_2′ log N ≤ N D(Ξ_{L,S}^N) ≤ k_2 log² N for any N ∈ N.
(iii) If S ≥ L + 2 there exist k_3, k_3′ > 0 such that k_3′ N^{1−τ} ≤ N D(Ξ_{L,S}^N) ≤ k_3 N^{1−τ} log N for any N ∈ N, where 1 − τ = −log(Sγ)/log γ > 0.

Let us observe that if L = b and S = 0, the L S-sequence reduces to the van der
Corput sequence in base b.
The simple case L = S = 1 has been widely studied. For a dynamical approach to
the (1, 1)-sequence, see [7]. The sequence {ξ_{1,1}^n}, called in [5] the Kakutani–Fibonacci sequence of points, corresponds to the Kakutani–Fibonacci sequence of partitions {ρ_{1,1}^n}.
The set (1) reduces to {(1, 1)} and, according to Definition 4, N1,1 is the set of
all natural numbers n such that the binary representation (3) does not contain two
consecutive digits equal to 1. Moreover, the (1, 1)-radical inverse function defined
by (5) on N1,1 is
266 I. Carbone


M
φ1,1 (n) = ak (n) γ k+1 , (6)
k=0

with the same coefficients ak (n) of the representation of n given by (3) for b = 2.
We will use the notation {ξ1,1
n
}n∈N or {φ1,1 (n)}n∈NL ,S for the Kakutani–Fibonacci
sequence of points.
We conclude this section with some basic notions on numeration systems with
respect to a linear recurrence base sequence (for more details see [13]).
If G = {G n }n≥0 is an increasing sequence of natural numbers with G 0 = 1, any
n ∈ N can be expanded with respect to this sequence as follows:


n= εk (n)G k . (7)
k=0

N
This expansion is finite and unique if for every N ∈ N we have k=0 εk (n)G k <
G N +1 . G is called numeration system and (7) the G-expansion of n. The digits εk
can be computed by the greedy algorithm (see, for instance, [14]).
Let us consider now a special numeration system, where the base sequence is a
linear recurrence of order d ≥ 1, namely

G n+d = a0 G n+d−1 + · · · + ad−1 G n , n ≥ 0, (8)

with G 0 = 1 and G k = a0 G k−1 + · · · + ak−1 G 0 + 1 for k < d.


When the coefficients of the characteristic equation

x d = a0 x d−1 + · · · + ad−1 (9)

associated to the linear recurrence (8) are decreasing, namely a0 ≥ · · · ≥ ad−1 ≥ 1,


we know that the largest root β of (9) is a Pisot number.
We recall that a Pisot number is a real algebraic integer q > 1 such that all its
Galois conjugates have absolute value strictly less than 1. If P(x) is a polynomial
with exactly one Pisot number β as a zero, β is called the Pisot root of P.
The most famous example of a Pisot number is the golden ratio Φ, which is
the Pisot root of the equation x 2 = x + 1 associated to the numeration system
G = {G n }n≥0 , where {G n }n≥0 is the Fibonacci sequence.

Definition 6 (Barat–Grabner, [3]) If (7) is the G-expansion of the natural number n


and β is the Pisot root of (9), the sequence {φβ (n)}n≥0 where φβ is the β-adic Monna
map defined by
∞
φβ (n) = εk (n)β −k−1 , (10)
k=0

is called β-adic van der Corput sequence.


Comparison Between L S-Sequences and β-Adic … 267

If β = b is a natural number greater than 1, the sequence {φβ (n)}n≥0 is the classical
van der Corput sequence in base b.

3 Results

In order to compare L S-sequences and β-adic van der Corput sequences, let us recall
that the sequence {φβ (n)}n≥0 defined by (10) is not necessarily contained and dense
in [0, 1[. A partial answer can be found in [3], where it is proved that if β is the Pisot
root of the characteristic Eq. (9) associated to the numeration system G defined by
(8), where a0 = · · · = ad−1 , then the sequence {φβ (n)}n≥0 is uniformly distributed
in [0, 1[ and has low discrepancy. In this case, the sequence is called the Multinacci
sequence.
A complete answer has been given very recently by [18], where the authors proved
the following result.

Lemma 1 (Hofer—Iacò—Tichy, [18]) Let a = (a0 , . . . , ad−1 ), where the integers


a0 , . . . , ad−1 ≥ 0 are the coefficients of the numeration system G and assume that
the corresponding characteristic root β satisfies (9). Furthermore, assume that there
is no b = (b0 , . . . , bk−1 ) with k < d such that β is the characteristic root of
the polynomial defined by b. Then φβ (N) ⊂ [0, 1[ and φβ (N) ⊂ [0, x[ for some
0 < x < 1 if and only if a can be written either as

a = (a0 , . . . , a0 ) (11)

or
a = (a0 , a0 − 1, . . . , a0 − 1, a0 ), (12)

where a0 > 0.

We notice that the above lemma does not require the assumption of decreasing
coefficients. In [18] it is also observed that, if the condition that d has to be minimal
is dropped, then there exist two more cases in which the above theorem is satisfied.
We are interested in the following case:

a = (a0 , . . . , a0 , a0 + 1). (13)

From now on we shall restrict our attention to the case d = 2, and consequently
to (11) and (12). Let us consider the numeration system G = {G n }n≥0 defined by the
linear recurrence of order d = 2

G n+2 = a0 G n+1 + a1 G n , n ≥ 0, (14)

with the initial conditions


268 I. Carbone

G 0 = 1 and G 1 = a0 + 1. (15)

According to [18], if β is the solution of the characteristic equation x 2 = a0 x +a1 ,


the β-adic van der Corput sequence {φβ (n)}n≥0 is uniformly distributed if and only
if a0 = a1 (and β is not the root of any equation of order 1), or a1 = a0 + 1 (and β is
the root of the equation of order 1 associated to the linear recurrence G n+1 = a0 G n ).
At this point we come back to our L S-sequences and state our main result.

Theorem 2 When L = S, the L S-sequence {ξ Ln ,L } is a reordering of the β-adic van


der Corput sequence, where 1/β is the solution of the equation L x 2 + L x = 1.

Proof In the case L = S = 1, the Kakutani–Fibonacci sequence {ξ1,1 n


} actually
coincides with the β-adic van der Corput sequence, where β = 1/γ is the golden
ratio Φ.
We know that {ξ1,1 n
} can be written as {φ1,1 (n)}n∈N1,1 (see (6)) where N1,1 (see
Definition 4) is the set of all the natural numbers whose binary representation

(3)
does not contain two consecutive digits equal to 1. Moreover, γ = 5−1 2
is the
solution of the equation γ + γ 2 = 1. If we consider now the linear recurrence (14),
namely G n+2 = G n+1 + G n with the initial conditions (15)√given by G 0 = 1 and
G 1 = 2, we have already noticed that the golden ratio β = 1+2 5 is the solution of the
equation β 2 = β + 1 and that γ1 = β. Furthermore, it is clear that {G n }n≥0 = {tn }n≥0 ,
where tn is the total number of intervals of the nth partition of the Kakutani–Fibonacci
sequence of partitions {ρ1,1
n
} defined in Sect. 2, which satisfies tn+2 = tn+1 + tn , with
t0 = 1 and t1 = 2. Here tn (0) = 1 corresponds to ρ1,1 0
= [0, 1[.
The coefficients εk (n) of the related β-adic van der Corput sequence {φβ (n)}n≥0
defined by (10) can be evaluated with the greedy algorithm: it is very simple to
see that εk (n) ∈ {0, 1} and that the expansion (7) does not contain two consecutive
coefficients equal to 1. In both representations, the β-adic Monna map and the (1, 1)-
radical inverse function coincide on their domain and the proof is complete.
This result appears also in [18].
Now we prove the statement of the theorem in the case L = S ≥ 2, showing that
the set of the images of the radical inverse function φ L ,L (n) defined by (5) coincides
with the set of the images of the β-adic Monna map φβ (n) defined by (10).
More precisely, we consider n ∈ N L ,L . According to Definition 4, n has a repre-
sentation [n]2L = a M (n) a M−1 (n) . . . a0 (n) in base 2L such that (ak (n), ak+1 (n))
/ E L ,L for all 0 ≤ k ≤ M − 1, where E L ,L = {L , L + 1, . . . , 2L − 1} ×

{1, 2, . . . , 2L − 1} (see (1)). For such n ∈ N L ,L we consider the (L , L)-sequence
{φ L ,L (n)}, where


M
φ L ,L (n) = ãk (n) γ k+1 , (16)
k=0

with ãk (n) = ak (n) if 0 ≤ ak (n) ≤ L − 1 and ãk (n) = L + γ (ak (n) − L) if
L ≤ ak (n) ≤ 2L − 1, and where Lγ + Lγ 2 = 1.
Comparison Between L S-Sequences and β-Adic … 269

We now restrict our attention to the digits ãk (n) in the case L ≤ ak (n) ≤ 2L − 1.
If we put ak (n) = L + m, with 0 ≤ m ≤ L − 1, we can write ãk (n) = L + mγ .
Consequently, we have ãk (n)γ k+1 = Lγ k+1 + mγ k+2 .
From the condition (ak (n), ak+1 (n)) ∈
/ E L ,L we derive that ak+1 (n) must be equal
to 0, and that ak−1 (n) has to belong to the set {0, 1, . . . , L − 1}. Three consecutive
powers of γ can be grouped in the partial sum

ãk−1 (n)γ k + ãk (n)γ k+1 + ãk+1 (n)γ k+2 = ak−1 (n)γ k + Lγ k+1 + mγ k+2 ,

and in (16) we also admit two consecutive digits belonging to the set {L}×{1, . . . , L−
1}.
Taking the set E L ,L into account, (16) can be written with new coefficients ak
(n),
which are nonnegative integer numbers such that (ak
(n), ak+1

/ E L
,L , where
(n)) ∈
 
E L
,L = E L ,L \ {L} × {0, 1, . . . , L − 1} =
   
= {L + 1, . . . , 2L − 1} × {1, . . . , 2L − 1} ∪ {L} × {L , . . . , 2L − 1} . (17)

Now we consider the β-adic van der Corput sequence {φβ (n)}n≥0 , where



φβ (n) = εk (n)β −k−1 ,
k=0

and 1/β = γ is the Pisot root of x 2 = a0 x + a0 , where a0 = L, which is the


characteristic equation associated to the numeration system G = {G n }n≥0 , with
G n+2 = a0 (G n+1 + G n ) and initial conditions G 0 = 1 and G 1 = a0 + 1.
By Theorem 2 of [13] we know that the digits εk of the G-expansion (7) of the
natural number n have to satisfy the condition (εk , εk+1 ) ∈/ E L
,L , where E L
,L is
defined by (17), and the theorem is completely proved. 

It follows from Lemma 1 that in Theorem 2 we considered all the β-adic van der
Corput sequences of order two, apart for the classical van der Corput sequences. On
the other hand, there exist many other LS-sequences having low discrepancy.

References

1. Aistleitner, C., Hofer, M.: Uniform distribution of generalized Kakutani’s sequences of parti-
tions. Annali di Matematica Pura e Applicata (4). 192(4), 529–538 (2013)
2. Aistleitner, C., Hofer, M., Ziegler, V.: On the uniform distribution modulo 1 of multidimensional
L S-sequences. Annali di Matematica Pura e Applicata (4). 193(5), 1329–1344 (2014)
3. Barat, G., Grabner, P.: Distribution properties of G-additive functions. J. Number Theory 60,
103–123 (1996)
270 I. Carbone

4. Carbone, I.: Discrepancy of L S sequences of partitions and points. Annali di Matematica Pura
e Applicata (4). 191(4), 819–844 (2012)
5. Carbone, I.: Extension of van der Corput algorithm to L S-sequences. Appl. Math. Comput.
255, 207–2013 (2015)
6. Carbone, I., Iacò, M.R., Volčič, A.: L S-sequences of points in the unit square. submitted
arXiv:1211.2941 (2012)
7. Carbone, I., Iacò, M.R., Volčič, A.: A dynamical system approach to the Kakutani-Fibonacci
sequence. Ergod. Theory Dyn. Syst. 34(6), 1794–1806 (2014)
8. Carbone, I., Volčič, A.: Kakutani splitting procedure in higher dimension. Rendiconti
dell’Istituto Matematico dell’Università di Trieste 39, 119–126 (2007)
9. Carbone, I., Volčič, A.: A von Neumann theorem for uniformly distributed sequences of par-
titions. Rendiconti del Circolo Matematico di Palermo 60(1–2), 83–88 (2011)
10. Chersi, F., Volčič, A.: λ-equidistributed sequences of partitions and a theorem of the de Bruijn-
Post type. Annali di Matematica Pura e Applicata 4(162), 23–32 (1992)
11. Drmota, M., Infusino, M.: On the discrepancy of some generalized Kakutani’s sequences of
partitions. Unif. Distrib. Theory 7(1), 75–104 (2012)
12. Drmota, M., Tichy, R.F.: Sequences Discrepancies and Applications. Lecture Notes in Mathe-
matics. Springer, Berlin (1997)
13. Fraenkel, A.S.: Systems of numeration. Am. Math. Mon. 92(2), 105–114 (1985)
14. Frougny, C., Solomyak, B.: Finite beta-expansions. Ergod. Theory Dyn. Syst. 12, 713–723
(1992)
15. Grabner, P., Hellekalek, P., Liardet, P.: The dynamical point of view of low-discrepancy
sequences. Unif. Distrib. Theory 7(1), 11–70 (2012)
16. Halton, J.H.: On the efficiency of certain quasi-random sequences of points in evaluating multi-
dimensional integrals. Numerische Mathematik 2, 84–90 (1960)
17. Hammersley, J.M.: Monte-Carlo methods for solving multivariate problems. Ann. N. Y. Acad.
Sci. 86, 844–874 (1960)
18. Hofer, M., Iacò, M.R., Tichy, R.: Ergodic properties of the β-adic Halton sequences. Ergod.
Theory Dyn. Syst. 35, 895–909 (2015)
19. Infusino, M., Volčič, A.: Uniform distribution on fractals. Unif. Distrib. Theory 4(2), 47–58
(2009)
20. Kakutani, S.: A problem on equidistribution on the unit interval [0, 1[. In: Measure theory
(Proc. Conf., Oberwolfach, 1975), Lecture Notes in Mathematics 541, pp. 369–375. Springer,
Berlin (1976)
21. Kuipers, L., Niederreiter, H.: Unif. Distrib. Seq. Pure and Applied Mathematics. Wiley, New
York (1974)
22. Ninomiya, S.: Constructing a new class of low-discrepancy sequences by using the β-adic
transformation. IMACS Seminar on Monte Carlo Methods (Brussels, 1997). Math. Comput.
Simul. 47(2–5), 403–418 (1998)
23. Ninomiya, S.: On the discrepancy of the β-adic van der Corput sequence. J. Math. Sci. 5,
345–366 (1998)
24. Rényi, A.: Representations for real numbers and their ergodic properties. Acta Mathematica
Academiae Scientiarum Hungaricae 8, 477–493 (1957)
25. van der Corput, J.G.: Verteilungsfunktionen. Proc. Koninklijke Nederlandse Akademie Van
Wetenschappen 38, 813–821 (1935)
26. Volčič, A.: A generalization of Kakutani’s splitting procedure. Annali di Matematica Pura e
Applicata (4). 190(1), 45–54 (2011)
Computational Higher Order Quasi-Monte
Carlo Integration

Robert N. Gantner and Christoph Schwab

Abstract The efficient construction of higher-order interlaced polynomial lattice


rules introduced recently in [Dick et al. SIAM Journal of Numerical Analysis,
52(6):2676–2702, 2014] is considered and the computational performance of these
higher-order QMC rules is investigated on a suite of parametric, high-
dimensional test integrand functions. After reviewing the principles of their con-
struction by the “fast component-by-component” (CBC) algorithm due to Nuyens
and Cools as well as recent theoretical results on their convergence rates from
[Dick, J., Kuo, F.Y., Le Gia, Q.T., Nuyens, D., Schwab, C.: Higher order QMC
Petrov–Galerkin discretization for affine parametric operator equations with random
field inputs. SIAM J. Numer. Anal. 52(6) (2014), pp. 2676–2702], we indicate algo-
rithmic aspects and implementation details of their efficient construction. Instances
of higher order QMC quadrature rules are applied to several high-dimensional test
integrands which belong to weighted function spaces with weights of product and
of SPOD type. Practical considerations that lead to improved quantitative conver-
gence behavior for various classes of test integrands are reported. The use of (ana-
lytic or numerical) estimates on the Walsh coefficients of the integrand provide
quantitative improvements of the convergence behavior. The sharpness of theoret-
ical, asymptotic bounds on memory usage and operation counts, with respect to
the number of QMC points N and to the dimension s of the integration domain is
verified experimentally to hold starting with dimension as low as s = 10 and with
N = 128. The efficiency of the proposed algorithms for computation of the gener-
ating vectors is investigated for the considered classes of functions in dimensions
s = 10, ..., 1000. A pruning procedure for components of the generating vector
is proposed and computationally investigated. The use of pruning is shown to yield
quantitative improvements in the QMC error, but also to not affect the asymptotic con-
vergence rate, consistent with recent theoretical findings from [Dick, J., Kritzer, P.:

R.N. Gantner (B) · C. Schwab


Seminar for Applied Mathematics, ETH Zürich, Rämistrasse 101, Zurich, Switzerland
e-mail: robert.gantner@sam.math.ethz.ch
C. Schwab
e-mail: christoph.schwab@sam.math.ethz.ch

© Springer International Publishing Switzerland 2016 271


R. Cools and D. Nuyens (eds.), Monte Carlo and Quasi-Monte Carlo Methods,
Springer Proceedings in Mathematics & Statistics 163,
DOI 10.1007/978-3-319-33507-0_12
272 R.N. Gantner and C. Schwab

On a projection-corrected component-by-component construction. Journal of Com-


plexity (2015) DOI 10.1016/j.jco.2015.08.001].

Keywords Quasi-Monte Carlo · Higher Order · Polynominal Lattice Rule

1 Introduction

The efficient approximation of high-dimensional integrals is a core task in many areas


of scientific computing. We mention only uncertainty quantification, computational
finance, computational physics and chemistry, and computational biology. In partic-
ular, high-dimensional integrals arise in the computation of statistical quantities of
solutions to partial differential equations with random inputs.
In addition to efficient spatial and temporal discretizations of partial differential
equation models, it is important to devise high-dimensional quadrature schemes that
are able to exploit an implicitly lower-dimensional structure in parametric input data
and solutions of such PDEs. The rate of convergence of Monte Carlo (MC) methods is
dimension-robust, i.e. the convergence rate bound holds with constants independent
of the problem dimension provided that the variances are bounded independent of
the dimension, but it is limited to 1/2. Thus it is important to devise integration
methods which converge of higher order than 1/2, independent of the dimension of
the integration domain.
In recent years, numerous approaches to achieve this type of higher-order conver-
gence have been proposed; we mention only quasi Monte-Carlo integration, adaptive
Smolyak quadrature, adaptive polynomial chaos discretizations, and related methods.
In the present paper, we consider the realization of novel higher-order interlaced
polynomial lattice rules introduced in [6, 10, 11], which allow an integrand-adapted
construction of a quasi-Monte Carlo quadrature rule that exploits sparsity of the
parameter-to-solution map. We consider in what follows the problem of integrating
a function f : [0, 1)s → R of s variables y1 , . . . , ys over the s-dimensional unit
cube, 
I [ f ] := f (y1 , . . . , ys ) dy1 · · · dys . (1)
[0,1)s

Exact computation quickly becomes infeasible and we must, in most applications,


resort to an approximation of (1) by a quadrature rule. We focus on quasi-Monte Carlo
quadrature rules; more specifically, we consider interlaced polynomial lattice point
sets for functions in weighted spaces with weights of product and smoothness-driven
product-and-order-dependent (SPOD) type. Denoting the interlaced polynomial lat-
tice point set by P = {x (0) , . . . , x (N −1) } with x (n) ∈ [0, 1)s for n = 0, . . . , N − 1,
we write the QMC quadrature rule as

N −1
1 
QP [ f ] := f (x (n) ).
N n=0
Computational Higher Order Quasi-Monte Carlo Integration 273

In Sect. 2 we first define in more detail the structure of the point set P considered
throughout and derive worst-case error bounds for integrand functions which belong
to certain weighted spaces of functions introduced in [13]. Then, the component-by-
component construction is reviewed and the worst-case error reformulated to allow
efficient computation. The main contribution of this paper is found in Sects. 4 and 5,
which mention some practical considerations required for efficient implementation
and application of these rules. In Sect. 5, we give measured convergence results for
several model integrands, showing the applicability of these methods.

2 Interlaced Polynomial Rank-1 Lattice Rules

Polynomial rank-1 lattice point sets, introduced by Niederreiter in [14], are a modifi-
cation of standard rank-1 lattice point sets to polynomial arithmetic in Zb [x] (defined
in the next section). A polynomial lattice rule is an equal-weight quasi-Monte Carlo
(QMC) quadrature rule based on such point sets. Here, we consider the higher-order
interlaced polynomial lattice rules introduced in [6, Def. 3.6], [7, Def. 5.1] and focus
on computational techniques for their efficient construction.

2.1 Definitions

For a given prime number b, let Zb denote the finite field of order b and Zb [x] the set
of polynomials with coefficients in Zb . Let P ∈ Zb [x] be an irreducible polynomial
of degree m. Then, the finite field of order bm is isomorphic to the residue class
(Zb [x]/P, +, ·), where both operations are carried out in Zb [x] modulo P. We denote
by G b,m = ((Zb [x]/P) , ·) the cyclic group formed by the nonzero elements of the
residue class together with polynomial multiplication modulo P.
Throughout, we frequently interchange an integer n, 0 ≤ n < N = bm , with its
associated polynomial n(x) = η0 + η1 x + η2 x 2 + . . . + ηm−1 x m−1 , the coefficients
of which are given by the b-adic expansion n = η0 + η1 b + η2 b2 + . . . + ηm−1 bm−1 .
Given a generating vector q ∈ G sb,m , we have the following expression for the
ith component of the nth point x(n) ∈ [0, 1)s of a polynomial lattice point set P:
 n(x)q (x) 
xi(n) = vm
i
, i = 1, . . . , s, n = 0, . . . , N − 1,
P(x)
−1
∞ vm :−Zb ((xm)) → [0, 1)−is given for −1
where the mapping any integer w by the
expression vm ξ
=w   x = =min(1,w) ξ  b , and Zb ((x )) denotes the set of
formal Laurent series ∞ a
k=w k x −k
with a k ∈ Z b for some integer w.
274 R.N. Gantner and C. Schwab

A key ingredient for obtaining QMC formulas which afford higher-order conver-
gence rates is the interlacing of lattice point sets, as introduced in [1, 2]. We define
the digit interlacing function, which maps α points in [0, 1) to one point in [0, 1).
Definition 1 (Digit Interlacing Function) We define the digit interlacing function
Dα with interlacing factor α ∈ N acting on the points {x j ∈ [0, 1), j = 1, . . . , α}
by
∞ 
 α
Dα (x1 , . . . , xα ) = ξ j,a b− j−α(a−1) ,
a=1 j=1

where by ξ j,a we denote the ath component of the b-adic decomposition of x j ,


x j = ξ j,1 b−1 + ξ j,2 b−2 + . . ..
An interlaced polynomial lattice point set based on the generating vector q ∈ G αs
b,m ,
whose dimension is now α times larger than before, is then given by the points
bm −1
{x(n) }n=0 with

(n) n(x)qα(i−1)+1 (x) n(x)qα(i−1)+α (x)


xi = Dα vm , . . . , vm , i = 1, . . . , s,
P(x) P(x)

i.e. the ith coordinate of the nth point is obtained by interlacing a block of α coordi-
nates.

2.2 Worst-Case Error Bound

We give here an overview of bounds on the worst case error which are required for
the CBC construction; for details we refer to [6]. The results therein were based on a
“new function space setting”, which generalizes the notion of a reproducing kernel
Hilbert space to a Banach space setting. We also refer to [13] for an overview of
related function spaces.

2.2.1 Function Space Setting

In order to derive a worst-case error (WCE) bound, consider the higher-order unan-
chored Sobolev space Ws,α,γ ,q,r := { f ∈ L 1 ([0, 1)s ) :  f s,α,γ ,q,r < ∞} which is
defined in terms of the higher order unanchored Sobolev norm
  
 f s,α,γ ,q,r := γu−q
u⊆{1:s} v⊆u τ u\v ∈{1:α}|u\v|
  q
r/q
1/r (2)
(α ,τ ,0)
(∂ y v u\v f )( y) d y{1:s}\v d yv ,
|v|

[0,1] [0,1]s−|v|
Computational Higher Order Quasi-Monte Carlo Integration 275

with the obvious modifications if q or r is infinite. Here {1 : s} is a shorthand notation


for the set {1, 2, . . . , s}, and (α v , τ u\v , 0) denotes a sequence ν with ν j = α for j ∈ v,
ν j = τ j for j ∈ u \ v, and ν j = 0 for j ∈ / u. For non-negative weights γu , the space
Ws,α,γ ,q,r consists of smooth functions with integrable mixed derivatives of orders up
to α with respect to each variable, and L q -integrable (q ∈ [1, ∞]) mixed derivatives
containing a derivative of order α in at least one variable.
This space is called unanchored because the innermost integral over [0, 1]s−|v|
in the definition of the norm  ◦ s,α,γ ,q,r integrates out the “inactive” coordinates,
i.e. those with respect to which a derivative of order less than α is taken, rather
than “anchoring” these variables by fixing their values equal to an anchor point
a ∈ [0, 1)s . The weights γu in the definition of the norm can be interpreted as the
relative importance of groups of variables u. Below, we will assume either product
structure or so-called SPOD structure on the weights γu ; here, the acronym “SPOD”
stands for smoothness-driven, product and order dependent weights, which were first
introduced in [6].
We remark that the sum over all subsets u ⊆ {1 : s} also r includes the empty
−r

set u = ∅, for which we obtain the term γ∅ [0,1]s f ( y) d y , which contains the
average of the function f over the s-dimensional unit cube.

2.2.2 Error Bound

The worst-case error eWC (P, W ) of a point set P = { y(0) , . . . , y(b −1) } over the
m

function space W is defined by the following supremum over the unit ball in W :

eWC (P, W ) = sup |I [ f ] − QP [ f ]|.


 f W ≤1

Assume that 1 ≤ r, r
≤ ∞ with 1/r + 1/r
= 1 and α, s ∈ N with α > 1. Define a
collection of positive weights γ = (γu )u⊂N . Then, by [6, Theorem 3.5], we have the
following bound on the worst-case error in the space Ws,α,γ ,q,r ,

sup |I [ f ] − QP [ f ]| ≤ es,α,γ ,r
(P),
 f Ws,α,γ ,q,r ≤1

with the bound for the worst case error es,α,γ ,r


(P) given by
⎛ ⎛ ⎞r
⎞1/r

⎜  |u|
⎝Cα,b
 ⎟
es,α,γ ,r
(P) = ⎝ γu b−μα (ku ) ⎠ ⎠ . (3)
∅ =u⊆{1:s} 
ku ∈Du

The inner sum is over all elements of the dual net without zero, see [10, Def. 5]. For
a number k with b-adic expansion k = Jj=1 κ j ba j with a1 > . . . > a J , we define
 )
the weight μα (k) = min(α,J
j=1 (a j + 1) as in [6]. The constant Cα,b is obtained by
276 R.N. Gantner and C. Schwab

bounding the Walsh coefficients of functions in Sobolev spaces, see [3, Thm.14] for
details. Here, it has the value

2 1
Cα,b = max , max
(2 sin πb )α z=1,...,α−1 (2 sin πb )z

α−2

1 1 2 2b + 1
× 1+ + 3+ + . (4)
b b(b + 1) b b−1

The bound (3) holds for general digital nets; however, we wish to restrict ourselves
to polynomial lattice rules. We additionally choose r
= 1 (and thus r = ∞, i.e. the
∞ norm over the sequence indexed by u ⊆ {1 : s} in the norm  ◦ s,α,γ ,q,r ). We
denote by P  a point set in αs dimensions, and use in the following the definition
logb y(α−1) bα −1
α −b − b
ω(y) = bb−1 bα −b
where ω(0) = bb−1
α −b . Using [6, Theorem 3.9], we

bound the sum over the dual net Du in (3) by a computationally amenable expression,

b −1
1   
m

es,α,γ ,1 (P) ≤ E αs (q) = m γv


 ω(y (n) 
y(n) ∈ P,
j ), (5)
b n=0 v⊆{1:αs} j∈v
v =∅

 
n(x)q j (x)
where y (n)
j = vm P(x)
depends on the jth component of the generating vector,
q j (x), and the auxiliary weight  γv , v ⊆ {1 : αs} depends on the choice of weights γu .
Assume given a sequence (β j ) j ∈  p (N) for 0 < p < 1 and denote by u(v) ⊆ {1 :
s} an “indicator set” containing a dimension i ∈ {1, . . . , s} if any of the corresponding
α dimensions {(i − 1)α + 1, . . . , iα} is in v ⊆ {1 : αs}. This can be given explicitly
by u(v) = { j/α : j ∈ v}. For product weights, we define

 α

γv =
 γj, γ j = Cα,b bα(α−1)/2 ν!2δ(ν,α) β νj , (6)
j∈u(v) ν=1

and obtain from (5) the worst-case error bound for d = 1, . . . , αs

1       
b −1 m

E d (q) = m γj ω(y (n)


j ) . (7)
b n=0 u⊆{1:s} j∈u v⊆{1:d} j∈v
u =∅ u(v)=u

For SPOD weights we have


  ν
γv =
 |ν u(v) |! γ j (ν j ), γ j (ν j ) = Cα,b bα(α−1)/2 2δ(ν j ,α) β j j , (8)
ν u(v) ∈{1:α}|u(v)| j∈u(v)
Computational Higher Order Quasi-Monte Carlo Integration 277

for which we obtain


b −1
1  
m
     
E d (q) = m |ν|! γ j (ν j ) ω(y (n)
j ) . (9)
b n=0 v⊆{1:d}
ν∈{1:α}|u(v)| j∈u(v) j∈v
v =∅

These two expressions will be the basis of the component-by-component (CBC)


construction elucidated in the next section. We note that the powers of Cα,b arising
in (7) and (9) can become very large, leading to a pronounced negative impact on
the construction procedure (see Sect. 4.1 below). The constant Cα,b , defined in (4),
stems from bounds on the Walsh coefficients of smooth functions [3].

3 Component-by-Component Construction

The component-by-component construction (CBC) [12, 18, 19] is a simple but never-
theless effective algorithm for computing generating vectors for rank-1 lattice rules,
of both standard and polynomial type. In each iteration of the algorithm, the worst-
case error is computed for all candidate elements of the generating vector, and the one
with minimal WCE is taken as the next component. After s iterations, a generating
vector of length s is obtained, which can then be used for QMC quadrature.
Nuyens and Cools reformulated in [15, 16] the CBC construction to exploit the
cyclic structure inherent in the point sets for standard lattice rules when the number
of points N is a prime number. This leads to the so-called fast CBC algorithm based
on the fast Fourier transform (FFT) which speeds up the computation drastically. It
is also the basis for the present construction.
Fast CBC is based on reformulating (7) and (9): instead of iterating over the index
d = 1, . . . , αsmax , we iterate over the dimension s = 1, . . . , smax and for each s
over t = 1, . . . , α. Thus, the index d above is replaced by the pair s, t through
d = α(s − 1) + t and we write

y (n) (n)
j,i = yα( j−1)+i , j = 1, . . . , smax , i = 1, . . . , α. (10)

In order to obtain an efficient algorithm we further reformulate (7) and (9) such that
only intermediate quantities are updated instead of recomputing E d (q) in (7) and (9).

3.1 Product Weights

In the product weight case, we have for t = α the expression

bm −1 s
  α 
1   (n)
E s,α (q) = m 1 + γj (1 + ω(y j,i )) − 1 − 1. (11)
b n=0 j=1 i=1
278 R.N. Gantner and C. Schwab

  
α (n)

We define the quantity Ys (n) = sj=1 1 + γ j i=1 (1 + ω(y j,i )) − 1 which
will be updated at the end of each iteration over t. To emphasize the independence of
certain quantities on the current unknown component qs,t , we denote the truncated
generating vector by q d = (q1 , . . . , qd ) or in analogy to (10), q s,t = (q1,1 , . . . , qs,t ).
 s,1 , . . . , qs,t ), such that (11) can be
We now write E s,t (q s,t ) = E s−1,α (q s−1,α ) + E(q
used for E s−1,α (q s−1,α ) during the iteration over t. For t < α, we have

bm −1
  t 
1   (n)
E s,t (q) = m 1 + γs (1 + ω(ys,i )) − 1 Ys−1 (n) − 1,
b n=0 i=1

which can be written in terms of E s−1,α (q s−1,α ) as

bm −1 bm −1
 t 
γs  γs   (n)
E s,t (q) = E s−1,α (q s−1,α )− m Ys−1 (n)+ m (1 + ω(ys,i )) Ys−1 (n).
b n=0 b n=0 i=1

t (n)
For later use and ease of exposition, we define Vs,t (n) = i=1 (1 + ω(ys,i )), which
 (n)   (n) 
satisfies Vs,t (n) = Vs,t−1 (n) 1 + ω(ys,t ) for t > 1 and Vs,1 (n) = 1 + ω(ys,1 ) . We
 bα −1 t (0)
also note that Vs,t (0) = (1 + ω(0)) = bα −b , since ys,t = 0, independent of the
t

generating vector. This leads to the following decomposition of the error for product
weights
γs  
E s,t (q) = E s−1,α (q s−1,α ) + (1 + ω(0))t − 1 Ys−1 (0)
bm
bm −1
γs   
+ m Vs,t−1 (n) − 1 Ys−1 (n)
b n=1
b −1
γs 
m

(n)
+ m ω(ys,t )Vs,t−1 (n)Ys−1 (n), (12)
b n=1

(n)
where only (12) depends on the unknown qs,t through ys,t . This reformulation permits
efficient computation of the worst-case error bound E s,t during the CBC construction
by updating intermediate quantities.

3.2 SPOD Weights

The search criterion (9) can be reformulated to obtain [6, 3.43]

1     
b −1 αs
m
s 
E s,t (q) = m ! γ j (ν j ) ω(y (n)
j ). (13)
b n=0 =1 ν∈{0:α}s j=1 v⊆{1:d} s.t. j∈v
|ν|= ν j >0 u(v)={1≤ j≤s:ν j >0}
Computational Higher Order Quasi-Monte Carlo Integration 279

bm −1 αs
For a complete block (i.e. t = α), we write E s,α (q) = 1
bm n=0 =1 Us, (n),
where Us, (n) is given by
s 
  
α 
 
Us, (n) = ! γ j (ν j ) 1 + ω(y (n)
j,i ) − 1 .
ν∈{0:α}s j=1 i=1
|ν|= ν j >0

Proceeding as in the product weight case, we separate out the E s−1,α (q s−1,α ) term,

E s,t (q) = E s−1,α (q s−1,α )


bm −1 αs min(α,)
1      
t
(n) !
+ m (1 + ω(ys,i )) − 1 γs (νs ) Us−1,−νs (n) .
b ( − νs )!
n=0 i=1 =1 νs =1

 min(α,)
Defining Vs,t (n) as above and with Ws (n) = αs =1 νs =1
!
γs (νs ) (−ν s )!
Us−1,−νs
(n), we again aim to isolate the term depending on the unknown qs,t . This yields

1  bα − 1 t 
E s,t (q) = E s−1,α (q s−1,α ) + α
− 1 Ws (0)
b m b −b
b −1
1 
m

+ m (Vs,t−1 (n) − 1)Ws (n) (14)


b n=1
b −1
1 
m

(n)
+ m Vs,t−1 (n)Ws (n)ω(ys,t ), (15)
b n=1

(n)
where only the last sum (15) depends on qs,t through ys,t .
The remaining terms can be ignored, since the error E(q d−1 , z) is shifted by the
same amount for all candidates z ∈ G b,m . This optimization saves O(N ) operations
due to the omission of the sum (14). An analogous optimization is possible in the
product weight case. Since the value of the error bound E smax ,α (q) is sometimes a
useful quantity, one may choose to compute the full bounds given above.

3.3 Efficient Implementation

As currently written, the evaluation of the sums (12) and (15) for all possible bm − 1
values for qs,t requires O(N 2 ) operations. Following [15], we view this sum as a
matrix-vector multiplication of the matrix
280 R.N. Gantner and C. Schwab


n(x)q(x)
Ω := ω vm (16)
P(x) 1≤n≤bm −1
q∈G b,m

 
with the vector consisting of the component-wise product Vs,t−1 (n)Ws (n) 1≤n≤bm −1 .
The elements of Ω depend on n(x)q(x), which is a product of polynomials in G b,m .
Since the nonzero elements of a finite field form a cyclic group under multiplication,
there exists a primitive element g that generates the group, i.e. every element of G b,m
can be given as some exponent of g.
By using the so-called Rader transform, originally developed in [17], the rows
and columns of Ω can be permuted to obtain a circulant matrix Ω perm . Application
of the fast Fourier transform allows the multiplications (12) and (15) to be executed
in O(N log N ) operations. This technique was applied to the CBC algorithm in [16];
we also mention the exposition in [8, Chap. 10.3].
The total work complexity is O(αs N log N + α 2 s 2 N ) for SPOD weights and
O(αs N log N ) for product weights [6, Theorems 3.1, 3.2]. In Sect. 5, we show mea-
surements of the CBC construction time that indicate that the constants in these
asymptotic estimates are small, allowing these methods to be applied in practice.

3.4 Algorithms

In Algorithms 1 and 2 below, V, W, Y, U() and X() denote vectors of length N . E


is a vector of length N − 1 and E old , E 1 , E 2 are scalars. By  we denote component-
wise multiplication and Ω z,: denotes the zth row of Ω.

Algorithm 1 CBC_product(b, m, α, smax , {γ1 , . . . , γs })


Y ← 1 · b−m , E old ← 0
for s = 1, . . . , smax do
V←1
for t = 1, . . . , αα do 
−1 t
E 1 ← γs bbα −b − 1 Y(0)
bm −1  
E 2 ← γs n=1 V(n) − 1 Y(n)
E ← γs Ω · (V  Y) + (E old + E 1 + E 2 ) · 1
qs,t ← argminq∈G b,m E(q)
V ← (1 + Ω qs,t ,: )  V
end for 
Y ← 1 + γs (V − 1)  Y
E old ← E(qs,α )
end for
return q, E old
Computational Higher Order Quasi-Monte Carlo Integration 281

Algorithm 2 CBC_SPOD(b, m, α, smax , {γ j (·)}sj=1 )


U(0) ← 1, U(1 : αsmax ) ← 0
E old ← 0
for s = 1, . . . , smax do
V←1
W←0
for  = 1, . . . , αs do
X() ← 0
for ν = 1, . . . , min(α, ) do
!
X() ← X() + γs (ν) (−ν)! U( − ν)
end for
W ← W + b1m X()
end for
for t = 1,. . α. , αdo 
−1 t
E 1 ← bbα −b − 1 W(0)
bm −1  
E 2 ← n=1 V(n) − 1 W(n)
E ← Ω · (V  W) + (E old + E 1 + E 2 ) · 1
qs,t ← argminq∈G b,m E(q)
V ← (1 + Ω qs,t ,: )  V
end for
E old ← E(qs,α )
for  = 1, . . . , αs do
U() ← U() + (V − 1)  X()
end for
end for
return q, E old

4 Implementation Considerations

4.1 Walsh Coefficient Bound

The definition of the auxiliary weights (6) and (8) contain powers of the Walsh
constant bound Cα,b defined in (4), which for b = 2 is bounded from below by
 α−2
Cα,2 = 29 53 ≥ 29 . For base b = 2, it was recently shown in [20] that Cα,2 can
be replaced by C = 1. Large values of the worst-case error bounds (7) and (9) have
been found to lead to generating vectors with bad projections. For integrand functions
with small Walsh coefficients, Cα,b may be replaced with a tighter bound C; this will
yield a worst-case error bound better adapted to the integrand and a generating vector
with the desired properties. Since additionally Cα,b is increasing in α for fixed b, this
becomes more important as the order of the quadrature rule increases.
282 R.N. Gantner and C. Schwab

4.2 Pruning

For large values of the WCE, the elements of the generating vector can repeat, leading
to very bad projections in certain dimensions. For polynomial lattice rules, if qs,k =
qs̃,k ∀k = 1, . . . , α for two dimensions s and s̃, the corresponding components of the
quadrature points will be identical, xs(n) = xs̃(n) for all values of n = 0, . . . , bm − 1.
Thus, in the projection onto the (s, s̃)-plane, only points on the diagonal are obtained,
which is obviously a very bad choice. One way this problem could be avoided is to
consider a second error criterion, as in [4]. We propose here a simpler method that
requires only minor modification of the CBC iteration.
To alleviate this effect, we formulate a pruning procedure that incorporates this
observation into the construction of the generating vector. We impose the additional
condition that the newest element of the generating vector is unique, i.e. is not
equal to a previously constructed component of q. This can be achieved in the CBC
construction by replacing the minimization of E(q) over all possible bm − 1 values
of the new component by the restricted version

qd = argmin E(q1 , . . . , qd−1 , z). (17)


z∈G b,m ,
z ∈{q
/ 1 ,...,qd−1 }

This procedure requires d −1 operations in iteration d to check the previous entries of


the vector, or O(α 2 s 2 ) in total, and thus does not increase the asymptotic complexity.
Alternatively, the indices can be stored in an additional sorted data structure with
logarithmic (in αs) cost for both inserting new indices and checking for membership.
This yields a cost of O(αs log(αs)) additional operations, with an additional storage
cost of O(αs). It was shown in [5] that the presently proposed pruning procedure
preserves the higher order QMC convergence estimates. In the case where the set of
candidates in (17), G b,m \{q1 , . . . , qd−1 }, is empty (which happens e.g. when αsmax >
bm − 1), the restriction is dropped. In other words, pruning is applied as long as it
still allows for at least one possible value for qd .

5 Results

We present several tests of an implementation of Algorithms 1 and 2, and of the


resulting higher order QMC quadrature rules. Rather than solving concrete appli-
cation problems, the purpose of the ensuing numerical experiments is a) to verify
the validity of the asymptotic (as s, N → ∞) complexity estimates and QMC error
bounds, in particular to determine the range where the asymptotic complexity bounds
give realistic descriptions of the CBC construction’s performance; b) to investigate
the quantitative effect of (not) pruning the generating vector on the accuracy and con-
vergence rates of the QMC quadratures, and c) to verify the necessity of the weighted
spaces Ws,α,γ ,q,r and the norms in (2) for classifying integrand function regularity. We
Computational Higher Order Quasi-Monte Carlo Integration 283

remark that, due to the limited space of these proceedings, only few representative
simulations can be presented in detail; for further results and a complete description
of our implementation, we refer to [9].

5.1 Model Problems

For our numerical results, we consider two model parametric integrands designed to
mimic the behavior of parametric solution families of parametric partial differential
equations. Both integrand functions are smooth (in fact, analytic) functions of all
integration variables and admit stable extensions to a countable number of integration
variables. However, their “sparsity” is controlled, as expressed by the growth of their
higher derivatives. The first integrand function belongs to weighted spaces Ws,α,γ ,q,r
with the norms in (2) where the weights are of SPOD type [6], whereas the second
integrand allows for product weights. The SPOD-type integrand we consider was first
mentioned in [13], and models a parametric partial differential equation depending
in an affine manner on s parameters y1 , . . . , ys , as considered, for example, in [6]:
⎛ ⎞−1

s
f θ,s,ζ ( y) = ⎝1 + θ · aj yj⎠ , a j = j −ζ , ζ ∈ N . (18)
j=1

|ν|+1 
We have the differential ∂ νy f θ,s,ζ ( y) = (−1)|ν| |ν|! f θ,s,ζ ( y) sj=1 (θa j )ν j , leading
 ν
to the bound |∂ νy f θ,s,ζ ( y)| ≤ C f |ν|! sj=1 β j j for all ν ∈ {0, 1, . . . , α}s and for a
C f ≥ 1 with the weights β j given by β j = θa j = θ j −ζ , j = 1, . . . , s. Additionally,
for s → ∞, we have (β j ) j ∈  p (N) with p > ζ1 and thus α = 1/ p + 1 = ζ .
Therefore, by Theorem 3.2 of [6], an interlaced polynomial lattice rule of order α
with N = bm points (b prime, m ≥ 1) and point set P N can be constructed such
that the QMC quadrature error fulfills

|I [ f θ,s,ζ ] − QP N [ f θ,s,ζ ]| ≤ C(α, β, b, p)N −1/ p , (19)

for a constant C(α, β, b, p) independent of s and N . Convergence rates were com-


puted with respect to a reference value of the integral I [ f θ,s,ζ ] obtained with
dimension-adaptive Smolyak quadrature with tolerance 10−14 . We also consider sep-
arable integrand functions, which, on account of their separability, trivially belong
to the product weight class. They are given by
⎛ ⎞

s
gθ,s,ζ ( y) = exp ⎝θ a j y j ⎠ , a j = j −ζ , (20)
j=1
284 R.N. Gantner and C. Schwab


and satisfy ∂ νy g( y) = g( y) sk=1 (θak )νk . Under the assumption that there exists
a constant C  > 0 that is independent of s and such that g( y) ≤ C  for all y ∈
s
 −ζ
[0, 1] , which holds here with C = exp(θ j=1 j ), ζ > 1, we have the bound
s

 sj=1 β νj j , for all ν ∈ {0, 1, . . . , α}s with the weights β j given
|∂ νy gθ,s,ζ ( y)| ≤ C
by β j = θa j = θ j −ζ for j = 1, . . . , s. We have the following analytically given
formula for the integral
⎡ ⎛ ⎞⎤
s  ζ   ∞  −ζ μ
j   s
θ j
I [gθ,s,ζ ] = exp(θ j −ζ ) − 1 = exp ⎣ log ⎝ ⎠⎦ , (21)
j=1
θ j=1 μ=0
(μ + 1)!

and have an error bound of the form (19), with a different value for C(α, β, b, p).

5.2 Validation of Work Bound

The results below are based on an implementation of the CBC algorithm in the
C++ programming language, and exploits shared-memory parallelism to reduce the
computation time for large m and s. Fourier transforms were realized using the FFTW
library, with shared-memory parallelization enabled. Timings were executed on a
system with 48 Intel Xeon E5-2696 cores at 2.70 GHz, where at most 8 CPUs were
used at a time. The timing results in Fig. 1 show that the work bounds O(αs N log N +
α 2 s 2 N ) for SPOD weights from [6, Thm.3.1] and O(αs N log N ) for product weights
from [6, Thm.3.2] are fulfilled in practice and seem to be tight. The work O(N log N )
in the number of QMC points N also appears tight for moderate s and N .

5.3 Pruning and Adapting the Walsh Coefficient Bound

We consider construction of the generating vector with and without application of


the pruning procedure defined in Sect. 4.2. Convergence rates for both cases can be
seen in Fig. 2: for α = 2 no difference was observed when pruning the entries.
Results for the constant Cα,b from (4) as well as for C = 1 are shown; in this
example, adapting the constant C to the integrand seems to yield better results than
pruning. In the case of the integrand (18), this can be justified by estimating the
Walsh coefficients by numerical computation of the Walsh–Hadamard transform.
The maximal values of these numerically computed coefficients is bounded by 1 for
low dimensions, indicating that the bound Cα,b is too pessimistic. For base b = 2 in
(4), it was recently shown in [20] that C = 1.
Computational Higher Order Quasi-Monte Carlo Integration 285

(a) (b)

(c) (d)

Fig. 1 CPU time required for the construction of generating vectors of varying order α = 2, 3, 4
for product and SPOD weights with β j = θ j −ζ versus the dimension s in a and b and versus the
number of points N = 2m in c and d

(a) (b)

Fig. 2 Effect of pruning the generating vectors: convergence of QMC approximation for the SPOD
integrand (18) with ζ = 4, s = 100, base b = 2 and α = 2, 3, 4, with and without pruning. Results
a obtained with Walsh constant (4). In b, the Walsh constant C = 1 and pruning are theoretically
justified in [5] and [20], respectively
286 R.N. Gantner and C. Schwab

(a) (b)

(c) (d)

Fig. 3 Convergence of QMC approximation to (21) for the product weight integrand (20) in s =
100, 1000 dimensions with interlacing parameter α = 2, 3, 4 with pruning. a s = 100, ζ = 2, b
s = 100, ζ = 4, c s = 1000, ζ = 2, d s = 1000, ζ = 4

(a) (b)

Fig. 4 Convergence of QMC approximation for the SPOD weight integrand (18) in s = 100
dimensions with interlacing parameter α = 2, 3, 4 with pruning. a ζ = 2. b ζ = 4
Computational Higher Order Quasi-Monte Carlo Integration 287

5.4 Higher-Order Convergence

As can be seen in Figs. 3 and 4, the higher-order convergence rates proved in [6] can
be observed in practice for the two classes of tested integrand functions. To generate
the QMC rules used in Figs. 3 and 4, the ad hoc value C = 0.1 was used. We
also mention that for more general, non-affine, holomorphic parameter dependence
of operators the same convergence rates and derivative bounds as in [6] have been
recently established in [7]. The CBC constructions apply also to QMC rules for
these (non affine-parametric) problems. The left subgraphs (ζ = 2) show that higher
values of the interlacing parameter α do not imply higher convergence rates, if the
integrand does not exhibit sufficient sparsity as quantified by the norm (2). The right
subgraphs (ζ = 4) in Figs. 3 and 4 show that the convergence rate is indeed dimension
independent, but limited by the interlacing parameter α = 2: the integrand function
with ζ = 4 affords higher rates than 2 for interlaced polynomial lattice rules with
higher values of the interlacing parameter α.
The fast CBC constructions [15, 16], as adapted to higher order, interlaced poly-
nomial lattice rules in [6], attain the asymptotic scalings for work and memory with
respect to N and to integration dimension s already for moderate values of s and
N . Theoretically predicted, dimension-independent convergence orders beyond first
order were achieved with pruned generating vectors obtained with base b = 2 and
Walsh constant C = 1. QMC rule performance was observed to be sensitive to over-
estimated values of the Walsh constant Cα,b . The choice b = 2 and C = 1 with
pruning of generating vectors, theoretically justified in [5] and [20], respectively,
yielded satisfactory results for α = 2, 3, 4 in up to s = 1000 dimensions.

Acknowledgments This work is supported by the Swiss National Science Foundation (SNF)
under project number SNF149819 and by the European Research Council (ERC) under FP7 Grant
AdG247277. Work of CS was performed in part while CS visited ICERM / Brown University in
September 2014; the excellent ICERM working environment is warmly acknowledged.

References

1. Dick, J.: Explicit constructions of quasi-Monte Carlo rules for the numerical integration of high-
dimensional periodic functions. SIAM J. Numer. Anal. 45(5), 2141–2176 (2007) (electronic).
doi:10.1137/060658916
2. Dick, J.: Walsh spaces containing smooth functions and quasi-Monte Carlo rules of arbitrary
high order. SIAM J. Numer. Anal. 46(3), 1519–1553 (2008). doi:10.1137/060666639
3. Dick, J.: The decay of the Walsh coefficients of smooth functions. Bull. Aust. Math. Soc. 80(3),
430–453 (2009). doi:10.1017/S0004972709000392
4. Dick, J.: Random weights, robust lattice rules and the geometry of the cbcr c algorithm.
Numerische Mathematik 122(3), 443–467 (2012). doi:10.1007/s00211-012-0469-5
5. Dick, J., Kritzer, P.: On a projection-corrected component-by-component construction. J. Com-
plex. (2015). doi:10.1016/j.jco.2015.08.001
6. Dick, J., Kuo, F.Y., Le Gia, Q.T., Nuyens, D., Schwab, C.: Higher order QMC Petrov–Galerkin
discretization for affine parametric operator equations with random field inputs. SIAM J.
Numer. Anal. 52(6), 2676–2702 (2014)
288 R.N. Gantner and C. Schwab

7. Dick, J., Le Gia, Q.T., Schwab, C.: Higher-order quasi-Monte Carlo integration for holomor-
phic, parametric operator equations. SIAM/ASA J. Uncertain. Quantif. 4(1), 48–79 (2016).
doi:10.1137/140985913
8. Dick, J., Pillichshammer, F.: Digital nets and sequences. Cambridge University Press, Cam-
bridge (2010). doi:10.1017/CBO9780511761188
9. Gantner, R. N.: Dissertation ETH Zürich (in preparation)
10. Goda, T.: Good interlaced polynomial lattice rules for numerical integration in weighted Walsh
spaces. J. Comput. Appl. Math. 285, 279–294 (2015). doi:10.1016/j.cam.2015.02.041
11. Goda, T., Dick, J.: Construction of interlaced scrambled polynomial lattice rules of arbitrary
high order. Found. Comput. Math. (2015). doi:10.1007/s10208-014-9226-8
12. Kuo, F.Y.: Component-by-component constructions achieve the optimal rate of convergence
for multivariate integration in weighted Korobov and Sobolev spaces. J. Complexity 19(3),
301–320 (2003). doi:10.1016/S0885-064X(03)00006-2
13. Kuo, F.Y., Schwab, C., Sloan, I.H.: Quasi-Monte Carlo methods for high-dimensional integra-
tion: the standard (weighted Hilbert space) setting and beyond. ANZIAM J. 53, 1–37 (2011).
doi:10.1017/S1446181112000077
14. Niederreiter, H.: Random number generation and quasi-Monte Carlo methods. CBMS-NSF
Regional Conference Series in Applied Mathematics, vol. 63. Society for Industrial and Applied
Mathematics (SIAM), Philadelphia, PA (1992). doi:10.1137/1.9781611970081
15. Nuyens, D., Cools, R.: Fast algorithms for component-by-component construction of rank-1
lattice rules in shift-invariant reproducing kernel Hilbert spaces. Math. Comp. 75(254), 903–
920 (2006) (electronic). doi:10.1090/S0025-5718-06-01785-6
16. Nuyens, D., Cools, R.: Fast component-by-component construction, a reprise for different
kernels. Monte Carlo and quasi-Monte Carlo methods 2004, pp. 373–387. Springer, Berlin
(2006). doi:10.1007/3-540-31186-6_22
17. Rader, C.: Discrete Fourier transforms when the number of data samples is prime. Proc. IEEE
3(3), 1–2 (1968)
18. Sloan, I.H., Kuo, F.Y., Joe, S.: Constructing randomly shifted lattice rules in weighted Sobolev
spaces. SIAM J. Numer. Anal. 40(5), 1650–1665 (2002). doi:10.1137/S0036142901393942
19. Sloan, I.H., Reztsov, A.V.: Component-by-component construction of good lattice rules. Math.
Comp. 71(237), 263–273 (2002). doi:10.1090/S0025-5718-01-01342-4
20. Yoshiki, T.: Bounds on Walsh coefficients by dyadic difference and a new Koksma- Hlawka
type inequality for Quasi-Monte Carlo integration (2015)
Numerical Computation of Multivariate
Normal Probabilities Using Bivariate
Conditioning

Alan Genz and Giang Trinh

Abstract New methods are derived for the computation of multivariate normal
probabilities defined for hyper-rectangular probability regions. The methods use con-
ditioning with a sequence of truncated bivariate probability densities. A new approx-
imation algorithm based on products of bivariate probabilities will be described.
Then a more general method, which uses sequences of simulated pairs of bivariate
normal random variables, will be considered. Simulations methods which use Monte
Carlo, and quasi-Monte Carlo point sets will be described. The new methods will be
compared with methods which use univariate normal conditioning, using tests with
random multivariate normal problems.

Keywords Multivariate normal probabilities · Bivariate conditioning

1 Introduction

Many problems in applied statistical analysis require the computation of multivariate


normal (MVN) probabilities in the form
 b1  bn
1 −1
e− 2 x Σ
1 t
Φ(a, b; Σ) = √ ... x
dx,
|Σ| (2π )n a1 an

where x = (x1 , x2 , . . . , xn )t , dx = d xn d xn−1 · · · d x1 , and Σ is an n × n symmetric


positive definite covariance matrix. There are in general no “exact” methods for the
computation of the MVN probabilities, so various methods (see Genz and Bretz [5])
have been developed to provide suitably accurate approximations. And now there

A. Genz (B) · G. Trinh


Department of Mathematics, Washington State University, Pullman,
WA 99164-3113, USA
e-mail: alangenz@wsu.edu
G. Trinh
e-mail: alangenz@wsu.edu

© Springer International Publishing Switzerland 2016 289


R. Cools and D. Nuyens (eds.), Monte Carlo and Quasi-Monte Carlo Methods,
Springer Proceedings in Mathematics & Statistics 163,
DOI 10.1007/978-3-319-33507-0_13
290 A. Genz and G. Trinh

are implementations in scientific computing environments of efficient simulation


methods (see the R pvmnorm package and Matlab mvncdf function, for example),
which can often provide highly accurate MVN probabilities.
The purpose of this paper is to consider generalizations of some simulation
methods which use univariate conditioning. The generalizations we study here use
bivariate conditioning, with the goal of providing more accurate simulations without
significantly increasing the computational cost, compared to a univariate condition-
ing method. In order to provide background for the new simulation methods, we
first describe the basic univariate conditioning method. Then we derive our bivariate
conditioning methods, and finish with some tests comparing the different methods.

2 Univariate Conditioning Algorithms

We start with the Cholesky decomposition of Σ = CC t , where C is a lower triangu-


lar matrix. Then xt Σ −1 x = xt C −t C −1 x, and if √
we use the transformation x = Cy,
we have xt Σ −1 x = yt y with dx = |C| dy = |Σ|dy. The probability region for
Φ(a, b; Σ) is now given by a ≤ Cy ≤ b. Taking advantage of the lower triangular
structure of C, this set of inequalities can be rewritten in more detail in the form

a1 /c11 ≤y1 ≤ b1 /c11


(a2 − c21 y1 )/c22 ≤y2 ≤ (b2 − c21 y1 )/c22
..
.

n−1 
n−1
(an − cnm ym )/cnn ≤yn ≤ (bn − cnm ym )/cnn .
m=1 m=1

i−1 i−1
Then, using ai = (ai − m=1 cim ym )/cii , and bi = (bi − m=1 cim ym )/cii , we
have   
b1 y12 b2 y22

bn−1
1 yn2
Φ(a, b; Σ) = √ e− 2 e− 2 · · · e− 2 dy. (1)
(2π )n a1 a2 
an−1

This “conditioned” form for MVN probabilities has been used as the basis for several
numerical approximation and simulation methods (see Genz and Bretz [4, 5]).

2.1 Univariate Conditioning Simulations

We can use (1) with successive truncated conditional simulations to estimate


Φ(a, b; Σ). In what follows we will use y ∼ N (a, b), to denote the simula-
tion of a random y value from a univariate Normal distribution with truncation
Numerical Computation of Multivariate Normal Probabilities … 291

limits a and b. A standard method


 for computing these y values is to use y =
Φ −1 Φ(a) + (Φ(b) − Φ(a))u , with u ∼ U (0, 1) (u is a random number from the
uniform distribution on [0, 1]). The basic simulation step k is:
1. start with y1 ∼ N (a1 , b1 ),
2. given y1 , . . . , yi−1 , compute
n yi ∼ N (ai , bi ) for i = 1, . . . , n − 1;
3. compute the final Pk = i=1 (Φ(bi ) − Φ(ai )) ≈ Φ(a, b; Σ)


After computing M estimates Pk , k = 1, . . . , M, for Φ(a, b; Σ), we compute the


mean and standard error
 M 21
1 
M
k=1 (Pk
− P̄M )2
P̄M = Pk ≈ Φ(a, b; Σ), EM = . (2)
M k=1 M(M − 1)

The scaled standard error is used to provide error estimates for P̄M . If QMC points
are used instead of the u i ∼ U (0, 1) MC points, the result is a QMC algorithm,
with faster convergence to Φ(a, b; Σ) (see Hickernell [8], where the use of lattice
rule QMC point sets is analyzed). Sándor and András [12] also showed how QMC
point sets can provide faster convergence than MC point sets for this problem, and
compared several types of QMC point sets.

2.2 Variable Prioritization

This algorithm uses an ordering of the variables that is specified by the original Σ,
but there are n! possible orderings of the variables for Φ(a, b, Σ). These orderings
do not change the MVN value as long as the integration limits and corresponding
rows and columns of Σ are also permuted. Schervish [13] originally proposed sorting
the variables so that the variables with the shortest integration interval widths were
the outer integration variables. This approach often reduces the overall variation of
the integrand and consequently produces and easier simulation problem. Gibson,
Glasbey and Elston (GGE [7]) suggested an improved prioritization of the variables.
They proposed sorting the variables so that the outermost integrals have the smallest
expected values. With this heuristic, the outer variables, which have the most influ-
ence on the innermost integrals, tend to have smaller variation, and this often reduces
the overall variance for the resulting integration problem. Test results have shown
that this variable prioritized reordering, when combined with the univariate condi-
tioning algorithm can often produce more accurate results. We use this reordering
as preliminary step for our bivariate conditioning algorithms, so we provide some
details for the GGE reordering method here. We will use μ = E(a, b) to denote the
expected value for a Normal distribution; this is defined by
 b
1 x2
E(a, b) = √ xe− 2 d x/(Φ(b) − Φ(a)).
2π a
292 A. Genz and G. Trinh

The GGE variable prioritization method first chooses the outermost integration vari-
able by selecting the variable i so that

 
bi ai
i = argmin Φ √ −Φ √ .
1≤i≤n σii σii

The integration limits and the rows and columns of Σ for variables 1 and i are inter-
changed. Then the first column of the Cholesky decomposition C of Σ is computed

using c11 = σ11 and ci1 = cσ11i1 for i = 2, . . . , n. Letting â1 = ca111 , b̂1 = cb111 , we set
μ1 = E(â1 , b̂1 ).
At stage j, the jth integration variable is chosen by selecting a variable i so that
⎧ ⎛ ⎞ ⎛ ⎞⎫
⎨b − Σ
j−1
c μ a − Σ
j−1
c μ ⎬
i = argmin Φ ⎝  ⎠−Φ⎝ ⎠ .
i m=1 im m i m=1 im m

j≤i≤n ⎩ j−1 2 j−1 2 ⎭


σii − Σm=1 cim σii − Σm=1 cim

The integration limits, rows and columns of $\Sigma$, and partially completed rows of $C$ for
variables $j$ and $i$ are interchanged. Then the $j$th column of $C$ is computed using
$c_{jj} = \sqrt{\sigma_{ii} - \sum_{m=1}^{j-1} c_{im}^2}$ and $c_{ij} = \big(\sigma_{ij} - \sum_{m=1}^{j-1} c_{im} c_{jm}\big)/c_{jj}$, for $i = j+1, \ldots, n$.
Letting $\hat a_j = \big(a_j - \sum_{m=1}^{j-1} c_{jm}\mu_m\big)/c_{jj}$ and $\hat b_j = \big(b_j - \sum_{m=1}^{j-1} c_{jm}\mu_m\big)/c_{jj}$, we set
$\mu_j = E(\hat a_j, \hat b_j)$. The algorithm finishes when $j = n$, and then the final Cholesky
factor C and permuted integration limits a and b are used for the Pk computations
in (2).
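A minimal Python sketch of the GGE reordering is given below (ours, not the reference MATLAB/R code). It assumes the closed form $E(a, b) = (\phi(a) - \phi(b))/(\Phi(b) - \Phi(a))$ for the truncated normal mean, which follows from the definition above, and it uses the post-interchange indices in the $j$th Cholesky column.

```python
# Sketch (assumptions ours) of the GGE variable-prioritized reordering
# and Cholesky factorisation used as a preliminary step.
import numpy as np
from scipy.stats import norm

def trunc_mean(a, b):
    # E(a, b) for the standard normal truncated to [a, b]
    p = norm.cdf(b) - norm.cdf(a)
    return (norm.pdf(a) - norm.pdf(b)) / p

def gge_reorder(Sigma, a, b):
    Sigma = np.array(Sigma, dtype=float)
    a, b = np.array(a, dtype=float), np.array(b, dtype=float)
    n = len(a)
    C, mu = np.zeros((n, n)), np.zeros(n)
    for j in range(n):
        # choose the remaining variable with the smallest expected
        # conditional interval probability
        best, best_p = j, np.inf
        for i in range(j, n):
            s = np.sqrt(Sigma[i, i] - C[i, :j] @ C[i, :j])
            m = C[i, :j] @ mu[:j]
            p = norm.cdf((b[i] - m) / s) - norm.cdf((a[i] - m) / s)
            if p < best_p:
                best, best_p = i, p
        # interchange variables j and best (limits, Sigma, partial rows of C)
        for arr in (a, b):
            arr[[j, best]] = arr[[best, j]]
        C[[j, best], :j] = C[[best, j], :j]
        Sigma[[j, best], :] = Sigma[[best, j], :]
        Sigma[:, [j, best]] = Sigma[:, [best, j]]
        # j-th column of the Cholesky factor
        C[j, j] = np.sqrt(Sigma[j, j] - C[j, :j] @ C[j, :j])
        for i in range(j + 1, n):
            C[i, j] = (Sigma[i, j] - C[i, :j] @ C[j, :j]) / C[j, j]
        # conditional expected value used at later stages
        aj = (a[j] - C[j, :j] @ mu[:j]) / C[j, j]
        bj = (b[j] - C[j, :j] @ mu[:j]) / C[j, j]
        mu[j] = trunc_mean(aj, bj)
    return C, a, b
```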
Tests of the univariate conditioned simulation algorithm with this variable
reordering show that the resulting $P_k$ have smaller variation, reducing
the overall variation for the MVN estimates (see Genz and Bretz [4]). This (variable
prioritized) algorithm is widely used with QMC or deterministic u’s for implemen-
tations in Matlab, R, and Mathematica.

3 Bivariate Conditioning Simulation

We will now derive algorithms which use a bivariate conditioned form for Φ(a, b; Σ).
These algorithms depend on methods for fast and accurate bivariate normal (BVN)
computations which are now available (see Drezner and Wesolowsky [2], and
Genz [3]). The algorithms also depend on a bivariate decomposition for Σ, which
we now describe.

3.1 $LDL^t$ Decomposition

In order to transform the MVN problem into a sequence of conditioned BVN inte-
grals, we define $k = \lfloor n/2 \rfloor$ and use the covariance matrix decomposition $\Sigma = LDL^t$.
If $n$ is even this decomposition of $\Sigma$ has
$$L = \begin{bmatrix} I_2 & O_2 & \cdots & O_2 \\ L_{21} & \ddots & \ddots & \vdots \\ \vdots & \ddots & I_2 & O_2 \\ L_{k1} & \cdots & L_{k,k-1} & I_2 \end{bmatrix}, \qquad
D = \begin{bmatrix} D_1 & O_2 & \cdots & O_2 \\ O_2 & \ddots & \ddots & \vdots \\ \vdots & \ddots & D_{k-1} & O_2 \\ O_2 & \cdots & O_2 & D_k \end{bmatrix},$$

where $D_i$, $L_{i,j}$, and $O_2$ are $2 \times 2$ matrices.


For odd n, there is an extra row in L, and the final entry in D is dnn .
For example, if
$$\Sigma = \begin{bmatrix} 2 & 1 & -1 & 1 & -2 \\ 1 & 2 & 1 & -1 & 2 \\ -1 & 1 & 4 & -3 & 1 \\ 1 & -1 & -3 & 4 & -1 \\ -2 & 2 & 1 & -1 & 16 \end{bmatrix}, \quad
L = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ -1 & 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 1 & 0 \\ -2 & 2 & -1 & 1 & 1 \end{bmatrix}, \quad
D = \begin{bmatrix} 2 & 1 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 & 0 \\ 0 & 0 & 2 & -1 & 0 \\ 0 & 0 & -1 & 2 & 0 \\ 0 & 0 & 0 & 0 & 2 \end{bmatrix}.$$

This block $LDL^t$ decomposition can be recursively computed using the partitioning
$$\Sigma = \begin{bmatrix} \Sigma_{1,1} & R^t \\ R & \hat\Sigma \end{bmatrix}, \quad \text{with} \quad
L = \begin{bmatrix} I_2 & O \\ M & \hat L \end{bmatrix}, \quad \text{and} \quad
D = \begin{bmatrix} D_1 & O \\ O & \hat D \end{bmatrix},$$
where $\Sigma_{1,1}$ is a $2 \times 2$ matrix. Then $D_1 = \Sigma_{1,1}$, $M = R D_1^{-1}$, $\hat D = \hat\Sigma - M D_1 M^t$, and the decomposition
procedure continues by applying the same operations to the (n − 2) × (n − 2) matrix
Σ̂. This is a 2 × 2 block form for the standard Cholesky decomposition algorithm
(see Golub and Van Loan [6]).
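The recursion can be written down in a few lines; the following numpy sketch (ours) can be checked against the $5 \times 5$ example above.

```python
# Minimal sketch of the 2x2 block L D L^t decomposition of Sect. 3.1
# (a 2x2 block form of the Cholesky algorithm); assumptions ours.
import numpy as np

def block_ldlt(Sigma):
    Sigma = np.array(Sigma, dtype=float)
    n = Sigma.shape[0]
    L, D = np.eye(n), np.zeros((n, n))
    j = 0
    while j < n:
        m = 2 if j + 1 < n else 1          # final 1x1 block when n is odd
        D1 = Sigma[j:j+m, j:j+m]
        D[j:j+m, j:j+m] = D1
        R = Sigma[j+m:, j:j+m]
        M = R @ np.linalg.inv(D1)          # block below the diagonal of L
        L[j+m:, j:j+m] = M
        # Schur complement: the remaining (n-j-m) x (n-j-m) problem
        Sigma[j+m:, j+m:] -= M @ D1 @ M.T
        j += m
    return L, D

# check against the 5x5 example above
Sigma = np.array([[ 2, 1, -1,  1, -2],
                  [ 1, 2,  1, -1,  2],
                  [-1, 1,  4, -3,  1],
                  [ 1,-1, -3,  4, -1],
                  [-2, 2,  1, -1, 16]], dtype=float)
L, D = block_ldlt(Sigma)
print(np.allclose(L @ D @ L.T, Sigma))     # True
```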

3.2 The Bivariate Approximation Algorithm

We start with $\Sigma = LDL^t$, and use the transformation $x = Ly$, so that $d\mathbf{x} = |L|\, d\mathbf{y} = d\mathbf{y}$. The $y$ constraints which define the integration region are now determined from
$a \le Ly \le b$. Defining $(\alpha, \beta)_j = (a_j - g_j, b_j - g_j)$, with
$g_j = \sum_{m=1}^{j-1} l_{jm} y_m$, and $\mathbf{y}_{2k} = (y_{2k-1}, y_{2k})^t$, we have
$$\Phi(a, b; \Sigma) = \frac{1}{\sqrt{|D|\,(2\pi)^n}} \int_{\alpha_1}^{\beta_1}\!\int_{\alpha_2}^{\beta_2} e^{-\frac{1}{2}\mathbf{y}_2^t D_1^{-1} \mathbf{y}_2} \cdots \int_{\alpha_{2k-1}}^{\beta_{2k-1}}\!\int_{\alpha_{2k}}^{\beta_{2k}} e^{-\frac{1}{2}\mathbf{y}_{2k}^t D_k^{-1} \mathbf{y}_{2k}}
\times \begin{cases} d\mathbf{y} & \text{if } n = 2k; \\[2pt] \int_{\alpha_n}^{\beta_n} e^{-\frac{1}{2 d_{nn}} y_n^2}\, d\mathbf{y} & \text{if } n = 2k+1. \end{cases}$$

A final transformation using $y_i = \sqrt{d_{ii}}\, z_i$ for $i = 1, \ldots, n$, gives us
$$\Phi(a, b; \Sigma) = \int_{a_1'}^{b_1'}\!\int_{a_2'}^{b_2'} \frac{e^{-\frac{1}{2}\mathbf{z}_2^t \Omega_{12}^{-1} \mathbf{z}_2}}{2\pi |\Omega_{12}|^{\frac{1}{2}}} \cdots \int_{a_{2k-1}'}^{b_{2k-1}'}\!\int_{a_{2k}'}^{b_{2k}'} \frac{e^{-\frac{1}{2}\mathbf{z}_{2k}^t \Omega_{2k-1,2k}^{-1} \mathbf{z}_{2k}}}{2\pi |\Omega_{2k-1,2k}|^{\frac{1}{2}}}
\times \begin{cases} d\mathbf{z} & \text{if } n = 2k, \\[2pt] \frac{1}{\sqrt{2\pi}} \int_{a_n'}^{b_n'} e^{-\frac{1}{2} z_n^2}\, d\mathbf{z} & \text{if } n = 2k+1, \end{cases} \qquad (3)$$
with $\Omega_{2k-1,2k} = \begin{bmatrix} 1 & \rho_k \\ \rho_k & 1 \end{bmatrix}$, $\rho_k = d_{2k-1,2k}\big/\!\sqrt{d_{2k-1,2k-1}\, d_{2k,2k}}$, and
$(a', b')_i = (\alpha, \beta)_i / \sqrt{d_{ii}}$.
The bivariate approximation algorithm begins with the computation of the outermost
BVN probability $P_1 = \Phi((a_1', a_2'), (b_1', b_2'); \Omega_{12})$. We then use explicit formulas,
derived by Muthén [11], for the truncated BVN moments $\mu_1$ and $\mu_2$: using $q_1 = \sqrt{1 - \rho_1^2}$,

$$(\mu_1, \mu_2) = E((a_1', a_2'), (b_1', b_2'); \rho_1)
= \frac{1}{2\pi P_1 q_1} \int_{a_1'}^{b_1'}\!\int_{a_2'}^{b_2'} (u, v)\, e^{-\frac{u^2 + v^2 - 2uv\rho_1}{2 q_1^2}}\, dv\, du.$$

The Muthén formula for $\mu_1$ is
$$\mu_1 = \rho_1 \left[ \frac{\phi(a_2')}{P_1} \Phi\!\left( \frac{a_1' - \rho_1 a_2'}{q_1}, \frac{b_1' - \rho_1 a_2'}{q_1} \right) - \frac{\phi(b_2')}{P_1} \Phi\!\left( \frac{a_1' - \rho_1 b_2'}{q_1}, \frac{b_1' - \rho_1 b_2'}{q_1} \right) \right]
+ \frac{\phi(a_1')}{P_1} \Phi\!\left( \frac{a_2' - \rho_1 a_1'}{q_1}, \frac{b_2' - \rho_1 a_1'}{q_1} \right) - \frac{\phi(b_1')}{P_1} \Phi\!\left( \frac{a_2' - \rho_1 b_1'}{q_1}, \frac{b_2' - \rho_1 b_1'}{q_1} \right), \qquad (4)$$

using the univariate $\Phi(a, b) = \Phi(b) - \Phi(a)$. The $\mu_2$ formula is the same, except
for the interchanges $a_1' \leftrightarrow a_2'$ and $b_1' \leftrightarrow b_2'$. Note that the $\mu_i$ formulas depend only
on easily computed univariate pdf and cdf values.
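As a concrete illustration, the following Python sketch (ours) evaluates $P_1$ and $(\mu_1, \mu_2)$ for a generic truncated BVN; the rectangle probability uses SciPy's bivariate normal CDF rather than the fast BVN code of Genz [3], and the helper names are ours.

```python
# Sketch of the truncated BVN mean formula (4); not the authors' code.
import numpy as np
from scipy.stats import norm, multivariate_normal

def bvn_prob(a1, a2, b1, b2, rho):
    """P(a1<X1<b1, a2<X2<b2) for a standard BVN with correlation rho."""
    F = lambda x, y: multivariate_normal.cdf([x, y], mean=[0, 0],
                                             cov=[[1, rho], [rho, 1]])
    return F(b1, b2) - F(a1, b2) - F(b1, a2) + F(a1, a2)

def Phi(lo, hi):                    # univariate Phi(a, b) = Phi(b) - Phi(a)
    return norm.cdf(hi) - norm.cdf(lo)

def truncated_bvn_means(a1, a2, b1, b2, rho):
    P1 = bvn_prob(a1, a2, b1, b2, rho)
    q1 = np.sqrt(1.0 - rho**2)
    def mu(a1, a2, b1, b2):         # formula (4) for the first component
        return (rho * (norm.pdf(a2)/P1 * Phi((a1 - rho*a2)/q1, (b1 - rho*a2)/q1)
                       - norm.pdf(b2)/P1 * Phi((a1 - rho*b2)/q1, (b1 - rho*b2)/q1))
                + norm.pdf(a1)/P1 * Phi((a2 - rho*a1)/q1, (b2 - rho*a1)/q1)
                - norm.pdf(b1)/P1 * Phi((a2 - rho*b1)/q1, (b2 - rho*b1)/q1))
    # mu2 is mu1 with a1 <-> a2 and b1 <-> b2 interchanged
    return mu(a1, a2, b1, b2), mu(a2, a1, b2, b1), P1

print(truncated_bvn_means(-1.0, -1.0, 1.0, 1.0, 0.5))  # symmetric case: means ~ 0
```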
Now, approximate the second BVN by $P_2 = \Phi((\hat a_3, \hat a_4), (\hat b_3, \hat b_4); \Omega_{3,4})$, where
$\hat a_i$, $\hat b_i$ are $a_i'$, $b_i'$ with $z_1$, $z_2$ replaced by $\mu_1$, $\mu_2$. Then, compute $(\mu_3, \mu_4) =
E((\hat a_3, \hat a_4), (\hat b_3, \hat b_4); \rho_2)$. At the $i$th stage we compute
$$P_i = \Phi((\hat a_{2i-1}, \hat a_{2i}), (\hat b_{2i-1}, \hat b_{2i}); \Omega_{2i-1,2i}),$$
with $\hat a_i$, $\hat b_i$ computed from $a_i'$, $b_i'$ with $z_1, \ldots, z_{2i-2}$ replaced by the expected values $\mu_1, \ldots, \mu_{2i-2}$.
After $k$ stages the bivariate conditioning approximation is
$$\Phi(a, b; \Sigma) \approx \prod_{i=1}^{k} P_i \times \begin{cases} 1 & \text{if } n = 2k; \\ \Phi(\hat a_n, \hat b_n) & \text{if } n = 2k+1. \end{cases}$$

This algorithm was proposed and studied by Trinh and Genz [15], where the
BVN conditioned approximations were found to be more accurate than approxima-
tions using univariate means with conditioning. In that paper variable reorderings
were also studied, where a natural strategy is to reorder the variables at stage i to
minimize the Pi . But this strategy uses O(n 2 ) BVN values overall, which can take
a lot more time than the strategy described previously, which uses only UVN values.
Tests by Trinh and Genz showed that UVN-value prioritization provided
approximations which were usually as accurate, or almost as accurate, as the BVN-prioritized approximations.

4 BVN Conditioned Simulation Algorithms

4.1 Basic BVN Conditioned Simulation Algorithm

We will use the approximation algorithm described in the previous section, except
that the μi values will be replaced by simulated z i values. We focus on Φ(a, b; Σ)
in the form given by Eq. (3).

Basic BVN Conditioned Simulation Algorithm Steps


• First compute the outermost BVN $P_1 = \Phi((a_1', a_2'), (b_1', b_2'); \Omega_{12})$ and simulate
$(z_1, z_2)$ values from the $(a_1', a_2')$, $(b_1', b_2')$ truncated density
$$\frac{e^{-\frac{1}{2}\mathbf{z}_2^t \Omega_{12}^{-1} \mathbf{z}_2}}{2\pi P_1 |\Omega_{12}|^{\frac{1}{2}}};$$

• At stage $i$: given simulated $(z_1, z_2), \ldots, (z_{2i-3}, z_{2i-2})$, compute
$$P_i = \Phi((a_{2i-1}', a_{2i}'), (b_{2i-1}', b_{2i}'); \Omega_{2i-1,2i}),$$
and simulate $(z_{2i-1}, z_{2i})$ values from the $(a_{2i-1}', a_{2i}')$, $(b_{2i-1}', b_{2i}')$ truncated density
$$\frac{e^{-\frac{1}{2}\mathbf{z}_i^t \Omega_{2i-1,2i}^{-1} \mathbf{z}_i}}{2\pi P_i |\Omega_{2i-1,2i}|^{\frac{1}{2}}};$$

• After $k$ stages
$$\Phi(a, b; \Sigma) \approx \prod_{i=1}^{k} P_i \times \begin{cases} 1 & \text{if } n = 2k; \\ \Phi(\hat a_n, \hat b_n) & \text{if } n = 2k+1. \end{cases} \qquad (5)$$

The k stages in the algorithm are repeated and the results are averaged to approximate
Φ(a, b; Σ). The primary complication with this algorithm compared to the algorithm
for univariate simulation is the truncated BVN simulation. In contrast to the univariate
simulation, there is no direct inversion formula for truncated BVN simulation.
Currently, the most efficient methods for truncated BVN simulation use an algo-
rithm derived by Chopin [1], with variations for special cases. The basic algorithm is
an Acceptance-Rejection (AR) algorithm which we now describe. At each stage in
the BVN conditioned simulation algorithm, we need to simulate x, y from a truncated
BVN. We consider a generic BVN problem with truncated region $(a, b) \times (c, d)$ and
correlation coefficient $\rho$. Using $\Omega = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$, we first define

$$P = \frac{1}{2\pi |\Omega|^{\frac{1}{2}}} \int_a^b\!\int_c^d e^{-\frac{1}{2}\mathbf{z}^t \Omega^{-1} \mathbf{z}}\, dz_2\, dz_1
= \int_a^b \frac{e^{-\frac{1}{2}x^2}}{\sqrt{2\pi}} \int_{\frac{c-\rho x}{\sqrt{1-\rho^2}}}^{\frac{d-\rho x}{\sqrt{1-\rho^2}}} \frac{e^{-\frac{1}{2}y^2}}{\sqrt{2\pi}}\, dy\, dx
\equiv \int_a^b \frac{e^{-\frac{1}{2}x^2}}{\sqrt{2\pi}} f(x)\, dx, \quad \text{with} \quad f(x) = \int_{\frac{c-\rho x}{\sqrt{1-\rho^2}}}^{\frac{d-\rho x}{\sqrt{1-\rho^2}}} \frac{e^{-\frac{1}{2}y^2}}{\sqrt{2\pi}}\, dy.$$

The AR algorithm first simulates $x$ (using AR) from the $(a, b)$ truncated density
$h(x) = \frac{e^{-\frac{1}{2}x^2}}{\sqrt{2\pi}\, P}\, f(x)$. Then, given $x$, $y$ is simulated directly from a truncated Normal
with limits $\left(\frac{c-\rho x}{\sqrt{1-\rho^2}}, \frac{d-\rho x}{\sqrt{1-\rho^2}}\right)$. For the AR $x$ simulation, $x$ is first simulated directly
using the $(a, b)$ truncated Normal density $g(x) = e^{-\frac{1}{2}x^2}/\big(\sqrt{2\pi}\, \Phi(a, b)\big)$. This $x$ is
accepted if $u < h(x)/(C g(x))$, where $u \sim U(0, 1)$, and where the AR constant $C$
is given by $C = \max_{x \in [a,b]} h(x)/g(x)$. Now $h(x)/g(x) = f(x)\Phi(a, b)/P$, so $C$ is
given by the $x \in [a, b]$ which maximizes $f(x)$. Using basic analysis, it can be shown
that a unique maximum occurs at $x^* = \min\!\left(\max\!\left(a, \frac{c+d}{2\rho}\right), b\right)$, so we define $f^* =
f(x^*)$, with $C = f^* \Phi(a, b)/P$. This makes $h(x)/(C g(x)) = f(x)/f^*$. Putting the
steps together we have the following truncated AR algorithm for $(x, y)$:

Truncated BVN AR Simulation Algorithm

1. Input truncation limits $(a, b)$ and $(c, d)$, and correlation coefficient $\rho$.
2. Compute $f^* = f\!\left(\min\!\left(\max\!\left(a, \frac{c+d}{2\rho}\right), b\right)\right)$, and
   Repeat: compute $x \sim N(a, b)$, $u \sim U(0, 1)$
   Until $u \le f(x)/f^*$ (accepting the final $x$);
3. Using the accepted $x$, compute $y \sim N\!\left(\frac{c-\rho x}{\sqrt{1-\rho^2}}, \frac{d-\rho x}{\sqrt{1-\rho^2}}\right)$;
4. Return $(x, y)$.
The notation $(x, y) \sim BN((a, b), (c, d); \rho)$ will be used to denote an $(x, y)$ pair produced by this algorithm. We need $\lfloor (n-1)/2 \rfloor$ $(x, y)$ pairs for each approximate $\Phi(a, b, \Sigma)$
computation (5). We will present some test results using this MC algorithm in
Sect. 4.4.
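A Python sketch of this AR algorithm is given below (ours; the tests reported later use MATLAB). The special-case handling of $\rho = 0$, where $f(x)$ is constant in $x$, is an assumption of the sketch.

```python
# Sketch of the truncated BVN acceptance-rejection sampler above.
import numpy as np
from scipy.stats import norm

def f(x, c, d, rho):
    q = np.sqrt(1.0 - rho**2)
    return norm.cdf((d - rho*x)/q) - norm.cdf((c - rho*x)/q)

def trunc_norm(lo, hi, u):
    """Inverse-CDF sample of N(0,1) truncated to (lo, hi), u ~ U(0,1)."""
    Flo, Fhi = norm.cdf(lo), norm.cdf(hi)
    return norm.ppf(Flo + (Fhi - Flo) * u)

def bvn_ar_sample(a, b, c, d, rho, rng):
    """(x, y) ~ BN((a,b),(c,d); rho) by acceptance-rejection."""
    if rho == 0.0:
        xstar = a                    # f is constant in x when rho = 0
    else:
        xstar = min(max(a, (c + d) / (2.0 * rho)), b)
    fstar = f(xstar, c, d, rho)
    while True:                      # AR loop for x
        x = trunc_norm(a, b, rng.random())
        if rng.random() <= f(x, c, d, rho) / fstar:
            break
    q = np.sqrt(1.0 - rho**2)
    y = trunc_norm((c - rho*x)/q, (d - rho*x)/q, rng.random())
    return x, y

rng = np.random.default_rng(1)
print(bvn_ar_sample(-1.0, 2.0, -0.5, 1.5, 0.7, rng))
```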

4.2 BVN Conditioned Simulation with QMC Points

We also investigated the use of QMC point sets with BVN conditioned simulations,
because of the improved convergence properties for QMC point sets compared to MC
point sets for the univariate conditioned algorithms. Initially, we considered methods
which use QMC points in a fairly direct manner, by simply replacing the MC points
required for the truncated BVN AR simulations with QMC points. The validity of
the use of QMC points with AR algorithms has been analyzed previously by various
authors and this work was recently reviewed with further analysis in the paper by
Zhu and Dick [17].
An implementation problem with the truncated BVN AR algorithm is the inde-
terminate length AR loop, which is repeated for each approximate Φ(a, b, Σ) com-
putation (5) ($\lfloor (n-1)/2 \rfloor$ times). Each approximate $\Phi(a, b, \Sigma)$ computation requires a
vector of components from a QMC sequence, but the vector length is different for
each approximate Φ(a, b, Σ), because of the AR loops. While the expected length
of these vectors can be estimated, a robust implementation requires the use of a QMC
sequence with dimension larger than this expected length, to allow for the cases when
the AR loops all have several rejections. We ran some tests for this type of algorithm
using both Kronecker and lattice rule QMC sequences, with similar results, and
the results for lattice rules are reported in Sect. 4.4. An alternate method for using
QMC sequences with AR algorithms, which does not require indeterminate length
AR loops, uses smoothing. In the next section, we will describe how a smoothing
method can be used with the truncated BVN AR algorithm.

4.3 Smoothed AR for BVN Simulation

Smoothed Acceptance-Rejection has been studied in several forms (see, for example,
Wang [16], or Moskowitz and Caflish [10]). For truncated BVN simulations, we will
use an algorithm similar to the Wang algorithm. In order to describe our algorithm,
we use notation similar to that used in the previous section, and consider the basic
calculation for each stage in the conditioned BVN simulation algorithm. There we
used an approximation in the form
$$\int_a^b \frac{e^{-\frac{1}{2}x^2}}{\sqrt{2\pi}} \int_{\frac{c-\rho x}{\sqrt{1-\rho^2}}}^{\frac{d-\rho x}{\sqrt{1-\rho^2}}} \frac{e^{-\frac{1}{2}y^2}}{\sqrt{2\pi}}\, F(x, y)\, dy\, dx \approx P\, F(\hat x, \hat y), \qquad (6)$$

with $(\hat x, \hat y) \sim BN((a, b), (c, d); \rho)$, and we used AR to determine $\hat x$. In order to use
a smoothed AR simulation for $\hat x$, we rewrite the BVN integral as
$$P = \int_a^b \frac{e^{-\frac{1}{2}x^2}}{\sqrt{2\pi}}\, f^*\, \frac{f(x)}{f^*}\, dx \equiv \int_a^b \frac{e^{-\frac{1}{2}x^2}}{\sqrt{2\pi}}\, f^* \int_0^1 I(u < r(x))\, du\, dx,$$

where r (x) = f (x)/ f ∗ , and I (s) is the indicator function (with value 1 if s is true and
0 otherwise). This setup can be used for MC or QMC simulations (first simulate x ∼
N (a, b) by inversion from U (0, 1), then use u ∼ U (0, 1)), but the nonsmooth I (s)
is not expected to lead to an efficient algorithm. However, we tested this unsmoothed
(USAR) algorithm, where the approximation which replaces P in (6) is

$$P^* = \Phi(a, b)\, f^*\, I(u < r(x)).$$

These approximations, which are sometimes zero, are used to replace the Pi values
in (5), and the primary problem is that the USAR simulation algorithm can often
have zero value for (5).
Smoothed AR replaces $I(u < r)$ with a smoother function $w_r(u)$ which satisfies
the condition $\int_0^1 w_r(u)\, du = r$. After some experimentation and consideration of
the possibilities discussed by Wang [16], we chose to replace $I(u < r(x))$ by the
continuous
$$w_{r(x)}(u) = \begin{cases} 1 - \frac{1-r(x)}{r(x)}\, u, & \text{if } u \le r(x); \\[2pt] \frac{r(x)}{1-r(x)}\, (1 - u), & \text{otherwise.} \end{cases}$$

It is easy to check that $\int_0^1 w_r(u)\, du = r$, so that now we have
$$P = \int_a^b \frac{e^{-\frac{1}{2}x^2}}{\sqrt{2\pi}}\, f(x)\, dx \equiv \int_a^b \frac{e^{-\frac{1}{2}x^2}}{\sqrt{2\pi}}\, f^* \int_0^1 w_{f(x)/f^*}(u)\, du\, dx.$$

This leads to a smoothed AR algorithm for BVN simulation where, at each stage, $\hat x \sim N(a, b)$, followed by $\hat y \sim N\!\left(\frac{c-\rho \hat x}{\sqrt{1-\rho^2}}, \frac{d-\rho \hat x}{\sqrt{1-\rho^2}}\right)$, and $u \sim U(0, 1)$ is used to provide an
additional weight for that stage. The resulting contribution to the product for each
$\Phi(a, b, \Sigma)$ approximation in (5) is
$$\hat P_i = \Phi(a, b)\, f^*\, w_{r(\hat x)}(u)$$
instead of $P_i$. Notice that $P_i$ is not needed for the SAR algorithm, and the algorithm
is similar to the univariate conditioned algorithm which uses
$$\Phi(a, b)\, \Phi\!\left( \frac{c - \rho \hat x}{\sqrt{1-\rho^2}}, \frac{d - \rho \hat x}{\sqrt{1-\rho^2}} \right)$$

instead of $\hat P_i$. After $k$ stages
$$\Phi(a, b; \Sigma) \approx \prod_{i=1}^{k} \hat P_i \times \begin{cases} 1 & \text{if } n = 2k; \\ \Phi(\hat a_n, \hat b_n) & \text{if } n = 2k+1. \end{cases} \qquad (7)$$

As with AR, the k stages in the algorithm are repeated and the results are averaged
to produce the final approximation to Φ(a, b; Σ).
The SAR algorithm requires one additional u ∼ U (0, 1) for each stage so, assum-
ing that $\hat x$ and $\hat y$ are both computed using truncated univariate Normal inversion of
U (0, 1)’s, the total number of U (0, 1)’s is m = 3n/2 − 1 for each approximation
to Φ(a, b; Σ) for an MC SAR algorithm. For a QMC SAR algorithm, m-dimensional
QMC vectors with components from (0, 1) replace the m-dimensional U (0, 1) com-
ponent vectors for the MC algorithm.
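The following Python sketch (ours) shows the weight $w_r(u)$ and the resulting contribution $\hat P_i$ of one SAR stage, driven by three $U(0, 1)$ (or QMC) components per stage.

```python
# Sketch of the smoothing weight w_r(u) and of one SAR stage contribution
# \hat P_i = Phi(a,b) f^* w_{r(x)}(u); names and structure are ours.
import numpy as np
from scipy.stats import norm

def w(r, u):
    """Continuous replacement of the indicator; integrates to r over [0,1]."""
    return 1.0 - (1.0 - r)/r * u if u <= r else r/(1.0 - r) * (1.0 - u)

def f(x, c, d, rho):
    q = np.sqrt(1.0 - rho**2)
    return norm.cdf((d - rho*x)/q) - norm.cdf((c - rho*x)/q)

def sar_stage(a, b, c, d, rho, u3):
    """One SAR stage: u3 = (u_x, u_y, u_w), three U(0,1) components."""
    ux, uy, uw = u3
    Fa, Fb = norm.cdf(a), norm.cdf(b)
    x = norm.ppf(Fa + (Fb - Fa) * ux)          # x ~ N(a, b) by inversion
    q = np.sqrt(1.0 - rho**2)
    lo, hi = (c - rho*x)/q, (d - rho*x)/q
    y = norm.ppf(norm.cdf(lo) + (norm.cdf(hi) - norm.cdf(lo)) * uy)
    xs = a if rho == 0.0 else min(max(a, (c + d)/(2.0*rho)), b)
    fstar = f(xs, c, d, rho)
    Pi_hat = (Fb - Fa) * fstar * w(f(x, c, d, rho)/fstar, uw)
    return Pi_hat, (x, y)

print(sar_stage(-1.0, 2.0, -0.5, 1.5, 0.7, (0.3, 0.6, 0.9)))
```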

4.4 Randomized AR and SAR Tests

We completed a series of tests to compare MATLAB implementations of the algo-


rithms discussed in this paper. For each n = 4, . . . , 15, we generated 250 random
(b, Σ) combinations. Each Σ = Q D Q t was determined from a randomly generated
n × n orthogonal matrix Q (see Stewart [14]) and a diagonal matrix with diagonal
entries di = u i , and each b vector had bi = nvi , with u i , vi uniform random from
[0, 1]. We used ai = −∞ for all i for all tests. Given a randomly chosen Φ(a, b; Σ)
problem, all of the tested algorithms were used for that problem. The term “points”
used in the Tables refers to the number of approximations to a randomly chosen
Φ(a, b; Σ) problem that were used by each algorithm to compute that algorithm’s
final approximation. The QMC point set that we used for all tests was a lattice
rule point set determined using the fast CBC algorithm developed by Nuyens and
Cools [9].
Table 1 provides some test results for errors for the six algorithms:
• AR(MC) used BVN simulation with AR and MC points;
• USAR used unsmoothed AR with QMC points;
• SAR used smoothed AR with QMC points;
• AR(QMC) used BVN simulation with AR and QMC points;
• UV(QMC) used univariate simulation with QMC points;
• UV(MC) used univariate simulation with MC points.
All of the algorithms used the GGE univariate variable prioritization algorithm
described in Sect. 2.2. We used the Matlab mvncdf function to compute “exact”
values for each Φ.
The results in Table 1 show, as was expected, that USAR is clearly not competitive
with any of the other algorithms. Somewhat surprisingly, the AR(MC) algorithm had
average errors that were somewhat smaller than the SAR errors, and (2–3×) smaller
than those of the univariate conditioned MC algorithm. The AR(QMC) algorithm had errors
(5–10×) smaller than the AR(MC) algorithm, similar to the UV(QMC)
algorithm errors.
Table 2 provides some test results for times for the six algorithms using Matlab
on a Linux workstation with a 3.5 GHz processor. The results in Table 2 show that the

Table 1 Average errors for MVN simulation algorithms, 2500 points


n Algorithm average absolute errors, 2500 points
AR(MC) USAR SAR AR(QMC) UV(QMC) UV(MC)
4 0.000039 0.000285 0.000054 0.000008 0.000008 0.000125
5 0.000042 0.000282 0.000097 0.000010 0.000005 0.000137
6 0.000040 0.000370 0.000066 0.000008 0.000005 0.000154
7 0.000056 0.000279 0.000071 0.000007 0.000005 0.000109
8 0.000052 0.000341 0.000075 0.000007 0.000005 0.000111
9 0.000039 0.000335 0.000094 0.000007 0.000005 0.000138
10 0.000066 0.000324 0.000224 0.000005 0.000006 0.000126
11 0.000045 0.000278 0.000073 0.000003 0.000004 0.000113
12 0.000046 0.000298 0.000101 0.000005 0.000003 0.000107
13 0.000036 0.000316 0.000072 0.000004 0.000003 0.000100
14 0.000050 0.000354 0.000079 0.000003 0.000003 0.000106
15 0.000026 0.000406 0.000066 0.000006 0.000003 0.000099

Table 2 Average times(s) for MVN simulation algorithms, 2500 points


n Algorithm average Matlab times(s), 2500 points
AR(MC) USAR SAR AR(QMC) UV(QMC) UV(MC)
4 0.486 0.479 0.471 0.509 0.007 0.008
5 0.899 0.657 0.656 0.926 0.009 0.011
6 1.072 0.829 0.836 1.096 0.011 0.013
7 1.478 1.007 1.014 1.519 0.013 0.016
8 1.649 1.183 1.195 1.686 0.015 0.018
9 2.069 1.357 1.378 2.107 0.016 0.021
10 2.226 1.519 1.553 2.271 0.018 0.023
11 2.626 1.689 1.725 2.695 0.020 0.026
12 2.800 1.864 1.910 2.862 0.022 0.029
13 3.208 2.067 2.087 3.284 0.024 0.032
14 3.380 2.204 2.269 3.440 0.026 0.034
15 3.784 2.405 2.449 3.865 0.028 0.037

AR algorithms take more time (the difference increasing with dimension) compared
to the approximately equal time USAR and SAR algorithms; these AR versus
SAR/USAR time differences are caused by the extra random number generation
and acceptance testing needed by AR. The UV algorithms take much less time
(≈ 1/100) because these algorithms can easily be implemented in Matlab in a vectorized
form which allows large sets of $\Phi(a, b; \Sigma)$ approximations to be computed
simultaneously.

5 Conclusions

The Monte Carlo MVN simulation methods described in this paper which use bivari-
ate conditioning are more accurate than the univariate conditioned Monte Carlo sim-
ulation methods that we tested. However, there is a significant additional time cost
for the bivariate algorithms because there is no simple algorithm for simulation from
truncated BVN distributions.
We also considered the use of QMC methods with bivariate conditioned MVN
computations, but the lack of a direct algorithm for truncated BVN simulation does
not allow the straightforward use of QMC point sequences. But we did test a simple
QMC algorithm which replaces the MC vectors for the truncated BVN AR simu-
lations with QMC vectors and this algorithm was significantly more accurate than
the MC algorithm, with error levels comparable to the univariate conditioned QMC
algorithm. We also derived a smoothed AR algorithm which could be used with a
QMC sequence for truncated BVN simulation. But, when this algorithm was com-
bined in the bivariate conditioned MVN algorithm, the testing showed this smoothed
AR BVN conditioned algorithm had larger errors than the MC AR BVN conditioned
algorithm. The complete algorithm was not as accurate as a univariate conditioned
QMC algorithm. The bivariate conditioned algorithms also require significantly more
time than the (easily vectorized) univariate conditioned algorithms. Unfortunately,
the goal of finding a bivariate conditioned QMC MVN algorithm has not been satis-
fied. It is possible that a more direct algorithm for truncated BVN simulation could
lead to a more efficient MVN computation algorithm based on bivariate conditioning
with QMC sequences, but this is a subject for future research.

References

1. Chopin, N.: Fast simulation of truncated Gaussian distributions. Stat. Comput. 21, 275–288
(2011)
2. Drezner, Z., Wesolowsky, G.O.: On the computation of the bivariate normal integral. J. Stat.
Comput. Simul. 3, 101–107 (1990)
3. Genz, A.: Numerical computation of rectangular bivariate and trivariate normal and t proba-
bilities. Stat. Comput. 14, 151–160 (2004)
4. Genz, A., Bretz, F.: Methods for the computation of multivariate t-probabilities. J. Comput.
Graph. Stat. 11, 950–971 (2002)
5. Genz, A., Bretz, F.: Computation of Multivariate Normal and t Probabilities. Lecture Notes in
Statistics, vol. 195. Springer, New York (2009)
6. Golub, G.H., Van Loan, C.F.: Matrix Computations, 4th edn. Johns Hopkins University Press,
Baltimore (2012)
7. Gibson, G.J., Glasbey, C.A., Elston, D.A.: Monte Carlo evaluation of multivariate normal
integrals and sensitivity to variate ordering. In: Dimov, I.T., Sendov, B., Vassilevski, P.S. (eds.)
Advances in Numerical Methods and Applications, pp. 120–126. World Scientific Publishing,
River Edge (1994)
8. Hickernell, F.J.: Obtaining $O(N^{-2+\varepsilon})$ convergence for lattice quadrature rules. In: Fang, K.T.,
Hickernell, F.J., Niederreiter, H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2000,
pp. 274–289. Springer, Berlin (2002)

9. Nuyens, D., Cools, R.: Fast algorithms for component-by-component construction of rank-1
lattice rules in shift-invariant Reproducing Kernel Hilbert Spaces. Math. Comput. 75, 903–920
(2006)
10. Moskowitz, B., Caflish, R.E.: Smoothness and dimension reduction in quasi-Monte Carlo
methods. Math. Comput. Model. 23, 37–54 (1996)
11. Muthén, B.: Moments of the censored and truncated bivariate normal distribution. Br. J. Math.
Stat. Psychol. 43, 131–143 (1991)
12. Sándor, Z., András, P.: Alternative sampling methods for estimating multivariate normal prob-
abilities. J. Econ. 120, 207–234 (2002)
13. Schervish, M.J.: Algorithm AS 195: multivariate normal probabilities with error bound. J.
Royal Stat. Soc. Series C 33, 81–94 (1984); correction 34, 103–104 (1985)
14. Stewart, G.W.: The efficient generation of random orthogonal matrices with an application to
condition estimators. SIAM J. Numer. Anal. 17(3), 403–409 (1980)
15. Trinh, G., Genz, A.: Bivariate conditioning approximations for multivariate normal probabili-
ties. Stat. Comput. (2014). doi:10.1007/s11222-014-9468-y
16. Wang, X.: Improving the rejection sampling method in quasi-Monte Carlo methods. J. Comput.
Appl. Math. 114, 231–246 (2000)
17. Zhu, H., Dick, J.: Discrepancy bounds for deterministic acceptance-rejection samplers. Elec-
tron. J. Stat. 8, 687–707 (2014)
Non-nested Adaptive Timesteps in Multilevel
Monte Carlo Computations

Michael B. Giles, Christopher Lester and James Whittle

Abstract This paper shows that it is relatively easy to incorporate adaptive timesteps
into multilevel Monte Carlo simulations without violating the telescoping sum on
which multilevel Monte Carlo is based. The numerical approach is presented for
both SDEs and continuous-time Markov processes. Numerical experiments are given
for each, with the full code available for those who are interested in seeing the
implementation details.

Keywords multilevel Monte Carlo · adaptive timestep · SDE · continuous-time


Markov process

1 Multilevel Monte Carlo and Adaptive Simulations

Multilevel Monte Carlo methods [4, 6, 8] are a very simple and general approach to
improving the computational efficiency of a wide range of Monte Carlo applications.
Given a set of approximation levels $\ell = 0, 1, \ldots, L$ giving a sequence of approximations $P_\ell$ of a stochastic output $P$, with the cost and accuracy both increasing as $\ell$
increases, then a trivial telescoping sum gives
$$E[P_L] = E[P_0] + \sum_{\ell=1}^{L} E[P_\ell - P_{\ell-1}], \qquad (1)$$

expressing the expected value on the finest level as the expected value on the coarsest
level of approximation plus a sum of expected corrections.

M.B. Giles (B) · C. Lester · J. Whittle


Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK
e-mail: mike.giles@maths.ox.ac.uk
C. Lester
e-mail: christopher.lester@maths.ox.ac.uk


Approximating each of the expectations on the r.h.s. of (1) independently using
$N_\ell$ samples, we obtain the multilevel estimator
$$Y = \sum_{\ell=0}^{L} Y_\ell, \qquad Y_\ell = N_\ell^{-1} \sum_{n=1}^{N_\ell} \left( P_\ell^{(n)} - P_{\ell-1}^{(n)} \right)$$

with $P_{-1} \equiv 0$. The Mean Square Error of this estimator can be shown to be
$$E[(Y - E[P])^2] = (E[P_L] - E[P])^2 + \sum_{\ell=0}^{L} N_\ell^{-1} V_\ell$$

where $V_\ell \equiv \mathbb{V}[P_\ell - P_{\ell-1}]$ is the variance of a single multilevel correction sample
on level $\ell$. To ensure that the MSE is less than some given accuracy $\varepsilon^2$, it is then
sufficient to choose the finest level $L$ so that the bias $|E[P_L] - E[P]|$ is less than
$\varepsilon/\sqrt{2}$, and the numbers of samples $N_\ell$ so that the variance sum is less than $\varepsilon^2/2$.
If $C_\ell$ is the cost of a single sample $P_\ell - P_{\ell-1}$, then a constrained optimisation,
minimising the computational cost for a fixed total variance, leads to
$$N_\ell = 2\, \varepsilon^{-2} \sqrt{V_\ell / C_\ell}\; \sum_{\ell'=0}^{L} \sqrt{V_{\ell'}\, C_{\ell'}}.$$
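In an implementation, the per-level sample sizes are simply computed from estimated $V_\ell$ and $C_\ell$; a minimal sketch (ours, with rounding up to integers) is:

```python
# Sketch: near-optimal per-level sample sizes from estimated V_l and C_l.
import numpy as np

def optimal_Nl(V, C, eps):
    V, C = np.asarray(V, float), np.asarray(C, float)
    return np.ceil(2.0 * eps**-2 * np.sqrt(V / C) * np.sum(np.sqrt(V * C))).astype(int)

print(optimal_Nl([1e-2, 2e-3, 5e-4], [1.0, 4.0, 16.0], eps=1e-3))
```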

In the particular case in which $|E[P_\ell] - E[P]| \propto 2^{-\alpha\ell}$, $V_\ell \propto 2^{-\beta\ell}$, $C_\ell \propto 2^{\gamma\ell}$, as
$\ell \to \infty$, this results in the total cost to achieve the $\varepsilon^2$ MSE accuracy being
$$C = \begin{cases} O(\varepsilon^{-2}), & \beta > \gamma, \\ O(\varepsilon^{-2} (\log \varepsilon^{-1})^2), & \beta = \gamma, \\ O(\varepsilon^{-2-(\gamma-\beta)/\alpha}), & \beta < \gamma. \end{cases}$$

The above is a quick overview of the multilevel Monte Carlo (MLMC) approach.
In the specific context of outputs which are functionals of the solution of an SDE, most
MLMC implementations use a set of levels with exponentially decreasing uniform
timesteps, i.e. on level $\ell$ the uniform timestep is
$$h_\ell = M^{-\ell}\, h_0$$

where M is an integer. When using the Euler–Maruyama approximation it is usually


found that the optimum value for M is in the range 4–8, whereas for higher order
strong approximations such as the Milstein first order approximation it is found that
M = 2 is best.
The MLMC implementation is then very straightforward. In computing a single
correction sample $P_\ell - P_{\ell-1}$, one can first generate the Brownian increments for the
fine path simulation which leads to the output $P_\ell$. The Brownian increments can then
be summed in groups of size M to provide the Brownian increments for the coarse

path simulation which yields the output $P_{\ell-1}$. The strong convergence properties of
the numerical approximation ensure that the difference between the fine and coarse
path simulations decays exponentially as $\ell \to \infty$, and therefore the output difference
$P_\ell - P_{\ell-1}$ also decays exponentially; this is an immediate consequence if the output
is a Lipschitz functional of the path solution, but in other cases it requires further
analysis.
In the computational finance applications which have motivated a lot of MLMC
research, it is appropriate to use uniform timesteps on each level because the drift
and volatility in the SDEs does not vary significantly from one path to another, or
from one time to another. However, in other applications with large variations in
drift and volatility, adaptive timestepping can provide very significant reductions in
computational cost for a given level of accuracy [15]. It can also be used to address
difficulties with SDEs such as

$$dS_t = -S_t^3\, dt + dW_t,$$

which have a super-linear growth in the drift and/or the volatility, which otherwise
lead to strong instabilities when using uniform timesteps [11].
The most significant prior research on adaptive timestepping in MLMC has been
by Hoel, von Schwerin, Szepessy and Tempone [9] and [10]. In their research, they
construct a multilevel adaptive timestepping discretisation in which the timesteps
used on level $\ell$ are a subdivision of those used on level $\ell-1$, which in turn are
a subdivision of those on level $\ell-2$, and so on. By doing this, the payoff $P_\ell$ on
level $\ell$ is the same regardless of whether one is computing $P_\ell - P_{\ell-1}$ or $P_{\ell+1} - P_\ell$,
and therefore the MLMC telescoping summation, (1), is respected. Another notable
aspect of their work is the use of adjoint/dual sensitivities to determine the optimal
timestep size, so that the adaptation is based on the entire path solution.
In this paper, we introduce an alternative approach in which the adaptive timesteps
are not nested, so that the timesteps on level $\ell$ do not correspond to a subdivision
of the timesteps on level $\ell-1$. This leads to an implementation which is perhaps a
little simpler, and perhaps a more natural extension to existing adaptive timestepping
methods. The local adaptation is based on the current state of the computed path, but
it would also work with adjoint-based adaptation based on the entire path. We also
show that it extends very naturally to continuous-time Markov processes, extending
ideas due to Anderson and Higham [1, 2]. The key point to be addressed is how to
construct a tight coupling between the fine and coarse path simulations, and at the
same time ensure that the telescoping sum is fully respected.

2 Non-nested Adaptive Timesteps

The essence of the approach to non-nested adaptive timestepping in MLMC is illus-


trated in Fig. 1.

Fig. 1 Simulation times for multilevel Monte Carlo with adaptive timesteps

Algorithm 1 Outline of the algorithm for a single MLMC sample for $\ell > 0$ for a
scalar Brownian SDE with adaptive timestepping for the time interval $[0, T]$.
t := 0; t c := 0; t f := 0
h c := 0; h f := 0
ΔW c := 0; ΔW f := 0

while (t < T ) do
told := t
t := min(t c , t f )
ΔW := N (0, t −told )
ΔW c := ΔW c + ΔW
ΔW f := ΔW f + ΔW

if t = t c then
update coarse path using h c and ΔW c
compute new adapted coarse path timestep h c
h c := min(h c , T −t c )
t c := t c + h c
ΔW c := 0
end if

if t = t f then
update fine path using h f and ΔW f
compute new adapted fine path timestep h f
h f := min(h f , T −t f )
t f := t f + h f
ΔW f := 0
end if

end while

compute $P_\ell - P_{\ell-1}$
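The following Python transcription of Algorithm 1 (ours) makes the coupling explicit for a scalar SDE $dS = \mu(S)\,dt + \sigma(S)\,dW$ with an Euler–Maruyama update; the drift, volatility, timestep function $H$ and payoff used in the usage example are arbitrary placeholders.

```python
# A minimal transcription of Algorithm 1 (assumptions ours): non-nested
# adaptive timesteps; the level-l step is M^{-l} H(S).
import numpy as np

def mlmc_adaptive_sample(l, M, T, S0, mu, sigma, H, payoff, rng):
    Sc, Sf = S0, S0                  # coarse (level l-1) and fine (level l) paths
    t, tc, tf = 0.0, 0.0, 0.0
    hc, hf = 0.0, 0.0
    dWc, dWf = 0.0, 0.0
    while t < T:
        told = t
        t = min(tc, tf)
        dW = rng.normal(0.0, np.sqrt(t - told)) if t > told else 0.0
        dWc += dW
        dWf += dW
        if t == tc:                  # coarse path update and new coarse step
            Sc = Sc + mu(Sc) * hc + sigma(Sc) * dWc
            hc = min(M**(-(l - 1)) * H(Sc), T - tc)
            tc += hc
            dWc = 0.0
        if t == tf:                  # fine path update and new fine step
            Sf = Sf + mu(Sf) * hf + sigma(Sf) * dWf
            hf = min(M**(-l) * H(Sf), T - tf)
            tf += hf
            dWf = 0.0
    return payoff(Sf) - payoff(Sc)

# example usage with placeholder coefficients and timestep function
rng = np.random.default_rng(0)
Y = [mlmc_adaptive_sample(3, 2, 1.0, 1.0,
                          mu=lambda s: -s, sigma=lambda s: 0.5,
                          H=lambda s: 0.1, payoff=lambda s: s, rng=rng)
     for _ in range(1000)]
print(np.mean(Y), np.var(Y))
```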

For Brownian diffusion SDEs, level $\ell$ uses an adaptive timestep of the form
$h_\ell = M^{-\ell} H(S_n)$, where $M > 1$ is a real constant, and $H(S)$ is independent of
level. This automatically respects the telescoping summation, (1), since the adap-
tive timestep on level $\ell$ is the same regardless of whether it is the coarser or finer of
the two paths being computed. On average, the adaptive timestepping leads to simu-
lations on level $\ell$ having approximately $M$ times as many timesteps as level $\ell-1$, but
it also results in timesteps which are not naturally nested, so the simulation times for
the coarse path do not correspond to simulation times on the fine path. It may appear
that this would cause difficulties in the strong coupling between the coarse and fine

paths in the MLMC implementation, but it does not. As usual, what is essential to
achieve a low multilevel correction variance V is that the same underlying Brownian
path is used for both the fine and coarse paths. Figure 1 shows a set of simulation
times which is the union of the fine and coarse path times. This defines a set of inter-
vals, and for each interval we generate a Brownian increment with the appropriate
variance. These increments are then summed to give the Brownian increments for
the fine and coarse path timesteps.
An outline implementation to compute a single sample of $P_\ell - P_{\ell-1}$ for $\ell > 0$ is
given in Algorithm 1. This could use either an Euler–Maruyama discretisation of the
SDE, or a first order Milstein discretisation for those SDEs which do not require the
simulation of Lévy area terms.
Adaptive timestepping for continuous-time Markov processes works in a very
similar fashion. The evolution of a continuous-time Markov process can be described
by
$$S_t = S_0 + \sum_j \nu_j\, P_j\!\left( \int_0^t \lambda_j(S_s)\, ds \right)$$
where the summation is over the different reactions, $\nu_j$ is the change due to reaction
$j$ (the number of molecules of each species which are created or destroyed), the $P_j$
are independent unit-rate Poisson processes, and $\lambda_j$ is the propensity function for
the j th reaction, meaning that λ j (St ) dt is the probability of reaction j taking place
in the infinitesimal time interval (t, t +dt).
λ j (St ) should be updated after each individual reaction, since it changes St , but in
the tau-leaping approximation [7] λ j is updated only at a fixed set of update times.
This is the basis for the MLMC construction due to Anderson and Higham [1].
Using nested uniform timesteps, with h c = 2 h f , each coarse timestep is split into
two fine timesteps, and for each of the fine timesteps one has to compute appropriate
Poisson increments $P_j(\lambda_j^c h^f)$ for the coarse path and $P_j(\lambda_j^f h^f)$ for the fine path.
To achieve a tight coupling between the coarse and fine paths, they use the fact that
$$\lambda_j^c = \min(\lambda_j^c, \lambda_j^f) + |\lambda_j^c - \lambda_j^f|\, \mathbf{1}_{\lambda_j^c > \lambda_j^f}, \qquad
\lambda_j^f = \min(\lambda_j^c, \lambda_j^f) + |\lambda_j^c - \lambda_j^f|\, \mathbf{1}_{\lambda_j^c < \lambda_j^f},$$

together with the fact that a Poisson variate $P(a+b)$ is equivalent in distribution to
the sum of independent Poisson variates $P(a)$, $P(b)$. Hence, they generate common
Poisson variates $P(\min(\lambda_j^c, \lambda_j^f)\, h^f)$ and $P(|\lambda_j^c - \lambda_j^f|\, h^f)$ and use these to give the
Poisson variates for the coarse and fine paths over the same fine timestep.
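In code, the coupled increments for one sub-interval of length $h$ can be generated as follows (a sketch, ours); by the splitting property the marginal laws of the two outputs are Poisson with means $\lambda^c h$ and $\lambda^f h$, as required.

```python
# Sketch of the Anderson-Higham coupling of coarse/fine Poisson increments
# over one sub-interval of length h (names ours).
import numpy as np

def coupled_poisson_increments(lam_c, lam_f, h, rng):
    """Return (coarse, fine) Poisson variates per reaction, tightly coupled."""
    lam_c, lam_f = np.asarray(lam_c, float), np.asarray(lam_f, float)
    common = rng.poisson(np.minimum(lam_c, lam_f) * h)     # shared part
    extra = rng.poisson(np.abs(lam_c - lam_f) * h)         # excess-rate part
    Pc = common + np.where(lam_c > lam_f, extra, 0)
    Pf = common + np.where(lam_f > lam_c, extra, 0)
    return Pc, Pf

rng = np.random.default_rng(0)
print(coupled_poisson_increments([3.0, 1.0], [2.5, 1.5], 0.1, rng))
```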
As outlined in Algorithm 2, the extension of adaptive timesteps to continuous-
time Markov processes based on the tau-leaping approximation is quite natural. The
Poisson variates are computed for each time interval in the time grid formed by the
union of the coarse and fine path simulation times. At the end of each coarse timestep,
the propensity functions λc are updated, and a new adapted timestep h c is defined.
Similarly, λ f and h f are updated at the end of each fine timestep.

Algorithm 2 Outline of the algorithm for a single MLMC sample for a continuous-
time Markov process with adaptive timestepping for the time interval [0, T ].
t := 0; t c := 0; t f := 0
λc := 0; λ f := 0
h c := 0; h f := 0

while (t < T ) do
told := t
t := min(t c , t f )
h := t − told

for each reaction, generate Poisson variates $P(\min(\lambda^c, \lambda^f)\, h)$, $P(|\lambda^c - \lambda^f|\, h)$,
use Poisson variates to update fine and coarse path solutions

if t = t c then
update coarse path propensities λc
compute new adapted coarse path timestep h c
h c := min(h c , T −t c )
t c := t c + h c
end if

if t = t f then
update fine path propensities λ f
compute new adapted fine path timestep h f
h f := min(h f , T −t f )
t f := t f + h f
end if

end while

compute $P_\ell - P_{\ell-1}$

The telescoping sum is respected because, for each timestep of either the coarse or
fine path simulation, the sum of the Poisson variates for the sub-intervals is equivalent
in distribution to the Poisson variate for the entire timestep, and therefore the expected
value $E[P_\ell]$ is unaffected.

3 Numerical Experiments

3.1 FENE SDE Kinetic Model

A kinetic model for a dilute solution of polymers in a fluid considers each molecule as
a set of balls connected by springs. The balls are each subject to random forcing from
the fluid, and the springs are modelled with a FENE (finitely extensible nonlinear
elastic) potential which increases without limit as the length of the bond approaches
a finite value [3].

In the case of a molecule with just one bond, this results in the following 3D SDE
for the vector length of the bond:
$$dq_t = -\frac{4\mu\, q_t}{1-\|q_t\|^2}\, dt + 2\, dW_t$$
where $\mu = 4$ for the numerical experiments to be presented, and $W_t$ is a 3D driving
Brownian motion. Note that the drift term ensures that $\|q_t\| < 1$ for all time, and this
property should be respected in the numerical approximation.
An Euler–Maruyama discretisation of the SDE using timestep $h_n$ gives
$$q_{n+1} = q_n - \frac{4\mu h_n}{1-\|q_n\|^2}\, q_n + 2\, \Delta W_n$$
and because the volatility is constant, one would expect this to give first order strong
convergence. The problem is that this discretisation leads to $\|q_{n+1}\| > 1$ with positive
probability, since $\Delta W_n$ is unbounded.

[Figure 2: four panels plotting $\log_2$ variance and $\log_2 |$mean$|$ of $P_\ell$ and $P_\ell - P_{\ell-1}$ against level $\ell$, the number of samples $N_\ell$ per level for accuracies $\varepsilon = 0.0005$ to $0.01$, and cost versus accuracy for standard MC and MLMC.]

Fig. 2 MLMC results for the FENE model using adaptive timesteps

This problem is addressed in two ways. The first is to use adaptive timesteps which
become much smaller as $\|q_n\| \to 1$. Since $\Delta W_n = \sqrt{h}\, Z_n$, where the component of
$Z_n$ in the direction normal to the boundary is a standard Normal random variable
which is very unlikely to take a value with magnitude greater than 3, we choose the
timestep so that
$$6 \sqrt{h_n} \le 1 - \|q_n\|$$
so the stochastic term is highly unlikely to take the path across the boundary. In addition, the
drift term is singular at the boundary and therefore for accuracy we want the drift
term to be not too large relative to the distance to the boundary so that it will not
change by too much during one timestep. Hence, we impose the restriction
$$\frac{2\mu h_n}{1-\|q_n\|} \le 1-\|q_n\|.$$
Combining these two gives the adaptive timestep
$$H(q_n) = \frac{(1-\|q_n\|)^2}{\max(2\mu, 36)},$$

on the coarsest level of approximation. On finer levels, the timestep is $h_n = 2^{-\ell} H(q_n)$ so that level $\ell$ has approximately $2^\ell$ times as many timesteps as level 0.
Despite the adaptive timestep there is still an extremely small possibility that the
numerical approximation gives $\|q_{n+1}\| > 1$. This is handled by introducing clamping
with
$$q_{n+1}^{\mathrm{clamped}} := \frac{1-\delta}{\|q_{n+1}\|}\, q_{n+1}$$
if $\|q_{n+1}\| > 1 - \delta$, with $\delta$ typically chosen to be $10^{-5}$, which corresponds to an adaptive timestep of order $10^{-10}$ for the next timestep. Numerical experiments suggest
that this value for $\delta$ does not lead to any significant bias in the output of interest.
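Putting the timestep rule, the Euler–Maruyama update and the clamping together, one path can be simulated as in the following sketch (ours); the drift and volatility constants follow the discretisation written above.

```python
# Sketch (ours) of one adaptive Euler-Maruyama path for the FENE bond
# vector, with the level-l timestep 2^{-l} H(q) and the clamping safeguard.
import numpy as np

def fene_adaptive_path(T=1.0, level=2, mu=4.0, delta=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    q, t = np.zeros(3), 0.0
    while t < T:
        r = np.linalg.norm(q)
        # adaptive timestep H(q), scaled by 2^{-level}, capped at T - t
        h = min(2.0**(-level) * (1.0 - r)**2 / max(2.0*mu, 36.0), T - t)
        dW = np.sqrt(h) * rng.standard_normal(3)
        q = q - 4.0*mu*h/(1.0 - r**2) * q + 2.0*dW      # Euler-Maruyama step
        r_new = np.linalg.norm(q)
        if r_new > 1.0 - delta:                          # clamping safeguard
            q *= (1.0 - delta) / r_new
        t += h
    return q

print(np.sum(fene_adaptive_path()**2))   # one sample of ||q||^2 at T = 1
```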
The output of interest in the initial experiments is $E[\|q\|^2]$ at time $T = 1$, having
started from initial data q = 0 at time t = 0. Figure 2 presents the MLMC results,
showing first order convergence for the weak error (top right plot) and second order
convergence for the multilevel correction variance (top left plot). Thus, in terms of
the standard MLMC theory we have α = 1, β = 2, γ = 1, and hence the computational
cost for RMS accuracy ε is O(ε−2 ); this is verified in the bottom right plot, with the
bottom left plot showing the number of MLMC samples on each level as a function
of the target accuracy.

3.2 Dimerization Model

This dimerization model involving 3 species and 4 reactions has been used widely as
a test of stochastic simulation algorithms [7, 16] as it exhibits behaviour on multiple
timescales. The reaction network is given by:

$$R_1: S_1 \xrightarrow{1} \emptyset, \qquad R_2: S_2 \xrightarrow{1/25} S_3, \qquad
R_3: S_1 + S_1 \xrightarrow{1/500} S_2, \qquad R_4: S_2 \xrightarrow{1/2} S_1 + S_1, \qquad (2)$$

and the corresponding propensity functions for the 4 reactions are

$$\lambda_1 = S_1, \quad \lambda_2 = (1/25)\, S_2, \quad \lambda_3 = (1/500)\, S_1(S_1-1), \quad \lambda_4 = (1/2)\, S_2, \qquad (3)$$

where S1 , S2 , S3 are the numbers of each of the 3 species.


We take the initial conditions to be $[S_1, S_2, S_3]^T = [10^5, 0, 0]^T$. In order to under-
stand the dynamics of system (2), Fig. 3 presents the temporal evolution of a single
sample path of the system generated by the Gillespie method which simulates each
individual reaction. The behaviour is characterised by two distinct time scales, an
initial transient phase in which there is rapid change, and a subsequent long phase
in which the further evolution is very slow.
This motivates the use of adaptive timesteps. The expected change in species $S_i$ in
one timestep of size $h$ is approximately equal to $h \sum_j \nu_{ij} \lambda_j$, where $\nu_{ij}$ is the change
in species $i$ due to reaction $j$ and the summation is over all of the reactions. Hence,

[Figure 3: two panels, “Transient phase” and “Long phase”, plotting the copy numbers of $S_1$, $S_2$, $S_3$ against time.]

Fig. 3 The temporal evolution of a single sample path of reaction system (2) on two different
time-scales. Reaction rates are given in (3) and initial conditions are as described in the text

to ensure that there is no more than a 25 % change in any species in one timestep,
the timestep on the coarsest level is taken to be
$$H = 0.25\, \min_i \left( \frac{S_i + 1}{\big| \sum_j \nu_{ij} \lambda_j \big|} \right). \qquad (4)$$

On level $\ell$, this timestep is multiplied by $M^{-\ell}$. The choice $M = 4$ is found to be good;


this is in line with experience and analysis of SDEs which shows that values for M
in the range 4–8 are good when the multilevel variance is O(h), as it is with this
continuous-time Markov process application [2].
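For reference, the propensities (3) and the coarsest-level timestep (4) for this network can be coded as follows (a sketch, ours; the small floor in the denominator, to avoid dividing by zero for species unaffected by any reaction, is our addition):

```python
# Sketch of the propensities (3) and the adaptive tau-leap timestep (4)
# for the dimerization network (2).
import numpy as np

# stoichiometry: rows = reactions R1..R4, columns = species S1..S3
nu = np.array([[-1,  0, 0],
               [ 0, -1, 1],
               [-2,  1, 0],
               [ 2, -1, 0]], dtype=float)

def propensities(S):
    S1, S2, _ = S
    return np.array([S1, S2/25.0, S1*(S1 - 1)/500.0, S2/2.0])

def coarsest_timestep(S):
    lam = propensities(S)
    drift = nu.T @ lam               # expected change per unit time, per species
    return 0.25 * np.min((S + 1.0) / np.maximum(np.abs(drift), 1e-12))

S0 = np.array([1e5, 0.0, 0.0])
print(coarsest_timestep(S0))         # timestep H on the coarsest level
```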
The output quantity of interest is E[S3 ] at time T = 30, which is the maximum
time shown in Fig. 3. The value is approximately 20,000, so much larger values for ε
are appropriate in this case. The MLMC results for this testcase in Fig. 4 indicate that
the MLMC parameters are α = 2, β = 2, γ = 2, and hence the computational cost is
O(ε−2 (log ε)2 ). Additional results show that the computational efficiency is much
greater than using uniform timesteps.

[Figure 4: four panels plotting $\log_2$ variance and $\log_2 |$mean$|$ of $P_\ell$ and $P_\ell - P_{\ell-1}$ against level $\ell$, the number of samples $N_\ell$ per level for accuracies $\varepsilon = 1$ to $20$, and cost versus accuracy for standard MC and MLMC.]

Fig. 4 MLMC results for the continuous-time Markov process using adaptive timesteps

Note that these numerical results do not include a final multilevel correction which
couples the tau-leaping approximation on the finest grid level to the unbiased Sto-
chastic Simulation Algorithm which simulates each individual reaction. This addi-
tional coupling is due to Anderson and Higham [1], and the extension to adaptive
timestepping is discussed in [12]. Related research on adaptation has been carried out
by [13, 14].

4 Conclusions

This paper has just one objective, to explain how non-nested adaptive timesteps
can be incorporated very easily within multilevel Monte Carlo simulations, without
violating the telescoping sum on which MLMC is based.
Outline algorithms and accompanying numerical demonstrations are given for
both SDEs and continuous-time Markov processes. For those interested in learning
more about the implementation details, the full MATLAB code for the numerical
examples is available with other example codes prepared for a recent review paper
[5, 6].
Future papers will investigate in more detail the FENE simulations, including
results for molecules with multiple bonds and the interaction with fluids with non-
uniform velocity fields, and the best choice of adaptive timesteps for continuous-time
Markov processes [12].
The adaptive approach could also be extended easily to Lévy processes and other
processes in which the numerical approximation comes from the simulation of incre-
ments of a driving process over an appropriate set of time intervals formed by a union
of the simulation times for the coarse and fine path approximations.

Acknowledgments MBG’s research was funded in part by EPSRC grant EP/H05183X/1, and CL
and JW were funded in part by a CCoE grant from NVIDIA. In compliance with EPSRC’s open
access initiative, the data in this paper, and the MATLAB codes which generated it, are available from
doi:10.5287/bodleian:s4655j04n. This work has benefitted from extensive discussions
with Ruth Baker, Endre Süli, Kit Yates and Shenghan Ye.

References

1. Anderson, D., Higham, D.: Multi-level Monte Carlo for continuous time Markov chains with
applications in biochemical kinetics. SIAM Multiscale Model. Simul. 10(1), 146–179 (2012)
2. Anderson, D., Higham, D., Sun, Y.: Complexity of multilevel Monte Carlo tau-leaping. SIAM
J. Numer. Anal. 52(6), 3106–3127 (2014)
3. Barrett, J., Süli, E.: Existence of global weak solutions to some regularized kinetic models for
dilute polymers. SIAM Multiscale Model. Simul. 6(2), 506–546 (2007)
4. Giles, M.: Multilevel Monte Carlo path simulation. Oper. Res. 56(3), 607–617 (2008)
5. Giles, M.: Matlab code for multilevel Monte Carlo computations. http://people.maths.ox.ac.
uk/gilesm/acta/ (2014)

6. Giles, M.: Multilevel Monte Carlo methods. Acta Numer. 24, 259–328 (2015)
7. Gillespie, D.: Approximate accelerated stochastic simulation of chemically reacting systems.
J. Chem. Phys. 115(4), 1716–1733 (2001)
8. Heinrich, S.: Multilevel Monte Carlo methods. In: Multigrid Methods. Lecture Notes in Com-
puter Science, vol. 2179, pp. 58–67. Springer, Heidelberg (2001)
9. Hoel, H., von Schwerin, E., Szepessy, A., Tempone, R.: Adaptive multilevel Monte Carlo
simulation. In: Engquist, B., Runborg, O., Tsai, Y.H. (eds.) Numerical Analysis of Multiscale
Computations, vol. 82, pp. 217–234. Lecture Notes in Computational Science and Engineering.
Springer, Heidelberg (2012)
10. Hoel, H., von Schwerin, E., Szepessy, A., Tempone, R.: Implementation and analysis of an
adaptive multilevel Monte Carlo algorithm. Monte Carlo Methods Appl. 20(1), 1–41 (2014)
11. Hutzenthaler, M., Jentzen, A., Kloeden, P.: Divergence of the multilevel Monte Carlo method.
Ann. Appl. Prob. 23(5), 1913–1966 (2013)
12. Lester, C., Yates, C., Giles, M., Baker, R.: An adaptive multi-level simulation algorithm for
stochastic biological systems. J. Chem. Phys. 142(2) (2015)
13. Moraes, A., Tempone, R., Vilanova, P.: A multilevel adaptive reaction-splitting simulation
method for stochastic reaction networks. Preprint arXiv:1406.1989 (2014)
14. Moraes, A., Tempone, R., Vilanova, P.: Multilevel hybrid Chernoff tau-leap. SIAM J. Multiscale
Model. Simul. 12(2), 581–615 (2014)
15. Müller-Gronbach, T.: Strong approximation of systems of stochastic differential equations.
Habilitation thesis, TU Darmstadt (2002)
16. Tian, T., Burrage, K.: Binomial leap methods for simulating stochastic chemical kinetics. J.
Chem. Phys. 121(10), 356 (2004)
On ANOVA Decompositions of Kernels
and Gaussian Random Field Paths

David Ginsbourger, Olivier Roustant, Dominic Schuhmacher,


Nicolas Durrande and Nicolas Lenz

Abstract The FANOVA (or “Sobol’-Hoeffding”) decomposition of multivariate


functions has been used for high-dimensional model representation and global sen-
sitivity analysis. When the objective function f has no simple analytic form and is
costly to evaluate, computing FANOVA terms may be unaffordable due to numerical
integration costs. Several approximate approaches relying on Gaussian random field
(GRF) models have been proposed to alleviate these costs, where f is substituted
by a (kriging) predictor or by conditional simulations. Here we focus on FANOVA
decompositions of GRF sample paths, and we notably introduce an associated kernel
decomposition into $4^d$ terms called KANOVA. An interpretation in terms of tensor
product projections is obtained, and it is shown that projected kernels control both
the sparsity of GRF sample paths and the dependence structure between FANOVA
effects. Applications on simulated data show the relevance of the approach for design-
ing new classes of covariance kernels dedicated to high-dimensional kriging.

D. Ginsbourger (B)
Uncertainty Quantification and Optimal Design group, Idiap Research Institute,
Rue Marconi 19, 1920 Martigny, Switzerland
e-mail: ginsbourger@stat.unibe.ch
D. Ginsbourger
IMSV, Department of Mathematics and Statistics, University of Bern,
Alpeneggstrasse, 22, 3012 Bern, Switzerland
O. Roustant · N. Durrande
Mines Saint-Etienne, UMR CNRS 6158, LIMOS, 42023 Saint-etienne, France
e-mail: roustant@emse.fr
N. Durrande
e-mail: durrande@emse.fr
D. Schuhmacher
Institut für Mathematische Stochastik, Georg-August-Universität Göttingen,
Goldschmidtstraße 7, 37077 Göttingen, Germany
e-mail: dominic.schuhmacher@mathematik.uni-goettingen.de
N. Lenz
geo7 AG,
Neufeldstrasse 5-9, 3012 Bern, Switzerland
e-mail: nicolas.lenz@geo7.ch

Keywords Gaussian processes · Sensitivity analysis · Kriging · Covariance func-


tions · Conditional simulations

1 Introduction: Metamodel-Based Global Sensitivity


Analysis

Global Sensitivity Analysis (GSA) is a topic of importance for the study of complex
systems as it aims at uncovering among many candidates which variables and interac-
tions are influential with respect to some response of interest. FANOVA (Functional
ANalysis Of VAriance) [2, 10, 13, 32] has become commonplace for decomposing
a real-valued function f of d-variables into a sum of 2d functions (a.k.a. effects) of
increasing dimensionality, and quantifying the influence of each variable or group
of variables through the celebrated Sobol’ indices [27, 33]. In practice f is rarely
known analytically and a number of statistical procedures have been proposed for
estimating Sobol’ indices based on a finite sample of evaluations of f ; see, e.g., [15].
Alternatively, a pragmatic approach to GSA, when the evaluation budget is drasti-
cally limited by computational cost or time, is to first approximate f using some
class of surrogate models (e.g., regression, neural nets, splines, wavelets, kriging;
see [37] for an overview) and then to perform the analysis on the obtained cheap-
to-evaluate surrogate model. Here we focus on kriging and Gaussian random field
(GRF) models, with an emphasis on the interplay between covariance kernels and
FANOVA decompositions of corresponding centred GRF sample paths.
While GSA relying on kriging has been used for at least two decades [40],
Bayesian GSA under a GRF prior seems to originate in [24], where posterior effects
and related quantities were derived. Later on, posterior distributions of Sobol’ indices
were investigated in [14, 22] relying on conditional simulations, an approach revisited
and extended to multi-fidelity computer codes in [20]. From a different perspective,
FANOVA-graphs were used in [23] to incorporate GSA information into a kriging
model, and a special class of kernels was introduced in [6] for which Sobol’ indices
of the kriging predictor are analytically tractable. Moreover, kernels leading to GRFs
with additive paths have been discussed in [5], and FANOVA decompositions of GRFs
and their covariance were touched upon in [21] where GRFs with ortho-additive paths
were introduced. Also, kernels investigated in [6] were revisited in [4] in the context of
GSA with dependent inputs, and a class of kernels related to ANOVA decompositions
was studied in [8, 9]. In a different setup, GRF priors have been used for Bayesian
FANOVA with functional responses [16].
In the present paper we investigate ANOVA decompositions both for
(symmetric positive definite) kernels and for associated centred GRFs. We show
that under standard integrability conditions, s.p.d. kernels can be decomposed into
$4^d$ terms that govern the joint distribution of the $2^d$ terms of the associated GRF
FANOVA decomposition. This has some serious consequences in kriging-based
GSA, as for instance the choice of a sparse kernel induces almost sure sparsity
of the associated GRF paths, and such a phenomenon cannot be compensated by con-
ditioning on data.

2 Preliminaries and Notation

FANOVA. We focus on measurable $f: D \subseteq \mathbb{R}^d \longrightarrow \mathbb{R}$ ($d \in \mathbb{N}\setminus\{0\}$). In FANOVA
with independent inputs, $D$ is typically assumed to be of the form $D = \prod_{i=1}^{d} D_i$ for
some measurable subsets $D_i \in \mathcal{B}(\mathbb{R})$, where each $D_i$ is endowed with a probability
measure $\nu_i$ and $D$ is equipped with the product measure $\nu = \bigotimes_{i=1}^{d} \nu_i$. Assuming
further that $f$ is square-integrable w.r.t. $\nu$, $f$ can be expanded into a sum of $2^d$ terms
indexed by the subsets $u \subseteq I = \{1, \ldots, d\}$ of the $d$ variables' indices,

$$f = \sum_{u \subseteq I} f_u, \qquad (1)$$

where each $f_u \in \mathcal{F} = L^2(\nu)$ depends only on the variables $x_j$ with $j \in u$ (up to
an a.e. equality, as all statements involving $L^2$ from Eq. (1) on). Uniqueness of this
decomposition is classically guaranteed by imposing that $\int f_u\, \nu_j(dx_j) = 0$ for every
$j \in u$. Any $f_u$, or FANOVA effect, can then be expressed in closed form as
$$f_u: x \in D \longmapsto f_u(x_1, \ldots, x_d) = \sum_{u' \subseteq u} (-1)^{|u|-|u'|} \int f(x_1, \ldots, x_d)\, \nu_{-u'}(dx_{-u'}), \qquad (2)$$
where $\nu_{-u'} = \bigotimes_{j \in I \setminus u'} \nu_j$ and $x_{-u'} = (x_i)_{i \in I \setminus u'}$. As developed in [19], Eq. (2) is

a special case of a decomposition relying on commuting projections. Denoting by


P j : f ∈ F −→ f dν j the orthogonal projector onto the subspace F j of f ∈ F
not depending on x j , the identity on F can be expanded as




d
  
IF = (IF − P j ) + P j = (IF − P j ) Pj . (3)
j=1 u⊆I j∈u j∈I \u

FANOVA effects appear then as images of $f$ under the orthogonal projection operators onto the associated subspaces $\mathcal{F}_u = \bigcap_{j \notin u} \mathcal{F}_j \cap \bigcap_{j \in u} \mathcal{F}_j^{\perp}$, i.e. we have that
$f_u = T_u(f)$, where $T_u = \prod_{j \in u} (I_{\mathcal{F}} - P_j) \prod_{j \notin u} P_j$. Finally, the squared norm
of $f$ decomposes by orthogonality as $\|f\|^2 = \sum_{u \subseteq I} \|T_u(f)\|^2$ and the influence of
each (group of) variable(s) on $f$ can be quantified via the Sobol' indices

$$S_u(f) = \frac{\|T_u(f - T_\emptyset(f))\|^2}{\|f - T_\emptyset(f)\|^2} = \frac{\|T_u(f)\|^2}{\|f - T_\emptyset(f)\|^2}, \qquad u \neq \emptyset. \qquad (4)$$
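As a small numerical illustration of (2) and (4) (ours, not part of the paper), the effects and Sobol' indices of a two-dimensional test function with $\nu$ uniform on $[0,1]^2$ can be approximated on a tensor grid:

```python
# Numerical FANOVA effects and Sobol' indices for d = 2 (sketch, ours),
# using midpoint-rule quadrature on a tensor grid.
import numpy as np

n = 400
x = (np.arange(n) + 0.5) / n                      # midpoint-rule nodes
X, Y = np.meshgrid(x, x, indexing="ij")
f = np.sin(2*np.pi*X) + X*Y**2                    # test function

f0 = f.mean()                                     # T_emptyset f
f1 = f.mean(axis=1) - f0                          # effect of x1 (x2 integrated out)
f2 = f.mean(axis=0) - f0                          # effect of x2
f12 = f - f0 - f1[:, None] - f2[None, :]          # interaction effect

var = ((f - f0)**2).mean()
S1, S2 = (f1**2).mean()/var, (f2**2).mean()/var   # Sobol' indices (4)
S12 = (f12**2).mean()/var
print(S1, S2, S12, S1 + S2 + S12)                 # the indices sum to 1
```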

Gaussian random fields (GRFs). A random field indexed by D is a collection of


random variables Z = (Z x )x∈D defined on a common probability space (Ω, A , P).
The random field is called a Gaussian random field (GRF) if (Z x(1) , . . . , Z x(n) ) is
n-variate normally distributed for any x(1) , . . . , x(n) ∈ D (n ≥ 1). The distribution
of Z is then characterized by its mean function m(x) = E[Z x ], x ∈ D, and covari-

ance function k(x, y) = Cov(Z x , Z y ), x, y ∈ D. It is well-known that admissible


covariance functions coincide with symmetric positive definite (s.p.d.) kernels on
D × D [3].
A multivariate GRF taking values in $\mathbb{R}^p$ is a collection of $\mathbb{R}^p$-valued random
vectors $Z = (Z_x)_{x \in D}$ such that $Z^{(j)}_{x^{(i)}}$, $1 \le i \le n$, $1 \le j \le p$, are jointly $np$-variate
normally distributed for any $x^{(1)}, \ldots, x^{(n)} \in D$. The distribution of $Z$ is charac-
terized by its R p -valued mean function and a matrix-valued covariance function
(ki j )i, j∈{1,..., p} .
In both real- and vector-valued cases (assuming additional technical conditions
where necessary) k governs a number of pathwise properties ranging from square-
integrability to continuity, differentiability and more; see e.g. Sect. 1.4 of [1] or
Chap. 5 of [30] for details. As we will see in Sect. 4, k actually also governs the
FANOVA decomposition of GRF paths ω ∈ Ω −→ Z • (ω) ∈ R D . Before establish-
ing this result, let us first introduce a functional ANOVA decomposition for kernels.

3 KANOVA: A Kernel ANOVA Decomposition

Essentially we apply the 2d-dimensional version of the decomposition introduced


in Sect. 2 to ν ⊗ ν-square integrable kernels k (s.p.d. or not). From a formal point
of view it is more elegant and leads to more efficient notation if we work with the
tensor products Tu ⊗ Tv : F ⊗ F −→ F ⊗ F . It is well known that L 2 (ν ⊗ ν)
and F ⊗ F are isometrically isomorphic (see [17] for details on tensor products of
Hilbert spaces), and we silently identify them here for simplicity. Then Tu ⊗ Tv =
Tu(1) Tv(2) = Tv(2) Tu(1) , where Tu(1) , Tv(2) : L 2 (ν ⊗ ν) −→ L 2 (ν ⊗ ν) are given by
(Tu(1) k)(x, y) = (Tu (k(•, y))(x) and (Tv(2) k)(x, y) = (Tv (k(x, •))(y).

Theorem 1 Let k be ν ⊗ ν-square integrable.


(a) There exist $k_{u,v} \in L^2(\nu \otimes \nu)$ depending solely on $(x_u, y_v)$ such that $k$ can be
decomposed in a unique way as $k = \sum_{u,v \subseteq I} k_{u,v}$ under the conditions
$$\forall u, v \subseteq I\ \ \forall i \in u\ \ \forall j \in v \quad \int k_{u,v}\, \nu_i(dx_i) = 0 \ \text{ and } \ \int k_{u,v}\, \nu_j(dy_j) = 0. \qquad (5)$$
We have
$$k_{u,v}(x, y) = \sum_{u' \subseteq u} \sum_{v' \subseteq v} (-1)^{|u|+|v|-|u'|-|v'|} \int\!\!\int k(x, y)\, \nu_{-u'}(dx_{-u'})\, \nu_{-v'}(dy_{-v'}). \qquad (6)$$
Moreover, ku,v may be written concisely as ku,v = [Tu ⊗ Tv ]k.
(b) Suppose that $D$ is compact and $k$ is a continuous s.p.d. kernel. Then, for any
$(\alpha_u)_{u \subseteq I} \in \mathbb{R}^{2^d}$, the following function is also a s.p.d. kernel:
$$(x, y) \in D \times D \longmapsto \sum_{u \subseteq I} \sum_{v \subseteq I} \alpha_u \alpha_v\, k_{u,v}(x, y) \in \mathbb{R}. \qquad (7)$$

Proof The proofs are in the appendix to facilitate the reading. 

Example 1 (The Brownian kernel) Consider the covariance kernel $k(x, y) = \min(x, y)$ of the Brownian motion on $D = [0, 1]$, and suppose that $\nu$ is the
Lebesgue measure. The $k_{u,v}$'s can then easily be obtained by direct calculation:
$k_{\emptyset,\emptyset} = \frac{1}{3}$, $k_{\emptyset,\{1\}}(y) = y - \frac{y^2}{2} - \frac{1}{3}$, $k_{\{1\},\emptyset}(x) = x - \frac{x^2}{2} - \frac{1}{3}$, and $k_{\{1\},\{1\}}(x, y) = \min(x, y) - x + \frac{x^2}{2} - y + \frac{y^2}{2} + \frac{1}{3}$.
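These closed forms are easy to check numerically; the following sketch (ours) applies the projections of (6) to the Brownian kernel on a grid:

```python
# Numerical check (ours) of the KANOVA terms of Example 1.
import numpy as np

n = 2000
t = (np.arange(n) + 0.5) / n
K = np.minimum.outer(t, t)                     # Brownian kernel min(x, y)

k00 = K.mean()                                 # k_{0,0}      -> 1/3
k10 = K.mean(axis=1) - k00                     # k_{{1},0}(x)
k01 = K.mean(axis=0) - k00                     # k_{0,{1}}(y)
k11 = K - k00 - k10[:, None] - k01[None, :]    # k_{{1},{1}}(x, y)

i = np.searchsorted(t, 0.3)                    # grid index closest to 0.3
x = t[i]
print(k00, 1/3)
print(k10[i], x - x**2/2 - 1/3)
print(k11[i, i], x - 2*(x - x**2/2) + 1/3)     # closed form evaluated at y = x
```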
Example 2 Consider the very common class of tensor product kernels: $k(x, y) = \prod_{i=1}^{d} k_i(x_i, y_i)$ where the $k_i$'s are 1-dimensional symmetric kernels. It turns out that
Eq. (6) boils down to a sum depending on 1- and 2-dimensional integrals, since
$$\int\!\!\int k(x, y)\, d\nu_{-u}(x_{-u})\, d\nu_{-v}(y_{-v}) =
\prod_{i \in u \cap v} k_i(x_i, y_i) \cdot \prod_{i \in u \setminus v} \int k_i(x_i, \cdot)\, d\nu_i \cdot \prod_{i \in v \setminus u} \int k_i(\cdot, y_i)\, d\nu_i \cdot \prod_{i \notin u \cup v} \int k_i\, d(\nu_i \otimes \nu_i). \qquad (8)$$

By symmetry of k, Eq. (8) solely depends on the integrals ki d(νi ⊗νi ) and integral
functions t → ki (·, t)dνi , i = 1, . . . , d. We refer to Sect. 7 for explicit calculations
using typical ki ’s. A particularly convenient case is considered next.
Corollary 1 Let $k_i^{(0)}: D_i \times D_i \longrightarrow \mathbb{R}$ ($1 \le i \le d$) be argumentwise centred, i.e.
such that $\int k_i^{(0)}(\cdot, t)\, d\nu_i = \int k_i^{(0)}(s, \cdot)\, d\nu_i = 0$ for all $i \in I$ and $s, t \in D_i$, and
consider $k(x, y) = \prod_{i=1}^{d} (1 + k_i^{(0)}(x_i, y_i))$. Then the KANOVA decomposition of $k$
consists of the terms $[T_u \otimes T_u]k(x, y) = \prod_{i \in u} k_i^{(0)}(x_i, y_i)$ and $[T_u \otimes T_v]k = 0$ if
$u \neq v$.
Remark 1 By taking $k(x, y) = \prod_{i=1}^{d} (1 + k_i^{(0)}(x_i, y_i))$, where the $k_i^{(0)}$ are s.p.d., we recover
the so-called ANOVA kernels [6, 38, 39]. Corollary 1 guarantees for argumentwise
centred $k_i^{(0)}$ (see, e.g., [6, Sect. 2]) that the associated $k$ has a simple KANOVA
decomposition, with analytically tractable $k_{u,u}$ and vanishing $k_{u,v}$ terms (for $u \neq v$),
as also reported in [4] where a GRF model with this structure is postulated.

4 FANOVA Decomposition of Gaussian Random Field Paths

Let Z = (Zx)x∈D be a centred GRF with covariance function k. To simplify the arguments we assume for the rest of the article that the Di are compact subsets of ℝ

and that Z has continuous sample paths. The latter can be guaranteed by a weak
condition on the covariance kernel; see [1], Theorem 1.4.1. For r ∈ N \ {0} write
Cb (D, Rr ) for the space of (bounded) continuous functions D → Rr equipped
with the supremum norm, and set in particular Cb (D) = Cb (D, R). We reinterpret
Tu as maps Cb (D) → Cb (D), which are still bounded linear operators, and set
Z x(u) = (Tu Z )x .
Theorem 2 The 2^d-dimensional vector-valued random field (Zx^{(u)}, u ⊆ I)_{x∈D} is Gaussian, centred, and has continuous sample paths again. Its matrix-valued covariance function is given by

Cov(Zx^{(u)}, Zy^{(v)}) = ([Tu ⊗ Tv]k)(x, y). (9)

Example 3 Continuing from Example 1, let B = (Bx)_{x∈[0,1]} be the Brownian motion on D = [0, 1], which is a centred GRF with continuous paths. Theorem 2 yields that (T∅B, T{1}B) = (∫₀¹ Bu du, Bx − ∫₀¹ Bu du)_{x∈D} is a bivariate random field on D, where T∅B is a N(0, 1/3)-distributed random variable, while (T{1}Bx)_{x∈D} is a centred GRF with covariance kernel k{1},{1}(x, y) = min(x, y) − x + x²/2 − y + y²/2 + 1/3. The cross-covariance function of the components is given by Cov(T∅B, T{1}Bx) = x − x²/2 − 1/3.

Remark 2 Under our conditions on Z and using the notation from the proof of Theorem 1, we have a Karhunen–Loève expansion Zx = Σ_{i=1}^∞ √λi εi φi(x), where ε = (εi)_{i∈ℕ\{0}} is a standard Gaussian white noise sequence and the series converges uniformly (i.e. in Cb(D)) with probability 1 (and in L²(Ω)); for d = 1 see [1, 18]. Thus by the continuity of Tu, we can expand the projected random field as

Zx^{(u)} = Tu(Σ_{i=1}^∞ √λi εi φi)(x) = Σ_{i=1}^∞ √λi εi (Tu φi)(x), (10)

where the series converges uniformly in x with probability 1 (and in L²(Ω)). This is the basis for an alternative proof of Theorem 2. We can also verify Eq. (9) under these conditions. Using the left/right-continuity of cov in L²(Ω), we obtain indeed cov(Zx^{(u)}, Zy^{(v)}) = Σ_{i=1}^∞ λi Tu(φi)(x) Tv(φi)(y) = ku,v(x, y).

Corollary 2 (a) For any u ⊆ I the following statements are equivalent:


(i) Tu (k(•, y)) = 0 for every y ∈ D
(ii) [Tu ⊗ Tu ]k = 0
(iii) [Tu ⊗ Tu ]k(x, x) = 0 for every x ∈ D
(iv) P(Z (u) = 0) = 1
(b) For any u, v ⊆ I with u ≠ v the following statements are equivalent:
(i) [Tu ⊗ Tv ]k = 0
(ii) Z (u) and Z (v) are two independent GRFs

Remark 3 A consequence of Corollary 2 is that choosing a kernel without u component in GRF-based GSA will lead to a posterior distribution without u component whatever the conditioning observations, i.e. P(Z^{(u)} = 0 | Zx1, ..., Zxn) = 1 (a.s.). However, the analogous result does not hold for cross-covariances between Z^{(u)} and Z^{(v)} for u ≠ v. Let us take for instance D = [0, 1], ν arbitrary, and Zt = U + Yt, where U ∼ N(0, σ²) (σ > 0) and (Yt) is a centred Gaussian process with argumentwise centred covariance kernel k^{(0)}. Assuming that U and Y are independent, it is clear that (T∅Z)s = U and (T{1}Z)t = Yt, so Cov((T∅Z)s, (T{1}Z)t) = 0. If in addition Z was observed at a point r ∈ D, Eq. (9) yields Cov((T∅Z)s, (T{1}Z)t | Zr) = (T∅ ⊗ T{1})(k(•, ·) − k(•, r)k(r, ·)/k(r, r))(s, t), where k(s, t) = σ² + k^{(0)}(s, t) is the covariance kernel of Z. By Eq. (6) we obtain Cov((T∅Z)s, (T{1}Z)t | Zr) = −σ² k^{(0)}(r, t)/(σ² + k^{(0)}(r, r)), which in general is nonzero.

Remark 4 Coming back to the ANOVA kernels discussed in Remark 1, Corollary 2(b) implies that for a centred GRF with continuous sample paths and covariance kernel of the form k(x, y) = Π_{i=1}^d (1 + ki^{(0)}(xi, yi)), where ki^{(0)} is argumentwise centred, the FANOVA effects Z^{(u)}, u ⊆ I, are actually independent.

To close this section, let us finally touch upon the distribution of Sobol’ indices
of GRF sample paths, relying on Theorem 2 and Remark 2.

Corollary 3 For u ⊆ I, u ≠ ∅, we can represent the Sobol' indices of Z as

Su(Z) = Qu(ε, ε) / Σ_{v≠∅} Qv(ε, ε),

where the Qu's are quadratic forms in a standard Gaussian white noise sequence. In the notation of Remark 2, Qu(ε, ε) = Σ_{i=1}^∞ Σ_{j=1}^∞ √(λi λj) ⟨Tu φi, Tu φj⟩ εi εj, where the convergence is uniform with probability 1.

Remark 5 Consider the GRF Z′ = Z − T∅Z with Karhunen–Loève expansion Z′x = Σ_{i=1}^∞ √λi φi(x) εi. From Eq. (4) and (the proof of) Corollary 3 we can see that Su(Z) = Su(Z′) = Σ_{i,j=1}^∞ gij εi εj / Σ_{i=1}^∞ λi εi², where gij = √(λi λj) ⟨Tu φi, Tu φj⟩. Truncating both series above at K ∈ ℕ, applying the theorem in Sect. 2 of [29] and then Lebesgue's theorem for K → ∞, we obtain

E Su(Z) = Σ_{i=1}^∞ gii ∫₀^∞ [ (1 + 2λi t)^{3/2} Π_{l≠i} (1 + 2λl t)^{1/2} ]^{−1} dt,

E Su(Z)² = Σ_{i=1}^∞ Σ_{j=1}^∞ (gii gjj + 2gij²) ∫₀^∞ t [ (1 + 2λi t)^{3/2} (1 + 2λj t)^{3/2} Π_{l∉{i,j}} (1 + 2λl t)^{1/2} ]^{−1} dt.

5 Making New Kernels from Old with KANOVA

While kernel methods and Gaussian process modelling have proven efficient in a
number of classification and prediction problems, finding a suitable kernel for a given
application is often judged difficult. It should simultaneously express the desired fea-
tures of the problem at hand while respecting positive definiteness, a mathematical
constraint that is not straightforward to check in practice. In typical implementa-
tions of kernel methods, a few classes of standard stationary kernels are available for
which positive definiteness was established analytically based on the Bochner theo-
rem. On the other hand, some operations on kernels are known to preserve positive-
definiteness, which enables enriching the available dictionary of kernels notably by
multiplication by a positive constant, convex combinations, products and convolu-
tions of kernels, or deformations of the input space. The section Making new kernels
from old of [26] (Sect. 4.2.4) covers a number of such operations. We now consider
some new ways of creating admissible kernels in the context of the KANOVA decom-
position of Sect. 3. Let us first consider as before some square-integrable symmetric
positive definite kernel kold and take u ⊆ I .
One straightforward approach to create a kernel whose associated Gaussian ran-
dom field has paths in Fu is then to plainly take the “simple” projected kernel

knew = πu kold with πu = Tu ⊗ Tu . (11)

From Theorem 1(b), and also from the fact that knew is the covariance function of
Z (u) where Z is a centred GRF with covariance function kold , it is clear that such
kernels are s.p.d.; however, they will generally not be strictly positive definite.
Going one step further, one obtains a richer class of 2^{2^d} symmetric positive definite kernels by considering parts of P(I), and designing kernels accordingly. Taking U ⊂ P(I), we obtain a further class of projected kernels as follows:

knew = πU kold with πU = TU ⊗ TU = Σ_{u∈U} Σ_{v∈U} Tu ⊗ Tv, where TU = Σ_{u∈U} Tu. (12)

The resulting kernel is again s.p.d., which follows from Theorem 1(b) by choosing
αu = 1 if u ∈ U and αu = 0 otherwise, or again by noting that knew is the covariance function of Σ_{u∈U} Z^{(u)} where Z is a centred GRF with covariance function kold.
Such a kernel contains not only the covariances of the effects associated with the
different subsets of U , but also cross-covariances between these effects. Finally,
another relevant class of positive definite projected kernels can be designed by taking

knew = π′U kold with π′U = Σ_{u∈U} Tu ⊗ Tu. (13)

This kernel corresponds to the one of a sum of independent random fields with same
individual distributions as the Z (u) (u ∈ U ). In addition, projectors of the form

πU1, π′U2 (U1, U2 ⊂ P(I)) can be combined (e.g. by sums or convex combinations)
in order to generate a large class of s.p.d. kernels, as illustrated here and in Sect. 6.
Example 4 Let us consider A = {∅, {1}, {2}, . . . , {d}} and O, the complement of A
in P(I ). While A corresponds to the constant and main effects forming the addi-
tive component in the FANOVA decomposition, O corresponds to all higher-order
terms, referred to as ortho-additive component in [21]. Taking π A k = (T A ⊗ T A )k
amounts to extracting the additive component of k with cross-covariances between
the various main effects (including the constant); see Fig. 1(c). On the other hand, π′A k = Σ_{u∈A} πu k retains these main effects without their possible cross-covariances;
see Fig. 1(b). In the next theorem (proven in [21]), analytical formulae are given for
π A k and related terms for the class of tensor product kernels.
Theorem 3 Let Di = [ai, bi] (ai < bi) and k = ⊗_{i=1}^d ki, where the ki are s.p.d. kernels on Di such that ki(xi, yi) > 0 for all xi, yi ∈ Di. Then, the additive and ortho-additive components of k with their cross-covariances are given by

(πA k)(x, y) = a(x)a(y)/ℰ + ℰ · Σ_{i=1}^d [ ki(xi, yi)/ℰi − Ei(xi)Ei(yi)/ℰi² ]

(TO ⊗ TA k)(x, y) = (TA ⊗ TO k)(y, x) = E(x) · (1 − d + Σ_{j=1}^d kj(xj, yj)/Ej(xj)) − (πA k)(x, y)

(πO k)(x, y) = k(x, y) − (TA ⊗ TO k)(x, y) − (TO ⊗ TA k)(x, y) − (πA k)(x, y)

where Ei(xi) = ∫_{ai}^{bi} ki(xi, yi) dyi, E(x) = Π_{i=1}^d Ei(xi), ℰi = ∫_{ai}^{bi} Ei(xi) νi(dxi), ℰ = Π_{i=1}^d ℰi, and a(x) = ℰ (1 − d + Σ_{i=1}^d Ei(xi)/ℰi).
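As an illustration of Theorem 3, the sketch below evaluates (πA k), (TO ⊗ TA k) and (πO k) for a tensor product of 1-D Gaussian kernels on [0, 1]^d, computing Ei and ℰi by numerical quadrature. The kernel choice, θ and the evaluation points are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.integrate import quad

d, theta = 3, 0.5
k1 = lambda s, t: np.exp(-(s - t) ** 2 / (2 * theta ** 2))   # identical 1-D kernels k_i

E1 = lambda s: quad(lambda t: k1(s, t), 0.0, 1.0)[0]         # E_i(x_i) = int k_i(x_i, y_i) dy_i
cE1 = quad(E1, 0.0, 1.0)[0]                                  # script E_i = int E_i dnu_i

def pi_A(x, y):
    """Additive component (pi_A k)(x, y) of k(x, y) = prod_i k_i(x_i, y_i)."""
    E_x, E_y = np.array([E1(s) for s in x]), np.array([E1(t) for t in y])
    cE = cE1 ** d                                            # script E
    a = lambda E: cE * (1 - d + np.sum(E / cE1))             # a(x) from Theorem 3
    return a(E_x) * a(E_y) / cE + cE * np.sum(
        np.array([k1(x[i], y[i]) for i in range(d)]) / cE1 - E_x * E_y / cE1 ** 2)

def k_full(x, y):
    return np.prod([k1(x[i], y[i]) for i in range(d)])

def TO_TA(x, y):                                             # (T_O (x) T_A k)(x, y)
    E_x = np.prod([E1(s) for s in x])
    return E_x * (1 - d + sum(k1(x[j], y[j]) / E1(x[j]) for j in range(d))) - pi_A(x, y)

x, y = np.array([0.2, 0.7, 0.4]), np.array([0.9, 0.1, 0.5])
pi_O = k_full(x, y) - TO_TA(y, x) - TO_TA(x, y) - pi_A(x, y)  # (pi_O k)(x, y)
print(pi_A(x, y), pi_O)
```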

6 Numerical Experiments

We consider 30-dimensional numerical experiments where we compare the predic-


tion abilities of sparse kernels obtained from the KANOVA decomposition of

k(x, y) = exp(−||x − y||2 ), x, y ∈ [0, 1]30 . (14)

As detailed in the previous sections, k can be expanded as a sum of 4^30 terms, and sparsified versions of k can be obtained by projections such as in Example 4. We will focus hereafter on eight sub-kernels (all summations are over u, v ⊆ I):

kfull = k                                          kA′ = Σ_{|u|≤1} πu k
kA = Σ_{|u|≤1} Σ_{|v|≤1} (Tu ⊗ Tv)k                kinter = Σ_{|u|≤2} πu k
kdiag = Σ_u πu k                                   kA′+O = kA′ + πO k
kA+O′ = kA + (kdiag − kA′)                         ksparse = (π∅ + π{1} + π{2} + π{2,3} + π{4,5})k. (15)

Fig. 1 Schematic representations (panels a–h) of a reference kernel k and various projections or sums of projections. The expressions of these kernels are detailed in Sect. 6 (Eq. 15). [Schematic figure omitted.]

A schematic representation of these kernels can be found in Fig. 1. Note that the
tensor product structure of k allows to use Theorem 3 in order to get more tractable
expressions for all kernels above. Furthermore, the integrals appearing in the E i and
Ei terms can be calculated analytically as detailed in appendix.
We now compare kriging predictions based on paths simulated from centred GRFs,
selecting any combination of two of the kernels in Fig. 1 and using one for simulation
(“generating kernel”) and one for prediction (“prediction kernel”). Each prediction
is performed at n test = 200 locations based on observations of an individual path
at n train = 500 locations. We judge the performance of the prediction by averaging
over n path = 200 sample paths for each combination of kernels. Whenever the kernel
used for prediction is not the same as the one used for simulation, a Gaussian obser-
vation noise with variance τ 2 is assumed in the models used in prediction, where
τ 2 is chosen so as to reflect the part of variance that cannot be approximated by the
model. For simplicity, only one n train -point training set and one n test -point test set
are considered for the whole experiment. For both, design points are chosen by maxi-
mizing the minimal interpoint distance among random Latin hypercube designs [28]
using DiceDesign [7, 11]. For each path y_ℓ (ℓ = 1, ..., n_path), the criterion used for quantifying prediction accuracy is:

C_ℓ = 1 − Σ_{i=1}^{n_test} (y_{ℓ,i} − ŷ_{ℓ,i})² / Σ_{i=1}^{n_test} y_{ℓ,i}² (16)

where y_{ℓ,i} and ŷ_{ℓ,i} are the actual and predicted values of the ℓth path at the ith test point. While C_ℓ = 1 means a null prediction error, C_ℓ = 0 means that ŷ_ℓ predicts as badly as the null function. Average values of C_ℓ over all couples of generating versus prediction kernel are summarized in Table 1.
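For concreteness, a minimal sketch of the criterion (16); the input vectors are illustrative stand-ins for the actual and predicted values of one path.

```python
import numpy as np

def criterion_C(y_test, y_pred):
    """C = 1 - sum (y_i - yhat_i)^2 / sum y_i^2, cf. Eq. (16)."""
    y_test, y_pred = np.asarray(y_test), np.asarray(y_pred)
    return 1.0 - np.sum((y_test - y_pred) ** 2) / np.sum(y_test ** 2)

rng = np.random.default_rng(0)
y_test = rng.normal(size=200)
print(criterion_C(y_test, y_test))            # 1: perfect prediction
print(criterion_C(y_test, np.zeros(200)))     # 0: as bad as the null function
```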

Table 1 Average values of C_ℓ over the n_path = 200 replications

           kfull  kdiag  kA′+O  kA+O′  kinter  kA′   kA    ksparse
Z_full     0.06   0.05   0.06   0.05   0.05    0.03  0.04  0.01
Z_diag     0.05   0.05   0.05   0.05   0.04    0.03  0.03  0.01
Z_A′+O     0.05   0.04   0.05   0.04   0.04    0.03  0.03  0.01
Z_A+O′     0.06   0.06   0.06   0.06   0.05    0.04  0.04  0.01
Z_inter    0.33   0.37   0.34   0.37   0.7     0.28  0.28  0.07
Z_A′       0.67   0.76   0.71   0.75   0.96    1     1     0.2
Z_A        0.69   0.77   0.71   0.77   0.96    1     1     0.18
Z_sparse   0.75   0.83   0.8    0.78   0.95    0.9   0.9   1
Mean       0.33   0.37   0.35   0.36   0.47    0.41  0.42  0.19

Rows correspond to generating GRF models (characterized by generating kernels) while columns correspond to prediction kernels. The four last rows of the kinter column are in bold blue to highlight the superior performances of that prediction kernel when the class of generating GRF models is as sparse or sparser than Z_inter

Note that Table 1 was slightly perturbed but the conclusions unchanged when replicating the training and test designs.
First, this example illustrates that, unless the correlation range is increased, pre-
dicting a GRF based on 500 points in dimension 30 is hopeless when the generating
kernel is full or close to full (first four rows of Table 1) no matter what prediction
kernel is chosen. However, for GRFs with a sparser generating kernel, prediction
performances are strongly increased (last four rows of Table 1).
Second, still focusing on the four last lines of Table 1, kinter seems to offer a
nice compromise as it works much better than other prediction kernels on Z inter and
achieves very good performances on sample paths of sparser GRFs. Besides this, it
is not doing notably worse than the best prediction kernels on rows 1–4.
Third, neglecting cross-correlations has very little or no influence on the results,
so that the Gaussian kernel appears to have a structure relatively close to what we
refer to as “diagonal” (diag) here. This point remains to be studied analytically.

7 Conclusion and Perspectives

We have proposed an ANOVA decomposition of kernels (KANOVA), and shown


how KANOVA governs the probability distribution of FANOVA effects of Gaussian
random field paths. This has enabled us in turn to establish that ANOVA kernels
correspond to centred Gaussian random fields (GRFs) with independent FANOVA
effects, to make progress towards the distribution of Sobol’ indices of GRFs, and
also to suggest a number of operations for making new symmetric positive definite
kernels from existing ones. Particular cases include the derivation of additive and

ortho-additive kernels extracted from tensor product kernels, for which a closed form
formula was given. Besides this, a 30-dimensional numerical experiment supports
our claim that KANOVA may be a useful approach to designing kernels for high-
dimensional kriging, as the performances of the interaction kernel suggest. Perspec-
tives include analytically calculating the norm of terms appearing in the KANOVA
decomposition to better understand the structure of common GRF models. From a
practical point of view, a next challenge will be to parametrize decomposed kernels
adequately so as to recover from data which terms of the FANOVA decomposition
are dominating and to automatically design adapted kernels from this.

Acknowledgments The authors would like to thank Dario Azzimonti for proofreading, as well as
the editors and an anonymous referee for their valuable comments and suggestions.

Proofs

Theorem 1 (a) The first part and the concrete solution (6) follow directly from the
corresponding statements in Sect. 2. Having established (6), it is easily seen that
[Tu ⊗ Tv ]k = Tu(1) Tv(2) k coincides with ku,v .
(b) Under these conditions Mercer’s theorem applies (see [34] for an overview and
recent extensions). So there exist a non-negative sequence (λi)_{i∈ℕ\{0}} and continuous representatives (φi)_{i∈ℕ\{0}} of an orthonormal basis of L²(ν) such that k(x, y) = Σ_{i=1}^∞ λi φi(x) φi(y), x, y ∈ D, where the convergence is absolute and uniform.
Noting that Tu, Tv are also bounded as operators on continuous functions, applying Tu^{(1)} Tv^{(2)} from above yields that

Σ_{u⊆I} Σ_{v⊆I} αu αv ku,v(x, y) = Σ_{i=1}^∞ λi ψi(x) ψi(y), (17)

where ψi = Σ_{u⊆I} αu (Tu φi). Thus the considered function is indeed s.p.d. □
Corollary 1 Expand the product Π_{l=1}^d (1 + kl^{(0)}(xl, yl)) and conclude by uniqueness of the KANOVA decomposition, noting that ∫ Π_{l∈u} kl^{(0)}(xl, yl) νi(dxi) = ∫ Π_{l∈u} kl^{(0)}(xl, yl) νj(dyj) = 0 for any u ⊆ I and any i, j ∈ u. □
Theorem 2 Sample path continuity implies product-measurability of Z and Z (u) ,
respectively, as can be shown by an approximation argument; see e.g. Prop. A.D.
in [31]. Due to Theorem 3 in [35], the covariance kernel k is continuous, hence ∫_D E|Zx| ν_{−u}(dx_{−u}) ≤ (∫_D k(x, x) ν_{−u}(dx_{−u}))^{1/2} < ∞ for any u ⊆ I and by Cauchy–Schwarz ∫_D ∫_D E|Zx Zy| ν_{−u}(dx_{−u}) ν_{−v}(dy_{−v}) < ∞ for any u, v ⊆ I.
yields that Z (u) is centred again. Combining (2), Fubini’s theorem, and (6) yields

Cov(Zx^{(u)}, Zy^{(v)})
= Σ_{u′⊆u} Σ_{v′⊆v} (−1)^{|u|+|v|−|u′|−|v′|} Cov( ∫ Zx ν_{−u′}(dx_{−u′}), ∫ Zy ν_{−v′}(dy_{−v′}) )
= Σ_{u′⊆u} Σ_{v′⊆v} (−1)^{|u|+|v|−|u′|−|v′|} ∫∫ Cov(Zx, Zy) ν_{−u′}(dx_{−u′}) ν_{−v′}(dy_{−v′})
= ([Tu ⊗ Tv]k)(x, y). (18)
It remains to show the joint Gaussianity of the Z^{(u)}. First note that Cb(D, ℝ^r) is a separable Banach space for r ∈ ℕ \ {0}. We may and do interpret Z as a random element of Cb(D), equipped with the σ-algebra B_D generated by the evaluation maps [Cb(D) ∋ f ↦ f(x) ∈ ℝ]. By Theorem 2 in [25] the distribution P Z^{−1} of Z is a Gaussian measure on (Cb(D), B(Cb(D))). Since Tu is a bounded linear operator Cb(D) → Cb(D), we obtain immediately that the “combined operator” T: Cb(D) → Cb(D, ℝ^{2^d}), defined by (T(f))(x) = (Tu f(x))_{u⊆I}, is also bounded and linear. Corollary 3.7 of [36] yields that the image measure (P Z^{−1}) T^{−1} is a Gaussian measure on Cb(D, ℝ^{2^d}). This means that for every bounded linear functional ℓ: Cb(D, ℝ^{2^d}) → ℝ the image measure ((P Z^{−1}) T^{−1}) ℓ^{−1} is a univariate normal distribution, i.e. ℓ(TZ) is a Gaussian random variable. Thus, for all n ∈ ℕ, x(i) ∈ D and ai^{(u)} ∈ ℝ, where 1 ≤ i ≤ n, u ⊆ I, we obtain that Σ_{i=1}^n Σ_{u⊆I} ai^{(u)} (Tu Z)_{x(i)} is Gaussian by the fact that [Cb(D) ∋ f ↦ f(x) ∈ ℝ] is continuous (and linear) for every x ∈ D. We conclude that TZ = (Zx^{(u)}, u ⊆ I)_{x∈D} is a vector-valued GRF. □
Corollary 2 (a) If (i) holds, [Tu ⊗ Tu ]k = Tu(2) (Tu(1) k) = 0 by (Tu(1) k)(•, y) =
Tu (k(•, y)); thus (ii) holds. (ii) trivially implies (iii). Statement (iii) means that
Var(Z x(u) ) = 0, which implies that Z x(u) = 0 a.s., since Z (u) is centred. (iv) fol-
lows by noting that P(Z x(u) = 0) = 1 for all x ∈ D implies P(Z (u) = 0) = 1 by the
fact that Z (u) has continuous sample paths and is therefore separable. Finally, (iv)
implies (i) because Tu (k(•, y)) = Cov(Z •(u) , Z y ) = 0; see (18) for the first equality.
(b) For any m, n ∈ ℕ and x1, ..., xm, y1, ..., yn ∈ D we obtain by Theorem 2 that Z_{x1}^{(u)}, ..., Z_{xm}^{(u)}, Z_{y1}^{(v)}, ..., Z_{yn}^{(v)} are jointly normally distributed. Statement (i) is equivalent to saying that Cov(Zx^{(u)}, Zy^{(v)}) = 0 for all x, y ∈ D. Thus (Z_{x1}^{(u)}, ..., Z_{xm}^{(u)}) and (Z_{y1}^{(v)}, ..., Z_{yn}^{(v)}) are independent. Since the sets

{( f, g) ∈ R D × R D : ( f (x1 ), . . . , f (xm )) ∈ A, (g(y1 ), . . . , g(yn )) ∈ B} (19)

with m, n ∈ N, x1 , . . . , xm , y1 , . . . , yn ∈ D, A ∈ B(Rm ), B ∈ B(Rn ) generate


B D ⊗ B D (and the system of such sets is stable under intersections), statement (ii)
follows. The converse direction is straightforward. 
Corollary 3 By Remark 2, there is a Gaussian white noise sequence ε = (εi)_{i∈ℕ\{0}} such that Zx = Σ_{i=1}^∞ √λi εi φi(x) uniformly with probability 1. From Zx^{(u)} = Σ_{i=1}^∞ √λi εi Tu φi(x), we obtain ‖Z^{(u)}‖² = Qu(ε, ε) with Qu as defined in the statement. A similar calculation for the denominator of Su(Z) leads to Σ_{v≠∅} Qv(ε, ε). □


Additional Examples

Here we give useful expressions to compute the KANOVA decomposition of some


tensor product kernels with respect to the uniform measure on [0, 1]d . For simplicity
we denote the 1-dimensional kernels on which they are based by k (corresponding
to the notation ki in Example 2). The uniform measure on [0, 1] is denoted by λ.

Example 5 (Exponential kernel) If k(x, y) = exp(−|x − y|/θ), then:
• ∫₀¹ k(·, y) dλ = θ × [2 − k(0, y) − k(1, y)]
• ∫_{[0,1]²} k(·, ·) d(λ ⊗ λ) = 2θ(1 − θ + θ e^{−1/θ})

Example 6 (Matérn kernel, ν = p + 1/2) Define for ν = p + 1/2 (p ∈ ℕ):

k(x, y) = (p!/(2p)!) Σ_{i=0}^p ((p + i)!/(i!(p − i)!)) (|x − y|/(θ/√(8ν)))^{p−i} × exp(−|x − y|/(θ/√(2ν))).

Then, denoting ζp = θ/√(2ν), we have:

∫₀¹ k(·, y) dλ = ζp × (p!/(2p)!) × [2 cp,0 − Ap(y/ζp) − Ap((1 − y)/ζp)],

where Ap(u) = Σ_{ℓ=0}^p cp,ℓ u^ℓ e^{−u} with cp,ℓ = (1/ℓ!) Σ_{i=0}^{p−ℓ} ((p + i)!/i!) 2^{p−i}. This generalizes Example 5, corresponding to ν = 1/2. Also, this result can be written more explicitly for the commonly selected value ν = 3/2 (p = 1, ζ1 = θ/√3):

• k(x, y) = (1 + |x − y|/ζ1) exp(−|x − y|/ζ1)
• ∫₀¹ k(·, y) dλ = ζ1 × [4 − A1(y/ζ1) − A1((1 − y)/ζ1)] with A1(u) = (2 + u)e^{−u}
• ∫_{[0,1]²} k(·, ·) d(λ ⊗ λ) = 2ζ1 (2 − 3ζ1 + (1 + 3ζ1)e^{−1/ζ1})

Similarly, for ν = 5/2 (p = 2, ζ2 = θ/√5):

• k(x, y) = (1 + |x − y|/ζ2 + (x − y)²/(3ζ2²)) exp(−|x − y|/ζ2)
• ∫₀¹ k(·, y) dλ = (1/3)ζ2 × [16 − A2(y/ζ2) − A2((1 − y)/ζ2)] with A2(u) = (8 + 5u + u²)e^{−u}
• ∫_{[0,1]²} k(·, ·) d(λ ⊗ λ) = (1/3)ζ2(16 − 30ζ2) + (2/3)(1 + 7ζ2 + 15ζ2²)e^{−1/ζ2}

Example 7 (Gaussian kernel) If k(x, y) = exp(−(x − y)²/(2θ²)), then
• ∫₀¹ k(·, y) dλ = θ√(2π) × [Φ((1 − y)/θ) + Φ(y/θ) − 1]
• ∫_{[0,1]²} k(·, ·) d(λ ⊗ λ) = 2θ²(e^{−1/(2θ²)} − 1) + θ√(2π) × (2Φ(1/θ) − 1)
where Φ denotes the cdf of the standard normal distribution.
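A short numerical check of the two Gaussian-kernel integrals above (as reconstructed here); θ and the evaluation point y are illustrative choices.

```python
import numpy as np
from scipy.integrate import quad, dblquad
from scipy.stats import norm

theta = 0.7
k = lambda x, y: np.exp(-0.5 * (x - y) ** 2 / theta ** 2)

y0 = 0.3
lhs1 = quad(lambda x: k(x, y0), 0.0, 1.0)[0]
rhs1 = theta * np.sqrt(2 * np.pi) * (norm.cdf((1 - y0) / theta) + norm.cdf(y0 / theta) - 1)

lhs2 = dblquad(k, 0.0, 1.0, 0.0, 1.0)[0]
rhs2 = (2 * theta ** 2 * (np.exp(-1 / (2 * theta ** 2)) - 1)
        + theta * np.sqrt(2 * np.pi) * (2 * norm.cdf(1 / theta) - 1))

print(lhs1 - rhs1, lhs2 - rhs2)   # both differences should be ~0
```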



References

1. Adler, R., Taylor, J.: Random Fields and Geometry. Springer, Boston (2007)
2. Antoniadis, A.: Analysis of variance on function spaces. Statistics 15, 59–71 (1984)
3. Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Sta-
tistics. Kluwer Academic Publishers, Boston (2004)
4. Chastaing, G., Le Gratiet, L.: ANOVA decomposition of conditional Gaussian processes for
sensitivity analysis with dependent inputs. J. Stat. Comput. Simul. 85(11), 2164–2186 (2015)
5. Durrande, N., Ginsbourger, D., Roustant, O.: Additive covariance kernels for high-dimensional
Gaussian process modeling. Ann. Fac. Sci. Toulous. Math. 21, 481–499 (2012)
6. Durrande, N., Ginsbourger, D., Roustant, O., Carraro, L.: ANOVA kernels and RKHS of zero
mean functions for model-based sensitivity analysis. J. Multivar. Anal. 115, 57–67 (2013)
7. Dupuy, D., Helbert, C., Franco, J.: DiceDesign and DiceEval: Two R packages for design and
analysis of computer experiments. J. Stat. Softw. 65(11): 1–38 (2015)
8. Duvenaud, D.: Automatic model construction with Gaussian processes. Ph.D. thesis, Depart-
ment of Engineering, University of Cambridge (2014)
9. Duvenaud, D., Nickisch, H., Rasmussen, C.: Additive Gaussian Processes. NIPS conference.
(2011)
10. Efron, B., Stein, C.: The jackknife estimate of variance. Ann. Stat. 9, 586–596 (1981)
11. Franco, J., Dupuy, D., Roustant, O., Damblin, G., Iooss, B.: DiceDesign: Designs of computer
experiments. R package version 1.7 (2015)
12. Gikhman, I.I., Skorokhod, A.V.: The theory of stochastic processes. Springer, Berlin (2004).
Translated from the Russian by S. Kotz, Reprint of the 1974 edition
13. Hoeffding, W.: A class of statistics with asymptotically normal distributions. Ann. Math. Stat.
19, 293–325 (1948)
14. Jan, B., Bect, J., Vazquez, E., Lefranc, P.: approche bayésienne pour l’estimation d’indices de
Sobol. In 45èmes Journées de Statistique - JdS 2013. Toulouse, France (2013)
15. Janon, A., Klein, T., Lagnoux, A., Nodet, M., Prieur, C.: Asymptotic Normality and Efficiency
of Two Sobol Index Estimators. Probability And Statistics, ESAIM (2013)
16. Kaufman, C., Sain, S.: Bayesian functional ANOVA modeling using Gaussian process prior
distributions. Bayesian Anal. 5, 123–150 (2010)
17. Krée, P.: Produits tensoriels complétés d’espaces de Hilbert. Séminaire Paul Krée Vol 1, No. 7
(1974–1975)
18. Kuelbs, J.: Expansions of vectors in a Banach space related to Gaussian measures. Proc. Am.
Math. Soc. 27(2), 364–370 (1971)
19. Kuo, F.Y., Sloan, I.H., Wasilkowski, G.W., Wozniakowski, H.: On decompositions of multi-
variate functions. Math. Comput. 79, 953–966 (2010)
20. Le Gratiet, L., Cannamela, C., Iooss, B.: A Bayesian approach for global sensitivity analysis
of (multi-fidelity) computer codes. SIAM/ASA J. Uncertain. Quantif. 2(1), 336–363 (2014)
21. Lenz, N.: Additivity and ortho-additivity in Gaussian random fields. Master’s thesis, Departe-
ment of Mathematics and Statistics, University of Bern (2013). http://hal.archives-ouvertes.fr/
hal-01063741
22. Marrel, A., Iooss, B., Laurent, B., Roustant, O.: Calculations of Sobol indices for the Gaussian
process metamodel. Reliab. Eng. Syst. Saf. 94, 742–751 (2009)
23. Muehlenstaedt, T., Roustant, O., Carraro, L., Kuhnt, S.: Data-driven Kriging models based on
FANOVA-decomposition. Stat. Comput. 22(3), 723–738 (2012)
24. Oakley, J., O’Hagan, A.: Probabilistic sensitivity analysis of complex models: a Bayesian
approach. J. R. Stat. Soc. 66, 751–769 (2004)
25. Rajput, B.S., Cambanis, S.: Gaussian processes and Gaussian measures. Ann. Math. Stat. 43,
1944–1952 (1972)
26. Rasmussen, C.R., Williams, C.K.I.: Gaussian Processes for Machine Learning. Cambridge,
MIT Press (2006)
27. Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M.,
Tarantola, S.: Global sensitivity analysis: the primer. Wiley Online Library (2008)

28. Santner, T., Williams, B., Notz, W.: The design and analysis of computer experiments. Springer,
New York (2003)
29. Sawa, T.: The exact moments of the least squares estimator for the autoregressive model. J.
Econom. 8(2), 159–172 (1978)
30. Scheuerer, M.: A comparison of models and methods for spatial interpolation in statistics and
numerical analysis. Ph.D. thesis, Georg-August-Universität Göttingen (2009)
31. Schuhmacher, D.: Distance estimates for poisson process approximations of dependent thin-
nings. Electron. J. Probab. 10(5), 165–201 (2005)
32. Sobol’, I.: Multidimensional Quadrature Formulas and Haar Functions. Nauka, Moscow
(1969). (In Russian)
33. Sobol’, I.: Global sensitivity indices for nonlinear mathematical models and their Monte Carlo
estimates. Math. Comput. Simul. 55(1–3), 271–280 (2001)
34. Steinwart, I., Scovel, C.: Mercer’s theorem on general domains: on the interaction between
measures, kernels, and RKHSs. Constr. Approx. 35(3), 363–417 (2012)
35. Talagrand, M.: Regularity of Gaussian processes. Acta Math. 159(1–2), 99–149 (1987)
36. Tarieladze, V., Vakhania, N.: Disintegration of Gaussian measures and average-case optimal
algorithms. J. Complex. 23(4–6), 851–866 (2007)
37. Touzani, S.: Response surface methods based on analysis of variance expansion for sensitivity
analysis. Ph.D. thesis, Université de Grenoble (2011)
38. Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
39. Wahba, G.: Spline Models for Observational Data. Siam, Philadelphia (1990)
40. Welch, W.J., Buck, R.J., Sacks, J., Wynn, H.P., Mitchell, T.J., Morris, M.D.: Screening, pre-
dicting, and computer experiments. Technometrics 34, 15–25 (1992)
The Mean Square Quasi-Monte Carlo Error
for Digitally Shifted Digital Nets

Takashi Goda, Ryuichi Ohori, Kosuke Suzuki and Takehito Yoshiki

Abstract In this paper, we study randomized quasi-Monte Carlo (QMC) integration


using digitally shifted digital nets. We express the mean square QMC error of the nth
discrete approximation f n of a function f : [0, 1)s → R for digitally shifted digital
nets in terms of the Walsh coefficients of f . We then apply a bound on the Walsh co-
efficients for sufficiently smooth integrands to obtain a quality measure called Walsh
figure of merit for the root mean square error, which satisfies a Koksma–Hlawka
type inequality on the root mean square error. Through two types of experiments, we
confirm that our quality measure is of use for finding digital nets which show good
convergence behavior of the root mean square error for smooth integrands.

Keywords Randomized quasi-Monte Carlo · Digital shift · Digital net · Walsh


functions · Walsh figure of merit

T. Goda
Graduate School of Engineering, The University of Tokyo,
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
e-mail: goda@frcer.t.u-tokyo.ac.jp
R. Ohori
Fujitsu Laboratories Ltd., 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki,
Kanagawa 211-8588, Japan
e-mail: ohori.ryuichi@jp.fujitsu.com
K. Suzuki (B) · T. Yoshiki
School of Mathematics and Statistics, The University of New South Wales,
Sydney, NSW 2052, Australia
e-mail: kosuke.suzuki1@unsw.edu.au
K. Suzuki · T. Yoshiki
Graduate School of Mathematical Sciences, The University of Tokyo,
3-8-1 Komaba, Meguro-ku, Tokyo 153-8914, Japan
e-mail: takehito.yoshiki1@unsw.edu.au


1 Introduction

Quasi-Monte Carlo (QMC) integration is one of the well-known methods for


high-dimensional numerical integration [5, 11]. Let P be a point set in the
s-dimensional unit cube [0, 1)s with finite cardinality |P|, and f : [0, 1)s → R a
Riemann integrable function. The QMC integration by P gives an approximation of I(f) := ∫_{[0,1)^s} f(x) dx by the average I_P(f) := |P|^{−1} Σ_{x∈P} f(x).
Let Zb = Z/bZ be the residue class ring modulo b, which is identified with the set {0, ..., b − 1}, and Z_b^{s×n} the set of s × n matrices over Zb for a positive integer n. The set Z_b^{s×n} is an additive group with respect to the operation +, the usual summation of matrices over Zb. As QMC point sets, we consider digital nets defined as follows.

Definition 1 Let m, n be positive integers. Let 0 ≤ k ≤ b^m − 1 be an integer with b-adic expansion k = Σ_{i=1}^m κi b^{i−1}. Let Ci ∈ Z_b^{n×m}. For 1 ≤ i ≤ s and 1 ≤ j ≤ n, define y_{i,j,k} ∈ Zb by (y_{i,1,k}, ..., y_{i,n,k})^⊤ = Ci · (κ1, ..., κm)^⊤. Then we define

x_{i,k} = y_{i,1,k}/b + y_{i,2,k}/b² + ··· + y_{i,n,k}/b^n ∈ [0, 1)

for 1 ≤ i ≤ s. In this way we obtain the k-th point x_k = (x_{1,k}, ..., x_{s,k}). We call the set P := {x_0, ..., x_{b^m−1}} (P is considered as a multiset) a digital net over Zb with precision n generated by C1, ..., Cs, or simply a digital net.
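A minimal sketch of Definition 1 in code: it enumerates the b^m points of a digital net from given generating matrices. The random choice of generating matrices below is only for illustration.

```python
import numpy as np

def digital_net(C, b):
    """C has shape (s, n, m) with entries in {0, ..., b-1}; returns a (b**m, s) array."""
    s, n, m = C.shape
    pts = np.empty((b ** m, s))
    scale = float(b) ** (-np.arange(1, n + 1))                   # b^{-1}, ..., b^{-n}
    for k in range(b ** m):
        kappa = np.array([(k // b ** i) % b for i in range(m)])  # b-adic digits of k
        for i in range(s):
            y = C[i].dot(kappa) % b                              # (y_{i,1,k}, ..., y_{i,n,k})
            pts[k, i] = y.dot(scale)                             # x_{i,k}
    return pts

rng = np.random.default_rng(1)
b, s, n, m = 2, 4, 10, 8
C = rng.integers(0, b, size=(s, n, m))
P = digital_net(C, b)
print(P.shape)        # (256, 4)
```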

Recently, the discretization f_n of a function f: [0, 1)^s → ℝ has been introduced to analyze QMC integration in the framework of digital computation [9]. We define the n-digit discretization f_n: Z_b^{s×n} → ℝ by

f_n(X) := (1/Vol(I_n(X))) ∫_{I_n(X)} f(x) dx,

for X = (x_{i,j}) ∈ Z_b^{s×n}. Here I_n(X) := Π_{i=1}^s [Σ_{j=1}^n x_{i,j} b^{−j}, Σ_{j=1}^n x_{i,j} b^{−j} + b^{−n}). We denote the true integral of f_n by I(f_n) := b^{−sn} Σ_{X∈Z_b^{s×n}} f_n(X), which indeed equals I(f). Define a function ψ: Z_b^{s×n} → [0, 1)^s by ψ(X) := (Σ_{j=1}^n x_{i,j} · b^{−j})_{i=1}^s for X = (x_{i,j}) ∈ Z_b^{s×n}, where x_{i,j} is considered to be an integer and the sum is taken in ℝ. Then it is easy to check that for any digital net P there exists a subgroup P ⊂ Z_b^{s×n} such that P = ψ(P). Thus, in the discretized setting, our main concern is the case that P ⊂ Z_b^{s×n} is a subgroup. By abuse of terminology, a subgroup of Z_b^{s×n} is also called a digital net in this paper.
In [9], Matsumoto, Saito and Matoba  treat the QMC integration of the n-th dis-
crete approximation I P ( f n ) := |P|−1 X ∈P f n (X ) for b = 2. They consider the dis-
cretized integration error Err( f n ; P) := I P ( f n ) − I ( f n ) instead of the usual integra-
tion error Err( f ; ψ(P)) := Iψ(P) ( f ) − I ( f ). The difference between them, which
is equal to Iψ(P) ( f ) − I P ( f n ), is called the discretization error and bounded by
sup X ∈Zbs×n , x∈In (X ) | f (x) − f n (X )|. If f is continuous with Lipschitz constant K ,

then the discretization error is bounded by K sb−n , which is negligibly small in

practice (say n = 30) [9, Lemma 2.1]. Hence, in this case, we have Err( f n ; P) ≈
Err( f ; ψ(P)), which is a part of their setting we adopt.
Assume that f : [0, 1)s → R is a function whose mixed partial derivatives up to
order n in each variable are continuous and P ⊂ Zs×n b is a subgroup. Matsumoto
et al. [9] proved the Koksma–Hlawka type inequality for Err( f n ; P);

|Err( f n ; P)| ≤ Cb,s,n || f ||n × WAFOM(P), (1)

where Cb,s,n is a constant independent of f and P and WAFOM(P) is the Walsh


figure of merit, a quantity which depends only on P and can be computed in O(sn|P|)
steps. || f ||n is the norm of f defined as in [4] (see also Sect. 4). More recently, this
result has been generalized by Suzuki [13] for digital nets over a finite abelian group.
WAFOM was suggested as a criterion for the quality of digital nets in [9]. The first
advantage of WAFOM is that the inequality (1) implies that if WAFOM(P) is small,
Err( f n ; P) can also be small. The second is that WAFOM is efficiently computable.
It means that we can find P with small WAFOM(P) by computer search. Numerical
experiments showed that by stochastic optimization we can find P with WAFOM(P)
small enough, and that such P performs well for a financial problem [9]. Moreover,
the existence of a low-WAFOM digital net P of size N has been proved in [10, 13]
such that WAFOM(P) ≤ N^{−C(log N)/s+D} for positive constants C and D when
(log N )/s is large enough. Thus, a low-WAFOM digital net is asymptotically su-
perior to well-known low-discrepancy point sets for sufficiently smooth integrands.
In this paper, as a continuation of [9, 13], we discuss randomized QMC integration
using digitally shifted digital nets for the n-digit discretization f n . A digitally shifted
digital net P + σ ⊂ Zs×n b is defined as P + σ = {B + σ | B ∈ P} for a subgroup
P ⊂ Zs×n b and σ ∈ Zs×n
b . Here σ is chosen uniformly and randomly. Randomized
QMC integration by P + σ of the n-digit discretization f n gives the approximation
I P+σ ( f n ) of I ( f n ). By adding a random element σ, it becomes possible to obtain
some statistical estimate on the integration error. Such an estimate is not available
for deterministic digital nets.
We note that randomized QMC integration using digitally shifted digital nets has
already been studied in previous works, see for instance [1, 7] among many others,
where a digital shift σ is chosen from [0, 1)s and the QMC integration using P ⊕ σ
is considered to give the approximation of I ( f ). Here ⊕ denotes digitwise addition
modulo b applied componentwise. It is known that the estimator IP⊕σ ( f ) is an
unbiased estimator of I ( f ), so that the mean square QMC error for a function f
with respect to σ ∈ [0, 1)s equals the variance of the estimator.
In the n-digit discretized setting which we consider in this paper, it is also pos-
sible to show that the estimator I P+σ ( f n ) is an unbiased estimator of I ( f n ), so that
the mean square QMC error for a function f n with respect to σ ∈ Zs×n b equals the
variance of the estimator, see Proposition 2. For our case, where the discretization
error is negligible, we also have Var σ∈[0,1)s [Iψ(P)⊕σ ( f )] ≈ Var σ∈Zbs×n [Iψ(P+σ) ( f )] ≈
Var σ∈Zbs×n [I P+σ ( f n )].
The variance Var σ∈Zbs×n [Iψ(P+σ) ( f )] is for practical computation where each
real number in [0, 1) is represented as a finite-digit binary fraction. The estima-

tor Iψ(P+σ) ( f ) of I ( f ) has so small a bias that the variance Var σ∈Zbs×n [Iψ(P+σ) ( f )] is
a good approximation of the mean square error Eσ∈Zbs×n [(Iψ(P+σ) ( f ) − I ( f ))2 ].
From the above justifications of the n-digit discretization for digitally shifted
point sets, we focus on analyzing the variance Var σ∈Zbs×n [I P+σ ( f n )] of the estimator
I P+σ ( f n ). As the main result of this paper, in Sect. 4 below, we give a Koksma–
Hlawka type inequality to bound the variance:

√( Var_{σ∈Z_b^{s×n}}[I_{P+σ}(f_n)] ) ≤ C_{b,s,n} ||f||_n W(P; μ), (2)

where C_{b,s,n} and ||f||_n are the same as in (1), μ denotes the Dick weight defined later in Definition 3, and W(P; μ) is a quantity which depends only on P and can
be computed in O(sn|P|) steps. Thus, similarly to WAFOM(P), W (P; μ) can be a
useful measure for the quality of digital nets.
The remainder of this paper is organized as follows. We give some preliminaries
in Sect. 2. In Sect. 3, we consider the randomized QMC integration over Zs×n b . For
a function F : Zs×n
b → R, a subgroup P ⊂ Zs×n b and an element σ ∈ Zs×n b , we first
prove the unbiasedness of the estimator I P+σ (F) as mentioned above, and then that
the variance Var σ∈Zbs×n [I P+σ (F)] can be written in terms of the discrete Fourier coef-
ficients of F, see Theorem 2. In Sect. 4, we apply a bound on the Walsh coefficients
for sufficiently smooth functions to the variance Var σ∈Zbs×n [I P+σ ( f n )], and obtain a
quality measure W (P; μ) which satisfies a Koksma–Hlawka type inequality on the
root mean square error. By using the MacWilliams-type identity given in [13], we
give a computable formula for W (P; μ) in Sect. 5. Finally, in Sect. 6, we conduct
two types of experiments to show that our new quality measure is of use for finding
digital nets which show good convergence behavior of the root mean square error
for smooth integrands.

2 Preliminaries

Throughout this paper, we use the following notation. Let ℕ be the set of positive integers and ℕ₀ := ℕ ∪ {0}. For a set S, we denote by |S| the cardinality of S. For z ∈ ℂ, we denote by z̄ the complex conjugate of z. Let ωb = exp(2π√−1/b).
In the following, we recall the notion of the discrete Fourier transform and see
the correspondence of discrete Fourier coefficients to Walsh coefficients.
For g, h ∈ Zb, we define the pairing • as g • h := ωb^{hg}. We also define the pairing on Z_b^{s×n} as A • B := Π_{1≤i≤s, 1≤j≤n} a_{ij} • b_{ij} for A = (a_{ij}) and B = (b_{ij}) in Z_b^{s×n} with 1 ≤ i ≤ s, 1 ≤ j ≤ n. We note the following properties used in this paper:

\overline{A • B} = (A • B)^{−1} = (−A) • B and A • (B + C) = (A • B)(A • C).

We now define the discrete Fourier transform.



Definition 2 Let f: Z_b^{s×n} → ℂ. The discrete Fourier transform of f, denoted by f̂: Z_b^{s×n} → ℂ, is defined by f̂(A) = b^{−sn} Σ_{B∈Z_b^{s×n}} f(B)(A • B) for A ∈ Z_b^{s×n}. Each value f̂(A) is called a discrete Fourier coefficient.
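For small parameters the discrete Fourier coefficients can be computed by brute force. The sketch below is only an illustration of Definition 2 (the example values of s, n, b are arbitrary); the printed check is consistent with Lemma 1 below.

```python
import numpy as np
from itertools import product

def pairing(A, B, b):
    """A . B = omega_b^{sum a_ij b_ij} for matrices A, B over Z_b."""
    return np.exp(2j * np.pi * np.sum(A * B) / b)

def dft_coefficient(f, A, s, n, b):
    """fhat(A) = b^{-s n} sum_B f(B) (A . B), summing over all of Z_b^{s x n}."""
    total = 0.0 + 0.0j
    for entries in product(range(b), repeat=s * n):
        B = np.array(entries).reshape(s, n)
        total += f(B) * pairing(A, B, b)
    return total / b ** (s * n)

# Example: f(B) = 1 for all B has fhat(A) = 1 if A = 0 and 0 otherwise.
s, n, b = 2, 2, 2
print(abs(dft_coefficient(lambda B: 1.0, np.zeros((s, n), dtype=int), s, n, b)))  # 1.0
print(abs(dft_coefficient(lambda B: 1.0, np.eye(2, dtype=int), s, n, b)))         # ~0.0
```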

We assume that P ⊂ Z_b^{s×n} is a digital net. We define the dual net of P as P^⊥ := {A ∈ Z_b^{s×n} | A • B = 1 for all B ∈ P}. Several important properties of the discrete Fourier transform are summarized below (for a proof, see [13] for example).

Lemma 1 We have

Σ_{A∈Z_b^{s×n}} A • B = b^{sn} if B = 0, and 0 if B ≠ 0.

Theorem 1 (Poisson summation formula) Let f: Z_b^{s×n} → ℂ be a function and f̂: Z_b^{s×n} → ℂ its discrete Fourier transform. Then we have

(1/|P|) Σ_{B∈P} f(B) = Σ_{A∈P^⊥} f̂(A).

Walsh functions and Walsh coefficients are widely used to analyze QMC integration using digital nets, and are defined as follows. Let f: [0, 1)^s → ℝ and k = (k1, ..., ks) ∈ ℕ₀^s. We define the k-th Walsh function wal_k by

wal_k(x) := Π_{i=1}^s ωb^{Σ_{j≥1} β_{i,j} κ_{i,j}},

where for 1 ≤ i ≤ s, we write the b-adic expansion of ki as ki = Σ_{j≥1} κ_{i,j} b^{j−1} and that of xi as xi = Σ_{j≥1} β_{i,j} b^{−j}, where for each i, infinitely many of the digits β_{i,j} are different from b − 1. By using Walsh functions, we define the k-th Walsh coefficient F(f)(k):

F(f)(k) := ∫_{[0,1)^s} f(x) · \overline{wal_k(x)} dx.

We refer to [5, Appendix A] for general information on Walsh functions. We denote the kth Walsh coefficient of f by F(f)(k), while it is denoted by f̂(k) in [5, Appendix A]. The relationship between Walsh coefficients and discrete Fourier coefficients is stated in the following proposition (for a proof, see [13, Lemma 2]). Let A = (a_{i,j}) ∈ Z_b^{s×n}. We define the function φ: Z_b^{s×n} → ℕ₀^s by φ(A) := (Σ_{j=1}^n a_{i,j} · b^{j−1})_{i=1}^s for A = (a_{i,j}) ∈ Z_b^{s×n}. Note that each element of φ(A) is strictly less than b^n.

Proposition 1 Let A = (a_{i,j}) ∈ Z_b^{s×n} and assume that f: [0, 1)^s → ℝ is integrable. Then we have F(f)(φ(A)) = f̂_n(A).

3 Mean Square Error with Respect to Digital Shifts

Let P ⊂ Z_b^{s×n} be a subset and F: Z_b^{s×n} → ℝ a real-valued function. Then QMC integration by P is an approximation I_P(F) := |P|^{−1} Σ_{B∈P} F(B) of the actual average value I(F) := b^{−sn} Σ_{B∈Z_b^{s×n}} F(B) of F over Z_b^{s×n}. For σ ∈ Z_b^{s×n}, we define the digitally shifted point set P + σ by P + σ = {B + σ | B ∈ P}. We consider the mean and the variance of the estimator I_{P+σ}(F) for digitally shifted point sets of P ⊂ Z_b^{s×n}.
First we consider the average E_{σ∈Z_b^{s×n}}[I_{P+σ}(F)]. We have

b^{−sn} Σ_{σ∈Z_b^{s×n}} I_{P+σ}(F) = b^{−sn} Σ_{σ∈Z_b^{s×n}} (1/|P|) Σ_{B∈P} F(B + σ) = (1/|P|) Σ_{B∈P} b^{−sn} Σ_{σ∈Z_b^{s×n}} F(B + σ) = (1/|P|) Σ_{B∈P} I(F) = I(F),

and thus we have the following proposition, showing that randomized QMC integra-
tion using a digitally shifted point set P + σ gives an unbiased estimator I P+σ (F)
of I (F).

Proposition 2 For an arbitrary subset P ⊂ Z_b^{s×n}, we have

E_{σ∈Z_b^{s×n}}[I_{P+σ}(F)] = I(F).

It follows from this proposition that the mean square QMC error equals the vari-
ance Var σ∈Zbs×n [I P+σ (F)], namely we have

Eσ∈Zbs×n [(I P+σ (F) − I (F))2 ] = Var σ∈Zbs×n [I P+σ (F)].

Hereafter we assume that P ⊂ Z_b^{s×n} is a subgroup of Z_b^{s×n}.

Lemma 2 Let P ⊂ Z_b^{s×n} be a subgroup. Then we have

I_{P+σ}(F) = Σ_{A∈P^⊥} (A • σ)^{−1} F̂(A).

Proof Let Fσ(B) := F(B + σ). Then for A ∈ Z_b^{s×n}, we can calculate F̂σ(A) as

F̂σ(A) = b^{−sn} Σ_{B∈Z_b^{s×n}} Fσ(B)(A • B)
       = (A • (−σ)) b^{−sn} Σ_{B∈Z_b^{s×n}} F(B + σ)(A • (B + σ))
       = (A • σ)^{−1} F̂(A),


where we use the definition of F̂(A) in the last equality. Thus by Theorem 1 we have

I_{P+σ}(F) = (1/|P|) Σ_{B∈P} Fσ(B) = Σ_{A∈P^⊥} F̂σ(A) = Σ_{A∈P^⊥} (A • σ)^{−1} F̂(A),

which proves the result. □

By Proposition 2 and Lemma 2, we have

Var_{σ∈Z_b^{s×n}}[I_{P+σ}(F)] := b^{−sn} Σ_{σ∈Z_b^{s×n}} (I_{P+σ}(F) − E_{σ∈Z_b^{s×n}}[I_{P+σ}(F)])²
= b^{−sn} Σ_{σ∈Z_b^{s×n}} |I_{P+σ}(F) − I(F)|²
= b^{−sn} Σ_{σ∈Z_b^{s×n}} | Σ_{A∈P^⊥\{0}} (A • σ)^{−1} F̂(A) |²
= b^{−sn} Σ_{σ∈Z_b^{s×n}} Σ_{A∈P^⊥\{0}} Σ_{A′∈P^⊥\{0}} (−A • σ) F̂(A) \overline{(−A′ • σ) F̂(A′)}
= b^{−sn} Σ_{A∈P^⊥\{0}} Σ_{A′∈P^⊥\{0}} F̂(A) \overline{F̂(A′)} Σ_{σ∈Z_b^{s×n}} ((A′ − A) • σ)
= Σ_{A∈P^⊥\{0}} |F̂(A)|²,

where the last equality follows from Lemma 1. Now we proved:

Theorem 2 Let P ⊂ Z_b^{s×n} be a subgroup. Then we have

Var_{σ∈Z_b^{s×n}}[I_{P+σ}(F)] = Σ_{A∈P^⊥\{0}} |F̂(A)|².

In particular, we immediately obtain the following corollary for the most important case.

Corollary 1 Let P ⊂ Z_b^{s×n} be a subgroup, i.e., a digital net over Zb, and f_n be the n-digit discretization of f: [0, 1)^s → ℝ. Then we have

Var_{σ∈Z_b^{s×n}}[I_{P+σ}(f_n)] = Σ_{A∈P^⊥\{0}} |f̂_n(A)|².

Our results obtained in this section can be regarded as the discretized version of
known results [1, 7].

4 WAFOM for the Root Mean Square Error

In the previous section, we obtained that the mean square QMC error is equal to
a certain sum of the squared discrete Fourier coefficients, and thus we would like
to bound the value |f̂_n(A)|. By Proposition 1, it is sufficient to bound the Walsh
coefficients of f , and several types of upper bounds on the Walsh coefficients are
already known. In order to introduce bounds on the Walsh coefficients proved by
Dick [2, 3, 5], we define the Dick weight.
Definition 3 Let A = (a_{i,j}) ∈ Z_b^{s×n}. The Dick weight μ: Z_b^{s×n} → ℕ₀ is defined as

μ(A) := Σ_{1≤i≤s, 1≤j≤n} j × δ(a_{i,j}),

where δ: Zb → {0, 1} is defined as δ(a) = 0 for a = 0 and δ(a) = 1 for a ≠ 0.
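In code, both the Dick weight and the Hamming weight used later are one-liners; the following sketch (with an arbitrary example matrix) mirrors Definition 3.

```python
import numpy as np

def dick_weight(A):
    """mu(A) = sum_{i,j} j * delta(a_{i,j}) for A over Z_b with shape (s, n)."""
    A = np.asarray(A)
    j = np.arange(1, A.shape[1] + 1)                # column index j = 1, ..., n
    return int(np.sum((A != 0) * j))

def hamming_weight(A):
    """h(A) = number of non-zero entries of A."""
    return int(np.count_nonzero(A))

A = np.array([[1, 0, 1], [0, 0, 1]])                # s = 2, n = 3 example
print(dick_weight(A), hamming_weight(A))            # 1+3+3 = 7 and 3
```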


Here we consider functions f whose mixed partial derivatives up to order α ∈ N,
α > 1, in each variable are continuous. In [2, 3], Dick proved upper bounds on
Walsh coefficients for these functions. By letting α = n, we have the following, see
also [4].
Lemma 3 (Dick) There exists a constant C_{b,s,n} depending only on b, s and n such that for any n-smooth function f: [0, 1)^s → ℝ and any A ∈ Z_b^{s×n} it holds that

|f̂_n(A)| ≤ C_{b,s,n} ||f||_n · b^{−μ(A)}, (3)

where ||f||_n denotes the norm of f for a Sobolev space, which is defined as

||f||_n := ( Σ_{u⊆S} Σ_{τ_{S\u} ∈ {0,...,n−1}^{s−|u|}} ∫_{[0,1]^{|u|}} ( ∫_{[0,1]^{s−|u|}} f^{(τ_{S\u}, n_u)}(x) dx_{S\u} )² dx_u )^{1/2},

where we used the following notation: Let S := {1, ..., s}, x = (x1, ..., xs), and for u ⊆ S let x_u = (x_j)_{j∈u}. (τ_{S\u}, n_u) denotes a sequence (ν_j)_j with ν_j = n for j ∈ u and ν_j = τ_j for j ∉ u. Moreover, we write f^{(n1,...,ns)} = ∂^{n1+···+ns} f/∂x1^{n1} ··· ∂xs^{ns}.
Another upper bound on the Walsh coefficients of f has been shown by Yoshiki [14] for b = 2. Applying Proposition 1, we also have the following:

Lemma 4 (Yoshiki) Let f: [0, 1]^s → ℝ and define Ni := |{j = 1, ..., n | a_{i,j} ≠ 0}| and N := (Ni)_{1≤i≤s} ∈ ℕ₀^s for A = (a_{i,j}) ∈ Z₂^{s×n}. If the Nth mixed partial derivative f^{(N)} = ∂^{N1+···+Ns} f/∂x1^{N1} ··· ∂xs^{Ns} of f exists and is continuous, then we have

|f̂_n(A)| ≤ ||f^{(N)}||_∞ · 2^{−(μ(A)+h(A))}, (4)

where h(A) := Σ_{i,j} δ(a_{i,j}) is the Hamming weight and ||·||_∞ the supremum norm.

Generally speaking, we cannot prove an inequality between ||f^{(N)}||_∞ and ||f||_n. But it happens that ||f||_n is much larger than ||f^{(N)}||_∞ since ||f||_n is the summation of sn positive terms for large n. For example, when s = 1 and f = exp(−x), ||f^{(N)}||_∞ = 1 while ||f||_n = ((n + 1)(1 − e^{−1})² + (1 − e^{−2})/2)^{1/2}. In this case, if we take n large enough, ||f^{(N)}||_∞/||f||_n goes to 0. In this way, ||f^{(N)}||_∞ tends to be small compared with ||f||_n.
Similar to [9] and [13], we define a kind of figure of merit corresponding to these
bounds on Walsh coefficients. Since Yoshiki’s bound (4) tends to be tighter than
Dick’s bound (3), we use the figure of merit obtained by Yoshiki’s bound in the
experiment in the last section.
Definition 4 (Walsh figure of merit for the root mean square error) Let s, n be positive integers and P ⊂ Z_b^{s×n} a subgroup. We define two Walsh figures of merit for the root mean square error of P by

W(P; μ) := √( Σ_{A∈P^⊥\{0}} b^{−2μ(A)} ),

W(P; μ + h) := √( Σ_{A∈P^⊥\{0}} b^{−2(μ(A)+h(A))} ).

We have the following main result.


Theorem 3 (Koksma–Hlawka type inequalities for the root mean square error) For an arbitrary subgroup P ⊂ Z_b^{s×n} we have

√( Var_{σ∈Z_b^{s×n}}[I_{P+σ}(f_n)] ) ≤ C_{b,s,n} ||f||_n W(P; μ).

Moreover, if b = 2 then

√( Var_{σ∈Z₂^{s×n}}[I_{P+σ}(f_n)] ) ≤ ( max_{0≤N≤n, N≠0} ||f^{(N)}||_∞ ) W(P; μ + h)

holds where the condition for the maximum is denoted by a multi-index, i.e., the maximum value is taken over N = (N1, ..., Ns) such that 0 ≤ Ni ≤ n for all i and Ni ≠ 0 for some i.
Proof Since the proofs of these inequalities are almost identical, we only show the latter. Apply Lemma 4 to each term in the right-hand side of the result in Corollary 1. For the factor ||f^{(N)}||_∞, note that N depends only on A, that A runs through all non-zero elements of P^⊥, and that Ni ≤ n for all i. Then we have

Var_{σ∈Z₂^{s×n}}[I_{P+σ}(f_n)] ≤ ( max_{0≤N≤n, N≠0} ||f^{(N)}||_∞ )² Σ_{A∈P^⊥\{0}} 2^{−2(μ(A)+h(A))}

and the result follows. 

5 Inversion Formula for W ( P; ν)

For A = (a_{i,j}) ∈ Z_b^{s×n}, we consider a general weight ν: Z_b^{s×n} → ℝ given by

ν(A) = Σ_{1≤i≤s, 1≤j≤n} ν_{i,j} δ(a_{i,j}),

where ν_{i,j} ∈ ℝ for 1 ≤ i ≤ s, 1 ≤ j ≤ n. In this section, we give a practically computable formula for

W(P; ν) := √( Σ_{A∈P^⊥\{0}} b^{−2ν(A)} ).

Note that the Dick weight μ is given by νi, j = j and the Hamming weight h is given
by νi, j = 1. The key to the formula [9, (4.2)] for WAFOM is the discrete Fourier
transform. In order to obtain a formula for W (P; ν), we use a MacWilliams-type
identity [13], which is also based on the discrete Fourier transform.
Let X := {xi, j (l)} be a set of indeterminates for 1 ≤ i ≤ s, 1 ≤ j ≤ n, and l ∈ Zb .
The complete weight enumerator polynomial of P ⊥ , in a standard sense [8, Chap. 5],
is defined by

GW_{P^⊥}(X) := Σ_{A∈P^⊥} Π_{1≤i≤s, 1≤j≤n} x_{i,j}(a_{i,j}).

Similarly, the complete weight enumerator polynomial of P is defined by

GW*_P(X*) := Σ_{B∈P} Π_{1≤i≤s, 1≤j≤n} x*_{i,j}(b_{i,j}),

where B = (b_{i,j})_{1≤i≤s, 1≤j≤n} and X* := {x*_{i,j}(g)} is a set of indeterminates for 1 ≤ i ≤ s, 1 ≤ j ≤ n, and g ∈ Zb. We define Y := {y_{i,j}(g)} for 1 ≤ i ≤ s, 1 ≤ j ≤ n and g ∈ Zb with

y_{i,j}(0) = 1, y_{i,j}(l) = b^{−2ν_{i,j}} (l ≠ 0).

Note that, by substituting Y into X for GW_{P^⊥}(X), we have

GW_{P^⊥}(Y) = W(P; ν)² + 1.

By the MacWilliams-type identity for GW [13, Proposition 2], we have



GW_{P^⊥}(X) = (1/|P|) GW*_P(Z*), (5)

where in the right hand side every x*_{i,j}(g) ∈ X* is substituted by z*_{i,j}(g) ∈ Z*, which is defined by

z*_{i,j}(g) := Σ_{l∈Zb} (l • g) x_{i,j}(l).

By substituting Y into X for (5), we have the following result. Since the result
follows in the same way as in [13, Corollary 2], we omit the proof.

Theorem 4 Let P ⊂ Z_b^{s×n} be a subgroup. Then we have

W(P; ν) = √( −1 + (1/|P|) Σ_{B∈P} Π_{1≤i≤s, 1≤j≤n} (1 + η(b_{i,j}) b^{−2ν_{i,j}}) ),

where η(b_{i,j}) = b − 1 if b_{i,j} = 0 and η(b_{i,j}) = −1 if b_{i,j} ≠ 0.

In particular, we can compute W(P; μ) and W(P; μ + h) as follows.

Corollary 2 Let P ⊂ Z_b^{s×n} be a subgroup. Then we have

W(P; μ) = √( −1 + (1/|P|) Σ_{B∈P} Π_{1≤i≤s, 1≤j≤n} (1 + η(b_{i,j}) b^{−2j}) ),

W(P; μ + h) = √( −1 + (1/|P|) Σ_{B∈P} Π_{1≤i≤s, 1≤j≤n} (1 + η(b_{i,j}) b^{−2(j+1)}) ),

where η(b_{i,j}) = b − 1 if b_{i,j} = 0 and η(b_{i,j}) = −1 if b_{i,j} ≠ 0.
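The formulas of Corollary 2 translate directly into a computation that iterates over the |P| = b^m points of the net, as emphasised below. A minimal sketch (with b = 2 and randomly chosen generating matrices for illustration):

```python
import numpy as np

def W_mu(C, b, plus_h=False):
    """W(P; mu) or W(P; mu+h) for the net generated by C of shape (s, n, m) over Z_b."""
    s, n, m = C.shape
    j = np.arange(1, n + 1, dtype=float)
    nu = j + 1.0 if plus_h else j                       # nu_{i,j} = j (or j + 1)
    acc = 0.0
    for k in range(b ** m):
        kappa = np.array([(k // b ** i) % b for i in range(m)])
        total = 1.0
        for i in range(s):
            bij = C[i].dot(kappa) % b                   # row i of the point B in P
            eta = np.where(bij == 0, b - 1.0, -1.0)     # eta(b_{i,j})
            total *= np.prod(1.0 + eta * b ** (-2.0 * nu))
        acc += total
    return np.sqrt(acc / b ** m - 1.0)

rng = np.random.default_rng(2)
b, s, n, m = 2, 4, 30, 10
C = rng.integers(0, b, size=(s, n, m))
print(W_mu(C, b), W_mu(C, b, plus_h=True))
```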


While computing WAFOM by definition needs an iteration through P ⊥ , Theo-
rem 4 and Corollary 2 give it by iterating over P. For QMC, the size |P| cannot
exceed a reasonable number of computer operations opposed to huge |P ⊥ |, and thus
Theorem 4 and Corollary 2 are useful in many cases.
We use the figure of merit W (P; μ + h) obtained by Yoshiki’s bound (4) in the
experiment of the next section.

6 Numerical Experiments

To show that W works as a useful bound on the root mean square error we conduct
two types of experiments. The first one is to generate many point sets at random, and

to observe the distribution of the criterion W and the standard deviation E . The other
one is to search for low-W point sets and to compare with digital nets consisting of
the first terms of a known low-discrepancy sequence.
In this section we consider only the case b = 2. The dimension of a digital net P
as a subvector space of Zs×n 2 is denoted by m, i.e., |P| = 2m . We set s = 4, 12 and
use the following eight test functions for x = (xi )1≤i≤s :

Polynomial f0(x) = (Σi xi)^6,
Exponential fj(x) = exp(a Σi xi) (a = 2/3 for j = 1 and a = 3/2 for j = 2),
Oscillatory f3(x) = cos(Σi xi),
Gaussian f4(x) = exp(Σi xi²),
Product peak f5(x) = Πi (xi² + 1)^{−1},
Continuous f6(x) = Πi T(xi) where T(x) = min_{k∈ℤ} |3x − 2k|,
Discontinuous f7(x) = Πi C(xi) where C(x) = (−1)^{⌊3x⌋}.
Assuming that the discretization error is negligible, we have that I_{ψ(P+σ)}(f) is a practically unbiased estimator of I(f). Thus we may say that if the standard deviation E(f; P) := √( Var_{σ∈Z₂^{s×n}}[I_{ψ(P+σ)}(f)] ) of the quasi-Monte Carlo integration is small then the root mean square error √( E_{σ∈Z₂^{s×n}}[(I_{ψ(P+σ)}(f) − I(f))²] ) is as small as E(f; P). From the same assumption we also have that E(f; P) is well approximated by √( Var_{σ∈Z₂^{s×n}}[I_{P+σ}(f_n)] ), on which we have a bound in Theorem 3.
In this section we implicitly use the weight μ + h so W (P) denotes W (P; μ + h).
The aim of the experiments is to establish that if W (P) is small then so is E ( f ; P).
For this we compute W by the inversion formula in Corollary 2 and approximate E(f; P) = √( Var_{σ∈Z₂^{s×n}}[I_{ψ(P+σ)}(f)] ) by sampling 2^{10} digital shifts σ ∈ Z₂^{s×n} uni-
formly, randomly and independently of each other. We shall observe both the criterion
W and the variance E in binary logarithm, which is denoted by lg.

6.1 The Distribution of (W , E )

In this experiment we set m = 10, 12 and n = 32, generate point sets P, compute
W (P), approximate E ( f ; P) for test functions f and observe (W , E ). We generate
1000 point sets P by random and uniform choice of generating matrices C1 , . . . , Cs
from the set (Zn×m2 )s .
For each (s, m, f ) we calculate the correlation coefficient between W (P) and
E ( f ; P) log-scaled, obtaining the result as in Table 1. For typical distributions of
(W (P), E ( f ; P)) for smooth, continuous nondifferentiable and discontinuous func-
tions we refer the readers to Figs. 1, 2, 3 and 4. We observe that there are very
high correlations (the correlation coefficient is larger than 0.85) between W (P) and
E ( f ; P) if f is smooth. Though f 6 is a nondifferentiable function we have moderate
correlation coefficients around 0.35. However, for the discontinuous function f 7 it
seems we can do almost nothing for the root mean square error through W (P).

Table 1 The correlation coefficient between lg W (P) and lg E ( f ; P)


s 4 4 12 12
m 10 12 10 12
f0 0.9861 0.9920 0.9821 0.9776
f1 0.9907 0.9901 0.9842 0.9866
f2 0.9897 0.9887 0.9821 0.9851
f3 0.9794 0.9818 0.8900 0.8916
f4 0.9723 0.9599 0.9975 0.9951
f5 0.9421 0.9144 0.9912 0.9839
f6 0.3976 0.3218 0.4077 0.3258
f7 0.0220 0.0102 0.0208 0.0171

Fig. 1 s = 4 and m = 10. The integrand is the oscillatory function f3(x) = cos(Σi xi). [Scatter plot of lg E against lg W omitted.]

Fig. 2 s = 12 and m = 12. The integrand is the product peak function f5(x) = Πi (xi² + 1)^{−1}. [Scatter plot of lg E against lg W omitted.]

Fig. 3 s = 12 and m = 10. The integrand is the continuous nondifferentiable function f6(x) = Πi T(xi) where T(x) = min_{k∈ℤ} |3x − 2k|. [Scatter plot of lg E against lg W omitted.]

Fig. 4 s = 4 and m = 12. The integrand is the discontinuous function f7(x) = Πi C(xi) where C(x) = (−1)^{⌊3x⌋}. [Scatter plot of lg E against lg W omitted.]

6.2 Comparison to Known Low-Discrepancy Sequence

In this experiment we set n = 30. For 8 ≤ m < 16, let P be a low-W point set and P_NX the digital net consisting of the first 2^m points of an s-dimensional Niederreiter–Xing sequence from [12]. Here we search for low-W point sets by simulated annealing as follows:

1. Let s, m, n ∈ N be fixed.
2. For τ = 4, . . . , 12, do the following:
   a. Choose generating matrices C_1^{(i)}, . . . , C_s^{(i)} randomly and uniformly from the set (Z_2^{n×m})^s and denote by P^{(i)} the digital net generated by C_1^{(i)}, . . . , C_s^{(i)}, for i = 1, . . . , 2^τ.
   b. Find C_1^{(i)}, . . . , C_s^{(i)} such that W(P^{(i)}) ≤ W(P^{(j)}) for all j = 1, . . . , 2^τ. Let C_1 = C_1^{(i)}, . . . , C_s = C_s^{(i)} and P = P^{(i)}.
   c. For l = 1, . . . , 2^τ, do the following:
      i. Choose matrices A = (a_{ij}) and B = (b_j) randomly and uniformly from the sets Z_2^{s×n} and Z_2^m, respectively.
      ii. Construct generating matrices D_1 = (d_{ij}^{(1)}), . . . , D_s = (d_{ij}^{(s)}) ∈ Z_2^{n×m} by

             d_{ij}^{(h)} = c_{ij}^{(h)} + b_j a_{hi}

         for 1 ≤ i ≤ n, 1 ≤ j ≤ m and 1 ≤ h ≤ s, where we write C_1 = (c_{ij}^{(1)}), . . . , C_s = (c_{ij}^{(s)}). Denote by Q the digital net generated by D_1, . . . , D_s.
      iii. Replace C_1, . . . , C_s and P by D_1, . . . , D_s and Q with probability

             min{ (W(P)/W(Q))^{1/T_l}, 1 }.

3. Output the P which gives the minimum value of W during the process of Step 2.

In the above algorithm, T_l is called the temperature and is given in the form Tα^l for 0 < α < 1, where T and α are determined such that T_1 = 1 and T_{2^τ} = 0.01 for a given τ. Note that the point sets obtained by this algorithm are not extensible in m, i.e., one cannot increase the size of P while retaining the existing points. For a search for extensible point sets which are good in a W-like (but different in weight and exponent) criterion, see [6]. A rough illustrative sketch of this search is given below.
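As a rough illustration of the search above, the following Python sketch mimics Steps 1–3 for base b = 2. The figure of merit W is deliberately left as a cheap placeholder (the actual criterion W(P) of this paper is not implemented here), and all function and variable names are ours, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def W(C):
    # Placeholder figure of merit for the digital net generated by the s
    # matrices stacked in C; a real implementation would evaluate the
    # W(P) criterion of the paper.  Kept strictly positive and cheap.
    return 1.0 + float(np.sum(C)) / C.size

def search_low_W(s, m, n, tau, T1=1.0, T_end=0.01):
    """Simulated-annealing search over generating matrices C_1,...,C_s in Z_2^{n x m}."""
    alpha = (T_end / T1) ** (1.0 / (2 ** tau - 1))   # so that T_1 = T1 and T_{2^tau} = T_end
    T = T1 / alpha
    # Steps 2a-2b: best of 2^tau random starts
    C = min((rng.integers(0, 2, size=(s, n, m)) for _ in range(2 ** tau)), key=W)
    wC = W(C)
    best, w_best = C, wC
    for l in range(1, 2 ** tau + 1):                 # Step 2c
        T_l = T * alpha ** l
        A = rng.integers(0, 2, size=(s, n))          # A = (a_{hi}) in Z_2^{s x n}
        b = rng.integers(0, 2, size=m)               # B = (b_j)  in Z_2^m
        D = (C + A[:, :, None] * b[None, None, :]) % 2   # d_{ij}^{(h)} = c_{ij}^{(h)} + b_j a_{hi}
        wD = W(D)
        if rng.random() < min((wC / wD) ** (1.0 / T_l), 1.0):   # Step 2c-iii
            C, wC = D, wD
        if wC < w_best:                              # keep the overall minimizer (Step 3)
            best, w_best = C, wC
    return best, w_best

matrices, w = search_low_W(s=4, m=10, n=30, tau=4)
print("smallest placeholder W found:", w)
```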
Varying m, we report lg W(P_NX), lg W(P) and lg E(f; P_NX), lg E(f; P) for each test function in Table 2. As shown in Figs. 5 and 6, the W-value of the point sets P optimized in W by our algorithm is far better than that of P_NX; this is not surprising. The W-values of P_NX exhibit plateaus and sudden drops. Figures 7 and 8 show the root mean square errors for two test functions: for the smooth function f5 (Fig. 7) we clearly observe higher order convergence, whereas for the discontinuous function f7 (Fig. 8) only lower order convergence is achieved by both methods.

6.3 Discussion

The first experiment shows that W works as a useful bound on E for some of the functions tested above. The second experiment shows that point sets with low W-values are easy enough to find and perform better for smooth test functions, while for non-smooth or discontinuous functions they perform no better than the Niederreiter–Xing sequence.

Table 2 Comparison between Niederreiter–Xing sequences (P_NX) and low-W point sets (P) in lg W and lg E

                   s    m=8      9        10       11       12       13       14       15
lg W(P_NX)         4    −10.31   −12.40   −12.90   −12.98   −15.74   −15.77   −15.77   −23.20
lg W(P)            4    −12.59   −14.39   −16.39   −17.91   −19.50   −21.82   −23.67   −26.00
lg E(f0; P_NX)     4    −0.19    −2.17    −3.22    −3.45    −5.93    −5.98    −5.94    −12.75
lg E(f0; P)        4    −2.14    −3.99    −6.03    −7.51    −9.35    −11.95   −13.63   −16.40
lg E(f1; P_NX)     4    −9.81    −11.99   −12.07   −12.12   −15.01   −15.00   −14.98   −23.26
lg E(f1; P)        4    −12.74   −14.72   −16.54   −18.62   −20.58   −23.09   −24.82   −27.47
lg E(f2; P_NX)     4    −3.76    −5.60    −6.67    −6.93    −9.42    −9.50    −9.46    −15.92
lg E(f2; P)        4    −5.25    −6.87    −8.82    −10.20   −11.55   −13.51   −15.34   −17.45
lg E(f3; P_NX)     4    −10.93   −13.62   −14.14   −14.47   −16.84   −16.84   −16.86   −24.03
lg E(f3; P)        4    −13.13   −14.91   −17.00   −18.57   −20.17   −22.40   −24.28   −27.04
lg E(f4; P_NX)     4    −12.44   −14.57   −15.00   −15.14   −17.88   −17.97   −17.95   −25.30
lg E(f4; P)        4    −13.16   −15.69   −17.26   −18.05   −19.75   −21.43   −24.32   −24.46
lg E(f5; P_NX)     4    −13.24   −15.39   −15.57   −15.67   −18.48   −18.55   −18.55   −26.47
lg E(f5; P)        4    −13.81   −16.24   −17.89   −18.30   −20.66   −21.79   −25.12   −24.66
lg E(f6; P_NX)     4    −9.77    −11.23   −11.54   −12.13   −12.20   −14.57   −15.92   −17.60
lg E(f6; P)        4    −8.93    −10.31   −11.70   −9.55    −11.88   −14.85   −15.56   −17.19
lg E(f7; P_NX)     4    −4.32    −4.96    −5.70    −6.17    −6.47    −6.65    −8.06    −9.22
lg E(f7; P)        4    −4.53    −4.12    −5.25    −5.68    −6.21    −7.40    −7.05    −8.84
lg W(P_NX)         12   −5.18    −6.07    −6.68    −6.82    −6.92    −6.98    −11.52   −12.01
lg W(P)            12   −6.16    −6.93    −7.89    −8.67    −9.66    −10.73   −11.67   −12.64
lg E(f0; P_NX)     12   9.95     8.89     8.00     7.84     7.80     7.76     1.39     0.09
lg E(f0; P)        12   8.09     7.19     6.05     4.98     4.15     2.46     1.49     −0.31
lg E(f1; P_NX)     12   −0.57    −1.60    −2.43    −2.60    −2.64    −2.69    −8.27    −8.99
lg E(f1; P)        12   −2.20    −3.05    −4.12    −5.07    −5.97    −7.35    −8.36    −9.61
lg E(f2; P_NX)     12   11.02    10.20    9.77     9.54     9.40     9.25     6.00     5.45
lg E(f2; P)        12   10.58    9.91     9.07     8.53     7.53     6.80     5.84     5.18
lg E(f3; P_NX)     12   −6.14    −7.34    −8.32    −8.64    −8.97    −9.27    −12.74   −13.51
lg E(f3; P)        12   −7.18    −8.01    −9.01    −10.16   −10.78   −11.86   −12.90   −13.76
lg E(f4; P_NX)     12   −10.56   −11.52   −12.07   −12.27   −12.39   −12.41   −16.99   −17.47
lg E(f4; P)        12   −11.54   −12.36   −13.28   −14.09   −14.82   −16.17   −17.10   −18.20
lg E(f5; P_NX)     12   −10.69   −11.70   −12.33   −12.62   −12.70   −12.71   −18.09   −18.69
lg E(f5; P)        12   −12.00   −12.86   −13.97   −14.86   −15.17   −16.99   −17.90   −19.34
lg E(f6; P_NX)     12   −13.64   −14.31   −14.93   −15.65   −16.11   −16.62   −17.10   −17.54
lg E(f6; P)        12   −13.87   −14.16   −14.83   −15.48   −15.97   −16.45   −17.30   −18.09
lg E(f7; P_NX)     12   −4.06    −4.45    −4.93    −5.50    −6.01    −6.48    −7.02    −7.48
lg E(f7; P)        12   −4.00    −4.51    −5.00    −5.52    −5.96    −6.50    −6.95    −7.52

Fig. 5 W values for s = 4. [lg W of the Niederreiter–Xing sequence (+) and of the low-W digital nets (×), plotted against m = 8, . . . , 15.]

Fig. 6 W values for s = 12. [lg W of the Niederreiter–Xing sequence (+) and of the low-W digital nets (×), plotted against m = 8, . . . , 15.]

Fig. 7 s = 4. The integrand is the product peak function f5(x) = Π_i (x_i² + 1)⁻¹. [lg E of the Niederreiter–Xing sequence (+) and of the low-W digital nets (×), plotted against m = 8, . . . , 15.]

Fig. 8 s = 12. The integrand is the discontinuous function f7(x) = Π_i C(x_i) where C(x) = (−1)^⌊3x⌋. [lg E of the Niederreiter–Xing sequence (+) and of the low-W digital nets (×), plotted against m = 8, . . . , 15.]

Acknowledgments The authors would like to thank Prof. Makoto Matsumoto for helpful discus-
sions and comments. The work of T.G. was supported by Grant-in-Aid for JSPS Fellows No.24-4020.
The works of R.O., K.S. and T.Y. were supported by the Program for Leading Graduate Schools,
MEXT, Japan. The work of K.S. was partially supported by Grant-in-Aid for JSPS Fellows Grant
number 15J05380.

References

1. Baldeaux, J., Dick, J.: QMC rules of arbitrary high order: reproducing kernel Hilbert space
approach. Constr. Approx. 30(3), 495–527 (2009)
2. Dick, J.: Walsh spaces containing smooth functions and quasi-Monte Carlo rules of arbitrary
high order. SIAM J. Numer. Anal. 46(3), 1519–1553 (2008)
3. Dick, J.: The decay of the Walsh coefficients of smooth functions. Bulletin of the Australian
Mathematical Society 80(3), 430–453 (2009)
4. Dick, J.: On quasi-Monte Carlo rules achieving higher order convergence. In: Monte Carlo and
Quasi-Monte Carlo Methods 2008, pp. 73–96. Springer, Berlin (2009)
5. Dick, J., Pillichshammer, F.: Digital Nets and Sequences: Discrepancy Theory and quasi-Monte
Carlo integration. Cambridge University Press, Cambridge (2010)
6. Harase, S., Ohori, R.: A search for extensible low-WAFOM point sets (2013)
7. L’Ecuyer, P., Lemieux, C.: Recent advances in randomized quasi-Monte Carlo methods.
Modeling uncertainty. International Series in Operations Research and Management Science,
vol. 46, pp. 419–474. Kluwer Academic Publishers, Boston, MA (2002)
8. MacWilliams, F.J., Sloane, N.J.A.: The theory of error-correcting codes. I. North-Holland
Mathematical Library, North-Holland Publishing Co., Amsterdam (1977)
9. Matsumoto, M., Saito, M., Matoba, K.: A computable figure of merit for quasi-Monte Carlo
point sets. Math. Comput. 83(287), 1233–1250 (2014)
10. Matsumoto, M., Yoshiki, T.: Existence of higher order convergent quasi-Monte Carlo rules via
Walsh figure of merit. In: Monte Carlo and Quasi-Monte Carlo Methods 2012, pp. 569–579.
Springer, Heidelberg (2013)
11. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods, CBMS-NSF
Regional Conference Series in Applied Mathematics, vol. 63. Society for Industrial and Applied
Mathematics (SIAM), Philadelphia, PA (1992)

12. Nuyens, D.: The magic point shop of QMC point generators and generating vectors, http://
people.cs.kuleuven.be/~dirk.nuyens/qmc-generators/
13. Suzuki, K.: WAFOM on abelian groups for quasi-Monte Carlo point sets. Hiroshima Math. J.
45(3), 341–364 (2015)
14. Yoshiki, T.: Bounds on Walsh coefficients by dyadic difference and a new Koksma-Hlawka
type inequality for Quasi-Monte Carlo integration (2015)
Uncertainty and Robustness in Weather
Derivative Models

Ahmet Göncü, Yaning Liu, Giray Ökten and M. Yousuff Hussaini

Abstract Pricing of weather derivatives often requires a model for the underlying
temperature process that can characterize the dynamic behavior of daily average
temperatures. The comparison of different stochastic models with a different number
of model parameters is not an easy task, especially in the absence of a liquid weather
derivatives market. In this study, we consider four widely used temperature models
in pricing temperature-based weather derivatives. The price estimates obtained from
these four models are relatively similar. However, there are large variations in their
estimates with respect to changes in model parameters. To choose the most robust
model, i.e., the model with smaller sensitivity with respect to errors or variation in
model parameters, the global sensitivity analysis of Sobol’ is employed. An empirical
investigation of the robustness of models is given using temperature data.

Keywords Weather derivatives · Sobol’ sensitivity analysis · Model robustness

1 Introduction

Weather related risks exist in many economic sectors, especially in agriculture,


tourism, energy, and construction. Hanley [10] reports that about one-seventh of
the industrialized economy is sensitive to weather. The weather related risks can be

A. Göncü
Xian Jiaotong Liverpool University, Suzhou 215123, China
e-mail: Ahmet.Goncu@xjtlu.edu.cn
Y. Liu
Hydrogeology Department, Earth Sciences Division,
Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
e-mail: yaningliu@lbl.gov
G. Ökten (B) · M.Y. Hussaini
Florida State University, Tallahassee, FL 32306, USA
e-mail: okten@math.fsu.edu
M.Y. Hussaini
e-mail: yousuff@fsu.edu

© Springer International Publishing Switzerland 2016 351


R. Cools and D. Nuyens (eds.), Monte Carlo and Quasi-Monte Carlo Methods,
Springer Proceedings in Mathematics & Statistics 163,
DOI 10.1007/978-3-319-33507-0_17

hedged via weather derivatives, a relatively new form of financial instrument whose payoffs are contingent on weather events or indices.
The market for weather derivatives was established in the USA in 1997 following
the deregulation of the energy market. The Weather Risk Management Association
(WRMA) reported that as of 2011 the weather derivatives market has grown to 12
billion US dollars. The Chicago Mercantile Exchange (CME) trades standardized
weather derivatives with the highest trading volume in temperature-based weather
derivatives; this type of derivatives is the focus of this study.
There are different approaches to price weather derivatives, such as, historical
burn analysis, index modeling, and stochastic modeling of daily average tempera-
tures ([13]). In the stochastic modelling approach, a mean-reverting process such as
the Ornstein–Uhlenbeck process is often used for modelling the evolution of daily
average temperatures at a particular measurement station. Amongst others, some
examples of studies that follow this approach are given by Alaton et. al. [1], Benth
and Benth [3], Brody et al. [4], Cao and Wei [6], Platen and West [20], Huang et al.
[12], and Göncü [8]. Some studies suggest the superiority of daily temperature mod-
elling over the index modelling approach (Oetomo and Stevenson [18], Schiller et al.
[25]). Another important modeling approach uses time series to model daily average
temperatures. An example is the model of Campbell and Diebold [5], which forecasts
daily average temperatures using an autoregressive conditional heteroscedasticity
(ARCH) model.
Within the class of dynamic models of daily temperatures, four models that are
highly cited in the literature (see, for example, the survey by Schiller et. al. [25]) and
widely used in the weather derivatives industry are given by Alaton et al. [1], Benth
and Benth [3], Brody et al. [4] and Campbell and Diebold [5]. In the study by Göncü
[9] these four models are compared in terms of their forecasting power of the futures
prices for different locations. Different models come with different parameters that
need to be estimated from the historical data, and although we may know how accu-
rately a certain parameter can be estimated, the question of the impact of the parame-
ter estimation error on the overall model has not been investigated in the literature. In
this paper, we propose a framework based on global sensitivity analysis to assess the
robustness of a model with respect to the uncertainties in its parameters. We apply our
methodology to the four different temperature models given in [1], [3–5].
The paper is organized as follows. In Sect. 2, we describe the dataset utilized,
introduce the temperature models investigated, and present estimation results of
each model. Section 3 discusses the global sensitivity analysis employed and Sect. 4
presents numerical results and conclusions.

2 Modelling of Daily Average Temperatures

In the weather derivatives market, daily temperatures are defined as the average of
the minimum and maximum temperatures observed during a given day. The most
common type of weather derivative contract is based on the heating and cooling
degree days index, defined as follows.

Definition 1 (Heating/Cooling Degree Days) Let T_i denote the temperature for day i. We define heating degree-days (HDD) and cooling degree-days (CDD) for a given day i and reference temperature T_ref as HDD_i = max(T_ref − T_i, 0) and CDD_i = max(T_i − T_ref, 0), respectively. The industry convention for the reference temperature T_ref is 18 °C (or 65 Fahrenheit), which we adopt in this paper.
The numbers of HDDs and CDDs accumulated for a contract period of n days are given by H_n = Σ_{i=1}^{n} HDD_i and C_n = Σ_{i=1}^{n} CDD_i, respectively.

Definition 2 (Weather Options) Call and put options are defined with respect to the
accumulated HDDs or CDDs during a contract period of n days and a predetermined
strike level K . The payoff of the call and put options written on the accumulated
HDDs (or CDDs) during a contract period of n days is given as max(Hn − K , 0) and
max(K − Hn , 0), respectively.
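For concreteness, a short Python sketch of these definitions follows, using the 65 °F reference temperature adopted above; the temperature series in the example is hypothetical.

```python
import numpy as np

T_REF = 65.0  # reference temperature in Fahrenheit (the 18 C industry convention)

def hdd_cdd(daily_temps):
    """Daily heating and cooling degree days for a vector of daily average temperatures."""
    t = np.asarray(daily_temps, dtype=float)
    hdd = np.maximum(T_REF - t, 0.0)
    cdd = np.maximum(t - T_REF, 0.0)
    return hdd, cdd

def hdd_call_payoff(daily_temps, strike):
    """Payoff max(H_n - K, 0) of a call written on accumulated HDDs over the contract period."""
    hdd, _ = hdd_cdd(daily_temps)
    return max(hdd.sum() - strike, 0.0)

# hypothetical January temperatures (Fahrenheit) for a 31-day contract
temps = 35.0 + 5.0 * np.random.default_rng(1).standard_normal(31)
print("accumulated HDDs:", hdd_cdd(temps)[0].sum())
print("HDD call payoff (K = 800):", hdd_call_payoff(temps, strike=800.0))
```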

In the standard approach to price financial derivatives, one uses the risk neutral
dynamics of the underlying variables, which are often tradable, and from no-arbitrage
arguments an arbitrage free price is obtained. On the other hand, the underlying
for weather derivatives is a temperature index, which is not tradable, and thus no-
arbitrage arguments do not apply. However, one can still find a risk neutral measure
(which will be model dependent) from the market price of weather derivatives (e.g., see [11]).
In this section, we describe temperature models given by Alaton et. al. [1], Benth
and Benth [3], Brody et. al. [4], and Campbell and Diebold [5]. In the first three
models, the long-term dynamics of daily average temperatures are modeled deter-
ministically. The long-term mean temperature at time t is given by

    T_t^m = A + Bt + C sin(ωt) + D cos(ωt),    (1)

where ω = 2π/365. The sine and cosine functions capture the seasonality of daily
temperatures, whereas the linear term captures the trend in temperatures which might
be due to global warming or urbanization effects. The parameters A, B, C, D can
be estimated from the data by a linear regression. An improvement in the fit can
be obtained by increasing the number of sine and cosine functions in the above
representation. However, in our dataset, we did not observe any significant improve-
ments by adding more terms. Our dataset consists of daily average temperatures1
and HDD/CDD monthly futures prices for the measurement station at New York La
Guardia International Airport. Daily average temperature data for the period between
01/01/1997 and 01/21/2012 is used to estimate the parameters of each model con-
sidered. In Fig. 1, the historical temperatures for New York are plotted.

1 Dailyaverage temperatures are measured by the Earth Satellite Corporation and our dataset is
provided by the Chicago Mercantile Exchange (CME).
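Because Eq. (1) is linear in A, B, C, D, the fit amounts to ordinary least squares on the regressors 1, t, sin(ωt), cos(ωt). The NumPy sketch below illustrates this on a synthetic series that merely stands in for the actual New York data (the synthetic "truth" echoes the estimates reported later in Table 1); the function name fit_seasonal_mean is ours.

```python
import numpy as np

omega = 2.0 * np.pi / 365.0

def fit_seasonal_mean(temps):
    """Least-squares estimates of A, B, C, D in T_t^m = A + B t + C sin(wt) + D cos(wt)."""
    t = np.arange(1, len(temps) + 1, dtype=float)
    X = np.column_stack([np.ones_like(t), t, np.sin(omega * t), np.cos(omega * t)])
    coef, *_ = np.linalg.lstsq(X, np.asarray(temps, dtype=float), rcond=None)
    return coef  # (A, B, C, D)

# synthetic daily averages standing in for the 1997-2012 New York series
rng = np.random.default_rng(0)
t = np.arange(1, 15 * 365 + 1, dtype=float)
temps = 55.8 + 3e-4 * t - 8.8 * np.sin(omega * t) - 20.0 * np.cos(omega * t) \
        + 5.0 * rng.standard_normal(t.size)
A, B, C, D = fit_seasonal_mean(temps)
print(f"A={A:.2f}, B={B:.2e}, C={C:.2f}, D={D:.2f}")
```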

Fig. 1 Daily average temperatures (Fahrenheit) at New York La Guardia Airport: 1997–2012

2.1 The Model by Alaton, Djehiche, and Stillberger (2002)

In the model by Alaton et al. [1], the daily temperatures are modeled by a mean-reverting Ornstein–Uhlenbeck process

    dT_t = [ dT_t^m/dt + a(T_t^m − T_t) ] dt + σ_t dW_t,    (2)

where T_t is the temperature at time t, a is the mean reversion parameter, σ_t is a piecewise constant volatility function, W_t is a P-Brownian motion (P being the physical probability measure), and T_t^m is the long-term mean temperature given by Eq. (1).
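For illustration, a minimal Euler-type simulation of (2) with daily steps, the seasonal mean of Eq. (1), a monthly piecewise-constant volatility, and a plain Monte Carlo estimate of an HDD call payoff under the physical measure might look as follows. The parameter values echo Tables 1 and 2 below and the day index 5475 echoes the contract priced in Sect. 4; the discretization, calendar handling, and pricing are deliberately simplified relative to [1], so this is only a sketch.

```python
import numpy as np

omega = 2.0 * np.pi / 365.0
A, B, C, D = 55.7952, 3.0e-4, -8.7965, -20.0178     # seasonal mean (Table 1)
a = 0.3491                                           # mean reversion (Table 1)
sigma_month = [6.36, 5.84, 5.82, 5.52, 4.69, 4.53,
               3.61, 3.53, 4.03, 4.67, 5.00, 5.96]   # monthly volatility (Table 2)

def mean_temp(t):
    return A + B * t + C * np.sin(omega * t) + D * np.cos(omega * t)

def simulate_paths(t0, n_days, month_of_day, n_paths, rng):
    """Euler scheme (daily steps) for dT = [dT^m/dt + a(T^m - T)]dt + sigma dW."""
    T = np.full(n_paths, mean_temp(t0))
    paths = np.empty((n_days, n_paths))
    for k in range(n_days):
        t = t0 + k
        sig = sigma_month[month_of_day[k]]
        drift = (mean_temp(t + 1) - mean_temp(t)) + a * (mean_temp(t) - T)
        T = T + drift + sig * rng.standard_normal(n_paths)
        paths[k] = T
    return paths

rng = np.random.default_rng(42)
t0, n_days, K = 5475, 31, 800.0                      # a 31-day January contract
paths = simulate_paths(t0, n_days, [0] * n_days, 10_000, rng)
hdd = np.maximum(65.0 - paths, 0.0).sum(axis=0)      # accumulated HDDs per path
print("HDD call price estimate:", np.maximum(hdd - K, 0.0).mean())
```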
The volatility of daily temperatures σt is assumed to be constant for each month
of the year. We will not discuss the estimation of model parameters since they are
explained in [1]. We estimate the piecewise constant volatility function for our dataset
using the regression and quadratic variation methods. Figure 2 plots these results,

Fig. 2 Empirical versus estimated volatility [monthly volatility estimated by quadratic variation and by the regression method, the empirical volatility for each day of the year, and a Fourier series fitted to the empirical volatility, plotted against the day of the year]



Table 1 Estimated parameters for the model by Alaton, Djehiche, and Stillberger (standard errors
of estimators in parenthesis)
A B C D a
55.7952 3.0 × 10−4 −8.7965 −20.0178 0.3491
(0.1849) (5.6 × 10−5 ) (0.1307) (0.1307) (0.01)

Table 2 Estimated monthly volatility (σt ) for each month of the year, for the model by Alaton,
Djehiche, and Stillberger (standard errors of estimators in parenthesis)
Jan Feb Mar Apr May Jun
Volatility 6.36 (0.76) 5.84 (0.64) 5.82 (0.64) 5.52 (0.57) 4.69 (0.41) 4.53 (0.39)
Jul Aug Sep Oct Nov Dec
Volatility 3.61 (0.25) 3.53 (0.23) 4.03 (0.30) 4.67 (0.41) 5.00 (0.47) 5.96 (0.67)

together with the empirical daily volatility and its Fourier series fit. Tables 1 and 2
display the estimated model parameters (including the parameters for Eq. (1)) for
our dataset with the standard errors given in parenthesis.

2.2 The Model by Benth and Benth (2007)

Benth and Benth [3] use the same mean reverting Ornstein–Uhlenbeck process used
by Alaton et. al. [1], but model the volatility function differently:

    σ_t² = c_0 + Σ_{i=1}^{I_1} c_i sin(ωit) + Σ_{j=1}^{J_1} d_j cos(ωjt),    (3)

where ω = 2π/365. Following [3], we set I1 = J1 = 4 in the above equation in


our numerical results. Volatility estimates obtained from Eq. (3) are given in Fig. 2
(the curve labeled as “Fourier series fitted to empirical volatility”). The long-term
average temperatures are modeled in the same way as in Alaton et. al. [1] by Eq. (1),
hence the estimated parameters, A, B, C, D, are the same as given in Table 1. The
estimates for the rest of the parameters of the model are displayed in Table 3.

Table 3 Estimated parameters for the model by Benth and Benth (standard errors of estimators in
parenthesis)
c0 c1 c2 c3 c4
24.0422 (0.5450) 6.9825 (0.7708) −0.1127 (0.7708) 0.3783 (0.7708) −1.2162 (0.7708)
d1 d2 d3 d4
9.3381 (0.7708) −0.1068 (0.7708) 0.4847 (0.7708) 1.1303 (0.7708)

Fig. 3 Estimation of the Hurst exponent using the estimator in [4] [log σ(T) plotted against log(T); the fitted slope is −0.36, i.e. H = 0.64, compared with the H = 0.50 reference]

2.3 The Model by Brody, Syroka, and Zervos

Brody et. al. [4] generalizes the Ornstein–Uhlenbeck stochastic process used in the
previous models by replacing the Brownian motion in the stochastic differential
equation (2) with a fractional Brownian motion, giving the following equation:
 
    dT_t = [ dT_t^m/dt + a(T_t^m − T_t) ] dt + σ_t dW_t^H.    (4)

W_t^H is a fractional Brownian motion defined on a probability space (Ω, F, P^H). See


[4] for the properties of fractional Brownian motion. The motivation for the use of
fractional Brownian motion is to capture the possible long memory effects in the data.
The “Hurst exponent”, H , characterizes the persistence in the fractional Brownian
motion process. We estimated the Hurst exponent using the statistic described in
Brody et al. [4], which measures the variability of temperature with respect to time.
In the absence of long-memory effects, we would expect to observe a decay in
the standard deviation proportional to σ(T) ∼ T^{−0.5}, whereas an exponent between 0 and −0.5 suggests the existence of temporal correlation between daily average temperatures. As can be seen in Fig. 3, the decay of the standard deviation follows σ(T) ∼ T^{−0.36}, which supports the existence of such temporal correlation, and thus
long-memory effect. The deterministic part of the temperature dynamics, i.e. the
trend and seasonal terms, are modeled as given in Eq. (1).
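A generic aggregated-variance sketch of this idea is given below: block means over windows of length T have a standard deviation scaling like T^{H−1}, so the slope of log σ(T) against log T estimates H − 1 (about −0.5 for uncorrelated data, −0.36 for our data). This is only a simplified stand-in for the statistic of [4], shown here on a synthetic residual series.

```python
import numpy as np

def hurst_aggregated_variance(x, window_sizes):
    """Estimate H from the scaling sigma(T) ~ T^(H-1) of block-mean standard deviations."""
    x = np.asarray(x, dtype=float)
    sigmas = []
    for T in window_sizes:
        n_blocks = len(x) // T
        blocks = x[:n_blocks * T].reshape(n_blocks, T).mean(axis=1)
        sigmas.append(blocks.std(ddof=1))
    slope, _ = np.polyfit(np.log(window_sizes), np.log(sigmas), 1)
    return 1.0 + slope, slope

# synthetic de-seasonalized residuals (iid here, so H should come out close to 0.5)
rng = np.random.default_rng(0)
residuals = rng.standard_normal(15 * 365)
H, slope = hurst_aggregated_variance(residuals, window_sizes=[2, 4, 8, 16, 32, 64])
print(f"slope = {slope:.2f}, H = {H:.2f}")
```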

2.4 The Model by Campbell and Diebold

The model proposed by Campbell and Diebold [5] follows a non-structural ARCH
type time series modeling approach. Different from [1] and [3], autoregressive lags
of daily average temperatures are also included as explanatory variables to the model.
The time series model proposed in [5] is given by

    T_t = β_1 + β_2 t + Σ_{l=1}^{L} [δ_l sin(ωlt) + θ_l cos(ωlt)] + Σ_{p=1}^{P} ρ_p T_{t−p} + σ_t ε_t,    (5)

    σ_t² = α_0 + Σ_{q=1}^{Q} [γ_q sin(ωqt) + κ_q cos(ωqt)] + Σ_{r=1}^{R} α_r ε²_{t−r},    (6)

where εt ∼ N (0, 1) iid. Based on a similar preliminary data analysis as described


in [5], we set L = 1, P = 10, Q = 1, R = 9. First we regress temperature data on
the trend, seasonal term and autoregressive lags. We follow Engle’s [7] two-step
estimation approach, which is also used in [5], to remove the heteroscedasticity and
seasonality in the data. The estimated parameters are given in Tables 4 and 5.
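A sketch of simulating (5)–(6) forward in time, given already-estimated coefficients, is shown below. It does not reproduce the two-step estimation; the initial temperature lags are placeholders, and the coefficient values are those reported in Tables 4 and 5.

```python
import numpy as np

omega = 2.0 * np.pi / 365.0
# estimated coefficients, as in Tables 4 and 5
beta = (15.2851, 0.0001)
delta1, theta1 = -1.0969, -5.9156
rho = np.array([0.8820, -0.3184, 0.1193, -0.0149, 0.0160,
                0.0185, -0.0019, 0.0066, 0.0207, -0.0017])
alpha0, gamma1, kappa1 = 16.4401, 2.2933, 7.3571
alpha = np.array([0.0294, 0.0366, 0.0110, 0.0465, 0.0505,
                  0.0114, 0.0151, 0.0611, 0.0043])

def simulate(T_init, t0, n_days, rng):
    """Simulate (5)-(6): AR(10) conditional mean with a seasonal ARCH(9) variance."""
    T_hist = list(T_init)            # last P observed temperatures, oldest first (placeholders)
    eps2_hist = [0.0] * len(alpha)   # last R squared innovations
    out = []
    for k in range(n_days):
        t = t0 + k
        var = (alpha0 + gamma1 * np.sin(omega * t) + kappa1 * np.cos(omega * t)
               + float(np.dot(alpha, eps2_hist[::-1])))        # Eq. (6) as written
        eps = rng.standard_normal()
        T = (beta[0] + beta[1] * t + delta1 * np.sin(omega * t) + theta1 * np.cos(omega * t)
             + float(np.dot(rho, T_hist[::-1])) + np.sqrt(max(var, 0.0)) * eps)  # Eq. (5)
        out.append(T)
        T_hist = T_hist[1:] + [T]
        eps2_hist = eps2_hist[1:] + [eps ** 2]
    return np.array(out)

rng = np.random.default_rng(0)
path = simulate(T_init=[10.0] * 10, t0=5475, n_days=31, rng=rng)
print(path[:5])
```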
The four models we have discussed share the common characteristic that sea-
sonal temperature patterns are modeled via sine and cosine functions and thus have
the same expected value for future long-term mean temperatures. Furthermore, the
models by Alaton et al. [1], Benth and Benth [3], and Campbell and Diebold [5]

Table 4 Estimated parameters for the model by Campbell and Diebold (standard errors of estima-
tors in parenthesis)
β1 β2 δ1 θ1 ρ1 ρ2 ρ3
15.2851 0.0001 −1.0969 −5.9156 0.8820 −0.3184 0.1193
(0.8534) (3.6×10−5 ) (0.1424) (0.3247) (0.0137) (0.0184) (0.0187)
ρ4 ρ5 ρ6 ρ7 ρ8 ρ9 ρ10
−0.0149 0.0160 0.0185 −0.0019 0.0066 0.0207 −0.0017
(0.0189) (0.0192) (0.0189) (0.0186) (0.0189) (0.0183) (0.0134)

Table 5 Estimated parameters for the model by Campbell and Diebold, cont’d. (standard errors of
estimators in parenthesis)
α0 γ1 κ1 α1 α2 α3
16.4401 2.2933 7.3571 0.0294 0.0366 0.0110
(0.9091) (0.6893) (0.7528) (0.0133) (0.0133) (0.0133)
α4 α5 α6 α7 α8 α9
0.0465 0.0505 0.0114 0.0151 0.0611 0.0043
(0.0133) (0.0133) (0.0133) (0.0133) (0.0133) (0.0133)

assume a Gaussian noise term after removing the effects of trend, seasonality, and
heteroscedasticity in the daily temperatures, whereas the model by Brody et al. [4] differs from the other models in capturing long-memory effects through fractional Brownian motion. For option pricing of short-term weather contracts it is pos-
sible to assume a simpler form of heteroscedasticity in the volatility which would
be sufficient to price monthly weather options (see [9]). The model by Campbell
and Diebold [5] might be prone to pricing errors due to the large number of ARCH
coefficients to be estimated, whereas the model by Brody et. al. [4] suffers from the
difficulty of estimating the Hurst exponent and long-term sensitivity with respect to
this parameter. These issues are investigated in the next section.

3 Global Sensitivity Analysis

Global sensitivity analysis (SA) measures parameter importance by considering vari-


ations of all input parameters at the same time. As a result, interactions among
different inputs can be detected. Among all global SA methods, Sobol’ sensitiv-
ity measures [16, 23, 26, 27] that utilize the analysis of variance (ANOVA) of the
model output are the most widely used. Variance-based global sensitivity analysis
has the advantage that type II errors (failure to identify a significant parameter) can
be avoided with a higher probability (Saltelli [24]). Other advantages include model
independence, full exploration of input parameter ranges, as well as capabilities to
capture parameter interactions and tackle groups of parameters (Saltelli [24]). Other
techniques (e.g. EFAST (Saltelli [22]) and DGSM (Sobol’ [28], Kucherenko [14]))
have been developed to approximate Sobol’s sensitivity measures with less computa-
tional cost. However, they can give inaccurate sensitivity indices in certain situations
(e.g. Sobol’ [28]) and computational efficiency is not a focus in this study.
There is an extensive literature on applications of Sobol’ sensitivity measures, for
example, Kucherenko et. al. [15] use Sobol’ sensitivity measures to identify model
effective dimensions, which are closely related to the effectiveness of applying quasi-
Monte Carlo sequences; Rohmer et. al. [21] perform Sobol’ global sensitivity analysis
in computationally intensive landslide modelling with the help of Gaussian-process
surrogate modeling; Alexanderian et. al. [2] compute Sobol’ sensitivity measures for
an ocean general circulation model by constructing a polynomial chaos expansion of
the model outputs; and Liu et. al. [17] utilize Sobol’ sensitivity measures to identify
the important input parameters in a wildland surface fire spread model to develop
efficient simulations.
Let u ⊆ {1, . . . , d} be an index set and x^u denote the |u|-dimensional vector with elements x_j for j ∈ u. The ANOVA decomposition writes a square integrable function f(x), defined on the d-dimensional unit hypercube I^d = [0, 1]^d, as f(x) = Σ_{u⊆{1,...,d}} f_u(x^u), where f_u(x^u) is a function that only depends on the variables in u. Each component function f_u(x^u) is associated with a variance, called a partial variance, defined as σ_u² = ∫_{[0,1]^{|u|}} f_u(x^u)² dx^u. The variance of the function f(x), called the total variance, is σ² = ∫_{[0,1]^d} f(x)² dx − f_∅². The total variance can be written as the sum of all partial variances: σ² = Σ_{u⊆{1,...,d}} σ_u². Based on the ANOVA decomposition, Sobol' [26] introduced two types of global sensitivity indices (GSI) for an index set u: S_u = (1/σ²) Σ_{v⊆u} σ_v² and S̄_u = (1/σ²) Σ_{v∩u≠∅} σ_v². The sensitivity index S_u sums all the normalized partial variances whose index sets are subsets of u, and S̄_u sums all those whose index sets have non-empty intersections with u. Clearly, S_u ≤ S̄_u, and hence they can be used as the lower and upper bounds for the sensitivity measures on the parameters x^u. The GSI with respect to singletons, S_{i}, for instance, represents the impact on the output of parameter x_i alone, and S̄_{i} considers the individual impact as well as the cooperative impact of x_i and the other parameters. In this sense, S_{i} and S̄_{i} are called main effects and total effects, respectively. In the general case, S_u and S̄_u are also called lower and upper Sobol' indices. The main effects S_{i} can be used to prioritize the model parameters in terms of their importance, while the total effects S̄_{i} can be used as a tool to reduce model complexity. If S̄_{i} is relatively small, then the corresponding parameter can be frozen at its nominal value.
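To make the main and total effects concrete, the following generic Monte Carlo estimator (the pick-freeze/Jansen form) computes S_{i} and S̄_{i} for a toy model with independent uniform inputs. It is only an illustration of the quantities defined above; it is not the computation used in this paper, which works with groups of parameters and a randomly permuted random-start Halton sequence.

```python
import numpy as np

def sobol_indices(model, d, n, rng):
    """Monte Carlo estimates of main effects S_{i} and total effects S-bar_{i}
    for a model with d independent U(0,1) inputs (Saltelli/Jansen estimators)."""
    A = rng.random((n, d))
    B = rng.random((n, d))
    fA, fB = model(A), model(B)
    var = np.var(np.concatenate([fA, fB]), ddof=1)
    main, total = np.empty(d), np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]          # replace input i by B's column, keep the rest from A
        fABi = model(ABi)
        main[i] = np.mean(fB * (fABi - fA)) / var          # lower index S_{i}
        total[i] = 0.5 * np.mean((fA - fABi) ** 2) / var   # upper index S-bar_{i}
    return main, total

# toy model: g(x) = x1 + 2*x2 + x1*x3, inputs uniform on [0,1]
model = lambda x: x[:, 0] + 2.0 * x[:, 1] + x[:, 0] * x[:, 2]
S, S_bar = sobol_indices(model, d=3, n=100_000, rng=np.random.default_rng(7))
print("main effects :", np.round(S, 3))
print("total effects:", np.round(S_bar, 3))
```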

4 Numerical Results

In our global sensitivity analysis, the model output is the estimate of the HDD call
option price that is calculated by averaging the payoff in Definition 2. The model
inputs are the temperature model parameters, which are estimated from the historical
temperatures. In our numerical results, the pricing of the weather derivatives is done
under the physical probability measure. We estimate the price of an HDD call option
on December 31, 2012² with strike price 800 HDDs. The contract period is January
1-31, 2012. We will refer to the four weather derivatives models considered in Sect. 2
by simply using the name of the first author. The parameters of the weather deriv-
atives models can be classified into six groups: trend, seasonality, volatility, mean
reversion, Hurst parameters, and ARCH parameters. Trend, seasonality and volatil-
ity are common to Alaton’s, Benth’s and Brody’s models. Brody’s model assumes a
fractional Brownian motion and thus involves the additional Hurst parameter. Camp-
bell’s model considers an AR(P) process for the temperatures and an ARCH(R) for
the volatility process. Least squares regression is used to obtain the mean of each
estimate and its standard error. The detailed grouping is listed in Table 6. We apply
global sensitivity analysis to these groups of parameters. Table 7 shows the Sobol’
indices S̄ with respect to groups of parameters for all models. The Sobol’ indices
are computed using a sample size of 20,000, and the price of the derivative is com-
puted using a randomly permuted random-start Halton sequence ([19]) of sample
size 10,000.

2 Our historical data starts from 1/1/1997, which corresponds to t = 1. The date we price the option,
December 31, 2012 corresponds to t = 5475.

Table 6 Parameter grouping for daily average temperature models


                  Alaton                   Benth                          Brody                    Campbell
Trend             A, B                     A, B                           A, B                     β1, β2
Seasonality       C, D                     C, D                           C, D                     γ1, κ1
Volatility        σi, i = 1, . . . , 12    c0, ci, di, i = 1, . . . , 4   σi, i = 1, . . . , 12    α0, α1, . . . , α9
Mean reversion    a                        a                              a                        ρ1, . . . , ρ10
Hurst parameter   N/A                      N/A                            H                        N/A

Table 7 Upper Sobol’ indices for groups of parameters


Alaton Benth Brody Campbell
Trend 0.8240 0.8794 0.6317 0.2073
Seasonality 0.1053 0.1148 0.0823 0.0278
Volatility 0.0736 0.0019 0.2666 0.00001
Mean reversion 0.0040 0.0027 0.0118 N/A
Hurst parameter N/A N/A 0.0134 N/A
ARCH N/A N/A N/A 0.8313
parameters (ρ’s)
The sample sizes used for sensitivity analysis and for calculating the prices are 20,000 and 10,000,
respectively. M = 31, t0 = 5475, and regression standard errors are chosen as standard deviations

For all models, the sum of the upper Sobol’ indices is approximately 1, indicating
that the secondary interactions between groups of parameters are small. From Table 7,
we see that the largest sensitivity in the models by Alaton, Benth, and Brody is
due to the trend parameters. The sensitivities of the mean reversion parameters are
negligible. For Campbell’s model, the ARCH parameters are the most sensitive,
while the seasonality and volatility parameters are the most insensitive.
We first compare Alaton’s, Benth’s and Brody’s models due to their similarities.
Note that the trend and seasonality parameters are the same for the three models and
the characterization of volatility by Benth is different from Alaton. Despite the fact
that Brody’s model considers volatility the same way as Alaton’s model, the use of
fractional Brownian motion changes the behavior of the underlying stochastic process
and thus changes the volatility part as well. We keep the uncertainties of all groups
of parameters, excluding volatility, fixed at their regression standard errors. We vary
the uncertainty of the volatility group by increasing the coefficient of variation (CoV,
defined as the ratio of standard deviation to the mean) for each parameter in the
volatility group from 1 to 35 %. For example, when the CoV is 1 % for the first-
month volatility parameter σ1 in Alaton’s model, then σ1 is modeled as a normal
distribution with mean 6.36, and standard deviation 0.01 × 6.36. (The estimated
mean for σ1 is 6.36, as shown in Table 2.)
Fig. 4 Model robustness using Sobol' indices. a Sobol's upper index for the volatility parameters against the coefficient of variation of volatility; b Sobol's lower index for the complement of volatility parameters against the coefficient of variation of volatility [curves for the Alaton, Benth, and Brody models]

Figure 4a shows that as the CoV of volatility increases, Sobol's non-normalized upper index σ²S̄_V, which represents the sum of all the partial variances of groups

of parameters that include a volatility parameter, increases monotonically for all


three models. However, for each CoV of volatility, Benth’s model has the smallest
sensitivity while Brody’s model has the largest. In addition, the sensitivity of Benth’s
model increases at a much smaller rate than that of Brody’s model. On the other
hand, Fig. 4b shows that the value of Sobol's non-normalized lower index σ²S_{−V} is relatively constant for all models (here, the notation −V stands for the complement of the set V). Since σ²S̄_V + σ²S_{−V} = σ², this result suggests that the faster rate
of increase in the total variance of Brody’s model is explained by the faster rate of
increase in the sensitivity of the volatility parameter.
These observations suggest the following qualitative approach to compare two
models in terms of their robustness. Consider models, A and B, with the same
output. Let x be an input parameter (or a group of parameters), for the models. This
input parameter is estimated from data, and has uncertainty due to the estimation
process. Assume the uncertainty in x leads to its modeling by a normal distribution,
with mean equaling the estimated value, and a standard deviation characterizing the
estimation error. If the growth of the (non-normalized) upper Sobol’ index for x in
model B, as a function of the estimation error of the input, is at a higher rate than
that of model A, but yet the rate of increase of the (non-normalized) lower Sobol’
indices for the complimentary parameters are similar for both models, then model A
will be deemed more robust than model B with respect to x. For example, assume
that the total variances of the two models are equal, i.e., σ 2 S̄x + σ 2 S {−x} = σ 2 , is
the same for each model, however, the rate of growth in model B for the term σ 2 S̄x
is higher than that of model A. Then model A would be preferable since it is less
“sensitive” to estimation error in the input parameter x. With this understanding, and
the observations made in the previous paragraph, we conclude that Benth’s model is
more robust than Alaton’s and Brody’s models.

Fig. 5 a Sobol's upper index for the trend parameters against the coefficient of variation of trend; b Sobol's lower index for the complement of trend parameters against the coefficient of variation of trend [Benth and Campbell models]

Fig. 6 a Sobol's upper index for the seasonality parameters against the coefficient of variation of seasonality; b Sobol's lower index for the complement of seasonality parameters against the coefficient of variation of seasonality [Benth and Campbell models]

Next we compare Benth’s model with Campbell’s time series model. Figure 5a
shows that as the CoV of the trend parameters increases, the non-normalized upper
Sobol’ index σ 2 S̄T increases monotonically in a similar pattern for both models.
However, when we examine the lower Sobol’ index σ 2 S̄{−T } plot in Fig. 5b, we
observe that Campbell’s model has significantly larger sensitivity for components
other than the trend. This also means that the total variance of the model output for
Campbell’s model is much larger. Figure 6 plots the sensitivity for the seasonality
parameters. The upper Sobol’ index increases at a similar rate for both Benth’s
and Campbell’s models. However, the lower Sobol’ index for Campbell’s model
Uncertainty and Robustness in Weather Derivative Models 363

(a) (b)
7
Benth 25000 Benth
Campbell Campbell
6
20000
5

}
}

σ 2 S {− V
4 15000
σ 2 S̄ { V

3
10000
2
5000
1

0 0
0 0.05 0.1 0.15 0.2 0.25 0.3 0 0.05 0.1 0.15 0.2 0.25 0.3
CoV of volatility CoV of volatility
V

Fig. 7 a Sobol’s upper index for the volatility parameters against the coefficient of variation of
volatility; b Sobol’s lower index for the compliment of volatility parameters against the coefficient
of variation of volatility

Fig. 8 Total variance σ 2 140000


against CoV of trend, trend
seasonality
seasonality and volatility 120000 volatility
parameters in Campbell’s
model 100000

80000
σ2

60000

40000

20000

0
0 0.05 0.1 0.15 0.2 0.25 0.3
CoV

is very large relative to Benth’s model. In Fig. 7, we conduct a similar analysis


for the volatility parameters, and observe a similar behavior. Finally, we plot the
total variance σ 2 of the output for Campbell’s model as a function of the CoV in
trend, seasonality, and volatility coefficients in Fig. 8. We observe that the model is
most sensitive to the increasing uncertainty in the trend parameters. This observation
makes sense if we note that any initial uncertainty in the trend coefficients applies
throughout time affecting the whole trajectory of temperatures during the contract
period. We also observe that the total variance does not change much with respect to
increasing CoV in volatility.
A summary of the many observations we have discussed, in a more general context,
will be useful. When one sets out to compare the accuracy of different models for the
same problem, a reasonable first step is to compare their total variances, which we

Table 8 Mean and total variance for all models


Alaton Benth Brody Campbell
Mean 106.69 104.86 118.95 140.70
Variance 108.04 104.16 114.61 20337.33
The sample sizes used for sensitivity analysis and for calculating the prices are 20,000 and 10,000,
respectively. M = 31, t0 = 5475, and regression standard errors are chosen as standard deviations

did in Table 8 for the four weather derivative models considered in the paper. From
this table, one can deduce that the models by Alaton, Benth and Brody perform equally
well, and the model by Campbell is unsatisfactory. However, the information in this
table does not reveal how the variances will change as the models are recalibrated
with different input, resulting in different standard errors for the input parameters.
In other words, the total variance information does not explain how robust a model
is with respect to its input parameter(s). Our qualitative analysis computes Sobol’
sensitivity indices for each model, with inputs (or input groups) that match across
models, and compares the growth of the sensitivity indices as the estimation error in
the input parameters (CoV) increases. Based on our empirical results, we conclude
Benth’s model is the most “robust”; the model that has the smallest rate of increase in
the sensitivity indices as a function of input parameter error. In future work, we will
investigate developing a quantitative approach to define the robustness of a model.

References

1. Alaton, P., Djehiche, B., Stillberger, D.: On modelling and pricing weather derivatives. Appl.
Math. Financ. 9, 1–20 (2002)
2. Alexanderian, A., Winokur, J., Sraj, I., Srinivasan, A., Iskandarani, M., Thacker, W.C., Knio,
O.M.: Global sensitivity analysis in an ocean general circulation model: a sparse spectral
projection approach. Comput. Geosci. 16, 757–778 (2012)
3. Benth, F.E., Benth, J.S.: The volatility of temperature and pricing of weather derivatives. Quant.
Financ. 7, 553–561 (2007)
4. Brody, D.C., Syroka, J., Zervos, M.: Dynamical pricing of weather derivatives. Quant. Financ.
3, 189–198 (2002)
5. Campbell, S., Diebold, F.X.: Weather forecasting for weather derivatives. J. Am. Stat. Assoc.
100, 6–16 (2005)
6. Cao, M., Wei, J.: Weather derivatives valuation and market price of weather risk. J. Futur. Mark.
24, 1065–1089 (2004)
7. Engle, R.F.: Autoregressive conditional heteroscedasticity with estimates of variance of united
kingdom inflation. Econometrica 50, 987–1008 (1982)
8. Göncü, A.: Pricing temperature-based weather derivatives in China. J. Risk Financ. 13, 32–44
(2011)
9. Göncü, A.: Comparison of temperature models using heating and cooling degree days futures.
J. Risk Financ. 14, 159–178 (2013)
10. Hanley, M.: Hedging the force of nature. Risk Prof. 1, 21–25 (1999)
11. Härdle, W.K., Cabrera, B.L.: The Implied Market Price of Weather Risk. Appl. Math. Financ.
19, 59–95 (2012)

12. Huang, H.-H., Shiu, Y.-M., Lin, P.-S.: HDD and CDD option pricing with market price of
weather risk for Taiwan. J. Futu. Mark. 28, 790–814 (2008)
13. Jewson, S.: Weather Derivative Valuation: The Meteorological, Statistical, Financial and Math-
ematical Foundations. Cambridge University Press, Cambridge (2005)
14. Kucherenko, S., Rodriguez-Fernandez, M., Pantelides, C., Shah, N.: Monte Carlo evaluation
of derivative-based global sensitivity measures. Reliab. Eng. Syst. Saf. 94, 1135–1148 (2009)
15. Kucherenko, S., Feil, B., Shah, N., Mauntz, W.: The identification of model effective dimensions
using global sensitivity analysis. Reliab. Eng. Syst. Saf. 96, 440–449 (2011)
16. Liu, R., Owen, A.: Estimating mean dimensionality of analysis of variance decompositions. J.
Am. Stat. Assoc. 101, 712–721 (2006)
17. Liu, Y., Jimenez, E., Hussaini, M.Y., Ökten, G., Goodrick, S.: Parametric uncertainty quantifi-
cation in the Rothermel model with randomized quasi-Monte Carlo methods. Int. J. Wildland
Fire 24, 307–316 (2015)
18. Oetomo, T., Stevenson, M.: Hot or Cold? a comparison of different approaches to the pricing
of weather derivatives. J. Emerg. Mark. Financ. 4, 101–133 (2005)
19. Ökten, G., Shah, M., Goncharov, Y.: Random and deterministic digit permutations of the Halton
sequence. In: Plaskota, L., Woźniakowski, H. (eds.) 9th International Conference on Monte
Carlo and Quasi-Monte Carlo Methods in Scientific Computing, Warsaw, Poland, August
15–20, pp. 589–602. Springer, Berlin (2012)
20. Platen, E., West, J.: A fair pricing approach to weather derivatives. Asian-Pac. Financ. Mark.
11, 23–53 (2005)
21. Rohmer, J., Foerster, E.: Global sensitivity analysis of large-scale numerical landslide models
based on Gaussian-Process meta-modeling. Comput. Geosci. 37, 917–927 (2011)
22. Saltelli, A., Tarantola, S., Chan, K.P.-S.: A quantitative model-independent method for global
sensitivity analysis of model output. Technometrics 41, 39–56 (1999)
23. Saltelli, A.: Making best use of model evaluations to compute sensitivity indices. Comput.
Phys. Commun. 145, 280–297 (2002). doi:10.1016/S0010-4655(02)00280-1
24. Saltelli, A.: Global Sensitivity Analysis: The Primer. Wiley, New Jersey (2008)
25. Schiller, F., Seidler, G., Wimmer, M.: Temperature models for pricing weather derivatives.
Quant. Financ. 12, 489–500 (2012)
26. Sobol’, I.M.: Sensitivity estimates for non-linear mathematical models. Math. Model. Comput.
Exp. 1, 407–414 (1993)
27. Sobol’, I.M.: Global sensitivity indices for nonlinear mathematical models and their
Monte Carlo estimates. Math. Comput. Simul. 55, 271–280 (2001). doi:10.1016/S0378-
4754(00)00270-6
28. Sobol’, I.M., Kucherenko, S.: Derivative based global sensitivity measures and their link with
global sensitivity indices. Math. Comput. Simul. 79, 3009–3017 (2009)
Reliable Adaptive Cubature Using
Digital Sequences

Fred J. Hickernell and Lluís Antoni Jiménez Rugama

In honor of Ilya M. Sobol’

Abstract Quasi-Monte Carlo cubature methods often sample the integrand using
Sobol’ (or other digital) sequences to obtain higher accuracy than IID sampling. An
important question is how to conservatively estimate the error of a digital sequence
cubature so that the sampling can be terminated when the desired tolerance is reached.
We propose an error bound based on the discrete Walsh coefficients of the integrand
and use this error bound to construct an adaptive digital sequence cubature algorithm.
The error bound and the corresponding algorithm are guaranteed to work for inte-
grands lying in a cone defined in terms of their true Walsh coefficients. Intuitively,
the inequalities defining the cone imply that the ordered Walsh coefficients do not
dip down for a long stretch and then jump back up. An upper bound on the cost of our
new algorithm is given in terms of the unknown decay rate of the Walsh coefficients.

Keywords Quasi-Monte Carlo methods · Multidimensional integration · Digital


sequences · Sobol’ sequences · Adaptive algorithms · Automatic algorithms

1 Introduction

Quasi-Monte Carlo cubature rules approximate multidimensional integrals over the


unit cube by an equally weighted sample average of the integrand values at the first n

F.J. Hickernell · Ll.A. Jiménez Rugama (B)


Department of Applied Mathematics, Illinois Institute of Technology,
10 W. 32nd Street, E1-208, Chicago, IL 60616, USA
e-mail: ljimene1@hawk.iit.edu
F.J. Hickernell
e-mail: hickernell@iit.edu
© Springer International Publishing Switzerland 2016 367
R. Cools and D. Nuyens (eds.), Monte Carlo and Quasi-Monte Carlo Methods,
Springer Proceedings in Mathematics & Statistics 163,
DOI 10.1007/978-3-319-33507-0_18


nodes from some sequence {z_i}_{i=0}^∞. This node sequence should be chosen to minimize the error, and for this one can appeal to Koksma–Hlawka type error bounds of the form

    | ∫_{[0,1)^d} f(x) dx − (1/n) Σ_{i=0}^{n−1} f(z_i) | ≤ D({z_i}_{i=0}^{n−1}) V(f).    (1)

The discrepancy, D({z_i}_{i=0}^{n−1}), measures how far the empirical distribution of the first
n nodes differs from the uniform distribution. The variation, V ( f ), is some semi-
norm of the integrand, f . The definitions of the discrepancy and variation are linked
to each other. Examples of such error bounds are given by [3, Chaps. 2–3], [4], [11,
Sect. 5.6], [12, Chaps. 2–3], and [14, Chap. 9].
A practical problem is how large to choose n so that the absolute error is smaller
than some user-defined tolerance, ε. Error bounds of the form (1) do not help in this
regard because it is too hard to compute V ( f ), which is typically defined in terms
of integrals of mixed partial derivatives of f .
This article addresses the challenge of reliable error estimation for quasi-Monte
Carlo cubature based on digital sequences, of which Sobol’ sequences are the most
popular example. The vector space structure underlying these digital sequences facil-
itates a convenient expression for the error in terms of the (Fourier)-Walsh coefficients
of the integrand. Discrete Walsh coefficients can be computed efficiently, and their
decay provides a reliable cubature error estimate. Underpinning this analysis is the
assumption that the integrands lie in a cone defined in terms of their true Walsh
coefficients; see (13).
The next section introduces digital sequences and their underlying algebraic struc-
ture. Section 3 explains how the cubature error using digital sequences as nodes can
be elegantly formulated in terms of the Walsh series representation of the integrand.
Our contributions begin in Sect. 4, where we derive a reliable data-based cubature
error bound for a cone of integrands, (16), and an adaptive cubature algorithm based
on that error bound, Algorithm 2. The cost of the algorithm is also represented in
terms of the unknown decay of the Walsh series coefficients and the error tolerance
in Theorem 1. A numerical example and discussion then conclude this article. A
parallel development for cubature based on lattice rules is given in [9].

2 Digital Sequences

The integrands considered here are defined over the half open d-dimensional unit
cube, [0, 1)d . For integration problems on other domains one may often transform the
integration variable so that the problem is defined on [0, 1)d . See [1, 5–8] for some
discussion of variable transformations and the related error analysis. The example in
Sect. 5 also employs a variable transformation.

Digital sequences are defined in terms of digitwise addition. Let b be a prime


number; b = 2 is the choice made for Sobol' sequences. Digitwise addition, ⊕, and negation, ⊖, are defined in terms of the proper b-ary expansions of points in [0, 1)^d:

    x = ( Σ_{ℓ=1}^∞ x_{jℓ} b^{−ℓ} )_{j=1}^d,   t = ( Σ_{ℓ=1}^∞ t_{jℓ} b^{−ℓ} )_{j=1}^d,   x_{jℓ}, t_{jℓ} ∈ F_b := {0, . . . , b − 1},

    x ⊕ t = ( Σ_{ℓ=1}^∞ [(x_{jℓ} + t_{jℓ}) mod b] b^{−ℓ} (mod 1) )_{j=1}^d,   x ⊖ t := x ⊕ (⊖t),

    ⊖x = ( Σ_{ℓ=1}^∞ [−x_{jℓ} mod b] b^{−ℓ} )_{j=1}^d,   ax := x ⊕ · · · ⊕ x (a times)   ∀a ∈ F_b.

We do not have associativity for all of [0, 1)d . For example, for b = 2,

    1/6 = ₂0.001010 . . . ,   1/3 = ₂0.010101 . . . ,   1/2 = ₂0.1000 . . . ,

    1/3 ⊕ 1/3 = ₂0.00000 . . . = 0,   1/3 ⊕ 1/6 = ₂0.011111 . . . = 1/2,

    (1/3 ⊕ 1/3) ⊕ 1/6 = 0 ⊕ 1/6 = 1/6,   1/3 ⊕ (1/3 ⊕ 1/6) = 1/3 ⊕ 1/2 = 5/6.

This lack of associativity comes from the possibility of digitwise addition resulting
in an infinite trail of digits b − 1, e.g., 1/3 ⊕ 1/6 above.
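In finite precision, ⊕ is simply digitwise addition modulo b on truncated expansions. The small Python sketch below illustrates this for b = 2: with truncated expansions the infinite trail of ones in 1/3 ⊕ 1/6 is cut off, so the result is only approximately 1/2, and the operation on truncated digit vectors is in fact associative — the subtlety above concerns exact infinite expansions only. The helper names are ours.

```python
from fractions import Fraction

B = 2          # base
L = 20         # number of digits retained

def digits(x, base=B, n=L):
    """First n base-b digits of x in [0,1)."""
    out = []
    for _ in range(n):
        x *= base
        d = int(x)
        out.append(d)
        x -= d
    return out

def from_digits(d, base=B):
    return sum(Fraction(di, base ** (l + 1)) for l, di in enumerate(d))

def oplus(x, t, base=B, n=L):
    """Digitwise addition x (+) t on truncated base-b expansions."""
    return from_digits([(a + b) % base for a, b in zip(digits(x, base, n),
                                                       digits(t, base, n))], base)

x, t = Fraction(1, 3), Fraction(1, 6)
print(float(oplus(x, t)))   # ~0.5: the infinite trail of ones is cut off after L digits
# On truncated expansions (+) is plain componentwise addition mod b, hence associative;
# the non-associativity in the text only appears for exact infinite expansions.
print(float(oplus(oplus(x, x), t)), float(oplus(x, oplus(x, t))))
```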
Define the Boolean operator that checks whether digitwise addition of two points
does not result in an infinite trail of digits b − 1:

    ok(x, t) = { true,  if min_{j=1,...,d} sup{ℓ : [(x_{jℓ} + t_{jℓ}) mod b] ≠ b − 1} = ∞,
                 false, otherwise.                                                        (2)

If P ⊂ [0, 1)d is some set that is closed under ⊕ and ok(x, t) = true for all x, t ∈ P,
then associativity holds for all points in P. Moreover, P is an Abelian group and
also a vector space over the field Fb .

Suppose that P_∞ = {z_i}_{i=0}^∞ ⊂ [0, 1)^d is such a vector space that satisfies the following additional conditions:

    {z_1, z_b, z_{b²}, . . .} is a set of linearly independent points,    (3a)

    z_i = ⊕_{ℓ=0}^∞ i_ℓ z_{b^ℓ},   where i = Σ_{ℓ=0}^∞ i_ℓ b^ℓ ∈ N_0, i_ℓ ∈ F_b.    (3b)

Such a P_∞ is called a digital sequence. Moreover, any P_m := {z_i}_{i=0}^{b^m−1} is a subspace of P_∞ and is called a digital net. From this definition it is clear that

Fig. 1 a 256 Sobol' points, b 256 scrambled and digitally shifted Sobol' points [both in [0, 1)²]


    P_0 = {0} ⊂ P_1 = {0, z_1, . . . , (b − 1)z_1} ⊂ P_2 ⊂ · · · ⊂ P_∞ = {z_i}_{i=0}^∞.

This digital sequence definition is equivalent to the traditional one in terms of


generating matrices. By (3) and according to the b-ary expansion notation introduced earlier, the (ℓ, m) element of the generating matrix C_j for the jth coordinate is the ℓth binary digit of the jth component of z_{b^{m−1}}, i.e.,

    C_j = [ (z_1)_{j1}  (z_b)_{j1}  (z_{b²})_{j1}  · · ·
            (z_1)_{j2}  (z_b)_{j2}  (z_{b²})_{j2}  · · ·
            (z_1)_{j3}  (z_b)_{j3}  (z_{b²})_{j3}  · · ·
               ⋮           ⋮            ⋮          ⋱  ],   for j = 1, . . . , d.

The Sobol’ sequence works in base b = 2 and makes a careful choice of the basis
{z 1 , z 2 , z 4 , . . .} so that the points are evenly distributed. Figure 1a displays the initial
points of the two-dimensional Sobol’ sequence. In Fig. 1b the Sobol’ sequence has
been linearly scrambled to obtain another digital sequence and then digitally shifted.
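A small sketch of this construction in base b = 2: the digit vector of the ith point in coordinate j is C_j times the base-2 digit vector of i, modulo 2. The generating matrices below are ad hoc examples chosen only for brevity (the identity matrix reproduces the van der Corput sequence in the first coordinate); they are not Sobol' generating matrices.

```python
import numpy as np

def digital_net_points(gen_matrices, m):
    """Points z_0,...,z_{2^m - 1} of the digital net with the given binary
    generating matrices (one r x m matrix per coordinate, entries in {0,1})."""
    pts = []
    for i in range(2 ** m):
        i_digits = np.array([(i >> l) & 1 for l in range(m)])   # base-2 digits of i
        point = []
        for C in gen_matrices:
            digits = C[:, :m].dot(i_digits) % 2                 # digits of this coordinate
            point.append(np.sum(digits * 2.0 ** (-np.arange(1, C.shape[0] + 1))))
        pts.append(point)
    return np.array(pts)

m, r = 3, 4
C1 = np.eye(r, dtype=int)                       # van der Corput in coordinate 1
C2 = np.triu(np.ones((r, r), dtype=int))        # an ad hoc upper-triangular matrix
print(digital_net_points([C1, C2], m))
```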

3 Walsh Series

Non-negative integer vectors are used to index the Walsh series for the integrands.
The set Nd0 is a vector space under digitwise addition, ⊕, and the field Fb . Digitwise
addition and negation are defined as follows for all k, l ∈ Nd0 :
    k = ( Σ_{ℓ=0}^∞ k_{jℓ} b^ℓ )_{j=1}^d,   l = ( Σ_{ℓ=0}^∞ l_{jℓ} b^ℓ )_{j=1}^d,   k_{jℓ}, l_{jℓ} ∈ F_b,

    k ⊕ l = ( Σ_{ℓ=0}^∞ [(k_{jℓ} + l_{jℓ}) mod b] b^ℓ )_{j=1}^d,

    ⊖k = ( Σ_{ℓ=0}^∞ [−k_{jℓ} mod b] b^ℓ )_{j=1}^d,   ak := k ⊕ · · · ⊕ k (a times)   ∀a ∈ F_b.

For each wavenumber k ∈ N_0^d a function ⟨k, ·⟩ : [0, 1)^d → F_b is defined as

    ⟨k, x⟩ := Σ_{j=1}^d Σ_{ℓ=0}^∞ k_{jℓ} x_{j,ℓ+1} (mod b).    (4a)

For all points t, x ∈ [0, 1)^d, wavenumbers k, l ∈ N_0^d, and a ∈ F_b, it follows that

    ⟨k, 0⟩ = ⟨0, x⟩ = 0,    (4b)
    ⟨k, ax ⊕ t⟩ = a⟨k, x⟩ + ⟨k, t⟩ (mod b) if ok(ax, t),    (4c)
    ⟨ak ⊕ l, x⟩ = a⟨k, x⟩ + ⟨l, x⟩ (mod b),    (4d)
    ⟨k, x⟩ = 0 ∀k ∈ N_0^d  ⟹  x = 0.    (4e)


The digital sequences P_∞ = {z_i}_{i=0}^∞ considered here are assumed to contain sufficiently many points so that

    ⟨k, z_i⟩ = 0 ∀i ∈ N_0  ⟹  k = 0.    (5)

Defining N_{0,m} := {0, . . . , b^m − 1}, the dual net corresponding to the net P_m is the set of all wavenumbers for which ⟨k, ·⟩ maps the whole net to 0:

    P_m^⊥ := {k ∈ N_0^d : ⟨k, z_i⟩ = 0, i ∈ N_{0,m}} = {k ∈ N_0^d : ⟨k, z_{b^ℓ}⟩ = 0, ℓ = 0, . . . , m − 1}.

The properties of the bilinear transform defined in (4) imply that the dual nets P_m^⊥ are subspaces of each other:

    P_0^⊥ = N_0^d ⊃ P_1^⊥ ⊃ · · · ⊃ P_∞^⊥ = {0}.

The integrands are assumed to belong to some subset of L_2([0, 1)^d), the space of square integrable functions. The L_2 inner product is defined as

    ⟨f, g⟩_2 = ∫_{[0,1)^d} f(x) \overline{g(x)} dx.

The Walsh functions {e^{2π√−1⟨k,·⟩/b} : k ∈ N_0^d} [3, Appendix A] are a complete orthonormal basis for L_2([0, 1)^d). Thus, any function in L_2 may be written in series form as

    f(x) = Σ_{k∈N_0^d} f̂(k) e^{2π√−1⟨k,x⟩/b},   where f̂(k) := ⟨f, e^{2π√−1⟨k,·⟩/b}⟩_2,    (6)

and the L_2 inner product of two functions is the ℓ_2 inner product of their Walsh series coefficients:

    ⟨f, g⟩_2 = Σ_{k∈N_0^d} f̂(k) \overline{ĝ(k)} =: ⟨(f̂(k))_{k∈N_0^d}, (ĝ(k))_{k∈N_0^d}⟩_2.

Since the digital net P_m is a group under ⊕, one may derive a useful formula for the average of a Walsh function sampled over a net. For all wavenumbers k ∈ N_0^d and all x ∈ P_m one has

    0 = (1/b^m) Σ_{i=0}^{b^m−1} [ e^{2π√−1⟨k,z_i⟩/b} − e^{2π√−1⟨k,z_i⊕x⟩/b} ]
      = (1/b^m) Σ_{i=0}^{b^m−1} [ e^{2π√−1⟨k,z_i⟩/b} − e^{2π√−1{⟨k,z_i⟩+⟨k,x⟩}/b} ]    by (4c)
      = [1 − e^{2π√−1⟨k,x⟩/b}] (1/b^m) Σ_{i=0}^{b^m−1} e^{2π√−1⟨k,z_i⟩/b}.

By this equality it follows that the average of the sampled Walsh function values is either one or zero, depending on whether the wavenumber is in the dual net or not:

    (1/b^m) Σ_{i=0}^{b^m−1} e^{2π√−1⟨k,z_i⟩/b} = 1_{P_m^⊥}(k) = { 1, k ∈ P_m^⊥;   0, k ∈ N_0^d \ P_m^⊥ }.    (7)

Multivariate integrals may be approximated by the average of the integrand sampled over a digitally shifted digital net, namely,

    Î_m(f) := (1/b^m) Σ_{i=0}^{b^m−1} f(z_i ⊕ Δ).    (8)

Under the assumption that ok(z_i, Δ) = true (see (2)) for all i ∈ N_0, it follows that the error of this cubature rule is the sum of the Walsh coefficients of the integrand over those wavenumbers in the dual net:

    | ∫_{[0,1)^d} f(x) dx − Î_m(f) | = | f̂(0) − Σ_{k∈N_0^d} f̂(k) Î_m( e^{2π√−1⟨k,·⟩/b} ) |
                                     = | f̂(0) − Σ_{k∈N_0^d} f̂(k) 1_{P_m^⊥}(k) e^{2π√−1⟨k,Δ⟩/b} |
                                     = | Σ_{k∈P_m^⊥\{0}} f̂(k) e^{2π√−1⟨k,Δ⟩/b} |.    (9)
Adaptive Algorithm 2 that we construct in Sect. 4 works with this expression for the
cubature error in terms of Walsh coefficients.
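As an illustration of (8), the sketch below evaluates Î_m(f) for a digital net produced from generating matrices as above, with a random digital shift Δ applied digitwise (exclusive-or of truncated binary digits). The matrices are again ad hoc placeholders rather than Sobol' matrices, and the test integrand and its exact value are included only to check the sketch.

```python
import numpy as np

def net_digits(gen_matrices, m):
    """Binary digit vectors (r digits per coordinate) of the 2^m net points."""
    d, r = len(gen_matrices), gen_matrices[0].shape[0]
    out = np.empty((2 ** m, d, r), dtype=int)
    for i in range(2 ** m):
        i_digits = np.array([(i >> l) & 1 for l in range(m)])
        for j, C in enumerate(gen_matrices):
            out[i, j] = C[:, :m].dot(i_digits) % 2
    return out

def shifted_cubature(f, gen_matrices, m, rng):
    """I_m(f): average of f over the digitally shifted net (digit-wise XOR shift)."""
    digits = net_digits(gen_matrices, m)
    r = digits.shape[2]
    delta = rng.integers(0, 2, size=(digits.shape[1], r))      # digits of the shift Delta
    shifted = (digits + delta) % 2
    weights = 2.0 ** (-np.arange(1, r + 1))
    points = shifted.dot(weights)                              # (2^m, d) points in [0,1)
    return np.mean(f(points))

rng = np.random.default_rng(3)
r = 30
C = [np.eye(r, dtype=int), np.triu(np.ones((r, r), dtype=int))]  # ad hoc matrices, d = 2
f = lambda x: np.prod(x ** 2 + 1.0, axis=1)       # test integrand; exact integral (4/3)^2
print(shifted_cubature(f, C, m=10, rng=rng))
```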
Although the true Walsh series coefficients are generally not known, they can be estimated by the discrete Walsh transform, defined as follows:

    f̃_m(k) := Î_m( e^{−2π√−1⟨k,·⟩/b} f(·) ) = (1/b^m) Σ_{i=0}^{b^m−1} e^{−2π√−1⟨k,z_i⊕Δ⟩/b} f(z_i ⊕ Δ)
             = (1/b^m) Σ_{i=0}^{b^m−1} [ e^{−2π√−1⟨k,z_i⊕Δ⟩/b} Σ_{l∈N_0^d} f̂(l) e^{2π√−1⟨l,z_i⊕Δ⟩/b} ]
             = Σ_{l∈N_0^d} f̂(l) (1/b^m) Σ_{i=0}^{b^m−1} e^{2π√−1⟨l⊖k,z_i⊕Δ⟩/b}
             = Σ_{l∈N_0^d} f̂(l) e^{2π√−1⟨l⊖k,Δ⟩/b} (1/b^m) Σ_{i=0}^{b^m−1} e^{2π√−1⟨l⊖k,z_i⟩/b}
             = Σ_{l∈N_0^d} f̂(l) e^{2π√−1⟨l⊖k,Δ⟩/b} 1_{P_m^⊥}(l ⊖ k)
             = Σ_{l∈P_m^⊥} f̂(k ⊕ l) e^{2π√−1⟨l,Δ⟩/b}
             = f̂(k) + Σ_{l∈P_m^⊥\{0}} f̂(k ⊕ l) e^{2π√−1⟨l,Δ⟩/b},   ∀k ∈ N_0^d.    (10)

The discrete transform, f̃_m(k), is equal to the true Walsh transform, f̂(k), plus aliasing
terms proportional to fˆ(k ⊕ l) where l is a nonzero wavenumber in the dual net.

4 Error Estimation and an Adaptive Cubature Algorithm

4.1 Wavenumber Map

Since the discrete Walsh transform has aliasing errors, some assumptions must be
made about how quickly the true Walsh coefficients decay and which coefficients
are more important. This is done by way of a map of the non-negative integers onto
the space of all wavenumbers, k̃ : N0 → Nd0 , according to the following algorithm.

Algorithm 1 Given a digital sequence, P_∞ = {z_i}_{i=0}^∞, define k̃ : ℕ_0 → ℕ_0^d as follows:
Step 1. Define k̃(0) = 0.
Step 2. For m = 0, 1, . . .
  For κ = 0, . . . , b^m − 1
    Choose the values of k̃(κ + b^m), . . . , k̃(κ + (b − 1)b^m) from the sets

      {k ∈ ℕ_0^d : k ⊖ k̃(κ) ∈ P_m^⊥, ⟨k ⊖ k̃(κ), z_{b^m}⟩ = a},   a = 1, . . . , b − 1,

but not necessarily in that order.


There is some flexibility in the choice of this map. One might choose k̃ to map
smaller values of κ to smaller values of k based on some standard measure of size
such as that given in [3, (5.9)]. The motivation is that larger κ should generally lead
to smaller fˆ( k̃(κ)). We use Algorithm 3 below to construct this map implicitly.
To illustrate the initial steps of Algorithm 1, consider the Sobol’ points in dimen-
sion 2. In this case, z 1 = (1/2, 1/2), z 2 = (1/4, 3/4) and z 4 = (1/8, 5/8). For
m = κ = 0, one needs

k̃(1) ∈ {k ∈ ℕ_0^d : k ⊖ k̃(0) ∈ P_0^⊥, ⟨k ⊖ k̃(0), z_1⟩ = 1} = {k ∈ ℕ_0^d : ⟨k, z_1⟩ = 1}.

Thus, one may choose k̃(1) = (1, 0). Next, m = 1 and κ = 0 leads to

k̃(2) ∈ {k ∈ ℕ_0^d : k ⊖ k̃(0) ∈ P_1^⊥, ⟨k ⊖ k̃(0), z_2⟩ = 1} = {k ∈ ℕ_0^d : k ∈ P_1^⊥, ⟨k, z_2⟩ = 1}.

Hence, we can take k̃(2) := (1, 1). Continuing with m = κ = 1 requires

k̃(3) ∈ {k ∈ ℕ_0^d : k ⊖ k̃(1) ∈ P_1^⊥, ⟨k ⊖ k̃(1), z_2⟩ = 1},

so the next choice can be k̃(3) := (0, 1).



Introducing the shorthand notation fˆκ := fˆ( k̃(κ)) and f˜m,κ := f˜m ( k̃(κ)), the
aliasing relation (10) may be written as
f̃_{m,κ} = f̂_κ + Σ_{λ=1}^∞ f̂_{κ+λb^m} e^{2π√−1 ⟨k̃(κ+λb^m) ⊖ k̃(κ), Δ⟩/b}, (11)

and the cubature error in (9) may be bounded as


|∫_{[0,1)^d} f(x) dx − Î_m(f)| = |Σ_{λ=1}^∞ f̂_{λb^m} e^{2π√−1 ⟨k̃(λb^m),Δ⟩/b}| ≤ Σ_{λ=1}^∞ |f̂_{λb^m}|. (12)

We will use the discrete transform, f˜m,κ , to estimate true Walsh coefficients, fˆκ , for
m significantly larger than logb (κ).

4.2 Sums of Walsh Series Coefficients and Cone Conditions

Consider the following sums of the true and approximate Walsh series coefficients. For ℓ, m ∈ ℕ_0 and ℓ ≤ m let

S_m(f) = Σ_{κ=⌊b^{m−1}⌋}^{b^m−1} |f̂_κ|,   S_{ℓ,m}(f) = Σ_{κ=⌊b^{ℓ−1}⌋}^{b^ℓ−1} Σ_{λ=1}^∞ |f̂_{κ+λb^m}|,

Š_m(f) = S_{0,m}(f) + ⋯ + S_{m,m}(f) = Σ_{κ=b^m}^∞ |f̂_κ|,   S̃_{ℓ,m}(f) = Σ_{κ=⌊b^{ℓ−1}⌋}^{b^ℓ−1} |f̃_{m,κ}|.

The first three sums, S_m(f), S_{ℓ,m}(f), and Š_m(f), cannot be observed because they involve the true series coefficients. But the last sum, S̃_{ℓ,m}(f), is defined in terms of the discrete Walsh transform and can easily be computed in terms of function values. The details are described in the Appendix.
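To make the bookkeeping concrete, the following Python sketch (illustrative only; the infinite tails are truncated at the length of the coefficient array, an assumption not made in the text) shows which coefficients enter each of the sums above.

```python
import numpy as np

def coefficient_sums(fhat_abs, ell, m, b=2):
    """Illustrative only: which coefficients enter S_m, S_{ell,m} and S_check_m.
    fhat_abs[kappa] holds |f^_kappa| in the ordering of the wavenumber map k~;
    the infinite tails are truncated at len(fhat_abs)."""
    a = np.asarray(fhat_abs, dtype=float)
    lo_m = int(b ** (m - 1))          # = floor(b^{m-1}), also correct for m = 0
    lo_l = int(b ** (ell - 1))        # = floor(b^{ell-1})
    S_m = a[lo_m:b ** m].sum()
    S_ell_m = sum(a[kappa + lam * b ** m]
                  for kappa in range(lo_l, b ** ell)
                  for lam in range(1, (a.size - 1 - kappa) // b ** m + 1))
    S_check_m = a[b ** m:].sum()      # = S_{0,m} + ... + S_{m,m}
    return S_m, S_ell_m, S_check_m
```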
We now make critical assumptions about how certain sums provide upper bounds on others. Let ℓ_* ∈ ℕ be some fixed integer and ω and ω̊ be some non-negative valued functions with lim_{m→∞} ω̊(m) = 0 such that ω(r)ω̊(r) < 1 for some r ∈ ℕ. Define the cone of integrands

C := { f ∈ L_2([0,1)^d) : S_{ℓ,m}(f) ≤ ω(m − ℓ) Š_m(f), ℓ ≤ m,
      Š_m(f) ≤ ω̊(m − ℓ) S_ℓ(f), ℓ_* ≤ ℓ ≤ m }. (13)

This is a cone because f ∈ C =⇒ a f ∈ C for all real a.



Fig. 2 The magnitudes of true Walsh coefficients for f(x) = e^{−3x} sin(10x²)

The first inequality asserts that the sum of the larger indexed Walsh coefficients bounds a partial sum of the same coefficients. For example, this means that S_{0,12}, the sum of the values of the large black dots in Fig. 2, is no greater than some factor times Š_{12}(f), the sum of the values of the gray ×. Possible choices of ω are ω(m) = 1 or ω(m) = Cb^{−αm} for some C > 1 and 0 ≤ α ≤ 1. The second inequality asserts that the sum of the smaller indexed coefficients provides an upper bound on the sum of the larger indexed coefficients. In other words, the fine scale components of the integrand are not unduly large compared to the gross scale components. In Fig. 2 this means that Š_{12}(f) is no greater than some factor times S_8(f), the sum of the values of the black squares. This implies that |f̂_κ| does not dip down and then bounce back up too dramatically as κ → ∞. The reason for enforcing the second inequality only for ℓ ≥ ℓ_* is that for small ℓ, one might have a coincidentally small S_ℓ(f), while Š_m(f) is large.
The cubature error bound in (12) can be bounded in terms of S_ℓ(f), a certain finite sum of the Walsh coefficients, for integrands f in the cone C. For ℓ, m ∈ ℕ, ℓ_* ≤ ℓ ≤ m, it follows that

|∫_{[0,1)^d} f(x) dx − Î_m(f)| ≤ Σ_{λ=1}^∞ |f̂_{λb^m}| = S_{0,m}(f)   by (12)

  ≤ ω(m) Š_m(f) ≤ ω(m) ω̊(m − ℓ) S_ℓ(f). (14)

Thus, the faster S_ℓ(f) decays as ℓ → ∞, the faster the cubature error must decay.
Unfortunately, the true Walsh coefficients are unknown. Thus we must bound S_ℓ(f) in terms of the observable sum of the approximate coefficients, S̃_{ℓ,m}(f). This is done as follows:

S_ℓ(f) = Σ_{κ=⌊b^{ℓ−1}⌋}^{b^ℓ−1} |f̂_κ|

  = Σ_{κ=⌊b^{ℓ−1}⌋}^{b^ℓ−1} |f̃_{m,κ} − Σ_{λ=1}^∞ f̂_{κ+λb^m} e^{2π√−1 ⟨k̃(κ+λb^m) ⊖ k̃(κ),Δ⟩/b}|   by (11)

  ≤ Σ_{κ=⌊b^{ℓ−1}⌋}^{b^ℓ−1} |f̃_{m,κ}| + Σ_{κ=⌊b^{ℓ−1}⌋}^{b^ℓ−1} Σ_{λ=1}^∞ |f̂_{κ+λb^m}| = S̃_{ℓ,m}(f) + S_{ℓ,m}(f)

  ≤ S̃_{ℓ,m}(f) + ω(m − ℓ)ω̊(m − ℓ) S_ℓ(f)   by (13),

S_ℓ(f) ≤ S̃_{ℓ,m}(f) / [1 − ω(m − ℓ)ω̊(m − ℓ)]   provided that ω(m − ℓ)ω̊(m − ℓ) < 1. (15)

Combining (14) with (15) leads to the following conservative upper bound on the cubature error for ℓ, m ∈ ℕ, ℓ_* ≤ ℓ ≤ m:

|∫_{[0,1)^d} f(x) dx − Î_m(f)| ≤ S̃_{ℓ,m}(f) ω(m) ω̊(m − ℓ) / [1 − ω(m − ℓ)ω̊(m − ℓ)]. (16)

This error bound suggests the following algorithm.

4.3 An Adaptive Cubature Algorithm and Its Cost

Algorithm 2 (Adaptive Digital Sequence Cubature, cubSobol_g) Given the parameter ℓ_* ∈ ℕ and the functions ω and ω̊ that define the cone C in (13), choose the parameter r ∈ ℕ such that ω(r)ω̊(r) < 1. Let C(m) := ω(m)ω̊(r)/[1 − ω(r)ω̊(r)] and m = ℓ_* + r. Given a tolerance, ε, and a routine that produces values of the integrand, f, do the following:
Step 1. Compute the sum of the discrete Walsh coefficients, S̃_{m−r,m}(f), according to Algorithm 3.
Step 2. Check whether the error tolerance is met, i.e., whether C(m) S̃_{m−r,m}(f) ≤ ε. If so, then return the cubature Î_m(f) defined in (8) as the answer.
Step 3. Otherwise, increment m by one, and go to Step 1.

There is a balance to be struck in the choice of r. Choosing r too large causes the error bound to depend on the Walsh coefficients with smaller indices, which may be large, even though the Walsh coefficients determining the error are small. Choosing r too small makes ω(r)ω̊(r) large, and thus the inflation factor, C, large to guard against aliasing.
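A minimal sketch of the stopping loop in Algorithm 2 for b = 2 is given below (Python; the two callables f_sums and cubature are hypothetical stand-ins for Algorithm 3 and (8), and the inflation factor C(m) = 5 · 2^{−m} is the default reported for cubSobol_g in Sect. 5).

```python
def adaptive_cubature(f_sums, cubature, eps, ell_star=6, r=4, m_max=24):
    """Sketch of the stopping loop of Algorithm 2 (base b = 2).
    Assumed helpers (hypothetical, not from GAIL):
      f_sums(m)   -> the observable coefficient sum S~_{m-r,m}(f) from 2^m values,
      cubature(m) -> the shifted-net average I^_m(f) of (8)."""
    C = lambda m: 5.0 * 2.0 ** (-m)   # default inflation factor of cubSobol_g
    m = ell_star + r
    while C(m) * f_sums(m) > eps:
        m += 1
        if m > m_max:
            raise RuntimeError("sample budget exceeded; f may lie outside the cone C")
    return cubature(m)
```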

Theorem 1 If the integrand, f, lies in the cone, C, then Algorithm 2 is successful:

|∫_{[0,1)^d} f(x) dx − Î_m(f)| ≤ ε.

The number of integrand values required to obtain this answer is bm , where the
following upper bound on m depends on the tolerance and unknown decay rate of
the Walsh coefficients.

m ≤ min{m′ ≥ ℓ_* + r : C(m′)[1 + ω(r)ω̊(r)] S_{m′−r}(f) ≤ ε}.

The computational cost of this algorithm beyond that of obtaining the integrand
values is O(mbm ) to compute the discrete Walsh transform.

Proof The success of this algorithm comes from applying (16). To bound the number of integrand values required, note that the argument leading to (15) can be modified to provide an upper bound on S̃_{ℓ,m}(f) in terms of S_ℓ(f):

S̃_{ℓ,m}(f) = Σ_{κ=⌊b^{ℓ−1}⌋}^{b^ℓ−1} |f̃_{m,κ}|

  = Σ_{κ=⌊b^{ℓ−1}⌋}^{b^ℓ−1} |f̂_κ + Σ_{λ=1}^∞ f̂_{κ+λb^m} e^{2π√−1 ⟨k̃(κ+λb^m) ⊖ k̃(κ),Δ⟩/b}|   by (11)

  ≤ Σ_{κ=⌊b^{ℓ−1}⌋}^{b^ℓ−1} |f̂_κ| + Σ_{κ=⌊b^{ℓ−1}⌋}^{b^ℓ−1} Σ_{λ=1}^∞ |f̂_{κ+λb^m}| = S_ℓ(f) + S_{ℓ,m}(f)

  ≤ [1 + ω(m − ℓ)ω̊(m − ℓ)] S_ℓ(f)   by (13).

Thus, the upper bound on the error in Step 2 of Algorithm 2 is itself bounded above by C(m)[1 + ω(r)ω̊(r)] S_{m−r}(f). Therefore, the stopping criterion in Step 2 must be satisfied no later than when this quantity falls below ε.
The computation of the discrete Walsh transform and S̃_{m−r,m}(f) is described in Algorithm 3 in the Appendix. The cost of this algorithm is O(mb^m) operations. □

5 Numerical Experiments

Algorithm 2 has been implemented in MATLAB code as the function cubSobol_g.


It is included in our Guaranteed Automatic Integration Library (GAIL) [2]. Our
cubSobol_g utilizes MATLAB’s built-in Sobol’ sequences, so b = 2. The default
algorithm parameters are

Fig. 3 Time required and error observed for cubSobol_g (Algorithm 2) for the Keister example, (17). Small dots denote the time and error when the tolerance of ε = 0.001 was met. Large dots denote the time and error when the tolerance was not met. The solid line denotes the empirical distribution function of the error, and the dot-dashed line denotes the empirical distribution function of the time

ℓ_* = 6,   r = 4,   C(m) = 5 × 2^{−m},

and mapping k̃ is fixed heuristically according to Algorithm 3. Fixing C partially determines ω and ω̊ since ω(m) = C(m)/ω(r) and ω(r)ω̊(r) = C(r)/[1 + C(r)].
We have tried cubSobol_g on an example from [10]:

I = ∫_{ℝ^d} e^{−‖t‖²} cos(‖t‖) dt = π^{d/2} ∫_{[0,1)^d} cos( √( ½ Σ_{j=1}^d [Φ^{−1}(x_j)]² ) ) dx, (17)

where Φ is the standard Gaussian distribution function (Fig. 3). We generated 1000
IID random values of the dimension d = ⌈e^D⌉ with D being uniformly distributed
between 0 and log(20). Each time cubSobol_g was run, a different scrambled and
shifted Sobol’ sequence was used. The tolerance was met about 97 % of the time
and failures were more likely among the higher dimensions. For those cases where
the tolerance was not met, mostly the larger dimensions, the integrand lay outside
the cone C . Our choice of k̃ via Algorithm 3 depends somewhat on the particular
scrambling and digital shift, so the definition of C also depends mildly on these.
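For readers who want to reproduce a simplified version of this experiment, the following Python sketch evaluates the transformed Keister integrand of (17) at a fixed number of scrambled Sobol' points, using scipy's generator as a stand-in for GAIL's cubSobol_g; the adaptive sample-size selection of Algorithm 2 is not reproduced here.

```python
import numpy as np
from scipy.stats import norm, qmc

def keister_integrand(x):
    """Transformed Keister integrand of (17):
    pi^{d/2} * cos( sqrt( 0.5 * sum_j Phi^{-1}(x_j)^2 ) ) for x in [0,1)^d."""
    d = x.shape[1]
    t = norm.ppf(x)                                   # componentwise inverse Gaussian CDF
    return np.pi ** (d / 2) * np.cos(np.sqrt(0.5 * np.sum(t ** 2, axis=1)))

# Fixed-budget QMC estimate with 2^14 scrambled Sobol' points; d = 3 chosen arbitrarily.
d, m = 3, 14
x = qmc.Sobol(d, scramble=True).random_base2(m)
I_hat = keister_integrand(x).mean()
```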

6 Discussion

There are few quasi-Monte Carlo cubature algorithms available that adaptively deter-
mine the sample size needed based on integrand values. The chief reason is that
reliable error estimation for quasi-Monte Carlo is difficult. Quasi-standard error has
serious drawbacks, as explained in [15]. Internal replications have no explicit theory.

IID replications of randomized quasi-Monte Carlo rules are sometimes used, but one
does not know how many replications are needed.
The proposed error bound and adaptive algorithm here are practical and have
theoretical justification. The conditions imposed on the sums of the (true) Fourier–
Walsh coefficients make it possible to bound the cubature error in terms of discrete
Fourier–Walsh coefficients. The set of integrands satisfying these conditions is a non-
convex cone (13), thereby placing us in a setting where adaption has the opportunity
to be beneficial.
Problems requiring further consideration include how to choose the default para-
meters for Algorithm 2. We would also like to extend our algorithm and theory to
the case of relative error.

Acknowledgments This work was partially supported by US National Science Foundation grants
DMS-1115392, DMS-1357690, and DMS-1522687. The authors thank Ronald Cools and Dirk
Nuyens for organizing MCQMC 2014. We thank Sergei Kucherenko and Art Owen for organizing
the special session in honor of Ilya M. Sobol’. We are grateful for Professor Sobol’s many contribu-
tions to MCQMC and related fields. The suggestions made by Sou-Cheng Choi, Yuhan Ding, Lan
Jiang, and the anonymous referees to improve this manuscript are greatly appreciated.

Appendix: Fast Computation of the Discrete Walsh Transform

Let y_0, y_1, . . . be some data. Define Y_ν^{(m)} for ν = 0, . . . , b^m − 1 as follows:

Y_ν^{(m)} := (1/b^m) Σ_{i=0}^{b^m−1} e^{−2π√−1 Σ_{ℓ=0}^{m−1} ν_ℓ i_ℓ /b} y_i = (1/b^m) Σ_{i_{m−1}=0}^{b−1} ⋯ Σ_{i_0=0}^{b−1} e^{−2π√−1 Σ_{ℓ=0}^{m−1} ν_ℓ i_ℓ /b} y_i,

where i = i_0 + i_1 b + ⋯ + i_{m−1} b^{m−1} and ν = ν_0 + ν_1 b + ⋯ + ν_{m−1} b^{m−1}. For all i_j, ν_j ∈ F_b, j, ℓ = 0, . . . , m − 1, recursively define

Y_{m,0}(i_0, . . . , i_{m−1}) := y_i,

Y_{m,ℓ+1}(ν_0, . . . , ν_ℓ, i_{ℓ+1}, . . . , i_{m−1}) := (1/b) Σ_{i_ℓ=0}^{b−1} e^{−2π√−1 ν_ℓ i_ℓ /b} Y_{m,ℓ}(ν_0, . . . , ν_{ℓ−1}, i_ℓ, . . . , i_{m−1}).


This allows us to identify Y_ν^{(m)} = Y_{m,m}(ν_0, . . . , ν_{m−1}). By this iterative process one can compute Y_0^{(m)}, . . . , Y_{b^m−1}^{(m)} in only O(mb^m) operations.
Note also that Y_{m+1,m}(ν_0, . . . , ν_{m−1}, 0) = Y_{m,m}(ν_0, . . . , ν_{m−1}) = Y_ν^{(m)}. This means that the work done to compute Y_ν^{(m)} can be used to compute Y_ν^{(m+1)}.
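For b = 2 the recursion above reduces to the classical fast Walsh–Hadamard butterfly. A minimal Python sketch (function name illustrative, not part of GAIL):

```python
import numpy as np

def fwht_scaled(y):
    """Fast Walsh-Hadamard transform for b = 2:
    returns Y with Y[nu] = 2^{-m} * sum_i (-1)^{<nu,i>_2} y[i],
    where <nu,i>_2 is the bitwise dot product, in O(m 2^m) operations."""
    Y = np.array(y, dtype=float)
    n = Y.size                      # n = 2^m
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            a = Y[start:start + h].copy()
            b = Y[start + h:start + 2 * h].copy()
            Y[start:start + h] = a + b
            Y[start + h:start + 2 * h] = a - b
        h *= 2
    return Y / n
```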

Next, we relate the Y_ν to the discrete Walsh transform of the integrand f. For every k ∈ ℕ_0^d and every digital sequence P_∞ = {z_i}_{i=0}^∞, let

ν̃_0(k) := 0,   ν̃_m(k) := Σ_{ℓ=0}^{m−1} ⟨k, z_{b^ℓ}⟩ b^ℓ ∈ ℕ_{0,m},   m ∈ ℕ. (18)

If we set y_i = f(z_i ⊕ Δ), and if ν̃_m(k) = ν, then

f̃_m(k) = (1/b^m) Σ_{i=0}^{b^m−1} e^{−2π√−1 ⟨k,z_i⊕Δ⟩/b} y_i

  = e^{−2π√−1 ⟨k,Δ⟩/b} (1/b^m) Σ_{i=0}^{b^m−1} e^{−2π√−1 ⟨k,z_i⟩/b} y_i   by (4c)

  = e^{−2π√−1 ⟨k,Δ⟩/b} (1/b^m) Σ_{i=0}^{b^m−1} e^{−2π√−1 ⟨k, Σ_{j=0}^{m−1} i_j z_{b^j}⟩/b} y_i   by (3)

  = e^{−2π√−1 ⟨k,Δ⟩/b} (1/b^m) Σ_{i=0}^{b^m−1} e^{−2π√−1 Σ_{j=0}^{m−1} i_j ⟨k, z_{b^j}⟩/b} y_i   by (4c)

  = e^{−2π√−1 ⟨k,Δ⟩/b} (1/b^m) Σ_{i=0}^{b^m−1} e^{−2π√−1 Σ_{ℓ=0}^{m−1} ν_ℓ i_ℓ /b} y_i   by (18)

  = e^{−2π√−1 ⟨k,Δ⟩/b} Y_ν^{(m)}. (19)

Using the notation in Sect. 4, for all m ∈ ℕ_0 define a pointer ν̊_m : ℕ_{0,m} → ℕ_{0,m} as ν̊_m(κ) := ν̃_m(k̃(κ)). It follows that

f̃_{m,κ} = f̃_m(k̃(κ)) = e^{−2π√−1 ⟨k̃(κ),Δ⟩/b} Y_{ν̊_m(κ)}^{(m)},

S̃_{ℓ,m}(f) = Σ_{κ=⌊b^{ℓ−1}⌋}^{b^ℓ−1} |f̃_{m,κ}| = Σ_{κ=⌊b^{ℓ−1}⌋}^{b^ℓ−1} |Y_{ν̊_m(κ)}^{(m)}|. (20)
The quantity S̃_{m−r,m}(f) is the key to the stopping criterion in Algorithm 2.


If the map k̃ : N0 → Nd0 defined in Algorithm 1 is known explicitly, then speci-
fying ν̊m is straightforward. However, in practice the bookkeeping involved in con-
structing k̃ might be tedious, so we take a data-dependent approach to constructing
the pointer ν̊m (κ) for κ ∈ N0,m directly, which then defines k̃ implicitly.

Algorithm 3 Let r ∈ ℕ be fixed. Given the input m ∈ ℕ_0, the discrete Walsh coefficients Y_ν^{(m)} for ν ∈ ℕ_{0,m}, and also the pointer ν̊_{m−1}(κ) defined for κ ∈ ℕ_{0,m−1}, provided m > 0,
Step 1. If m = 0, then define ν̊_0(0) = 0 and go to Step 4.
Step 2. Otherwise, if m ≥ 1, then initialize ν̊_m(κ) = ν̊_{m−1}(κ) for κ ∈ ℕ_{0,m−1} and ν̊_m(κ) = κ for κ = b^{m−1}, . . . , b^m − 1.
Step 3. For ℓ = m − 1, m − 2, . . . , max(1, m − r),
  for κ = 1, . . . , b^ℓ − 1
    Find a* such that |Y_{ν̊_m(κ+a*b^ℓ)}^{(m)}| ≥ |Y_{ν̊_m(κ+ab^ℓ)}^{(m)}| for all a ∈ F_b.
    Swap the values of ν̊_m(κ) and ν̊_m(κ + a*b^ℓ).
Step 4. Return ν̊_m(κ) for κ ∈ ℕ_{0,m}. If m ≥ r, then compute S̃_{m−r,m}(f) according to (20), and return this value as well.

Lemma 1 Let P_{m,κ} := {k ∈ ℕ_0^d : ν̃_m(k) = ν̊_m(κ)} for κ ∈ ℕ_{0,m}, m ∈ ℕ_0, where ν̊_m is given by Algorithm 3. Then we have implicitly defined the map k̃ in the sense that any map k̃ : ℕ_{0,m} → ℕ_0^d that chooses k̃(0) = 0 ∈ P_{m,0}, and k̃(κ) ∈ P_{m,κ} for all κ = 1, . . . , b^m − 1, gives the same value of S̃_{m−r,m}(f). It is also consistent with Algorithm 1 for κ ∈ ℕ_{0,m−r}.


Proof The constraint that k̃(κ) ∈ Pm,κ implies that Sm−r,r ( f ) is invariant under all

k̃ chosen according to the assumption that k̃(κ) ∈ Pm,κ . By definition 0 ∈ Pm,0
remains true for all m for Algorithm 3.
The remainder of the proof is to show that choosing k̃(κ) by the hypothesis of
this lemma is consistent with Algorithm 1. To do this we show that for m ∈ N0
⊥ ⊥ ⊥
k ∈ Pm,κ , l ∈ Pm,κ+ab  =⇒ k  l ∈ P for all κ = 1, . . . , b ,  < m,
(21)
and that
⊥ ⊥
Pm,κ ⊃ Pm+1,κ ⊃ ··· for κ ∈ N0,m−r provided m ≥ r. (22)


The proof proceeds by induction. Since P0,0 = Nd0 , the above two conditions are
satisfied automatically.
If they are satisfied for m − 1 (instead of m), then the initialization stage in Step
2 of Algorithm 3 preserves (21) for m. The swapping of ν̊m (κ) and ν̊m (κ + a ∗ b )
⊥ ⊥
values in Step 3 also preserves (21). Step 3 may cause Pm−1,κ ∩ Pm,κ = ∅ for some
larger values of κ, but the constraint on the values of  in Step 3 mean that (22) is
preserved. 

References

1. Caflisch, R.E.: Monte Carlo and quasi-Monte Carlo methods. Acta Numer. 7, 1–49 (1998)
2. Choi, S.C.T., Ding, Y., Hickernell, F.J., Jiang, L., Jiménez Rugama, Ll.A., Tong, X., Zhang,
Y., Zhou, X.: GAIL: Guaranteed Automatic Integration Library (versions 1.0–2.1). MATLAB
software (2013–2015). https://github.com/GailGithub/GAIL_Dev
3. Dick, J., Pillichshammer, F.: Digital Nets and Sequences: Discrepancy Theory and Quasi-Monte
Carlo Integration. Cambridge University Press, Cambridge (2010)

4. Hickernell, F.J.: A generalized discrepancy and quadrature error bound. Math. Comput. 67,
299–322 (1998)
5. Hickernell, F.J., Sloan, I.H., Wasilkowski, G.W.: On strong tractability of weighted multivariate
integration. Math. Comput. 73, 1903–1911 (2004)
6. Hickernell, F.J., Sloan, I.H., Wasilkowski, G.W.: On tractability of weighted integration for
certain Banach spaces of functions. In: Niederreiter [13], pp. 51–71
7. Hickernell, F.J., Sloan, I.H., Wasilkowski, G.W.: On tractability of weighted integration over
bounded and unbounded regions in Rs . Math. Comput. 73, 1885–1901 (2004)
8. Hickernell, F.J., Sloan, I.H., Wasilkowski, G.W.: The strong tractability of multivariate inte-
gration using lattice rules. In: Niederreiter [13], pp. 259–273
9. Jiménez Rugama, Ll.A., Hickernell, F.J.: Adaptive multidimensional integration based on
rank-1 lattices. In: Cools, R., Nuyens, D., (eds.) Monte Carlo and Quasi-Monte Carlo Methods
2014, vol. 163, pp. 407–422. Springer, Heidelberg (2016)
10. Keister, B.D.: Multidimensional quadrature algorithms. Comput. Phys. 10, 119–122 (1996)
11. Lemieux, C.: Monte Carlo and quasi-Monte Carlo Sampling. Springer Science+Business Media
Inc, New York (2009)
12. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. CBMS-NSF
Regional Conference Series in Applied Mathematics. SIAM, Philadelphia (1992)
13. Niederreiter, H. (ed.): Monte Carlo and Quasi-Monte Carlo Methods 2002. Springer, Berlin
(2004)
14. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems Volume II: Standard Infor-
mation for Functionals. No. 12 in EMS Tracts in Mathematics. European Mathematical Society,
Zürich (2010)
15. Owen, A.B.: On the Warnock-Halton quasi-standard error. Monte Carlo Methods Appl. 12,
47–54 (2006)
Optimal Point Sets for Quasi-Monte Carlo
Integration of Bivariate Periodic Functions
with Bounded Mixed Derivatives

Aicke Hinrichs and Jens Oettershagen

Abstract We investigate quasi-Monte Carlo (QMC) integration of bivariate periodic


functions with dominating mixed smoothness of order one. While there exist several
QMC constructions which asymptotically yield the optimal rate of convergence of O(N^{−1} log(N)^{1/2}), it is yet unknown which point set is optimal in the sense that it is a global minimizer of the worst case integration error. We will present a computer-assisted proof by exhaustion that the Fibonacci lattice is the unique minimizer of the QMC worst case error in periodic H^1_mix for small Fibonacci numbers N. Moreover, we investigate the situation for point sets whose cardinality N is not a Fibonacci number. It turns out that for N = 1, 2, 3, 5, 7, 8, 12, 13 the optimal point sets are integration lattices.

Keywords Multivariate integration · Quasi-Monte Carlo · Optimal quadrature


points · Fibonacci lattice

1 Introduction

Quasi-Monte Carlo (QMC) rules are equal-weight quadrature rules which can be
used to approximate integrals defined on the d-dimensional unit cube [0, 1)d

∫_{[0,1)^d} f(x) dx ≈ (1/N) Σ_{i=1}^N f(x_i),

where P N = {x 1 , x 2 , . . . , x N } are deterministically chosen quadrature points in


[0, 1)d . The integration error for a specific function f is given as

A. Hinrichs
Institut für Analysis, Johannes-Kepler-Universität Linz, Altenberger Straße 69,
4040 Linz, Austria
e-mail: aicke.hinrichs@uni-rostock.de
J. Oettershagen (B)
Institute for Numerical Simulation, Wegelerstraße 6, 53115 Bonn, Germany
e-mail: oettershagen@ins.uni-bonn.de

 
|∫_{[0,1)^d} f(x) dx − (1/N) Σ_{i=1}^N f(x_i)|.

To study the behavior of this error as N increases for f from a Banach space (H, ‖·‖) one considers the worst case error

wce(H, P_N) = sup_{f∈H, ‖f‖≤1} |∫_{[0,1)^d} f(x) dx − (1/N) Σ_{i=1}^N f(x_i)|.

Particularly nice examples of such function spaces are reproducing kernel Hilbert spaces [1]. Here, we will consider the reproducing kernel Hilbert space H^1_mix of 1-periodic functions with mixed smoothness. Details on these spaces are given in Sect. 2. The reproducing kernel is a tensor product kernel of the form

K_{d,γ}(x, y) = Π_{j=1}^d K_{1,γ}(x_j, y_j)   for x = (x_1, . . . , x_d), y = (y_1, . . . , y_d) ∈ [0, 1)^d

with K_{1,γ}(x, y) = 1 + γ k(|x − y|) and k(t) = ½(t² − t + 1/6) and a parameter γ > 0.
It turns out that minimizing the worst case error wce(Hmix 1
, P N ) among all N -point
sets P N = {x 1 , . . . , x N } with respect to the Hilbert space norm corresponding to
the kernel K d,γ is equivalent to minimizing the double sum


G_γ(x_1, . . . , x_N) = Σ_{i,j=1}^N K_{d,γ}(x_i, x_j).

There is a general connection between the discrepancy of a point set and the worst case
error of integration. Details can be found in [11, Chap. 9]. In our case, the relevant
notion is the L 2 -norm of the periodic discrepancy. We describe the connection in
detail in Sect. 2.3.
There are many results on the rate of convergence of worst case errors and of
the optimal discrepancies for N → ∞, see e.g. [9, 11], but results on the optimal
point configurations for fixed N and d > 1 are scarce. For discrepancies, we are
only aware of [21], where the point configurations minimizing the standard L ∞ -star-
discrepancy for d = 2 and N = 1, 2, . . . , 6 are determined, [14], where for N = 1
the point minimizing the standard L ∞ - and L 2 -star discrepancy for d ≥ 1 is found,
and [6], where this is extended to N = 2.
It is the aim of this paper to provide a method which for d = 2 and N > 2
yields the optimal points for the periodic L 2 -discrepancy and worst case error in
1
Hmix . Our approach is based on a decomposition of the global optimization problem
into exponentially many local ones which each possess unique solutions that can be
approximated efficiently by a nonlinear block Gauß–Seidel method. Moreover, we

use the symmetries of the two-dimensional torus to significantly reduce the number
of local problems that have to be considered.
It turns out that in the case that N is a (small) Fibonacci number, the Fibonacci
lattice yields the optimal point configuration. It is common wisdom, see e.g.
[3, 10, 15–18], that the Fibonacci lattice provides a very good point set for integrat-
ing periodic functions. Now our results support the conjecture that they are actually
the best points.
These results may suggest that the optimal point configurations are integration
lattices or at least lattice point sets. This seems to be true for some numbers N of
points, for example for Fibonacci numbers, but not always. However, it can be shown
that integration lattices are always local minima of wce(Hmix 1
, P N ). Moreover, our
numerical results also suggest that for small γ the optimal points are always close
to a lattice point set, i.e. N -point sets of the form
 
{(i/N, σ(i)/N) : i = 0, . . . , N − 1},

where σ is a permutation of {0, 1, . . . , N − 1}.


The remainder of this article is organized as follows: In Sect. 2 we recall Sobolev
spaces with bounded mixed derivatives, the notion of the worst case integration error
in reproducing kernel Hilbert spaces and the connection to periodic discrepancy.
In Sect. 3 we discuss necessary and sufficient conditions for optimal point sets and
derive lower bounds of the worst case error on certain local patches of the whole
[0, 1)2N . In Sect. 4 we compute candidates for optimal point sets up to machine
precision. Using arbitrary precision rational arithmetic we prove that they are indeed
near the global minimum which also turns out to be unique up to torus-symmetries.
For certain point numbers the global minima are integration lattices as is the case if
N is a Fibonacci number. We close with some remarks in Sect. 5.

2 Quasi-Monte Carlo Integration in H^1_mix(T²)

2.1 Sobolev Spaces of Periodic Functions

We consider univariate 1-periodic functions f : ℝ → ℝ which are given by their values on the torus T = [0, 1). For k ∈ ℤ, the kth Fourier coefficient of a function f ∈ L_2(T) is given by f̂_k = ∫_0^1 f(x) exp(2πi kx) dx. The definition

‖f‖²_{H^{1,γ}} = f̂_0² + γ Σ_{k∈ℤ} |2πk|² f̂_k² = (∫_T f(x) dx)² + γ ∫_T f′(x)² dx (1)

for a function f in the univariate Sobolev space H 1 (T) = W 1,2 (T) ⊂ L 2 (T) of
functions with first weak derivatives bounded in L 2 gives a Hilbert space norm
 f  H 1,γ on H 1 (T) depending on the parameter γ > 0. The corresponding inner
product is given by
 1   1   1
( f, g) H 1,γ (T) = f (x) dx g(x) dx + γ f (x)g (x) dx.
0 0 0

We denote the Hilbert space H 1 (T) equipped with this inner product by H 1,γ (T).
Since H 1,γ (T) is continuously embedded in C 0 (T) it is a reproducing
kernel Hilbert space (RKHS), see [1], with a symmetric and positive definite kernel
K 1,γ : T × T → R, given by [20]

K_{1,γ}(x, y) := 1 + γ Σ_{k∈ℤ\{0}} |2πk|^{−2} exp(2πik(x − y)) = 1 + γ k(|x − y|), (2)

where k(t) = ½(t² − t + 1/6) is the Bernoulli polynomial of degree two divided by two.
This kernel has the property that it reproduces point evaluations in H¹, i.e. f(x) = (f(·), K(·, x))_{H^{1,γ}} for all f ∈ H¹. The reproducing kernel of the tensor product space H^{1,γ}_mix(T²) := H¹(T) ⊗ H¹(T) ⊂ C(T²) is the product of the univariate kernels, i.e.

K_{2,γ}(x, y) = K_{1,γ}(x_1, y_1) · K_{1,γ}(x_2, y_2)
  = 1 + γ k(|x_1 − y_1|) + γ k(|x_2 − y_2|) + γ² k(|x_1 − y_1|) k(|x_2 − y_2|). (3)

2.2 Quasi-Monte Carlo Cubature


A linear cubature algorithm Q_N(f) := (1/N) Σ_{i=1}^N f(x_i) with uniform weights 1/N on a point set P_N = {x_1, . . . , x_N} is called a QMC cubature rule. Well-known examples for point sets used in such quadrature methods are digital nets, see e.g. [4, 9], and lattice rules [15]. A two-dimensional integration lattice is a set of N points given as

{(i/N, ig/N mod 1) : i = 0, . . . , N − 1}

for some g ∈ {1, . . . , N − 1} coprime to N . A special case of such a rank-1 lattice


rule is the so called Fibonacci lattice that only exists for N being a Fibonacci number
Fn and is given by the generating vector (1, g) = (1, Fn−1 ), where Fn denotes the

nth Fibonacci number. It is well known that the Fibonacci lattices yield the optimal
rate of convergence in certain spaces of periodic functions.
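For concreteness, a rank-1 lattice, and in particular a Fibonacci lattice, can be generated in a few lines (Python sketch; helper name illustrative):

```python
import numpy as np

def rank1_lattice(N, g):
    """Two-dimensional rank-1 lattice {(i/N, (i*g mod N)/N) : i = 0, ..., N-1}."""
    i = np.arange(N)
    return np.column_stack((i / N, (i * g % N) / N))

# Fibonacci lattice with N = 13 points and generator g = 8 (consecutive Fibonacci numbers)
P13 = rank1_lattice(13, 8)
```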
In the setting of a reproducing kernel Hilbert space with kernel K on a general
domain D, the worst case error of the QMC-rule Q N can be computed as

wce(H, P_N)² = ∫_D ∫_D K(x, y) dx dy − (2/N) Σ_{i=1}^N ∫_D K(x_i, y) dy + (1/N²) Σ_{i,j=1}^N K(x_i, x_j),

which is the norm of the error functional, see e.g. [4, 11]. For the kernel K_{2,γ} we obtain

wce(H^{1,γ}_mix(T²), P_N)² = −1 + (1/N²) Σ_{i=1}^N Σ_{j=1}^N K_{2,γ}(x_i, x_j).

There is a close connection between the worst case error of integration wce(H^{1,γ}_mix(T²), P_N) for the case γ = 6 and the periodic L_2-discrepancy, which we will describe in the following.
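The double sum above is easy to evaluate directly. A small Python sketch (O(N²) cost, illustrative helper names): for a single point at the origin it returns (1 + γ/12)² − 1, whose square root 0.416667 matches the N = 1, γ = 1 entry of Table 1 in Sect. 4.3.

```python
import numpy as np

def k2(t):
    """k(t) = (t^2 - t + 1/6) / 2, the scaled Bernoulli polynomial of degree two."""
    return 0.5 * (t * t - t + 1.0 / 6.0)

def wce_squared(P, gamma):
    """Squared worst-case error  -1 + N^{-2} sum_{i,j} K_{2,gamma}(x_i, x_j)
    for an N x 2 array P of points in [0,1)^2."""
    dx = np.abs(P[:, 0][:, None] - P[:, 0][None, :])
    dy = np.abs(P[:, 1][:, None] - P[:, 1][None, :])
    return -1.0 + np.mean((1.0 + gamma * k2(dx)) * (1.0 + gamma * k2(dy)))

# Sanity check: a single point at the origin gives sqrt((1 + 1/12)^2 - 1) ~ 0.416667.
print(np.sqrt(wce_squared(np.zeros((1, 2)), 1.0)))
```

Combined with the rank1_lattice sketch above, np.sqrt(wce_squared(rank1_lattice(13, 8), 1.0)) should reproduce, up to rounding and torus symmetry, the N = 13 value 0.0355885 reported in Table 1.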

2.3 Periodic Discrepancy

The periodic L 2 -discrepancy is measured with respect to periodic boxes. In dimen-


sion d = 1, periodic intervals I (x, y) for x, y ∈ [0, 1) are given by

I (x, y) = [x, y) if x ≤ y and I (x, y) = [x, 1) ∪ [0, y) if x > y.

In dimension d > 1, the periodic boxes B(x, y) for x = (x1 , . . . , xd ) and y =


(y1 , . . . , yd ) ∈ [0, 1)d are products of the one-dimensional intervals, i.e.

B(x, y) = I (x1 , y1 ) × · · · × I (xd , yd ).

The discrepancy of a set P N = {x 1 , . . . , x N } ⊂ [0, 1)d with respect to such a


periodic box B = B(x, y) is the deviation of the relative number of points of P N
in B from the volume of B

D(P_N, B) = #(P_N ∩ B)/N − vol(B).

Finally, the periodic L 2 -discrepancy of P N is the L 2 -norm of the discrepancy func-


tion taken over all periodic boxes B = B(x, y), i.e.
D_2(P_N) = ( ∫_{[0,1)^d} ∫_{[0,1)^d} D(P_N, B(x, y))² dy dx )^{1/2}.

It turns out, see [11, p. 43] that the periodic L 2 -discrepancy can be computed as

D_2(P_N)² = −3^{−d} + (1/N²) Σ_{x,y∈P_N} K̃_d(x, y) = 3^{−d} wce(H^{1,6}_mix(T^d), P_N)²,

where K̃_d is the tensor product of d kernels K̃_1(x, y) = |x − y|² − |x − y| + ½. So minimizing the periodic L_2-discrepancy is equivalent to minimizing the worst case error in H^{1,γ}_mix for γ = 6. Let us also remark that the periodic L_2-discrepancy is (up to a factor) sometimes also called diaphony. This terminology was introduced in [22].
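Using the identity above, the periodic L_2-discrepancy for d = 2 follows directly from the wce_squared sketch given in Sect. 2.2 (again illustrative only):

```python
import numpy as np

def periodic_L2_discrepancy(P):
    """Periodic L2-discrepancy for d = 2 via D2(P_N)^2 = 3^{-2} * wce(H^{1,6}_mix, P_N)^2,
    reusing the wce_squared sketch from Sect. 2.2 above."""
    return np.sqrt(wce_squared(P, gamma=6.0)) / 3.0

# A single point at the origin gives sqrt(1.25)/3 ~ 0.372678, matching Table 1.
```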

3 Optimal Cubature Points

In this section we deal with (local) optimality conditions for a set of two-dimensional
points P N ≡ (x, y) ⊂ T2 , where x, y ∈ T N denote the vectors of the first and
second components of the points, respectively.

3.1 Optimization Problem

We want to minimize the squared worst case error

wce(H^{1,γ}_mix(T²), P_N)² = −1 + (1/N²) Σ_{i,j=0}^{N−1} K_{1,γ}(x_i, x_j) K_{1,γ}(y_i, y_j)

  = −1 + (1/N²) Σ_{i,j=0}^{N−1} [1 + γ k(|x_i − x_j|) + γ k(|y_i − y_j|) + γ² k(|x_i − x_j|) k(|y_i − y_j|)]

  = (γ/N²) Σ_{i,j=0}^{N−1} [k(|x_i − x_j|) + k(|y_i − y_j|) + γ k(|x_i − x_j|) k(|y_i − y_j|)]

  = γ(2k(0) + γ k(0)²)/N + (2γ/N²) Σ_{i=0}^{N−2} Σ_{j=i+1}^{N−1} [k(|x_i − x_j|) + k(|y_i − y_j|) + γ k(|x_i − x_j|) k(|y_i − y_j|)].

Thus, minimizing wce(H^{1,γ}_mix(T²), P_N)² is equivalent to minimizing either

F_γ(x, y) := Σ_{i=0}^{N−2} Σ_{j=i+1}^{N−1} [k(|x_i − x_j|) + k(|y_i − y_j|) + γ k(|x_i − x_j|) k(|y_i − y_j|)] (4)

or

G_γ(x, y) := Σ_{i,j=0}^{N−1} (1 + γ k(|x_i − x_j|))(1 + γ k(|y_i − y_j|)). (5)

For theoretical considerations we will sometimes use G_γ, while for the numerical implementation we will use F_γ as objective function, since it has fewer summands.
Let τ, σ ∈ S_N be two permutations of {0, 1, . . . , N − 1}. Define the sets

D_{τ,σ} = { x ∈ [0, 1)^N, y ∈ [0, 1)^N : x_{τ(0)} ≤ x_{τ(1)} ≤ ⋯ ≤ x_{τ(N−1)},  y_{σ(0)} ≤ y_{σ(1)} ≤ ⋯ ≤ y_{σ(N−1)} } (6)

on which all points maintain the same order in both components and hence it holds
|xi − x j | = si, j (xi − x j ) for si, j ∈ {−1, 1}. It follows that the restriction of Fγ to
Dτ,σ , i.e. Fγ (x, y)|Dτ,σ , is a polynomial of degree 4 in (x, y). Moreover, Fγ |Dτ,σ is
convex for sufficiently small γ .

Proposition 1 Fγ (x, y)|Dτ,σ and G γ (x, y)|Dτ,σ are convex if γ ∈ [0, 6].

Proof It is enough to prove the claim for

G_γ(x, y) = Σ_{i,j=0}^{N−1} (1 + γ k(|x_i − x_j|))(1 + γ k(|y_i − y_j|)).

Since the sum of convex functions is convex and since f(x − y) is convex if f is, it is enough to show that f(s, t) = (1 + γ k(s))(1 + γ k(t)) is convex for s, t ∈ [0, 1]. To this end, we show that the Hesse matrix H(f) is positive definite if 0 ≤ γ < 6. First, f_{ss} = γ(1 + γ k(t)) is positive if γ < 24. Hence it is enough to check that the determinant of H(f) is positive, which is equivalent to the inequality

(1 + γ k(s))(1 + γ k(t)) > γ² (s − ½)² (t − ½)².

So it remains to see that

1 + γ k(s) = 1 + (γ/2)(s² − s + 1/6) > γ (s − ½)².

But this is elementary to check for 0 ≤ γ < 6 and s ∈ [0, 1]. In the case γ = 6
the determinant of H ( f ) = 0 and some additional argument is necessary which we
omit here. 

Since

[0, 1)^N × [0, 1)^N = ∪_{(τ,σ)∈S_N×S_N} D_{τ,σ},

one can obtain the global minimum of F_γ on [0, 1)^N × [0, 1)^N by computing argmin_{(x,y)∈D_{τ,σ}} F_γ(x, y) for all (τ, σ) ∈ S_N × S_N and choose the global minimum as the smallest of all the local ones.

3.2 Using the Torus Symmetries

We now want to analyze how symmetries of the two dimensional torus T2 allow to
reduce the number of regions Dτ,σ for which the optimization problem has to be
solved.
The symmetries of the torus T2 which do not change the worst case error for the
considered classes of periodic functions are generated by
1. Shifts in the first coordinate x → x +c mod 1 and shifts in the second coordinate
y → y + c mod 1.
2. Reflection of the first coordinate x → 1−x and reflection of the second coordinate
y → 1 − y.
3. Interchanging the first coordinate x and the second coordinate y.
4. The points are indistinguishable, hence relabeling the points does not change the
worst case error.
Applying finite compositions of these symmetries to all the points in the point set
P N = {(x0 , y0 ), . . . , (x N −1 , y N −1 )} leads to an equivalent point set with the same
worst case integration error. This shows that the group of symmetries G acting on
the pairs (τ, σ ) indexing Dτ,σ generated by the following operations
1. replacing τ or σ by a shifted permutation: τ → (τ (0) + k mod N , . . . ,
τ (N − 1) + k mod N ) or σ → (σ (0) + k mod N , . . . , σ (N − 1) + k mod N )
2. replacing τ or σ by its flipped permutation: τ → (τ (N − 1), τ (N − 2), . . . , τ (1),
τ (0)) or σ → (σ (N − 1), σ (N − 2), . . . , σ (1), σ (0))
3. interchanging σ and τ : (τ, σ ) → (σ, τ )
4. applying a permutation π ∈ S N to both τ and σ : (τ, σ ) → (π τ, π σ )
lead to equivalent optimization problems. So let us call the pairs (τ, σ) and (τ′, σ′) in S_N × S_N equivalent if they are in the same orbit with respect to the action of G. In this case we write (τ, σ) ∼ (τ′, σ′).
Using the torus symmetries 1. and 4. it can always be arranged that τ = id and
σ (0) = 0, which together with fixing the point (x0 , y0 ) = (0, 0) leads to the sets

D_σ = { x ∈ [0, 1)^N, y ∈ [0, 1)^N : 0 = x_0 ≤ x_1 ≤ ⋯ ≤ x_{N−1},  0 = y_0 ≤ y_{σ(1)} ≤ ⋯ ≤ y_{σ(N−1)} }, (7)

where σ ∈ S N −1 denotes a permutation of {1, 2, . . . , N − 1}.


But there are many more symmetries and it would be algorithmically desirable
to cycle through exactly one representative of each equivalence class without ever
touching the other equivalent σ . This seems to be difficult to implement, hence we
settled for a little less which still reduces the amount of permutations to be handled
considerably.
To this end, let us define the symmetrized metric

d(i, j) = min{|i − j|, N − |i − j|} for 0 ≤ i, j ≤ N − 1 (8)

and the following subset of S N .


Definition 1 The set of semi-canonical permutations C N ⊂ S N consists of permu-
tations σ which fulfill
(i) σ (0) = 0
(ii) d(σ (1), σ (2)) ≤ d(0, σ (N − 1))
(iii) σ (1) = min {d(σ (i), σ (i + 1)) | i = 0, 1, . . . , N − 1}
(iv) σ is lexicographically smaller than σ −1 .
Here we identify σ (N ) with 0 = σ (0).
This means that σ is semi-canonical if the distance between 0 = σ (0) and σ (1)
is minimal among all distances between σ (i) and σ (i + 1), which can be arranged
by a shift. Moreover, the distance between σ (1) and σ (2) is at most as large as the
distance between σ (0) and σ (N − 1), which can be arranged by a reflection and a
shift if it is not the case. Hence we have obtained the following lemma.
Lemma 1 For any permutation σ ∈ S N with σ (0) = 0 there exists a semi-canonical
σ such that the sets Dσ and Dσ are equivalent up to torus symmetry.
Thus we need to consider only semi-canonical σ which is easy to do algorithmi-
cally.
Remark 1 If σ ∈ S N is semi-canonical, it holds σ (1) ≤ N /2.
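Enumerating C_N by brute force is straightforward for small N. In the Python sketch below, condition (iv) of Definition 1 is interpreted as "not lexicographically larger than σ^{−1}", so that self-inverse permutations such as the identity are admitted; with this reading the enumeration reproduces |C_4| = 2 from Table 1.

```python
from itertools import permutations

def dist(i, j, N):
    """Symmetrized distance (8) on {0, ..., N-1}."""
    return min(abs(i - j), N - abs(i - j))

def semi_canonical(N):
    """Brute-force enumeration of C_N (only practical for small N, N >= 2).
    Condition (iv) is read as 'not lexicographically larger than sigma^{-1}'."""
    out = []
    for rest in permutations(range(1, N)):
        s = (0,) + rest
        inv = [0] * N
        for i, si in enumerate(s):
            inv[si] = i
        d_consec = [dist(s[i], s[(i + 1) % N], N) for i in range(N)]
        if (dist(s[1], s[2 % N], N) <= dist(0, s[N - 1], N)   # condition (ii)
                and s[1] == min(d_consec)                      # condition (iii)
                and list(s) <= inv):                           # condition (iv)
            out.append(s)
    return out

# semi_canonical(4) returns [(0, 1, 2, 3), (0, 1, 3, 2)], matching |C_4| = 2 in Table 1.
```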
Another main advantage in considering our objective function only in domains
Dσ is that it is not only convex but strictly convex here. This is due to the fact that
we fix (x0 , y0 ) = (0, 0).
Proposition 2 Fγ (x, y)|Dσ and G γ (x, y)|Dσ are strictly convex if γ ∈ [0, 6].
Proof Again it is enough to prove the claim for

G_γ(x, y) = Σ_{i,j=0}^{N−1} (1 + γ k(|x_i − x_j|))(1 + γ k(|y_i − y_j|)).

Now we use that the sum of a convex and a strictly convex function is again strictly
convex. Hence it is enough to show that the function

N −1

f (x1 , . . . , x N −1 , y1 , . . . , y N −1 ) = (1 + γ k(|xi − x0 |))(1 + γ k(|yi − y0 |))
i=1
N −1

= (1 + γ k(xi ))(1 + γ k(yi ))
i=1

is strictly convex on [0, 1] N −1 × [0, 1] N −1 . In the proof of Proposition 1 it was


actually shown that f i (xi , yi ) = (1 + γ k(xi ))(1 + γ k(yi )) is strictly convex for
(xi , yi ) ∈ [0, 1]2 for each fixed i = 1, . . . , N − 1. Hence the strict convexity of f
follows from the following easily verified lemma. 

Lemma 2 Let f_i : D_i → ℝ, i = 1, . . . , m be strictly convex functions on the convex domains D_i ⊂ ℝ^{d_i}. Then the function

f : D = D_1 × ⋯ × D_m → ℝ,   (z_1, . . . , z_m) ↦ Σ_{i=1}^m f_i(z_i)

is strictly convex. □

Hence we have indeed a unique point in each Dσ where the minimum of Fγ is


attained.

3.3 Minimizing Fγ on Dσ

Our strategy will be to compute the local minimum of Fγ on each region


Dσ ⊂ [0, 1) N × [0, 1) N for all semi-canonical permutations σ ∈ C N ⊂ S N and
determine the global minimum by choosing the smallest of all the local ones.
This gives for each σ ∈ C N the constrained optimization problem

min Fγ (x, y) subject to vi (x) ≥ 0 and wi ( y) ≥ 0 for all i = 1, . . . , N − 1,


(x, y)∈Dσ
(9)
where the inequality constraints are linear and given by

vi (x) = xi − xi−1 and wi ( y) = yσ (i) − yσ (i−1) for i = 1, . . . , N − 1. (10)

In order to use the necessary (and due to local strict convexity also sufficient) con-
ditions for local minima

∂F_γ/∂x_k (x, y) = 0 and ∂F_γ/∂y_k (x, y) = 0 for k = 1, . . . , N − 1

for (x, y) ∈ Dσ we need to evaluate the partial derivatives of Fγ .

Proposition 3 For a given permutation σ ∈ C_N the partial derivative of F_γ|_{D_σ} with respect to the second component y is given by

∂F_γ(x, y)|_{D_σ}/∂y_k = y_k (Σ_{i=0, i≠k}^{N−1} c_{i,k}) − Σ_{i=0, i≠k}^{N−1} c_{i,k} y_i + ½ (Σ_{i=0}^{k−1} c_{i,k} s_{i,k} − Σ_{j=k+1}^{N−1} c_{k,j} s_{k,j}), (11)

where s_{i,j} = sgn(y_i − y_j) and c_{i,j} := 1 + γ k(|x_i − x_j|) = c_{j,i}.
Interchanging x and y the same result holds for the partial derivatives with respect to x with the obvious modification to c_{i,j} and the simplification that s_{i,j} = −1.
The second order derivatives with respect to y are given by

∂²F(x, y)|_{D_σ}/∂y_k∂y_j = { Σ_{i=0}^{k−1} c_{i,k} + Σ_{i=k+1}^{N−1} c_{i,k}  for j = k;   −c_{k,j}  for j ≠ k },   k, j ∈ {1, . . . , N − 1}. (12)

Again, the analogue for ∂²F(x, y)|_{D_σ}/∂x_k∂x_j is obtained with the obvious modification c_{i,j} = 1 + γ k(|y_i − y_j|).

Proof We prove the claim for the partial derivative with respect to y:


N −2 N
 −1
∂ ∂ ∂
Fγ (x, y) = k(|yi − y j |) 1 + γ k(|xi − x j |) + k(|xi − x j |)
∂ yk ∂ yk    ∂ yk
i=0 j=i+1
=:ci, j


N −2 N
 −1

= ci, j k(|yi − y j |)
∂ yk
i=0 j=i+1


N −2 N
 −1 ⎪
⎨si, j for i = k
= ci, j k (si, j (yi − y j )) · −si, j for j = k


i=0 j=i+1 0 else

N −1    k−1  
1 1
= ck, j sk, j sk, j (yk − y j ) − − ci,k si,k si,k (yi − yk ) −
2 2
j=k+1 i=0
⎛ ⎞
⎛ ⎞
−1 −1 −1
⎜ N ⎟ N 1 ⎝
k−1 
N

= yk ⎝ ⎟
ci,k ⎠ − ci,k yi + ci,k si,k − ck, j sk, j ⎠ .
2
i=0 i=0 i=0 j=k+1
i=k i=k

From this we immediately get the second derivative (12). 



3.4 Lower Bounds of Fγ on Dσ

Until now we are capable of approximating local minima of Fγ on a given Dσ . If this


is done for all σ ∈ C N we can obtain a candidate for a global minimum, but due to
the finite precision of floating point arithmetic one can never be sure to be close to the
actual global minimum. However, it is also possible to compute a lower bound for
the optimal point set for each Dσ using Wolfe-duality for constrained optimization.
It is known [12] that for a convex problem with linear inequality constraints like (9)
the Lagrangian

L_F(x, y, λ, μ) := F(x, y) − λᵀ v(x) − μᵀ w(y) (13)

  = F(x, y) − Σ_{i=1}^{N−1} (λ_i v_i(x) + μ_i w_i(y)) (14)

gives a lower bound on F, i.e.

min_{(x,y)∈D_σ} F(x, y) ≥ L_F(x̃, ỹ, λ, μ)

for all (x̃, ỹ, λ, μ) that fulfill the constraint

∇_{(x,y)} L_F(x̃, ỹ, λ, μ) = 0 and λ, μ ≥ 0 (component-wise). (15)

Here, ∇(x, y) = (∇ x , ∇ y ), where ∇ x denotes the gradient of a function with respect to


the variables in x. Hence it is our goal to find for each Dσ such an admissible point
( x̃, ỹ, λ, μ) which yields a lower bound that is larger than some given candidate for
the global minimum. If the relevant computations are carried out in infinite precision
rational number arithmetic these bounds are mathematically reliable.
In order to accomplish this we first have to compute the Lagrangian of (9). To this
end, let Pσ ∈ {−1, 0, 1}(N −1)×(N −1) denote the permutation matrix corresponding to
σ ∈ S N −1 and ⎛ ⎞
1 −1 0 . . . 0 0
⎜0 1 −1 . . . 0 0 ⎟
⎜ ⎟
⎜ .. ⎟ ∈ R(N −1)×(N −1) .
B := ⎜ ... ..
. . ⎟ (16)
⎜ ⎟
⎝0 . . . 0 1 −1⎠
0 ... 0 1

Then the partial derivatives of L F with respect to x and y are given by


∇_x L_F(x, y, λ, μ) = ∇_x F(x, y) − (λ_1 − λ_2, . . . , λ_{N−2} − λ_{N−1}, λ_{N−1})ᵀ = ∇_x F(x, y) − Bλ (17)

and

∇_y L_F(x, y, λ, μ) = ∇_y F(x, y) − (μ_{σ(1)} − μ_{σ(2)}, . . . , μ_{σ(N−2)} − μ_{σ(N−1)}, μ_{σ(N−1)})ᵀ = ∇_y F(x, y) − B P_σ μ. (18)

This leads to the following theorem.


Theorem 1 For σ ∈ C N and δ > 0 let the point ( x̃ σ , ỹσ ) ∈ Dσ fulfill

∂F/∂x_k (x̃_σ, ỹ_σ) = δ and ∂F/∂y_k (x̃_σ, ỹ_σ) = δ for k = 1, . . . , N − 1. (19)

Then

F(x, y) ≥ F(x̃_σ, ỹ_σ) − δ Σ_{i=1}^{N−1} [(N − i) · v_i(x̃_σ) + σ(N − i) w_i(ỹ_σ)] (20)

  > F(x̃_σ, ỹ_σ) − δ N² (21)

holds for all (x, y) ∈ Dσ .


Proof Choosing

λ = B −1 ∇ x F( x̃ σ , ỹσ ) and μ = Pσ−1 B −1 ∇ y F( x̃ σ , ỹσ ) (22)

yields
∇ x F( x̃, ỹ) = Bλ and ∇ y F( x̃, ỹ) = BPσ μ. (23)

A short computation shows that the inverse of B from (16) is given by


B^{−1} := \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 0 & 1 & \cdots & 1 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & 1 \end{pmatrix} ∈ ℝ^{(N−1)×(N−1)},

which yields λ, μ > 0 and hence by Wolfe duality gives (20). The second inequality (21) then follows from noting that both |v_i(x)| and |w_i(y)| are bounded by 1 and that Σ_{i=1}^{N−1} (N − i) = Σ_{i=1}^{N−1} σ(N − i) = Σ_{i=1}^{N−1} i, so the sum in (20) is at most 2 Σ_{i=1}^{N−1} i = N(N − 1) < N². □
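The formula for B^{−1} used in the proof is easy to verify numerically (Python sketch, illustrative):

```python
import numpy as np

def B_matrix(n):
    """The n x n bidiagonal matrix B of (16): 1 on the diagonal, -1 on the superdiagonal."""
    return np.eye(n) - np.diag(np.ones(n - 1), 1)

# Its inverse is the upper-triangular matrix of ones, as used in the proof of Theorem 1.
n = 6
assert np.allclose(np.linalg.inv(B_matrix(n)), np.triu(np.ones((n, n))))
```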
Now, suppose we had some candidate (x ∗ , y∗ ) ∈ Dσ ∗ for an optimal point set. If we
can find for all other σ ∈ C N points ( x̃ σ , ỹσ ) that fulfills (19) and

F( x̃ σ , ỹσ ) − δ N 2 ≥ Fγ (x ∗ , y∗ )

for some δ > 0, we can be sure that Dσ ∗ is (up to torus symmetry) the unique domain
Dσ that contains the globally optimal point set.

4 Numerical Investigation of Optimal Point Sets

In this section we numerically obtain optimal point sets with respect to the worst case error in H^1_mix. Moreover, we present a proof by exhaustion that these point
sets are indeed approximations to the unique (modulo torus symmetry) minimizers
of Fγ . Since integration lattices are local minima, if the Dσ containing the global
minimizer corresponds to an integration lattice, this integration lattice is the exact
global minimizer.

4.1 Numerical Minimization with Alternating Directions

In order to obtain the global minimum (x ∗ , y∗ ) of Fγ we are going to compute

σ* := argmin_{σ∈C_N} min_{(x,y)∈D_σ} F_γ(x, y), (24)

where the inner minimum has a unique solution due to Proposition 2. Moreover, since
Dσ is a convex domain we know that the local minimum of Fγ (x, y)|Dσ is not on
the boundary. Hence we can restrict our search for optimal point sets to the interior
of Dσ , where Fγ is differentiable.
Instead of directly employing a local optimization technique, we will make use
of the special structure of Fγ . While Fγ (x, y)|Dσ is a polynomial of degree four, the
functions
x → Fγ (x, y0 )|Dσ and y → Fγ (x 0 , y)|Dσ , (25)

where one coordinate direction is fixed, are quadratic polynomials, which have unique
minima in Dσ . We are going to use this property within an alternating minimization
approach. This means, that the objective function F is not minimized along all coor-
dinate directions simultaneously, but with respect to certain successively alternating
blocks of coordinates. If these blocks have size one this method is usually referred
to as coordinate descent [7] or nonlinear Gauß–Seidel method [5]. It is success-
fully employed in various applications, like e.g. expectation maximization or tensor
approximation [8, 19].
In our case we will alternate between minimizing Fγ (x, y) along the first coor-
dinate block x ∈ (0, 1) N −1 and the second one y ∈ (0, 1) N −1 , which can be done
exactly due to the quadratic polynomial property of the partial objectives (25). The
method is outlined in Algorithm 1, which for threshold-parameter δ = 0 approxi-
mates the local minimum of Fγ on Dσ . For δ > 0 it obtains feasible points that

Algorithm 1 Alternating minimization algorithm. For off-set δ = 0 it finds local


minima of Fγ . For δ > 0 it obtains feasible points used by Algorithm 2.
Given: Permutation σ ∈ CN , tolerance ε > 0 and off-set δ ≥ 0.
Initialize:
N −1 σ (1) σ (N −1)
1. x (0) := (0, 1
N ,..., N ) and y(0) = (0, N ,..., N ).
2. k := 0.
repeat
N N
1. compute H x := ∂xi ∂x j Fγ (x (k) , y(k) i, j=1 and ∇ x = ∂xi Fγ (x (k) , y(k) i=1 by (12) and (11).
2. Update x (k+1) := H −1 (∇ x + δ1) via Cholesky factorization.
x N N
3. compute H y := ∂ yi ∂ y j Fγ (x (k+1) , y(k) i, j=1 and ∇ y = ∂ yi Fγ (x (k+1) , y(k) i=1 .

4. Update y(k+1) := H −1 y ∇ y + δ1 via Cholesky factorization.
5. k := k + 1.

until ∇ x 2 + ∇ y 2 < ε
Output: point set (x, y) ∈ Dσ with ∇ x Fγ (x, y) ≈ δ1 and ∇ y Fγ (x, y) ≈ δ1.

fulfill (19), i.e. ∇(x, y) Fγ = (δ, . . . , δ) = δ1. Linear convergence of the alternating
optimization method for strictly convex functions was for example proven in [2, 13].
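The sketch below illustrates the generic block Gauss–Seidel idea behind Algorithm 1 (Python). It is not the algorithm itself: Algorithm 1 solves each block problem exactly via one Cholesky factorization of the quadratic model from (11)–(12) and works inside D_σ, whereas this sketch simply calls a general-purpose unconstrained optimizer on each block, purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def alternating_minimization(F, x0, y0, tol=1e-12, max_sweeps=200):
    """Generic nonlinear block Gauss-Seidel: alternately minimize F(x, y)
    over the block x (y fixed) and over the block y (x fixed) until the
    objective stops decreasing."""
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    val = F(x, y)
    for _ in range(max_sweeps):
        x = minimize(lambda u: F(u, y), x).x   # exact quadratic solve in Algorithm 1
        y = minimize(lambda v: F(x, v), y).x
        new_val = F(x, y)
        decreased = val - new_val > tol
        val = new_val
        if not decreased:
            break
    return x, y, val
```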

4.2 Obtaining Lower Bounds

By now we are able to obtain a point set (x ∗ , y∗ ) ∈ Dσ ∗ as a candidate for a global


minimum of Fγ by finding local minima on each Dσ , σ ∈ C N . On first sight we can
not be sure that we chose the right σ ∗ , because the value of min(x, y)∈Dσ Fγ (x, y) can
only be computed numerically.
On the other hand, Theorem 1 allows to compute lower bounds for all the other
domains Dσ with σ ∈ C N . If we were able to obtain for each σ a point ( x̃ σ , ỹσ ),
such that

min Fγ (x, y) ≈ θ N := Fγ (x ∗ , y∗ ) < L F ( x̃ σ , ỹσ ) − 2N 2 δ ≤ Fγ (x, y),


(x, y)∈Dσ ∗

we could be sure that the global optimum is indeed located in Dσ ∗ and (x ∗ , y∗ ) is a


good approximation to it. Luckily, this is the case. Of course certain computations
can not be done in standard double floating point arithmetic. Instead we use arbitrary
precision rational number (APR) arithmetic from the GNU Multiprecision library
GMP from http://www.gmplib.org. Compared to standard floating point arithmetic
in double precision this is very expensive, but it has only to be used at certain parts of
the algorithm. The resulting procedure is outlined in Algorithm 2, where we marked
those parts which require APR arithmetic.

Algorithm 2 Computation of lower bound on Dσ .


Given: Optimal point candidate P N := (x ∗ , y∗ ) ∈ Dσ with σ ∈ CN , tolerance ε > 0 and off-set
θ ≥ 0.
Initialize:
1. Compute θ N := Fγ (x ∗ , y∗ ) (in APR arithmetic).
2. Ξ N := ∅.
for all σ ∈ CN
1. Find ( x̃ σ , ỹσ ) ∈ Dσ s.t. ∇(x, y) Fγ ( x̃ σ , ỹσ ) ≈ δ1 by Algorithm 1.
2. Compute λ := B −1 ∇ x F( x̃ σ , ỹσ ) and μ := Pσ−1 B −1 ∇ y F( x̃ σ , ỹσ ) (in APR arithmetic).
3. Verify λ, μ > 0.
4. Evaluate βσ := L Fγ ( x̃ σ , ỹσ , λ, μ) (in APR arithmetic).
5. If ( βσ ≤ θ N ) Ξ N := Ξ N ∪ σ .
Output: Set Ξ of permutations σ in which Dσ contained a lower bound smaller than θ N .

4.3 Results

In Figs. 1 and 2 the optimal point sets for N = 2, . . . , 16 and both γ = 1 and γ = 6
are plotted. It can be seen that they are close to lattice point sets, which justifies using
them as start points in Algorithm 1. The distance to lattice points seems to be small
if γ is small.
In Table 1 we list the permutations σ for which Dσ contains an optimal set of
cubature points. In the second column the total number of semi-canonical permuta-
tions C_N that had to be considered is shown. It grows approximately like ½ (N − 2)!.
Moreover, we computed the minimal worst case error and periodic L 2 -discrepancies.
In some cases we found more than one semi-canonical permutation σ for which
Dσ contained a point set which yields the optimal worst case error. Nevertheless, they
represent equivalent permutations. In the following list, the torus symmetries used
to show the equivalency of the permutations are given. All operations are modulo 1.
• N = 7: (x, y) → (1 − y, x)
• N = 9: (x, y) → (y − 2/9, x − 1/9)
• N = 11: (x, y) → (y + 5/11, x − 4/11)
• N = 14: (x, y) → (x − 4/14, y + 6/14)
• N = 15: (x, y) → (y + 3/15, x + 2/15), (y − 2/15, 12/15 − x), (y − 6/15,
4/15 − x)
• N = 16: (x, y) → (1/16 − x, 3/16 − y)
In all the examined cases N ∈ {2, . . . , 16} Algorithm 2 produced sets Ξ N which
contained exactly the permutations that were previously obtained by Algorithm 1
and are listed in Table 1. Thus we can be sure, that the respective Dσ contained
minimizers of Fγ , which on each Dσ are unique. Hence we know that our numerical
approximation of the minimum is close to the true global minimum, which (modulo
torus symmetries) is unique. In the cases N = 1, 2, 3, 5, 7, 8, 12, 13 the obtained
global minima are integration lattices.

Fig. 1 Optimal point sets for N = 2, . . . , 16 and γ = 1



Fig. 2 Optimal point sets for N = 2, . . . , 16 and γ = 6



Table 1 List of semi-canonical permutations σ , such that Dσ contains an optimal set of cubature
points for N = 1, . . . , 16
N |CN | 1,1
wce(Hmix , P N∗ ) D2 (P N∗ ) σ∗ Lattice
1 0 0.416667 0.372678 (0) 
2 1 0.214492 0.212459 (0 1) 
3 1 0.146109 0.153826 (0 1 2) 
4 2 0.111307 0.121181 (0 1 3 2)
5 5 0.0892064 0.0980249 (0 2 4 1 3) 
6 13 0.0752924 0.0850795 (0 2 4 1 5 3)
7 57 0.0650941 0.0749072 (0 2 4 6 1 3 5), (0 
3 6 2 5 1 4)
8 282 0.056846 0.0651562 (0 3 6 1 4 7 2 5) 
9 1,862 0.0512711 0.0601654 (0 2 6 3 8 5 1 7 4),
(0 2 7 4 1 6 3 8 5)
10 14,076 0.0461857 0.054473 (0 3 7 1 4 9 6 2 8
5)
11 124,995 0.0422449 0.050152 (0 3 8 1 6 10 4 7 2
9 5), (0 3 9 5 1 7
10 4 8 2 6)
12 1,227,562 0.0370732 0.0456259 (0 5 10 3 8 1 6 11 
4 9 2 7)
13 13,481,042 0.0355885 0.0421763 (0 5 10 2 7 12 4 9 
1 6 11 3 8)
14 160,456,465 0.0333232 0.0400524 (0 5 10 2 8 13 4
11 6 1 9 3 12 7),
(0 5 10 3 12 7 1 9
4 13 6 11 2 8)
15 2,086,626,584 0.0312562 0.0379055 (0 4 9 13 6 1 11 3
8 14 5 10 2 12 7),
(0 5 11 2 7 14 9 3
12 6 1 10 4 13 8),
(0 5 11 2 8 13 4
10 1 6 14 9 3 12
7), (0 5 11 2 8 13
6 1 10 4 14 7 12 3
9)
16 29,067,602,676 0.0294507 0.0359673 (0 3 11 5 14 9 1 7
12 4 15 10 2 6 13
8), (0 3 11 6 13 1
9 4 15 7 12 2 10 5
14 8)

5 Conclusion

In the present paper we computed optimal point sets for quasi-Monte Carlo cubature
of bivariate periodic functions with mixed smoothness of order one by decomposing
the required global optimization problem into approximately (N − 2)!/2 local ones.
Moreover, we computed lower bounds for each local problem using arbitrary preci-
sion rational number arithmetic. Thereby we obtained that our approximation of the
global minimum is in fact close to the real solution.
In the special case of N being a Fibonacci number our approach showed that for
N ∈ {1, 2, 3, 5, 8, 13} the Fibonacci lattice is the unique global minimizer of the
1
worst case integration error in Hmix . We strongly conjecture that this is true for all
Fibonacci numbers. Also in the cases N = 7, 12, the global minimizer is the obtained
integration lattice.
In the future we are planning to prove that optimal points are close to lattice points. Moreover, we will investigate H^r_mix, i.e. Sobolev spaces with dominating mixed smoothness of order r ≥ 2, and other suitable kernels and discrepancies.

Acknowledgments The authors thank Christian Kuske and André Uschmajew for valuable hints
and discussions. Jens Oettershagen was supported by the Sonderforschungsbereich 1060 The Math-
ematics of Emergent Effects of the DFG.

References

1. Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68, 337–404 (1950)
2. Bezdek, J.C., Hathaway, R.J., Howard, R.E., Wilson, C.A., Windham, M.P.: Local convergence
analysis of a grouped variable version of coordinate descent. J. Optim. Theory Appl. 54(3),
471–477 (1987)
3. Bilyk, D., Temlyakov, V.N., Yu, R.: Fibonacci sets and symmetrization in discrepancy theory.
J. Complex. 28, 18–36 (2012)
4. Dick, J., Pillichshammer, F.: Digital Nets and Sequences: Discrepancy Theory and Quasi-Monte
Carlo Integration. Cambridge University Press, Cambridge (2010)
5. Grippo, L., Sciandrone, M.: On the convergence of the block nonlinear Gauβ-Seidel method
under convex constraints. Oper. Res. Lett. 26(3), 127–136 (2000)
6. Larcher, G., Pillichshammer, F.: A note on optimal point distributions in [0, 1)s . J. Comput.
Appl. Math. 206, 977–985 (2007)
7. Luo, Z.Q., Tseng, P.: On the convergence of the coordinate descent method for convex differ-
entiable minimization. J. Optim. Theory Appl. 72(1), 7–35 (1992)
8. McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions. Wiley series in probability
and statistics. Wiley, New York (1997)
9. Niederreiter, H.: Quasi-Monte Carlo Methods and Pseudo-Random Numbers, Society for
Industrial and Applied Mathematics (1987)
10. Niederreiter, H., Sloan, I.H.: Integration of nonperiodic functions of two variables by Fibonacci
lattice rules. J. Comput. Appl. Math. 51, 57–70 (1994)
11. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems. Volume II: Standard
Information for Functionals. European Mathematical Society Publishing House, Zürich (2010)
12. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)
13. Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables,
Society for Industrial and Applied Mathematics (1987)

14. Pillards, T., Vandewoestyne, B., Cools, R.: Minimizing the L 2 and L ∞ star discrepancies of a
single point in the unit hypercube. J. Comput. Appl. Math. 197, 282–285 (2006)
15. Sloan, I.H., Joe, S.: Lattice Methods for Multiple Integration. Oxford University Press, New
York and Oxford (1994)
16. Sós, V.T., Zaremba, S.K.: The mean-square discrepancies of some two-dimensional lattices.
Stud. Sci. Math. Hung. 14, 255–271 (1982)
17. Temlyakov, V.N.: Error estimates for Fibonacci quadrature formulae for classes of functions.
Trudy Mat. Inst. Steklov 200, 327–335 (1991)
18. Ullrich, T., Zung, D.: Lower bounds for the integration error for multivariate functions with
mixed smoothness and optimal Fibonacci cubature for functions on the square. Math. Nachr.
288(7), 743–762 (2015)
19. Uschmajew, A.: Local convergence of the alternating least squares algorithm for canonical
tensor approximation. SIAM J. Matrix Anal. Appl. 33(2), 639–652 (2012)
20. Wahba, G.: Smoothing noisy data with spline functions. Numer. Math. 24(5), 383–393 (1975)
21. White, B.E.: On optimal extreme-discrepancy point sets in the square. Numer. Math. 27, 157–
164 (1977)
22. Zinterhof, P.: Über einige Abschätzungen bei der Approximation von Funktionen mit Gle-
ichverteilungsmethoden. Österreich. Akad. Wiss. Math.-Naturwiss. Kl. S.-B. II 185, 121–132
(1976)
Adaptive Multidimensional Integration
Based on Rank-1 Lattices

Lluís Antoni Jiménez Rugama and Fred J. Hickernell

Abstract Quasi-Monte Carlo methods are used for numerically integrating mul-
tivariate functions. However, the error bounds for these methods typically rely on
a priori knowledge of some semi-norm of the integrand, not on the sampled func-
tion values. In this article, we propose an error bound based on the discrete Fourier
coefficients of the integrand. If these Fourier coefficients decay more quickly, the
integrand has less fine scale structure, and the accuracy is higher. We focus on rank-1
lattices because they are a commonly used quasi-Monte Carlo design and because
their algebraic structure facilitates an error analysis based on a Fourier decomposi-
tion of the integrand. This leads to a guaranteed adaptive cubature algorithm with
computational cost O(mbm ), where b is some fixed prime number and bm is the
number of data points.

Keywords Quasi-Monte Carlo methods · Multidimensional integration · Rank-1


lattices · Adaptive algorithms · Automatic algorithms

1 Introduction

Quasi-Monte Carlo (QMC) methods use equally weighted sums of integrand values
at carefully chosen nodes to approximate multidimensional integrals over the unit
cube,

(1/n) ∑_{i=0}^{n−1} f(z_i) ≈ ∫_{[0,1)^d} f(x) dx.

Ll.A. Jiménez Rugama (B) · F.J. Hickernell


Department of Applied Mathematics, Illinois Institute of Technology,
10 W 32nd Street, E1-208, Chicago, IL 60616, USA
e-mail: ljimene1@hawk.iit.edu
F.J. Hickernell
e-mail: hickernell@iit.edu


Integrals over more general domains may often be accommodated by a transformation


of the integration variable. QMC methods are widely used because they do not suffer
from a curse of dimensionality. The existence of QMC methods with dimension-
independent error convergence rates is discussed in [11, Chaps. 10–12]. See [3] for
a recent review.
The QMC convergence rate of O(n^{−(1−δ)}) does not give enough information about
the absolute error to determine how large n must be to satisfy a given error tolerance, ε.
The objective of this research is to develop a guaranteed QMC algorithm based on
rank-1 lattices that determines n adaptively by calculating a data-driven upper bound
on the absolute error. The Koksma–Hlawka inequality is impractical for this purpose
because it requires the total variation of the integrand. Our data-driven bound is
expressed in terms of the integrand’s discrete Fourier coefficients.
Sections 2–4 describe the group structure of rank-1 lattices and how the complex
exponential functions are an appropriate basis for these nodes. For computation
purposes, there is also an explanation of how to obtain the discrete Fourier transform
of f with an O(n log(n)) computational cost. New contributions are described in
Sects. 5 and 6. Initially, a mapping from N0 to the space of wavenumbers, Zd , is
defined according to constraints given by the structure of our rank-1 lattice node
sets. With this mapping, we define a set of integrands for which our new adaptive
algorithm is designed. This set is defined in terms of cone conditions satisfied by
the (true) Fourier coefficients of the integrands. These conditions make it possible
to derive an upper bound on the rank-1 lattice rule error in terms of the discrete
Fourier coefficients, which can be used to construct an adaptive algorithm. An upper
bound on the computational cost of this algorithm is derived. Finally, there is an
example of option pricing using the MATLAB implementation of our algorithm,
cubLattice_g, which is part of the Guaranteed Automatic Integration Library
[1]. A parallel development for Sobol’ cubature is given in [5].

2 Rank-1 Integration Lattices

Let b be a prime number, and let F_n := {0, …, n − 1} denote the set of the first n non-negative integers for any n ∈ N. The aim is to construct a sequence of embedded node sets with b^m points for m ∈ N_0:

{0} =: P_0 ⊂ P_1 ⊂ ··· ⊂ P_m := {z_i}_{i∈F_{b^m}} ⊂ ··· ⊂ P_∞ := {z_i}_{i∈N_0}.

Specifically, the sequence z_1, z_b, z_{b^2}, … ∈ [0, 1)^d is chosen such that

z_1 = b^{−1} a_0,   a_0 ∈ {1, …, b − 1}^d,   (1a)
z_{b^m} = b^{−1}(z_{b^{m−1}} + a_m) = b^{−1} a_m + ··· + b^{−m−1} a_0,   a_m ∈ F_b^d, m ∈ N.   (1b)

From this definition it follows that for all m ∈ N0 ,



b^ℓ z_{b^m} mod 1 = z_{b^{m−ℓ}} for ℓ = 0, …, m, and b^ℓ z_{b^m} mod 1 = 0 for ℓ = m + 1, m + 2, ….   (2)

Next, for any i ∈ N with proper b-ary expansion i = i_0 + i_1 b + i_2 b^2 + ···, and m = ⌊log_b(i)⌋ + 1, define

z_i := ∑_{ℓ=0}^{m−1} i_ℓ z_{b^ℓ} mod 1 = ∑_{ℓ=0}^{m−1} i_ℓ b^{m−1−ℓ} z_{b^{m−1}} mod 1 = j z_{b^{m−1}} mod 1,   where j = ∑_{ℓ=0}^{m−1} i_ℓ b^{m−1−ℓ},   (3)

where (2) was used. This means that the node set P_m defined above may be written as the integer multiples of the generating vector z_{b^{m−1}} since

P_m := {z_i}_{i∈F_{b^m}} = { z_{b^{m−1}} ∑_{ℓ=0}^{m−1} i_ℓ b^{m−1−ℓ} mod 1 : i_0, …, i_{m−1} ∈ F_b } = { j z_{b^{m−1}} mod 1 }_{j∈F_{b^m}}.

Integration lattices, L, are defined as discrete groups in R^d containing Z^d and closed under normal addition [13, Sects. 2.7 and 2.8]. The node set of an integration lattice is its intersection with the half-open unit cube, P := L ∩ [0, 1)^d. In this case, P is also a group, but this time under addition modulo 1, i.e., the operator ⊕ : [0, 1)^d × [0, 1)^d → [0, 1)^d defined by x ⊕ y := (x + y) mod 1, and where ⊖x := 1 − x.
Sets Pm defined above are embedded node sets of integration lattices. The suffi-
ciency of a single generating vector for each of these Pm is the reason that Pm is
called the node set of a rank-1 lattice. The theoretical properties of good embedded
rank-1 lattices for cubature are discussed in [6].
The set of d-dimensional integer vectors, Zd , is used to index Fourier series
expressions for the integrands, and Zd is also known as the wavenumber space.
We define the bilinear operation ⟨·, ·⟩ : Z^d × [0, 1)^d → [0, 1) as the dot product modulo 1:

⟨k, x⟩ := k^T x mod 1   ∀k ∈ Z^d, x ∈ [0, 1)^d.   (4)

This bilinear operation has the following properties: for all t, x ∈ [0, 1)d , k, l ∈ Zd ,
and a ∈ Z, it follows that

⟨k, 0⟩ = ⟨0, x⟩ = 0,   (5a)
⟨k, (ax mod 1) ⊕ t⟩ = (a⟨k, x⟩ + ⟨k, t⟩) mod 1,   (5b)
⟨ak + l, x⟩ = (a⟨k, x⟩ + ⟨l, x⟩) mod 1,   (5c)
⟨k, x⟩ = 0 ∀k ∈ Z^d ⟹ x = 0.   (5d)

An additional constraint placed on the embedded lattices is that

⟨k, z_{b^m}⟩ = 0 ∀m ∈ N_0 ⟹ k = 0.   (6)

The bilinear operation defined in (4) is also used to define the dual lattice corre-
sponding to Pm :

P_m^⊥ := {k ∈ Z^d : ⟨k, z_i⟩ = 0, i ∈ F_{b^m}} = {k ∈ Z^d : ⟨k, z_{b^{m−1}}⟩ = 0}   by (3) and (5b).   (7)

By this definition P_0^⊥ = Z^d, and the properties (2), (4), and (6) imply also that the P_m^⊥ are nested subgroups with

Z^d = P_0^⊥ ⊇ ··· ⊇ P_m^⊥ ⊇ ··· ⊇ P_∞^⊥ = {0}.   (8)

Analogous to the dual lattice definition, for j ∈ F_{b^m} one can define the dual cosets as P_m^{⊥,j} := {k ∈ Z^d : b^m⟨k, z_{b^{m−1}}⟩ = j}. Hence, a similar extended property to (8) applies:

P_m^{⊥,j} = ⋃_{a=0}^{b−1} P_{m+1}^{⊥,j+ab^m} ⟹ P_m^{⊥,j} ⊇ P_{m+1}^{⊥,j+ab^m},   a ∈ F_b, j ∈ F_{b^m}.   (9)

The overall dual coset structure can be represented as a tree, where {P_{m+1}^{⊥,j+ab^m}}_{a=0}^{b−1} are the children of P_m^{⊥,j}.

Fig. 1 Plots of (a) the node set P_6 depicted as •{z_0, z_1}, ×{z_2, z_3}, ∗{z_4, …, z_7}, {z_8, …, z_15}, +{z_16, …, z_31}, {z_32, …, z_63}, and (b) some of the dual lattice points, P_6^⊥ ∩ [−20, 20]^2

Figure 1 shows an example of a rank-1 lattice node set with 64 points in dimension
2 and its dual lattice. The parameters defining this node set are b = 2, m = 6, and
z_{32} = (1, 27)/64. It is useful to see how P_m = P_{m−1} ∪ {P_{m−1} + z_{2^{m−1}} mod 1}.
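As a concrete illustration of this construction (a hypothetical sketch, not part of the paper or of GAIL), the node set P_m can be generated as the integer multiples of the generating vector; the vector (1, 27)/64 below is the one of Fig. 1.

import numpy as np

def rank1_node_set(generating_vector, n):
    """Node set P_m = { j * z_{b^{m-1}} mod 1 : j = 0, ..., n - 1 } for n = b^m points."""
    j = np.arange(n).reshape(-1, 1)
    return (j * np.asarray(generating_vector)) % 1.0

# The 64-point lattice of Fig. 1: b = 2, m = 6, generating vector z_32 = (1, 27)/64.
P6 = rank1_node_set(np.array([1.0, 27.0]) / 64.0, 64)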

3 Fourier Series

The integrands considered here are absolutely continuous periodic functions. If the
integrand is not initially periodic, it may be periodized as discussed in [4, 12], or
[13, Sect. 2.12]. More general box domains may be considered, also by using variable
transformations, see e.g., [7, 8].

The L^2([0, 1)^d) inner product is defined as ⟨f, g⟩_2 = ∫_{[0,1)^d} f(x) \overline{g(x)} dx. The complex exponential functions {e^{2π√−1 ⟨k,·⟩}}_{k∈Z^d} form a complete orthonormal basis for L^2([0, 1)^d). So, any function in L^2([0, 1)^d) may be written as its Fourier series

f(x) = ∑_{k∈Z^d} f̂(k) e^{2π√−1 ⟨k,x⟩},   where f̂(k) = ⟨f, e^{2π√−1 ⟨k,·⟩}⟩_2,   (10)

and the inner product of two functions in L^2([0, 1)^d) is the ℓ^2 inner product of their series coefficients:

⟨f, g⟩_2 = ∑_{k∈Z^d} f̂(k) \overline{ĝ(k)} =: ⟨(f̂(k))_{k∈Z^d}, (ĝ(k))_{k∈Z^d}⟩_2.


Note that for any z ∈ P_m and k ∈ P_m^⊥, we have e^{2π√−1 ⟨k,z⟩} = 1. The special group structure of the lattice node set, P_m, leads to a useful formula for the average of any Fourier basis function over P_m. According to [10, Lemma 5.21],

(1/b^m) ∑_{i=0}^{b^m−1} e^{2π√−1 ⟨k,z_i⟩} = 1_{P_m^⊥}(k) = 1 if k ∈ P_m^⊥, and 0 if k ∈ Z^d \ P_m^⊥.   (11)

This property of the dual lattice is used below to describe the absolute error of a
shifted rank-1 lattice cubature rule in terms of the Fourier coefficients for wavenum-
bers in the dual lattice. For fixed Δ ∈ [0, 1)d , the cubature rule is defined as

Î_m(f) := (1/b^m) ∑_{i=0}^{b^m−1} f(z_i ⊕ Δ),   m ∈ N_0.   (12)
Note from this definition that Î_m(e^{2π√−1 ⟨k,·⟩}) = e^{2π√−1 ⟨k,Δ⟩} 1_{P_m^⊥}(k). The series decomposition defined in (10) and Eq. (11) are used in intermediate results from [10, Theorem 5.23] to show that

|∫_{[0,1)^d} f(x) dx − Î_m(f)| = |∑_{k∈P_m^⊥ \ {0}} f̂(k) e^{2π√−1 ⟨k,Δ⟩}| ≤ ∑_{k∈P_m^⊥ \ {0}} |f̂(k)|.   (13)
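To make (12) and (13) concrete, here is a minimal sketch (hypothetical code, not the authors' implementation) of the shifted lattice rule; the test integrand, whose exact integral is 1, and the random shift are arbitrary choices for this example.

import numpy as np

def shifted_lattice_rule(f, generating_vector, m, shift, b=2):
    """I^_m(f) = (1/b^m) * sum_i f(z_i (+) Delta), Eq. (12), with z_i = i * generating_vector mod 1."""
    n = b**m
    nodes = (np.outer(np.arange(n), generating_vector) + shift) % 1.0  # z_i (+) Delta
    return np.mean([f(x) for x in nodes])

# Example: f(x) = prod_j (1 + cos(2*pi*x_j)) over [0,1)^2 has integral 1.
rng = np.random.default_rng(0)
approx = shifted_lattice_rule(lambda x: np.prod(1.0 + np.cos(2.0 * np.pi * x)),
                              np.array([1.0, 27.0]) / 64.0, 6, rng.random(2))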

4 The Fast Fourier Transform for Function Values at Rank-1 Lattice Node Sets

Adaptive Algorithm 1 (cubLattice_g) constructed in Sect. 6 has an error analysis based on the above expression. However, the true Fourier coefficients are unknown and they must be approximated by the discrete coefficients, defined as:

f̃_m(k) := Î_m(e^{−2π√−1 ⟨k,·⟩} f(·))   (14a)
        = Î_m(e^{−2π√−1 ⟨k,·⟩} ∑_{l∈Z^d} f̂(l) e^{2π√−1 ⟨l,·⟩})
        = ∑_{l∈Z^d} f̂(l) Î_m(e^{2π√−1 ⟨l−k,·⟩})
        = ∑_{l∈Z^d} f̂(l) e^{2π√−1 ⟨l−k,Δ⟩} 1_{P_m^⊥}(l − k)
        = ∑_{l∈P_m^⊥} f̂(k + l) e^{2π√−1 ⟨l,Δ⟩}
        = f̂(k) + ∑_{l∈P_m^⊥ \ {0}} f̂(k + l) e^{2π√−1 ⟨l,Δ⟩},   ∀k ∈ Z^d.   (14b)

Thus, the discrete transform f˜m (k) equals the integral transform fˆ(k), defined in
(10), plus aliasing terms corresponding to fˆ(k + l) scaled by the shift, Δ, where
l ∈ Pm⊥ \ {0}.
To facilitate the calculation of f̃_m(k), we define the map ν̃_m : Z^d → F_{b^m} as follows:

ν̃_0(k) := 0,   ν̃_m(k) := b^m ⟨k, z_{b^{m−1}}⟩,   m ∈ N.   (15)
A simple but useful remark is that P_m^{⊥,j} corresponds to all k ∈ Z^d such that ν̃_m(k) = j for j ∈ F_{b^m}. The above definition implies that ⟨k, z_i⟩, appearing in f̃_m(k), may be written as
⟨k, z_i⟩ = ⟨k, ∑_{ℓ=0}^{m−1} i_ℓ z_{b^ℓ} mod 1⟩ = ∑_{ℓ=0}^{m−1} i_ℓ ⟨k, z_{b^ℓ}⟩ mod 1 = ∑_{ℓ=0}^{m−1} i_ℓ ν̃_{ℓ+1}(k) b^{−ℓ−1} mod 1.   (16)

The map ν̃_m depends on the choice of the embedded rank-1 lattice node sets defined in (1) and (3). We can confirm that the right-hand side of this definition lies in F_{b^m} by appealing to (1) and recalling that the a_ℓ are integer vectors:

b^m ⟨k, z_{b^{m−1}}⟩ = b^m [(b^{−1} k^T a_{m−1} + ··· + b^{−m} k^T a_0) mod 1] = (b^{m−1} k^T a_{m−1} + ··· + k^T a_0) mod b^m ∈ F_{b^m},   m ∈ N.

Moreover, note that for all m ∈ N

ν̃_{m+1}(k) − ν̃_m(k) = b^{m+1} ⟨k, z_{b^m}⟩ − b^m ⟨k, z_{b^{m−1}}⟩
                     = b^m [b ⟨k, z_{b^m}⟩ − ⟨k, z_{b^{m−1}}⟩]
                     = b^m [a + ⟨k, b z_{b^m} mod 1⟩ − ⟨k, z_{b^{m−1}}⟩],   for some a ∈ F_b
                     = b^m [a + ⟨k, z_{b^{m−1}}⟩ − ⟨k, z_{b^{m−1}}⟩],   by (2)
                     = a b^m   for some a ∈ F_b.   (17)

For all ν ∈ N_0 with proper b-ary expansion ν = ν_0 + ν_1 b + ···, let ν̄^m denote the integer obtained by keeping only the first m terms of its b-ary expansion, i.e.,

ν̄^m := ν_0 + ··· + ν_{m−1} b^{m−1} = [(b^{−m} ν) mod 1] b^m ∈ F_{b^m}.   (18)

The derivation in (17) means that if ν̃_m(k) = ν ∈ F_{b^m}, then

ν̃_ℓ(k) = ν̄^ℓ,   ℓ = 1, …, m.   (19)

Letting y_i := f(z_i ⊕ Δ) for i ∈ N_0 and considering (16), the discrete Fourier transform defined in (14a) can now be written as follows:

f̃_m(k) := Î_m(e^{−2π√−1 ⟨k,·⟩} f(·)) = (1/b^m) ∑_{i=0}^{b^m−1} e^{−2π√−1 ⟨k, z_i⊕Δ⟩} y_i = e^{−2π√−1 ⟨k,Δ⟩} Y_m(ν̃_m(k)),   m ∈ N_0, k ∈ Z^d,   (20)

where for all m, ν ∈ N_0,

Y_m(ν) := (1/b^m) ∑_{i_{m−1}=0}^{b−1} ··· ∑_{i_0=0}^{b−1} y_{i_0+···+i_{m−1}b^{m−1}} exp(−2π√−1 ∑_{ℓ=0}^{m−1} i_ℓ ν̄^{ℓ+1} b^{−ℓ−1}) = Y_m(ν̄^m).

The quantity Y_m(ν), ν ∈ F_{b^m}, which is essentially the discrete Fourier transform, can be computed efficiently via some intermediate quantities. For p ∈ {0, …, m − 1}, m, ν ∈ N_0 define Y_{m,0}(i_0, …, i_{m−1}) := y_{i_0+···+i_{m−1}b^{m−1}} and let

Y_{m,m−p}(ν, i_{m−p}, …, i_{m−1}) := (1/b^{m−p}) ∑_{i_{m−p−1}=0}^{b−1} ··· ∑_{i_0=0}^{b−1} y_{i_0+···+i_{m−1}b^{m−1}} exp(−2π√−1 ∑_{ℓ=0}^{m−p−1} i_ℓ ν̄^{ℓ+1} b^{−ℓ−1}).

Note that Y_{m,m−p}(ν, i_{m−p}, …, i_{m−1}) = Y_{m,m−p}(ν̄^{m−p}, i_{m−p}, …, i_{m−1}), and thus it takes on only b^m distinct values. Also note that Y_{m,m}(ν) = Y_m(ν). For p = m − 1, …, 0, compute

Y_{m,m−p}(ν, i_{m−p}, …, i_{m−1}) = (1/b^{m−p}) ∑_{i_{m−p−1}=0}^{b−1} ··· ∑_{i_0=0}^{b−1} y_{i_0+···+i_{m−1}b^{m−1}} exp(−2π√−1 ∑_{ℓ=0}^{m−p−1} i_ℓ ν̄^{ℓ+1} b^{−ℓ−1})
                                  = (1/b) ∑_{i_{m−p−1}=0}^{b−1} Y_{m,m−p−1}(ν, i_{m−p−1}, …, i_{m−1}) exp(−2π√−1 i_{m−p−1} ν̄^{m−p} b^{−m+p}).

For each p one must perform O(b^m) operations, so the total computational cost to obtain Y_m(ν) for all ν ∈ F_{b^m} is O(m b^m).
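For checking purposes, the discrete coefficients can also be evaluated directly from the definition (14a); the following sketch (hypothetical code, O(b^{2m}) cost rather than the O(m b^m) recursion above) assumes the generating vector z_{b^{m−1}} and the shift Δ are given.

import numpy as np

def discrete_coefficient(f, k, generating_vector, m, shift, b=2):
    """Naive f~_m(k) = (1/b^m) sum_i exp(-2 pi sqrt(-1) <k, z_i>) f(z_i (+) Delta), Eq. (14a)."""
    n = b**m
    nodes = np.outer(np.arange(n), generating_vector) % 1.0   # z_i
    y = np.array([f(x) for x in (nodes + shift) % 1.0])       # y_i = f(z_i (+) Delta)
    phase = (nodes @ np.asarray(k, dtype=float)) % 1.0        # <k, z_i> = k^T z_i mod 1
    return np.mean(np.exp(-2j * np.pi * phase) * y)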

5 Error Estimation

As seen in Eq. (13), the absolute error is bounded by a sum of the absolute value of
the Fourier coefficients in the dual lattice. Note that increasing the number of points
in our lattice, i.e. increasing m, removes wavenumbers from the set over which this
summation is defined. However, it is not obvious how fast this error decreases
with respect to m. Rather than deal with a sum over the vector wavenumbers, it is
more convenient to sum over scalar non-negative integers. Thus, we define another
mapping k̃ : N0 → Zd .

Definition 1 Given a sequence of points in embedded lattices, P_∞ = {z_i}_{i=0}^∞, define k̃ : N_0 → Z^d one-to-one and onto recursively as follows:
Set k̃(0) = 0.
For m ∈ N_0
  For κ ∈ F_{b^m},
    Let a ∈ F_b be such that ν̃_{m+1}(k̃(κ)) = ν̃_m(k̃(κ)) + a b^m.
    (i) If a ≠ 0, choose k̃(κ + a b^m) ∈ {k ∈ Z^d : ν̃_{m+1}(k) = ν̃_m(k̃(κ))}.
    (ii) Choose k̃(κ + a′ b^m) ∈ {k ∈ Z^d : ν̃_{m+1}(k) = ν̃_m(k̃(κ)) + a′ b^m}, for a′ ∈ {1, …, b − 1}\{a}.

Definition 1 is intended to reflect the embedding of the dual cosets described in (8) and (9). For clarity, consider ν̃_m(k̃(κ)) = j. In (i), if k̃(κ) ∈ P_{m+1}^{⊥,j+ab^m} with a > 0, we choose k̃(κ + a b^m) ∈ P_{m+1}^{⊥,j}. Otherwise by (ii), we simply choose k̃(κ + a′ b^m) ∈ P_{m+1}^{⊥,j+a′b^m}. Condition (i) forces us to pick wavenumbers in P_{m+1}^{⊥,j}.
This mapping is not uniquely defined and one has the flexibility to choose part
of it. For example, defining a norm such as in [13, Chap. 4] one can assign smaller
values of κ to smaller wavenumbers k. In the end, our goal is to define this mapping
such that fˆ( k̃(κ)) → 0 as κ → ∞. In addition, it is one-to-one since at each step the
new values k̃(κ + abm ) or k̃(κ + a  bm ) are chosen from sets of wavenumbers that
exclude those wavenumbers already assigned to k̃(κ). The mapping can be made
onto by choosing the “smallest” wavenumber in some sense.
It remains to be shown that for any κ ∈ F_{b^m}, {k ∈ Z^d : ν̃_{m+1}(k) = ν̃_m(k̃(κ)) + a′ b^m} is nonempty for all a′ ∈ F_b with a′ ≠ a. Choose l such that ⟨l, z_1⟩ = b^{−1}. This is possible because z_1 = b^{−1} a_0 ≠ 0. For any m ∈ N_0, κ ∈ F_{b^m}, and a″ ∈ F_b, note that

⟨k̃(κ) + a″ b^m l, z_{b^m}⟩ = (⟨k̃(κ), z_{b^m}⟩ + a″ b^m ⟨l, z_{b^m}⟩) mod 1   by (5c)
                           = [b^{−m−1} ν̃_{m+1}(k̃(κ)) + a″ ⟨l, b^m z_{b^m} mod 1⟩] mod 1   by (5b) and (15)
                           = [b^{−m−1} ν̃_m(k̃(κ)) + a b^{−1} + a″ ⟨l, z_1⟩] mod 1   by (2)
                           = [b^{−m−1} ν̃_m(k̃(κ)) + (a + a″) b^{−1}] mod 1.

Then it follows that

ν̃_{m+1}(k̃(κ) + a″ b^m l) = ν̃_m(k̃(κ)) + ((a + a″) mod b) b^m   by (15).

By choosing a″ such that a′ = (a + a″) mod b, we have shown that for κ ∈ F_{b^m} the set {k ∈ Z^d : ν̃_{m+1}(k) = ν̃_m(k̃(κ)) + a′ b^m} is nonempty.
To illustrate the initial steps of a possible mapping, consider the lattice in Fig. 1 and Table 1. For m = 0, κ ∈ {0} and a = 0. This skips (i) and implies k̃(1) ∈ {k ∈ Z^d : ν̃_1(k) = 2⟨k, (1, 27)/2⟩ = 1}, so one may choose k̃(1) := (−1, 0). After that, m = 1 and κ ∈ {0, 1}. Starting with κ = 0, again a = 0 and we jump to (ii) where we require k̃(2) ∈ {k ∈ Z^d : ν̃_2(k) = 4⟨k, (1, 27)/4⟩ = 2} and thus, we can take k̃(2) := (−1, 1). When κ = 1, we note that ν̃_2(k̃(1)) = ν̃_2((−1, 0)) = 3. Here a = 1, leading to (i) and k̃(3) ∈ {k ∈ Z^d : ν̃_2(k) = 1}, so we may choose k̃(3) := (1, 0). Continuing, we may take k̃(4) := (−1, −1), k̃(5) := (0, 1), k̃(6) := (1, −1) and k̃(7) := (0, −1).
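The map ν̃_m is easy to evaluate numerically. The following sketch (hypothetical code, not from the paper) reproduces entries of Table 1 below for the lattice of Fig. 1, where the arguments (1, 27)/2 and (1, 27)/4 correspond to z_1 and z_2.

import numpy as np

def nu_tilde(k, m, z_gen_prev, b=2):
    """nu~_m(k) = b^m <k, z_{b^{m-1}}>, Eq. (15), with <.,.> the dot product modulo 1."""
    return int(round(b**m * (np.dot(k, z_gen_prev) % 1.0)))

print(nu_tilde((-1, 0), 1, np.array([1.0, 27.0]) / 2.0))  # 1, matching Table 1
print(nu_tilde((-1, 1), 2, np.array([1.0, 27.0]) / 4.0))  # 2, matching Table 1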

Lemma 1 The map in Definition 1 has the property that for m ∈ N_0 and κ ∈ F_{b^m},

{k̃(κ + λb^m)}_{λ=0}^∞ = {l ∈ Z^d : k̃(κ) − l ∈ P_m^⊥}.

Table 1 Values ν̃_1, ν̃_2 and ν̃_3 for some wavenumbers and a possible assignment of k̃(κ)

k̃(κ)      κ    ν̃_1(k̃(κ)) = 2⟨k̃(κ), (1,27)/2⟩    ν̃_2(k̃(κ)) = 4⟨k̃(κ), (1,27)/4⟩    ν̃_3(k̃(κ)) = 8⟨k̃(κ), (1,27)/8⟩
(0, 0)     0         0                                 0                                  0
(−1, −1)   4         0                                 0                                  4
(−1, 1)    2         0                                 2                                  2
(1, −1)    6         0                                 2                                  6
(−1, 0)    1         1                                 3                                  7
(1, 0)     3         1                                 1                                  1
(0, −1)    7         1                                 1                                  5
(0, 1)     5         1                                 3                                  3
(1, 1)     ···       0                                 0                                  4

The reader should notice that ν̃_{m+1}(k̃(κ)) − ν̃_m(k̃(κ)) is either 0 or 2^m

Proof This statement holds trivially for m = 0 and κ = 0. For m ∈ N it is noted that

k − l ∈ P_m^⊥ ⟺ ⟨k − l, z_{b^{m−1}}⟩ = 0   by (7)
             ⟺ ⟨k, z_{b^{m−1}}⟩ = ⟨l, z_{b^{m−1}}⟩   by (5c)
             ⟺ b^{−m} ν̃_m(k) = b^{−m} ν̃_m(l)   by (15)
             ⟺ ν̃_m(k) = ν̃_m(l).   (21)

This implies that for all m ∈ N and κ ∈ F_{b^m},

{l ∈ Z^d : ν̃_m(l) = ν̃_m(k̃(κ))} = {l ∈ Z^d : k̃(κ) − l ∈ P_m^⊥}.   (22)

By Definition 1 it follows that for m ∈ N and κ ∈ F_{b^m},

{k̃(κ + λb^m)}_{λ=0}^{b−1} ⊆ {k ∈ Z^d : ν̃_{m+1}(k) = ν̃_m(k̃(κ)) + ab^m, a ∈ F_b} = {k ∈ Z^d : ν̃_m(k) = ν̃_m(k̃(κ))}.

Applying property (19) on the right side,

{k̃(κ + λb^m)}_{λ=0}^{b−1} ⊆ {k ∈ Z^d : ν̃_ℓ(k) = ν̃_ℓ(k̃(κ̄^ℓ))},   ∀ℓ = 1, …, m.

Because one can say the above equation holds ∀ℓ = 1, …, n < m, the left hand side can be extended,

{k̃(κ + λb^m)}_{λ=0}^∞ ⊆ {k ∈ Z^d : ν̃_m(k) = ν̃_m(k̃(κ))}.   (23)

Now suppose that l is any element of {k ∈ Z^d : ν̃_m(k) = ν̃_m(k̃(κ))}. Since the map k̃ is onto, there exists some κ′ ∈ N_0 such that l = k̃(κ′). Choose λ′ such that κ′ = κ̄′^m + λ′ b^m, where the overbar notation was defined in (18). According to (23) it follows that ν̃_m(k̃(κ̄′^m + λ′ b^m)) = ν̃_m(k̃(κ̄′^m)), and hence ν̃_m(k̃(κ̄′^m)) = ν̃_m(l) = ν̃_m(k̃(κ)). Since κ̄′^m and κ are both in F_{b^m}, this implies that κ̄′^m = κ, and so l ∈ {k̃(κ + λb^m)}_{λ=0}^∞. Thus, {k̃(κ + λb^m)}_{λ=0}^∞ ⊇ {k ∈ Z^d : ν̃_m(k) = ν̃_m(k̃(κ))}, and the lemma is proved. □
For convenience we adopt the notation f̂_κ := f̂(k̃(κ)) and f̃_{m,κ} := f̃_m(k̃(κ)). Then, by Lemma 1 the error bound in (13) may be written as

|∫_{[0,1)^d} f(x) dx − Î_m(f)| ≤ ∑_{λ=1}^∞ |f̂_{λb^m}|,   (24)

and the aliasing relationship in (14b) becomes

f̃_{m,κ} = f̂_κ + ∑_{λ=1}^∞ f̂_{κ+λb^m} e^{2π√−1 ⟨k̃(κ+λb^m) − k̃(κ), Δ⟩}.   (25)

Given an integrand with absolutely summable Fourier coefficients, consider the following sums defined for ℓ, m ∈ N_0, ℓ ≤ m:

S_m(f) = ∑_{κ=⌊b^{m−1}⌋}^{b^m−1} |f̂_κ|,     Ŝ_{ℓ,m}(f) = ∑_{κ=⌊b^{ℓ−1}⌋}^{b^ℓ−1} ∑_{λ=1}^∞ |f̂_{κ+λb^m}|,

Š_m(f) = Ŝ_{0,m}(f) + ··· + Ŝ_{m,m}(f) = ∑_{κ=b^m}^∞ |f̂_κ|,     S̃_{ℓ,m}(f) = ∑_{κ=⌊b^{ℓ−1}⌋}^{b^ℓ−1} |f̃_{m,κ}|.

Note that S̃_{ℓ,m}(f) is the only one that can be observed from data because it involves the discrete transform coefficients. In fact, from (20) one can identify |f̃_{m,κ}| = |Y_m(ν̃_m(k̃(κ)))|, and our adaptive algorithm will be based on this sum bounding the other three, S_m(f), Ŝ_{ℓ,m}(f), and Š_m(f), which cannot be readily observed.
Let ℓ_* ∈ N be some fixed integer and ω̂ and ω̊ be some bounded non-negative valued functions. We define a cone, C, of absolutely continuous functions whose Fourier coefficients decay according to certain inequalities:

C := {f ∈ AC([0, 1)^d) : Ŝ_{ℓ,m}(f) ≤ ω̂(m − ℓ) Š_m(f), ℓ ≤ m,   Š_m(f) ≤ ω̊(m − ℓ) S_ℓ(f), ℓ_* ≤ ℓ ≤ m}.   (26)

We also require the existence of r such that ω̂(r) ω̊(r) < 1 and that lim_{m→∞} ω̊(m) = 0. This set is a cone, i.e. f ∈ C ⟹ a f ∈ C ∀a ∈ R, but it is not convex. A wider discussion on the advantages and disadvantages of designing numerical algorithms for cones of functions can be found in [2].

Fig. 2 The magnitudes of true Fourier coefficients for some integrand

Functions in C have Fourier coefficients that do not oscillate wildly. According to (24), the error of our integration is bounded by Ŝ_{0,m}(f). Nevertheless, for practical purposes we will use S_ℓ(f) as an indicator for the error. Intuitively, the cone conditions enforce these two sums to follow a similar trend. Thus, one can expect that small values of S_ℓ(f) imply small values of Ŝ_{0,m}(f).
The first inequality controls how an infinite sum of some of the larger wavenumber coefficients is bounded above by a sum of all the surrounding coefficients. The second inequality controls how the sum of these surrounding coefficients is bounded above by a finite sum of some smaller wavenumber Fourier coefficients. In Fig. 2 we can see how S_8(f) can be used to bound Š_{12}(f) and Š_{12}(f) to bound Ŝ_{0,12}(f). The latter sum also corresponds to the error bound in (24).
For small ℓ the sum S_ℓ(f) includes only a few summands. Therefore, it could accidentally happen that S_ℓ(f) is too small compared to Š_m(f). To avoid this possibility, the cone definition includes the constraint that ℓ is greater than some minimum ℓ_*.
Because we do not assume the knowledge of the true Fourier coefficients, for functions in C we need bounds on S_ℓ(f) in terms of the sum of the discrete coefficients S̃_{ℓ,m}(f). This is done by applying (25), and the definition of the cone in (26):
 

S_ℓ(f) = ∑_{κ=⌊b^{ℓ−1}⌋}^{b^ℓ−1} |f̂_κ| = ∑_{κ=⌊b^{ℓ−1}⌋}^{b^ℓ−1} |f̃_{m,κ} − ∑_{λ=1}^∞ f̂_{κ+λb^m} e^{2π√−1 ⟨k̃(κ+λb^m) − k̃(κ), Δ⟩}|
       ≤ ∑_{κ=⌊b^{ℓ−1}⌋}^{b^ℓ−1} |f̃_{m,κ}| + ∑_{κ=⌊b^{ℓ−1}⌋}^{b^ℓ−1} ∑_{λ=1}^∞ |f̂_{κ+λb^m}| = S̃_{ℓ,m}(f) + Ŝ_{ℓ,m}(f)
       ≤ S̃_{ℓ,m}(f) + ω̂(m − ℓ) ω̊(m − ℓ) S_ℓ(f)   (27)

and provided that ω̂(m − ℓ) ω̊(m − ℓ) < 1,


S_ℓ(f) ≤ S̃_{ℓ,m}(f) / (1 − ω̂(m − ℓ) ω̊(m − ℓ)).   (28)

By (24) and the cone conditions, (28) implies a data-based error bound:

|∫_{[0,1)^d} f(x) dx − Î_m(f)| ≤ ∑_{λ=1}^∞ |f̂_{λb^m}| = Ŝ_{0,m}(f) ≤ ω̂(m) Š_m(f)
                               ≤ ω̂(m) ω̊(m − ℓ) S_ℓ(f)
                               ≤ [ω̂(m) ω̊(m − ℓ) / (1 − ω̂(m − ℓ) ω̊(m − ℓ))] S̃_{ℓ,m}(f).   (29)

In Sect. 6 we construct an adaptive algorithm based on this conservative bound.

6 An Adaptive Algorithm Based on Cones of Integrands

Inequality (29) suggests the following algorithm. First, choose ℓ_* and fix r := m − ℓ ∈ N such that ω̂(r) ω̊(r) < 1 for ℓ ≥ ℓ_*. Then, define

C(m) := ω̂(m) ω̊(r) / (1 − ω̂(r) ω̊(r)).

The choice of the parameter r is important. Larger r means a smaller C(m), but it
also makes the error bound more dependent on smaller indexed Fourier coefficients.
Algorithm 1 (Adaptive Rank-1 Lattice Cubature, cubLattice_g) Fix r and ℓ_*, and ω̂ and ω̊ describing C in (26). Given a tolerance, ε, initialize m = ℓ_* + r and do:

Step 1. According to Sect. 4, compute S̃_{m−r,m}(f).
Step 2. Check whether C(m) S̃_{m−r,m}(f) ≤ ε. If true, return Î_m(f) defined in (12). If not, increment m by one, and go to Step 1.
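The loop structure of Algorithm 1 can be sketched as follows (hypothetical code; the authors' actual implementation is cubLattice_g in GAIL [1]). The helpers S_tilde(f, ell, m), assumed to return the observable sum S̃_{ℓ,m}(f) computed from Y_m as in Sect. 4, and I_hat(f, m), assumed to return the shifted rule (12) with b^m points, are assumptions of this sketch.

def cub_lattice_sketch(f, eps, ell_star, r, C, S_tilde, I_hat, m_max=24):
    """Schematic version of Algorithm 1; m_max is a practical safeguard, not part of the algorithm."""
    m = ell_star + r
    while m <= m_max:
        if C(m) * S_tilde(f, m - r, m) <= eps:  # Step 2: data-based error bound (29)
            return I_hat(f, m)                  # guaranteed for integrands in the cone C
        m += 1                                  # otherwise take b times more points
    raise RuntimeError("sample budget exceeded")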
Theorem 1 For m = min{m′ ≥ ℓ_* + r : C(m′) S̃_{m′−r,m′}(f) ≤ ε}, Algorithm 1 is successful whenever f ∈ C,

|∫_{[0,1)^d} f(x) dx − Î_m(f)| ≤ ε.

Thus, the number of function data needed is b^m. Defining m^* = min{m′ ≥ ℓ_* + r : C(m′)[1 + ω̂(r) ω̊(r)] S_{m′−r}(f) ≤ ε}, we also have b^m ≤ b^{m^*}. This means that the computational cost can be bounded,

cost(Î_m, f, ε) ≤ $(f) b^{m^*} + c m^* b^{m^*},

where $(f) is the cost of evaluating f at one data point.



Proof By construction, the algorithm must be successful. Recall that the inequality
used for building the algorithm is (29).
To find the upper bound on the computational cost, a similar result to (27) provides

S̃_{ℓ,m}(f) = ∑_{κ=⌊b^{ℓ−1}⌋}^{b^ℓ−1} |f̃_{m,κ}| = ∑_{κ=⌊b^{ℓ−1}⌋}^{b^ℓ−1} |f̂_κ + ∑_{λ=1}^∞ f̂_{κ+λb^m} e^{2π√−1 ⟨k̃(κ+λb^m) − k̃(κ), Δ⟩}|
          ≤ ∑_{κ=⌊b^{ℓ−1}⌋}^{b^ℓ−1} |f̂_κ| + ∑_{κ=⌊b^{ℓ−1}⌋}^{b^ℓ−1} ∑_{λ=1}^∞ |f̂_{κ+λb^m}| = S_ℓ(f) + Ŝ_{ℓ,m}(f)
          ≤ [1 + ω̂(m − ℓ) ω̊(m − ℓ)] S_ℓ(f).

Replacing S̃_{ℓ,m}(f) in the error bound in (29) by the right hand side above proves that the choice of m needed to satisfy the tolerance is no greater than m^* defined above.
In Sect. 4, the computation of S̃_{m−r,m}(f) is described in terms of O(m b^m) operations. Thus, the total cost of Algorithm 1 is

cost(Î_m, f, ε) ≤ $(f) b^{m^*} + c m^* b^{m^*}.   □

7 Numerical Example

Algorithm 1 has been coded in MATLAB as cubLattice_g in base 2, and is part of GAIL [1]. To test it, we priced an Asian call option under geometric Brownian motion with S_0 = K = 100, T = 1 and r = 3 %. The test is performed on 500 samples whose dimensions are chosen IID uniformly among 1, 2, 4, 8, 16, 32, and 64, and the volatility also IID uniformly from 10 to 70 %. Results, in Fig. 3, show a 97 % success rate in meeting the error tolerance.
The algorithm cone parametrization was ℓ_* = 6, r = 4 and C(m) = 5 × 2^{−m}. In addition, each replication used a shifted lattice with Δ ∼ U(0, 1). However, results are strongly dependent on the generating vector that was used for creating the rank-1 lattice embedded node sets. The vector applied to this example was found with the latbuilder software from Pierre L'Ecuyer and David Munger [9], obtained for 2^{26} points, d = 250 and coordinate weights γ_j = j^{−2}, optimizing the P_2 criterion.
For this particular example, the choice of C(m) does not have a noticeable impact
on the success rate or execution time. In other cases such as discontinuous func-
tions, it is more sensitive. Being an adaptive algorithm, if the Fourier coefficients

Fig. 3 Empirical distribution functions obtained from 500 samples, for the error (continuous line) and time (dash-dotted line). Quantiles are specified on the right and top axes respectively. The tolerance of 0.02 (vertical dashed line) is an input of the algorithm and will be a guaranteed bound on the error if the function lies inside the cone. (Horizontal axis: Error; vertical axis: Time in seconds.)

decrease quickly, cone conditions have a weaker effect. One can see that the number of summands involving S̃_{m−r,m}(f) is 2^{m−r−1} for a fixed r. Thus, in order to give a uniform weight to each wavenumber, we chose C(m) proportional to 2^{−m}.

8 Discussion and Future Work

Quasi-Monte Carlo methods rarely provide guaranteed adaptive algorithms. This


new methodology that bounds the absolute error via the discrete Fourier coefficients
allows us to build an adaptive automatic algorithm guaranteed for cones of inte-
grands. The non-convexity of the cone allows our adaptive, nonlinear algorithm to
be advantageous in comparison with non-adaptive, linear algorithms.
Unfortunately, the definition of the cone does contain parameters, 
ω and ω̊, whose
optimal values may be hard to determine. Moreover, the definition of the cone does
not yet correspond to traditional sets of integrands, such as Korobov spaces. These
topics deserve further research.
Concerning the generating vector used in Sect. 7, some further research should be
carried out to specify the connection between dimension weights and cone parame-
ters. This might lead to the existence of optimal weights and generating vector.
Our algorithm provides an upper bound on the complexity of the problem, but
we have not yet obtained a lower bound. We are also interested in extending our
algorithm to accommodate a relative error tolerance. We would like to understand
how the cone parameters might depend on the dimension of the problem, and we
would like to extend our adaptive algorithm to infinite dimensional problems via
multi-level or multivariate decomposition methods.

Acknowledgments The authors thank Ronald Cools and Dirk Nuyens for organizing MCQMC
2014 and greatly appreciate the suggestions made by Sou-Cheng Choi, Frances Kuo, Lan Jiang,
Dirk Nuyens and Yizhi Zhang to improve this manuscript. In addition, the first author also thanks
Art B. Owen for partially funding traveling expenses to MCQMC 2014 through the US National
Science Foundation (NSF). This work was partially supported by NSF grants DMS-1115392, DMS-
1357690, and DMS-1522687.

References

1. Choi, S.C.T., Ding, Y., Hickernell, F.J., Jiang, L., Jiménez Rugama, Ll.A., Tong, X., Zhang,
Y., Zhou, X.: GAIL: Guaranteed Automatic Integration Library (versions 1.0–2.1). MATLAB
software. https://github.com/GailGithub/GAIL_Dev (2013–2015)
2. Clancy, N., Ding, Y., Hamilton, C., Hickernell, F.J., Zhang, Y.: The cost of deterministic,
adaptive, automatic algorithms: cones, not balls. J. Complex. 30(1), 21–45 (2014)
3. Dick, J., Kuo, F., Sloan, I.H.: High dimensional integration – the Quasi-Monte Carlo way. Acta
Numer. 22, 133–288 (2013)
4. Hickernell, F.J.: Obtaining O(N^{−2+ε}) convergence for lattice quadrature rules. In: Fang, K.T.,
Hickernell, F.J., Niederreiter, H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2000,
pp. 274–289. Springer, Berlin (2002)
5. Hickernell, F.J., Jiménez Rugama, Ll.A.: Reliable adaptive cubature using digital sequences.
In: Cools, R., Nuyens, D., (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2014, vol. 163,
pp. 367–383. Springer, Heidelberg (2016)
6. Hickernell, F.J., Niederreiter, H.: The existence of good extensible rank-1 lattices. J. Complex.
19, 286–300 (2003)
7. Hickernell, F.J., Sloan, I.H., Wasilkowski, G.W.: On tractability of weighted integration over
bounded and unbounded regions in Rs . Math. Comput. 73, 1885–1901 (2004)
8. Hickernell, F.J., Sloan, I.H., Wasilkowski, G.W.: The strong tractability of multivariate integra-
tion using lattice rules. In: Niederreiter, H. (ed.) Monte Carlo and Quasi-Monte Carlo Methods
2002, pp. 259–273. Springer, Berlin (2004)
9. L’Ecuyer, P., Munger, D.: Algorithm xxx: A general software tool for constructing rank-1 lattice
rules. ACM Trans. Math. Softw. (2016). To appear, http://www.iro.umontreal.ca/~lecuyer/
myftp/papers/latbuilder.pdf
10. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. CBMS-NSF
Regional Conference Series in Applied Mathematics. SIAM, Philadelphia (1992)
11. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems Volume II: Standard Infor-
mation for Functionals. No. 12 in EMS Tracts in Mathematics. European Mathematical Society,
Zürich (2010)
12. Sidi, A.: A new variable transformation for numerical integration. In: Brass, H., Hämmerlin,
G. (eds.) Numerical Integration IV, No. 112 in International Series of Numerical Mathematics,
pp. 359–373. Birkhäuser, Basel (1993)
13. Sloan, I.H., Joe, S.: Lattice Methods for Multiple Integration. Oxford University Press, Oxford
(1994)
Path Space Filtering

Alexander Keller, Ken Dahm and Nikolaus Binder

Abstract We improve the efficiency of quasi-Monte Carlo integro-approximation


by using weighted averages of samples instead of the samples themselves. The pro-
posed deterministic algorithm is constructed such that it converges to the solution of
the given integro-approximation problem. The improvements and wide applicability
of the consistent method are demonstrated by visual evidence in the setting of light
transport simulation for photorealistic image synthesis, where the weighted averages
correspond to locally smoothed contributions of path space samples.

Keywords Transport simulation · Integro-approximation · Photorealistic image synthesis · Rendering

1 Introduction

Modeling with physical entities like cameras, light sources, and materials on top of
a scene surface stored in a computer, light transport simulation may deliver photore-
alistic images. Due to complex discontinuities and the curse of dimension, analytic
solutions are out of reach. Thus simulation algorithms have to rely on sampling path
space and summing up the contributions of light transport paths that connect cam-
era sensors and light sources. Depending on the complexity of the modeled scene,
the inherent noise of sampling may vanish only slowly with the progression of the
computation.
This noise may be efficiently reduced by smoothing the contribution of light trans-
port paths before reconstructing the image. So far, intermediate approximations were

A. Keller (B) · K. Dahm · N. Binder


NVIDIA, Fasanenstr. 81, 10623 Berlin, Germany
e-mail: keller.alexander@gmail.com
K. Dahm
e-mail: ken.dahm@gmail.com
N. Binder
e-mail: nikolaus.binder@gmail.com


computed for this purpose. However, removing frequent persistent visual artifacts
due to insufficient approximation then forces simulation from scratch. In addition,
optimizing the numerous parameters of such methods in order to increase efficiency
has been challenging.
We therefore propose a simple and efficient deterministic algorithm that has fewer
parameters. Furthermore, visual artifacts are guaranteed to vanish by progressive
computation and the consistency of the scheme, which in addition overcomes tedious
parameter tuning. While the algorithm unites the advantages of previous work, it also
provides the desired noise reduction as shown by many practical examples.

1.1 Light Transport Simulation by Connecting Path Segments

Just following photon trajectories and counting photons incident on the camera is
hopelessly inefficient. Therefore light transport paths are sampled by following both
photon trajectories from the lights and tracing paths from the camera aiming to
connect both classes of path segments by proximity and shadow rays [3, 6, 30].
Instead of (pseudo-) random sampling, we employ faster quasi-Monte Carlo meth-
ods [22], which for the context of computer graphics are reviewed in [11]. The exten-
sive survey provides all algorithmic building blocks for generating low discrepancy
sequences and in depth explains how to transform them into light transport paths. For
the scope of our article, it is sufficient to know that quasi-Monte Carlo methods in
computer graphics use deterministic low discrepancy sequences to generate path seg-
ments. Other than (pseudo-) random sequences, such low discrepancy sequences lack
independence, however, are much more uniformly distributed. In order to generate
light transport path segments, the components of the ith vector of a low discrepancy
sequence are partitioned into two sets (for example by separating the odd and even
components), which then are used to trace the ith camera and light path segment.
Such path segments usually are started by using two components to select an origin
on an area, as for example a light source, and then selecting a direction by two more
components to trace a ray. At the first point of intersection with the scene surface,
another component may be used to decide on path termination, otherwise, the next
two components are used to determine a direction of scattering to trace the next ray,
repeating the procedure.
As illustrated in Fig. 1a, one way to establish a light transport path is by means
of a shadow ray, testing whether both end points of two path segments are mutually
visible. While shadow rays work fine for mostly diffuse surfaces, they may become
inefficient for light transport paths that include specular-diffuse-specular segments
as for example light that is reflected by a mirror onto a diffuse surface and reflected
back by the mirror. To overcome this problem of insufficient techniques [15, Fig. 2],
connecting photon trajectories to camera path segments by proximity, which is called
photon mapping [7], aims to efficiently capture contributions that shadow rays fail
on.

(a) connecting path segments by shadow rays and proximity   (b) path space filtering

Fig. 1 Illustration of connecting path segments in light transport simulation: a Segments of light
transport paths are generated by following photon trajectories from the light source L and tracing
paths from the camera. End points of path segments then are connected either if they are mutually
visible (dashed line, shadow ray) or if they are sufficiently close (indicated by the dashed circle).
b Complementary to these connection techniques, path space filtering is illustrated by the green
part of the schematic: The contribution ci to the vertex xi of a light transport path is replaced by
a smoothed contribution c̄i resulting from averaging contributions csi + j to vertices inside the ball
B(n). This averaged contribution c̄i then is multiplied by the throughput τi of the path segment
towards the camera and accumulated on the image plane P. In order to guarantee a consistent
algorithm, the radius r (n) of the ball B(n) must vanish with an increasing number n of samples

Photon mapping connects end points of path segments that are less than a spec-
ified radius apart. Decreasing such a radius r (n) with the increasing number n of
sampled light transport paths as introduced by progressive photon mapping [5], the
scheme became consistent: In the limit it in fact becomes equivalent to shadow ray
connections. A consistent and numerically robust quasi-Monte Carlo method for
progressive photon mapping has been developed in [12], while the references in this
article reflect the latest developments in photon mapping as well. Similar to sto-
chastic progressive photon mapping [4], the computation is processing consecutive
batches of light transport paths. Depending on the low discrepancy sequence used,
some block sizes are preferable over others and we stick to integer block sizes of the form b^m as derived in [12]. Note that b is fixed by the underlying low discrepancy sequence.

2 Consistent Weighted Averaging

Already in [20, 21] it has been shown that a very sparse set of samples may pro-
vide sufficient information for high quality image synthesis. Progressive path space
filtering is a new simpler, faster, and consistent variance reduction technique that is
complementary to shadow rays and progressive photon mapping.

Considering the ith out of a current total of n light transport paths, selecting a
vertex xi suitable for filtering the radiance contribution ci of the light path segment
towards xi also determines the throughput τi along the path segment towards the
camera (see Fig. 1b). While any or even multiple vertices of a light transport path
may be selected, a simple and practical choice is the first vertex along the path from
the camera whose optical properties are considered sufficiently diffuse.
As mentioned before, one low discrepancy sequence is transformed to sample path space in contiguous batches of b^m ∈ N light transport paths, where for each path one selected tuple (x_i, τ_i, c_i) is stored for path space filtering. As the memory consumption is proportional to the batch size b^m and given the size of the tuples and the maximum size of a memory block, it is straightforward to determine the maximum natural number m.
Processing the batch of b^m paths starting at index s_i := ⌊i/b^m⌋ b^m, the image is formed by accumulating τ_i · c̄_i, where

c̄_i := ∑_{j=0}^{b^m−1} χ_{B(n)}(x_{s_i+j} − x_i) · w_{i,j} · c_{s_i+j}  /  ∑_{j=0}^{b^m−1} χ_{B(n)}(x_{s_i+j} − x_i) · w_{i,j}   (1)

is the weighted average of the contributions c_{s_i+j} of all vertices x_{s_i+j} in a ball B(n) of radius r(n) centered in x_i, normalized by the sum of weights w_{i,j}, as illustrated in Fig. 1. While the weights will be detailed in Sect. 2.1, for the moment it is sufficient to postulate w_{i,i} ≠ 0.
Centered in x_i, the characteristic function χ_{B(n)} always includes the ith path (as opposed to for example [28]). Therefore, given an initial radius r_0 (see Sect. 2.2 for details), a radius (see [12])

r(n) = r_0 / n^α   for α ∈ (0, 1)   (2)

vanishing with the total number n of paths guarantees lim_{n→∞} c̄_i = c_i and thus consistency. As a consequence, all artifacts visible during progressive computation must be transient, even if they may vanish slowly. However, selecting a small radius to hide the transient artifacts is a goal competing with a large radius that includes as many contributions as possible in the weighted average.
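A minimal sketch of the weighted average (1) for one batch (hypothetical code, not the authors' implementation): a brute-force range search replaces the spatial data structures of Sect. 2.2, and weight(i, j) stands in for the similarity heuristics of Sect. 2.1.

import numpy as np

def filter_batch(x, c, radius, weight):
    """Smoothed contributions c_bar_i per Eq. (1) for vertex positions x (n x 3) and contributions c."""
    n = len(x)
    c_bar = np.zeros_like(c, dtype=float)
    for i in range(n):
        inside = np.linalg.norm(x - x[i], axis=1) <= radius           # chi_{B(n)}(x_j - x_i)
        w = np.array([weight(i, j) if inside[j] else 0.0 for j in range(n)])
        c_bar[i] = np.dot(w, c) / np.sum(w)                           # weight(i, i) != 0 by assumption
    return c_bar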
Given the path space samples of a path tracer with next event estimation and
implicit multiple importance sampling [11, 19], Fig. 2 illustrates progressive path
space filtering, especially its noise reduction, transient artifacts, and consistency
for an increasing number n of light transport paths. The lighting consists of a high
dynamic range environment map. The first hit points as seen from the camera are
stored as the vertices xi , where the range search and filtering takes place.
In spite of the apparent similarity of Eq. 1 to methods used for scattered data interpolation [14, 25] and weighted uniform sampling [23, 27], there are principal differences: First, an interpolation property c̄_i = c_i would inhibit any averaging right from the beginning, and second, b^m < ∞, as b^m is proportional to the required amount of memory to store light transport paths. Nevertheless, the batch size b^m should be cho-

Fig. 2 The series of images illustrates progressive path space filtering. Each image shows the
unfiltered input above and the accumulation of weighted averages below the diagonal. As more and
more batches of paths are processed, the splotchy artifacts vanish due to the consistency of the
algorithm as guaranteed by the decreasing range search radius r (n). Model courtesy M. Dabrovic
and Crytek

sen as large as memory permits, because the efficiency results from simultaneously
filtering as many vertices as possible.
Caching samples of irradiance and interpolating them to increase the efficiency
of light transport simulation [32] has been intensively investigated [18] and has been
implemented in many renderers (see Fig. 5b). Scintillation in animations is the key
artifact of this method, which appears due to interpolating cached irradiance samples
that are noisy [17, Sect. 6.3.2] and cannot be placed in a coherent way over time. Such
artifacts require to adjust a set of multiple parameters followed by simulation from
scratch, because the method is not consistent.
Other than irradiance interpolation, path space filtering efficiently can filter across
discontinuities such as detailed geometry (for examples, see Fig. 6). It overcomes
the necessity of excessive trajectory splitting to reduce noise in the cached samples,
too, which enables path tracing using the fire-and-forget paradigm as required for
efficient parallel light transport simulation. This in turn fits the observation that
with an increasing number of simulated light transport paths trajectory splitting
becomes less efficient. In addition, reducing artifacts in a frame due to consistency
only requires to continue computation instead of starting over from scratch.
The averaging process defined in Eq. 1 may be iterated within a batch of light
transport paths, i.e. computing c̄¯i from c̄i and so on. This yields a further dramatic

Fig. 3 Iterated weighted averaging very efficiently smooths the solution by relaxation at the cost
of losing some detail. Obviously, path space filtering replaces the black pixels of the input with the
weighted averages, which brightens up the image in the expected way. Model courtesy G. M. Leal
LLaaguno

speed up at the cost of some blurred illumination detail as can be seen in Fig. 3. Note
that such an iteration is consistent, too, because the radius r (n) decreases with the
number of batches.

2.1 Weighting by Similarity

Although Eq. 1 is consistent even without weighting, i.e. wi, j ≡ 1, for larger radii r (n)
the resulting images may look overly blurred as contributions csi + j become included
in the average that actually never could have been gathered in xi (see Fig. 4). In order
to reduce this transient artifact of light leaking and to benefit from larger radii to
include more contributions in the average, the weights wi, j should value how likely
the contribution csi + j could have been created in xi by trajectory splitting.
Various heuristics for such weights are known from irradiance interpolation [18],
the discontinuity buffer [10, 31], photon mapping [7], light field reconstruction
[20, 21], and Fourier histogram descriptors [2]. The effect of the following weights of similarity is shown in Fig. 4:

Blur across geometry: The similarity of the surface normal n_i in x_i and other surface normals n_{s_i+j} in x_{s_i+j} can be determined by their scalar product ⟨n_{s_i+j}, n_i⟩ ∈ [−1, 1]. While obviously contributions with negative scalar product will be excluded in order to prevent light leaking through the backside of an opaque surface, including only contributions with ⟨n_{s_i+j}, n_i⟩ ≥ 0.95 (in our implementation) avoids light being transported across geometry that is far from planar.
Blur across textures: The images would be most crisp if for all contributions included
in the average the optical surface properties were evaluated in xi . For surfaces
other than diffuse surfaces, like for example glossy surfaces, these properties also
depend on the direction of observation, which then must be explicitly stored with
the xi . Some of this additional memory can be saved when directions are implic-
itly known to be similar, as for example for query locations xi as directly seen
from the camera.

Fig. 4 The effect of weighting: The top left image was rendered by a forward path tracer at 16 path
space samples per pixel. The bottom left image shows the same algorithm with path space filtering.
The improvement is easy to see in the enlarged insets. The right column illustrates the effect of the
single components of the weights. From top to bottom: Using uniform weights, the image looks
blurred and light is transported around corners. Including only samples with similar surface normals
(middle), removes a lot of blur resulting in crisp geometry. The image at the bottom right in addition
reduces texture blur by not filtering contributions with too different local throughput by the surface
reflectance properties. Finally, the bottom left result adds improvements on the shadow boundaries
by excluding contributions that have too different visibility. Model courtesy M. Dabrovic and Crytek

In situations where this evaluation is too costly or not feasible, the algorithm has
to rely on data stored during path segment generation. Such data usually includes
a color term, which is the bidirectional scattering distribution function (BSDF)
multiplied by the ratio of the cosine between surface normal and direction of
incidence and the probability density function (pdf) evaluated for the directions
of transport. For the example of cosine distributed samples on diffuse surfaces
only the diffuse albedo remains, because all other terms cancel. If a norm of the
difference of these terms in x_{s_i+j} and x_i is below a threshold (‖·‖_2 < 0.05 in our implementation), the contribution of x_{s_i+j} is included in the average. Unless the
surface is diffuse, the similarity of the directions of observation must be checked
as well to avoid incorrect in-scattering on glossy materials. Including more and

more heuristics of course excludes more and more candidates, decreasing the
potential of noise reduction. In the real-time implementation of path space filter-
ing [2], the weighted average is computed for each component resulting from a
decomposition of path space induced by the basis functions used to represent the
optical surface properties.
Blurred shadows: Given a point light source, its visibility as seen from xi and xsi + j
may be either identical or different. In order to avoid sharp shadow boundaries
to be blurred, contributions may be only included upon identical visibility. For
ambient occlusion and illumination by an environment map, blur can be reduced
by comparing the lengths of each one ray shot into the hemisphere at xi and xsi + j
by thresholding their difference.

Using only binary weights that are either zero or one, the denominator of the ratio in
Eq. 1 amounts to the number of included contributions. Although seemingly counter-
intuitive, using the norms to directly weight the contributions results in higher vari-
ance. This effect already has been observed in an article [1] on efficient anti-aliasing:
Having other than uniform weights, the same contribution may be weighted differ-
ently in neighboring queries, which in turn results in increased noise. In a similar
way, using kernels (for examples see [26] or kernels used in the domain of smoothed particle hydrodynamics (SPH)) other than the characteristic function χ_{B(n)} to weight contributions by their distance to the query location x_i increases the variance.
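Combining the heuristics above into binary weights might look as follows (hypothetical code using the thresholds 0.95 and 0.05 stated in the text); the per-vertex visibility flag and the stored albedo are assumptions of this sketch.

import numpy as np

def weight(normal_i, normal_j, albedo_i, albedo_j, visible_i, visible_j):
    if np.dot(normal_i, normal_j) < 0.95:             # avoid blur across geometry
        return 0.0
    if np.linalg.norm(albedo_i - albedo_j) >= 0.05:   # avoid blur across textures
        return 0.0
    if visible_i != visible_j:                        # avoid blurring shadow boundaries
        return 0.0
    return 1.0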

2.2 Range Search

The vertices xsi + j selected by the characteristic function χB(n) centered at xi effi-
ciently may be queried by a range search using a hash grid [29], a bounding volume
hierarchy or a kd-tree organizing the entirety of stored vertices in space, or a divide-
and-conquer method [13] simultaneously considering all queries. As the sets of query
and data points are identical, data locality is high and implementation is simplified.
Note that storing vertex information only in screen space even enables real-time
path space filtering [2], however, can only query a subset of the neighborhood rela-
tions as compared to the full 3d range search. In fact, real-time path space filtering
[2] improves on previous work [24, Chap. 4, p. 83] with respect to similarity criteria
and path space decomposition, while the basic ideas are similar and both based on
earlier attempts of filtering approaches [10, 16, 31] in order to improve efficiency.
As already observed in [12], the parameter α in Eq. 2 does not have much influence and α = 1/4 is a robustly working choice. In fact, the radius is decreasing arbitrarily slowly, which leaves the initial radius r_0 as the most important parameter.
As fewer and fewer contributions are averaged with decreasing radius, there is
a point in time, where actually almost no averaging takes place any longer as only
the central vertex xi is included in the queries. On the one hand, this leads to the
intuition that comparing the maximum of the number of averaged contributions to
a threshold can be utilized to automatically switch off the algorithm. On the other

hand, it indicates that the initial radius needs to be selected sufficiently large in order
to include a meaningful number of contributions in the weighted averages from Eq. 1.
The initial radius r_0 also may depend on the query location x_i. For example, it may be derived from the definition of the solid angle Δω := π r_0^2 / d^2 of a disk of radius r_0 in x_i perpendicular to a ray at a distance d from the ray origin. For a fixed solid angle Δω, the initial radius

r_0 = √(d^2 Δω / π) ∼ d
then is proportional to the distance d. The factor of proportionality may be either
chosen by the user or can be determined using a given solid angle. For example, Δω
can be chosen as the solid angle determined by the area of 3 × 3 pixels on the screen
with respect to the focal point. Finally, the distance d may be chosen as the length
of the camera path towards xi .
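In code, the distance-proportional initial radius reads as follows (hypothetical sketch; the solid angle of a 3 × 3 pixel block is the choice suggested in the text).

import numpy as np

def initial_radius(distance, solid_angle):
    """r_0 = sqrt(d^2 * delta_omega / pi): radius of a disk subtending delta_omega at distance d."""
    return distance * np.sqrt(solid_angle / np.pi)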
Note that considering an anisotropic footprint (area of averaging determined by
projecting the solid angle of a ray onto the intersected surface) is not practical for
several reasons: The requirement of dividing by the cosine between surface normal
in xi and the ray direction may cause numerical issues for vectors that are close to
perpendicular. In addition the efficiency of the range search may be decreased, as
now the query volume may have an arbitrarily large extent. Finally, this would result
in possibly averaging contributions from vertices that are spatially far apart, although
the local environment of the vertex xi may be small such as for example in foliage
or hair.

2.3 Differentiation of Path Space Filtering and Photon Mapping

Progressive path space filtering is different from progressive photon mapping: First
of all, progressive photon mapping is not a weighted average as it determines radiance
by querying the flux of photons inside a ball around a query point divided by the
corresponding disk area. Without progressive photon mapping the contribution of
light transport paths that are difficult to sample [15, Fig. 2] would be just missing or
add high variance sporadically. Second, the query locations in photon mapping are
not part of the photon cloud queried by range search, while in path space filtering the
ensemble of vertices subject to range search includes both data and query locations.
Third, progressive photon mapping is concerned with light path segments, while
progressive path space filtering is concerned with camera path segments.
Temporally visible light leaks and splotches are due to a large range search radius
r (n), which allows for collecting light beyond opaque surfaces and due to the shape of
the ball B(n) blurs light into disk-like shapes. If the local environment around a query
point is not a disk, as for example close to a geometric edge, the division by the disk
area in photon mapping causes an underestimation resulting in a darkening along such

edges. While this does not happen for the weighted average of path space filtering,
contrast may be reduced (see the foliage rendering in Fig. 6). In addition, so-called
fire flies that actually are rarely sampled peaks of the integrand, are attenuated by
the weighted average and therefore may look more like splotches instead of single
bright pixels. Since both progressive photon mapping and path space filtering are
consistent, all of these artifacts must be transient.

3 More Applications in Light Transport Simulation

Path space filtering is simple to implement and due to linearity (see Eq. 1) works for
any decomposition of path space including any variant of (multiple) importance sam-
pling. It can overcome the need for excessive trajectory splitting (see the schematic in
Fig. 5) for local averaging in xi in virtually all common use cases in rendering: Ambi-
ent occlusion, shadows from extended and/or many light sources (like for example
instant radiosity [9]), final gathering, ray marching, baking light probes and textures
for games, rendering with participating media, or effects like depth of field simula-
tion can be determined directly from path space samples. Some exemplary results
are shown in Fig. 6 and some more applications are briefly sketched in the following:
Animations: A common artifact in animations rendered with interpolation methods
is scintillation due to for example temporally incoherent cached samples, noisy
cached samples, or temporal changes in visibility. Then parameters have to be
tweaked and computation has to be started from scratch. Progressive path space
filtering removes this critical source of inefficiency: Storing the next batch starting
index si with each frame (see Sect. 2), any selected frame can be refined by just
continuing the computation as all artifacts are guaranteed to be transient.
Multiple views: In addition, path space filtering can be applied across vertices gener-
ated from multiple views. As such, rendering depth of field, stereo pairs of images

(a) trajectory splitting   (b) irradiance interpolation   (c) path space filtering   (d) super-sampling

Fig. 5 In order to determine the radiance in xi as seen by the long ray, a many rays are shot
into the hemisphere to sample the contributions. As this becomes too expensive due to the large
number of rays, b irradiance interpolation interpolates between cached irradiance samples that were
smoothed by trajectory splitting. c Path space filtering mimics trajectory splitting by averaging the
contributions of paths in the proximity. d Supersampling the information provided by the paths used
for path space filtering is possible by tracing additional path segments from the camera. Note that
then xi does not have an intrinsic contribution

(a) ambient occlusion   (b) shadows
(c) light transport simulation   (d) complex geometry
(e) translucent material   (f) red-cyan superimposed stereo image pair

Fig. 6 The split image comparisons show how path space filtering can remove substantial amounts
of noise in various example settings. Models courtesy S. Laine, cgtrader, Laubwerk, Stanford
Computer Graphics Laboratory, and G.M. Leal LLaaguno

(see Fig. 6f), multiple views of a scene, rendering for light field displays, or an
animation of a static scene can greatly benefit as vertices can be shared among all
frames to be rendered.
Motion blur: Identical to [11], the consistent simulation of motion blur may be real-
ized by averaging images at distinct points in time. As an alternative, extending
the range search to include proximity in time allows for averaging across ver-
tices with different points in time. In cases where linear motion is a sufficient
approximation and storing linear motion vectors is affordable, reconstructing the
visibility as introduced in [20, 21] may improve the speed of convergence.
Spectral rendering: The consistent simulation of spectral light transport may be real-
ized by averaging monochromatic contributions ci associated to a wavelength λi .
The projection onto a suitable color system may happen during the averaging
process, where the suitable basis functions are multiplied as factors to the weights.
One example of such a set of basis functions are the CIE XYZ response curves.

Fig. 7 The left image shows a fence rendered with one light transport path per pixel. The image on
the right shows the result of anti-aliasing by path space filtering using the paths from the left image
and an additional three camera paths per pixel. Model courtesy Chris Wyman

One very compact continuous approximation of these response curves is proposed in [33].
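As a small illustration of this projection step (our sketch, not part of the paper), the Python fragment below accumulates XYZ tristimulus values from monochromatic path contributions by multiplying the response curves onto the filtering weights during averaging. The callables cie_x, cie_y, cie_z stand for the CIE XYZ response curves (for example the analytic fits of [33]) and are assumptions of this sketch.

```python
def accumulate_xyz(contributions, weights, wavelengths, cie_x, cie_y, cie_z):
    """Project monochromatic contributions c_i at wavelengths lambda_i onto CIE XYZ.

    The response curves enter as extra factors on the path space filtering weights,
    so the projection onto the color system happens during the averaging itself.
    """
    X = Y = Z = 0.0
    w_sum = 0.0
    for c, w, lam in zip(contributions, weights, wavelengths):
        X += w * cie_x(lam) * c
        Y += w * cie_y(lam) * c
        Z += w * cie_z(lam) * c
        w_sum += w
    if w_sum > 0.0:
        X, Y, Z = X / w_sum, Y / w_sum, Z / w_sum
    return X, Y, Z
```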
Participating media and translucency: As path space filtering works for any kind
of path space samples, it readily can be applied to the simulation of subsurface
scattering and participating media in order to improve rendering performance.
Figure 6e features a statuette with light transported through translucent matter,
where path space filtering has been performed across the surface of the statuette.
At this level of efficiency, the consistent direct simulation may become affordable
over approximations like for example bidirectional subsurface scattering distrib-
ution functions (BSSRDF) based on the dipole approximation [8].
Decoupling anti-aliasing from shading: As illustrated in Fig. 5d, it is straightforward
to just sample more camera path segments. Contributions to their query locations
are computed as before. However, similar to [28], these queries may be empty
due to the lack of a guaranteed central contribution ci and in that case must not
be considered in the accumulation process. Figure 7 illustrates how anti-aliasing
by super-sampling with path space filtering works nicely across discontinuities.

4 Conclusion

Path space filtering is simple to implement on top of any sampling-based rendering
algorithm and has low overhead. The progressive algorithm efficiently reduces variance
and is guaranteed to converge without persistent artifacts due to consistency.
It will be interesting to explore the principle applied to integro-approximation prob-
lems other than computer graphics and to investigate how the method fits into the
context of multilevel Monte Carlo methods.

References

1. Ernst, M., Stamminger, M., Greiner, G.: Filter importance sampling. In: Proceedings of the
IEEE/EG Symposium on Interactive Ray Tracing, pp. 125–132 (2006)

2. Gautron, P., Droske, M., Wächter, C., Kettner, L., Keller, A., Binder, N., Dahm, K.: Path space
similarity determined by Fourier histogram descriptors. In: ACM SIGGRAPH 2014 Talks,
SIGGRAPH’14, pp. 39:1–39:1. ACM (2014)
3. Georgiev, I., Křivánek, J., Davidovič, T., Slusallek, P.: Light transport simulation with vertex
connection and merging. ACM Trans. Graph. (TOG) 31(6), 192:1–192:10 (2012)
4. Hachisuka, T., Jensen, H.: Stochastic progressive photon mapping. In: SIGGRAPH Asia’09:
ACM SIGGRAPH Asia Papers, pp. 1–8. ACM (2009)
5. Hachisuka, T., Ogaki, S., Jensen, H.: Progressive photon mapping. ACM Trans. Graph. 27(5),
130:1–130:8 (2008)
6. Hachisuka, T., Pantaleoni, J., Jensen, H.: A path space extension for robust light transport
simulation. ACM Trans. Graph. (TOG) 31(6), 191:1–191:10 (2012)
7. Jensen, H.: Realistic Image Synthesis Using Photon Mapping. AK Peters, Natick (2001)
8. Jensen, H., Buhler, J.: A rapid hierarchical rendering technique for translucent materials. ACM
Trans. Graph. 21(3), 576–581 (2002)
9. Keller, A.: Instant radiosity. In: SIGGRAPH’97: Proceedings of the 24th Annual Conference
on Computer Graphics and Interactive Techniques, pp. 49–56 (1997)
10. Keller, A.: Quasi-Monte Carlo Methods for Photorealistic Image Synthesis. Ph.D. thesis, Uni-
versity of Kaiserslautern, Germany (1998)
11. Keller, A.: Quasi-Monte Carlo image synthesis in a nutshell. In: Dick, J., Kuo, F., Peters, G.,
Sloan, I. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2012, pp. 203–238. Springer,
Heidelberg (2013)
12. Keller, A., Binder, N.: Deterministic consistent density estimation for light transport simulation.
In: Dick, J., Kuo, F., Peters, G., Sloan, I. (eds.) Monte Carlo and Quasi-Monte Carlo Methods
2012, pp. 467–480. Springer, Heidelberg (2013)
13. Keller, A., Droske, M., Grünschloß, L., Seibert, D.: A divide-and-conquer algo-
rithm for simultaneous photon map queries. Poster at High-Performance Graphics
in Vancouver. http://www.highperformancegraphics.org/previous/www_2011/media/Posters/
HPG2011_Posters_Keller1_abstract.pdf (2011)
14. Knauer, E., Bärz, J., Müller, S.: A hybrid approach to interactive global illumination and soft
shadows. Vis. Comput.: Int. J. Comput. Graph. 26(6–8), 565–574 (2010)
15. Kollig, T., Keller, A.: Efficient bidirectional path tracing by randomized quasi-Monte Carlo
integration. In: Niederreiter, H., Fang, K., Hickernell, F. (eds.) Monte Carlo and Quasi-Monte
Carlo Methods 2000, pp. 290–305. Springer, Berlin (2002)
16. Kontkanen, J., Räsänen, J., Keller, A.: Irradiance filtering for Monte Carlo ray tracing. In: Talay,
D., Niederreiter, H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2004, pp. 259–272.
Springer, Berlin (2004)
17. Křivánek, J.: Radiance caching for global illumination computation on glossy surfaces. Ph.D.
thesis, Université de Rennes 1 and Czech Technical University in Prague (2005)
18. Křivánek, J., Gautron, P.: Practical Global Illumination with Irradiance Caching. Synthesis
lectures in computer graphics and animation. Morgan & Claypool, San Rafael (2009)
19. Lafortune, E.: Mathematical Models and Monte Carlo Algorithms for Physically Based Ren-
dering. Ph.D. thesis, Katholieke Universiteit Leuven, Belgium (1996)
20. Lehtinen, J., Aila, T., Chen, J., Laine, S., Durand, F.: Temporal light field reconstruction for
rendering distribution effects. ACM Trans. Graph. 30(4), 55:1–55:12 (2011)
21. Lehtinen, J., Aila, T., Laine, S., Durand, F.: Reconstructing the indirect light field for global
illumination. ACM Trans. Graph. 31(4), 51 (2012)
22. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. SIAM,
Philadelphia (1992)
23. Powell, M., Swann, J.: Weighted uniform sampling - a Monte Carlo technique for reducing
variance. IMA J. Appl. Math. 2(3), 228–236 (1966)
24. Schwenk, K.: Filtering techniques for low-noise previews of interactive stochastic ray tracing.
Ph.D. thesis, Technische Universität Darmstadt (2013)
25. Shepard, D.: A two-dimensional interpolation function for irregularly-spaced data. In: Pro-
ceedings of the 23rd ACM National Conference, pp. 517–524. ACM (1968)

26. Silverman, B.: Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC,
London (1986)
27. Spanier, J., Maize, E.: Quasi-random methods for estimating integrals using relatively small
samples. SIAM Rev. 36(1), 18–44 (1994)
28. Suykens, F., Willems, Y.: Adaptive filtering for progressive Monte Carlo image rendering. In:
WSCG 2000 Conference Proceedings (2000)
29. Teschner, M., Heidelberger, B., Müller, M., Pomeranets, D., Gross, M.: Optimized spatial
hashing for collision detection of deformable objects. In: Proceedings of VMV’03, pp. 47–54.
Munich, Germany (2003)
30. Veach, E.: Robust Monte Carlo Methods for Light Transport Simulation. Ph.D. thesis, Stanford
University (1997)
31. Wald, I., Kollig, T., Benthin, C., Keller, A., Slusallek, P.: Interactive global illumination using
fast ray tracing. In: Debevec, P., Gibson, S. (eds.) Rendering Techniques (Proceedings of the
13th Eurographics Workshop on Rendering), pp. 15–24 (2002)
32. Ward, G., Rubinstein, F., Clear, R.: A ray tracing solution for diffuse interreflection. Comput.
Graph. 22, 85–90 (1988)
33. Wyman, C., Sloan, P., Shirley, P.: Simple analytic approximations to the CIE XYZ color match-
ing functions. J. Comput. Graph. Tech. (JCGT) 2, 1–11 (2013). http://jcgt.org/published/0002/
02/01/
Tractability of Multivariate Integration
in Hybrid Function Spaces

Peter Kritzer and Friedrich Pillichshammer

Abstract We consider tractability of integration in reproducing kernel Hilbert spaces which are a tensor product of a Walsh space and a Korobov space. The main
result provides necessary and sufficient conditions for weak, polynomial, and strong
polynomial tractability.

Keywords Multivariate integration · Quasi-Monte Carlo · Tractability · Korobov space · Walsh space

1 Introduction

In this paper we study multivariate integration $I_s(f) = \int_{[0,1]^s} f(x)\,\mathrm{d}x$ in reproducing
kernel Hilbert spaces $H(K)$ of functions $f\colon [0,1]^s \to \mathbb{R}$, equipped with the norm
$\|\cdot\|_{H(K)}$, where $K$ denotes the reproducing kernel. We refer to Aronszajn [1] for
an introduction to the theory of reproducing kernel Hilbert spaces. Without loss of
generality, see, e.g., [19, 23], we can restrict ourselves to approximating $I_s(f)$ by
means of linear algorithms $Q_{N,s}$ of the form
$$Q_{N,s}(f,P) := \sum_{k=0}^{N-1} q_k\, f(x_k),$$

P. Kritzer (B) · F. Pillichshammer


Department of Financial Mathematics, Johannes Kepler University Linz,
Altenbergerstr. 69, 4040 Linz, Austria
e-mail: peter.kritzer@jku.at
P. Kritzer
Johann Radon Institute for Computational and Applied Mathematics,
Austrian Academy of Sciences, Altenbergerstr. 69, 4040 Linz, Austria
F. Pillichshammer
e-mail: friedrich.pillichshammer@jku.at


where N ∈ N, with coefficients qk ∈ C and deterministically chosen sample points


P = {x0 , x1 , . . . , xN−1 } in [0, 1)s . In this paper we further restrict ourselves to
considering only qk of the form qk = 1/N for all 0 ≤ k < N in which case one
speaks of quasi-Monte Carlo (QMC) algorithms. QMC algorithms are often used
in practical applications especially if s is large. We are interested in studying the
worst-case integration error,
$$e(H(K),P) = \sup_{f\in H(K),\ \|f\|_{H(K)}\le 1} \bigl| I_s(f) - Q_{N,s}(f,P) \bigr|.$$
For $N\in\mathbb{N}$ let $e(N,s)$ be the $N$th minimal QMC worst-case error,
$$e(N,s) = \inf_{P} e(H(K),P),$$
where the infimum is extended over all $N$-element point sets in $[0,1)^s$. Additionally,
the initial error $e(0,s)$ is defined as the worst-case error of the zero algorithm,
$$e(0,s) = \sup_{f\in H(K),\ \|f\|_{H(K)}\le 1} |I_s(f)|$$

and is used as a reference value. In this paper we are interested in the dependence
of the worst-case error on the dimension s. To study this dependence systematically
we consider the so-called information complexity defined as

Nmin (ε, s) = min{N ∈ N0 : e(N, s) ≤ ε e(0, s)},

which is the minimal number of points required to reduce the initial error by a factor
of ε, where ε > 0.
We would like to avoid cases where the information complexity Nmin (ε, s) grows
exponentially or even faster with the dimension s or with ε−1 . To quantify the behavior
of the information complexity we use the following notions of tractability.
We say that the integration problem in H (K) is
• weakly QMC-tractable, if $\lim_{s+\varepsilon^{-1}\to\infty} \frac{\log N_{\min}(\varepsilon,s)}{s+\varepsilon^{-1}} = 0$;
• polynomially QMC-tractable, if there exist non-negative numbers $c$, $p$, and $q$ such that $N_{\min}(\varepsilon,s) \le c\, s^{q}\, \varepsilon^{-p}$;
• strongly polynomially QMC-tractable, if there exist non-negative numbers $c$ and $p$ such that $N_{\min}(\varepsilon,s) \le c\, \varepsilon^{-p}$.
Of course, strong polynomial QMC-tractability implies polynomial QMC-tractability
which in turn implies weak QMC-tractability. If we do not have weak QMC-
tractability, then we say that the integration problem in H (K) is intractable.
In the existing literature, many authors have studied tractability (since we only
deal with QMC-rules here we omit the prefix “QMC” from now on) of integration
in many different reproducing kernel Hilbert spaces. The current state of the art of

tractability theory is summarized in the three volumes of the book of Novak and
Woźniakowski [19–21] which we refer to for extensive information on this subject
and further references. Most of these investigations have in common that reproducing
kernel Hilbert spaces are tensor products of one-dimensional spaces whose kernels
are all of the same type (but maybe equipped with different weights). In this paper
we consider the case where the reproducing kernel is a tensor product of spaces
with kernels of different type. We call such spaces hybrid spaces. Some results on
tractability in general hybrid spaces can be found in the literature. For example, in
[20] multivariate integration is studied for arbitrary reproducing kernels Kd without
relation to Kd+1 . Here we consider as a special instance the tensor product of Walsh
and Korobov spaces. As far as we are aware, this specific problem has not been
studied in the literature so far. This paper is a first attempt in this direction.
In particular, we consider the tensor product of an s1 -dimensional weighted Walsh
space and an s2 -dimensional weighted Korobov space (the exact definitions will be
given in the next section). The study of such spaces could be important in view of the
integration of functions which are periodic with respect to some of the components
and, for example, piecewise constant with respect to the remaining components.
Moreover, it has been pointed out by several scientists (see, e.g., [11, 17]) that
hybrid integration problems may be relevant for certain integration problems in
applications. Indeed, communication with the authors of [11] and [17] has motivated
our idea for considering function spaces, where we may have very different properties
of the integrands with respect to different components, as for example regarding
smoothness.
From the analytical point of view, it is very challenging to deal with integration
in hybrid spaces. The reason for this is the rather complex interplay between the
different analytic and algebraic structures of the kernel functions. In the present study
we are concerned with Fourier analysis carried out simultaneously with respect to
the Walsh and the trigonometric function system. The problem is also closely related
to the study of hybrid point sets which received much attention in recent times (see,
for example, [5, 6, 8–10, 13–15]).
The paper is organized as follows. In Sect. 2 we introduce the Hilbert space under
consideration in this paper. The main result states necessary and sufficient conditions
for various notions of tractability and is stated in Sect. 3. In Sect. 4 we prove the
necessary conditions and in Sect. 5 the sufficient ones.

2 The Hilbert Space

2.1 Basic Notation



Let $k \in \mathbb{N}_0$ with $b$-adic representation $k = \sum_{i=0}^{\infty} \kappa_i b^i$, $\kappa_i \in \{0,\ldots,b-1\}$. Furthermore,
let $x \in [0,1)$ with $b$-adic representation $x = \sum_{i=1}^{\infty} \xi_i b^{-i}$, $\xi_i \in \{0,\ldots,b-1\}$,
unique in the sense that infinitely many of the $\xi_i$ differ from $b-1$. If $\kappa_a \neq 0$ is the most
significant nonzero digit of $k$, we define the $k$th Walsh function $\mathrm{wal}_k\colon [0,1) \to \mathbb{C}$
(in base $b$) by
$$\mathrm{wal}_k(x) := \mathrm{e}\!\left(\frac{\xi_1 \kappa_0 + \cdots + \xi_{a+1}\kappa_a}{b}\right),$$
where $\mathrm{e}(v) := \exp(2\pi\mathrm{i} v)$. For dimension $s \ge 2$ and vectors $k = (k_1,\ldots,k_s) \in \mathbb{N}_0^s$
and $x = (x_1,\ldots,x_s) \in [0,1)^s$ we define the $k$th Walsh function $\mathrm{wal}_k\colon [0,1)^s \to \mathbb{C}$
by $\mathrm{wal}_k(x) := \prod_{j=1}^{s} \mathrm{wal}_{k_j}(x_j)$.
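For readers who prefer code, the following short Python sketch (ours, not part of the paper) evaluates $\mathrm{wal}_k(x)$ in base $b$ directly from the digit expansions given above; truncating $x$ to a fixed number of digits is an assumption of the sketch and suffices for points with finitely many base-$b$ digits.

```python
import cmath

def walsh(k: int, x: float, b: int = 2, digits: int = 32) -> complex:
    """Evaluate the k-th (base-b) Walsh function at x in [0, 1).

    Uses wal_k(x) = e((xi_1*kappa_0 + ... + xi_{a+1}*kappa_a) / b),
    where kappa_i are the base-b digits of k and xi_i those of x.
    """
    total = 0
    i = 0
    while k > 0:                           # digits kappa_0, kappa_1, ... of k
        kappa = k % b
        xi = int(x * b ** (i + 1)) % b     # xi_{i+1}: (i+1)-st base-b digit of x
        total += xi * kappa
        k //= b
        i += 1
        if i >= digits:
            break
    return cmath.exp(2j * cmath.pi * total / b)
```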
Furthermore, for l ∈ Zs and y ∈ Rs we define the lth trigonometric function by
el (y) := e(l · y), where “·” denotes the usual dot product.
We define two functions r (1) , r (2) : let α > 1 and γ > 0 be reals and let γ = (γj )j≥1
be a sequence of positive reals.
• For integer $b \ge 2$ and $k \in \mathbb{N}_0$ let
$$r^{(1)}_{\alpha,\gamma}(k) := \begin{cases} 1 & \text{if } k = 0,\\ \gamma\, b^{-\alpha \lfloor \log_b k\rfloor} & \text{if } k \neq 0.\end{cases}$$
For $k = (k_1,\ldots,k_s) \in \mathbb{N}_0^s$ let $r^{(1)}_{\alpha,\gamma}(k) := \prod_{j=1}^{s} r^{(1)}_{\alpha,\gamma_j}(k_j)$. Even though the parameter
$b$ occurs in the definition of $r^{(1)}_{\alpha,\gamma}$, we do not explicitly include it in our notation as
the choice of $b$ will usually be clear from the context.
• For $l \in \mathbb{Z}$ let
$$r^{(2)}_{\alpha,\gamma}(l) := \begin{cases} 1 & \text{if } l = 0,\\ \gamma\, |l|^{-\alpha} & \text{if } l \neq 0.\end{cases}$$
For $l = (l_1,\ldots,l_s) \in \mathbb{Z}^s$ let $r^{(2)}_{\alpha,\gamma}(l) := \prod_{j=1}^{s} r^{(2)}_{\alpha,\gamma_j}(l_j)$.

2.2 Definition of the Hilbert Space

Let s1 , s2 ∈ N0 such that s1 + s2 ≥ 1. We write s = (s1 , s2 ). For x =


(x1 , . . . , xs1 ) ∈ [0, 1)s1 and y = (y1 , . . . , ys2 ) ∈ [0, 1)s2 , we use the short hand
(x, y) for (x1 , . . . , xs1 , y1 , . . . , ys2 ) ∈ [0, 1)s1 +s2 .
Let γ (1) = (γj(1) )j≥1 and γ (2) = (γj(2) )j≥1 be non-increasing sequences of positive
real numbers. We write γ for the tuple (γ (1) , γ (2) ). Furthermore, let α1 , α2 ∈ R, with
α1 , α2 > 1 and write α = (α1 , α2 ).
We first define a function $K_{s,\alpha,\gamma}\colon [0,1]^{s_1+s_2} \times [0,1]^{s_1+s_2} \to \mathbb{C}$ (which will be the
kernel function of a Hilbert space, as we shall see later) by
$$K_{s,\alpha,\gamma}((x,y),(x',y')) := \sum_{k\in\mathbb{N}_0^{s_1}} \sum_{l\in\mathbb{Z}^{s_2}} r^{(1)}_{\alpha_1,\gamma^{(1)}}(k)\, r^{(2)}_{\alpha_2,\gamma^{(2)}}(l)\, \mathrm{wal}_k(x)\, e_l(y)\, \overline{\mathrm{wal}_k(x')}\, \overline{e_l(y')}$$
for $(x,y),(x',y') \in [0,1]^{s_1+s_2}$ (to be more precise, we should write $x,x' \in [0,1]^{s_1}$
and $y,y' \in [0,1]^{s_2}$; from now on, when using the notation $(x,y)\in[0,1]^{s_1+s_2}$, we
shall always tacitly assume that $x\in[0,1]^{s_1}$ and $y\in[0,1]^{s_2}$).
Note that $K_{s,\alpha,\gamma}$ can be written as
$$K_{s,\alpha,\gamma}((x,y),(x',y')) = K^{\mathrm{Wal}}_{s_1,\alpha_1,\gamma^{(1)}}(x,x')\, K^{\mathrm{Kor}}_{s_2,\alpha_2,\gamma^{(2)}}(y,y'), \qquad (1)$$
where $K^{\mathrm{Wal}}_{s_1,\alpha_1,\gamma^{(1)}}$ is the reproducing kernel of a Hilbert space based on Walsh functions.
This space is defined as
$$H(K^{\mathrm{Wal}}_{s_1,\alpha_1,\gamma^{(1)}}) := \Bigl\{ f = \sum_{k\in\mathbb{N}_0^{s_1}} \hat{f}_{\mathrm{wal}}(k)\, \mathrm{wal}_k : \|f\|_{s_1,\alpha_1,\gamma^{(1)}} < \infty \Bigr\},$$
where the $\hat{f}_{\mathrm{wal}}(k) := \int_{[0,1]^{s_1}} f(x)\,\overline{\mathrm{wal}_k(x)}\,\mathrm{d}x$ are the Walsh coefficients of $f$ and
$$\|f\|_{s_1,\alpha_1,\gamma^{(1)}} = \Bigl( \sum_{k\in\mathbb{N}_0^{s_1}} \frac{|\hat{f}_{\mathrm{wal}}(k)|^2}{r^{(1)}_{\alpha_1,\gamma^{(1)}}(k)} \Bigr)^{1/2}.$$
This so-called Walsh space was first introduced and studied in [3]. The kernel
$K^{\mathrm{Wal}}_{s_1,\alpha_1,\gamma^{(1)}}$ can be written as (see [3, p. 157])
$$K^{\mathrm{Wal}}_{s_1,\alpha_1,\gamma^{(1)}}(x,x') = \sum_{k\in\mathbb{N}_0^{s_1}} r^{(1)}_{\alpha_1,\gamma^{(1)}}(k)\, \mathrm{wal}_k(x)\, \overline{\mathrm{wal}_k(x')}
= \prod_{j=1}^{s_1} \Bigl( 1 + \gamma_j^{(1)} \sum_{k\in\mathbb{N}} \frac{\mathrm{wal}_k(x_j \ominus x_j')}{b^{\alpha_1 \lfloor \log_b k\rfloor}} \Bigr) \qquad (2)$$
$$= \prod_{j=1}^{s_1} \bigl( 1 + \gamma_j^{(1)} \phi_{\mathrm{wal},\alpha_1}(x_j, x_j') \bigr), \qquad (3)$$
where $\ominus$ denotes digit-wise subtraction modulo $b$, and where the function $\phi_{\mathrm{wal},\alpha_1}$ is
defined as in [3, p. 170], where it is also noted that $1 + \gamma_j \phi_{\mathrm{wal},\alpha_1}(u,v) \ge 0$ for any
$u,v$ as long as $\gamma_j \le 1$.
Furthermore, $K^{\mathrm{Kor}}_{s_2,\alpha_2,\gamma^{(2)}}$ is the reproducing kernel of a Hilbert space based on
trigonometric functions. This second function space is defined as
$$H(K^{\mathrm{Kor}}_{s_2,\alpha_2,\gamma^{(2)}}) := \Bigl\{ f = \sum_{l\in\mathbb{Z}^{s_2}} \hat{f}_{\mathrm{trig}}(l)\, e_l : \|f\|_{s_2,\alpha_2,\gamma^{(2)}} < \infty \Bigr\},$$
where the $\hat{f}_{\mathrm{trig}}(l) := \int_{[0,1]^{s_2}} f(y)\, \overline{e_l(y)}\,\mathrm{d}y$ are the Fourier coefficients of $f$ and
$$\|f\|_{s_2,\alpha_2,\gamma^{(2)}} = \Bigl( \sum_{l\in\mathbb{Z}^{s_2}} \frac{|\hat{f}_{\mathrm{trig}}(l)|^2}{r^{(2)}_{\alpha_2,\gamma^{(2)}}(l)} \Bigr)^{1/2}.$$
This so-called Korobov space is studied in many papers. We refer to [20, 22] and
the references therein for further information. The kernel $K^{\mathrm{Kor}}_{s_2,\alpha_2,\gamma^{(2)}}$ can be written as
(see [22])
$$K^{\mathrm{Kor}}_{s_2,\alpha_2,\gamma^{(2)}}(y,y') = \sum_{l\in\mathbb{Z}^{s_2}} r^{(2)}_{\alpha_2,\gamma^{(2)}}(l)\, e_l(y)\,\overline{e_l(y')}
= \prod_{j=1}^{s_2} \Bigl( 1 + \gamma_j^{(2)} \sum_{l\in\mathbb{Z}\setminus\{0\}} \frac{e_l(y_j - y_j')}{|l|^{\alpha_2}} \Bigr) \qquad (4)$$
$$= \prod_{j=1}^{s_2} \Bigl( 1 + 2\gamma_j^{(2)} \sum_{l=1}^{\infty} \frac{\cos(2\pi l (y_j - y_j'))}{l^{\alpha_2}} \Bigr). \qquad (5)$$
Note that $K^{\mathrm{Kor}}_{s_2,\alpha_2,\gamma^{(2)}}(y,y') \ge 0$ as long as $\gamma_j^{(2)} \le (2\zeta(\alpha_2))^{-1}$ for all $j \ge 1$, where $\zeta$ is
the Riemann zeta function.
Furthermore, [1, Part I, Sect. 8, Theorem I, p. 361] implies that $K_{s,\alpha,\gamma}$ is the reproducing
kernel of the tensor product of the spaces $H(K^{\mathrm{Wal}}_{s_1,\alpha_1,\gamma^{(1)}})$ and $H(K^{\mathrm{Kor}}_{s_2,\alpha_2,\gamma^{(2)}})$,
i.e., of the space
$$H(K_{s,\alpha,\gamma}) = H(K^{\mathrm{Wal}}_{s_1,\alpha_1,\gamma^{(1)}}) \otimes H(K^{\mathrm{Kor}}_{s_2,\alpha_2,\gamma^{(2)}}).$$
The elements of $H(K_{s,\alpha,\gamma})$ are defined on $[0,1]^{s_1+s_2}$, and the space is equipped with
the norm
$$\|f\|_{s,\alpha,\gamma} = \Bigl( \sum_{k\in\mathbb{N}_0^{s_1}} \sum_{l\in\mathbb{Z}^{s_2}} \frac{|\hat{f}(k,l)|^2}{r^{(1)}_{\alpha_1,\gamma^{(1)}}(k)\, r^{(2)}_{\alpha_2,\gamma^{(2)}}(l)} \Bigr)^{1/2},$$
where $\hat{f}(k,l) := \int_{[0,1]^{s_1+s_2}} f(x,y)\, \overline{\mathrm{wal}_k(x)}\, \overline{e_l(y)}\,\mathrm{d}x\,\mathrm{d}y$. From (1), (3) and (5) it follows
that
$$K_{s,\alpha,\gamma}((x,y),(x',y')) = \Biggl( \prod_{j=1}^{s_1} \bigl(1 + \gamma_j^{(1)} \phi_{\mathrm{wal},\alpha_1}(x_j,x_j')\bigr) \Biggr) \Biggl( \prod_{j=1}^{s_2} \Bigl(1 + 2\gamma_j^{(2)} \sum_{l=1}^{\infty} \frac{\cos(2\pi l(y_j - y_j'))}{l^{\alpha_2}} \Bigr) \Biggr).$$

In particular, if $\gamma_j^{(1)} \le 1$ and $\gamma_j^{(2)} \le (2\zeta(\alpha_2))^{-1}$ for all $j \ge 1$, then the kernel $K_{s,\alpha,\gamma}$
is nonnegative.
We study the problem of numerically integrating a function $f \in H(K_{s,\alpha,\gamma})$, i.e.,
we would like to approximate
$$I_s(f) = \int_{[0,1]^{s_1}} \int_{[0,1]^{s_2}} f(x,y)\,\mathrm{d}x\,\mathrm{d}y.$$
We use a QMC rule based on a point set $S_{N,s} = ((x_n,y_n))_{n=0}^{N-1} \subseteq [0,1)^{s_1+s_2}$, so we
approximate $I_s(f)$ by
$$\frac{1}{N} \sum_{n=0}^{N-1} f(x_n, y_n).$$
Using [4, Proposition 2.11] we obtain that $e(0, s_1+s_2) = 1$ for all $s_1, s_2$ and
$$e^2(H(K_{s,\alpha,\gamma}), S_{N,s}) = -1 + \frac{1}{N^2} \sum_{n,n'=0}^{N-1} K_{s,\alpha,\gamma}((x_n,y_n),(x_{n'},y_{n'})). \qquad (6)$$
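To show how (6) is used in computations, the following Python sketch (ours, not from the paper) evaluates the squared worst-case error of a given point set for a user-supplied product kernel; as a stand-in, the one-dimensional Korobov factor from (5) is approximated by truncating the cosine series, which is an assumption of the sketch.

```python
import math

def korobov_factor(y, yp, gamma, alpha, terms=1000):
    """One-dimensional Korobov kernel factor from (5), series truncated at `terms`."""
    s = sum(math.cos(2 * math.pi * l * (y - yp)) / l ** alpha
            for l in range(1, terms + 1))
    return 1.0 + 2.0 * gamma * s

def squared_worst_case_error(points, kernel):
    """Formula (6): e^2 = -1 + (1/N^2) * sum over all point pairs of K."""
    n = len(points)
    acc = sum(kernel(p, q) for p in points for q in points)
    return -1.0 + acc / n ** 2

# Example (assumed setting: pure Korobov part, s_1 = 0, s_2 = 1, alpha_2 = 2, gamma = 0.5)
pts = [(i / 16,) for i in range(16)]
kern = lambda p, q: korobov_factor(p[0], q[0], 0.5, 2.0)
print(squared_worst_case_error(pts, kern))
```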

3 The Main Result

The main result of this paper states necessary and sufficient conditions for the various
notions of tractability.
Theorem 1 We have strong polynomial QMC-tractability of multivariate integration in $H(K_{s,\alpha,\gamma})$ iff
$$\lim_{(s_1+s_2)\to\infty}\Bigl(\sum_{j=1}^{s_1}\gamma_j^{(1)}+\sum_{j=1}^{s_2}\gamma_j^{(2)}\Bigr) < \infty. \qquad (7)$$
We have polynomial QMC-tractability of multivariate integration in $H(K_{s,\alpha,\gamma})$ iff
$$\lim_{(s_1+s_2)\to\infty}\Bigl(\frac{\sum_{j=1}^{s_1}\gamma_j^{(1)}}{\log^+ s_1}+\frac{\sum_{j=1}^{s_2}\gamma_j^{(2)}}{\log^+ s_2}\Bigr) < \infty, \qquad (8)$$
where $\log^+ s = \max(1,\log s)$.
We have weak QMC-tractability of multivariate integration in $H(K_{s,\alpha,\gamma})$ iff
$$\lim_{(s_1+s_2)\to\infty}\frac{\sum_{j=1}^{s_1}\gamma_j^{(1)}+\sum_{j=1}^{s_2}\gamma_j^{(2)}}{s_1+s_2} = 0. \qquad (9)$$
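To make the three regimes concrete, the following worked examples (ours, not from the paper) show how standard product-weight choices fall under the conditions of Theorem 1.

```latex
\begin{itemize}
  \item $\gamma_j^{(1)} = \gamma_j^{(2)} = j^{-2}$: the sums in (7) converge,
        hence strong polynomial QMC-tractability.
  \item $\gamma_j^{(1)} = \gamma_j^{(2)} = 1/j$: $\sum_{j=1}^{s_i}\gamma_j^{(i)} \sim \log s_i$,
        so (8) holds while (7) fails, hence polynomial but not strong polynomial
        QMC-tractability.
  \item $\gamma_j^{(1)} = \gamma_j^{(2)} = 1$:
        $\bigl(\sum_{j=1}^{s_1}\gamma_j^{(1)}+\sum_{j=1}^{s_2}\gamma_j^{(2)}\bigr)/(s_1+s_2) = 1 \not\to 0$,
        so (9) fails and the problem is intractable.
\end{itemize}
```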

The necessity of the conditions in Theorem 1 will be proven in Sect. 4 and the
sufficiency in Sect. 5. In the latter section we will see that the notions of tractability
can be achieved by using so-called hybrid point sets made of polynomial lattice point
sets and of classical lattice point sets. We will construct these by a component-by-
component algorithm.

4 Proof of the Necessary Conditions

First we prove the following theorem.

Theorem 2 For any point set $S_{N,s} = ((x_n,y_n))_{n=0}^{N-1} \subseteq [0,1)^{s}$ we have
$$e^2(H(K_{s,\alpha,\gamma}), S_{N,s}) \ge -1 + \frac{1}{N} \prod_{j=1}^{s_1}\bigl(1+\gamma_j^{(1)}\mu(\alpha_1)\bigr) \prod_{j=1}^{s_2}\bigl(1+2\gamma_j^{(2)}\zeta(\alpha_2)\bigr),$$
where $\mu(\alpha) := \frac{b^{\alpha}(b-1)}{b^{\alpha}-b}$ for $\alpha > 1$, and where $\zeta$ is the Riemann zeta function.
Proof Let us, for the sake of simplicity, assume that
$$\gamma_j^{(1)} \le 1 \quad\text{and}\quad \gamma_j^{(2)} \le \frac{1}{2\zeta(\alpha_2)},$$
respectively, for $j \ge 1$. This imposes no loss of generality due to the fact that if we
decrease product weights, then the problem becomes easier. Under the assumption
on the weights we know from Sect. 2.2 that $K_{s,\alpha,\gamma}$ is nonnegative. Now, taking only
the diagonal elements in (6), and from the representations of the kernels in (1), (3)
and (5) we obtain
$$e^2(H(K_{s,\alpha,\gamma}), S_{N,s}) \ge -1 + \frac{1}{N^2}\sum_{n=0}^{N-1} K_{s,\alpha,\gamma}((x_n,y_n),(x_n,y_n)) = -1 + \frac{1}{N} \prod_{j=1}^{s_1}\bigl(1+\gamma_j^{(1)}\mu(\alpha_1)\bigr) \prod_{j=1}^{s_2}\bigl(1+2\gamma_j^{(2)}\zeta(\alpha_2)\bigr),$$
since $\phi_{\mathrm{wal},\alpha}(x,x) = \mu(\alpha)$ according to [3, p. 170]. □
From Theorem 2, we conclude that for $\varepsilon \in (0,1)$ we have
$$N_{\min}(\varepsilon, s_1+s_2) \ge \frac{1}{1+\varepsilon^2} \prod_{j=1}^{s_1}\bigl(1+\gamma_j^{(1)}\mu(\alpha_1)\bigr) \prod_{j=1}^{s_2}\bigl(1+2\gamma_j^{(2)}\zeta(\alpha_2)\bigr).$$

Now the two products can be analyzed in the same way as it was done in [3] and [22],
respectively. This finally leads to the necessary conditions (7) and (8) in Theorem 1.
Now assume that we have weak QMC-tractability. Then for $\varepsilon = 1$ we have
$$\log N_{\min}(1, s_1+s_2) \ge \log\frac{1}{2} + \sum_{j=1}^{s_1}\log\bigl(1+\gamma_j^{(1)}\mu(\alpha_1)\bigr) + \sum_{j=1}^{s_2}\log\bigl(1+2\gamma_j^{(2)}\zeta(\alpha_2)\bigr)$$
and
$$\lim_{(s_1+s_2)\to\infty} \frac{\sum_{j=1}^{s_1}\log(1+\gamma_j^{(1)}\mu(\alpha_1)) + \sum_{j=1}^{s_2}\log(1+2\gamma_j^{(2)}\zeta(\alpha_2))}{s_1+s_2} = 0.$$
This implies that $\lim_{j\to\infty}\gamma_j^{(k)} = 0$ for $k\in\{1,2\}$. For small enough $x>0$ we have
$\log(1+x) \ge c x$ for some $c>0$. Hence, for some $j_1, j_2 \in \mathbb{N}$ and $s_1 \ge j_1$ and $s_2 \ge j_2$
we have
$$\sum_{j=1}^{s_1}\log(1+\gamma_j^{(1)}\mu(\alpha_1)) + \sum_{j=1}^{s_2}\log(1+2\gamma_j^{(2)}\zeta(\alpha_2)) \ge c_1\,\mu(\alpha_1)\sum_{j=j_1}^{s_1}\gamma_j^{(1)} + c_2\,2\zeta(\alpha_2)\sum_{j=j_2}^{s_2}\gamma_j^{(2)}$$
and therefore, under the assumption of weak QMC-tractability,
$$\lim_{(s_1+s_2)\to\infty} \frac{c_1\,\mu(\alpha_1)\sum_{j=j_1}^{s_1}\gamma_j^{(1)} + c_2\,2\zeta(\alpha_2)\sum_{j=j_2}^{s_2}\gamma_j^{(2)}}{s_1+s_2} = 0.$$
This implies the necessity of (9).

5 Proof of the Sufficient Conditions

We construct, component-by-component (or, for short, CBC), a QMC algorithm


whose worst-case error implies the sufficient conditions in Theorem 1. This QMC
algorithm is based on lattice rules and on polynomial lattice rules, where the lattice
rules are used to integrate the “Korobov part” of the integrand and the polynomial
lattice rules are used to integrate the “Walsh part”. We quickly recall the concepts of
(polynomial) lattice rules:
• Lattice point sets (according to Hlawka [7] and Korobov [12]). Let $N \in \mathbb{N}$
be an integer and let $z = (z_1,\ldots,z_{s_2}) \in \mathbb{Z}^{s_2}$. The lattice point set $(y_n)_{n=0}^{N-1}$ with
generating vector $z$, consisting of $N$ points in $[0,1)^{s_2}$, is defined by
$$y_n = \Bigl( \Bigl\{\frac{n z_1}{N}\Bigr\}, \ldots, \Bigl\{\frac{n z_{s_2}}{N}\Bigr\} \Bigr) \quad\text{for all } 0 \le n \le N-1,$$
where $\{\cdot\}$ denotes the fractional part of a number. Note that it suffices to choose
$z \in Z_N^{s_2}$, where
$$Z_N := \{ z \in \{0,1,\ldots,N-1\} : \gcd(z,N) = 1 \}.$$
• Polynomial lattice point sets (according to Niederreiter [18]). Let $\mathbb{F}_b$ be the
finite field of prime order $b$. Furthermore let $\mathbb{F}_b[x]$ be the set of polynomials
over $\mathbb{F}_b$, and let $\mathbb{F}_b((x^{-1}))$ be the field of formal Laurent series over $\mathbb{F}_b$. The
latter contains the field of rational functions as a subfield. Given $m \in \mathbb{N}$, set
$G_{b,m} := \{a \in \mathbb{F}_b[x] : \deg(a) < m\}$ and define a mapping $\nu_m\colon \mathbb{F}_b((x^{-1})) \to [0,1)$
by
$$\nu_m\Bigl( \sum_{l=z}^{\infty} t_l x^{-l} \Bigr) := \sum_{l=\max(1,z)}^{m} t_l b^{-l}.$$
Let $f \in \mathbb{F}_b[x]$ with $\deg(f) = m$ and $g = (g_1,\ldots,g_{s_1}) \in \mathbb{F}_b[x]^{s_1}$. The polynomial
lattice point set $(x_h)_{h\in G_{b,m}}$ with generating vector $g$, consisting of $b^m$ points in
$[0,1)^{s_1}$, is defined by
$$x_h := \Bigl( \nu_m\Bigl(\frac{h(x)g_1(x)}{f(x)}\Bigr), \ldots, \nu_m\Bigl(\frac{h(x)g_{s_1}(x)}{f(x)}\Bigr) \Bigr) \quad\text{for all } h \in G_{b,m}.$$
A QMC rule using a (polynomial) lattice point set is called (polynomial) lattice rule.
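As a concrete illustration of the two constructions just recalled, the following Python sketch (ours, not from the paper) generates a rank-1 lattice point set and, for the special case $b = 2$ with polynomials encoded as integer bit masks, a polynomial lattice point set; the chosen modulus and generating vectors in the usage lines are arbitrary examples.

```python
def lattice_points(N, z):
    """Rank-1 lattice point set: y_n = ({n*z_1/N}, ..., {n*z_s/N}); z_j should lie in Z_N."""
    return [tuple((n * zj % N) / N for zj in z) for n in range(N)]

def poly_lattice_points(m, f, g):
    """Polynomial lattice point set over F_2 (base b = 2 only).

    Polynomials are encoded as integers (bit i = coefficient of x^i); f must have
    degree m. Each coordinate is nu_m(h(x)*g_j(x)/f(x)), with the Laurent expansion
    obtained digit by digit via polynomial long division over F_2.
    """
    def poly_mul(a, c):                       # carry-less (GF(2)) multiplication
        r = 0
        while c:
            if c & 1:
                r ^= a
            a <<= 1
            c >>= 1
        return r

    def poly_mod(a, den):                     # remainder of a modulo den over F_2
        deg_den = den.bit_length() - 1
        while a and a.bit_length() - 1 >= deg_den:
            a ^= den << (a.bit_length() - 1 - deg_den)
        return a

    def first_digits(num, den, m):            # digits t_1, ..., t_m of num/den
        deg_den = den.bit_length() - 1
        rem = poly_mod(num, den)              # keep only the fractional part
        digits = []
        for _ in range(m):
            rem <<= 1                         # multiply by x
            t = 1 if rem.bit_length() - 1 == deg_den else 0
            if t:
                rem ^= den
            digits.append(t)
        return digits

    pts = []
    for h in range(2 ** m):                   # all polynomials of degree < m
        coords = []
        for gj in g:
            digs = first_digits(poly_mul(h, gj), f, m)
            coords.append(sum(d * 2.0 ** -(i + 1) for i, d in enumerate(digs)))
        pts.append(tuple(coords))
    return pts

# Example usage (our choices): an 8-point lattice in 2D and an 8-point polynomial lattice
print(lattice_points(8, (1, 5))[:3])
print(poly_lattice_points(3, f=0b1011, g=(0b001, 0b011))[:3])   # f = x^3 + x + 1, irreducible
```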

5.1 Component-by-Component Construction


We now show a CBC construction algorithm for point sets that are suitable for integration
in the space $H(K_{s,\alpha,\gamma})$. For practical reasons, we will, in the following,
denote the worst-case error of a hybrid point set $S_{N,s} = ((x_n,y_n))_{n=0}^{N-1}$, consisting of
an $s_1$-dimensional polynomial lattice generated by $g$ and an $s_2$-dimensional lattice
generated by $z$, by $e^2_{s,\alpha,\gamma}(g,z)$, where $g$ is the generating vector of the polynomial
lattice part, and $z$ is the generating vector of the lattice part. Using the kernel representations
in (2) and (4) we have
$$e^2_{s,\alpha,\gamma}(g,z) = -1 + \frac{1}{N^2}\sum_{n,n'=0}^{N-1} \Biggl[ \prod_{j=1}^{s_1}\Bigl( 1 + \gamma_j^{(1)} \sum_{k\in\mathbb{N}} \frac{\mathrm{wal}_k(x_{n,j}\ominus x_{n',j})}{b^{\alpha_1\lfloor\log_b k\rfloor}} \Bigr) \Biggr] \Biggl[ \prod_{j=1}^{s_2}\Bigl( 1 + \gamma_j^{(2)} \sum_{l\in\mathbb{Z}\setminus\{0\}} \frac{e_l(y_{n,j}-y_{n',j})}{|l|^{\alpha_2}} \Bigr) \Biggr], \qquad (10)$$
where $x_{n,j}$ is the $j$th component of $x_n$ and similarly for $y_{n,j}$.



We now proceed to our construction algorithm. Note that we state the algorithm
in a way such that we exclude the cases s1 = 0 or s2 = 0, as these are covered by
the results in [2] and [16]. For s ∈ N let [s] := {1, . . . , s}.
Algorithm 1 Let s1 , s2 , m ∈ N, a prime number b, and an irreducible polynomial
f ∈ Fb [x] with deg(f ) = m be given. We write N = bm .
1. For d1 = 1, choose g1 = 1 ∈ Gb,m .
2. For d2 = 1, choose z1 ∈ ZN such that e2(1,1),α,γ (g1 , z1 ) is minimized as a function
of z1 .
3. For d1 ∈ [s1 ] and d2 ∈ [s2 ], assume that g∗d1 = (g1 , . . . , gd1 ) and z∗d2 =
(z1 , . . . , zd2 ) are given. If d1 < s1 and d2 < s2 go to either Step (3a) or (3b). If
d1 = s1 and d2 < s2 go to Step (3b). If d1 < s1 and d2 = s2 , go to Step (3a). If
d1 = s1 and d2 = s2 , the algorithm terminates.
a. Choose gd1 +1 ∈ Gb,m such that e2(d1 +1,d2 ),α,γ ((g∗d1 , gd1 +1 ), z∗d2 ) is minimized
as a function of gd1 +1 . Increase d1 by 1 and repeat Step 3.
b. Choose zd2 +1 ∈ ZN such that e2(d1 ,d2 +1),α,γ (g∗d1 , (z∗d2 , zd2 +1 )) is minimized as
a function of zd2 +1 . Increase d2 by 1 and repeat Step 3.
Remark 1 As pointed out in, e.g., [22] and [3], the infinite sums in (10) can be
represented in closed form, so the construction cost of Algorithm 1 is of order
O(N 3 (s1 + s2 )2 ). Of course it would be desirable to lower this cost bound. If s1 = 0
or s2 = 0 one can use the fast CBC approach based on FFT as done by Cools and
Nuyens to reduce the construction cost to O(sN log N), where s ∈ {s1 , s2 }. It is not
yet clear if these ideas also apply to the hybrid case.
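A schematic Python rendering of Algorithm 1 (ours, with the error evaluation left abstract) may help to see how the two component searches interleave; sq_error(g, z) stands for a closed-form evaluation of (10) as mentioned in Remark 1, and the strict alternation of steps 3a and 3b is just one admissible ordering allowed by the algorithm.

```python
def cbc_hybrid(s1, s2, Gbm, ZN, sq_error):
    """Sketch of Algorithm 1: interleaved CBC search for (g, z).

    Gbm: sequence of candidate polynomials (the set G_{b,m}),
    ZN:  sequence of candidate lattice components (the set Z_N),
    sq_error(g, z): squared worst-case error e^2 of the current hybrid rule.
    """
    g = [1]                                              # step 1: g_1 = 1
    z = [min(ZN, key=lambda c: sq_error(g, [c]))]        # step 2: best z_1
    d1, d2 = 1, 1
    while d1 < s1 or d2 < s2:                            # step 3
        if d1 < s1:                                      # step 3a: extend g
            g.append(min(Gbm, key=lambda c: sq_error(g + [c], z)))
            d1 += 1
        if d2 < s2:                                      # step 3b: extend z
            z.append(min(ZN, key=lambda c: sq_error(g, z + [c])))
            d2 += 1
    return g, z
```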
Theorem 3 Let $d_1 \in [s_1]$ and $d_2 \in [s_2]$ be given. Then the generating vectors $g^*_{d_1}$
and $z^*_{d_2}$ constructed by Algorithm 1 satisfy
$$e^2_{(d_1,d_2),\alpha,\gamma}(g^*_{d_1}, z^*_{d_2}) \le \frac{2}{N} \prod_{j=1}^{d_1}\bigl(1 + \gamma_j^{(1)}\,2\mu(\alpha_1)\bigr) \prod_{j=1}^{d_2}\bigl(1 + \gamma_j^{(2)}\,4\zeta(\alpha_2)\bigr).$$

The proof of Theorem 3 is deferred to the appendix.

5.2 Proof of the Sufficient Conditions

From Theorem 3 it follows that for $N = b^m$ we have
$$e^2(N, s_1+s_2) \le \frac{2}{N} \prod_{j=1}^{s_1}\bigl(1+\gamma_j^{(1)}\,2\mu(\alpha_1)\bigr) \prod_{j=1}^{s_2}\bigl(1+\gamma_j^{(2)}\,4\zeta(\alpha_2)\bigr).$$
Assuming that (7) holds, we know that $\sum_{j=1}^{\infty}\gamma_j^{(1)} < \infty$, and hence
$$\prod_{j=1}^{s_1}\bigl(1+\gamma_j^{(1)}\mu(\alpha_1)\bigr) \le \exp\Bigl( \mu(\alpha_1)\sum_{j=1}^{\infty}\gamma_j^{(1)} \Bigr) =: C_1(\alpha_1,\gamma^{(1)}).$$
A similar argument shows that $\prod_{j=1}^{s_2}\bigl(1+4\gamma_j^{(2)}\zeta(\alpha_2)\bigr) \le C_2(\alpha_2,\gamma^{(2)})$. Hence
$$e^2(N, s_1+s_2) \le \frac{2}{N}\,C_1(\alpha_1,\gamma^{(1)})\,C_2(\alpha_2,\gamma^{(2)}) =: \frac{C(\alpha,\gamma)}{N}.$$
For $\varepsilon > 0$ choose $m \in \mathbb{N}$ such that $b^{m-1} < \lceil C(\alpha,\gamma)\varepsilon^{-2}\rceil =: N \le b^m$. Then we
have $e(b^m, s_1+s_2) \le \varepsilon$ and hence
$$N_{\min}(\varepsilon, s_1+s_2) \le b^m < bN = b\,\lceil C(\alpha,\gamma)\varepsilon^{-2}\rceil.$$

This implies strong polynomial QMC-tractability. The corresponding bounds can be


achieved with the point set constructed by Algorithm 1.
The sufficiency of the condition for polynomial QMC-tractability is shown in a
similar fashion by standard arguments (cf. [3, 22]).
For weak QMC-tractability we deduce from Theorem 3 that
$$N_{\min}(\varepsilon, s_1+s_2) \le \Bigl\lceil 2\varepsilon^{-2} \prod_{j=1}^{s_1}\bigl(1+\gamma_j^{(1)}\,2\mu(\alpha_1)\bigr) \prod_{j=1}^{s_2}\bigl(1+\gamma_j^{(2)}\,4\zeta(\alpha_2)\bigr) \Bigr\rceil.$$
Hence
$$\log N_{\min}(\varepsilon, s_1+s_2) \le \log 4 + 2\log\varepsilon^{-1} + 2\mu(\alpha_1)\sum_{j=1}^{s_1}\gamma_j^{(1)} + 4\zeta(\alpha_2)\sum_{j=1}^{s_2}\gamma_j^{(2)},$$
and this together with (9) implies the result.

6 Open Questions

The findings of this paper naturally lead to the following two open problems:
• Study tractability for general algorithms (not only QMC rules) and compare the
tractability conditions with the one given in Theorem 1.
• From Theorem 3 we obtain a convergence rate of order O(N −1/2 ) for the worst-
case error which is the same as for plain Monte Carlo. Improve this convergence
rate.

Acknowledgments The authors would like to thank the anonymous referees for their remarks
which helped to improve the presentation of this paper. P. Kritzer is supported by the Austrian
Science Fund (FWF), Projects P23389-N18 and F5506-N26. The latter is part of the Special Research
Program “Quasi-Monte Carlo Methods: Theory and Applications”. F. Pillichshammer is supported
by the Austrian Science Fund (FWF) Project F5509-N26, which is part of the Special Research
Program “Quasi-Monte Carlo Methods: Theory and Applications”.

Appendix: The Proof of Theorem 3

Proof We show the result by an inductive argument. We start our considerations by
dealing with the case where $d_1 = d_2 = 1$. According to Algorithm 1, we have chosen
$g_1 = 1 \in G_{b,m}$ and $z_1 \in Z_N$ such that $e^2_{(1,1),\alpha,\gamma}(g_1,z_1)$ is minimized as a function
of $z_1$. In the following, we denote the points generated by $(g,z) \in G_{b,m}\times Z_N$ by
$(x_n(g), y_n(z))$.
According to Eq. (10), we have
$$e^2_{(1,1),\alpha,\gamma}(g_1,z_1) = e^2_{1,\alpha_1,\gamma^{(1)}}(1) + \theta_{(1,1)}(z_1),$$
where $e^2_{1,\alpha_1,\gamma^{(1)}}(1)$ denotes the squared worst-case error of the polynomial lattice rule
generated by $1$ in the Walsh space $H(K^{\mathrm{Wal}}_{1,\alpha_1,\gamma^{(1)}})$, and where
$$\theta_{(1,1)}(z_1) := \frac{\gamma_1^{(2)}}{N^2}\sum_{n,n'=0}^{N-1} \Bigl( 1 + \gamma_1^{(1)} \sum_{k_1\in\mathbb{N}} \frac{\mathrm{wal}_{k_1}(x_{n,1}(1)\ominus x_{n',1}(1))}{b^{\alpha_1\lfloor\log_b k_1\rfloor}} \Bigr) \sum_{l_1\in\mathbb{Z}\setminus\{0\}} \frac{e_{l_1}(y_{n,1}(z_1) - y_{n',1}(z_1))}{|l_1|^{\alpha_2}}.$$
By results in [2], we know that
$$e^2_{1,\alpha_1,\gamma^{(1)}}(1) \le \frac{2}{N}\bigl(1+\gamma_1^{(1)}\mu(\alpha_1)\bigr). \qquad (11)$$
Then, as $z_1$ was chosen to minimize the error,
$$\theta_{(1,1)}(z_1) \le \frac{1}{\phi(N)}\sum_{z\in Z_N}\theta_{(1,1)}(z) = \frac{\gamma_1^{(2)}}{N^2}\sum_{n,n'=0}^{N-1}\Bigl(1+\gamma_1^{(1)}\sum_{k_1\in\mathbb{N}}\frac{\mathrm{wal}_{k_1}(x_{n,1}(1)\ominus x_{n',1}(1))}{b^{\alpha_1\lfloor\log_b k_1\rfloor}}\Bigr)\,\frac{1}{\phi(N)}\sum_{z\in Z_N}\sum_{l_1\in\mathbb{Z}\setminus\{0\}}\frac{e_{l_1}(y_{n,1}(z)-y_{n',1}(z))}{|l_1|^{\alpha_2}} \le \gamma_1^{(2)}\bigl(1+\gamma_1^{(1)}\mu(\alpha_1)\bigr)\,\Sigma_B,$$
where
$$\Sigma_B := \frac{1}{N^2}\sum_{n=0}^{N-1}\sum_{n'=0}^{N-1}\biggl|\frac{1}{\phi(N)}\sum_{z\in Z_N}\sum_{l_1\in\mathbb{Z}\setminus\{0\}}\frac{\mathrm{e}^{2\pi\mathrm{i}(n-n')z l_1/N}}{|l_1|^{\alpha_2}}\biggr| = \frac{1}{N}\sum_{n=1}^{N}\biggl|\frac{1}{\phi(N)}\sum_{z\in Z_N}\sum_{l_1\in\mathbb{Z}\setminus\{0\}}\frac{\mathrm{e}^{2\pi\mathrm{i} n z l_1/N}}{|l_1|^{\alpha_2}}\biggr|,$$
since the inner sum in the second line always has the same value. We now use [16,
Lemmas 2.1 and 2.3] and obtain $\Sigma_B \le 4\zeta(\alpha_2)N^{-1}$, where we used that $N$ has only
one prime factor. Hence we obtain
$$\theta_{(1,1)}(z_1) \le \frac{\gamma_1^{(2)}}{N}\bigl(1+\gamma_1^{(1)}\mu(\alpha_1)\bigr)\,4\zeta(\alpha_2). \qquad (12)$$

Combining Eqs. (11) and (12) yields the desired bound for (g1 , z1 ).
Let us now assume d1 ∈ [s1 ] and d2 ∈ [s2 ] and that we have already found
generating vectors g∗d1 and z∗d2 such that the bound in Theorem 3 is satisfied.
In what follows, we are going to distinguish two cases: In the first case, we assume
that d1 < s1 and add a component gd1 +1 to g∗d1 , and in the second case, we assume
that d2 < s2 and add a component zd2 +1 to z∗d2 . In both cases, we will show that the
corresponding bounds on the squared worst-case errors hold.
Let us first consider the case where we start from $(g^*_{d_1}, z^*_{d_2})$ and add, by Algorithm 1,
a component $g_{d_1+1}$ to $g^*_{d_1}$. According to Eq. (10), we have
$$e^2_{(d_1+1,d_2),\alpha,\gamma}((g^*_{d_1},g_{d_1+1}),z^*_{d_2}) = e^2_{(d_1,d_2),\alpha,\gamma}(g^*_{d_1},z^*_{d_2}) + \theta_{(d_1+1,d_2)}(g_{d_1+1}),$$
where
$$\theta_{(d_1+1,d_2)}(g_{d_1+1}) := \frac{\gamma_{d_1+1}^{(1)}}{N^2}\sum_{n,n'=0}^{N-1} \Biggl[\prod_{j=1}^{d_1}\Bigl(1+\gamma_j^{(1)}\sum_{k\in\mathbb{N}}\frac{\mathrm{wal}_k(x_{n,j}(g_j)\ominus x_{n',j}(g_j))}{b^{\alpha_1\lfloor\log_b k\rfloor}}\Bigr)\Biggr] \Biggl[\prod_{j=1}^{d_2}\Bigl(1+\gamma_j^{(2)}\sum_{l\in\mathbb{Z}\setminus\{0\}}\frac{e_l(y_{n,j}(z_j)-y_{n',j}(z_j))}{|l|^{\alpha_2}}\Bigr)\Biggr] \sum_{k\in\mathbb{N}}\frac{\mathrm{wal}_k(x_{n,d_1+1}(g_{d_1+1})\ominus x_{n',d_1+1}(g_{d_1+1}))}{b^{\alpha_1\lfloor\log_b k\rfloor}}.$$
However, by the assumption, we know that
$$e^2_{(d_1,d_2),\alpha,\gamma}(g^*_{d_1},z^*_{d_2}) \le \frac{2}{N}\prod_{j=1}^{d_1}\bigl(1+\gamma_j^{(1)}\,2\mu(\alpha_1)\bigr)\prod_{j=1}^{d_2}\bigl(1+\gamma_j^{(2)}\,4\zeta(\alpha_2)\bigr). \qquad (13)$$

Furthermore, as $g_{d_1+1}$ was chosen to minimize the error,
$$\theta_{(d_1+1,d_2)}(g_{d_1+1}) \le \frac{1}{N}\sum_{g\in G_{b,m}}\theta_{(d_1+1,d_2)}(g) \le \gamma_{d_1+1}^{(1)}\Biggl[\prod_{j=1}^{d_1}\bigl(1+\gamma_j^{(1)}\mu(\alpha_1)\bigr)\Biggr]\Biggl[\prod_{j=1}^{d_2}\bigl(1+\gamma_j^{(2)}\,2\zeta(\alpha_2)\bigr)\Biggr]\,\Sigma_C,$$
where
$$\Sigma_C := \frac{1}{N^2}\sum_{n,n'=0}^{N-1}\biggl|\frac{1}{N}\sum_{g\in G_{b,m}}\sum_{k\in\mathbb{N}}\frac{\mathrm{wal}_k(x_{n,d_1+1}(g)\ominus x_{n',d_1+1}(g))}{b^{\alpha_1\lfloor\log_b k\rfloor}}\biggr| = \frac{1}{N^2}\sum_{n=0}^{N-1}\sum_{n'=0}^{N-1}\biggl|\frac{1}{N}\sum_{g\in G_{b,m}}\sum_{k\in\mathbb{N}}\frac{\mathrm{wal}_k(x_{n\ominus n',d_1+1}(g))}{b^{\alpha_1\lfloor\log_b k\rfloor}}\biggr| = \frac{1}{N}\sum_{n=0}^{N-1}\biggl|\frac{1}{N}\sum_{g\in G_{b,m}}\sum_{k\in\mathbb{N}}\frac{\mathrm{wal}_k(x_{n,d_1+1}(g))}{b^{\alpha_1\lfloor\log_b k\rfloor}}\biggr|,$$
where we used the group structure of the polynomial lattice points
(see [4, Sect. 4.4.4]) in order to get from the first to the second line and where
we again used that the inner sum in the second line always has the same value. We
now write
$$\Sigma_C = \frac{1}{N}\sum_{k\in\mathbb{N}}\frac{1}{b^{\alpha_1\lfloor\log_b k\rfloor}} + \frac{1}{N}\sum_{n=1}^{N-1}\biggl|\frac{1}{N}\sum_{g\in G_{b,m}}\sum_{k\in\mathbb{N}}\frac{\mathrm{wal}_k(x_{n,d_1+1}(g))}{b^{\alpha_1\lfloor\log_b k\rfloor}}\biggr| = \frac{\mu(\alpha_1)}{N} + \frac{1}{N}\sum_{n=1}^{N-1}\biggl|\frac{1}{N}\sum_{g\in G_{b,m}}\sum_{k\in\mathbb{N}}\frac{\mathrm{wal}_k(x_{n,d_1+1}(g))}{b^{\alpha_1\lfloor\log_b k\rfloor}}\biggr|.$$

Let now $n \in \{1,\ldots,N-1\}$ be fixed, and consider the term
$$\Sigma_{C,n} := \frac{1}{N}\sum_{g\in G_{b,m}}\sum_{k\in\mathbb{N}}\frac{\mathrm{wal}_k(x_{n,d_1+1}(g))}{b^{\alpha_1\lfloor\log_b k\rfloor}} = \frac{1}{N}\sum_{g\in G_{b,m}}\sum_{\substack{k\in\mathbb{N}\\ k\equiv 0\,(N)}}\frac{\mathrm{wal}_k(x_{n,d_1+1}(g))}{b^{\alpha_1\lfloor\log_b k\rfloor}} + \frac{1}{N}\sum_{g\in G_{b,m}}\sum_{\substack{k\in\mathbb{N}\\ k\not\equiv 0\,(N)}}\frac{\mathrm{wal}_k(x_{n,d_1+1}(g))}{b^{\alpha_1\lfloor\log_b k\rfloor}} =: \Sigma_{C,n,1} + \Sigma_{C,n,2}.$$
By results in [2],
$$\Sigma_{C,n,1} = \sum_{\substack{k\in\mathbb{N}\\ k\equiv 0\,(N)}}\frac{1}{b^{\alpha_1\lfloor\log_b k\rfloor}} = \frac{\mu(\alpha_1)}{b^{m\alpha_1}} \le \frac{\mu(\alpha_1)}{N}.$$
Furthermore,
$$\Sigma_{C,n,2} = \sum_{\substack{k\in\mathbb{N}\\ k\not\equiv 0\,(N)}}\frac{1}{b^{\alpha_1\lfloor\log_b k\rfloor}}\,\frac{1}{N}\sum_{g\in G_{b,m}}\mathrm{wal}_k(x_{n,d_1+1}(g)) = \sum_{\substack{k\in\mathbb{N}\\ k\not\equiv 0\,(N)}}\frac{1}{b^{\alpha_1\lfloor\log_b k\rfloor}}\,\frac{1}{N}\sum_{g=0}^{b^m-1}\mathrm{wal}_k\Bigl(\frac{g}{b^m}\Bigr),$$
where we used that
$$\sum_{g\in G_{b,m}}\mathrm{wal}_k(x_{n,d_1+1}(g)) = \sum_{g\in G_{b,m}}\mathrm{wal}_k\Bigl(\nu_m\Bigl(\frac{n(x)g(x)}{f(x)}\Bigr)\Bigr) = \sum_{g\in G_{b,m}}\mathrm{wal}_k\Bigl(\nu_m\Bigl(\frac{g(x)}{f(x)}\Bigr)\Bigr) = \sum_{g=0}^{b^m-1}\mathrm{wal}_k\Bigl(\frac{g}{b^m}\Bigr),$$
since $n \neq 0$ and since $g$ takes on all values in $G_{b,m}$, and $f$ is irreducible. However,
$\sum_{g=0}^{b^m-1}\mathrm{wal}_k\bigl(\frac{g}{b^m}\bigr) = 0$ and so $\Sigma_{C,n,2} = 0$. This yields $\Sigma_{C,n} \le \mu(\alpha_1)N^{-1}$ and
$\Sigma_C \le 2\mu(\alpha_1)N^{-1}$, which in turn implies
$$\theta_{(d_1+1,d_2)}(g_{d_1+1}) \le \frac{2\gamma_{d_1+1}^{(1)}\mu(\alpha_1)}{N}\prod_{j=1}^{d_1}\bigl(1+\gamma_j^{(1)}\mu(\alpha_1)\bigr)\prod_{j=1}^{d_2}\bigl(1+\gamma_j^{(2)}\,2\zeta(\alpha_2)\bigr).$$

Combining the latter result with Eq. (13), we obtain
$$e^2_{(d_1+1,d_2),\alpha,\gamma}((g^*_{d_1},g_{d_1+1}),z^*_{d_2}) \le \frac{2}{N}\prod_{j=1}^{d_1+1}\bigl(1+2\gamma_j^{(1)}\mu(\alpha_1)\bigr)\prod_{j=1}^{d_2}\bigl(1+\gamma_j^{(2)}\,4\zeta(\alpha_2)\bigr).$$

The case where we start from $(g^*_{d_1},z^*_{d_2})$ and add, by Algorithm 1, a component
$z_{d_2+1}$ to $z^*_{d_2}$ can be shown by a similar reasoning. We just sketch the basic points:
According to Eq. (10), we have
$$e^2_{(d_1,d_2+1),\alpha,\gamma}(g^*_{d_1},(z^*_{d_2},z_{d_2+1})) = e^2_{(d_1,d_2),\alpha,\gamma}(g^*_{d_1},z^*_{d_2}) + \theta_{(d_1,d_2+1)}(z_{d_2+1}),$$
where $e^2_{(d_1,d_2),\alpha,\gamma}(g^*_{d_1},z^*_{d_2})$ satisfies (13) and where
$$\theta_{(d_1,d_2+1)}(z_{d_2+1}) \le \gamma_{d_2+1}^{(2)}\Biggl[\prod_{j=1}^{d_1}\bigl(1+\gamma_j^{(1)}\mu(\alpha_1)\bigr)\Biggr]\Biggl[\prod_{j=1}^{d_2}\bigl(1+\gamma_j^{(2)}\,2\zeta(\alpha_2)\bigr)\Biggr]\,\Sigma_D,$$
with
$$\Sigma_D = \frac{1}{N}\sum_{n=0}^{N-1}\biggl|\frac{1}{\phi(N)}\sum_{z\in Z_N}\sum_{l\in\mathbb{Z}\setminus\{0\}}\frac{\mathrm{e}^{2\pi\mathrm{i} n z l/N}}{|l|^{\alpha_2}}\biggr| \le \frac{4\zeta(\alpha_2)}{N},$$
according to [16, Lemmas 2.1 and 2.3]. This implies
$$\theta_{(d_1,d_2+1)}(z_{d_2+1}) \le \frac{\gamma_{d_2+1}^{(2)}\,4\zeta(\alpha_2)}{N}\prod_{j=1}^{d_1}\bigl(1+\gamma_j^{(1)}\mu(\alpha_1)\bigr)\prod_{j=1}^{d_2}\bigl(1+\gamma_j^{(2)}\,2\zeta(\alpha_2)\bigr).$$
Combining these results we obtain the desired bound. □

References

1. Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68, 337–404 (1950)
2. Dick, J., Kuo, F.Y., Pillichshammer, F., Sloan, I.H.: Construction algorithms for polynomial
lattice rules for multivariate integration. Math. Comput. 74, 1895–1921 (2005)
3. Dick, J., Pillichshammer, F.: Multivariate integration in weighted Hilbert spaces based on Walsh
functions and weighted Sobolev spaces. J. Complex. 21, 149–195 (2005)
4. Dick, J., Pillichshammer, F.: Digital Nets and Sequences. Discrepancy Theory and Quasi-Monte
Carlo Integration. Cambridge University Press, Cambridge (2010)
5. Hellekalek, P.: Hybrid function systems in the theory of uniform distribution of sequences. In:
Plaskota, L., Woźniakowski, H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2010, pp.
435–449. Springer, Berlin (2012)
6. Hellekalek, P., Kritzer, P.: On the diaphony of some finite hybrid point sets. Acta Arithmetica
156, 257–282 (2012)

7. Hlawka, E.: Zur angenäherten Berechnung mehrfacher Integrale. Monatshefte für Mathematik
66, 140–151 (1962)
8. Hofer, R., Kritzer, P.: On hybrid sequences built of Niederreiter-Halton sequences and Kro-
necker sequences. Bull. Aust. Math. Soc. 84, 238–254 (2011)
9. Hofer, R., Kritzer, P., Larcher, G., Pillichshammer, F.: Distribution properties of generalized
van der Corput-Halton sequences and their subsequences. Int. J. Number Theory 5, 719–746
(2009)
10. Hofer, R., Larcher, G.: Metrical results on the discrepancy of Halton-Kronecker sequences.
Mathematische Zeitschrift 271, 1–11 (2012)
11. Keller, A.: Quasi-Monte Carlo image synthesis in a nutshell. In: Dick, J., Kuo, F.Y., Peters,
G.W., Sloan, I.H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods, pp. 213–249. Springer,
Berlin (2013)
12. Korobov, N.M.: Approximate evaluation of repeated integrals. Doklady Akademii Nauk SSSR
124, 1207–1210 (1959). (in Russian)
13. Kritzer, P.: On an example of finite hybrid quasi-Monte Carlo Point Sets. Monatshefte für
Mathematik 168, 443–459 (2012)
14. Kritzer, P., Leobacher, G., Pillichshammer, F.: Component-by-component construction of
hybrid point sets based on Hammersley and lattice point sets. In: Dick, J., Kuo, F.Y., Peters,
G.W., Sloan, I.H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2012, 501–515. Springer,
Berlin (2013)
15. Kritzer, P., Pillichshammer, F.: On the existence of low-diaphony sequences made of digital
sequences and lattice point sets. Mathematische Nachrichten 286, 224–235 (2013)
16. Kuo, F.Y., Joe, S.: Component-by-component construction of good lattice rules with a com-
posite number of points. J. Complex. 18, 943–976 (2002)
17. Larcher, G.: Discrepancy estimates for sequences: new results and open problems. In: Kritzer,
P., Niederreiter, H., Pillichshammer, F., Winterhof, A. (eds.) Uniform Distribution and Quasi-
Monte Carlo Methods, Radon Series in Computational and Applied Mathematics, 171–189.
DeGruyter, Berlin (2014)
18. Niederreiter, H.: Low-discrepancy point sets obtained by digital constructions over finite fields.
Czechoslovak Mathematical Journal 42, 143–166 (1992)
19. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems, Volume I: Linear Infor-
mation. EMS, Zurich (2008)
20. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems, Volume II: Standard
Information for Functionals. EMS, Zurich (2010)
21. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems, Volume III: Standard
Information for Operators. EMS, Zurich (2012)
22. Sloan, I.H., Woźniakowski, H.: Tractability of multivariate integration for weighted Korobov
classes. J. Complex. 17, 697–721 (2001)
23. Traub, J.F., Wasilkowski, G.W., Woźniakowski, H.: Information-Based Complexity. Academic
Press, New York (1988)
Derivative-Based Global Sensitivity
Measures and Their Link with Sobol’
Sensitivity Indices

Sergei Kucherenko and Shufang Song

Abstract The variance-based method of Sobol' sensitivity indices is very popular among practitioners due to its efficiency and easiness of interpretation. However,
for high-dimensional models the direct application of this method can be very time-
consuming and prohibitively expensive to use. One of the alternative global sensi-
tivity analysis methods known as the method of derivative based global sensitivity
measures (DGSM) has recently become popular among practitioners. It has a link
with the Morris screening method and Sobol’ sensitivity indices. DGSM are very
easy to implement and evaluate numerically. The computational time required for
numerical evaluation of DGSM is generally much lower than that for estimation of
Sobol’ sensitivity indices. We present a survey of recent advances in DGSM and
new results concerning new lower and upper bounds on the values of Sobol’ total
sensitivity indices Sitot . Using these bounds it is possible in most cases to get a good
practical estimation of the values of Sitot . Several examples are used to illustrate an
application of DGSM.

Keywords Global sensitivity analysis · Monte Carlo methods · Quasi Monte Carlo
methods · Derivative based global measures · Morris method · Sobol’ sensitivity
indices

1 Introduction

Global sensitivity analysis (GSA) is the study of how the uncertainty in the model
output is apportioned to the uncertainty in model inputs [9, 14]. GSA can provide
valuable information regarding the dependence of the model output on its input
parameters. The variance-based method of global sensitivity indices developed by Sobol'
[11] became very popular among practitioners due to its efficiency and easiness of

S. Kucherenko (B) · S. Song


Imperial College London, SW7 2AZ, London, UK
e-mail: s.kucherenko@imperial.ac.uk
S. Song
e-mail: shufangsong@nwpu.edu.cn


interpretation. There are two types of Sobol’ sensitivity indices: the main effect
indices, which estimate the individual contribution of each input parameter to the
output variance, and the total sensitivity indices, which measure the total contribution
of a single input factor or a group of inputs [3]. The total sensitivity indices are used
to identify non-important variables which can then be fixed at their nominal values
to reduce model complexity [9]. For high-dimensional models the direct application
of variance-based GSA measures can be extremely time-consuming and impractical.
A number of alternative SA techniques have been proposed. In this paper we
present derivative based global sensitivity measures (DGSM) and their link with
Sobol’ sensitivity indices. DGSM are based on averaging local derivatives using
Monte Carlo or Quasi Monte Carlo sampling methods. These measures were briefly
introduced by Sobol’ and Gershman in [12]. Kucherenko et al. [6] introduced some
other derivative-based global sensitivity measures (DGSM) and coined the acronym
DGSM. They showed that the computational cost of numerical evaluation of DGSM
can be much lower than that for estimation of Sobol’ sensitivity indices which later
was confirmed in other works [5]. DGSM can be seen as a generalization and for-
malization of the Morris importance measure also known as elementary effects [8].
Sobol’ and Kucherenko [15] proved theoretically that there is a link between DGSM
and the Sobol’ total sensitivity index Sitot for the same input. They showed that DGSM
can be used as an upper bound on total sensitivity index Sitot . They also introduced
modified DGSM which can be used for both a single input and groups of inputs [16].
Such measures can be applied for problems with a high number of input variables
to reduce the computational time. Lamboni et al. [7] extended results of Sobol’ and
Kucherenko for models with input variables belonging to the class of Boltzmann
probability measures.
The numerical efficiency of the DGSM method can be improved by using the automatic
differentiation algorithm for calculating DGSM, as was shown in [5]. However,
the number of required function evaluations still remains to be proportional to the
number of inputs. This dependence can be greatly reduced using an approach based
on algorithmic differentiation in the adjoint or reverse mode [1]. It allows estimat-
ing all derivatives at a cost at most 4–6 times of that for evaluating the original
function [4].
This paper is organised as follows: Sect. 2 presents Sobol’ global sensitivity
indices. DGSM and lower and upper bounds on total Sobol’ sensitivity indices
for uniformly distributed variables and random variables are presented in Sects. 3
and 4, respectively. In Sect. 5 we consider test cases which illustrate an application
of DGSM and their links with total Sobol’ sensitivity indices. Finally, conclusions
are presented in Sect. 6.

2 Sobol’ Global Sensitivity Indices


The method of global sensitivity indices developed by Sobol’ is based on ANOVA
decomposition [11]. Consider the square integrable function f (x) defined in the unit
hypercube H d = [0, 1]d . The decomposition of f (x)


$$f(x) = f_0 + \sum_{i=1}^{d} f_i(x_i) + \sum_{i=1}^{d}\sum_{j>i} f_{ij}(x_i,x_j) + \cdots + f_{12\cdots d}(x_1,\ldots,x_d), \qquad (1)$$
where $f_0 = \int_{H^d} f(x)\,\mathrm{d}x$, is called ANOVA if the conditions
$$\int_{H^d} f_{i_1\ldots i_s}\,\mathrm{d}x_{i_k} = 0 \qquad (2)$$
are satisfied for all different groups of indices $i_1,\ldots,i_s$ such that $1\le i_1 < i_2 < \cdots < i_s \le d$.
These conditions guarantee that all terms in (1) are mutually orthogonal
with respect to integration.
The variances of the terms in the ANOVA decomposition add up to the total
variance:
$$D = \int_{H^d} f^2(x)\,\mathrm{d}x - f_0^2 = \sum_{s=1}^{d}\sum_{i_1<\cdots<i_s} D_{i_1\ldots i_s},$$
where $D_{i_1\ldots i_s} = \int_{H^d} f^2_{i_1\ldots i_s}(x_{i_1},\ldots,x_{i_s})\,\mathrm{d}x_{i_1}\cdots\mathrm{d}x_{i_s}$ are called partial variances.
Total partial variances account for the total influence of the factor $x_i$:
$$D_i^{tot} = \sum_{\langle i\rangle} D_{i_1\ldots i_s},$$
where the sum $\sum_{\langle i\rangle}$ is extended over all different groups of indices $i_1,\ldots,i_s$ satisfying
the condition $1\le i_1 < i_2 < \cdots < i_s \le d$, $1\le s\le d$, where one of the indices is
equal to $i$. The corresponding total sensitivity index is defined as
$$S_i^{tot} = D_i^{tot}/D.$$
Denote by $u_i(x)$ the sum of all terms in the ANOVA decomposition (1) that depend on $x_i$:
$$u_i(x) = f_i(x_i) + \sum_{j=1,\,j\neq i}^{d} f_{ij}(x_i,x_j) + \cdots + f_{12\cdots d}(x_1,\ldots,x_d).$$
From the definition of the ANOVA decomposition it follows that
$$\int_{H^d} u_i(x)\,\mathrm{d}x = 0. \qquad (3)$$

The total partial variance $D_i^{tot}$ can be computed as
$$D_i^{tot} = \int_{H^d} u_i^2(x)\,\mathrm{d}x = \int_{H^d} u_i^2(x_i,z)\,\mathrm{d}x_i\,\mathrm{d}z.$$
Denote by $z = (x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_d)$ the vector of all variables but $x_i$; then
$x \equiv (x_i,z)$ and $f(x) \equiv f(x_i,z)$. The ANOVA decomposition of $f(x)$ in (1) can be
presented in the following form
$$f(x) = u_i(x_i,z) + v(z),$$
where $v(z)$ is the sum of terms independent of $x_i$. Because of (2) and (3) it is easy to
show that $v(z) = \int_{H^d} f(x)\,\mathrm{d}x_i$. Hence
$$u_i(x_i,z) = f(x) - \int_{H^d} f(x)\,\mathrm{d}x_i. \qquad (4)$$
Then the total sensitivity index $S_i^{tot}$ is equal to
$$S_i^{tot} = \frac{\int_{H^d} u_i^2(x)\,\mathrm{d}x}{D}. \qquad (5)$$
We note that in the case of independent random variables all definitions of the
ANOVA decomposition remain correct, but all derivations should be considered
in a probabilistic sense, as shown in [14] and presented in Sect. 4.
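A brute-force numerical check of formulas (4) and (5) can be written in a few lines of Python (our sketch, not from the paper): the inner average over resampled $x_i$ approximates $\int f\,\mathrm{d}x_i$, and the finite inner sample size introduces a small upward bias in the estimate.

```python
import random

def sobol_total_index(f, d, i, n_outer=2000, n_inner=64, rng=random):
    """Monte Carlo estimate of S_i^tot via Eqs. (4)-(5).

    u_i(x) = f(x) - int f(x) dx_i is approximated by an inner average over
    resampled x_i; D is estimated from the same outer sample.
    """
    f_vals, u_sq = [], []
    for _ in range(n_outer):
        x = [rng.random() for _ in range(d)]
        fx = f(x)
        inner = 0.0
        for _ in range(n_inner):
            y = list(x)
            y[i] = rng.random()
            inner += f(y)
        u = fx - inner / n_inner          # approximate u_i(x), Eq. (4)
        f_vals.append(fx)
        u_sq.append(u * u)
    mean_f = sum(f_vals) / n_outer
    D = sum(v * v for v in f_vals) / n_outer - mean_f ** 2
    return (sum(u_sq) / n_outer) / D      # Eq. (5)

# Example with an arbitrary test function (our choice): f(x) = x_1 + 2*x_1*x_2
print(sobol_total_index(lambda x: x[0] + 2 * x[0] * x[1], d=2, i=0))
```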

3 DGSM for Uniformly Distributed Variables

Consider a continuously differentiable function $f(x)$ defined in the unit hypercube
$H^d = [0,1]^d$ such that $\partial f/\partial x_i \in L_2$.
Theorem 1 Assume that $c \le \bigl|\frac{\partial f}{\partial x_i}\bigr| \le C$. Then
$$\frac{c^2}{12 D} \le S_i^{tot} \le \frac{C^2}{12 D}. \qquad (6)$$
The proof is presented in [15].
The Morris importance measure, also known as elementary effects, originally
defined as finite differences averaged over a finite set of random points [8], was
generalized in [6]:
$$\mu_i = \int_{H^d} \Bigl|\frac{\partial f(x)}{\partial x_i}\Bigr|\,\mathrm{d}x. \qquad (7)$$
Kucherenko et al. [6] also introduced a new DGSM measure:
$$\nu_i = \int_{H^d} \Bigl(\frac{\partial f(x)}{\partial x_i}\Bigr)^2\,\mathrm{d}x. \qquad (8)$$
In this paper we define two new DGSM measures:
$$w_i^{(m)} = \int_{H^d} x_i^m\,\frac{\partial f(x)}{\partial x_i}\,\mathrm{d}x, \qquad (9)$$
where $m$ is a constant, $m>0$, and
$$\varsigma_i = \frac{1}{2}\int_{H^d} x_i(1-x_i)\Bigl(\frac{\partial f(x)}{\partial x_i}\Bigr)^2\,\mathrm{d}x. \qquad (10)$$
We note that $\nu_i$ is in fact the mean value of $(\partial f/\partial x_i)^2$. We also note that
$$\frac{\partial f}{\partial x_i} = \frac{\partial u_i}{\partial x_i}. \qquad (11)$$

3.1 Lower Bounds on $S_i^{tot}$

Theorem 2 There exists the following lower bound between DGSM and the Sobol'
total sensitivity index
$$\frac{\Bigl( \int_{H^d} [f(1,z) - f(0,z)]\,[f(1,z) + f(0,z) - 2f(x)]\,\mathrm{d}x \Bigr)^2}{4\nu_i D} < S_i^{tot}. \qquad (12)$$

Proof Consider the integral
$$\int_{H^d} u_i(x)\,\frac{\partial u_i(x)}{\partial x_i}\,\mathrm{d}x. \qquad (13)$$
Applying the Cauchy-Schwarz inequality we obtain the following result:
$$\Bigl( \int_{H^d} u_i(x)\,\frac{\partial u_i(x)}{\partial x_i}\,\mathrm{d}x \Bigr)^2 \le \int_{H^d} u_i^2(x)\,\mathrm{d}x \cdot \int_{H^d} \Bigl( \frac{\partial u_i(x)}{\partial x_i} \Bigr)^2 \mathrm{d}x. \qquad (14)$$
It is easy to prove that the left and right parts of this inequality cannot be equal.
Indeed, for them to be equal the functions $u_i(x)$ and $\partial u_i(x)/\partial x_i$ would have to be linearly dependent.
For simplicity consider a one-dimensional case: $x \in [0,1]$. Let us assume
$$\frac{\partial u(x)}{\partial x} = A u(x),$$
where $A$ is a constant. The general solution to this equation is $u(x) = B\exp(Ax)$, where
$B$ is a constant. It is easy to see that this solution is not consistent with condition (3)
which should be imposed on the function $u(x)$.
The integral $\int_{H^d} u_i(x)\,\frac{\partial u_i(x)}{\partial x_i}\,\mathrm{d}x$ can be transformed as
$$\int_{H^d} u_i(x)\,\frac{\partial u_i(x)}{\partial x_i}\,\mathrm{d}x = \frac{1}{2}\int_{H^d} \frac{\partial u_i^2(x)}{\partial x_i}\,\mathrm{d}x = \frac{1}{2}\int_{H^{d-1}} \bigl( u_i^2(1,z) - u_i^2(0,z) \bigr)\,\mathrm{d}z$$
$$= \frac{1}{2}\int_{H^{d-1}} \bigl( u_i(1,z) - u_i(0,z) \bigr)\bigl( u_i(1,z) + u_i(0,z) \bigr)\,\mathrm{d}z = \frac{1}{2}\int_{H^{d-1}} \bigl( f(1,z) - f(0,z) \bigr)\bigl( f(1,z) + f(0,z) - 2v(z) \bigr)\,\mathrm{d}z. \qquad (15)$$
All terms in the last integrand are independent of $x_i$, hence we can replace integration
with respect to $\mathrm{d}z$ by integration with respect to $\mathrm{d}x$ and substitute $f(x)$ for
$v(z)$ in the integrand due to condition (3). Then (15) can be presented as
$$\int_{H^d} u_i(x)\,\frac{\partial u_i(x)}{\partial x_i}\,\mathrm{d}x = \frac{1}{2} \int_{H^d} [f(1,z)-f(0,z)]\,[f(1,z)+f(0,z)-2f(x)]\,\mathrm{d}x. \qquad (16)$$
From (11), $\partial u_i(x)/\partial x_i = \partial f(x)/\partial x_i$, hence the right-hand side of (14) can be written as $\nu_i D_i^{tot}$.
Finally, dividing (14) by $\nu_i D$ and using (16), we obtain the lower bound (12). □
We call
$$\frac{\Bigl( \int_{H^d} [f(1,z)-f(0,z)]\,[f(1,z)+f(0,z)-2f(x)]\,\mathrm{d}x \Bigr)^2}{4\nu_i D}$$
the lower bound number one (LB1).
Theorem 3 There exists the following lower bound between DGSM (9) and the
Sobol' total sensitivity index
$$\frac{(2m+1)\Bigl( \int_{H^d} (f(1,z)-f(x))\,\mathrm{d}x - w_i^{(m+1)} \Bigr)^2}{(m+1)^2 D} < S_i^{tot}. \qquad (17)$$

Proof Consider the integral
$$\int_{H^d} x_i^m u_i(x)\,\mathrm{d}x. \qquad (18)$$
Applying the Cauchy-Schwarz inequality we obtain the following result:
$$\Bigl( \int_{H^d} x_i^m u_i(x)\,\mathrm{d}x \Bigr)^2 \le \int_{H^d} x_i^{2m}\,\mathrm{d}x \cdot \int_{H^d} u_i^2(x)\,\mathrm{d}x. \qquad (19)$$
It is easy to see that equality in (19) cannot be attained. For this to happen the functions
$u_i(x)$ and $x_i^m$ would have to be linearly dependent. For simplicity consider a one-dimensional
case: $x \in [0,1]$. Let us assume
$$u(x) = A x^m,$$
where $A \neq 0$ is a constant. This solution does not satisfy condition (3) which should
be imposed on the function $u(x)$.
Further we use the following transformation:
$$\int_{H^d} \frac{\partial (x_i^{m+1} u_i(x))}{\partial x_i}\,\mathrm{d}x = (m+1)\int_{H^d} x_i^m u_i(x)\,\mathrm{d}x + \int_{H^d} x_i^{m+1}\,\frac{\partial u_i(x)}{\partial x_i}\,\mathrm{d}x$$
to present the integral (18) in the form
$$\int_{H^d} x_i^m u_i(x)\,\mathrm{d}x = \frac{1}{m+1}\Bigl( \int_{H^d} \frac{\partial (x_i^{m+1} u_i(x))}{\partial x_i}\,\mathrm{d}x - \int_{H^d} x_i^{m+1}\,\frac{\partial u_i(x)}{\partial x_i}\,\mathrm{d}x \Bigr)$$
$$= \frac{1}{m+1}\Bigl( \int_{H^{d-1}} u_i(1,z)\,\mathrm{d}z - \int_{H^d} x_i^{m+1}\,\frac{\partial u_i(x)}{\partial x_i}\,\mathrm{d}x \Bigr) = \frac{1}{m+1}\Bigl( \int_{H^d} (f(1,z) - f(x))\,\mathrm{d}x - \int_{H^d} x_i^{m+1}\,\frac{\partial u_i(x)}{\partial x_i}\,\mathrm{d}x \Bigr). \qquad (20)$$
We notice that
$$\int_{H^d} x_i^{2m}\,\mathrm{d}x = \frac{1}{2m+1}. \qquad (21)$$
Using (20) and (21) and dividing (19) by $D$ we obtain (17). □
This second lower bound on $S_i^{tot}$ we denote by $\gamma(m)$:
$$\gamma(m) = \frac{(2m+1)\Bigl(\int_{H^d}(f(1,z)-f(x))\,\mathrm{d}x - w_i^{(m+1)}\Bigr)^2}{(m+1)^2 D} < S_i^{tot}. \qquad (22)$$
In fact, this is a set of lower bounds depending on the parameter $m$. We are interested
in the value of $m$ at which $\gamma(m)$ attains its maximum. Further we use a star to denote
such a value of $m$: $m^* = \arg\max(\gamma(m))$, and call
$$\gamma^*(m^*) = \frac{(2m^*+1)\Bigl(\int_{H^d}(f(1,z)-f(x))\,\mathrm{d}x - w_i^{(m^*+1)}\Bigr)^2}{(m^*+1)^2 D} \qquad (23)$$
the lower bound number two (LB2).
We define the maximum lower bound $LB^*$ as
$$LB^* = \max(LB1, LB2). \qquad (24)$$
We note that both lower and upper bounds can be estimated by a set of derivative-based
measures:
$$\Upsilon_i = \{\nu_i, w_i^{(m)}\}, \quad m > 0. \qquad (25)$$

3.2 Upper Bounds on $S_i^{tot}$

Theorem 4
$$S_i^{tot} \le \frac{\nu_i}{\pi^2 D}. \qquad (26)$$
The proof of this Theorem is given in [15].
Consider the set of values $\nu_1,\ldots,\nu_n$, $1 \le i \le n$. One can expect that smaller $\nu_i$
correspond to less influential variables $x_i$.
We further call (26) the upper bound number one (UB1).
Theorem 5
$$S_i^{tot} \le \frac{\varsigma_i}{D}, \qquad (27)$$
where $\varsigma_i$ is given by (10).
Proof We use the following inequality [2]:
$$0 \le \int_0^1 u^2\,\mathrm{d}x - \Bigl(\int_0^1 u\,\mathrm{d}x\Bigr)^2 \le \frac{1}{2}\int_0^1 x(1-x)\,(u'(x))^2\,\mathrm{d}x. \qquad (28)$$
The inequality is reduced to an equality only if $u$ is constant. Assume that $u$ satisfies
condition (3); then $\int_0^1 u\,\mathrm{d}x = 0$, and from (28) we obtain (27). □
Further we call $\varsigma_i/D$ the upper bound number two (UB2). We note that $\frac{1}{2}x_i(1-x_i)$
for $0\le x_i\le 1$ is bounded: $0 \le \frac{1}{2}x_i(1-x_i) \le \frac{1}{8}$. Therefore, $0 \le \varsigma_i \le \frac{1}{8}\nu_i$.

3.3 Computational Costs


All DGSM can be computed using the same set of partial derivatives $\partial f(x)/\partial x_i$,
$i=1,\ldots,d$. Evaluation of $\partial f(x)/\partial x_i$ can be done analytically for explicitly given,
easily differentiable functions, or numerically.
In the case of straightforward numerical estimation of all partial derivatives and
computation of the integrals using MC or QMC methods, the number of required function
evaluations for a set of all input variables is equal to $N(d+1)$, where $N$ is the number
of sampled points. Computing LB1 also requires the values $f(0,z)$, $f(1,z)$, while
computing LB2 requires only the values $f(1,z)$. In total, numerical computation of
$LB^*$ for all input variables would require $N_F^{LB^*} = N(d+1) + 2Nd = N(3d+1)$
function evaluations. Computation of all upper bounds requires $N_F^{UB} = N(d+1)$
function evaluations. We recall that the number of function evaluations required for
computation of $S_i^{tot}$ is $N_F^{S} = N(d+1)$ [10]. The number of sampled points $N$ needed
to achieve numerical convergence can be different for DGSM and $S_i^{tot}$; it is generally
lower in the case of DGSM. The numerical efficiency of the DGSM method can be
significantly increased by using algorithmic differentiation in the adjoint (reverse)
mode [1]. This approach allows estimating all derivatives at a cost of at most 6 times
that of evaluating the original function $f(x)$ [4]. However, as mentioned above,
the lower bounds also require computation of $f(0,z)$, $f(1,z)$, so $N_F^{LB^*}$ would only be
reduced to $N_F^{LB^*} = 6N + 2Nd = N(2d+6)$, while $N_F^{UB}$ would be equal to $6N$.
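As a rough numerical recipe (ours, not from the paper), the following Python sketch estimates $\nu_i$, $\varsigma_i$ and the bounds LB1 (12), UB1 (26) and UB2 (27) for all inputs with plain Monte Carlo sampling and forward finite differences; the step size $h$, the sampling scheme and the biased variance estimate of $D$ are choices of this sketch, and the evaluation count per sample point ($3d+1$) matches the discussion above.

```python
import math, random

def dgsm_bounds(f, d, N=4096, h=1e-4, rng=random):
    """MC estimates of LB1, UB1 and UB2 for each of the d inputs of f on [0,1]^d."""
    nu = [0.0] * d                       # nu_i, Eq. (8)
    varsig = [0.0] * d                   # varsigma_i, Eq. (10)
    lb1_num = [0.0] * d                  # inner integral of Eq. (12)
    fx_all = []
    for _ in range(N):
        x = [rng.random() for _ in range(d)]
        fx = f(x)
        fx_all.append(fx)
        for i in range(d):
            y = list(x)
            y[i] = min(x[i] + h, 1.0)
            dfi = (f(y) - fx) / (y[i] - x[i])          # forward difference
            nu[i] += dfi * dfi / N
            varsig[i] += 0.5 * x[i] * (1 - x[i]) * dfi * dfi / N
            z1, z0 = list(x), list(x)
            z1[i], z0[i] = 1.0, 0.0
            f1, f0 = f(z1), f(z0)
            lb1_num[i] += (f1 - f0) * (f1 + f0 - 2 * fx) / N
    mean_f = sum(fx_all) / N
    D = sum(v * v for v in fx_all) / N - mean_f ** 2   # total variance estimate
    ub1 = [nu[i] / (math.pi ** 2 * D) for i in range(d)]
    ub2 = [varsig[i] / D for i in range(d)]
    lb1 = [lb1_num[i] ** 2 / (4 * nu[i] * D) for i in range(d)]
    return lb1, ub1, ub2
```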

4 DGSM for Random Variables

Consider a function $f(x_1,\ldots,x_d)$, where $x_1,\ldots,x_d$ are independent random variables
with distribution functions $F_1(x_1),\ldots,F_d(x_d)$. Thus the point $x=(x_1,\ldots,x_d)$ is
defined in the Euclidean space $\mathbb{R}^d$ and its measure is $\mathrm{d}F_1(x_1)\cdots\mathrm{d}F_d(x_d)$.
The following DGSM was introduced in [15]:
$$\nu_i = \int_{\mathbb{R}^d} \Bigl( \frac{\partial f(x)}{\partial x_i} \Bigr)^2 \mathrm{d}F(x). \qquad (29)$$
We introduce a new measure
$$w_i = \int_{\mathbb{R}^d} \frac{\partial f(x)}{\partial x_i}\,\mathrm{d}F(x). \qquad (30)$$

4.1 The Lower Bounds on $S_i^{tot}$ for Normal Variables

Assume that $x_i$ is normally distributed with finite variance $\sigma_i^2$ and mean value $\mu_i$.
Theorem 6
$$\frac{\sigma_i^2 w_i^2}{D} \le S_i^{tot}. \qquad (31)$$
Proof Consider $\int_{\mathbb{R}^d} x_i u_i(x)\,\mathrm{d}F(x)$. Applying the Cauchy-Schwarz inequality we
obtain
$$\Bigl( \int_{\mathbb{R}^d} x_i u_i(x)\,\mathrm{d}F(x) \Bigr)^2 \le \int_{\mathbb{R}^d} x_i^2\,\mathrm{d}F(x) \cdot \int_{\mathbb{R}^d} u_i^2(x)\,\mathrm{d}F(x). \qquad (32)$$
Equality in (32) can be attained if the functions $u_i(x)$ and $x_i$ are linearly dependent. For
simplicity consider a one-dimensional case. Let us assume
$$u(x) = A(x-\mu),$$
where $A \neq 0$ is a constant. This solution satisfies condition (3) for a normally distributed
variable $x$ with mean value $\mu$: $\int_{\mathbb{R}^d} u(x)\,\mathrm{d}F(x) = 0$.
For normally distributed variables the following equality is true [2]:
$$\int_{\mathbb{R}^d} x_i u_i(x)\,\mathrm{d}F(x) = \int_{\mathbb{R}^d} x_i^2\,\mathrm{d}F(x) \cdot \int_{\mathbb{R}^d} \frac{\partial u_i(x)}{\partial x_i}\,\mathrm{d}F(x). \qquad (33)$$
By definition $\int_{\mathbb{R}^d} x_i^2\,\mathrm{d}F(x) = \sigma_i^2$. Using (32) and (33) and dividing the resulting
inequality by $D$ we obtain the lower bound (31). □

4.2 The Upper Bounds on $S_i^{tot}$ for Normal Variables

The following Theorem 7 is a generalization of Theorem 1.
Theorem 7 Assume that $c \le \bigl|\frac{\partial f}{\partial x_i}\bigr| \le C$; then
$$\frac{\sigma_i^2 c^2}{D} \le S_i^{tot} \le \frac{\sigma_i^2 C^2}{D}. \qquad (34)$$
The constant factor $\sigma_i^2$ cannot be improved.
Theorem 8
$$S_i^{tot} \le \frac{\sigma_i^2}{D}\,\nu_i. \qquad (35)$$
The constant factor $\sigma_i^2$ cannot be reduced.
Proofs are presented in [15].

5 Test Cases

In this section we present the results of analytical and numerical estimation of Si ,


Sitot , LB1, LB2 and UB1, UB2. The analytical values for DGSM and Sitot were cal-
culated and compared with numerical results. For text case 2 we present convergence
plots in the form of root mean square error (RMSE) versus the number of sampled
Derivative-Based Global Sensitivity Measures … 465

points N . To reduce the scatter in the error estimation the values of RMSE were
averaged over K = 25 independent runs:
 K  ∗  21
1  Ii,k − I0 2
εi = .
K k=1 I0

Here $I_{i,k}^*$ can be either the numerically computed $S_i^{tot}$, LB1, LB2 or UB1, UB2, and $I_0$ is
the corresponding analytical value of $S_i^{tot}$, LB1, LB2 or UB1, UB2. The RMSE can
be approximated by a trend line cN −α . Values of (−α) are given in brackets on the
plots. QMC integration based on Sobol’ sequences was used in all numerical tests.
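The averaging used for the convergence plots amounts to a few lines of post-processing; a small Python helper (purely illustrative, not from the paper) is given below.

```python
import numpy as np

def relative_rmse(estimates, exact):
    """Relative RMSE over K independent runs (K = 25 in the text):
    eps = sqrt( (1/K) * sum_k ((I_k - I_0) / I_0)^2 )."""
    estimates = np.asarray(estimates, dtype=float)
    return np.sqrt(np.mean(((estimates - exact) / exact) ** 2))
```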

Example 1 Consider a function which is linear with respect to $x_i$:

$$f(x) = a(z)\, x_i + b(z).$$

For this function $S_i = S_i^{tot}$,
$$D_i^{tot} = \frac{1}{12} \int_{H^{d-1}} a^2(z)\, dz, \qquad \nu_i = \int_{H^{d-1}} a^2(z)\, dz,$$
$$LB1 = \frac{\left( \int_{H^d} \left( a^2(z) - 2 a^2(z)\, x_i \right) dz\, dx_i \right)^2}{4 D \int_{H^{d-1}} a^2(z)\, dz} = 0
\quad \text{and} \quad
\gamma(m) = \frac{(2m+1)\, m^2 \left( \int_{H^{d-1}} a(z)\, dz \right)^2}{4 (m+2)^2 (m+1)^2 D}.$$
A maximum value of $\gamma(m)$ is attained at $m^* = 3.745$, when $\gamma^*(m^*) = \frac{0.0401}{D} \left( \int_{H^{d-1}} a(z)\, dz \right)^2$.
The lower and upper bounds are $LB^* \approx 0.48\, S_i^{tot}$, $UB1 \approx 1.22\, S_i^{tot}$ and
$UB2 = \frac{1}{12 D} \int_{H^{d-1}} a^2(z)\, dz = S_i^{tot}$. For this test function UB2 < UB1.

Example 2 Consider the so-called g-function which is often used in GSA for illus-
tration purposes:

$$f(x) = \prod_{i=1}^{d} g_i,$$

where $g_i = \frac{|4 x_i - 2| + a_i}{1 + a_i}$ and $a_i$ $(i = 1, \ldots, d)$ are constants. It is easy to see that for this
function $f_i(x_i) = g_i - 1$, $u_i(x) = (g_i - 1) \prod_{j=1, j \ne i}^{d} g_j$ and as a result LB1 = 0. The
total variance is $D = -1 + \prod_{j=1}^{d} \left( 1 + \frac{1/3}{(1+a_j)^2} \right)$. The analytical values of $S_i$, $S_i^{tot}$ and
LB2 are given in Table 1.

Table 1 The analytical expressions for $S_i$, $S_i^{tot}$ and LB2 ($\gamma(m)$) for the g-function:

$$S_i = \frac{1/3}{(1+a_i)^2\, D}, \qquad
S_i^{tot} = \frac{\frac{1/3}{(1+a_i)^2} \prod_{j=1, j \ne i}^{d} \left( 1 + \frac{1/3}{(1+a_j)^2} \right)}{D}, \qquad
\gamma(m) = \frac{(2m+1) \left( \frac{4 \left( 1 - (1/2)^{m+1} \right)}{m+2} - 1 \right)^2}{(1+a_i)^2 (m+1)^2 D}$$

By solving the equation $\frac{d\gamma(m)}{dm} = 0$, we find that $m^* = 9.64$ and $\gamma(m^*) = \frac{0.0772}{(1+a_i)^2 D}$.
It is interesting to note that $m^*$ does not depend on $a_i$, $i = 1, 2, \ldots, d$, or on $d$. In the
extreme cases: if $a_i \to \infty$ for all $i$, then $\frac{\gamma(m^*)}{S_i^{tot}} \to 0.257$ and $\frac{S_i}{S_i^{tot}} \to 1$, while if $a_i \to 0$ for
all $i$, then $\frac{\gamma(m^*)}{S_i^{tot}} \to \frac{0.257}{(4/3)^{d-1}}$ and $\frac{S_i}{S_i^{tot}} \to \frac{1}{(4/3)^{d-1}}$. The analytical expressions for $S_i^{tot}$, UB1 and
UB2 are given in Table 2.
For this test function $\frac{S_i^{tot}}{UB1} = \frac{\pi^2}{48}$ and $\frac{S_i^{tot}}{UB2} = \frac{1}{4}$, hence $\frac{UB2}{UB1} = \frac{\pi^2}{12} < 1$. Values of $S_i$,
$S_i^{tot}$, UB and LB2 for the case of $a = [0, 1, 4.5, 9, 99, 99, 99, 99]$, $d = 8$ are given
in Table 3 and shown in Fig. 1. We can conclude that for this test the knowledge of
LB2 and UB1, UB2 allows one to rank all the variables correctly in the order of their
importance.
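The analytical expressions of Tables 1 and 2 are easy to evaluate; the short Python check below (not part of the paper) reproduces the $S_i$ and $S_i^{tot}$ columns of Table 3 for $a = [0, 1, 4.5, 9, 99, 99, 99, 99]$.

```python
import numpy as np

# Analytical S_i and S_i^tot of the g-function for the configuration of Table 3.
a = np.array([0.0, 1.0, 4.5, 9.0, 99.0, 99.0, 99.0, 99.0])
v = (1.0 / 3.0) / (1.0 + a) ** 2                # variance contribution of each factor g_i
D = np.prod(1.0 + v) - 1.0                      # total variance
S = v / D                                       # first-order indices S_i
S_tot = np.array([v[i] * np.prod(1.0 + np.delete(v, i)) / D for i in range(a.size)])
print(np.round(S, 4))      # approx. 0.716, 0.179, 0.0237, 0.0072, ...
print(np.round(S_tot, 4))  # approx. 0.788, 0.242, 0.0343, 0.0105, ...
```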
Figure 2 presents the RMSE of numerical estimations of $S_i^{tot}$, UB1 and LB2. For an
individual input, LB2 has the highest convergence rate, followed by $S_i^{tot}$ and UB1,
in terms of the number of sampled points. However, we recall that computation of
all indices requires $N_F^{LB^*} = N(3d+1)$ function evaluations for LB, while for $S_i^{tot}$
this number is $N_F^{S} = N(d+1)$ and for UB it is also $N_F^{UB} = N(d+1)$.

Example 3 Consider the Hartmann function

$$f(x) = -\sum_{i=1}^{4} c_i \exp\left( -\sum_{j=1}^{n} \alpha_{ij} (x_j - p_{ij})^2 \right), \qquad x_i \in [0, 1].$$

For this test case the relationship between the values LB1, LB2 and $S_i$ varies
with the change of input (Table 4, Fig. 3): for variables x2 and x6 LB1> Si > LB2,
while for all other variables LB1< LB2 <Si . LB* is much smaller than Sitot for all
inputs. Values of m* vary with the change of input. For all variables but variable 2
UB1 > UB2.

Table 2 The analytical expressions for $S_i^{tot}$, UB1 and UB2 for the g-function:

$$S_i^{tot} = \frac{\frac{1/3}{(1+a_i)^2} \prod_{j=1, j \ne i}^{d} \left( 1 + \frac{1/3}{(1+a_j)^2} \right)}{D}, \quad
UB1 = \frac{16 \prod_{j=1, j \ne i}^{d} \left( 1 + \frac{1/3}{(1+a_j)^2} \right)}{\pi^2 (1+a_i)^2 D}, \quad
UB2 = \frac{4 \prod_{j=1, j \ne i}^{d} \left( 1 + \frac{1/3}{(1+a_j)^2} \right)}{3 (1+a_i)^2 D}$$

Table 3 Values of LB*, $S_i$, $S_i^{tot}$, UB1 and UB2. Example 2, a = [0, 1, 4.5, 9, 99, 99, 99, 99],
d = 8
x1 x2 x3 x4 x5 ...x8
L B∗ 0.166 0.0416 0.00549 0.00166 0.000017
Si 0.716 0.179 0.0237 0.00720 0.0000716
Sitot 0.788 0.242 0.0343 0.0105 0.000105
U B1 3.828 1.178 0.167 0.0509 0.000501
U B2 3.149 0.969 0.137 0.0418 0.00042

Fig. 1 Values of $S_i$, $S_i^{tot}$, LB2 and UB1 for all input variables. Example 2, a = [0, 1, 4.5,
9, 99, 99, 99, 99], d = 8

Fig. 2 RMSE of $S_i^{tot}$, UB1 and LB2 versus the number of sampled points ($\log_2$ scale on both axes).
Example 2, a = [0, 1, 4.5, 9, 99, 99, 99, 99], d = 8. Variable 1 (a): $S_i^{tot}$ (−0.977), UB1 (−0.962),
LB2 (−1.134); variable 3 (b): $S_i^{tot}$ (−0.953), UB1 (−0.844), LB2 (−1.048); variable 5 (c):
$S_i^{tot}$ (−0.993), UB1 (−0.894), LB2 (−0.836)

Table 4 Values of m ∗ , LB1, LB2, UB1, UB2, Si and Sitot for all input variables
x1 x2 x3 x4 x5 x6
L B1 0.0044 0.0080 0.0009 0.0029 0.0014 0.0357
L B2 0.0515 0.0013 0.0011 0.0418 0.0390 0.0009
m∗ 4.6 10.2 17.0 5.5 3.6 19.9
L B∗ 0.0515 0.0080 0.0011 0.0418 0.0390 0.0357
Si 0.115 0.00699 0.00715 0.0888 0.109 0.0139
Sitot 0.344 0.398 0.0515 0.381 0.297 0.482
U B1 1.089 0.540 0.196 1.088 1.073 1.046
U B2 1.051 0.550 0.150 0.959 0.932 0.899

Fig. 3 Values of $S_i$, $S_i^{tot}$, UB1, LB1 and LB2 for all input variables. Example 3

6 Conclusions

We can conclude that using lower and upper bounds based on DGSM it is possible
in most cases to get a good practical estimation of the values of Sitot at a fraction of
the CPU cost for estimating Sitot . Small values of upper bounds imply small values
of Sitot . DGSM can be used for fixing unimportant variables and subsequent model
reduction. For linear functions and product functions, DGSM can give the same variable
ranking as $S_i^{tot}$. In the general case the variable ranking can be different for DGSM and
variance based methods. Upper and lower bounds can be estimated using MC/QMC
integration methods using the same set of partial derivative values. Partial derivatives
can be efficiently estimated using algorithmic differentiation in the reverse (adjoint)
mode.
We note that all bounds should be computed with sufficient accuracy. Standard
techniques for monitoring convergence and accuracy of MC/QMC estimates should
be applied to avoid erroneous results.

Acknowledgments The authors would like to thank Prof. I. Sobol' for his invaluable contributions
to this work. The authors also gratefully acknowledge the financial support by the EPSRC grant
EP/H03126X/1.

References

1. Griewank, A., Walther, A.: Evaluating Derivatives: Principles and Techniques of Algorithmic
Differentiation. SIAM Philadelphia, Philadelphia (2008)
2. Hardy, G.H., Littlewood, J.E., Polya, G.: Inequalities, 2nd edn. Cambridge University Press,
Cambridge (1973)
3. Homma, T., Saltelli, A.: Importance measures in global sensitivity analysis of model output.
Reliab. Eng. Syst. Saf. 52(1), 1–17 (1996)
4. Jansen, K., Leovey, H., Nube, A., Griewank, A., Mueller-Preussker, M.: A first look at quasi-
Monte Carlo for lattice field theory problems. Comput. Phys. Commun. 185, 948–959 (2014)
5. Kiparissides, A., Kucherenko, S., Mantalaris, A., Pistikopoulos, E.N.: Global sensitivity analy-
sis challenges in biological systems modeling. J. Ind. Eng. Chem. Res. 48(15), 7168–7180
(2009)
6. Kucherenko, S., Rodriguez-Fernandez, M., Pantelides, C., Shah, N.: Monte Carlo evaluation of
derivative based global sensitivity measures. Reliab. Eng. Syst. Saf. 94(7), 1135–1148 (2009)
7. Lamboni, M., Iooss, B., Popelin, A.L., Gamboa, F.: Derivative based global sensitivity mea-
sures: general links with Sobol’s indices and numerical tests. Math. Comput. Simul. 87, 45–54
(2013)
8. Morris, M.D.: Factorial sampling plans for preliminary computational experiments. Techno-
metrics 33, 161–174 (1991)
9. Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M.,
Tarantola, S.: Global Sensitivity Analysis: The Primer. Wiley, New York (2008)
10. Saltelli, A., Annoni, P., Azzini, I., Campolongo, F., Ratto, M., Tarantola, S.: Variance based
sensitivity analysis of model output: design and estimator for the total sensitivity index. Comput.
Phys. Commun. 181(2), 259–270 (2010)
11. Sobol', I.M.: Sensitivity estimates for nonlinear mathematical models. Matem. Modelirovanie
2, 112–118 (1990) (in Russian). English translation: Math. Modelling Comput. Experiment
1(4), 407–414 (1993)
12. Sobol', I.M., Gershman, A.: On an alternative global sensitivity estimators. Proc. SAMO, Bel-
girate 1995, 40–42 (1995)
13. Sobol’, I.M.: Global sensitivity indices for nonlinear mathematical models and their Monte
Carlo estimates. Math. Comput. Simul. 55(1–3), 271–280 (2001)
14. Sobol’, I.M., Kucherenko, S.: Global sensitivity indices for nonlinear mathematical models.
Rev. Wilmott Mag. 1, 56–61 (2005)
15. Sobol’, I.M., Kucherenko, S.: Derivative based global sensitivity measures and their link with
global sensitivity indices. Math. Comput. Simul. 79(10), 3009–3017 (2009)
16. Sobol’, I.M., Kucherenko, S.: A new derivative based importance criterion for groups of vari-
ables and its link with the global sensitivity indices. Comput. Phys. Commun. 181(7), 1212–
1217 (2010)
Bernstein Numbers and Lower Bounds
for the Monte Carlo Error

Robert J. Kunsch

Abstract We are interested in lower bounds for the approximation of linear operators
between Banach spaces with algorithms that may use at most n arbitrary linear
functionals as information. Lower error bounds for deterministic algorithms can
easily be found by Bernstein widths; for mappings between Hilbert spaces it is already
known how Bernstein widths (which are the singular values in that case) provide
lower bounds for Monte Carlo methods. Here, a similar connection between Bernstein
numbers and lower bounds is shown for the Monte Carlo approximation of operators
between arbitrary Banach spaces. For non-adaptive algorithms we consider the
average case setting with the uniform distribution on finite dimensional balls and in
this way we obtain almost optimal prefactors. By combining known results about
Gaussian measures and their connection to the Monte Carlo error we also cover
adaptive algorithms, however with weaker constants. As an application, we find
that for the L∞ approximation of smooth functions from the class C ∞ ([0, 1]d ) with
uniformly bounded partial derivatives, randomized algorithms suffer from the curse
of dimensionality, as it is known for deterministic algorithms.

Keywords Monte Carlo · Lower error bounds · Bernstein numbers · Approximation


of smooth functions · Curse of dimensionality

1 Basic Notions and Prerequisites

1.1 Types of Errors and Information

Let S : 
F → G be a compact linear operator between Banach spaces over the reals,
the so-called solution mapping. We aim to approximate S for an input set F ⊂  F
with respect to the norm of the target space G. In this work F will always be the unit
ball of 
F.

R.J. Kunsch (B)


Friedrich-Schiller-Universität Jena, Institut für Mathematik, 07737 Jena, Germany
e-mail: robert.kunsch@uni-jena.de

Let (Ω, Σ, P) be a suitable probability space. Further let B(


F) and B(G) denote
the Borel σ -algebra of 
F and G, respectively. Under randomized algorithms, also
called Monte Carlo algorithms, we understand Σ ⊗ B( F) − B(G)-measurable
mappings An = (Aωn (·))ω∈Ω : Ω ×  F → G. This means that the output An (f )
for an input f is random, depending on ω ∈ Ω. We consider algorithms that use
at most n continuous linear functionals as information, i.e. Aωn = φ ω ◦ N ω where
Nω : F → Rn is the so-called information mapping. The mapping φ ω : Rn → G
generates an output gω = φ ω (yω ) ∈ G as a compromise for all possible inputs that
lead to the same information yω = N ω (f ) ∈ Rn . An information mapping is called
non-adaptive, if

N ω (f ) = (y1ω , . . . , ynω ) = [L1ω (f ), . . . , Lnω (f )], (1)

where all functionals Lkω are chosen at once. In that case N ω is a linear mapping for
fixed ω ∈ Ω. For adaptive information N ω the choice of the functionals may depend
on previously obtained information, we assume that the choice of the k-th functional
is a measurable mapping (ω; y1ω , . . . , yk−1
ω ω
) → Lk;y ω ω (·) for k = 1, . . . , n, see [3]
1 ,...,yk−1
for more details on measurability assumptions for adaptive algorithms. By Anran,ada
we denote the class of all Monte Carlo algorithms that use n pieces of adaptively
obtained information, for the subclass of nonadaptive algorithms we write Anran,nonada .
We regard the class of deterministic algorithms as a subclass Andet, ⊂ Anran,
( ∈ {ada, nonada}) of algorithms that are independent from ω ∈ Ω (this means in
particular that we assume deterministic algorithms to be measurable), for a particular
algorithm we write An = φ ◦ N, omitting ω. For a deterministic algorithm An the
(absolute) error at f is defined as the distance between output and exact solution

$$e(A_n, S, f) := \|S(f) - A_n(f)\|_G. \qquad (2)$$

For randomized algorithms $A_n = (A_n^\omega(\cdot))$ this can be generalized as the expected
error at $f$,
$$e(A_n, S, f) := \mathbb{E}\, \|S(f) - A_n^\omega(f)\|_G, \qquad (3)$$

however some authors prefer the root mean square error

$$e_2(A_n, S, f) := \sqrt{\mathbb{E}\, \|S(f) - A_n^\omega(f)\|_G^2}. \qquad (4)$$

(The expectation E is written for the integration over all ω ∈ Ω with respect to P.)
Since e(An , S, f ) ≤ e2 (An , S, f ), for lower bounds we may stick to the first version.
The global error of an algorithm $A_n$ is defined as the error for the worst input
from the input set $F \subset \widetilde{F}$, we write

$$e(A_n, S, F) := \sup_{f \in F} e(A_n, S, f). \qquad (5)$$

For technical purposes we also need the average error, which is defined for any
(sub-)probability measure $\mu$ (the so-called input distribution) on the input space $\widetilde{F}$,

$$e(A_n, S, \mu) := \int e(A_n, S, f)\, d\mu(f). \qquad (6)$$

(A sub-probability measure $\mu$ on $\widetilde{F}$ is a positive measure with $0 < \mu(\widetilde{F}) \le 1$.)


The difficulty of a problem within a particular setting refers to the error of optimal
algorithms, we define

$$e^{\star,\diamond}(n, S, F) := \inf_{A_n \in \mathcal{A}_n^{\star,\diamond}} e(A_n, S, F) \quad \text{and} \quad e^{\star,\diamond}(n, S, \mu) := \inf_{A_n \in \mathcal{A}_n^{\star,\diamond}} e(A_n, S, \mu),$$

where $\star \in \{ran, det\}$ and $\diamond \in \{ada, nonada\}$. These quantities are inherent properties
of the problem $S$, so $e^{ran,\diamond}(n, S, F)$ is called the Monte Carlo error, $e^{det,\diamond}(n, S, F)$ the
worst case error, and $e^{det,\diamond}(n, S, \mu)$ the $\mu$-average case error of the problem $S$.
Since adaption and randomization are additional features for algorithms we have

eran, (n, S, •) ≤ edet, (n, S, •) and e ,ada (n, S, •) ≤ e ,nonada (n, S, •), (7)

where • is fixed, either standing for an input set F ⊂ 


F, or for an input distribution μ.
Another important relationship connects average errors and the Monte Carlo error.
It has already been used by Bakhvalov [1, Sect. 1].

Proposition 1 (Bakhvalov’s technique) Let μ be an arbitrary (sub-)probability mea-


sure supported on F. Then

$$e^{ran,\diamond}(n, S, F) \ge e^{det,\diamond}(n, S, \mu).$$

Proof Let $A_n = (A_n^\omega)_{\omega \in \Omega} \in \mathcal{A}_n^{ran,\diamond}$ be a Monte Carlo algorithm. We find

$$e(A_n, S, F) = \sup_{f \in F} \mathbb{E}\, e(A_n^\omega, S, f) \ge \int \mathbb{E}\, e(A_n^\omega, S, f)\, d\mu(f)
\overset{\text{Fubini}}{=} \mathbb{E} \int e(A_n^\omega, S, f)\, d\mu(f)
= \mathbb{E}\, e(A_n^\omega, S, \mu) \ge \inf_{\omega} e(A_n^\omega, S, \mu) \ge \inf_{A_n' \in \mathcal{A}_n^{det,\diamond}} e(A_n', S, \mu).$$

In the last step we used that for any fixed elementary event ω ∈ Ω the realization Aωn
can be seen as a deterministic algorithm. 

We will prove lower bounds for the Monte Carlo error by considering particular
average case situations where we have to deal with only deterministic algorithms.
We have some freedom to choose a suitable distribution μ.
For more details on error settings and types of information see [11].

1.2 Bernstein Numbers

The compactness of $S$ can be characterized by the Bernstein numbers

$$b_m(S) := \sup_{X_m \subseteq \widetilde{F}} \; \inf_{\substack{x \in X_m \\ \|x\| = 1}} \|S(x)\|_G, \qquad (8)$$

where the supremum is taken over m-dimensional linear subspaces Xm ⊆  F. These


quantities are closely related to Bernstein widths of the image S(F) within G,

$$b_m(S(F), G) := \sup_{Y_m \subseteq G} \sup\{ r \ge 0 \mid B_r(0) \cap Y_m \subseteq S(F) \}, \qquad (9)$$

where the first supremum is taken over m-dimensional linear subspaces Ym ⊆ G.


By Br (g) we denote the (closed) ball around g ∈ G with radius r. In general Bernstein
widths are greater than Bernstein numbers, however for injective operators (like
embeddings) both notions coincide (consider Ym = S(Xm )), in the case of Hilbert
spaces 
F and G Bernstein numbers and widths match the singular values σm (S).
For deterministic algorithms it can be easily seen that

$$e^{det,ada}(n, S, F) \ge b_{n+1}(S(F), G) \ge b_{n+1}(S), \qquad (10)$$

since for any information mapping $N : \widetilde{F} \to \mathbb{R}^n$ and all $\varepsilon > 0$ there always exists
an $f \in N^{-1}(0)$ with $\|S(f)\|_G \ge b_{n+1}(S(F), G)\, (1 - \varepsilon)$ and $\pm f \in F$, i.e. $f$ cannot be
distinguished from $-f$.
If both $\widetilde{F}$ and $G$ are Hilbert spaces, lower bounds for the (root mean square) Monte
Carlo error have been found by Novak [7]:

$$e_2^{ran,ada}(n, S, F) \ge \frac{\sqrt{2}}{2}\, \sigma_{2n}(S). \qquad (11)$$
The new result for operators between arbitrary Banach spaces (see Theorem 1) reads
quite similar, for non-adaptive algorithms we have:

$$e^{ran,nonada}(n, S, F) \ge \frac{1}{2}\, b_{2n+1}(S). \qquad (12)$$

For adaptive algorithms we get at least the existence of a constant $\hat{c} \ge 1/215$ such
that
$$e^{ran,ada}(n, S, F) \ge \hat{c}\, b_{2n}(S). \qquad (13)$$

1.3 Some Convex Geometry

Since our aim is to consider arbitrary real Banach spaces, we recall some facts about
the geometry of unit balls.

Proposition 2 (Structure of unit balls) Let (V, · ) be a normed vector space over
the reals with its closed unit ball B := {x ∈ V : x ≤ 1}. Then
• for any finite-dimensional subspace U ⊆ V the intersection B ∩ U is compact
and has a non-empty interior with respect to the standard topology of U as a
finite-dimensional vector space, i.e. B ∩ U ⊂ U is a d-dimensional body, where
d := dim U,
• B is symmetric, i.e. if x ∈ B then −x ∈ B,
• B is convex, i.e. for x, y ∈ B and any λ ∈ (0, 1) it contains the convex combination
(1 − λ)x + λy ∈ B.
If conversely a given set B fulfills those properties, it induces a norm by

x B := inf{r ≥ 0 | x ∈ r B}, x ∈ V, (14)

where rB := {r y | y ∈ B} is the dilation of B by a factor r. The closure of B is the


corresponding closed unit ball then.
Henceforth by Vold we denote the d-dimensional volume for sets within Rd+n as
the standard euclidean space, for n = 0 this is the standard d-dimensional Lebesgue
measure.
Now, for arbitrary sets A, B ⊂ Rd and λ ∈ (0, 1) consider their convex combina-
tion
(1 − λ) A + λ B := {(1 − λ) a + λ b | a ∈ A, b ∈ B}.

The following fundamental statement provides a lower bound for the volume of the
convex combination. Note that this set is empty if one of the sets A or B is empty, so
we will exclude that case.
Proposition 3 (Brunn-Minkowski inequality) Let A, B ⊂ Rd be non-empty com-
pact sets. Then for 0 < λ < 1 we obtain

Vold ((1 − λ) A + λ B)1/d ≥ (1 − λ) Vold (A)1/d + λ Vold (B)1/d .

For a proof and more general conditions see [2]. We apply this inequality to
parallel slices through convex bodies:
Corollary 1 (Parallel slices) Let F ⊂ Rd+n be a convex body and N : Rd+n → Rn
a surjective linear mapping. Considering the parallel slices Fy := F ∩ N −1 (y), the
function
R : Rn → [0, ∞), y → Vold (Fy )1/d

is concave on its support supp R = N(F) which again is a convex body in Rn .


If in addition F is symmetric, the image N(F) is symmetric as well and the func-
tion R is even, its maximum lies in y = 0.
We omit the easy proof and complete this section by a special consequence of
Corollary 1 which we will need for Lemma 1 in Sect. 2.1.

Corollary 2 Let G be a normed vector space and U ⊂ G a d-dimensional linear


subspace with a d-dimensional volume measure Vold that extends to parallel affine
subspaces U + x0 ∈ G for x0 ∈ G by parallel projection, i.e. for a measurable set
A ⊆ U + x0 we have
Vold (A) = Vold (A − x0 ).
 

⊆U

For g ∈ G we denote the closed ball around g with radius r ≥ 0 by

Br (g) := {x ∈ G | x − g G ≤ r}.

Then
Vold (Br (g) ∩ (U + x0 )) ≤ Vold (Br (0) ∩ U) (15)

and the mapping


r → Vold (Br (g) ∩ (U + x0 ))

is continuous and strictly increasing for

r ≥ dist(g, U + x0 ) = inf x − g G .
x∈U+x0

Proof Without loss of generality, after replacing x0 by x0 − g, we assume g = 0.


Now we suppose x0 ∈ / U (since otherwise the result is trivial with equality holding
in (15)) and restrict to the (d + 1)-dimensional vector space V = U + Rx0 . We may
apply Corollary 1 to this finite-dimensional situation, where for r > 0 we get
x0
Vold (Br (0) ∩ (U + x0 )) = r d Vold B1 (0) ∩ U +
r
≤ r d Vold (B1 (0) ∩ U)
= Vold (Br (0) ∩ U),

since the central slice through the unit ball has the greatest volume. By Corollary 1
the function
R(s) := (Vold (B1 (0) ∩ (U + s x0 )))1/d ≥ 0

is concave on [0, 1/ dist(0, U + x0 )], takes its maximum for s = 0, and by this it is
continuous and monotonically decreasing for s ∈ [0, 1/ dist(0, U + x0 )]. There-
 d
fore the function r → Vold (Br (0) ∩ (U + x0 )) = r d R 1r is continuous and
monotonically increasing for r ≥ dist(0, U + x0 ) since it is composed of contin-
uous and monotone functions. It is actually strictly increasing because r d is strictly
increasing. 

2 The Main Results on Lower Bounds

2.1 The Non-adaptive Setting

The proof of the following theorem needs Lemmas 1 and 2 that are provided later.

Theorem 1 (Non-adaptive Monte Carlo methods) Let S :  F → G be a compact


linear operator and F ⊂ 
F be the closed unit ball in 
F. Then, for n < m, as a lower
error bound for non-adaptive Monte Carlo methods we obtain

$$e^{ran,nonada}(n, S, F) \ge \frac{m-n}{m+1}\, b_m(S).$$

Especially for $m = 2n + 1$ we have

$$e^{ran,nonada}(n, S, F) \ge \frac{1}{2}\, b_{2n+1}(S).$$

Proof For all ε > 0 there exists an m-dimensional subspace Xm ⊆ 


F such that

S(f ) G ≥ f F bm (S) (1 − ε) for f ∈ Xm .

Note that for the restricted operator we have bm (S|Xm ) ≥ (1 − ε) bm (S) and in general
eran,nonada (n, S, F) ≥ eran,nonada (n, S|Xm , F). Hence it suffices to show the theorem
for S|Xm , so without loss of generality we assume Xm =  F = Rm and therefore

S(f ) G ≥ f F bm (S) holds for all f ∈ F. Additionally we assume bm (S) > 0, i.e. S
is injective. Let μ be the uniform distribution on the input set F ⊂ Rm with respect to
the m-dimensional Lebesgue measure. We assume that the mapping N :  F → Rn is
an arbitrary surjective linear (i.e. non-adaptive) information mapping. We will show
that for any (measurable) choice of a mapping φ : Rn → G we obtain

$$e(\phi \circ N, S, \mu) = \int \|S(f) - \phi(N(f))\|_G\, d\mu(f) \ge \frac{m-n}{m+1}\, b_m(S), \qquad (16)$$

which by Proposition 1 (Bakhvalov’s technique) provides a lower bound for non-


adaptive Monte Carlo methods.
Within the first step we rewrite the integral in (16) as an integral of local average
errors over the information. The set of inputs F and the information mapping N
match the situation of Corollary 1 with m = n + d, each d-dimensional slice Fy :=
F ∩ N −1 (y) represents all inputs with the same information y ∈ Rn . Since μ is
the uniform distribution on F, the uniform distribution on Fy is a version of the
conditional distribution of μ given y = N(f ), which we denote by μy . Therefore we
can write the integral from (16) as

$$\int \|S(f) - \phi(N(f))\|_G\, d\mu(f) = \int \left( \int \|S(f) - \phi(y)\|_G\, d\mu_y(f) \right) d(\mu \circ N^{-1})(y), \qquad (17)$$

The size of the slices $F_y$ compared to the central slice $F_0$ (where $y = 0$) shall be
described by $R(y) := \left( \operatorname{Vol}_d(F_y) / \operatorname{Vol}_d(F_0) \right)^{1/d}$. The function $R(y)^d$ is a quasi-density
for the distribution $\mu \circ N^{-1}$ of the information $y \in \mathbb{R}^n$. Further, by subsequent
Lemma 1 we have a lower bound for the inner integral, which we call the local
average error:

$$\int \|S(f) - \phi(y)\|_G\, d\mu_y(f) \ge \frac{d}{d+1}\, R(y)\, b_m(S).$$

Therefore the integral (17) is bounded from below by an expression that only depends
on the volumes of the parallel slices $F_y$:

$$\int \|S(f) - \phi(N(f))\|_G\, d\mu(f) \ge \frac{d}{d+1} \cdot \frac{\int_{N(F)} R(y)^{d+1}\, d^n y}{\int_{N(F)} R(y)^{d}\, d^n y}\; b_m(S), \qquad (18)$$

where dn y denotes the integration by the n-dimensional Lebesgue measure.


The problem now is reduced to a variational problem on n-variate functions R(y).
Note that 0 ≤ R(y) ≤ R(0) = 1 since R is symmetric and concave on its sup-
port N(F), which is a convex and symmetric body in Rn , see Corollary 1. The
set N(F) satisfies the structure of a unit ball, hence it defines a norm · ≡ · N(F)
on Rn (compare Proposition 2). We switch to a kind of spherical coordinates rep-
resenting any information vector y = 0 by its length r = r(y) := y and its
direction k = k(y) := 1r y, i.e. k = 1 and y = r k. Let κ denote the cone
measure (see [6] for an explicit construction) on the set of directions ∂N(F). The
n-dimensional Lebesgue integration is to be replaced by dn y = n r n−1 dr dκ(k), i.e.

   1 
R(y) d+1 n
d y 0 R(r k)d+1 r n−1 dr dκ(k)
N(F) =    , (19)
d n
N(F) R(y) d y
1 d n−1 dr dκ(k)
0 R(r k) r

where we have cancelled the factor n. For all directions k ∈ ∂N(F) the ratio of the
integrands with respect to k is globally bounded from below, in detail
1
0 R(r k)
d+1 n−1
r dr d+1 d+1
1 ≥ = ,
d n−1 d + n + 1 m +1
0 R(r k) r dr

where the function r → R(r k) ∈ [0, 1] is concave on [0, 1] and R(0) = 1. For the
solution of this univariate variational problem see subsequent Lemma 2. It follows
d+1
that (19) is bounded from below by m+1 as well, which along with (18) proves the
theorem. 

The following lemma is about local average errors. Its quintessence is that ball-
shaped slices S(Fy ) ⊂ G (with respect to the norm in G) are optimal. For the general
notion of local average radius of information see [11, pp. 197–204].

Lemma 1 (Local average error) Let S : Rm → G be an injective linear mapping


between Banach spaces, where F ⊆ Rm is the unit ball with respect to an arbi-
trary norm on Rm , and let μ be the uniform distribution on F. Let N : Rm → Rn
be a linear surjective information mapping, where for y ∈ Rn the conditional
measure μy is the uniform distribution on the slice Fy := F ∩ N −1 (y). With
 1/d
R(y) := Vold (Fy )/ Vold (F0 ) and d := m − n for the local average error we
have 
d
inf S(f ) − g G dμy (f ) ≥ R(y) bm (S).
g∈G d+1

Proof Since S : Rm → G is linear and bm (S) > 0, the exact solutions S(Fy ) for
inputs with the same information y ∈ Rn in each case form (convex) sets within
d-dimensional affine subspaces Uy := S(N −1 (y)) of the output space G. We com-
pare the volume of subsets within different parallel affine subspaces, i.e. any d-
dimensional Lebesgue-like measure σ on U0 is also defined for subsets of the affine
subspaces Uy just as in Corollary 2. The linear mapping S preserves the ratio of
volumes, i.e.
Vold (Fy ) σ (S(Fy ))
= R(y)d = . (20)
Vold (F0 ) σ (S(F0 ))

Therefore for each information y ∈ N(F) the image measure μy ◦ S −1 is the uniform
distribution on S(Fy ) with respect to σ . This means that for any g ∈ G we need to
show the inequality
 
1
S(f ) − g G dμy (f ) = x − g G dσ (x) (21)
σ (S(Fy )) S(Fy )
d
≥ R(y) bm (S).
d+1

For convenience we assume σ to be scaled such that

σ (Br (0) ∩ U0 ) = r d , (22)

where Br (g) := {x ∈ G | x − g G ≤ r} is the ball around g ∈ G with radius r ≥ 0.


Given the information N(f ) = y let g ∈ G be any (possibly non-interpolatory)
choice for a return value. For r ≥ dist(g, Uy ) =: ε we define the set of those points
in Uy that have a distance of at most r to g,

Cr := Br (g) ∩ Uy ,

and write its volume as a function

V (r) := σ (Cr ).

By Corollary 2 the function V is continuous and strictly increasing for r ≥ ε and

(22)
V (r) ≤ σ (Br (0) ∩ U0 ) = r d . (23)

Therefore also the inverse function, which we denote by

ρ : [V (ε), ∞] → [ε, ∞] , with ρ(V (r)) = r for r ≥ ε,

is strictly increasing. By (23) we have ρ(r d ) ≥ ρ(V (r)) = r for r ≥ ε. That means,
for v = r d ≥ εd , and trivially for V (ε) ≤ v ≤ εd , we obtain

ρ(v) ≥ d
v , for v ≥ V (ε). (24)

Especially  √
ε = ρ(V (ε)) ≥ d
V (ε) ≥ d
v , for v ≤ V (ε). (25)

If σ (S(Fy )) ≤ V (ε) we obtain



(25) d
x − g G dσ (x) ≥ ε σ (S(Fy )) ≥ σ (S(Fy ))(d+1)/d . (26)
S(Fy ) d+1

Otherwise we introduce the abbreviation ρy := ρ(σ (S(Fy ))), where by definition we


have
σ (S(Fy )) = σ (Cρy ).

Note that

σ (S(Fy ) \ Cρy ) = σ (Cρy \ S(Fy )),


x − g G ≥ ρy , for x ∈ S(Fy ) \ Cρy , and (27)
x − g G ≤ ρy , for x ∈ Cρy \ S(Fy ).

This enables us to carry out a symmetrization:


 
(27)
x − g G dσ (x) ≥ x − g G dσ (x)
S(Fy ) Cρ y
 σ (S(Fy ))
= ε V (ε) + ρ(v) dv
V (ε)
 σ (S(Fy ))
(24),(25)
≥ v1/d dv (28)
0
d
= σ (S(Fy ))(d+1)/d .
d+1

Together with (20), both cases, (26) and (28), give us



1 d d
x − g G dσ (x) ≥ R(y)σ (S(F0 ))1/d ≥ R(y) bm (S),
σ (S(Fy )) S(Fy ) d+1 d+1

which is (21). For the second inequality we have used the definition of the Bernstein
number, i.e. Bbm (S) (0) ∩ S(Rm ) ⊆ S(F) and therefore Bbm (S) (0) ∩ U0 ⊆ S(F0 ) which
with our scaling (22) implies bm (S)d ≤ σ (S(F0 )). 

Remark 1 (Alternative to Bernstein numbers) In the very end of the above proof
we have replaced σ (S(F0 )) by an expression using the Bernstein number bm (S). In
fact, due to the scaling of σ , the expression σ (S(F0 )) is a volume comparison of an
(m − n)-dimensional slice of the image of the input set S(F) and the unit ball in G.
We could replace the Bernstein numbers within Theorem 1 by new quantities

$$k_{m,n}(S) := \sup_{X_m} \inf_{Y_{m-n}} \left( \frac{\operatorname{Vol}_{m-n}(S(F) \cap Y_{m-n})}{\operatorname{Vol}_{m-n}(B_G \cap Y_{m-n})} \right)^{1/(m-n)}, \qquad (29)$$

where Xm ⊆  F and Ym−n ⊆ S(Xm ) are linear subspaces with dimension dim(Xm ) =
dim(S(Xm )) = m and dim(Ym−n ) = m − n, further BG denotes the unit ball in G and
for each choice of Ym−n the volume measure Volm−n may be any (m−n)-dimensional
Lebesgue measure, since we are only interested in the ratio of volumes.

Lemma 2 (Variational problem) For d, n ∈ N consider the variational problem of


minimizing the functional

$$F[R(r)] := \frac{\int_0^1 R(r)^{d+1}\, r^{n-1}\, dr}{\int_0^1 R(r)^{d}\, r^{n-1}\, dr},$$

where $R : [0, 1] \to [0, 1]$ is concave and $R(0) = 1$. Then

$$F[R(r)] \ge \frac{d+1}{d+n+1}$$

with equality holding only for $R(r) = 1 - r$.

Proof For $p > 0$ with repeated integration by parts we obtain

$$\int_0^1 (1 - r)^{p-1}\, r^{n-1}\, dr = \frac{(n-1) \cdots 1}{p \cdots (p + n - 1)},$$

which is a special value of the beta function (see for example [12, p. 103]). Knowing
the value of this integral we get

$$F[1 - r] = \frac{d+1}{d+n+1}. \qquad (30)$$

The maximum is F[1] = 1. For other linear functions 


R(r) = (1 − r) + αr with
R(1) = α ∈ (0, 1) we can write
1
(1 − (1 − α)r)d+1 r n−1 dr
F[(1 − r) + αr] = 0 1
0 (1 − (1 − α)r) r
d n−1 dr
 1−α
[x=(1−α)r] 0 (1 − x)d+1 x n−1 dx
=  1−α ,
0 (1 − x)d x n−1 dx

where we have cancelled the factor (1 − α)−n . We can express this as a conditional
expectation using a random variable X ∈ [0, 1] with quasi density (1 − x)d x n−1 :

F[(1 − r) + αr] = E[(1 − X) | X ≤ (1 − α)] = E[(1 − X) | (1 − X) ≥ α],

which obviously is monotonically increasing in α.


For any nonlinear concave function R : [0, 1] → [0, 1] with R(0) = 1 there exists
exactly one linear function 
R(r) = (1 − r) + αr with
 1  1
R(r)d r n−1 dr − 
R(r)d r n−1 dr = 0. (31)
0 0

Due to the concavity of R there is exactly one r0 ∈ (0, 1) with R(r0 ) = 


R(r0 ).
For r ∈ (0, r0 ) we have
 
R(r) > 
R(r) > R(r0 ) ⇒ R(r)d+1 − 
R(r)d+1 > R(r0 ) R(r)d − 
R(r)d > 0.

Meanwhile for r ∈ (r0 , 1] we have


 
R(r) < 
R(r) < R(r0 ) ⇒ 0 > R(r)d+1 − 
R(r)d+1 > R(r0 ) R(r)d − 
R(r)d .

Therefore
 1  1
R(r) d+1 n−1
r dr − 
R(r)d+1 r n−1 dr
0 0
 1  1 
(31)
> R(r0 ) d n−1
R(r) r dr −  d n−1
R(r) r dr = 0,
0 0

which with (31) implies F[R] > F[


R]. 

Remark 2 (Quality of the prefactor) Consider the identity $\mathrm{id}_1^m : \ell_1^m \to \ell_1^m$ with
Bernstein number $b_m(\mathrm{id}_1^m) = 1$. (For notation see Sect. 3.1.) For any $J \subseteq \{1, \ldots, m\}$
being an index set containing $n$ indices define the deterministic algorithm

$$A_J(x) := \sum_{i \in J} x_i e_i, \qquad x \in \ell_1^m,$$

where $e_i = (\delta_{ij})_{j=1}^m$ are the vectors of the standard basis. With $\mu$ being the uniform
distribution on the unit ball $B_1^m \subset \ell_1^m$, for the average case setting this type of
algorithm is optimal.
We add some randomness to the above method. Let $J = J(\omega)$ be uniformly
distributed on the system of index sets $\{I \subset \{1, \ldots, m\} \mid \#I = n\}$ and define the
Monte Carlo algorithm $A_n = (A_n^\omega)_{\omega \in \Omega}$ by

$$A_n^\omega(x) := \sum_{i \in J(\omega)} x_i e_i.$$

The error is

$$e(A_n, \mathrm{id}_1^m, x) = \mathbb{E}\, \|x - A_n^\omega(x)\|_1 = \sum_{i=1}^{m} \mathbb{P}(i \notin J(\omega))\, |x_i| = \frac{m-n}{m}\, \|x\|_1.$$

Along with Theorem 1 we have

$$\frac{m-n}{m+1} \le e^{ran,nonada}(n, \mathrm{id}_1^m, B_1^m) \le \frac{m-n}{m}.$$

The remaining gap may be due to the fact that the distribution $\mu$ within the average
case setting was not a distribution on the surface of $F$ but the uniform distribution on
the whole volume of $F$. Yet, for high dimensions most of the mass is concentrated
near the surface.
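The randomized coordinate-selection method of Remark 2 is simple enough to check numerically; the following Python sketch (with illustrative names, not part of the paper) verifies the error identity $e(A_n, \mathrm{id}_1^m, x) = \frac{m-n}{m}\,\|x\|_1$ empirically.

```python
import numpy as np

def keep_random_coordinates(x, n, rng):
    """One realization of A_n^omega from Remark 2: keep n of the m coordinates
    of x, chosen uniformly at random, and set the remaining ones to zero."""
    out = np.zeros_like(x)
    keep = rng.choice(x.size, size=n, replace=False)
    out[keep] = x[keep]
    return out

rng = np.random.default_rng(0)
m, n = 20, 5
x = rng.standard_normal(m)
x /= np.abs(x).sum()                    # normalize x to the unit sphere of l_1^m
errors = [np.abs(x - keep_random_coordinates(x, n, rng)).sum() for _ in range(20000)]
print(np.mean(errors), (m - n) / m)     # both values are close to 0.75
```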

2.2 The Adaptive Setting

A different approach was taken by Heinrich [3]. Gaussian measures can be down-
scaled in a way such that for their truncation to the unit ball F the mass of the
truncated area is small and for any adaptive information N we have a big portion
(with respect to the Gaussian measure) of slices Fy = F ∩ N −1 (y) that are in a certain
sense close to the center so that truncation does not make a big difference for the
local average error. The Gaussian measure should however not be exaggeratedly con-
centrated around the origin for that the local average error of those central slices is
still sufficiently high. For the next theorem we combine Heinrich’s general results on

truncated Gaussian measures with Lewis’s theorem which gives us a way to choose
a suitable Gaussian measures for our average case setting.

Theorem 2 (Adaptive Monte Carlo methods) Let S :  F → G be a compact linear


operator and F ⊂ 
F be the closed unit ball in F. Then for n < m for adaptive Monte
Carlo methods we obtain
$$e^{ran,ada}(n, S, F) \ge c\, \frac{m-n}{m}\, b_m(S),$$

where the constant can be chosen as $c = \frac{1 - 2e^{-1}}{16 \sqrt{\pi}} \ge \frac{1}{108}$.

Remark 3 The given constant can be directly extracted from the proof in Heinrich [3].
However, by optimizing some parts of the proof one can show that the theorem is
still valid with $c = \frac{1}{16}$. When restricting to homogeneous algorithms (i.e. $A_n(\alpha f) =
\alpha A_n(f)$ for $\alpha \in \mathbb{R}$) we may show the above result with the optimal constant $c = 1$
(see also Remark 2). The proofs for these statements will be published in future work.

Proof (Theorem 2) As before we assume  F = Rm . We start with the existence of in


some sense optimal Gaussian measures on  F. Let x be a standard Gaussian random
vector in Rm . Then α(J) := E Jx F defines a norm on the set of linear operators
J : Rm → Rm . By Lewis’ Theorem (see for example [10, Theorem 3.1]) there exists
a linear mapping J : Rm → Rm with maximal determinant subject to α(J) = 1,
and tr(J −1 T ) ≤ m α(T ) for any linear mapping T : Rm → Rm . In particular with
T = JP for any rank-(m − n) projection P within Rm this implies

$$\mathbb{E}\, \|JPx\|_{\widetilde{F}} \ge \frac{m-n}{m}. \qquad (32)$$

For the average setting let $\widetilde{\mu}$ denote the Gaussian measure for the distribution of
the rescaled random vector $c' J x$, where $c' = \frac{1}{8\sqrt{\pi}}$, and let $\mu$ be the truncated measure,
i.e. $\mu(A) = \widetilde{\mu}(A \cap F)$ for measurable sets $A \subseteq \mathbb{R}^m$. Note that $\mu$ is not a probability
measure, but a sub-probability measure with $\mu(F) < 1$, which is sufficient for the
purpose of lower bounds.
Then by Heinrich [3, Proposition 2] we have

edet,ada (n, S, μ) ≥ c c inf E SJPx G , (33)


P

where
1 the infimum is taken over orthogonal rank-(m − n) projections P and c =
−1
2
− e . (The conditional measure μ̃y for μ̃ given the information y = N(f ) can
be represented as the distribution of c JPy x with a suitable orthogonal projection Py .)
With SJPx G ≥ JPx F bm (S) and (32) and c = c c we obtain the theorem. 

Note that we consider Monte Carlo algorithms with fixed information cost $n$,
whereas in [3] $n$ denotes the average information cost $\mathbb{E}\, n(\omega)$, which leads to slightly
different bounds, like $\frac{c}{4}\, b_{4n}(S)$ instead of $\frac{c}{2}\, b_{2n}(S)$.

3 Applications

3.1 Recovery of Sequences

We compare the results we obtain by Theorems 1 and 2 with some asymptotic lower
bounds of Heinrich [3] for the Monte Carlo approximation of the identity

$$\mathrm{id} : \ell_p^N \to \ell_q^N.$$

Here $\ell_p^N$ denotes $\mathbb{R}^N$ equipped with the $p$-norm $\|x\|_p = (|x_1|^p + \ldots + |x_N|^p)^{1/p}$ for
$p < \infty$, or $\|x\|_\infty = \max_{i=1,\ldots,N} |x_i|$ for $p = \infty$; the input set is the unit ball $B_p^N$ of $\ell_p^N$.
Since the identity is injective, Bernstein widths and Bernstein numbers coincide.
Proposition 4 (Heinrich 1992) Let $1 \le p, q \le \infty$ and $n \in \mathbb{N}$. Then

$$e^{ran,ada}(n, \mathrm{id} : \ell_p^{4n} \to \ell_q^{4n}, B_p^{4n}) \asymp
\begin{cases}
n^{1/q - 1/p}, & \text{if } 1 \le p, q < \infty, \\
n^{-1/p} (\log n)^{1/2}, & \text{if } 1 \le p < q = \infty, \\
n^{1/q} (\log n)^{-1/2}, & \text{if } 1 \le q < p = \infty, \\
1, & \text{if } p = q = \infty.
\end{cases}$$

The above result is a direct application of Heinrich’s technique of truncated


Gaussian measures to a scaled version of the standard Gaussian distribution on Rm ,
here m = 4n. In detail, we need the asymptotics of the norm expectations for a
standard Gaussian vector x ∈ Rm which are E x p  m1/p for 1 ≤ p < ∞, and
E x ∞  (log m)1/2 .
Now we cite some asymptotic results on Bernstein numbers, see [4, Lemma 3.6].
Lemma 3 Let $1 \le p, q \le \infty$ and $m \in \mathbb{N}$. Then

$$b_m(\mathrm{id} : \ell_p^{2m} \to \ell_q^{2m}) \asymp
\begin{cases}
m^{1/q - 1/p}, & \text{if } 1 \le p \le q \le \infty \text{ or } 1 \le q \le p \le 2, \\
m^{1/q - 1/2}, & \text{if } 1 \le q \le 2 \le p \le \infty, \\
1, & \text{if } 2 \le q \le p \le \infty.
\end{cases}$$

Combining this with Theorem 2 for m = 2n one may obtain a result similar
to Proposition 4, though without the logarithmic factor for 1 ≤ p < q = ∞ and
even with a weaker polynomial order for 1 ≤ q < p ≤ ∞ if p > 2. However for
the non-adaptive setting with Theorem 1 we can use the quantities km,n (S) defined
in Remark 1. The following result on volume ratios due to Meyer and Pajor [5] is
relevant to the problematic case 1 ≤ q < p ≤ ∞.
Proposition 5 (Meyer, Pajor 1988) For every $d$-dimensional subspace $Y_d \subset \mathbb{R}^m$
and for $1 \le q \le p \le \infty$ we have

$$\frac{\operatorname{Vol}_d(B_p^m \cap Y_d)}{\operatorname{Vol}_d(B_q^m \cap Y_d)} \ge \frac{\operatorname{Vol}_d(B_p^d)}{\operatorname{Vol}_d(B_q^d)}.$$

Corollary 3 For $1 \le p, q \le \infty$ we have

$$e^{ran,nonada}(n, \mathrm{id} : \ell_p^{4n} \to \ell_q^{4n}, B_p^{4n}) \gtrsim n^{1/q - 1/p}.$$

Note that by this for the case 1 ≤ q < p = ∞ we even have stronger lower bounds
than in Proposition 4, namely without the logarithmic term, however this only holds
for non-adaptive algorithms. On the other hand, for the case 1 ≤ p < q = ∞ this
result is weaker by a logarithmic factor compared to Heinrich’s result.
Proof (Corollary 3) For 1 ≤ p ≤ q ≤ ∞ we apply Theorem 1 using the Bernstein
numbers from Lemma 3 with m = 2n.
For 1 ≤ q ≤ p ≤ ∞ let m = 4n and d = m − n = 3n. By Proposition 5 we have
 1/d  1/d
Vold (Bpm ∩ Yd ) Vold (Bpd )
km,n (id : m → m
q) = inf m ≥ . (34)
p
Yd ⊂R Vold (Bqm ∩ Yd ) Vold (Bqd )

The volume of the unit ball in $\ell_p^d$ can be found e.g. in [10, Eq. (1.17)]; it is

$$\operatorname{Vol}_d(B_p^d) = \frac{\left( 2\, \Gamma(1 + \tfrac{1}{p}) \right)^d}{\Gamma(1 + \tfrac{d}{p})}.$$

For 1 ≤ p < ∞ we apply Stirling’s formula to the denominator


    d/p  
d d d d p
Γ 1+ = 2π eμ(d/p) where 0 ≤ μ ≤ ,
p p ep p 12d

and by this we obtain the asymptotics (Vold (Bpd ))1/d  d −1/p . For p = ∞ we simply
have (Vold (B∞
d
))1/d = 2. Putting this into (34), by Remark 1 together with Theorem 1
we obtain the corollary. 
Finally observe that in the case 1 ≤ p ≤ q ≤ ∞ Proposition 5 provides upper
bounds for the quantities km,n (id : m p → q ). By this we see that taking these
m

quantities instead of the Bernstein numbers will not change the order of the lower
bounds for the error eran,nonada (n, id : 4n
p → q , Bp ).
4n 4n

3.2 Curse of Dimensionality for Approximation


of Smooth Functions

For each dimension $d \in \mathbb{N}$ consider the problem

$$S_d = \mathrm{id} : \widetilde{F}_d \to L_\infty([0, 1]^d), \qquad (35)$$

where the input space is

$$\widetilde{F}_d := \{ f \in C^\infty([0, 1]^d) \mid \sup_{\alpha \in \mathbb{N}_0^d} \|D^\alpha f\|_\infty < \infty \}, \qquad (36)$$

equipped with the norm

$$\|f\|_{\widetilde{F}} := \sup_{\alpha \in \mathbb{N}_0^d} \|D^\alpha f\|_\infty. \qquad (37)$$

Here $D^\alpha f = \partial_1^{\alpha_1} \cdots \partial_d^{\alpha_d} f$ denotes the partial derivative of $f$ belonging to a multi-
index $\alpha \in \mathbb{N}_0^d$. The input set $F_d$ is the unit ball in $\widetilde{F}_d$.
Fd .
Novak and Woźniakoswki have shown in [9] that this problem suffers from the
curse of dimensionality for deterministic algorithms. The proof is based on the
Bernstein numbers given in the following lemma, we will sketch the idea on how to
obtain these values.

Lemma 4 (Novak, Woźniakowski 2009) For the problems $S_d$ we have

$$b_m(S_d) = 1 \quad \text{for } m \le 2^{\lfloor d/2 \rfloor}.$$

Proof (idea) Note that $\|\cdot\|_{\widetilde{F}} \ge \|\cdot\|_\infty$ and therefore $b_m(S_d) \le 1$ for all $m \in \mathbb{N}$.
Further, with $s := \lfloor d/2 \rfloor$ consider the linear subspace

$$V_d := \Big\{ f \;\Big|\; f(x) = \sum_{i \in \{0,1\}^s} a_i\, (x_1 + x_2)^{i_1} (x_3 + x_4)^{i_2} \cdots (x_{2s-1} + x_{2s})^{i_s},\ a_i \in \mathbb{R} \Big\} \qquad (38)$$

of $\widetilde{F}_d$ with $\dim V_d = 2^{\lfloor d/2 \rfloor}$. For $f \in V_d$ one can show $\|D^\alpha f\|_\infty \le \|f\|_\infty$ for all
multi-indices $\alpha \in \mathbb{N}_0^d$, i.e. $\|f\|_{\widetilde{F}} = \|f\|_\infty$. Therefore with $m = 2^{\lfloor d/2 \rfloor}$ and $X_m = V_d$
we obtain $b_m(S) = 1$. Since the sequence of Bernstein numbers is decreasing, we
know the first $2^{\lfloor d/2 \rfloor}$ Bernstein numbers. □

Knowing this, by Theorems 1 and 2 we directly obtain the following result for
randomized algorithms.

Corollary 4 (Curse of dimensionality) For the problems $S_d$ we have

$$e^{ran,nonada}(n, S_d, F_d) \ge \frac{1}{2} \quad \text{for } n \le 2^{\lfloor d/2 \rfloor - 1} - 1,$$

and

$$e^{ran,ada}(n, S_d, F_d) \ge \hat{c} \quad \text{for } n \le 2^{\lfloor d/2 \rfloor - 1}, \qquad (39)$$

with a suitable constant $\hat{c} \ge 1/215$.



Note that if we do not collect any information about the problem, the best algorithm
would simply return 0 and the so-called initial error is

e(0, Sd , Fd ) = 1.

Even after evaluating exponentially many (in d) information functionals, with non-
adaptive algorithms we only halve the initial error, if at all. The problem suffers from
the curse of dimensionality. For more details on tractability notions see [8].

Acknowledgments I want to thank E. Novak and A. Hinrichs for all the valuable hints and their
encouragements during the process of compiling this work.
In addition I would like to thank S. Heinrich for his crucial hint on Bernstein numbers and
Bernstein widths.
Last but not least I would like to express my gratitude to Brown University’s ICERM for its support
with a stimulating research environment and the opportunity of having scientific conversations that
finally inspired the solution of the adaptive case during my stay in fall 2014.

References

1. Bakhvalov, N.S.: On the approximate calculation of multiple integrals. Vestnik MGU, Ser.
Math. Mech. Astron. Phys. Chem. 4, 3–18 (1959) (in Russian). English translation: J. Complexity
31, 502–516 (2015)
2. Gardner, R.J.: The Brunn-Minkowski inequality. Bulletin of the AMS 39(3), 355–405 (2002)
3. Heinrich, S.: Lower bounds for the complexity of Monte Carlo function approximation. J.
Complex. 8, 277–300 (1992)
4. Li, Y.W., Fang, G.S.: Bernstein n-widths of Besov embeddings on Lipschitz domains. Acta
Mathematica Sinica, English Series 29(12), 2283–2294 (2013)
5. Meyer, M., Pajor, A.: Sections of the unit ball of np . J. Funct. Anal. 80, 109–123 (1988)
6. Naor, A.: The surface measure and cone measure on the sphere of np . Trans. AMS 359, 1045–
1079 (2007)
7. Novak, E.: Optimal linear randomized methods for linear operators in Hilbert spaces. J. Com-
plex. 8, 22–36 (1992)
8. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems, Linear Information, vol.
I. European Mathematical Society, Europe (2008)
9. Novak, E., Woźniakowski, H.: Approximation of infinitely differentiable multivariate functions
is intractable. J. Complex. 25, 398–404 (2009)
10. Pisier, G.: The Volume of Convex Bodies and Banach Space Geometry. Cambridge University
Press, Cambridge (1989)
11. Traub, J.F., Wasilkowski, G.W., Woźniakowski, H.: Information-Based Complexity. Academic
Press, New York (1988)
12. Wang, Z.X., Guo, D.R.: Special Functions. World Scientific, Singapore (1989)
A Note on the Importance of Weak
Convergence Rates for SPDE
Approximations in Multilevel Monte Carlo
Schemes

Annika Lang

Abstract It is a well-known rule of thumb that approximations of stochastic partial


differential equations have essentially twice the order of weak convergence com-
pared to the corresponding order of strong convergence. This is already known for
many approximations of stochastic (ordinary) differential equations while it is recent
research for stochastic partial differential equations. In this note it is shown how the
availability of weak convergence results influences the number of samples in multi-
level Monte Carlo schemes and therefore reduces the computational complexity of
these schemes for a given accuracy of the approximations.

Keywords Stochastic partial differential equations · Multilevel Monte Carlo


methods · Finite element approximations · Weak error analysis · Stochastic heat
equation

1 Introduction

Since the publication of Giles’ articles about multilevel Monte Carlo methods [8, 9],
which applied an earlier idea of Heinrich [10] to stochastic differential equations, an
enormous amount of literature on the application of multilevel Monte Carlo schemes
to various applications has been published. For an overview of the state of the art
in the area, the reader is referred to the scientific program and the proceedings of
MCQMC14 in Leuven.
This note is intended to show the consequences of the availability of different
types of convergence results for stochastic partial differential equations of Itô type
(SPDEs for short in what follows). Here we consider so called strong and weak

A. Lang (B)
Department of Mathematical Sciences, Chalmers University of Technology
and University of Gothenburg, SE-412 96 Gothenburg, Sweden
e-mail: annika.lang@chalmers.se

490 A. Lang

convergence rates, where a sequence of approximations $(Y_\ell, \ell \in \mathbb{N}_0)$ of an $H$-valued
random variable $Y$ converges strongly (also called in mean square) to $Y$ if

$$\lim_{\ell \to +\infty} \mathbb{E}[\|Y - Y_\ell\|_H^2]^{1/2} = 0.$$

In the context of this note, $H$ denotes a separable Hilbert space. The sequence is said
to converge weakly to $Y$ if

$$\lim_{\ell \to +\infty} |\mathbb{E}[\varphi(Y_\ell)] - \mathbb{E}[\varphi(Y)]| = 0$$

for ϕ in an appropriately chosen class of functionals that depends in general on


the treated problem. While strong convergence results for approximations of many
SPDEs are already well-known, corresponding orders of weak convergence that are
better than the strong ones are just rarely available. For an overview on the existing
literature on weak convergence, the reader is referred to [1, 11] and the literature
therein. The necessity to do further research in this area is besides other motivations
also due to the efficiency of multilevel Monte Carlo approximations, which is the
content of this note. By a rule of thumb one expects the order of weak convergence to
be twice the strong one for SPDEs. This is shown under certain smoothness assump-
tions on the SPDE and its approximation in [1]. We use the SPDE from [1] and its
approximations with the desired strong and weak convergence rates to show that
the additional knowledge of better weak than strong convergence rates changes the
choices of the number of samples per level in a multilevel Monte Carlo approxi-
mation according to the theory. Since, for a given accuracy, the number of samples
reduces with the availability of weak rates, the overall computational work decreases.
Computing numbers, we shall see in the end that for high dimensional problems and
low regularity of the original SPDE the work using only strong approximation results
is essentially the squared work using also weak approximation rates. In other words
the order of the complexity of the work in terms of accuracy decreases essentially by
a factor of 2, when weak convergence rates are available. The intention of this note
is to point out this important fact by writing down the resulting numbers explicitly.
First simulation results are presented in the end for a stochastic heat equation in
one dimension driven by additive space-time white noise, which, to the best of my
knowledge, are the first simulation results of that type in the literature. The obtained
results confirm the theory.
This work is organized as follows: In Sect. 2 the multilevel Monte Carlo method
is recalled including results for the approximation of Hilbert-space-valued random
variables on arbitrary refinements. SPDEs and their approximations are introduced in
Sect. 3 and results for strong and weak errors from [1] are summarized. The results
from Sects. 2 and 3 are combined in Sect. 4 to a multilevel Monte Carlo scheme
for SPDEs and the consequences of the knowledge of weak convergence rates are
outlined. Finally, the theory is confirmed by simulations in Sect. 5.

2 Multilevel Monte Carlo for Random Variables

In this section we recall and improve a convergence and a work versus accuracy
result for the multilevel Monte Carlo estimator of a Hilbert-space-valued random
variable from [3]. This is used to calculate errors and computational work for the
approximation of stochastic partial differential equations in Sect. 4. A multilevel
Monte Carlo method for (more general) Banach-space-valued random variables has
been introduced in [10], where the author derives bounds on the error for given work.
Here, we do the contrary and bound the overall work for a given accuracy.
We start with a lemma on the convergence in the number of samples of a Monte
Carlo estimator. Therefore let (Ω, A , P) be a probability space and let Y be a random
variable with values in a Hilbert space (B, (·, ·)B ) and (Ŷ i , i ∈ N) be a sequence of
independent, identically distributed copies of Y . Then the strong law of large numbers
states that the Monte Carlo estimator $E_N[Y]$ defined by

$$E_N[Y] := \frac{1}{N} \sum_{i=1}^{N} \hat{Y}^i$$

converges P-almost surely to E[Y ] for N → +∞. In the following lemma we see that
it also converges in mean square to E[Y ] if Y is square integrable, i.e., Y ∈ L 2 (Ω; B)
with

$$L^2(\Omega; B) := \left\{ v : \Omega \to B,\ v \text{ strongly measurable},\ \|v\|_{L^2(\Omega;B)} < +\infty \right\},$$

where
$$\|v\|_{L^2(\Omega;B)} := \mathbb{E}[\|v\|_B^2]^{1/2}.$$

In contrast to the almost sure convergence of EN [Y ] derived from the strong law of
large numbers, a convergence rate in mean square can be deduced from the following
lemma in terms of the number of samples N ∈ N.
Lemma 1 For any $N \in \mathbb{N}$ and for $Y \in L^2(\Omega; B)$, it holds that

$$\|\mathbb{E}[Y] - E_N[Y]\|_{L^2(\Omega;B)} = \frac{1}{\sqrt{N}}\, \operatorname{Var}[Y]^{1/2} \le \frac{1}{\sqrt{N}}\, \|Y\|_{L^2(\Omega;B)}.$$

The lemma is proven in, e.g., [6, Lemma 4.1]. It shows that the sequence of so-
called Monte Carlo estimators (EN [Y ], N ∈ N) converges with rate O(N −1/2 ) in
mean square to the expectation of Y .
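A quick empirical illustration of the $O(N^{-1/2})$ rate of Lemma 1 for a real-valued $Y$ uniformly distributed on $[0, 1]$ (a toy Python sketch, not part of this note):

```python
import numpy as np

rng = np.random.default_rng(1)
exact = 0.5                                   # E[Y] for Y ~ U(0, 1)
for N in (10**2, 10**4, 10**6):
    # root mean square error of the Monte Carlo estimator over 100 repetitions
    err = np.sqrt(np.mean([(rng.random(N).mean() - exact) ** 2 for _ in range(100)]))
    print(N, err)                             # err shrinks roughly like N**(-1/2)
```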
Next let us assume that $(Y_\ell, \ell \in \mathbb{N}_0)$ is a sequence of approximations of $Y$, e.g.,
$Y_\ell \in V_\ell$, where $(V_\ell, \ell \in \mathbb{N}_0)$ is a sequence of finite dimensional subspaces of $B$. For
given $L \in \mathbb{N}_0$, it holds that

$$Y_L = Y_0 + \sum_{\ell=1}^{L} (Y_\ell - Y_{\ell-1})$$

and due to the linearity of the expectation that

$$\mathbb{E}[Y_L] = \mathbb{E}[Y_0] + \sum_{\ell=1}^{L} \mathbb{E}[Y_\ell - Y_{\ell-1}].$$

A possible way to approximate $\mathbb{E}[Y_L]$ is to approximate $\mathbb{E}[Y_\ell - Y_{\ell-1}]$ with the cor-
responding Monte Carlo estimator $E_{N_\ell}[Y_\ell - Y_{\ell-1}]$ with a number of independent
samples $N_\ell$ depending on the level $\ell$. We set

$$E^L[Y_L] := E_{N_0}[Y_0] + \sum_{\ell=1}^{L} E_{N_\ell}[Y_\ell - Y_{\ell-1}]$$
and call E L [YL ] the multilevel Monte Carlo estimator of E[YL ]. The following lemma
gives convergence results for the estimator depending on the order of weak conver-
gence of (Y ,  ∈ N0 ) to Y and the convergence of the variance of (Y −Y−1 ,  ∈ N).
If neither estimates on weak convergence rates nor on the convergence of the vari-
ances are available, one can use—the in general slower—strong convergence rates.
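In code, the multilevel estimator is nothing more than a sum of single-level Monte Carlo means of the level corrections; the following Python sketch (with an illustrative interface that is not from this note) makes this explicit for a real-valued quantity of interest.

```python
import numpy as np

def mlmc_estimate(sample_correction, N):
    """Multilevel Monte Carlo estimator E^L[Y_L].

    sample_correction(l, size) is assumed to return `size` independent samples
    of Y_l - Y_{l-1} (with Y_{-1} := 0), generated from coupled realizations of
    the underlying randomness; N = (N_0, ..., N_L) are the samples per level.
    """
    return sum(np.mean(sample_correction(l, n)) for l, n in enumerate(N))
```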

Lemma 2 Let $Y \in L^2(\Omega; B)$ and let $(Y_\ell, \ell \in \mathbb{N}_0)$ be a sequence in $L^2(\Omega; B)$; then,
for $L \in \mathbb{N}_0$, it holds that

$$\begin{aligned}
\|\mathbb{E}[Y] - E^L[Y_L]\|_{L^2(\Omega;B)}
&\le \|\mathbb{E}[Y - Y_L]\|_B + \|\mathbb{E}[Y_L] - E^L[Y_L]\|_{L^2(\Omega;B)} \\
&= \|\mathbb{E}[Y - Y_L]\|_B + \left( N_0^{-1} \operatorname{Var}[Y_0] + \sum_{\ell=1}^{L} N_\ell^{-1} \operatorname{Var}[Y_\ell - Y_{\ell-1}] \right)^{1/2} \\
&\le \|Y - Y_L\|_{L^2(\Omega;B)} + \left( 2 \sum_{\ell=0}^{L} N_\ell^{-1} \left( \|Y - Y_\ell\|_{L^2(\Omega;B)}^2 + \|Y - Y_{\ell-1}\|_{L^2(\Omega;B)}^2 \right) \right)^{1/2},
\end{aligned}$$

where $Y_{-1} := 0$.

Proof This is essentially [3, Lemma 2.2] except that the square root is kept outside
the sum. Therefore it remains to show the property of the multilevel Monte Carlo
estimator that

$$\|\mathbb{E}[Y_L] - E^L[Y_L]\|_{L^2(\Omega;B)}^2 = N_0^{-1} \operatorname{Var}[Y_0] + \sum_{\ell=1}^{L} N_\ell^{-1} \operatorname{Var}[Y_\ell - Y_{\ell-1}].$$

To prove this we first observe that

$$\mathbb{E}[Y_L] - E^L[Y_L] = \mathbb{E}[Y_0] - E_{N_0}[Y_0] + \sum_{\ell=1}^{L} \left( \mathbb{E}[Y_\ell - Y_{\ell-1}] - E_{N_\ell}[Y_\ell - Y_{\ell-1}] \right)$$

and that all summands are independent, centered random variables by the construc-
tion of the multilevel Monte Carlo estimator. Thus [7, Proposition 1.12] implies
that

$$\mathbb{E}[\|\mathbb{E}[Y_L] - E^L[Y_L]\|_B^2]
= \mathbb{E}[\|\mathbb{E}[Y_0] - E_{N_0}[Y_0]\|_B^2] + \sum_{\ell=1}^{L} \mathbb{E}[\|\mathbb{E}[Y_\ell - Y_{\ell-1}] - E_{N_\ell}[Y_\ell - Y_{\ell-1}]\|_B^2]$$

and Lemma 1 yields the claim. □

This lemma enables us to choose for a given order of weak convergence of


(Y ,  ∈ N0 ) and for given convergence rates of the variances of (Y − Y−1 ,  ∈ N)
the number of samples N on each level  ∈ N0 such that all terms in the error
estimate are equilibrated.
The following theorem is essentially Theorem 2.3 in [3]. While it was previ-
ously formulated for a sequence of discretizations obtained by regular subdivision,
i.e., $h_\ell = C 2^{-\alpha\ell}$, it is written down for general sequences of discretizations here
with improved sample sizes. For completeness we include the proof. We should also
remark that the convergence with basis 2 by regular subdivision in [3] is useful and
important for SPDEs since most available approximation schemes that can be imple-
mented are obtained in that way. Nevertheless, it is also known that the refinement
with respect to basis 2 is not optimal for multilevel Monte Carlo approximations.
Therefore it makes sense to reformulate the theorem in this more general way.

Theorem 1 Let $(a_\ell, \ell \in \mathbb{N}_0)$ be a decreasing sequence of positive real numbers
that converges to zero and let $(Y_\ell, \ell \in \mathbb{N}_0)$ converge weakly to $Y$, i.e., there exists a
constant $C_1$ such that
$$\|\mathbb{E}[Y - Y_\ell]\|_B \le C_1 a_\ell$$
for $\ell \in \mathbb{N}_0$. Furthermore assume that the variance of $(Y_\ell - Y_{\ell-1}, \ell \in \mathbb{N})$ converges
with order $2\eta \in [0, 2]$ with respect to $(a_\ell, \ell \in \mathbb{N}_0)$, i.e., there exists a constant $C_2$
such that
$$\operatorname{Var}[Y_\ell - Y_{\ell-1}] \le C_2 a_\ell^{2\eta},$$
and that $\operatorname{Var}[Y_0] = C_3$. For a chosen level $L \in \mathbb{N}_0$, set $N_\ell := \lceil a_L^{-2} a_\ell^{2\eta} \ell^{1+\varepsilon} \rceil$,
$\ell = 1, \ldots, L$, $\varepsilon > 0$, and $N_0 := \lceil a_L^{-2} \rceil$; then the error of the multilevel Monte
Carlo approximation is bounded by
$$\|\mathbb{E}[Y] - E^L[Y_L]\|_{L^2(\Omega;B)} \le (C_1 + (C_3 + C_2\, \zeta(1 + \varepsilon))^{1/2})\, a_L,$$



where ζ denotes the Riemann zeta function, i.e., E[Y ] − E L [YL ]L2 (Ω;B) has the
same order of convergence as E[Y − YL ]B .
Assume further that the work WB of one calculation of Y − Y−1 ,  ≥ 1, is
bounded by C4 a−κ for a constant C4 and κ > 0, that the work to calculate Y0 is
bounded by a constant C5 , and that the addition of the Monte Carlo estimators costs
C6 aL−δ for some δ ≥ 0 and some constant C6 . Then the overall work WL is bounded
by
  L
−(κ−2η) 1+ε 
WL  aL−2 C5 + C4 a  + C6 aL−δ .
=1

If furthermore (a ,  ∈ N0 ) decreases polynomially, i.e., there exists a > 1 such that
a = O(a− ), then the bound on the computational work simplifies to

− max{2,δ}
O(aL ) if κ < 2η,
WL = −(2+κ−2η) 2+ε −δ
O(max{aL L , aL }) if κ ≥ 2η.

Proof First, we calculate the error of the multilevel Monte Carlo estimator. It holds
with the made assumptions that

$$N_0^{-1} \operatorname{Var}[Y_0] \le C_3 a_L^2$$

and, for $\ell = 1, \ldots, L$, that

$$N_\ell^{-1} \operatorname{Var}[Y_\ell - Y_{\ell-1}] \le C_2 a_L^2\, a_\ell^{-2\eta}\, \ell^{-(1+\varepsilon)}\, a_\ell^{2\eta} = C_2 a_L^2\, \ell^{-(1+\varepsilon)}.$$

So overall we get that

$$\sum_{\ell=1}^{L} N_\ell^{-1} \operatorname{Var}[Y_\ell - Y_{\ell-1}] \le C_2 a_L^2 \sum_{\ell=1}^{L} \ell^{-(1+\varepsilon)} \le C_2 a_L^2\, \zeta(1 + \varepsilon),$$

where ζ denotes the Riemann zeta function. To finish the calculation of the error we
apply Lemma 2 and assemble all estimates to

$$\|\mathbb{E}[Y] - E^L[Y_L]\|_{L^2(\Omega;B)} \le (C_1 + (C_3 + C_2\, \zeta(1 + \varepsilon))^{1/2})\, a_L.$$

Next we calculate the necessary work to achieve this error. The overall work consists
of the work $W_\ell^B$ to compute $Y_\ell - Y_{\ell-1}$ times the number of samples $N_\ell$ on all levels
$\ell = 1, \ldots, L$, the work $W_0^B$ on level 0, and the addition of the Monte Carlo estimators
in the end. Therefore, using the observation that $N_\ell \lesssim a_L^{-2} a_\ell^{2\eta} \ell^{1+\varepsilon}$, $\ell = 1, \ldots, L$,
and $N_0 \lesssim a_L^{-2}$, with equality if the right hand side is an integer, we obtain that

$$W_L \le C_5 N_0 + C_4 \sum_{\ell=1}^{L} N_\ell\, a_\ell^{-\kappa} + C_6 a_L^{-\delta}
\lesssim C_5 a_L^{-2} + C_4 a_L^{-2} \sum_{\ell=1}^{L} a_\ell^{2\eta}\, \ell^{1+\varepsilon}\, a_\ell^{-\kappa} + C_6 a_L^{-\delta}
\le a_L^{-2} \left( C_5 + C_4 \sum_{\ell=1}^{L} a_\ell^{-(\kappa - 2\eta)}\, \ell^{1+\varepsilon} \right) + C_6 a_L^{-\delta},$$

which proves the first claim of the theorem on the necessary work.
If $\kappa < 2\eta$ and additionally $(a_\ell, \ell \in \mathbb{N}_0)$ decreases polynomially, the sum on the
right hand side is absolutely convergent and therefore

$$W_L \lesssim (C_5 + C_4 C)\, a_L^{-2} + C_6 a_L^{-\delta} = O(a_L^{-\max\{2,\delta\}}).$$

For $\kappa \ge 2\eta$, it holds that

$$W_L \lesssim a_L^{-2} (C_5 + C_4\, a_L^{-(\kappa - 2\eta)} L^{2+\varepsilon}) + C_6 a_L^{-\delta}
= O(\max\{a_L^{-(2+\kappa-2\eta)} L^{2+\varepsilon},\ a_L^{-\delta}\}).$$

This finishes the proof of the theorem. 
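For reference, the level-dependent sample numbers prescribed by Theorem 1 are straightforward to compute; the helper below is an illustrative Python sketch under the assumptions of the theorem (the names are not from this note).

```python
import numpy as np

def mlmc_sample_sizes(a, eta, eps):
    """Sample numbers of Theorem 1: N_0 = ceil(a_L^{-2}) and, for l >= 1,
    N_l = ceil(a_L^{-2} * a_l^{2*eta} * l^{1+eps}), where a = (a_0, ..., a_L)
    is the decreasing sequence of weak error bounds."""
    a = np.asarray(a, dtype=float)
    aL2 = a[-1] ** -2
    N = [int(np.ceil(aL2))]
    N += [int(np.ceil(aL2 * a[l] ** (2 * eta) * l ** (1 + eps)))
          for l in range(1, a.size)]
    return N

# Example: a_l = 2^(-l) (regular subdivision) and weak order twice the strong order (eta = 1)
print(mlmc_sample_sizes([2.0 ** -l for l in range(6)], eta=1.0, eps=0.1))
```

A larger $\eta$, i.e., knowledge of a better weak rate, visibly shrinks the sample numbers on the higher levels; this is exactly the saving discussed in the introduction.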

We remark that the computation of the sum over different levels of the Monte
Carlo estimators does not increase the computational complexity if Y ∈ V for all
 ∈ N0 and (V ,  ∈ N0 ) is a sequence of nested finite dimensional subspaces of B.

3 Approximation of Stochastic Partial Differential


Equations

In this section we use the framework of [1] and recall the setting and the results
presented in that manuscript. We use the different orders of strong and weak conver-
gence of a Galerkin method for the approximation of a stochastic parabolic evolution
problem in Sect. 4 to show that it is essential for the efficiency of multilevel Monte
Carlo methods to consider also weak convergence rates and not only strong ones as
was presented in [6].
Let $(H, (\cdot,\cdot)_H)$ be a separable Hilbert space with induced norm $\|\cdot\|_H$ and $Q : H \to H$ be a self-adjoint positive semidefinite linear operator. We define the reproducing kernel Hilbert space $\mathcal{H} = Q^{1/2}(H)$ with inner product $(\cdot,\cdot)_{\mathcal{H}} = (Q^{-1/2}\cdot, Q^{-1/2}\cdot)_H$, where $Q^{-1/2}$ denotes the square root of the pseudo inverse of $Q$, which exists due to the made assumptions. Let us denote by $L_{HS}(\mathcal{H}; H)$ the space of all Hilbert–Schmidt operators from $\mathcal{H}$ to $H$, which will be abbreviated by $L_{HS}$ in what follows. Furthermore $L(H)$ is assumed to be the space of all bounded linear operators from

$H$ to $H$. Finally, let $(\Omega, \mathcal{A}, (\mathcal{F}_t)_{t\ge 0}, P)$ be a filtered probability space satisfying the “usual conditions” which extends the probability space already introduced in Sect. 2. The corresponding Bochner spaces are denoted by $L^p(\Omega; H)$, $p \ge 2$, with norms given by $\|\cdot\|_{L^p(\Omega;H)} = E[\|\cdot\|_H^p]^{1/p}$. In this framework we denote by $W = (W(t),\ t \ge 0)$ a $(\mathcal{F}_t)_{t\ge 0}$-adapted $Q$-Wiener process. Let us consider the stochastic
partial differential equation

dX(t) = (AX(t) + F(X(t))) dt + dW (t) (1)

as a Hilbert-space-valued stochastic differential equation on the finite time inter-


val (0, T ], T < +∞, with deterministic initial condition X(0) = X0 . We pose
the following assumptions on the parameters, which ensure the existence of a mild
solution and some properties of the solution which are necessary for the derivation
and convergence of approximation schemes.
Assumption 1 Assume that the parameters of (1) satisfy the following:
1. Let $A$ be a negative definite, linear operator on $H$ such that $(-A)^{-1} \in L(H)$ and $A$ is the generator of an analytic semigroup $(S(t),\ t \ge 0)$ on $H$.
2. The initial value $X_0$ is deterministic and satisfies $(-A)^{\beta} X_0 \in H$ for some $\beta \in [0, 1]$.
3. The covariance operator $Q$ satisfies $\|(-A)^{(\beta-1)/2}\|_{L_{HS}} < +\infty$ for the same $\beta$ as above.
4. The drift $F : H \to H$ is twice differentiable in the sense that $F \in C_b^1(H; H) \cap C_b^2(H; \dot{H}^{-1})$, where $\dot{H}^{-1}$ denotes the dual space of the domain of $(-A)^{1/2}$.
Under Assumption 1, the SPDE (1) has a continuous mild solution

$$X(t) = S(t)X_0 + \int_0^t S(t-s)F(X(s))\,ds + \int_0^t S(t-s)\,dW(s) \qquad (2)$$

for $t \in [0, T]$, which is in $L^p(\Omega; H)$ for all $p \ge 2$ and satisfies for some constant $C$ that

$$\sup_{t\in[0,T]} \|X(t)\|_{L^p(\Omega;H)} \le C\bigl(1 + \|X_0\|_H\bigr).$$

We approximate the mild solution by a Galerkin method in space and a semi-implicit Euler–Maruyama scheme in time, which is made precise in what follows and spares us the treatment of stability issues. Therefore let $(V_\ell,\ \ell \in \mathbb{N}_0)$ be a nested family of finite dimensional subspaces of $V$ with refinement level $\ell \in \mathbb{N}_0$, refinement sizes $(h_\ell,\ \ell \in \mathbb{N}_0)$, associated $H$-orthogonal projections $P_\ell$, and norm induced by $H$. The sequence $(V_\ell,\ \ell \in \mathbb{N}_0)$ is supposed to be dense in $H$ in the sense that for all $\varphi \in H$, it holds that

$$\lim_{\ell \to +\infty} \|\varphi - P_\ell \varphi\|_H = 0.$$

We denote the approximate operator by $A_\ell : V_\ell \to V_\ell$ and specify the necessary properties in Assumption 2 below. Furthermore let $(\Theta_n,\ n \in \mathbb{N}_0)$ be a sequence of

equidistant time discretizations with step sizes $\Delta t_n$, i.e., for $n \in \mathbb{N}_0$,

$$\Theta_n := \{t_k^n = \Delta t_n\, k,\ k = 0, \dots, N(n)\},$$

where $N(n) = T/\Delta t_n$, which we assume to be an integer for simplicity reasons. We define the fully discrete semigroup approximation by $S_{\ell,n} := (I - \Delta t_n A_\ell)^{-1} P_\ell$ and assume the following:

Assumption 2 The linear operators $A_\ell : V_\ell \to V_\ell$, $\ell \in \mathbb{N}_0$, and the orthogonal projectors $P_\ell : H \to V_\ell$, $\ell \in \mathbb{N}_0$, satisfy for all $k = 1, \dots, N(n)$ that

$$\|(-A_\ell)^{\rho}\, S_{\ell,n}^k\|_{L(H)} \le C\, (t_k^n)^{-\rho}$$

for $\rho \ge 0$ and

$$\|(-A_\ell)^{-\rho} P_\ell (-A)^{\rho}\|_{L(H)} \le C$$

for $\rho \in [0, 1/2]$ uniformly in $\ell, n \in \mathbb{N}_0$. Furthermore they satisfy for all $\theta \in [0, 2]$, $\rho \in [-\theta, \min\{1, 2-\theta\}]$, and $k = 1, \dots, N(n)$,

$$\|(S(t_k^n) - S_{\ell,n}^k)(-A)^{\rho/2}\|_{L(H)} \le C\bigl(h_\ell^{\theta} + (\Delta t_n)^{\theta/2}\bigr)(t_k^n)^{-(\theta+\rho)/2}.$$

The fully discrete semi-implicit Euler–Maruyama approximation is then given in recursive form for $t_k^n = \Delta t_n\, k \in \Theta_n$ and for $\ell \in \mathbb{N}_0$ by

$$X_{\ell,n}(t_k^n) := S_{\ell,n} X_{\ell,n}(t_{k-1}^n) + S_{\ell,n} F(X_{\ell,n}(t_{k-1}^n))\, \Delta t_n + S_{\ell,n}\bigl(W(t_k^n) - W(t_{k-1}^n)\bigr)$$

with $X_{\ell,n}(0) := P_\ell X_0$, which may be rewritten as

$$X_{\ell,n}(t_k^n) = S_{\ell,n}^k X_0 + \Delta t_n \sum_{j=1}^{k} S_{\ell,n}^{k-j+1} F(X_{\ell,n}(t_{j-1}^n)) + \sum_{j=1}^{k} \int_{t_{j-1}^n}^{t_j^n} S_{\ell,n}^{k-j+1}\, dW(s). \qquad (3)$$

We remark here that we do not approximate the noise which might cause problems in
implementations. One way to treat this problem is to truncate the Karhunen–Loève
expansion of the Q-Wiener process depending on the decay of the spectrum of Q
(see [2, 5]).
The theory on strong convergence of the introduced approximation scheme has been developed for some time and the convergence rates are well known; they are stated in the following theorem.
Theorem 2 (Strong convergence [1]) Let the stochastic evolution equation (1) with mild solution $X$ and the sequence of its approximations $(X_{\ell,n},\ \ell, n \in \mathbb{N}_0)$ given by (3) satisfy Assumptions 1 and 2 for some $\beta \in (0, 1]$. Then, for every $\gamma \in (0, \beta)$, there exists a constant $C > 0$ such that for all $\ell, n \in \mathbb{N}_0$,

$$\max_{k=1,\dots,N(n)} \|X(t_k^n) - X_{\ell,n}(t_k^n)\|_{L^2(\Omega;H)} \le C\bigl(h_\ell^{\gamma} + (\Delta t_n)^{\gamma/2}\bigr).$$

It should be remarked at this point that the order of strong convergence does not exceed 1/2 although we are considering additive noise, since the regularity of the parameters of the SPDE is assumed to be rough. Under smoothness assumptions the rate of strong convergence attains one for additive noise, since the higher order Milstein scheme is equal to the Euler–Maruyama scheme. Nevertheless, under the made assumptions on the regularity of the initial condition $X_0$ and the covariance operator $Q$ of the noise, this does not happen in the considered case.
The purpose of the multilevel Monte Carlo method is to approximate expressions of the form $E[\varphi(X(t))]$ efficiently, where $\varphi : H \to \mathbb{R}$ is a sufficiently smooth functional. Therefore weak error estimates of the form $|E[\varphi(X(t_k^n))] - E[\varphi(X_{\ell,n}(t_k^n))]|$ are of importance. Before we state the convergence theorem from [1], we specify the necessary properties of $\varphi$ in the following assumption.
Assumption 3 The functional $\varphi : H \to \mathbb{R}$ is twice continuously Fréchet differentiable and there exists an integer $m \ge 2$ and a constant $C$ such that for all $x \in H$ and $j = 1, 2$,

$$\|\varphi^{(j)}(x)\|_{L^{[m]}(H;\mathbb{R})} \le C\bigl(1 + \|x\|_H^{m-j}\bigr),$$

where $\|\varphi^{(j)}(x)\|_{L^{[m]}(H;\mathbb{R})}$ is the smallest constant $K > 0$ such that for all $u_1, \dots, u_m \in H$,

$$|\varphi^{(j)}(x)(u_1, \dots, u_m)| \le K\, \|u_1\|_H \cdots \|u_m\|_H.$$

Combining this assumption on the functional ϕ with Assumptions 1 and 2 on the


parameters and approximation of the SPDE, we obtain the following result, which
was proven in [1] using Malliavin calculus.

Theorem 3 (Weak convergence [1]) Let the stochastic evolution equation (1) with mild solution $X$ and the sequence of its approximations $(X_{\ell,n},\ \ell, n \in \mathbb{N}_0)$ given by (3) satisfy Assumptions 1 and 2 for some $\beta \in (0, 1]$. Then, for every $\varphi : H \to \mathbb{R}$ satisfying Assumption 3 and all $\gamma \in [0, \beta)$, there exists a constant $C > 0$ such that for all $\ell, n \in \mathbb{N}_0$,

$$\max_{k=1,\dots,N(n)} |E[\varphi(X(t_k^n)) - \varphi(X_{\ell,n}(t_k^n))]| \le C\bigl(h_\ell^{2\gamma} + (\Delta t_n)^{\gamma}\bigr).$$

An example that satisfies Assumptions 1 and 2 is presented in Sect. 5 of [1] and


consists of a (general) heat equation on a bounded, convex, and polygonal domain
which is approximated with a finite element method using continuous piecewise
linear functions.

4 SPDE Multilevel Monte Carlo Approximation

In the previous section, we considered weak error analysis for expressions of the
form E[ϕ(X(t))], where we approximated the mild solution X of the SPDE (1) with
a fully discrete scheme. Unfortunately, this is not yet sufficient to compute “numbers”

since we are in general not able to compute the expectation exactly. Going back to
Sect. 2, we recall that the first approach to approximate the expected value is to do a
(singlelevel) Monte Carlo approximation. This leads to the overall error given in the
following corollary, which is proven similarly to [3, Corollary 3.6] and included for
completeness.

Corollary 1 Let the stochastic evolution equation (1) with mild solution $X$ and the sequence of its approximations $(X_{\ell,n},\ \ell, n \in \mathbb{N}_0)$ given by (3) satisfy Assumptions 1 and 2 for some $\beta \in (0, 1]$. Then, for every $\varphi : H \to \mathbb{R}$ satisfying Assumption 3 and all $\gamma \in [0, \beta)$, there exists a constant $C > 0$ such that for all $\ell, n \in \mathbb{N}_0$, the error of the Monte Carlo approximation is bounded by

$$\max_{k=1,\dots,N(n)} \|E[\varphi(X(t_k^n))] - E_N[\varphi(X_{\ell,n}(t_k^n))]\|_{L^2(\Omega;\mathbb{R})} \le C\Bigl(h_\ell^{2\gamma} + (\Delta t_n)^{\gamma} + \frac{1}{\sqrt{N}}\Bigr)$$

for $N \in \mathbb{N}$.

Proof By the triangle inequality we obtain that

$$\begin{aligned} \|E[\varphi(X(t_k^n))] - E_N[\varphi(X_{\ell,n}(t_k^n))]\|_{L^2(\Omega;\mathbb{R})} &\le \|E[\varphi(X(t_k^n))] - E[\varphi(X_{\ell,n}(t_k^n))]\|_{L^2(\Omega;\mathbb{R})}\\ &\quad + \|E[\varphi(X_{\ell,n}(t_k^n))] - E_N[\varphi(X_{\ell,n}(t_k^n))]\|_{L^2(\Omega;\mathbb{R})}. \end{aligned}$$

The first term is bounded by the weak error in Theorem 3 while the second one is the Monte Carlo error in Lemma 1. Putting these two estimates together yields the claim. □

The errors are all converging with the same speed if we couple $\ell$ and $n$ such that $h_\ell^2 \simeq \Delta t_n$ as well as the number of Monte Carlo samples $N_\ell$ for $\ell \in \mathbb{N}_0$ by $N_\ell \simeq h_\ell^{-4\gamma}$. This implies for the overall work that

$$W = W_H \cdot W_T \cdot W_{MC} = O\bigl(h_\ell^{-d}\, (\Delta t_n)^{-1}\, N_\ell\bigr) = O\bigl(h_\ell^{-(d+2+4\gamma)}\bigr),$$

where we assumed that the computational work in space is bounded by $W_H = O(h_\ell^{-d})$ for some $d \ge 0$, which refers usually to the dimension of the underlying spatial domain.
Since we have just seen that a (singlelevel) Monte Carlo simulation is rather
expensive, the idea is to use a multilevel Monte Carlo approach instead which is
obtained by the combination of the results of the previous two sections. In what
follows we show that it is essential for the computational costs that weak convergence
results are available, since the number of samples that should be chosen according
to the theory depends heavily on this fact, if weak and strong convergence rates do
not coincide.
Let us start under the assumption that Theorem 3 (weak convergence rates) is not
available. This leads to the following numbers of samples and computational work.

Corollary 2 (Strong convergence) Let the stochastic evolution equation (1) with mild solution $X$ and the sequence of its approximations $(X_{\ell,n},\ \ell, n \in \mathbb{N}_0)$ given by (3) satisfy Assumptions 1 and 2 for some $\beta \in (0, 1]$. Furthermore couple $\ell$ and $n$ such that $\Delta t_n \simeq h_\ell^2$ and for $L \in \mathbb{N}_0$, set $N_0 \simeq h_L^{-2\gamma}$ as well as $N_\ell \simeq h_L^{-2\gamma}\, h_\ell^{2\gamma}\, \ell^{1+\varepsilon}$ for all $\ell = 1, \dots, L$ and arbitrary fixed $\varepsilon > 0$. Then, for every $\varphi : H \to \mathbb{R}$ satisfying Assumption 3 and all $\gamma \in [0, \beta)$, there exists a constant $C > 0$ such that for all $\ell, n \in \mathbb{N}_0$, the error of the multilevel Monte Carlo approximation is bounded by

$$\max_{k=1,\dots,N(n_L)} \|E[\varphi(X(t_k^{n_L}))] - E^L[\varphi(X_{L,n_L}(t_k^{n_L}))]\|_{L^2(\Omega;\mathbb{R})} \le C\, h_L^{\gamma},$$

where $n_L$ is chosen according to the coupling with $L$. If the work of one computation in space is bounded by $W_H = O(h_\ell^{-d})$ for $\ell = 0, \dots, L$ and fixed $d \ge 0$, which includes the summation of different levels, the overall work will be bounded by

$$W_L = O\bigl(h_L^{-(d+2)} L^{2+\varepsilon}\bigr).$$

Proof We first observe that

$$\max_{k=1,\dots,N(n_L)} \|X(t_k^{n_L}) - X_{L,n_L}(t_k^{n_L})\|_{L^2(\Omega;H)} \le C\bigl(h_L^{\gamma} + (\Delta t_{n_L})^{\gamma/2}\bigr) \simeq C \cdot 2 \cdot h_L^{\gamma}$$
by Theorem 2 and the coupling of the space and time discretizations. Furthermore it
holds that

$$\begin{aligned} \max_{k=1,\dots,N(n_L)} |E[\varphi(X(t_k^{n_L}))] - E[\varphi(X_{L,n_L}(t_k^{n_L}))]| &\le \max_{k=1,\dots,N(n_L)} \|\varphi(X(t_k^{n_L})) - \varphi(X_{L,n_L}(t_k^{n_L}))\|_{L^2(\Omega;\mathbb{R})}\\ &\le C \max_{k=1,\dots,N(n_L)} \|X(t_k^{n_L}) - X_{L,n_L}(t_k^{n_L})\|_{L^2(\Omega;H)}\\ &\le C\, h_L^{\gamma}, \end{aligned}$$

since $\varphi$ is assumed to be a Lipschitz functional (cf. [5, Proposition 3.4]). Furthermore Lemma 2 implies that

$$\operatorname{Var}[\varphi(X_{\ell,n}(t)) - \varphi(X_{\ell-1,n-1}(t))] \le 2\bigl(\|\varphi(X(t)) - \varphi(X_{\ell,n}(t))\|_{L^2(\Omega;\mathbb{R})}^2 + \|\varphi(X(t)) - \varphi(X_{\ell-1,n-1}(t))\|_{L^2(\Omega;\mathbb{R})}^2\bigr) \le C\, h_\ell^{2\gamma}.$$

Setting $a_\ell = h_\ell^{\gamma}$, $\eta = 1$, and the sample numbers according to Theorem 1, we obtain the claim. □
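For the reader's convenience, the following short computation (added here; it only combines Theorem 1 with the choices just made and the cost assumption of the corollary) shows how the work bound of Corollary 2 follows. With $a_\ell = h_\ell^{\gamma}$ and $\eta = 1$, a cost of $O(h_\ell^{-(d+2)})$ per sample at level $\ell$ corresponds to

$$C_4\, a_\ell^{-\kappa} = O\bigl(h_\ell^{-(d+2)}\bigr) \quad\Longleftrightarrow\quad \kappa = \frac{d+2}{\gamma} \ge 2 = 2\eta \quad (\text{since } \gamma \le 1),$$

so that Theorem 1 yields

$$W_L = O\bigl(a_L^{-(2+\kappa-2\eta)} L^{2+\varepsilon}\bigr) = O\bigl((h_L^{\gamma})^{-(d+2)/\gamma} L^{2+\varepsilon}\bigr) = O\bigl(h_L^{-(d+2)} L^{2+\varepsilon}\bigr),$$

which is the bound stated in Corollary 2.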

If the additional information of better weak convergence rates from Theorem 3 is available, the parameters that are plugged into Theorem 1 change, which for a given accuracy leads to fewer samples and therefore to less computational work. This

is made precise in the following corollary and the computations for given accuracy
afterwards.
Corollary 3 (Weak convergence) Let the stochastic evolution equation (1) with mild solution $X$ and the sequence of its approximations $(X_{\ell,n},\ \ell, n \in \mathbb{N}_0)$ given by (3) satisfy Assumptions 1 and 2 for some $\beta \in (0, 1]$. Furthermore couple $\ell$ and $n$ such that $\Delta t_n \simeq h_\ell^2$ and for $L \in \mathbb{N}_0$, set $N_0 \simeq h_L^{-4\gamma}$ as well as $N_\ell \simeq h_L^{-4\gamma}\, h_\ell^{2\gamma}\, \ell^{1+\varepsilon}$ for all $\ell = 1, \dots, L$ and arbitrary fixed $\varepsilon > 0$. Then, for every $\varphi : H \to \mathbb{R}$ satisfying Assumption 3 and all $\gamma \in [0, \beta)$, there exists a constant $C > 0$ such that for all $\ell, n \in \mathbb{N}_0$, the error of the multilevel Monte Carlo approximation is bounded by

$$\max_{k=1,\dots,N(n_L)} \|E[\varphi(X(t_k^{n_L}))] - E^L[\varphi(X_{L,n_L}(t_k^{n_L}))]\|_{L^2(\Omega;\mathbb{R})} \le C\, h_L^{2\gamma},$$

where $n_L$ is chosen according to the coupling with $L$. If the work of one computation in space is bounded by $W_H = O(h_\ell^{-d})$ for $\ell = 0, \dots, L$ and fixed $d \ge 0$, which includes the summation of different levels, the overall work will be bounded by

$$W_L = O\bigl(h_L^{-(d+2+2\gamma)} L^{2+\varepsilon}\bigr).$$

Proof The proof is the same as for Corollary 2 except that we obtain

$$\max_{k=1,\dots,N(n_L)} |E[\varphi(X(t_k^{n_L}))] - E[\varphi(X_{L,n_L}(t_k^{n_L}))]| \le C\, h_L^{2\gamma}$$

directly from Theorem 3 and therefore set $a_\ell = h_\ell^{2\gamma}$, $\eta = 1/2$, and the sample numbers according to these choices in Theorem 1. □
If we take regular subdivisions of the grids, i.e., we set, up to a constant, $h_\ell := 2^{-\ell}$ for $\ell \in \mathbb{N}_0$ and rescale both corollaries such that the convergence rates are the same, i.e., the errors are bounded by $O(h_L^{2\gamma})$, we obtain that for a given accuracy $\varepsilon_L$ on level $L \in \mathbb{N}$, Corollary 2 leads to computational work

$$W_L = O\Bigl(2^{2+\varepsilon}\, \varepsilon_L^{-(d+2)/\gamma} \Bigl(\frac{|\log_2 \varepsilon_L|}{2\gamma}\Bigr)^{2+\varepsilon}\Bigr)$$

while the estimators in Corollary 3 can be computed in

$$W_L = O\Bigl(\varepsilon_L^{-((d+2)/(2\gamma)+1)} \Bigl(\frac{|\log_2 \varepsilon_L|}{2\gamma}\Bigr)^{2+\varepsilon}\Bigr).$$
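To see where these two expressions come from (this short derivation is an addition for the reader and follows directly from the two corollaries and $h_\ell = 2^{-\ell}$), note that the accuracy on level $L$ is $\varepsilon_L := h_L^{2\gamma} = 2^{-2\gamma L}$, so that $L = |\log_2 \varepsilon_L|/(2\gamma)$ and $h_L = \varepsilon_L^{1/(2\gamma)}$. Corollary 3 then gives

$$W_L = O\bigl(h_L^{-(d+2+2\gamma)} L^{2+\varepsilon}\bigr) = O\Bigl(\varepsilon_L^{-((d+2)/(2\gamma)+1)} \Bigl(\frac{|\log_2 \varepsilon_L|}{2\gamma}\Bigr)^{2+\varepsilon}\Bigr),$$

while Corollary 2, whose error is only of order $h^{\gamma}$, has to be run on level $L' = 2L$ to reach the same accuracy $\varepsilon_L$, so that

$$W_{L'} = O\bigl(h_{L'}^{-(d+2)} (L')^{2+\varepsilon}\bigr) = O\Bigl(2^{2+\varepsilon}\, \varepsilon_L^{-(d+2)/\gamma} \Bigl(\frac{|\log_2 \varepsilon_L|}{2\gamma}\Bigr)^{2+\varepsilon}\Bigr).$$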

Therefore the availability of weak convergence rates implies a reduction of the computational complexity of the multilevel Monte Carlo estimator which depends on the regularity $\gamma$ and on $d$ referring to the dimension of the problem in space. For large $d$, the work using strong convergence rates is essentially the squared work that is needed with the knowledge of weak rates. Additionally, for all $d \ge 0$, the rates are better and especially in dimension $d = 1$ we obtain $\varepsilon_L^{-(3/(2\gamma)+1)}$ for the weak rates versus $\varepsilon_L^{-3/\gamma}$,

Table 1 Computational work of different Monte Carlo type approximations for a given precision $\varepsilon_L$

  General:
    Monte Carlo:            $\varepsilon_L^{-((d+2)/(2\gamma)+2)}$
    MLMC with strong conv.: $2^{2+\varepsilon}\,\varepsilon_L^{-(d+2)/\gamma}\,(|\log_2 \varepsilon_L|/(2\gamma))^{2+\varepsilon}$
    MLMC with weak conv.:   $\varepsilon_L^{-((d+2)/(2\gamma)+1)}\,(|\log_2 \varepsilon_L|/(2\gamma))^{2+\varepsilon}$
  $\gamma = 1$, omitting constants:
    Monte Carlo:            $\varepsilon_L^{-(d/2+3)}$
    MLMC with strong conv.: $\varepsilon_L^{-(d+2)}\,|\log_2 \varepsilon_L|^{2+\varepsilon}$
    MLMC with weak conv.:   $\varepsilon_L^{-(d/2+2)}\,|\log_2 \varepsilon_L|^{2+\varepsilon}$

where γ ∈ (0, 1). Nevertheless, one should also mention that Corollary 2 already
reduces the work for 4γ > d + 2 compared to a (singlelevel) Monte Carlo approxi-
mation according to weak convergence rates. The results are put together in Table 1
for a quick overview.

5 Simulation

In this section simulation results of the theory of Sect. 4 are shown, where it has
to be admitted that the chosen example fits better the framework of [6] since we
estimate the expectation of the solution instead of the expectation of a functional
of the solution. Simulations that fit the conditions of Sect. 4 are under investigation.
Here we simulate similarly to [4] and [5] the heat equation driven by additive Wiener
noise
dX(t) = ΔX(t) dt + dW (t)

on the space interval (0, 1) and the time interval [0, 1] with initial condition X(0, x) =
sin(π x) for x ∈ (0, 1). In contrast to previous simulations, the noise is assumed to be
white in space to reduce the strong convergence rate of the scheme to (essentially) 1/2.
The solution to the corresponding deterministic system with $u(t) = E[X(t)]$ for $t \in [0, 1]$,

$$du(t) = \Delta u(t)\, dt,$$

is in this case $u(t, x) = \exp(-\pi^2 t)\sin(\pi x)$ for $x \in (0, 1)$ and $t \in [0, 1]$.
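To fix ideas, a single path of this SPDE can be generated with the semi-implicit time stepping of Sect. 3. The following sketch is an illustration added here (it is not the code used for the experiments below): it replaces the hat-function finite element discretization described next by a simple finite-difference Laplacian, and it uses the usual $\sqrt{\Delta t/h}$ scaling for the nodal increments of the space–time white noise.

```cpp
#include <cmath>
#include <random>
#include <vector>

// One sample path of dX = Laplace(X) dt + dW on (0,1), zero boundary values,
// initial value sin(pi x), semi-implicit Euler: (I - dt*A) X^{n+1} = X^n + dW^n.
std::vector<double> simulate_heat_path(int m, double T, double dt,
                                       std::mt19937_64& gen) {
  const double pi = std::acos(-1.0);
  const double h = 1.0 / (m + 1);                        // m interior nodes
  std::normal_distribution<double> normal(0.0, 1.0);
  std::vector<double> x(m);
  for (int i = 0; i < m; ++i) x[i] = std::sin(pi * (i + 1) * h);

  // Tridiagonal matrix I - dt*A (A = discrete Laplacian): constant diagonal
  // and off-diagonal entries, solved at every step by the Thomas algorithm.
  const double diag = 1.0 + 2.0 * dt / (h * h), off = -dt / (h * h);
  const int nsteps = static_cast<int>(std::round(T / dt));
  std::vector<double> c(m), rhs(m);
  for (int n = 0; n < nsteps; ++n) {
    for (int i = 0; i < m; ++i)
      rhs[i] = x[i] + std::sqrt(dt / h) * normal(gen);   // white-noise increment
    c[0] = off / diag;
    rhs[0] /= diag;
    for (int i = 1; i < m; ++i) {                        // forward elimination
      const double denom = diag - off * c[i - 1];
      c[i] = off / denom;
      rhs[i] = (rhs[i] - off * rhs[i - 1]) / denom;
    }
    x[m - 1] = rhs[m - 1];                               // back substitution
    for (int i = m - 2; i >= 0; --i) x[i] = rhs[i] - c[i] * x[i + 1];
  }
  return x;                                              // nodal values of X(T)
}
```

A multilevel estimator of $E[X(1)]$ is then obtained by averaging such paths on each level, with the level corrections coupled through common noise increments, and summing over the levels.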
The space discretization is done with a finite element method and the hat function basis, i.e., with the spaces $(S_h,\ h > 0)$ of piecewise linear, continuous polynomials (see, e.g., [6, Example 3.1]). The numbers of multilevel Monte Carlo samples are calculated according to Corollaries 2 and 3 with $\varepsilon = 1$ to compare the convergence and complexity properties with and without the availability of weak convergence rates. In the left graph in Fig. 1, the multilevel Monte Carlo estimator $E^L[X_{L,2^L}(1)]$ was calculated for $L = 1, \dots, 5$ for available weak convergence rates as in Corollary 3 while just for $L = 1, \dots, 4$ in the other case to finish the simulations in a reasonable time on an ordinary laptop. The plot shows the approximation of


$$\|E[X(1)] - E^L[X_{L,2^L}(1)]\|_H = \Bigl(\int_0^1 \bigl(\exp(-\pi^2)\sin(\pi x) - E^L[X_{L,2^L}(1, x)]\bigr)^2\, dx\Bigr)^{1/2},$$

i.e.,

$$e_1(X_{L,2^L}) := \Bigl(\frac{1}{m}\sum_{k=1}^{m} \bigl(\exp(-\pi^2)\sin(\pi x_k) - E^L[X_{L,2^L}(1, x_k)]\bigr)^2\Bigr)^{1/2}.$$

Here, for all levels $L = 1, \dots, 5$, $m = 2^5 + 1$ and $x_k$, $k = 1, \dots, m$, are the nodal points of the finest discretization, i.e., on level 5 respectively 4. The multilevel Monte Carlo estimator $E^L[X_{L,2^L}]$ is calculated at these points by its basis representation for $L = 1, \dots, 4$, which is equal to the linear interpolation to all grid points $x_k$, $k = 1, \dots, m$. One observes the convergence of one multilevel Monte Carlo estimator, i.e., the almost sure convergence of the method, which can be shown using the mean square convergence and the Borel–Cantelli lemma. In the graph on the right hand side of Fig. 1, the error is estimated by

$$e_N(X_{L,2^L}) := \Bigl(\frac{1}{N}\sum_{i=1}^{N} e_1\bigl(X_{L,2^L}^{i}\bigr)^2\Bigr)^{1/2},$$

where $(X_{L,2^L}^{i},\ i = 1, \dots, N)$ is a sequence of independent, identically distributed samples of $X_{L,2^L}$ and $N = 10$. The simulation results confirm the theory. In Fig. 2 the
computational costs per level of the simulations on a laptop using matlab are shown
for both frameworks. It is obvious that the computations using weak convergence
rates are substantially faster. One observes especially that the computations with
weak rates on level 5 take less time than the ones with strong rates on level 4. The
computing times match the bounds of the computational work that were obtained in
Corollaries 3 and 2.

[Fig. 1 shows two log–log plots of the $L^2$ error versus the number of grid points on the finest level: (a) the error of one MLMC run and (b) the error of 10 MLMC runs, each with curves for the sample choices “strong” and “weak” with $\varepsilon = 0$ and $\varepsilon = 1$, together with a reference line $O(h^2)$.]

Fig. 1 Mean square error of the multilevel Monte Carlo estimator with samples chosen according
to Corollaries 2 and 3

[Fig. 2 shows a log–log plot of the computational costs in seconds versus the number of grid points on the finest level for the sample choices “strong” and “weak” with $\varepsilon = 0$ and $\varepsilon = 1$, together with reference lines $O(h^6)$ and $O(h^5)$.]

Fig. 2 Computational work of the multilevel Monte Carlo estimator with samples chosen according
to Corollaries 2 and 3

Finally, Figs. 1 and 2 include besides ε = 1 also simulation results for the border
case ε = 0 in the choices of sample sizes per level. One observes in the left graph in
Fig. 1 that the variance of the errors for ε = 0 in combination with Corollary 2 is high,
which is visible in the nonalignment of the single simulation results. Furthermore
the combination of Figs. 1 and 2 shows that ε = 0 combined with Corollary 3 and
ε = 1 with Corollary 2 lead to similar errors, but that the first choice of sample sizes
is essentially less expensive in terms of computational complexity. Therefore the border case $\varepsilon = 0$, which is not included in the theory, might be worth considering in practice.

Acknowledgments This research was supported in part by the Knut and Alice Wallenberg foun-
dation as well as the Swedish Research Council under Reg. No. 621-2014-3995. The author thanks
Lukas Herrmann, Andreas Petersson, and two anonymous referees for helpful comments.

References

1. Andersson, A., Kruse, R., Larsson, S.: Duality in refined Sobolev-Malliavin spaces and weak
approximations of SPDE. Stoch. PDE: Anal. Comp. 4(1), 113–149 (2016). doi:10.1007/
s40072-015-0065-7
2. Barth, A., Lang, A.: Milstein approximation for advection-diffusion equations driven by mul-
tiplicative noncontinuous martingale noises. Appl. Math. Opt. 66(3), 387–413 (2012). doi:10.
1007/s00245-012-9176-y
A Note on the Importance of Weak Convergence Rates … 505

3. Barth, A., Lang, A.: Multilevel Monte Carlo method with applications to stochastic partial
differential equations. Int. J. Comp. Math. 89(18), 2479–2498 (2012). doi:10.1080/00207160.
2012.701735
4. Barth, A., Lang, A.: Simulation of stochastic partial differential equations using finite element
methods. Stochastics 84(2–3), 217–231 (2012). doi:10.1080/17442508.2010.523466
5. Barth, A., Lang, A.: L p and almost sure convergence of a Milstein scheme for stochastic
partial differential equations. Stoch. Process. Appl. 123(5), 1563–1587 (2013). doi:10.1016/j.
spa.2013.01.003
6. Barth, A., Lang, A., Schwab, Ch.: Multilevel Monte Carlo method for parabolic stochastic
partial differential equations. BIT Num. Math. 53(1), 3–27 (2013). doi:10.1007/s10543-012-
0401-5
7. Da Prato, G., Zabczyk, J.: Stochastic Equations in Infinite Dimensions. In: Encyclopedia of
Mathematics and Its Applications. Cambridge University Press, Cambridge (1992). doi:10.
1017/CBO9780511666223
8. Giles, M.B.: Improved multilevel Monte Carlo convergence using the Milstein scheme. In:
Alexander, K., et al. (eds.) Monte Carlo and Quasi-Monte Carlo methods 2006. Selected Papers
Based on the presentations at the 7th International Conference ‘Monte Carlo and quasi-Monte
Carlo Methods in Scientific Computing’, Ulm, Germany, August 14–18, 2006, pp. 343–358.
Springer, Berlin (2008). doi:10.1007/978-3-540-74496-2_20
9. Giles, M.B.: Multilevel Monte Carlo path simulation. Oper. Res. 56(3), 607–617 (2008). doi:10.
1287/opre.1070.0496
10. Heinrich, S.: Multilevel Monte Carlo methods. In: Margenov, S., Wasniewski, J., Yalamov, P.Y.
(eds.) Large-Scale Scientific Computing, Third International Conference, LSSC 2001, Sozopol,
Bulgaria, June 6-10, 2001, Revised Papers. Lecture notes in computer science, pp. 58–67.
Springer, Heidelberg (2001). doi:10.1007/3-540-45346-6_5
11. Jentzen, A., Kurniawan, R.: Weak convergence rates for Euler-type approximations of semi-
linear stochastic evolution equations with nonlinear diffusion coefficients (2015)
A Strategy for Parallel Implementations
of Stochastic Lagrangian Simulation

Lionel Lenôtre

Abstract In this paper, we present some investigations on the parallelization of


stochastic Lagrangian simulations. The challenge is the proper management of the
random numbers. We review two different object-oriented strategies: to draw the ran-
dom numbers on the fly within each MPI’s process or to use a different random num-
ber generator for each simulated path. We show the benefits of the second technique
which is implemented in the PALMTREE library. The efficiency of PALMTREE is
demonstrated on two classical examples.

Keywords Parabolic partial differential equations · Stochastic differential


equations · Monte Carlo methods · Lagrangian methods · High performance
computing

1 Introduction

Monte Carlo simulation is a very convenient method to solve problems arising in


physics like the advection–diffusion equation with a Dirichlet boundary condition

$$\begin{cases} \dfrac{\partial}{\partial t} c(x,t) = \operatorname{div}\bigl(\sigma(x)\cdot\nabla c(x,t)\bigr) - v(x)\nabla c(x,t), & \forall (x,t) \in D \times [0,T],\\ c(x,0) = c_0(x), & \forall x \in D,\\ c(x,t) = 0, & \forall t \in [0,T] \text{ and } \forall x \in \partial D, \end{cases} \qquad (1)$$

where, for each $x \in D$, $\sigma(x)$ is a $d$-dimensional square matrix which is symmetric positive definite, $v(x)$ is a $d$-dimensional vector such that $\operatorname{div}(v(x)) = 0$, $D \subset \mathbb{R}^d$

positive, symmetric, v(x) is a d-dimensional vector such that div(v(x)) = 0, D ⊂ Rd
is a regular open bounded subset and T is a positive real number. In order to have

L. Lenôtre (B)
Inria, Research Centre Rennes - Bretagne Atlantique, Campus de Beaulieu,
35042 Rennes Cedex, France
e-mail: lionel.lenotre@inria.fr


a well-posed problem [4, 5] and to be able to use later the theory of stochastic differential equations, we require that $\sigma$ satisfies an ellipticity condition and has its coefficients at least in $C^2(D)$, and that $v$ is bounded and in $C^1(D)$.
Interesting computations involving the solution $c(t, x)$ are the moments

$$M_k(T) = \int_D x^k\, c(T, x)\, dx, \qquad \forall k \ge 1 \text{ such that } M_k(T) < +\infty.$$

One possibility for their computation is to perform a numerical integration of an


approximated solution of (1). Eulerian methods (like Finite Difference Method,
Finite Volume Method or Finite Element Method) are classical to obtain such an
approximated solution. However, for advection–diffusion problems, they can induce
numerical artifacts such as oscillations or artificial diffusion. This mainly occurs
when advection dominates [7].
An alternative is to use Monte Carlo simulation [6, 19] which is really simple.
Indeed, the theory of stochastic processes implies that there exists $X = (X_t)_{t\ge 0}$ whose law is linked to (1) and is such that

$$M_k(T) = E[X_T^k]. \qquad (2)$$

The above expectation is nothing more than an average of the positions at time
T of particles that move according to a scheme associated with the process X . This
requires a large number of these particles to be computed. For linear equations, the
particles do not interact with each other and move according to a Markovian process.
The great advantage of the Monte-Carlo method is that its rate of convergence is
not affected by the curse of dimensionality. Nevertheless, the slowness of the rate
caused by the Central-Limit theorem can be considered as a drawback. Precisely, the
computation of the moments requires a large amount of particles to achieve a reliable
approximation. Thus, the use of supercomputers and parallel architectures becomes a
key ingredient to obtain reasonable computational times. However, the main difficulty
when one deals with parallel architectures is to manage the random numbers such
that the particles are not correlated, otherwise a bias in the approximation of the
moments is obtained.
In this paper, we investigate the parallelization of the Monte Carlo method for
the computation of (2). We will consider two implementation’s strategies where the
total number of particles is divided into batches distributed over the Floating Point
Units (FPUs):
1. SAF: the Strategy of Attachment to the FPUs, where each FPU receives a Virtual Random Number Generator (VRNG) which is either one of several different independent Random Number Generators (RNGs) or a copy of the same RNG in a different state [10]. In this strategy, the random numbers are generated on demand and do not bear any attachment to the particles.
2. SAO: the Strategy of Attachment to the Object, where each particle carries its own Virtual Random Number Generator.

Both schemes clearly preserve the non-correlation of the particles, assuming that all the drawn random numbers are sufficiently independent, which is a matter of the RNGs.
Sometimes particles with a singular behavior are encountered and the examina-
tion of the full paths of such particles is necessary. With the SAF, a particle replay
requires either to re-run the simulation with a condition to record only the positions
of this particle or to keep track of the random numbers used for this particle. In
both cases, it would drastically increase the computational time and add unnecessary
complications to the code. On the contrary, a particle replay is straightforward with
the SAO.
The present paper is organized in two sections. The first one describes SAF and SAO. It also treats the work done in PALMTREE, a library we developed with the generator RNGStreams [11] and which contains an implementation of the SAO. The second section presents two numerical experiments which illustrate the performance of PALMTREE [17] and the SAO. Characteristic curves like speedup and efficiency are provided for both experiments.

2 Parallel and Object-Oriented Implementations in Monte Carlo

All along this section, we assume that we are able to simulate the transition law of particles undergoing a Markovian dynamics such that there is no interaction
between them. As a result, the presentation below can be applied to various Monte
Carlo schemes involving particle tracking where the goal is to compute moments.
Moreover, this shows the flexibility of the implementation we choose.

2.1 An Object-Oriented Design for Monte Carlo

C++ offers very interesting features which are of great help for a fast execution or to
treat multidimensional processes. In addition, a consistent implementation of MPI is
available in this language. As a result, it becomes a natural choice for PALMTREE.
In what follows, we describe and motivate the choices we made in the implemen-
tation of PALMTREE. We refer to an FPU as an MPI's process.
We choose to design an object called the Launcher which conducts the Monte
Carlo simulation. Roughly speaking, it collects all the generic parameters for the
simulation (the number of particles or the repository for the writing of outputs). It
also determines the architecture of the computer (cartography of the nodes, number
of MPI’s process, etc.) and is responsible for the parallelization of the simulation
(managing the VRNGs and collecting the result on each MPI’s process to allow the
final computations).

Some classical designs introduce an object consisting of a Particles Factory which


contains all the requirements for the particle simulations (the motion scheme or the
diffusion and advection coefficients). The Launcher’s role is then to distribute to
each MPI’s process a factory with the number of particles that must be simulated
and the necessary VRNGs. The main job of the factory is to create objects which are
considered as the particles and to store them. Each one of these objects contains all
the necessary information for path simulation including the current time-dependent
position and also the motion simulation algorithm.
This design is very interesting for interacting particles as it requires the storage of the path of each particle. For the case we decided to deal with, this implementation suffers from two major flaws: a slowdown, since many objects are created, and a massive memory consumption, as a large number of objects remain instantiated.
As a result, we decide to avoid the above approach and to use a design based on
recycling. In fact, we choose to code a unique object that is similar to the factory,
but does not create redundant particle objects. Let us call this object the Particle.
In a few words, the recycling concept is the following. When the final position
at time T is reached for each path, the Particle resets to the initial position and
performs another simulation. This solution avoids high memory consumption and
allows complete management of the memory. In addition, we do not use a garbage
collector which can cause memory leaks.
Another thing we adopt in our design is the latest C++11 standard [1], which offers the possibility to program an object with a template whose parameter is the spatial dimension of the process we want to simulate. Thus, one can include this template parameter into the implementation of the function governing the motion of the particle. If it is included, the object is declared with the correct dimension and automatically changes the function template. Otherwise, it checks the compatibility of the declared dimension with the function.
Such a feature allows us to preallocate the exact size required by the chosen dimension for the position in a static array. Subsequently, we avoid writing multiple objects or using a pointer and dynamic memory allocation, which would cause a slowdown. Moreover, templates allow for better optimization during compilation.
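A minimal sketch of such a recycled, dimension-templated object could look as follows (this is an illustration only and not PALMTREE's actual code; the motion scheme and the generator type are left as template parameters):

```cpp
#include <array>

template <int Dim>
class Particle {
 public:
  explicit Particle(const std::array<double, Dim>& x0) : x0_(x0), x_(x0) {}

  void reset() { x_ = x0_; }  // recycle the same object for the next path

  // Simulate one full path until time T with a user-supplied motion scheme,
  // e.g. an Euler step x <- x + v(x)*dt + sigma(x)*sqrt(dt)*xi.
  template <class Step, class Rng>
  const std::array<double, Dim>& simulate(Step step, double T, double dt,
                                          Rng& gen) {
    reset();
    for (double t = 0.0; t < T; t += dt) step(x_, dt, gen);
    return x_;  // final position, averaged later to estimate the moments
  }

 private:
  std::array<double, Dim> x0_;  // initial position (preallocated, no new/delete)
  std::array<double, Dim> x_;   // current position
};
```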
Now a natural parallel scheme for a Monte Carlo simulation consists in the distri-
bution of a particle on the different MPI's processes. Then, a small number of paths are sequentially simulated on each MPI's process. When each MPI's process has finished, the
data is regrouped on the master MPI process using MPI communications between
the MPI’s processes. Thus, the quantities of interest can be computed by the master
MPI’s process.
This scheme is typically embarrassingly parallel and can be used with both the shared and the distributed memory paradigm. Here we choose the distributed memory paradigm as it offers the possibility to use supercomputers based on SGI Altix or IBM Blue Gene technologies. Furthermore, if the path of the particles needs to be recorded, the shared memory paradigm cannot be used due to a very high memory consumption.

Fig. 1 The structure of RNGStreams

2.2 Random Number Generators

The main difficulty with the parallelization of the Monte Carlo method is to ensure
the independence of all the random numbers split on the different MPI’s processes.
To be precise, if the same random numbers are used on two different processes, the
simulation will end up with non-independent paths and the targeted quantities will
be erroneous.
Various recognized RNGs such as RNGStreams [11], SPRNG [12] or MT19937
[13] offer the possibility to use VRNGs and can be used on parallel architectures.
Recently, algorithms have been proposed to produce advanced and customized
VRNGs with MRG32k3a and MT19937 [3].
In PALMTREE, we choose RNGStreams which possesses the following two
imbricated subdivisions of the backbone generator MRG32k3a:
1. Stream: $2^{127}$ consecutive random numbers
2. Substream: $2^{76}$ consecutive random numbers
and the VRNGs are just the same MRG32k3a in different states (See Fig. 1). More-
over, this RNG has already implemented VRNGs [11] and passes several statistical
tests which can be found in TestU01 that ensure the independence of random num-
bers [9].
Now a possible strategy with RNGStreams is to use a stream for each new simu-
lation of a moment, as we must have a new set of independent paths, and to use the $2^{51}$ substreams contained in each stream to allocate VRNGs on the FPUs or to the objects
for each moment simulation. This decision clearly avoids the need to store the state
of the generator after the computations.

2.3 Strategy of Attachment to the FPUs (SAF)

An implementation of SAF with RNGStreams and the C++ design proposed in


Sect. 2.1 is very easy to perform as the only task is to attach a VRNG to each MPI’s
512 L. Lenôtre

process in the Launcher. Then the particles distributed on each MPI’s process are
simulated, drawing the random numbers from the attached VRNG.
Sometimes a selective replay may be necessary to capture some singular paths
in order to enable a physical understanding or for debugging purposes. However,
recording the path of every particle is a memory-intensive task, as is keeping track of the random numbers used by each particle. This constitutes a major drawback for
this strategy. SAO is preferred in that case.

2.4 Strategy of Object-Attachment (SAO) and PALMTREE

Here a substream is attached to each particle which can be considered as an object


and all that is needed to implement this scheme is a subroutine to quickly jump from
the first substream to the nth one. We show why in the following example: suppose
that we need 1,000,000 paths to compute the moment and have 5 MPI’s processes,
then we distribute 200,000 paths to each MPI’s process, which therefore requires
200,000 VRNGs to perform the simulations (See Fig. 2).
The easiest way to solve this problem is to have the mth FPU start at the ((m − 1) × 200,000 + 1)st substream and then jump to the next substream until it reaches the (m × 200,000)th substream.
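In terms of the RNGStreams API, this amounts to the following sketch (an illustration only; the path simulator is a placeholder supplied by the caller, and the naive substream loop shown here is exactly what the fast jump described below replaces):

```cpp
#include "RngStream.h"

template <class PathSimulator>
void run_block(int rank, long paths_per_rank, PathSimulator simulate_one_path,
               double T, double dt) {
  RngStream gen("moment");                 // first stream of the default package
                                           // seed, hence identical on every process
  for (long s = 0; s < rank * paths_per_rank; ++s)
    gen.ResetNextSubstream();              // naive jump to this rank's first substream
  for (long p = 0; p < paths_per_rank; ++p) {
    simulate_one_path(gen, T, dt);         // all draws come from the current substream
    gen.ResetNextSubstream();              // the next particle owns the next substream
  }
}
```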
RNGStreams possesses a function that allows to go from one substream to the
next one (See Fig. 3). Thus the only problem is to go quickly from the first substream

Fig. 2 Distribution of 200,000 particles to each FPU

Fig. 3 Distribution of
VRNGs or substreams to
each FPU

to the (m − 1) × 200,000 + 1st substream so that we can compete with the speed of
the SAF.
A naive algorithm using a loop containing the default function that passes through
each substream one at a time is clearly too slow. As a result, we decide to modify
the algorithm for MRG32k3a proposed in [3].
The current state of the generator RNGStreams is a sequence of six numbers; suppose that $\{s_1, s_2, s_3, s_4, s_5, s_6\}$ is the start of a substream. With the vectors $Y_1 = \{s_1, s_2, s_3\}$ and $Y_2 = \{s_4, s_5, s_6\}$, the matrices

$$A_1 = \begin{pmatrix} 82758667 & 1871391091 & 4127413238\\ 3672831523 & 69195019 & 1871391091\\ 3672091415 & 3528743235 & 69195019 \end{pmatrix}$$

and

$$A_2 = \begin{pmatrix} 1511326704 & 3759209742 & 1610795712\\ 4292754251 & 1511326704 & 3889917532\\ 3859662829 & 4292754251 & 3708466080 \end{pmatrix},$$

and the numbers $m_1 = 4294967087$ and $m_2 = 4294944443$, the jump from one substream to the next is performed with the computations

$$X_1 = A_1 \times Y_1 \bmod m_1 \quad\text{and}\quad X_2 = A_2 \times Y_2 \bmod m_2$$

with $X_1$ and $X_2$ the states providing the first number of the next substream.
with X 1 and X 2 the states providing the first number of the next substream.
As we said above, it is too slow to run these computations $n$ times to make a jump from the 1st substream to the $n$th substream. Subsequently, we propose to use the algorithm developed in [3], based on the storage in memory of already computed matrices and the decomposition

$$s = \sum_{j=0}^{k} g_j\, 8^j,$$

for any $s \in \mathbb{N}$.
Since a stream contains $2^{51} = 8^{17}$ substreams, we decide to only store the already computed matrices

$$\begin{array}{cccc} A_i & A_i^{2} & \cdots & A_i^{7}\\ A_i^{8} & A_i^{2\cdot 8} & \cdots & A_i^{7\cdot 8}\\ \vdots & \vdots & \ddots & \vdots\\ A_i^{8^{16}} & A_i^{2\cdot 8^{16}} & \cdots & A_i^{7\cdot 8^{16}} \end{array}$$

for $i = 1, 2$ with $A_1$ and $A_2$ as above. Thus we can reach any substream $s$ with the formula

$$A_i^{s}\, Y_i = \prod_{j=0}^{k} A_i^{g_j\, 8^j}\, Y_i \bmod m_i.$$
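A compact sketch of this jump (added here for illustration; it is not PALMTREE's implementation, and the table `powA` of precomputed matrices $A_i^{g\,8^j} \bmod m_i$ is assumed to be filled beforehand) reads as follows; it is applied once with $(A_1, m_1)$ and once with $(A_2, m_2)$:

```cpp
#include <array>
#include <cstdint>

using Mat3 = std::array<std::array<uint64_t, 3>, 3>;
using Vec3 = std::array<uint64_t, 3>;

// 3x3 matrix-vector product modulo m; the 128-bit accumulator (a GCC/Clang
// extension) avoids overflow of the sum of three 64-bit products.
Vec3 matVecMod(const Mat3& A, const Vec3& y, uint64_t m) {
  Vec3 r{};
  for (int i = 0; i < 3; ++i) {
    unsigned __int128 acc = 0;
    for (int j = 0; j < 3; ++j) acc += (unsigned __int128)A[i][j] * (y[j] % m);
    r[i] = (uint64_t)(acc % m);
  }
  return r;
}

// powA[j][g-1] is assumed to hold A^{g * 8^j} mod m for g = 1..7, j = 0..16.
// Advance the state vector y by s substreams using the base-8 digits of s.
Vec3 jumpSubstreams(uint64_t s, Vec3 y, const Mat3 powA[17][7], uint64_t m) {
  for (int j = 0; j <= 16 && s != 0; ++j, s /= 8) {
    const int g = (int)(s % 8);            // digit g_j of s in base 8
    if (g != 0) y = matVecMod(powA[j][g - 1], y, m);
  }
  return y;
}
```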

Fig. 4 Illustration of the stream repartition on FPUs

This solution provides a process that can be completed with a complexity less than $O(\log_2 p)$, which is much faster [3] than the naive solution. Fig. 4 illustrates this idea. In effect, we clearly see that the second FPU receives a stream and then performs a jump from the initial position of this stream to the first random number of the (n + 1)st substream of this exact same stream.

3 Experiments with the Advection–Diffusion Equation

3.1 The Advection–Diffusion Equation

In physics, the solution c(x, t) of (1) is interpreted as the evolution at the position
x of the initial concentration c0 (x) during the time interval [0, T ]. The first moment
of c is often called the center of mass.
Let us first recall that there exists a unique regular solution of (1). Proofs can be found in [5, 14]. This clearly means, as we said in the introduction, that we deal with
a well-posed problem.
The notion of fundamental solution [2, 4, 5, 14] which is motivated by the fact
that c(x, t) depends on the initial condition plays an important role in the treatment
of the advection–diffusion equation. It is the unique solution Γ (x, t, y) of
$$\begin{cases} \dfrac{\partial}{\partial t}\Gamma(x,t,y) = \operatorname{div}_x\bigl(\sigma(x)\cdot\nabla_x\Gamma(x,t,y)\bigr) - v(x)\nabla_x\Gamma(x,t,y), & \forall (x,t,y) \in D\times[0,T]\times D,\\ \Gamma(x,0,y) = \delta_y(x), & \forall (x,y) \in D\times D,\\ \Gamma(x,t,y) = 0, & \forall t\in[0,T],\ \forall y\in D,\ \forall x\in\partial D. \end{cases} \qquad (3)$$

This parabolic partial differential equation derived from (1) is often called the
Kolmogorov Forward equation or the Fokker–Planck equation. The probability the-
ory provides us with the existence of a unique Feller process X = (X t )t≥0 whose
transition function density is the solution of the adjoint of (3), that is
$$\begin{cases} \dfrac{\partial}{\partial t}\Gamma(x,t,y) = \operatorname{div}_y\bigl(\sigma(y)\cdot\nabla_y\Gamma(x,t,y)\bigr) + v(y)\nabla_y\Gamma(x,t,y), & \forall (x,t,y) \in D\times[0,T]\times D,\\ \Gamma(x,0,y) = \delta_x(y), & \forall (x,y) \in D\times D,\\ \Gamma(x,t,y) = 0, & \forall t\in[0,T],\ \forall x\in D,\ \forall y\in\partial D, \end{cases} \qquad (4)$$

which is easy to compute since div(v(x)) = 0 for every x ∈ R.


Assuming that σ and v satisfy the hypotheses settled in (1), then using the
Feynman–Kac formula [15] and (4), we can define the process X as the unique
strong solution of the Stochastic Differential Equation

$$dX_t = v(X_t)\, dt + \sigma(X_t)\, dB_t, \qquad (5)$$

starting at the position y and killed on the boundary D. Here, (Bt )t≥0 is a d-
dimensional Brownian motion with respect to the filtration (Ft )t≥0 satisfying the
usual conditions [18].
The path of such a process can be simulated step-by-step with a classical Euler
scheme. Therefore a Monte Carlo algorithm for the simulation of the center of mass
simply consists in the computation until time T of a large number of paths and the
average of all the final positions of every simulated particle still inside the domain.
As we are mainly interested in computational time and efficiency, the numerical
experiments that will follow are performed in free space. Working on a bounded
domain would only require to set the accurate stopping condition, which is a direct
consequence of the Feynman–Kac formula that is to terminate the simulation of the
particle when it leaves the domain.

3.2 Brownian Motion Simulation

Let us take an example in dimension one. We suppose that the drift term v is zero
and that σ (x) is constant. We then obtain the renormalized Heat Equation whose
solution is the standard Brownian Motion.

Let us divide the time interval $[0, T]$ into $N$ subintervals by setting $\delta t = T/N$ and $t_n = n \cdot \delta t$, $n = 0, \dots, N$, and use the Euler scheme

$$X_{t_{n+1}} = X_{t_n} + \sigma\, \Delta B_n, \qquad (6)$$

with $\Delta B_n = B_{t_{n+1}} - B_{t_n}$. In this case, the Euler scheme presents the advantage of being exact.
Since the Brownian motion is easy to simulate, we choose to sample 10,000,000
paths starting from the position 0 until time T = 1 with 0.001 as time step. We
compute the speedup $S$ and the efficiency $E$ which are defined as

$$S = \frac{T_1}{T_p} \quad\text{and}\quad E = \frac{T_1}{p\, T_p} \times 100,$$

where $T_1$ is the sequential computational time with one MPI's process and $T_p$ is the time in parallel using $p$ MPI's processes.
The speedup and efficiency curves together with the values used to plot them are respectively given in Fig. 5 and Table 1. The computations were realized with the supercomputer Lambda from the Igrida Grid of INRIA Research Center Rennes Bretagne Atlantique. This supercomputer is composed of 11 nodes with 2 × 6 Intel Xeon(R) E5647 CPUs at 2.40 GHz on a Westmere-EP architecture. Each node possesses 48 GB of Random Access Memory and is connected to the others through InfiniBand. We chose GCC 4.7.2 as C++ compiler and use the MPI library OpenMPI 1.6 as we prefer to use open-source and portable software. These tests include the time used to write the output file for the speedup computation so that we also show the power of the HDF5 library.
Table 1 clearly illustrates PALMTREE's performance. It appears that the SAO does not suffer a significant loss of efficiency although it requires a complex

Fig. 5 Brownian motion: a The dashed line represents the linear acceleration and the black curve shows the speedup. b The dashed line represents the 100 % efficiency and the black curve shows PALMTREE's efficiency

Table 1 The values used to plot the curve in Fig. 5


Processes 1 12 24 36 48 60 72 84 96 108 120
Time (s) 4842 454 226 154 116 93 78 67 59 53 48
Speedup 1 10.66 21.42 31.44 41.74 52.06 62.07 72.26 82.06 91.35 100.87
Efficiency 100 88.87 89.26 87.33 86.96 86.77 86.21 86.03 85.48 84.59 84.06

preprocessing. Moreover, the data show that the optimum efficiency (89.26 %) is
obtained with 24 MPI’s processes.
As we mentioned in Sect. 2.2, the independence between the particles is guaranteed by the non-correlation of random numbers generated by the RNG. Moreover, Fig. 6 shows that the sum of the squares of the positions of the particles at $T = 1$ follows a $\chi^2$ distribution in two different cases: (a) between substreams $i$ and $i + 1$
for i = 0, . . . , 40,000 of the first stream. (b) between substreams i of the first and
second streams for i = 0, . . . , 10,000.

3.3 Advection–Diffusion Equation with an Affine Drift Term


We now consider the advection–diffusion equation whose drift term $v$ is an affine function, that is, for each $x \in \mathbb{R}$, $v(x) = ax + b$ and $\sigma$ is a constant. We simulate the associated stochastic process $X$ through the exact scheme

$$X_{t_{n+1}} = e^{a\delta t} X_{t_n} + \frac{b}{a}\bigl(e^{a\delta t} - 1\bigr) + \sigma \sqrt{\frac{e^{2a\delta t} - 1}{2a}}\; N(0, 1)$$

where $N(0, 1)$ is a standard Gaussian law [8].
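In code, one step of this exact scheme can be sketched as follows (an illustration added here, not the PALMTREE implementation):

```cpp
#include <cmath>
#include <random>

// Exact one-step update for dX = (a*X + b) dt + sigma dB over a step dt.
double affine_drift_step(double x, double a, double b, double sigma, double dt,
                         std::mt19937_64& gen) {
  std::normal_distribution<double> normal(0.0, 1.0);
  const double e = std::exp(a * dt);
  return e * x + (b / a) * (e - 1.0) +
         sigma * std::sqrt((std::exp(2.0 * a * dt) - 1.0) / (2.0 * a)) * normal(gen);
}
```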


Fig. 6 χ 2 test: a between substreams i and i + 1 for i = 0 . . . 40,000 of the first stream. b between
substreams i of the first and second streams for i = 0 . . . 10,000

For this scheme with an initial position at 0 and the parameters $\sigma = 1$, $a = 1$, $b = 2$ and $T = 1$, we give the speedup and efficiency curves represented in Fig. 7 based on the simulation of one hundred million particles. Table 2 provides the data resulting from the simulation and used for the plots.
Whatever the number of MPI’s processes involved, we obtain the same empir-
ical expectation E = 3.19 and empirical variance V = 13.39 with a standard error
S.E. = 0.0011 and a confidence interval C.I. = 0.0034. Moreover, a good efficiency
(89.29 %) is obtained with 60 MPI’s processes.
In this case, the drift term naturally pushes the particles out of 0 relatively quickly.
If this behavior is not clearly observed in a simulation, then the code has a bug and a replay of a selection of a few paths can be useful to track it instead of reviewing all the code. This can clearly save time.
With the SAO, this replay can be easily performed since we know which substream is used by each particle, as shown in Fig. 4. Precisely, in the case presented in Figs. 2 and 3, the nth particle is simulated by a certain FPU using the nth substream. As a result, it is easy to replay the nth particle since we just have to use the random numbers of the nth substream. The point is that the parameters must stay exactly the same, particularly the time step. Otherwise, the replay of the simulation
will use the same random numbers but not for the exact same call of the generator
during the simulation.


Fig. 7 Constant diffusion with an affine drift: a The dashed line represents the linear acceleration and the black curve shows the speedup. b The dashed line represents the 100 % efficiency and the black curve shows PALMTREE's efficiency

Table 2 The values used to plot the curve in Fig. 7


Processes 1 12 24 36 48 60 72 84 96 108 120
Time (s) 19020 1749 923 627 460 355 302 273 248 211 205
Speedup 1 10.87 20.60 30.33 41.34 53.57 62.98 69.67 76.69 90.14 92.78
Efficiency 100 90.62 85.86 84.26 86.14 89.29 87.47 82.94 79.88 83.46 73.31

4 Conclusion

The parallelization of Stochastic Lagrangian solvers relies on a careful and efficient


management of the random numbers. In this paper, we proposed a strategy based on
the attachment of the Virtual Random Number Generators to the Object.
The main advantage of our strategy is the possibility to easily replay some particle
paths. This strategy is implemented in the PALMTREE software. PALMTREE uses
RNGStreams to benefit from the native split of the random numbers in streams and
substreams.
We have shown the efficiency of PALMTREE on two examples in dimension one:
the simulation of the Brownian motion in the whole space and the simulation of an
advection–diffusion problem with an affine drift term. Independence of the paths
was also checked.
Our current work is to perform more tests with various parameters and to link
PALMTREE to the platform H2OLAB [16], dedicated to simulations in hydrogeol-
ogy. In H2OLAB, the drift term is computed in parallel so that the drift data are split
over MPI’s processes. The challenge is that the computation of the paths will move
from one MPI’s process to another which raises issues about communications, good
work load balance and an advanced management of the VRNGs in PALMTREE.

Acknowledgments I start by thanking S. Maire and M. Simon who offered me the possibility to present this work at MCQMC. I thank J. Erhel and G. Pichot for the numerous discussions on
Eulerian Methods. I am also grateful to T.Dufaud and L.-B. Nguenang for the instructive talks on
the MPI library. C. Deltel and G. Andrade-Barroso of IRISA were of great help for the deployment
on supercomputers and understanding the latest C++ standards. Many thanks to G. Landurein for
his help in the implementation of PALMTREE. I am in debt to P. L’Ecuyer and B. Tuffin for the
very interesting discussions about RNGStreams. I show gratitude to D. Imberti for his help in the
English language during the writing of this article. I finish with a big thanks to A. Lejay. This work
was partly funded by a grant from ANR (H2MNO4 project).

References

1. The C++ Programming Language. https://isocpp.org/std/status (2014)


2. Aronson, D.G.: Non-negative solutions of linear parabolic equations. Annali della Scuola Nor-
male Superiore di Pisa - Classe di Scienze 22(4), 607–694 (1968)
3. Bradley, T., du Toit, J., Giles, M., Tong, R., Woodhams, P.: Parallelization techniques for
random number generations. GPU Comput. Gems Emerald Ed. 16, 231–246 (2011)
4. Evans, L.C.: Partial differential equations. In: Graduate Studies in Mathematics, 2nd edn.
American Mathematical Society, Providence (2010)
5. Friedman, A.: Partial differential equations of parabolic type. In: Dover Books on Mathematics
Series. Dover Publications, New York (2008)
6. Gardiner, C.: A handbook for the natural and social sciences. In: Springer Series in Synergetics,
4th edn. Springer, Heidelberg (2009)
7. Hundsdorfer, W., Verwer, J.G.: Numerical solution of time-dependent advection-diffusion-
reaction equations. In: Springer Series in Computational Mathematics. Springer, Heidelberg
(2003)
520 L. Lenôtre

8. Kloeden, P.E., Platen, E.: Numerical solution of stochastic differential equations. In: Stochastic
Modelling and Applied Probability. Springer, Heidelberg (1992)
9. L’Ecuyer, P.: Testu01. http://simul.iro.umontreal.ca/testu01/tu01.html
10. L’Ecuyer, P., Munger, D., Oreshkin, B., Simard, R.: Random numbers for parallel comput-
ers: requirements and methods, with emphasis on GPUs. In: Mathematics and Computers in
Simulation, Revision Submitted (2015)
11. L’Ecuyer, P., Simard, R., Chen, E.J., Kelton, W.D.: An object-oriented random-number package
with many long streams and substreams. Oper. Res. 50(6), 1073–1075 (2002)
12. Mascagni, M., Srinivasan, A.: Algorithm 806: SPRNG: a scalable library for pseudorandom
number generation. ACM Trans. Math. Softw. 26(3), 436–461 (2000)
13. Matsumoto, M., Nishimura, T.: Mersenne twister: a 623-dimensionally equidistributed uniform
pseudo-random number generator. ACM Trans. Model. Comput. Simul. 8(1), 3–30 (1998)
14. Nash, J.: Continuity of solutions of parabolic and elliptic equations. Am. J. Math. 80(4), 931–
954 (1958)
15. Øksendal, B.: Stochastic Differential Equations. Universitext. Springer, Heidelberg (2003)
16. Project-team Sage. H2OLAB. https://www.irisa.fr/sage/research.html
17. Lenôtre, L., Pichot, G.: Palmtree Library. http://people.irisa.fr/Lionel.Lenotre/software.html
18. Revuz, D., Yor, M.: Continuous Martingales and Brownian Motion. Grundelehren der mathe-
matischen Wissenschaften, 3rd edn. Springer, Berlin (1999)
19. Zheng, C., Bennett, G.D.: Applied Contaminant Transport Modelling. Wiley, New York (2002)
A New Rejection Sampling Method
for Truncated Multivariate Gaussian
Random Variables Restricted
to Convex Sets

Hassan Maatouk and Xavier Bay

Abstract Statistical researchers have shown increasing interest in generating


truncated multivariate normal distributions. In this paper, we only assume that the
acceptance region is convex and we focus on rejection sampling. We propose a new
algorithm that outperforms crude rejection method for the simulation of truncated
multivariate Gaussian random variables. The proposed algorithm is based on a gen-
eralization of Von Neumann’s rejection technique which requires the determination
of the mode of the truncated multivariate density function. We provide a theoretical
upper bound for the ratio of the target probability density function over the proposal
probability density function. The simulation results show that the method is espe-
cially efficient when the probability of the multivariate normal distribution of being
inside the acceptance region is low.

Keywords Truncated Gaussian vector · Rejection sampling · Monte Carlo method

1 Introduction

The need for simulation of truncated multivariate normal distributions appears in


many fields, like Bayesian inference for truncated parameter space [10] and [11],

H. Maatouk (B) · X. Bay


École Nationale Supérieure des Mines de St-Étienne, 158 Cours Fauriel,
Saint-Étienne, France
e-mail: hassan.maatouk@mines-stetienne.fr
X. Bay
e-mail: bay@emse.fr
H. Maatouk
Institut Camille Jordan, Université de Lyon, UMR 5208 , F - 69622
Villeurbanne Cedex, France
H. Maatouk
Institut de Radioprotection et de Sûreté Nucléaire (IRSN),
92260 Fontenay-aux-Roses, France


Gaussian processes for computer experiments subject to inequality constraints [5,


8, 9, 20] and regression models with linear constraints (see e.g. [12] and [28]).
In general, we have two types of methods. The first ones are based on Markov chain
Monte Carlo (McMC) simulation [3, 18, 25], such as Gibbs sampling [2, 12, 15, 17,
19, 24, 26]. They provide samples from an approximate distribution which converges
asymptotically to the true one. The second ones are exact simulation methods based
on rejection sampling (Von Neumann [27]) and its extensions, [6, 16, 18]. In this
paper, we focus on the second type of methods.
Recently, researchers in statistics have used an adaptive rejection technique with
Gibbs sampling [12, 13, 21, 22, 24]. Let us mention that in one dimension rejection
sampling with a high acceptance rate has been developed by Robert [24], and Geweke
[12]. In [24] Robert developed simulation algorithms for one-sided and two-sided
truncated normal distributions. His rejection algorithm is based on exponential func-
tions and uniform distributions. The multidimensional case where the acceptance
region is a convex subset of Rd is based on the same algorithm using the Gibbs
sampling to reduce the simulation problem to a sequence of one-dimensional sim-
ulations. In this case, the method requires the determination of slices of the convex
acceptance region. Also, Geweke [12] proposed an exponential rejection sampling
to simulate a truncated normal variable. The multidimensional case is deduced by
using the Gibbs algorithm. In one-dimension, Chopin [4] designed an algorithm that
is computationally faster than alternative algorithms. A multidimensional rejection
sampling to simulate a truncated Gaussian vector outside arbitrary ellipsoids has
been developed by Ellis and Maitra [7]. For higher dimensions, Philippe and Robert
[23] developed a simulation method of a Gaussian distribution restricted to positive
quadrants. Also, Botts [1] improves an accept-reject algorithm to simulate positive
multivariate normal distributions.
In this article, we develop a new rejection technique to simulate a truncated mul-
tivariate normal distribution restricted to any convex subset of Rd . The method only
requires the determination of the mode of the probability density function (pdf)
restricted to the convex acceptance region. We provide a theoretical upper bound
for the ratio of the target probability density function over the proposal probability
density function.
The article is organized as follows. In Sect. 2, we recall the rejection method.
Then, we present our new method, called rejection sampling from the mode (RSM)
and we give the main theoretical results and the associated algorithm. In Sect. 3, we
compare RSM with existing rejection algorithms.

2 Multivariate Normal Distribution

2.1 The General Rejection Method

Let f be a probability density function (pdf) defined on Rd . Von Neumann [27]


proposed the rejection method, using the notion of dominating density function.

Suppose that g is another density function close to f such that for some finite constant
c ≥ 1, called rejection constant,

f (x) ≤ cg(x), x ∈ Rd . (1)

The acceptance/rejection method is an algorithm for generating random samples


from f by drawing from the proposal pdf g and the uniform distribution:
1. Generate X with density g.
2. Generate U uniformly on [0, 1]. If cg(X )U ≤ f (X ), accept X ; otherwise, go
back to step 1.
The random variable X resulting from the above algorithm is distributed according
to f . Furthermore it can be shown that the acceptance rate is equal to 1/c. In practice
it is crucial to get a small c.
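The following sketch shows this accept/reject loop in code; the function and helper names are our own, and the target density f, the proposal density g (with its sampler) and the constant c are assumed to be supplied by the user. This is a generic illustration of the von Neumann method, not a specific algorithm of this paper.

```python
import numpy as np

def rejection_sample(f, g_pdf, g_sample, c, rng=None):
    """Draw one sample from the density f, given a proposal g with f(x) <= c * g(x)."""
    rng = np.random.default_rng() if rng is None else rng
    while True:
        x = g_sample(rng)               # step 1: generate X with density g
        u = rng.uniform()               # step 2: generate U uniformly on [0, 1]
        if c * g_pdf(x) * u <= f(x):    # accept; the overall acceptance rate is 1/c
            return x
```

On average c proposals are needed per accepted sample, which is the practical reason for keeping c small.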
Notice that the rejection sampling algorithm extends immediately to unnor-
malized density functions, avoiding the computation of the normalizing constant.
Proposition 1 Let C be a subset of Rd and f˜ and g̃ be two unnormalized density
functions on C such that f˜(x) ≤ k g̃(x), k ∈ R. Then the rejection algorithm is still
valid if the inequality condition cg(X )U ≤ f (X ) is replaced by

k g̃(X )U ≤ f˜(X ). (2)



The rejection constant is c = k \frac{\int_C \tilde g(t)\,dt}{\int_C \tilde f(t)\,dt}.

Proof We have \tilde f(x) \le k \tilde g(x), and so

f(x) = \frac{\tilde f(x)}{\int_C \tilde f(t)\,dt} \le c\, \frac{\tilde g(x)}{\int_C \tilde g(t)\,dt} = c\, g(x),     (3)

with c = k \frac{\int_C \tilde g(t)\,dt}{\int_C \tilde f(t)\,dt}. The condition c\, g(X)\, U \le f(X) is equivalent to k \tilde g(X)\, U \le \tilde f(X). □

2.2 Rejection Sampling from the Mode

Suppose that X has a multivariate normal distribution with probability density function:

f(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^\top \Sigma^{-1}(x-\mu)\right), \quad x \in \mathbb{R}^d,     (4)

where μ = E[X ] and Σ is the covariance matrix, assumed to be invertible.


524 H. Maatouk and X. Bay

We consider a convex subset C of Rd representing the acceptance region. We


assume that μ does not belong to C , which is a hard case for crude rejection sampling.
Furthermore, as explained in Remark 1 (see below) the proposed method is not
different from crude rejection sampling if μ ∈ C . Without loss of generality, let
μ = 0. Our aim is to simulate the multivariate normal distribution X restricted to the
convex set C . The idea is twofold. Firstly, we determine the mode μ∗ corresponding
to the maximum of the probability density function f restricted to C . It is the solution
of the following convex optimization problem:

\mu^* = \arg\min_{x \in C} \frac{1}{2}\, x^\top \Sigma^{-1} x.     (5)

Secondly, let g be the pdf obtained from f by shifting the center to μ∗ :


 
g(x \mid \mu^*, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu^*)^\top \Sigma^{-1}(x-\mu^*)\right).     (6)

Then we prove in the next theorem and corollary that g can be used as a proposal
pdf for rejection sampling on C , and we derive the optimal constant.

Theorem 1 Let f˜ and g̃ be the unnormalized density functions defined as

f˜(x) = f (x | 0, Σ)1x∈C and g̃(x) = g(x | μ∗ , Σ)1x∈C ,

where f and g are defined respectively in (4) and (6). Then there exists a constant k
such that \tilde f(x) \le k \tilde g(x) for all x in C, and the smallest such value of k is

k^* = \exp\left(-\frac{1}{2}(\mu^*)^\top \Sigma^{-1} \mu^*\right).     (7)

Proof Let us start with the one-dimensional case. Without loss of generality, we
suppose that C = [μ∗ , +∞[, where μ∗ is positive and Σ = σ 2 . In this case, the
condition \tilde f(x) \le k \tilde g(x) is written

\forall x \ge \mu^*, \quad e^{-\frac{x^2}{2\sigma^2}} \le k\, e^{-\frac{(x-\mu^*)^2}{2\sigma^2}},

and so

k^* = \max_{x \ge \mu^*} \frac{e^{\frac{(\mu^*)^2}{2\sigma^2}}}{e^{\frac{x\mu^*}{\sigma^2}}} = e^{\frac{(\mu^*)^2}{2\sigma^2} - \min_{x \ge \mu^*} \frac{x\mu^*}{\sigma^2}} = e^{-\frac{(\mu^*)^2}{2\sigma^2}}.

In the multidimensional case, we have k^* = \max_{x \in C} e^{\frac{1}{2}(\mu^*)^\top \Sigma^{-1}\mu^* - x^\top \Sigma^{-1}\mu^*}. Since \mu^* \in
C, we only need to show that

\forall x \in C, \quad x^\top \Sigma^{-1} \mu^* \ge (\mu^*)^\top \Sigma^{-1} \mu^*.

Fig. 1 Scalar product between the gradient vector \Sigma^{-1}\mu^* of the function \frac{1}{2} x^\top \Sigma^{-1} x at \mu^* and
the dashed vector (x - \mu^*). The ellipses centered at the origin are the level curves of the
function x \mapsto \frac{1}{2} x^\top \Sigma^{-1} x

The angle between the gradient vector \Sigma^{-1}\mu^* of the function \frac{1}{2} x^\top \Sigma^{-1} x at the mode
\mu^* and the dashed vector (x - \mu^*) is acute for all x in C since C is convex (see
Fig. 1). Therefore, (x - \mu^*)^\top \Sigma^{-1}\mu^* is non-negative for all x in C. □

By now, we can write the RSM algorithm as follows:

Corollary 1 (RSM Algorithm) Let f˜ and g̃ be the unnormalized density functions


defined as

f˜(x) = f (x | 0, Σ)1x∈C and g̃(x) = g(x | μ∗ , Σ)1x∈C ,

where f and g are defined respectively in (4) and (6). Then the random vector X resulting from the
following algorithm is distributed according to \tilde f.
1. Generate X with unnormalized density \tilde g.
2. Generate U uniformly on [0, 1]. If U \le \exp\left((\mu^*)^\top \Sigma^{-1}\mu^* - X^\top \Sigma^{-1}\mu^*\right), accept
X; otherwise go back to step 1.

Proof The proof is done by applying Proposition 1 with the optimal constant k ∗ of
Theorem 1. 

Remark 1 In practice, we use a crude rejection method to simulate X with unnor-


malized density g̃ in the RSM algorithm. So if μ ∈ C , RSM degenerates to crude
rejection sampling since μ∗ = μ and f = g. Therefore, the method RSM can be
seen as a generalization of naive rejection sampling.

Remark 2 Our method requires only the maximizer of the pdf restricted to
the acceptance region, that is, the mode of the truncated multivariate normal distribution.
Its numerical calculation is a standard convex quadratic programming problem,
see e.g. [14].
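A compact sketch of the complete RSM procedure for a convex set C described by linear inequalities A x ≤ b is given below. The mode is obtained here with scipy.optimize.minimize (our own choice; any convex quadratic programming solver, such as the dual method of [14], would do), and step 1 uses crude rejection for the restricted proposal as described in Remark 1. All names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def rsm_sample(Sigma, A, b, rng=None):
    """Sample N(0, Sigma) restricted to C = {x : A x <= b}, following Corollary 1."""
    rng = np.random.default_rng() if rng is None else rng
    P = np.linalg.inv(Sigma)                       # Sigma^{-1}

    # Mode of the restricted pdf: argmin (1/2) x' Sigma^{-1} x subject to A x <= b, cf. (5).
    res = minimize(lambda x: 0.5 * x @ P @ x, x0=np.zeros(len(Sigma)),
                   jac=lambda x: P @ x, method="SLSQP",
                   constraints=[{"type": "ineq", "fun": lambda x: b - A @ x}])
    mode = res.x
    shift = P @ mode                               # Sigma^{-1} mu*
    thresh = mode @ shift                          # (mu*)' Sigma^{-1} mu*

    while True:
        # Step 1: draw from the shifted proposal N(mu*, Sigma) restricted to C by crude rejection.
        y = rng.multivariate_normal(mode, Sigma)
        if np.any(A @ y > b):
            continue
        # Step 2: accept with probability exp((mu*)' Sigma^{-1} mu* - y' Sigma^{-1} mu*).
        if rng.uniform() <= np.exp(thresh - y @ shift):
            return y
```

With the bivariate covariance matrix and the three linear constraints of the example in Sect. 3, such a routine should recover a mode close to (−3.4, −2.0) and an acceptance rate of roughly 20 %, in line with Fig. 3.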

3 Performance Comparisons

To investigate the performance of the RSM algorithm, we compare it with existing


rejection algorithms. Robert [24] for example proposed a rejection sampling method
in the one dimensional case. To compare the acceptance rates of RSM with Robert’s
method, we consider a standard normal variable truncated between μ− and μ+ with
μ− fixed to 1. In Robert’s method, the average acceptance rate is high when the
acceptance interval is small (see Table 2.2 in [24]). In the proposed algorithm, sim-
ulating from the shifted distribution (first step of the RSM algorithm) means that
the average acceptance rate is higher when the acceptance interval is
large. As expected, the advantage of the proposed algorithm appears when there is
a large gap between μ− and μ+ , as shown in Table 1. Thus the RSM algorithm
can be seen as complementary to Robert’s one.
The advantage of the method shows when the probability of being inside the
acceptance region is low. In Table 2, we consider the one dimensional case d = 1 and
we only change the position of μ− , where the acceptance region is C = [μ− , +∞[.

Table 1 Comparison of average acceptance rates between Robert’s method [24] and RSM as the
distance between μ− and μ+ varies
μ+ − μ−   Robert’s method (%)   Rejection sampling from the mode (%)   Gain
0.5 77.8 18.0 0.2
1 56.4 21.2 0.3
2 35.0 27.4 0.7
5 11.6 28.2 2.4
10 7.0 28.4 4.0
The acceptance region is C = [μ− , μ+ ], where μ− is fixed to 1

Table 2 Comparison between crude rejection sampling and RSM when the probability of being inside
the acceptance region becomes low
μ−   Acceptance rate with crude rejection sampling (%)   Acceptance rate with RSM (%)   Gain
0.5 30.8 34.9 1.1
1 15.8 26.2 1.6
1.5 6.7 20.5 3.0
2 2.2 16.8 7.4
2.5 0.6 14.2 23.1
3 0.1 12.2 92.0
3.5 0.0 10.6 455.6
4 0.0 9.3 2936.7
4.5 0.0 8.4 14166.0
The acceptance region is C = [μ− , +∞[

Fig. 2 Crude rejection sampling using 2000 simulations. The acceptance rate is 3 %

From the last column, we observe that our algorithm outperforms crude rejection
sampling. For instance, the proposed algorithm is approximately 14,000 times faster
than the crude rejection sampling when the acceptance region is [4.5, +∞[. Note
also that the acceptance rate remains stable for large μ− (near 10 %) for the RSM
method whereas it decreases rapidly to zero for crude rejection sampling.
Now we investigate the performance of the RSM algorithm using a convex set
in two dimensions. To do this, we consider a zero-mean bivariate Gaussian random
vector x with covariance matrix

\Sigma = \begin{pmatrix} 4 & 2.5 \\ 2.5 & 2 \end{pmatrix}.

Assume that the convex set C \subset \mathbb{R}^2 is defined by the following inequality constraints:

-10 \le x_2 \le 0, \quad x_1 \ge -15, \quad 5x_1 - x_2 + 15 \le 0.

It is the acceptance region used in Figs. 2 and 3. By minimizing a quadratic form


subject to linear constraints, we find the mode

\mu^* = \arg\min_{x \in C} \frac{1}{2}\, x^\top \Sigma^{-1} x \approx (-3.4, -2.0),

and then we compare crude rejection sampling to RSM.


In Fig. 2, we use crude rejection sampling in 2000 simulations of a N (0, Σ).
Given the number of points in C (black points), it is clear that the algorithm is not
efficient. The reason is that the mean of the bivariate normal distribution is outside the
acceptance region. In Fig. 3, we first simulate from the shifted distribution centered
at the mode with same covariance matrix Σ (step one of the RSM algorithm). Now
in the second step of the RSM algorithm, we have two types of points (black and gray
ones) in the convex set C . The gray points are in C but do not respect the inequality
constraint in the RSM algorithm (see Corollary 1). The black points are in C , and

Fig. 3 Rejection sampling from the mode using 2000 simulations. The acceptance rate is 21 %

Table 3 Comparison between crude rejection sampling and RSM with respect to the dimension d
Dimension d   μ−   Acceptance rate with crude rejection sampling (%)   Acceptance rate with RSM (%)   Gain
1 2.33 1.0 15.0 15.0
2 1.29 1.0 5.2 5.2
3 0.79 1.0 2.5 2.5
4 0.48 1.0 1.5 1.5
5 0.25 1.0 1.2 1.2
The acceptance region is C = [μ− , +∞[d

respect this inequality constraint. We observe that RSM outperforms crude rejection
sampling, with acceptance rate of 21 % against 3 %.
Now we investigate the influence of the problem dimension d. We simulate a
standard multivariate normal distribution X restricted to C = [μ− , +∞[d , where
μ− is chosen such that P(X ∈ C ) = 0.01. The mean of the multivariate normal
distribution is outside the acceptance region. Simulation of truncated normal dis-
tributions in multidimensional cases is a difficult problem for rejection algorithms.
As shown in Table 3, the RSM algorithm is interesting up to dimension three. How-
ever, simulation of truncated multivariate normal distribution in high dimensions is
a difficult problem for exact rejection methods. In that case, an adaptive rejection
sampling for Gibbs sampling is needed, see e.g. [13]. From Table 3, we can remark
that when the dimension increases, the parameter μ− tends to zero. Hence, the mode
μ∗ = (μ− , . . . , μ− ) tends to the zero mean of the Gaussian vector X. Consequently, the
acceptance rate of the proposed method converges to that of crude
rejection sampling.

4 Conclusion

In this paper, we develop a new rejection technique, called RSM, to simulate a


truncated multivariate normal distribution restricted to convex sets. The proposed
method only requires finding the mode of the target probability density function
restricted to the convex acceptance region. The proposal density function in the
RSM algorithm is the shifted target distribution centered at the mode. We provide a
theoretical formula of the optimal constant such that the proposal density function
is as close as possible to the target density. An illustrative example to compare RSM
with crude rejection sampling is included. The simulation results show that using
rejection sampling from the mode is more efficient than crude rejection sampling.
A comparison with Robert’s method in the one-dimensional case is also discussed. The
RSM method outperforms Robert’s method when the acceptance interval is large and
the probability that the normal distribution falls inside it is low. The proposed rejection
method has been applied in the case where the acceptance region is a convex subset
of \mathbb{R}^d, and can be extended to non-convex regions by using the convex hull. Note
that it is an exact method and it is easy to implement, since the mode is calculated
as a Bayesian estimator in many applications. For instance, the proposed algorithm
has been used to simulate a conditional Gaussian process with inequality constraints
(see [20]). An adaptive rejection sampling for Gibbs sampling is needed to improve
the acceptance rate of the proposed method.

Acknowledgments This work has been conducted within the frame of the ReDice Consortium,
gathering industrial (CEA, EDF, IFPEN, IRSN, Renault) and academic (Ecole des Mines de Saint-
Etienne, INRIA, and the University of Bern) partners around advanced methods for Computer
Experiments. The authors wish to thank Olivier Roustant (EMSE), Laurence Grammont (ICJ, Lyon
1) and Yann Richet (IRSN, Paris) for helpful discussions, as well as the anonymous reviewers for
constructive comments and the participants of MCQMC2014 conference.

References

1. Botts, C.: An accept-reject algorithm for the positive multivariate normal distribution. Comput.
Stat. 28(4), 1749–1773 (2013)
2. Breslaw, J.: Random sampling from a truncated multivariate normal distribution. Appl. Math.
Lett. 7(1), 1–6 (1994)
3. Casella, G., George, E.I.: Explaining the Gibbs sampler. Am. Stat. 46(3), 167–174 (1992)
4. Chopin, N.: Fast simulation of truncated Gaussian distributions. Stat. Comput. 21(2), 275–288
(2011)
5. Da Veiga, S., Marrel, A.: Gaussian process modeling with inequality constraints. Annales de
la faculté des sciences de Toulouse 21(3), 529–555 (2012)
6. Devroye, L.: Non-Uniform Random Variate Generation. Springer, New York (1986)
7. Ellis, N., Maitra, R.: Multivariate Gaussian simulation outside arbitrary ellipsoids. J. Comput.
Graph. Stat. 16(3), 692–708 (2007)
8. Emery, X., Arroyo, D., Peláez, M.: Simulating large Gaussian random vectors subject to
inequality constraints by Gibbs sampling. Math. Geosci. 1–19 (2013)

9. Freulon, X., Fouquet, C.: Conditioning a Gaussian model with inequalities. In: Soares, A. (ed.)
Geostatistics Tróia ’92, Quantitative Geology and Geostatistics, vol. 5, pp. 201–212. Springer,
Netherlands (1993)
10. Gelfand, A.E., Smith, A.F.M., Lee, T.M.: Bayesian analysis of constrained parameter and
truncated data problems using Gibbs sampling. J. Am. Stat. Assoc. 87(418), 523–532 (1992)
11. Geweke, J.: Exact inference in the inequality constrained normal linear regression model. J.
Appl. Econom. 1(2), 127–141 (1986)
12. Geweke, J.: Efficient simulation from the multivariate normal and student-t distributions subject
to linear constraints and the evaluation of constraint probabilities. In: Proceedings of the 23rd
Symposium on the Interface Computing Science and Statistics, pp. 571–578 (1991)
13. Gilks, W.R., Wild, P.: Adaptive rejection sampling for Gibbs sampling. J. R. Stat. Soc. Series
C (Applied Statistics) 41(2), 337–348 (1992)
14. Goldfarb, D., Idnani, A.: A numerically stable dual method for solving strictly convex quadratic
programs. Math. Progr. 27(1), 1–33 (1983)
15. Griffiths, W.E.: A Gibbs sampler for the parameters of a truncated multivariate normal distri-
bution. Department of Economics - Working Papers Series 856, The University of Melbourne
(2002)
16. Hörmann, W., Leydold, J., Derflinger, G.: Automatic Nonuniform Random Variate Generation.
Statistics and Computing. Springer, Berlin (2004)
17. Kotecha, J.H., Djuric, P.: Gibbs sampling approach for generation of truncated multivariate
Gaussian random variables. IEEE Int. Conf. Acoust. Speech Signal Process. 3, 1757–1760
(1999)
18. Laud, P.W., Damien, P., Shively, T.S.: Sampling some truncated distributions via rejection
algorithms. Commun. Stat. - Simulation Comput. 39(6), 1111–1121 (2010)
19. Li, Y., Ghosh, S.K.: Efficient sampling method for truncated multivariate normal and student
t-distribution subject to linear inequality constraints. http://www.stat.ncsu.edu/information/
library/papers/mimeo2649_Li.pdf
20. Maatouk, H., Bay, X.: Gaussian process emulators for computer experiments with inequality
constraints (2014). https://hal.archives-ouvertes.fr/hal-01096751
21. Martino, L., Miguez, J.: An adaptive accept/reject sampling algorithm for posterior probability
distributions. In: IEEE/SP 15th Workshop on Statistical Signal Processing, SSP ’09, pp. 45–48
(2009)
22. Martino, L., Miguez, J.: A novel rejection sampling scheme for posterior probability distribu-
tions. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal
Processing ICASSP, pp. 2921–2924 (2009)
23. Philippe, A., Robert, C.P.: Perfect simulation of positive Gaussian distributions. Stat. Comput.
13(2), 179–186 (2003)
24. Robert, C.P.: Simulation of truncated normal variables. Stat. Comput. 5(2) (1995)
25. Robert, C.P., Casella, G.: Monte Carlo Statistical Methods. Springer, Berlin (2004)
26. Rodriguez-Yam, G., Davis, R.A., Scharf, L.L.: Efficient Gibbs sampling of truncated multivari-
ate normal with application to constrained linear regression (2004). http://www.stat.columbia.
edu/~rdavis/papers/CLR.pdf
27. Von Neumann, J.: Various techniques used in connection with random digits. J. Res. Nat. Bur.
Stand. 12, 36–38 (1951)
28. Yu, J.-W., Tian, G.-L.: Efficient algorithms for generating truncated multivariate normal distribu-
tions. Acta Mathematicae Applicatae Sinica, English Series 27(4), 601 (2011)
Van der Corput and Golden Ratio Sequences
Along the Hilbert Space-Filling Curve

Colas Schretter, Zhijian He, Mathieu Gerber, Nicolas Chopin


and Harald Niederreiter

Abstract This work investigates the star discrepancies and squared integration
errors of two quasi-random point constructions using a generator one-dimensional
sequence and the Hilbert space-filling curve. This recursive fractal is proven to maxi-
mize locality and passes uniquely through all points of the d-dimensional space. The
van der Corput and the golden ratio generator sequences are compared for random-
ized integro-approximations of both Lipschitz continuous and piecewise constant
functions. We found that the star discrepancy of the construction using the van der
Corput sequence reaches the theoretical optimal rate when the number of samples
is a power of two while using the golden ratio sequence performs optimally for
Fibonacci numbers. Since the Fibonacci sequence increases at a slower rate than the
exponential in base 2, the golden ratio sequence is preferable when the budget of
samples is not known beforehand. Numerical experiments confirm this observation.

Keywords Quasi-random points · Hilbert curve · Discrepancy · Golden ratio sequence · Numerical integration

C. Schretter (B)
ETRO Department, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
e-mail: cschrett@vub.ac.be
C. Schretter
IMinds, Gaston Crommenlaan 8, Box 102, 9050 Ghent, Belgium
Z. He
Tsinghua University, Haidian Dist., Beijing 100084, China
M. Gerber
Université de Lausanne, 1015 Lausanne, Switzerland
N. Chopin
Centre de Recherche En Économie Et Statistique, ENSAE, 92245 Malakoff, France
H. Niederreiter
RICAM, Austrian Academy of Sciences, Altenbergerstr. 69, 4040 Linz, Austria


1 Introduction

The Hilbert space-filling curve in two dimensions [1], first described in 1891 by David
Hilbert, is a recursively defined fractal path that passes uniquely through all points
of the unit square. The Hilbert curve generalizes naturally in higher dimensions
and presents interesting potential for the construction of quasi-random point sets
and sequences. In particular, its construction ensures the bijectivity, adjacency and
nesting properties that we define in the following.
For integers d ≥ 2 and m ≥ 0, let
I_m^d = \left\{ I_m^d(k) := [k, k+1]\, 2^{-dm} \right\}_{k=0}^{2^{dm}-1}     (1)

be the splitting of [0, 1] into closed intervals of equal size 2^{-dm} and S_m^d be the
splitting of [0, 1]^d into 2^{dm} closed hypercubes of volume 2^{-dm}. First, writing
H : [0, 1] \to [0, 1]^d for the Hilbert space-filling curve mapping, the set S_m^d(k) :=
H(I_m^d(k)) is a hypercube that belongs to S_m^d (bijectivity property). Second, for any
k \in \{0, \ldots, 2^{dm} - 2\}, S_m^d(k) and S_m^d(k+1) have at least one edge in common (adja-
cency property). Finally, if we split I_m^d(k) into the 2^d successive closed intervals
I_{m+1}^d(k_i), k_i = 2^d k + i and i \in \{0, \ldots, 2^d - 1\}, then the S_{m+1}^d(k_i) are simply the split-
ting of S_m^d(k) into 2^d closed hypercubes of volume 2^{-d(m+1)} (nesting property).
The Hilbert space-filling curve has already been applied to many problems in
computer science such as clustering points [2] and optimizing cache coherence for
efficient database access [3]. The R*-tree data structure has also been proposed
for efficient searches of points and rectangles [4]. Similar space-filling curves have
been used to heuristically propose approximate solutions to the traveling salesman
problem [5]. In computer graphics, the Hilbert curve has been used to define strata
prior to stratified sampling [6]. Very recently, the inverse Hilbert mapping has also
been applied to sequential quasi-Monte Carlo methods [7].

Fig. 1 First three steps of the recursive construction of the Hilbert space-filling curve in two
dimensions. The dots snap to the closest vertex on an implicit Cartesian grid that cover the space
with an arbitrary precision increasing with the recursion order of the mapping calculations

The recursive definition of the Hilbert space-filling curve provides levels of details
for approximations of a continuous mapping from 1-D to d-D with d ≥ 2, up to any
arbitrary numerical precision. An illustration of the generative process of the curve
with increasing recursion order is shown in Fig. 1. Efficient computer implementa-
tions exist for computing Hilbert mappings, both in two dimensions [8, 9] and up to
32 or 64 dimensions [10]. Therefore, the Hilbert space-filling curve allows fast con-
structions of point sets and sequences using a given generator set of coordinates in
the unit interval. The remainder of this work focuses on comparing the efficiency of
two integro-approximation constructions, using either the van der Corput sequence
or the golden ratio sequence [11].

2 Integro-Approximations

Let f (·) be a d-dimensional function that is not analytically integrable on the unit
cube [0, 1]d . We aim at estimating an integral

\mu = \int_{[0,1]^d} f(X)\, dX.     (2)

Given a one-dimensional sequence x0 , . . . , xn−1 in [0, 1), we can get a correspond-


ing sequence of points P0 , . . . , Pn−1 in [0, 1)d in the domain of integration via the
mapping function H : [0, 1] → [0, 1]d towards samples into the d-dimensional unit
cube. The integral μ can therefore be estimated by the following average:

\mu \approx \hat\mu = \frac{1}{n} \sum_{i=0}^{n-1} f(H(x_i)).     (3)

Recent prior work by He and Owen [12] studied such approximations with the
van der Corput sequence as the one-dimensional input for the Hilbert mapping func-
tion H . To define the van der Corput sequence, let


i = \sum_{k=1}^{\infty} d_k(i)\, b^{k-1} \quad \text{for } d_k(i) \in \{0, 1, \ldots, b-1\}     (4)

be the digit expansion in base b ≥ 2 of the integer i ≥ 0. Then, the ith element of
the van der Corput sequence is defined as


x_i = \sum_{k=1}^{\infty} d_k(i)\, b^{-k}.     (5)
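A direct transcription of (4)–(5) — the radical inverse of the integer i in base b — reads as follows; this is a small sketch with names of our own, and the dyadic case b = 2 is the one used in the experiments below.

```python
def van_der_corput(i, base=2):
    """Radical inverse of the integer i >= 0 in the given base, cf. (4)-(5)."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)   # peel off the digits d_k(i) of i
        denom *= base
        x += digit / denom           # accumulate d_k(i) * base^{-k}
    return x
```

For example, van_der_corput(1), van_der_corput(2) and van_der_corput(3) return 0.5, 0.25 and 0.75.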

Fig. 2 The first 13 coordinates generated by the van der Corput (top) and the golden ratio (bottom)
sequences. For this specific choice of number of samples, the points are more uniformly spread on
the unit interval with the golden ratio sequence and the maximum distance between the two closest
coordinates is smaller than in the van der Corput sequence

Fig. 3 The first hundred (top row) and thousand (bottom row) points generated by marching along
the Hilbert space-filling curve with distances given by the van der Corput sequence (left) and the
golden ratio sequence (right). In contrast to using the golden ratio number, the van der Corput
construction generates points that are implicitly aligned on a regular Cartesian grid
Van der Corput and Golden Ratio Sequences … 535

Alternatively, one can choose as input a specific instance of the one-dimensional


Richtmyer sequences [13], based on the golden ratio number. Given a seed parameter
s ∼ U([0, 1)) for randomization, the golden ratio sequence is defined as

xi = {s + i · φ}, (6)

where {t} denotes the fractional part of the real number t and φ is the golden ratio
(or golden section) number

\phi = \frac{1 + \sqrt{5}}{2} \approx 1.6180339887\ldots;     (7)
however, since only fractional parts are retained, we can as well substitute φ by the
golden ratio conjugate number

\tau = \phi - 1 = \frac{1}{\phi} \approx 0.6180339887\ldots.     (8)

In prior work, we explored applications of these golden ratio sequences for gener-
ating randomized integration quasi-lattices [14] and for non-uniform sampling [15].
Figure 2 compares the first elements of the van der Corput generator and the golden
ratio sequence with s = 0. Figure 3 shows their images in two dimensions through
the Hilbert space-filling curve mapping. It is worth pointing out that both the van
der Corput and the golden ratio sequences are extensible, while the latter spans the
unit interval over a larger range.
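To make the construction concrete, the sketch below combines the golden ratio generator (6) with the widely used iterative index-to-coordinates conversion for the two-dimensional Hilbert curve, so that points snap to the vertices of an implicit 2^p × 2^p grid as in Fig. 1. The helper names and the resolution p = 12 are our own choices and not part of the paper.

```python
import numpy as np

def hilbert_xy(p, t):
    """Map an index t in {0, ..., 4**p - 1} to cell coordinates on the order-p Hilbert curve."""
    x = y = 0
    s = 1
    while s < 2 ** p:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                               # rotate/flip the quadrant when needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x, y = x + s * rx, y + s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_point(u, p=12):
    """Approximate H(u) for u in [0, 1) by snapping to the order-p grid."""
    x, y = hilbert_xy(p, int(u * 4 ** p))
    return np.array([(x + 0.5) / 2 ** p, (y + 0.5) / 2 ** p])

def golden_ratio_sequence(n, seed=0.0):
    """x_i = {seed + i * phi}, cf. (6)-(8)."""
    phi = (1 + np.sqrt(5)) / 2
    return (seed + phi * np.arange(n)) % 1.0

# Estimate the integral (2) of a simple test integrand via the average (3).
f = lambda X: X[0] + X[1]                         # exact integral over [0, 1]^2 is 1
estimate = np.mean([f(hilbert_point(u)) for u in golden_ratio_sequence(1000)])
print(estimate)                                   # should be close to 1
```

Feeding the van der Corput generator above into hilbert_point instead of the golden ratio sequence produces the other construction shown in Fig. 3.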

3 Star Discrepancy

A key corollary of the strong irrationality of the golden ratio is that the set of coordi-
nates will not align on any regular grid in the golden ratio sequence. Therefore, we
could expect that irregularities in the generated sequence of point samples could be
advantageous in case the function to integrate contains regular alignments or self-
repeating structures. In order to compare their potential performance for integro-
approximation problems, we use the star discrepancy to measure the uniformity of
the resulting sequence P = (P_0, \ldots, P_{n-1}).
For a = (a_1, \ldots, a_d) \in [0, 1]^d, let [0, a) be the anchored box \prod_{i=1}^{d} [0, a_i). The
star discrepancy of P is

D_n^*(P) = \sup_{a \in [0,1)^d} \left| \frac{A(P, [0, a))}{n} - \lambda_d([0, a)) \right|     (9)

with the counting function A giving the number of points from the set P that belong
to [0, a) and λd being the d-dimensional Lebesgue measure, i.e., the area for d = 2.
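For small point sets in two dimensions, (9) can be estimated by restricting the supremum to corners lying on the grid spanned by the point coordinates (together with 1). This only yields a lower bound on the true star discrepancy, but it is a convenient check on small examples; the sketch below is our own and not an exact algorithm.

```python
import numpy as np

def star_discrepancy_2d(P):
    """Lower-bound scan of D_n^*(P) for an (n, 2) array of points in [0, 1)^2, cf. (9)."""
    n = len(P)
    xs = np.append(np.unique(P[:, 0]), 1.0)       # candidate corner coordinates a_1
    ys = np.append(np.unique(P[:, 1]), 1.0)       # candidate corner coordinates a_2
    worst = 0.0
    for a1 in xs:
        for a2 in ys:
            inside = np.sum((P[:, 0] < a1) & (P[:, 1] < a2))   # A(P, [0, a))
            worst = max(worst, abs(inside / n - a1 * a2))      # local discrepancy at a
    return worst
```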

Fig. 4 A comparison of the star discrepancies of the dyadic van der Corput (VDC) and the golden
ratio (GR) sequences. The dots are evaluated at n = 2^k, k = 1, \ldots, 12 for the VDC construction
and at n = F(k), k = 1, \ldots, 18 for the GR construction. The reference line is n^{-1}

It is possible to compute exactly the star discrepancy of some one-dimensional


sequences by Theorem 2.6 of [16]. It is also known that the star discrepancy of the van
der Corput sequence is O(n −1 log(n)), and the star discrepancy of the golden ratio
sequence is of the same order for n ≥ 2. Figure 4 compares the star discrepancies of
the van der Corput sequence and the golden ratio sequence. We observe that the star
discrepancies of the two sequences are slightly worse than O(n −1 ), which is in line
with the theoretical rate O(n −1+ε ) for any ε > 0.
Let F(k) be the Fibonacci sequence satisfying F(0) = 0, F(1) = 1 and F(k) =
F(k − 1) + F(k − 2) for k ≥ 2. It is of interest to investigate the star discrepancy of
P = {H (x0 ), . . . , H (xn−1 )} when n = F(k), k ≥ 1. We can show that if (xi )i≥0 is
the anchored (s = 0) golden ratio sequence, then each interval I j = [( j − 1)/n, j/n)
for j = 1, . . . , n, contains precisely one of the xi if n = F(k) for any k ≥ 1. This
follows from the proof of Theorem 3.3 in [16] in which we consider the point set P
with n i = 0 and z = φ or τ in that proof.
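This one-point-per-interval property at Fibonacci sample sizes is easy to verify numerically; the small check below (our own code, with the anchored sequence s = 0) illustrates it.

```python
import numpy as np

def one_per_interval(n, z=(np.sqrt(5) - 1) / 2):
    """True iff {i*z}, i = 0, ..., n-1, places exactly one point in each [(j-1)/n, j/n)."""
    bins = np.floor((np.arange(n) * z % 1.0) * n).astype(int)
    return len(np.unique(bins)) == n

fibonacci = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
print([one_per_interval(n) for n in fibonacci])   # all True, as stated above
print(one_per_interval(12))                       # False: 12 is not a Fibonacci number
```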
If we combine the above observation with Theorem 3.1 in [12], then we have the
following star discrepancy bound for P:

D_n^*(P) \le 4d\sqrt{d+3}\; n^{-1/d} + O(n^{-2/d})     (10)

with n = F(k), k ≥ 1.
From the result above, we can see that in most cases the star discrepancy of the
golden ratio sequence is smaller than that of the van der Corput sequence. It is also
of interest to compare the performance of the resulting point sequences P generated
by the van der Corput and golden ratio sequences. For the former, we can prove that
the star discrepancy of P is O(n −1/d ) [12].

More generally, for an arbitrary one-dimensional point set x0 , . . . , xn−1 in [0, 1],
the following result provides a bound for the star discrepancy of the resulting
d-dimensional point set P:
Theorem 1 Let x0 , . . . , xn−1 be n ≥ 1 points in [0, 1] and P = {H (x0 ), . . . ,
H (xn−1 )}. Then

D_n^*(P) \le c \left( D_n^*\left( \{x_i\}_{i=0}^{n-1} \right) \right)^{1/d}     (11)

for a constant c depending only on d.


Proof For the sake of simplicity we assume that the Hilbert curve starts at (0, \ldots, 0) \in
[0, 1]^d. Let m \ge 0 be an arbitrary integer and a \in [0, 1)^d be such that S_m^d(0) \subseteq
B := [0, a). Let S_m^B = \{W \in S_m^d : W \subseteq B\}, \tilde B = \cup S_m^B and D_m^B = \{W \in S_m^d :
(B \setminus \tilde B) \cap W \ne \emptyset\}. Then, let \tilde D_m^B be the set of \#D_m^B disjoint subsets of [0, 1]^d such
that

1. \forall\, \tilde W \in \tilde D_m^B, \exists\, W \in D_m^B \mid \tilde W \subseteq W, \quad 2. \cup \tilde D_m^B = \cup D_m^B, \quad 3. \tilde B \cap \{\cup \tilde D_m^B\} = \emptyset.     (12)

Note that \tilde D_m^B is obtained by removing boundaries of the elements in D_m^B such that
the above conditions 2 and 3 are satisfied. Then, we have

\left| \frac{A(P, B)}{n} - \lambda_d(B) \right| \le \left| \frac{A(P, \tilde B)}{n} - \lambda_d(\tilde B) \right| + \sum_{\tilde W \in \tilde D_m^B} \left| \frac{A(P, \tilde W \cap B)}{n} - \lambda_d(\tilde W \cap B) \right|.     (13)

To bound the first term on the right-hand side, let \tilde S_m^B = \{S_m^d(0)\} \cup \{S_m^d(k) \in
S_m^d, k \ge 1 such that S_m^d(k) \subseteq B, S_m^d(k-1) \cap B^c \ne \emptyset\}, so that \tilde B contains \#\tilde S_m^B
non-consecutive hypercubes belonging to S_m^d. By the property of the Hilbert curve,
consecutive hypercubes in S_m^d correspond to consecutive intervals in I_m^d (adja-
cency property). Therefore, h(\tilde B) contains at most \#\tilde S_m^B non-consecutive inter-
vals that belong to I_m^d, so that there exist disjoint closed intervals I_j \subset [0, 1], j =
1, \ldots, \#\tilde S_m^B + 1, such that h(\tilde B) = \cup_{j=1}^{\#\tilde S_m^B + 1} I_j. Hence, since the point set \{x_i\}_{i=0}^{n-1} is
in [0, 1) we have, using Proposition 2.4 of [16],

\left| \frac{A(P, \tilde B)}{n} - \lambda_d(\tilde B) \right| = \left| \frac{A(\{x_i\}, h(\tilde B))}{n} - \lambda_1(h(\tilde B)) \right| \le 2\left(\#\tilde S_m^B + 1\right) D^*\left( \{x_i\}_{i=0}^{n-1} \right).     (14)

To bound \#\tilde S_m^B, let m_1 \le m be the smallest positive integer such that S_{m_1}^d(0) \subseteq B
and let k_{m_1}^* be the maximal number of hypercubes in S_{m_1}^B. Note that k_{m_1}^* = 2^{m_1(d-1)}.
Indeed, by the definition of m_1, the only way for B to be made of more than one
hypercube in S_{m_1}^d is to stack such hypercubes in at most (d-1) dimensions; other-
wise, we can reduce m_1 to (m_1 - 1) due to the nesting property of the Hilbert curve.

In each dimension we can stack at most 2m 1 hypercubes that belong to SmB1 so that
km∗ 1 = 2m 1 (d−1) .
Let m 2 = (m 1 + 1) and Bm 2 = B\ ∪ SmB1 . Then,

Bm
#Sm 2 2 ≤ km∗ 2 := 2d 2m 2 (d−1) (15)

Bm
since, by construction, #Sm 2 2 is the number of hypercubes in Smd2 required to cover
the faces other than the ones that are along the axis of the hyperrectangle made by
the union of the hypercubes in SmB1 . This hyperrectangle has at most 2d faces of
dimension (d − 1). The volume of each face is smaller than 1 so that we need at
most 2m 2 (d−1) hypercubes in Smd2 to cover each face.
Bm Bm
More generally, for m 1 ≤ m k ≤ m, we define Bm k := Bm k−1 \ ∪ Sm k −1 k−1
and #Sm k k
∗ m k (d−1)
is bounded by km k := 2d2 . Note that, for any j = 1, . . . , k − 1, the union of
Bm
all hypercubes belonging to Sm j j forms a hyperrectangle having at most 2d faces
of dimension (d − 1). Therefore, since d ≥ 2, we have


m−1
2(m−m 1 )(d−1) − 1
#S˜mB ≤ km∗ + k ∗j = 2d 2m(d−1) + 2d 2m 1 (d−1) ≤ 4d 2m(d−1)
j=m 1
2d−1 − 1
(16)
so that

\left| \frac{A(P, \tilde B)}{n} - \lambda_d(\tilde B) \right| \le 2\left(1 + 4d\, 2^{m(d-1)}\right) D^*\left( \{x_i\}_{i=0}^{n-1} \right).     (17)

For the second term of (13), take \tilde W \in \tilde D_m^B and note that \tilde W \subseteq S_m^d(k) for a k \in
\{0, \ldots, 2^{dm} - 1\}. Then,

\left| \frac{A(P, \tilde W \cap B)}{n} - \lambda_d(\tilde W \cap B) \right| \le \frac{A(P, S_m^d(k))}{n} + \lambda_d(S_m^d(k))
 = \frac{A(\{x_i\}, I_m^d(k))}{n} + \lambda_1(I_m^d(k))     (18)
 \le 2\lambda_1(I_m^d(k)) + 2 D^*\left( \{x_i\}_{i=0}^{n-1} \right)
 = 2\left( 2^{-dm} + D^*\left( \{x_i\}_{i=0}^{n-1} \right) \right)

where the last inequality uses the fact that the x_i's are in [0, 1) as well as Proposition
2.4 in [16]. Thus,

\sum_{\tilde W \in \tilde D_m^B} \left| \frac{A(P, \tilde W \cap B)}{n} - \lambda_d(\tilde W \cap B) \right| \le 2d\, 2^{-m} + 2d\, 2^{m(d-1)} D^*\left( \{x_i\}_{i=0}^{n-1} \right)     (19)

since \#\tilde D_m^B = \#D_m^B \le d\, 2^{m(d-1)}, as we show in the following.
Indeed, by construction, \#D_m^B is the number of hypercubes in S_m^d required to
cover the faces other than the ones that are along the axis of the hyperrectangle made
by the union of the hypercubes in S_m^B. This hyperrectangle has d faces of dimension
(d-1) that are not along an axis. The volume of each face is smaller than 1 so that
we need at most 2^{(d-1)m} hypercubes in S_m^d to cover each face.
Hence, for all a \in [0, 1)^d such that S_m^d(0) \subseteq [0, a) we have

\left| \frac{A(P, [0, a))}{n} - \lambda_d([0, a)) \right| \le 2d\, 2^{-m} + D^*\left( \{x_i\}_{i=0}^{n-1} \right) \left( 2 + 10d\, 2^{m(d-1)} \right).     (20)

Finally, if a \in [0, 1)^d is such that S_m^d(0) \not\subseteq [0, a), we proceed exactly as above,
but now \tilde B is empty and therefore the first term in (13) disappears. To conclude
the proof, we choose the optimal value of m such that 2^{-m} \sim 2^{(d-1)m} D^*\left( \{x_i\}_{i=0}^{n-1} \right).
Hence, D^*(P) \le c \left( D^*\left( \{x_i\}_{i=0}^{n-1} \right) \right)^{1/d} for a constant c depending only on d. □

Compared to the result obtained for the van der Corput sequence, which only relies
on the Hölder property of the Hilbert curve [12], it is worth noting that Theorem 1
is based on its three key geometric properties: bijectivity, adjacency and nesting.
Theorem 1 is of key importance in this work as it says that the discrepancy of
the point set is monotonically related to the discrepancy of the generator sequence.
From this point of view, we can see that the star discrepancy of P generated by the
golden ratio sequence is O(n −1/d log(n)1/d ) for n ≥ 2. Numerical experiments will
compare the van der Corput and the golden ratio generator sequences and highlight
practical implications for computing the cubatures of four standard test functions.

4 Numerical Experiments

For the scrambled van der Corput sequences, the mean squared error (MSE) for
integration of Lipschitz continuous integrands is in O(n −1−2/d ) [12]. Additionally,
it is also shown in [12] that for discontinuous functions whose boundary of discon-
tinuities has bounded (d − 1)-dimensional Minkowski content, one can get an MSE
of O(n −1−1/d ). We will compare the two quasi-Monte Carlo constructions using
randomized sequences in our following numerical experiments.
We consider first two smooth functions that were studied in [17, 18] and are shown
in the first row of Fig. 5. The “Additive” function

f 1 (X ) = X 1 + X 2 , X = (X 1 , X 2 ) ∈ [0, 1]2 , (21)

and the “Smooth” function that is the exponential surface

f 2 (X ) = X 2 exp(X 1 X 2 ), X = (X 1 , X 2 ) ∈ [0, 1]2 . (22)



Fig. 5 The four test functions used for integro-approximation experiments. The smooth functions
on the first rows are fairly predictable as their variations are locally coherent. However, the functions
on the second row contain sharp changes that are difficult to capture with discrete sampling

This Lipschitz function in particular has infinitely many continuous derivatives.


It is known that for Lipschitz continuous functions, the scrambled van der Corput
sequence yields an MSE of O(n^{-2}\log(n)^2) for arbitrary sample size n \ge 2, and when
n = b^k, k = 1, \ldots, the MSE becomes O(n^{-2}) [12]. Figure 6 shows that the MSEs
for the randomized van der Corput and golden ratio sequences are nearly O(n^{-2}).
When n = 2^k, the van der Corput sequence performs better than the golden ratio
sequence. But in most cases with n \ne 2^k, the golden ratio sequence outperforms the
van der Corput sequence. In the plots, the dots are evaluated at n = 2^k, k = 1, \ldots, 12
for the VDC construction and at n = F(k), k = 1, \ldots, 18 for the GR construction.
The MSEs are computed based on 100 repetitions.
We consider now the examples in the second row of Fig. 5: the “Cusp” function

f 3 (X ) = max(X 1 + X 2 − 1, 0), X = (X 1 , X 2 ) ∈ [0, 1]2 , (23)

and the “Discontinuous” function that is the indicator

f 4 (X ) = 1{X 1 +X 2 >1} (X ), X = (X 1 , X 2 ) ∈ [0, 1]2 . (24)



Fig. 6 A comparison of the mean squared errors (MSEs) of the randomized van der Corput and
the golden ratio sequences for the smooth functions f 1 (top) and f 2 (bottom). The reference line is
n −2

In particular, the discontinuity boundary of this indicator function has finite


Minkowski content. This step function was previously studied with a van der Corput
generator sequence in [12]. It was found that for this function the scrambled van
der Corput sequence yields an MSE O(n −3/2 ) for arbitrary sample size n. Figure 7
shows that the MSEs for the randomized van der Corput and golden ratio sequences
are close to O(n −3/2 ). In most cases, the golden ratio sequence seems to outperform
the construction of quasi-random samples using the van der Corput sequence.

Fig. 7 A comparison of the mean squared errors (MSEs) of the randomized van der Corput sequence
and golden ratio sequences for the functions f 3 (top) and f 4 (bottom). The reference line is n −1.5
for the discontinuous step function and n −2 for the continuous function

5 Conclusions

This work evaluated the star discrepancy and squared integration error for two con-
structions of quasi-random points, using the Hilbert space-filling curve. We found that
using the fractional parts of integer multiples of the golden ratio number often leads to
improved results, especially when the number of samples is close to a Fibonacci num-
ber. The discrepancy of the point sets increases monotonically with the discrepancy
of the generator one-dimensional sequence; therefore the van der Corput sequence

leads to optimal results in the specific cases when the generating coordinates are
equally-spaced.
In future work, we plan to investigate generalizations of the Hilbert space-filling
curve in higher dimensions. A deterioration of the discrepancy is expected as the
dimension increases, an effect linked to the curse of dimensionality. Since the Hilbert
space-filling curve is accepted by a pseudo-inverse operator, the problem of con-
structing quasi-random samples is reduced to choosing a suitable generator one-
dimensional sequence. We therefore hope that the preliminary observations presented
here may spark subsequent research towards designing adapted generator sequences,
given specific integration problems at hand.

Acknowledgments The authors thank Art Owen for suggesting conducting the experimental com-
parisons presented here, his insightful discussions and his reviews of the manuscript.

References

1. Bader, M.: Space-Filling Curves—An Introduction with Applications in Scientific Computing.


Texts in Computational Science and Engineering, vol. 9. Springer, Berlin (2013)
2. Moon, B., Jagadish, H.V., Faloutsos, C., Saltz, J.H.: Analysis of the clustering properties of
Hilbert space-filling curve. Technical report, University of Maryland, College Park, MD, USA
(1996)
3. Terry, J., Stantic, B., Terenziani, P., Sattar, A.: Variable granularity space filling curve for index-
ing multidimensional data. In: Proceedings of the 15th International Conference on Advances
in Databases and Information Systems, ADBIS’11, pp. 111–124. Springer (2011)
4. Beckmann, N., Kriegel, H.P., Schneider, R., Seeger, B.: The R*-tree: an efficient and robust
access method for points and rectangles. In: Proceedings of the 1990 ACM SIGMOD Interna-
tional Conference on Management of Data, pp. 322–331 (1990)
5. Platzman, L.K., Bartholdi III, J.J.: Spacefilling curves and the planar travelling salesman prob-
lem. J. ACM 36(4), 719–737 (1989)
6. Steigleder, M., McCool, M.: Generalized stratified sampling using the Hilbert curve. J. Graph.
Tools 8(3), 41–47 (2003)
7. Gerber, M., Chopin, N.: Sequential quasi-Monte Carlo. J. R. Stat. Soc. Ser. B 77(3), 509–579
(2015)
8. Butz, A.: Alternative algorithm for Hilbert’s space-filling curve. IEEE Trans. Comput. 20(4),
424–426 (1971)
9. Jin, G., Mellor-Crummey, J.: SFCGen: a framework for efficient generation of multi-
dimensional space-filling curves by recursion. ACM Trans. Math. Softw. 31(1), 120–148 (2005)
10. Lawder, J.K.: Calculation of mappings between one and n-dimensional values using the Hilbert
space-filling curve. Research report BBKCS-00-01, University of London (2000)
11. Coxeter, H.S.M.: The golden section, phyllotaxis, and Wythoff’s game. Scr. Math. 19, 135–143
(1953)
12. He, Z., Owen, A.B.: Extensible grids: uniform sampling on a space-filling curve. e-print (2014)
13. Franek, V.: An algorithm for QMC integration using low-discrepancy lattice sets. Comment.
Math. Univ. Carolin 49(3), 447–462 (2008)
14. Schretter, C., Kobbelt, L., Dehaye, P.O.: Golden ratio sequences for low-discrepancy sampling.
J. Graph. Tools 16(2), 95–104 (2012)
15. Schretter, C., Niederreiter, H.: A direct inversion method for non-uniform quasi-random point
sequences. Monte Carlo Methods Appl. 19(1), 1–9 (2013)

16. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. SIAM,
Philadelphia (1992)
17. Sloan, I.H., Joe, S.: Lattice Methods for Multiple Integration. Clarendon Press, Oxford (1994)
18. Owen, A.B.: Local antithetic sampling with scrambled nets. Ann. Stat. 36(5), 2319–2343
(2008)
Uniform Weak Tractability
of Weighted Integration

Paweł Siedlecki

Abstract We study a relatively new notion of tractability called “uniform weak


tractability” that was recently introduced in (Siedlecki, J. Complex. 29:438–453,
2013 [5]). This notion holds for a multivariable problem iff the information com-
plexity n(ε, d) of its d-variate component to be solved to within ε is not an exponential
function of any positive power of ε−1 and/or d. We are interested in necessary and
sufficient conditions on uniform weak tractability for weighted integration. Weights
are used to control the “role” or “importance” of successive variables and groups
of variables. We consider here product weights. We present necessary and sufficient
conditions on product weights for uniform weak tractability for two Sobolev spaces
of functions defined over the whole Euclidean space with arbitrary smoothness,
and of functions defined over the unit cube with smoothness 1. We also briefly con-
sider (s, t)-weak tractability introduced in (Siedlecki and Weimar, J. Approx. Theory
200:227–258, 2015 [6]), and show that as long as t > 1 then this notion holds for
weighted integration defined over quite general tensor product Hilbert spaces with
arbitrary bounded product weights.

Keywords Tractability · Multivariate integration · Weighted integration

1 Introduction

There are many practical applications for which we need to approximate integrals of
multivariate functions. The number of variables d in many applications is huge. It is
desirable to know what is the minimal number of function evaluations that is needed
to approximate the integral to within ε and how this number depends on ε−1 and d.
In this paper we consider weighted integration. We restrict ourselves to prod-
uct weights which control the importance of successive variables and groups of
variables. We consider weighted integration defined over two Sobolev spaces. One
space consists of smooth functions defined over the whole Euclidean space, whereas

P. Siedlecki (B)
Faculty of Mathematics, Informatics and Mechanics, University of Warsaw,
Banacha 2, 02-097 Warszawa, Poland
e-mail: psiedlecki@mimuw.edu.pl

the second one is an anchored space of functions defined on the unit cube that are
once differentiable with respect to all variables.
We find necessary and sufficient conditions on product weights to obtain uniform
weak tractability for weighted integration. This problem is solved by first establishing
a relation between uniform weak tractability and so called T -tractability. Then we
apply known results on T -tractability from [4].
We compare necessary and sufficient conditions on uniform weak tractability with
the corresponding conditions on strong polynomial, polynomial, quasi-polynomial
and weak tractability. All these conditions require some specific decay of product
weights. For different notions of tractability the decay is usually different.
We also briefly consider (s, t)-weak tractability introduced recently in [6]. This
notion holds if the minimal number of function evaluations is not exponential in ε−s
and d t . We stress that now s and t can be arbitrary positive numbers. We show that as
long as t > 1 then weighted integration is (s, t)-weakly tractable for a general tensor
product Hilbert space whose reproducing univariate kernel is finitely integrable over
its diagonal. This means that as long as we accept a possibility of an exponential
dependence on d α with α < t then we do not need decaying product weights and
we may consider even the case where all product weights are the same.

2 Multivariate Integration

Assume that for every d \in \mathbb{N} we have a Borel measurable subset D_d of \mathbb{R}^d, and
\rho_d : D_d \to \mathbb{R}_+ is a Lebesgue probability density function, \int_{D_d} \rho_d(x)\, dx = 1. Let
F_d be a reproducing kernel Hilbert space of real integrable functions defined on a
common domain D_d with respect to the measure \mu_d(A) = \int_A \rho_d(x)\, dx defined on
all Borel subsets of D_d.
Multivariate integration is the problem INT = \{INT_d\} with

INT_d : F_d \to \mathbb{R} : f \mapsto \int_{D_d} f(x)\, \rho_d(x)\, dx

for every d \in \mathbb{N}.
We approximate INTd ( f ) for f ∈ Fd by algorithms which use only partial infor-
mation about f . The information about f consists of a finite number of function
values f (t j ) at sample points t j ∈ Dd . In general, the points t j can be chosen adap-
tively, that is the choice of t j may depend on f (ti ) for i = 1, 2, . . . , j − 1. The
approximation of INTd ( f ) is then

Q n,d ( f ) = φn ( f (t1 ), f (t2 ), . . . , f (tn ))

for some, not necessarily linear, function φn : Rn → R.


The worst case error of Q n,d is defined as

e(Q_{n,d}) = \sup_{\|f\|_{F_d} \le 1} |INT_d(f) - Q_{n,d}(f)|.

Since the use of adaptive information does not help, we can restrict ourselves to
considering only non-adaptive algorithms, i.e., the t_j can be given simultaneously, see
[1]. It is also known that the best approximations can be achieved by means of linear
functions, i.e., \phi_n can be chosen as a linear function. This is a result of Smolyak
which can be found in [1]. Therefore, without loss of generality, we only need to
consider non-adaptive and linear algorithms of the form


Q_{n,d}(f) = \sum_{j=1}^{n} a_j f(t_j)

for some a j ∈ R and for some t j ∈ Dd .


For ε ∈ (0, 1) and d ∈ N, the information complexity n(ε, INTd ) of the problem
INTd is defined as the minimal number n ∈ N for which there exists an algorithm
Q n,d with the worst case error at most ε CRId ,

n(ε, INTd ) = min{ n : ∃ Q n,d such that e(Q n,d ) ≤ ε CRId }.

Here, CRI_d = 1 if we consider the absolute error criterion, and CRI_d = \|INT_d\| if
we consider the normalized error criterion.

3 Generalized Tractability and Uniform Weak Tractability

We first remind the reader of the basic notions of tractability. For more details we
refer to [3] and references therein. Recall that a function

T : [1, ∞) × [1, ∞) → [1, ∞)

is called a generalized tractability function iff T is nondecreasing in each of its


arguments and
\lim_{x+y\to\infty} \frac{\ln T(x, y)}{x+y} = 0.

As in [2], we say that INT = {INTd } is T -tractable iff there are nonnegative numbers
C and t such that

n(ε, INTd ) ≤ C T (ε−1 , d)t ∀ ε ∈ (0, 1], d ∈ N.

We say that INT = {INTd } is strongly T -tractable iff there are nonnegative num-
bers C and t such that

n(ε, INTd ) ≤ C T (ε−1 , 1)t ∀ ε ∈ (0, 1], d ∈ N.



Examples of T -tractability include polynomial tractability (PT) and strong poly-


nomial tractability (SPT) if T (x, y) = x y, and quasi-polynomial tractability (QPT)
if T (x, y) = exp((1 + ln x)(1 + ln y)).
We say that INT = \{INT_d\} is weakly tractable (WT) iff

\lim_{\varepsilon^{-1}+d\to\infty} \frac{\ln n(\varepsilon, INT_d)}{\varepsilon^{-1} + d} = 0.

As in [5], we say that INT = {INTd } is uniformly weakly tractable (UWT) iff

\lim_{\varepsilon^{-1}+d\to\infty} \frac{\ln n(\varepsilon, INT_d)}{\varepsilon^{-\alpha} + d^{\beta}} = 0 \quad \forall\, \alpha, \beta \in (0, 1).

Here we adopt the convention that ln 0 = 0.


The following lemma gives a characterization of uniform weak tractability in
terms of a certain family of generalized tractability functions.
Lemma 1 For every α, β ∈ (0, 1) the function

T_{\alpha,\beta}(x, y) = \exp(x^\alpha + y^\beta) \quad \text{for all } x, y \in [1, \infty)

is a generalized tractability function. Moreover,


INT is uniformly weakly tractable iff INT is Tα,β -tractable for every α, β ∈ (0, 1).
Proof It is obvious that for every \alpha, \beta \in (0, 1) and fixed x, y \in [1, \infty)

T_{\alpha,\beta}(x, \cdot) : [1, \infty) \to [1, \infty) \quad \text{and} \quad T_{\alpha,\beta}(\cdot, y) : [1, \infty) \to [1, \infty)

are nondecreasing functions. Since for every \alpha, \beta \in (0, 1) we have

\lim_{x+y\to\infty} \frac{\ln T_{\alpha,\beta}(x, y)}{x+y} = \lim_{x+y\to\infty} \frac{x^\alpha + y^\beta}{x+y} = 0,

it follows that T_{\alpha,\beta} is a generalized tractability function for every \alpha, \beta \in (0, 1).
Suppose that INT is uniformly weakly tractable, i.e.,

\lim_{\varepsilon^{-1}+d\to\infty} \frac{\ln n(\varepsilon, INT_d)}{\varepsilon^{-\alpha} + d^\beta} = 0 \quad \forall\, \alpha, \beta > 0.

Thus, for arbitrary but fixed α, β ∈ (0, 1), there exists t > 0 such that

ln n(ε, INTd ) ≤ t (ε−α + d β ) ∀ ε ∈ (0, 1], d ∈ N.

Hence

n(\varepsilon, INT_d) \le \left( \exp(\varepsilon^{-\alpha} + d^\beta) \right)^t \quad \forall\, \varepsilon \in (0, 1],\ d \in \mathbb{N}.

Therefore the problem INT is T_{\alpha,\beta}-tractable for all \alpha, \beta \in (0, 1).



Assume now that INT is Tα,β -tractable for every α, β ∈ (0, 1). That is, for all
α, β ∈ (0, 1) there are positive C(α, β) and t (α, β) such that
 
n(\varepsilon, INT_d) \le C(\alpha, \beta) \exp\left( t(\alpha, \beta)\, (\varepsilon^{-\alpha} + d^\beta) \right) \quad \forall\, \varepsilon \in (0, 1],\ d \in \mathbb{N}.

Take now arbitrary positive α and β which may be larger than 1. Obviously there
exist α0 , β0 ∈ (0, 1) such that α0 < α and β0 < β. Since INTd is Tα0 ,β0 -tractable then

\lim_{\varepsilon^{-1}+d\to\infty} \frac{\ln n(\varepsilon, INT_d)}{\varepsilon^{-\alpha} + d^\beta} \le \lim_{\varepsilon^{-1}+d\to\infty} \frac{\ln C(\alpha_0, \beta_0) + t(\alpha_0, \beta_0)(\varepsilon^{-\alpha_0} + d^{\beta_0})}{\varepsilon^{-\alpha} + d^\beta} = 0.

Since the choice of α, β > 0 was arbitrary, we conclude that

\lim_{\varepsilon^{-1}+d\to\infty} \frac{\ln n(\varepsilon, INT_d)}{\varepsilon^{-\alpha} + d^\beta} = 0 \quad \forall\, \alpha, \beta > 0,

and the problem INT is uniformly weakly tractable, as claimed. 

We add that Lemma 1 holds not only for multivariate integration but also for all
multivariate problems.

4 Weighted Sobolev Spaces Over Unbounded Domain

In this section we specify the class Fd,γ as a weighted Sobolev space of smooth
functions f : Rd → R. More precisely, assume that a set of weights γ =
{γd,u }d∈N,u⊂1,2,...,d , with γd,u ≥ 0, is given. Then for r ∈ N, Fd = H (K d ) is a repro-
ducing kernel Hilbert space whose reproducing kernel is of the form
 
K_{d,\gamma}(x, t) = \sum_{u \subseteq \{1, 2, \ldots, d\}} \gamma_{d,u} \prod_{j \in u} R(x_j, t_j)

where
R(x, t) = 1_M(x, t) \int_0^{\infty} \frac{(|x| - z)_+^{r-1} (|t| - z)_+^{r-1}}{[(r-1)!]^2} \, dz \quad \text{for } x, t \in \mathbb{R},

and
M = {(x, t) ∈ R2 : xt ≥ 0}.

We assume that the weights γ are bounded product weights, i.e., γd,∅ = 1 and

\gamma_{d,u} = \prod_{j \in u} \gamma_{d,j} \quad \text{for non-empty } u \subseteq \{1, 2, \ldots, d\}     (1)

where γd, j satisfy


0 ≤ γd, j < Γ

for some positive number Γ .


The weighted integration problem INTγ = {INTd,γ } is given as in [4, Sect. 12.4.2]:

INT_{d,\gamma} : F_{d,\gamma} \to \mathbb{R} : f \mapsto \int_{\mathbb{R}^d} f(t_1, t_2, \ldots, t_d)\, \rho(t_1)\rho(t_2)\cdots\rho(t_d)\, dt,

where ρ : R → R is a non-negative function satisfying


 
\int_{\mathbb{R}} \rho(t)\, dt = 1 \quad \text{and} \quad \int_{\mathbb{R}} \rho(t)\, |t|^{r - 1/2}\, dt < \infty.

Theorem 1 Consider weighted integration problem INTγ for bounded product


weights. Assume that

ρ(t) ≥ c > 0 for t ∈ [a, b] for some a, b and c with a < b.

Then for both the absolute and normalized error criteria


INT_\gamma is uniformly weakly tractable iff \lim_{d\to\infty} \frac{\sum_{j=1}^{d} \gamma_{d,j}}{d^\alpha} = 0 for all \alpha > 0.

Proof Lemma 1 implies that it is sufficient to prove that INTγ is Tα,β -tractable for
every α, β ∈ (0, 1). Here Tα,β is defined as in Sect. 3. From [4, Corollary 12.4] we
know that INTγ is Tα,β -tractable iff the following two conditions hold:

\limsup_{\varepsilon\to 0} \frac{\ln \varepsilon^{-1}}{\ln T_{\alpha,\beta}(\varepsilon^{-1}, 1)} < \infty,     (2)

\lim_{\varepsilon\to 1^-} \limsup_{d\to\infty} \frac{\sum_{j=1}^{d} \gamma_{d,j}}{\ln T_{\alpha,\beta}(\varepsilon^{-1}, d)} < \infty.     (3)

Since

\lim_{\varepsilon\to 0} \frac{\ln \varepsilon^{-1}}{\ln T_{\alpha,\beta}(\varepsilon^{-1}, 1)} = \lim_{\varepsilon\to 0} \frac{\ln \varepsilon^{-1}}{\varepsilon^{-\alpha} + 1} = 0,

the first condition is satisfied for every α, β ∈ (0, 1) regardless of the choice of
weights γ . Note that for the second condition on Tα,β -tractability we have the fol-
lowing equivalence:
\lim_{\varepsilon\to 1^-} \limsup_{d\to\infty} \frac{\sum_{j=1}^{d} \gamma_{d,j}}{\varepsilon^{-\alpha} + d^\beta} < \infty \iff \limsup_{d\to\infty} \frac{\sum_{j=1}^{d} \gamma_{d,j}}{d^\beta} < \infty.

Therefore the weighted integration INTγ is uniformly weakly tractable iff


\limsup_{d\to\infty} \frac{\sum_{j=1}^{d} \gamma_{d,j}}{d^\beta} < \infty \quad \forall\, \beta \in (0, 1).     (4)

Note that the last condition holds iff


\lim_{d\to\infty} \frac{\sum_{j=1}^{d} \gamma_{d,j}}{d^\alpha} = 0 \quad \forall\, \alpha > 0.     (5)

Indeed, suppose that (4) holds. Obviously, it is enough to consider arbitrary α ∈


(0, 1). Then we take β = α/2, which also belongs to (0, 1), and
0 \le \lim_{d\to\infty} \frac{\sum_{j=1}^{d} \gamma_{d,j}}{d^\alpha} = \lim_{d\to\infty} \frac{1}{d^\beta} \cdot \frac{\sum_{j=1}^{d} \gamma_{d,j}}{d^\beta} \le \lim_{d\to\infty} \frac{1}{d^\beta} \cdot \limsup_{d\to\infty} \frac{\sum_{j=1}^{d} \gamma_{d,j}}{d^\beta} = 0.

Since (5) obviously implies (4) we have shown that the weighted integration INTγ
is uniformly weakly tractable iff the condition (5) is satisfied. 
After obtaining a necessary and sufficient condition on uniform weak tractability
of the weighted integration INTγ it is interesting to compare it with conditions on
other types of tractability, which were obtained in [4, Corollary 12.4].

The weighted integration INT_\gamma is:

strongly polynomially tractable \iff \limsup_{d\to\infty} \sum_{j=1}^{d} \gamma_{d,j} < \infty,

polynomially tractable \iff \limsup_{d\to\infty} \frac{\sum_{j=1}^{d} \gamma_{d,j}}{\ln d} < \infty,

quasi-polynomially tractable \iff \limsup_{d\to\infty} \frac{\sum_{j=1}^{d} \gamma_{d,j}}{\ln d} < \infty,

uniformly weakly tractable \iff \lim_{d\to\infty} \frac{\sum_{j=1}^{d} \gamma_{d,j}}{d^\alpha} = 0 \quad \forall\, \alpha > 0,

weakly tractable \iff \lim_{d\to\infty} \frac{\sum_{j=1}^{d} \gamma_{d,j}}{d} = 0.

Note that depending on the weights γ , the weighted integration INTγ can satisfy
one or some types of tractability.
• Let \gamma_{d,j} = \frac{1}{j^\beta} for \beta > 0. Then the weighted integration INT_\gamma is:
– strongly polynomially tractable iff β > 1,
– polynomially tractable, but not strongly polynomially tractable, iff β = 1,
– weakly tractable, but not uniformly weakly tractable, if β < 1.

• Let \gamma_{d,j} = \frac{[\ln(j+1)]^\beta}{j} for \beta \in \mathbb{R}. Then the weighted integration INT_\gamma is uniformly
weakly tractable, but not polynomially tractable.
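The classification in these examples is driven entirely by how fast the sum of the weights grows with d; the short numerical check below (our own code, with hypothetical parameter choices) makes the three regimes of the first example visible.

```python
import numpy as np

def weight_sum(d, beta):
    """Sum of the product-weight generators gamma_{d,j} = j**(-beta) for j = 1, ..., d."""
    j = np.arange(1, d + 1)
    return float(np.sum(j ** (-beta)))

for beta in (2.0, 1.0, 0.5):
    print(beta, [round(weight_sum(d, beta), 1) for d in (10, 100, 1000, 10000)])
# beta = 2.0: bounded sum           -> strong polynomial tractability
# beta = 1.0: grows like ln d       -> polynomial tractability
# beta = 0.5: grows like 2*sqrt(d)  -> weak but not uniform weak tractability
```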

5 Weighted Anchored Sobolev Spaces

In this section we specify the class Fd,γ as a weighted anchored Sobolev space of
functions f : [0, 1]d → R that are once differentiable with respect to each variable.
More precisely, assume that a set of weights γ = {γd,u }d∈N,u⊂1,2,...,d , with γd,u ≥ 0,
is given. Then Fd = H (K d ) is a reproducing kernel Hilbert space whose reproducing
kernel is of the form
 
K_{d,\gamma}(x, t) = \sum_{u \subseteq \{1, 2, \ldots, d\}} \gamma_{d,u} \prod_{j \in u} R(x_j, t_j)

where

R(x, t) = 1_M(x, t)\, \min(|x - a|, |t - a|) \quad \text{for } x, t \in [0, 1],

for some a ∈ [0, 1] and

M = {(x, t) ∈ [0, 1]2 : (x − a)(t − a) ≥ 0}.

We assume that the weights γ are product weights, i.e., γd,∅ = 1 and

\gamma_{d,u} = \prod_{j \in u} \gamma_{d,j} \quad \text{for non-empty } u \subseteq \{1, 2, \ldots, d\}

for non-negative γd, j .


The weighted integration problem INTγ = {INTd,γ } is given as in [4, Sect. 12.6.1]:

INT_{d,\gamma} : F_{d,\gamma} \to \mathbb{R} : f \mapsto \int_{[0,1]^d} f(t)\, dt.

Theorem 2 Consider weighted integration problem INTγ for product weights. Then
for both the absolute and normalized error criteria

INT_\gamma is uniformly weakly tractable iff \lim_{d\to\infty} \frac{\sum_{j=1}^{d} \gamma_{d,j}}{d^\alpha} = 0 for all \alpha > 0.

Proof Again, applying Lemma 1 it is enough to verify Tα,β -tractability for all α, β ∈
(0, 1). From [4, Corollary 12.11] we know that conditions on Tα,β -tractability of
the weighted integration INTγ have the same form as those used in the proof of

Theorem 1. Therefore we can repeat the reasoning used in the proof of Theorem 1 to
obtain the same condition on uniform weak tractability of the presently considered
weighted integration problem. 

6 (s, t)-Weak Tractability with t > 1

As in [6], by (s, t)-weak tractability of the integration INT for positive s and t we
mean that
\lim_{\varepsilon^{-1}+d\to\infty} \frac{\ln n(\varepsilon, INT_d)}{\varepsilon^{-s} + d^t} = 0.

We now prove that (s, t)-weak tractability for any s > 0 and t > 1 holds for
weighted integration defined over quite general tensor product Hilbert spaces
equipped with bounded product weights γ . More precisely, let D be a Borel subset
 real line R and ρ : D → R+ be a Lebesgue probability density function on
of the
D, D ρ(x)d x = 1. Let H (K ) be an arbitrary reproducing kernel Hilbert space of
integrable real functions defined on D with the kernel K : D × D → R such that

K (x, x)ρ(x)d x < ∞. (6)
D

Let γ be a set of bounded product weights defined as in Sect. 4, see (1).


For d ∈ N and j = 1, 2, . . . , d, let

K_{1,\gamma_{d,j}}(x, y) = 1 + \gamma_{d,j} K(x, y) \quad \text{for } x, y \in D

and

F_{d,\gamma} = \bigotimes_{j=1}^{d} H(K_{1,\gamma_{d,j}}).

The weighted integration problem INTγ = {INTd,γ } is now given as



INTd,γ : Fd,γ → R : f → f (x1 , x2 , . . . , xd )ρ(x1 )ρ(x2 ) · · · ρ(xd )d x.
Dd

It is well known that
$$\|\mathrm{INT}_{d,\gamma}\| = \prod_{j=1}^{d}\Big(1 + \gamma_{d,j}\int_{D^2} K(x,t)\rho(x)\rho(t)\,dx\,dt\Big)^{1/2}.$$
Hence, $\|\mathrm{INT}_{d,\gamma}\| \ge 1$ and the absolute error criterion is harder than the normalized error criterion.
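As a small illustration (ours, not from the paper), the product formula above is easy to evaluate once the double integral of the kernel is known; the sketch below assumes $D = [0,1]$, $\rho \equiv 1$, the Brownian-motion kernel $K(x,t) = \min(x,t)$ (so the double integral equals $1/3$), and the illustrative product weights $\gamma_{d,j} = j^{-2}$.

```python
# Sketch (our illustration, not from the paper) of the initial error ||INT_{d,gamma}||
# via the displayed product formula, assuming D = [0,1], rho = 1, K(x,t) = min(x,t)
# (double integral 1/3) and summable product weights gamma_{d,j} = 1/j^2.
import numpy as np

def initial_error(d, kernel_double_integral=1.0 / 3.0):
    gamma = 1.0 / np.arange(1, d + 1) ** 2
    return float(np.prod(np.sqrt(1.0 + gamma * kernel_double_integral)))

for d in [1, 10, 100, 1000]:
    print(d, initial_error(d))   # stays bounded because the weights are summable
```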

Theorem 3 Consider the weighted integration problem $\mathrm{INT}_\gamma$ for bounded product weights. If $s > 0$ and $t > 1$, then for both the absolute and normalized error criteria $\mathrm{INT}_\gamma = \{\mathrm{INT}_{d,\gamma}\}$ is (s, t)-weakly tractable.

Proof It is well known, see e.g. [4, p. 102], that
$$n(\varepsilon, \mathrm{INT}_{d,\gamma}) \le \Big\lceil \frac{1}{\varepsilon^2}\prod_{j=1}^{d}\Big(1 + \gamma_{d,j}\int_D K(x,x)\rho(x)\,dx\Big)\Big\rceil.$$
From this it follows that
$$0 \le \lim_{\varepsilon^{-1}+d\to\infty}\frac{\ln n(\varepsilon, \mathrm{INT}_{d,\gamma})}{\varepsilon^{-s}+d^{\,t}} \le \lim_{\varepsilon^{-1}+d\to\infty}\Big(\frac{2\ln\varepsilon^{-1}}{\varepsilon^{-s}+d^{\,t}} + \frac{\big(\int_D K(x,x)\rho(x)\,dx\big)\sum_{j=1}^{d}\gamma_{d,j}}{\varepsilon^{-s}+d^{\,t}}\Big) \le \lim_{\varepsilon^{-1}+d\to\infty}\Big(\frac{2\ln\varepsilon^{-1}}{\varepsilon^{-s}+d^{\,t}} + \frac{\big(\int_D K(x,x)\rho(x)\,dx\big)\, d\,\Gamma}{\varepsilon^{-s}+d^{\,t}}\Big) = 0$$
for every $s > 0$ and $t > 1$, where $\Gamma$ denotes a bound on the product weights $\gamma_{d,j}$. Hence, we have (s, t)-weak tractability for $\mathrm{INT}_\gamma$. □

From Theorems 1, 2 and 3 we see that strong polynomial, polynomial and weak tractability for weighted integration require some decay conditions on the product weights even for specific Hilbert spaces, whereas (s, t)-weak tractability for $t > 1$, which is the weakest notion of tractability considered here, holds for all bounded product weights and for general tensor product Hilbert spaces for which the univariate reproducing kernel satisfies (6).

Acknowledgments I would like to thank Henryk Woźniakowski for his valuable suggestions. This
project was financed by the National Science Centre of Poland based on the decision number DEC-
2012/07/N/ST1/03200. I gratefully acknowledge the support of ICERM during the preparation of
this manuscript.

References

1. Bakhvalov, N.S.: On the optimality of linear methods for operator approximation in convex
classes of functions. USSR Comput. Math. Math. Phys. 11, 244–249 (1971)
2. Gnewuch, M., Woźniakowski, H.: Quasi-polynomial tractability. J. Complex. 27, 312–330
(2011)
3. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems, vol. I. European Mathe-
matical Society, Zürich (2008)

4. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems, Volume II: Standard Information for Functionals. European Mathematical Society, Zürich (2010)
5. Siedlecki, P.: Uniform weak tractability. J. Complex. 29, 438–453 (2013)
6. Siedlecki, P., Weimar, M.: Notes on (s, t)-weak tractability: a refined classification of problems
with (sub)exponential information complexity. J. Approx. Theory 200, 227–258 (2015)
Incremental Greedy Algorithm
and Its Applications in Numerical
Integration

Vladimir Temlyakov

Abstract Applications of the Incremental Algorithm, which was developed in the


theory of greedy algorithms in Banach spaces, to approximation and numerical inte-
gration are discussed. In particular, it is shown that the Incremental Algorithm pro-
vides an efficient way for deterministic construction of cubature formulas with equal
weights, which give a good rate of error decay for a wide variety of function classes.

Keywords Greedy algorithm · Discrepancy · Approximation

1 Introduction

The paper provides some progress in the fundamental problem of algorithmic con-
struction of good methods of approximation and numerical integration. Numerical
integration seeks good ways of approximating an integral

$$\int_\Omega f(x)\,d\mu$$
by an expression of the form
$$\Lambda_m(f,\xi) := \sum_{j=1}^{m} \lambda_j f(\xi^j), \qquad \xi = (\xi^1,\dots,\xi^m),\ \xi^j\in\Omega,\ j=1,\dots,m. \qquad (1)$$

It is clear that we must assume that f is integrable and defined at the points ξ 1 , . . . , ξ m .
The expression (1) is called a cubature formula (Λ, ξ ) (if Ω ⊂ Rd , d ≥ 2) or a
quadrature formula (Λ, ξ ) (if Ω ⊂ R) with knots ξ = (ξ 1 , . . . , ξ m ) and weights

V. Temlyakov (B)
University of South Carolina, Columbia, SC, USA
e-mail: temlyakovv@gmail.com
V. Temlyakov
Steklov Institute of Mathematics, Moscow, Russia

$\Lambda = (\lambda_1, \dots, \lambda_m)$. For a function class W we introduce a concept of error of the cubature formula $\Lambda_m(\cdot, \xi)$ by
$$\Lambda_m(W,\xi) := \sup_{f\in W}\Big|\int_\Omega f\,d\mu - \Lambda_m(f,\xi)\Big|. \qquad (2)$$

There are many different ways to construct good deterministic cubature formulas, beginning with a heuristic guess of good knots for a specific class and ending with finding a good cubature formula as a solution (approximate solution) of the optimization problem
$$\inf_{\xi^1,\dots,\xi^m;\ \lambda_1,\dots,\lambda_m} \Lambda_m(W,\xi).$$

Clearly, the way of solving the above optimization problem is the preferable one.
However, in many cases this problem is very hard (see a discussion in [11]). It was
observed in [10] that greedy-type algorithms provide an efficient way for determin-
istic constructions of good cubature formulas for a wide variety of function classes.
This paper is a follow up to [10]. In this paper we discuss in detail a greedy-type
algorithm—Incremental Algorithm—that was not discussed in [10]. The main advan-
tage of the Incremental Algorithm over the greedy-type algorithms considered in [10]
is that it provides better control of weights of the cubature formula and gives the same
rate of decay of the integration error.
We recall some notation from the theory of greedy approximation in Banach spaces. The reader can find a systematic presentation of this theory in [12], Chap. 6. Let X be a Banach space with norm $\|\cdot\|$. We say that a set of elements (functions) D from X is a dictionary if each $g \in D$ has norm less than or equal to one ($\|g\| \le 1$) and the closure of span D coincides with X. We note that in [9] we required in the definition of a dictionary normalization of its elements ($\|g\| = 1$). However, it is pointed out in [11] that it is easy to check that the arguments from [9] work under the assumption $\|g\| \le 1$ instead of $\|g\| = 1$. In applications it is more convenient for us to have the assumption $\|g\| \le 1$ than normalization of a dictionary.
For an element $f \in X$ we denote by $F_f$ a norming (peak) functional for f:
$$\|F_f\| = 1, \qquad F_f(f) = \|f\|.$$

The existence of such a functional is guaranteed by the Hahn-Banach theorem.


We proceed to the Incremental Greedy Algorithm (see [11] and [12], Chap. 6). Let $\varepsilon = \{\varepsilon_n\}_{n=1}^{\infty}$, $\varepsilon_n > 0$, $n = 1, 2, \dots$. For a Banach space X and a dictionary D define the following algorithm IA($\varepsilon$) := IA($\varepsilon$, X, D).
Incremental Algorithm with schedule $\varepsilon$ (IA($\varepsilon$, X, D)). Denote $f_0^{i,\varepsilon} := f$ and $G_0^{i,\varepsilon} := 0$. Then, for each $m \ge 1$ we have the following inductive definition.
(1) $\varphi_m^{i,\varepsilon} \in D$ is any element satisfying
$$F_{f_{m-1}^{i,\varepsilon}}\big(\varphi_m^{i,\varepsilon} - f\big) \ge -\varepsilon_m.$$
(2) Define
$$G_m^{i,\varepsilon} := (1 - 1/m)\,G_{m-1}^{i,\varepsilon} + \varphi_m^{i,\varepsilon}/m.$$
(3) Let
$$f_m^{i,\varepsilon} := f - G_m^{i,\varepsilon}.$$
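The following toy sketch (ours, not from the paper) illustrates the update rule of the IA($\varepsilon$) just defined in the simplest possible setting: a finite-dimensional Euclidean space, where the norming functional of the residual is explicit. It only shows the mechanics of steps (1)-(3); the spaces used later in the paper are $L_p$ spaces, where the peak functional has the form given in Sect. 3. All names below are ours.

```python
# Minimal sketch of IA(eps) in a finite-dimensional Euclidean (l2) setting; it only
# illustrates the prescribed update G_m = (1 - 1/m) G_{m-1} + phi_m / m.
import numpy as np

def incremental_algorithm(f, dictionary, m_max):
    """f: target vector in the convex hull of the dictionary; elements have norm <= 1."""
    G = np.zeros_like(f)
    for m in range(1, m_max + 1):
        r = f - G                        # residual f_{m-1}
        F = r / np.linalg.norm(r)        # norming functional of the residual (l2 case)
        # step (1): the exact maximizer of F(g) over D trivially satisfies
        # F(phi - f) >= -eps_m for any eps_m >= 0, by Condition B
        phi = max(dictionary, key=lambda g: F @ g)
        # steps (2)-(3): convex update with the a priori fixed weight 1/m
        G = (1.0 - 1.0 / m) * G + phi / m
    return G

# toy usage: f is a convex combination of dictionary elements, so Condition B holds
rng = np.random.default_rng(0)
D = [v / np.linalg.norm(v) for v in rng.standard_normal((50, 20))]
w = rng.random(50); w /= w.sum()
f = sum(wi * gi for wi, gi in zip(w, D))
G = incremental_algorithm(f, D, m_max=200)
print(np.linalg.norm(f - G))             # decays roughly like m^{-1/2}
```

Note that, exactly as in the theorem below, all nonzero coefficients of the resulting approximant are of the form $a/m$ with a a natural number.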

We show how the Incremental Algorithm can be used in approximation and


numerical integration. We begin with a discussion of the approximation problem. A
detailed discussion, including historical remarks, is presented in Sect. 2. For simplic-
ity, we illustrate how the Incremental Algorithm works in approximation of univariate
trigonometric polynomials.
An expression
$$\sum_{j=1}^{m} c_j g_j, \qquad g_j \in D,\ c_j \in \mathbb{R},\ j = 1,\dots,m,$$
is called an m-term polynomial with respect to D. The concept of best m-term approximation with respect to D,
$$\sigma_m(f, D)_X := \inf_{\{c_j\},\{g_j \in D\}} \Big\| f - \sum_{j=1}^{m} c_j g_j \Big\|_X,$$

plays an important role in our consideration.


By RT(N) we denote the set of real 1-periodic trigonometric polynomials of order N and by $RT_N$ we denote the real trigonometric system
$$1,\ \cos 2\pi x,\ \sin 2\pi x,\ \dots,\ \cos N2\pi x,\ \sin N2\pi x.$$
For a real trigonometric polynomial denote
$$\Big\| a_0 + \sum_{k=1}^{N} (a_k \cos k2\pi x + b_k \sin k2\pi x) \Big\|_A := |a_0| + \sum_{k=1}^{N} (|a_k| + |b_k|).$$

We formulate here a result from [11]. We use the short notation $\|\cdot\|_p := \|\cdot\|_{L_p([0,1])}$.
Theorem 1 There exists a constructive method A(N, m) such that for any $t \in RT(N)$ it provides an m-term trigonometric polynomial A(N, m)(t) with the following approximation property
$$\|t - A(N,m)(t)\|_\infty \le C m^{-1/2}\big(\ln(1 + N/m)\big)^{1/2}\|t\|_A$$
with an absolute constant C.

An advantage of the IA($\varepsilon$) over other greedy-type algorithms is that the IA($\varepsilon$) gives precise control of the coefficients of the approximant. For all approximants $G_m^{i,\varepsilon}$ we have the property $\|G_m^{i,\varepsilon}\|_A = 1$. Moreover, we know that all nonzero coefficients of the approximant have the form a/m where a is a natural number. In Sect. 2 we prove the following result.
Theorem 2 For any $t \in RT(N)$ the IA($\varepsilon$, $L_p$, $RT_N$) with an appropriate schedule $\varepsilon$, applied to $f := t/\|t\|_A$, provides after m iterations an m-term trigonometric polynomial $G_m(t) := G_m^{i,\varepsilon}(f)\,\|t\|_A$ with the following approximation property
$$\|t - G_m(t)\|_\infty \le C m^{-1/2}(\ln N)^{1/2}\|t\|_A, \qquad \|G_m(t)\|_A = \|t\|_A,$$
with an absolute constant C.


Comparing Theorems 1 and 2 we see that the error bound in Theorem 1 is better than in Theorem 2—$\ln(1 + N/m)$ versus $\ln N$. It is important in applications in the m-term approximation of smoothness classes. The proof of Theorem 1 is based on the Weak Chebyshev Greedy Algorithm (WCGA). The WCGA is the most powerful, and in applications the most popular, greedy-type algorithm. Its Hilbert space version is known in signal processing under the name Weak Orthogonal Matching Pursuit. For this reason, for the reader's convenience, we discuss the WCGA in some detail in Sect. 2 despite the fact that we do not obtain any new results on the WCGA in this paper.
We note that the implementation of the IA($\varepsilon$) depends on the dictionary and the ambient space X. The IA($\varepsilon$) from Theorem 2 acts with respect to the real trigonometric system $1, \cos 2\pi x, \sin 2\pi x, \dots, \cos N2\pi x, \sin N2\pi x$ in the space $X = L_p$ with $p \asymp \ln N$. The relation $p \asymp \ln N$ means that there are two positive constants $C_1$ and $C_2$, which do not depend on N, such that $C_1 \ln N \le p \le C_2 \ln N$.
We now proceed to results from Sect. 3 on numerical integration. As in [10] we define a set $\mathcal{K}_q$ of kernels possessing the following properties. Let $K(x,y)$ be a measurable function on $\Omega_x \times \Omega_y$. We assume that for any $x \in \Omega_x$ we have $K(x,\cdot) \in L_q(\Omega_y)$, that for any $y \in \Omega_y$ the function $K(\cdot,y)$ is integrable over $\Omega_x$, and that $\int_{\Omega_x} K(x,\cdot)\,dx \in L_q(\Omega_y)$, $1 \le q \le \infty$.
For a kernel $K \in \mathcal{K}_{p'}$ (here $p' := p/(p-1)$ denotes the dual exponent) we define the class
$$W_p^K := \Big\{ f :\ f = \int_{\Omega_y} K(x,y)\varphi(y)\,dy,\ \ \|\varphi\|_{L_p(\Omega_y)} \le 1 \Big\}, \qquad 1 \le p \le \infty.$$
Then each $f \in W_p^K$ is integrable on $\Omega_x$ (by Fubini's theorem) and defined at each point of $\Omega_x$. We denote for convenience
$$J(y) := J_K(y) := \int_{\Omega_x} K(x,y)\,dx.$$

For $p \in [1, \infty]$ denote the dual exponent $p' := p/(p-1)$. Consider a dictionary
$$D := \{K(x,\cdot),\ x \in \Omega_x\}$$
and define a Banach space $X(K, p')$ as the $L_{p'}(\Omega_y)$-closure of the span of D. In Sect. 3 the following theorem is proved.
Theorem 3 Let $W_p^K$ be a class of functions defined above. Assume that $K \in \mathcal{K}_{p'}$ satisfies the condition
$$\|K(x,\cdot)\|_{L_{p'}(\Omega_y)} \le 1, \quad x \in \Omega_x, \qquad |\Omega_x| = 1,$$
and $J_K \in X(K, p')$. Then for any m there exists (provided by an appropriate Incremental Algorithm) a cubature formula $\Lambda_m(\cdot,\xi)$ with $\lambda_\mu = 1/m$, $\mu = 1, 2, \dots, m$, and
$$\Lambda_m(W_p^K, \xi) \le C(p-1)^{-1/2} m^{-1/2}, \qquad 1 < p \le 2.$$

Theorem 3 provides a constructive way of finding, for a wide variety of classes $W_p^K$, cubature formulas that give an error bound similar to that of the Monte Carlo method. We stress that in Theorem 3 we do not assume any smoothness of the kernel $K(x,y)$.

2 Approximation by the Incremental Algorithm

First, we discuss the known Theorem 1 from the Introduction. The proof of Theorem
1 is based on a greedy-type algorithm—the Weak Chebyshev Greedy Algorithm. We
now describe it. Let τ := {tk }∞
k=1 be a given sequence of nonnegative numbers tk ≤ 1,
k = 1, . . . . We define (see [9]) the Weak Chebyshev Greedy Algorithm (WCGA)
that is a generalization for Banach spaces of Weak Orthogonal Greedy Algorithm
defined and studied in [8] (see also [12]).
Weak Chebyshev Greedy Algorithm (WCGA). We define $f_0^c := f_0^{c,\tau} := f$. Then for each $m \ge 1$ we inductively define:
(1) $\varphi_m^c := \varphi_m^{c,\tau} \in D$ is any element satisfying
$$\big|F_{f_{m-1}^c}(\varphi_m^c)\big| \ge t_m \sup_{g\in D} \big|F_{f_{m-1}^c}(g)\big|.$$
(2) Define
$$\Phi_m := \Phi_m^\tau := \operatorname{span}\{\varphi_j^c\}_{j=1}^m,$$
and define $G_m^c := G_m^{c,\tau}$ to be the best approximant to f from $\Phi_m$.
(3) Denote
$$f_m^c := f_m^{c,\tau} := f - G_m^c.$$

The term "weak" in this definition means that at step (1) we do not shoot for the optimal element of the dictionary, which realizes the corresponding supremum, but are satisfied with a weaker property than being optimal. The obvious reason for this is that we do not know in general that the optimal one exists. Another, practical, reason is that the weaker the assumption, the easier it is to satisfy and, therefore, the easier to realize in practice.
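For readers who know the Hilbert-space special case, the following sketch (ours, for illustration only, with all names our own) shows the WCGA as Weak Orthogonal Matching Pursuit in $\mathbb{R}^n$ with the Euclidean norm, where the best approximant from $\operatorname{span}(\Phi_m)$ in step (2) reduces to a least-squares projection.

```python
# Hilbert-space sketch of the WCGA (= Weak Orthogonal Matching Pursuit), assuming
# X = R^n with the Euclidean norm; illustration only, not the L_p implementation
# used later in the paper.
import numpy as np

def wcga(f, dictionary, m_max, t=1.0):
    selected = []
    G = np.zeros_like(f)
    for _ in range(m_max):
        r = f - G
        scores = [abs(r @ g) for g in dictionary]      # |F_{f_{m-1}}(g)| up to a 1/||r|| factor
        best = max(scores)
        # weak selection: any index whose score is >= t * best is admissible
        idx = next(i for i, s in enumerate(scores) if s >= t * best)
        selected.append(dictionary[idx])
        Phi = np.column_stack(selected)
        coef, *_ = np.linalg.lstsq(Phi, f, rcond=None)  # projection onto span(Phi) = step (2)
        G = Phi @ coef
    return G
```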
We consider here approximation in uniformly smooth Banach spaces. For a Banach space X we define the modulus of smoothness
$$\rho(u) := \sup_{\|x\|=\|y\|=1} \Big( \tfrac12\big(\|x + uy\| + \|x - uy\|\big) - 1 \Big).$$
A uniformly smooth Banach space is one with the property
$$\lim_{u\to 0} \rho(u)/u = 0.$$
It is well known (see for instance [3], Lemma B.1) that in the case $X = L_p$, $1 \le p < \infty$, we have
$$\rho(u) \le \begin{cases} u^p/p & \text{if } 1 \le p \le 2, \\ (p-1)u^2/2 & \text{if } 2 \le p < \infty. \end{cases} \qquad (3)$$

Denote by $A_1(D) := A_1(D, X)$ the closure in X of the convex hull of D. The following theorem from [9] gives the rate of convergence of the WCGA for f in $A_1(D)$.
Theorem 4 Let X be a uniformly smooth Banach space with the modulus of smoothness $\rho(u) \le \gamma u^q$, $1 < q \le 2$. Then for $t \in (0,1]$ we have for any $f \in A_1(D)$ that
$$\|f - G_m^{c,\tau}(f, D)\| \le C(q,\gamma)\big(1 + m t^p\big)^{-1/p}, \qquad p := \frac{q}{q-1},$$
with a constant $C(q,\gamma)$ which may depend only on q and $\gamma$.


In [11] we demonstrated the power of the WCGA in classical areas of harmonic
analysis. The problem concerns the trigonometric m-term approximation in the uni-
form norm. The first result that indicated an advantage of m-term approximation with
respect to the real trigonometric system RT over approximation by trigonometric
polynomials of order m is due to Ismagilov [5]

σm (| sin 2π x|, RT )∞ ≤ Cε m −6/5+ε , for any ε > 0. (4)

Maiorov [6] improved the estimate (4):

σm (| sin 2π x|, RT )∞  m −3/2 . (5)

Both R.S. Ismagilov [5] and V.E. Maiorov [6] used constructive methods to get their estimates (4) and (5). V.E. Maiorov [6] applied number theoretical methods based on Gaussian sums. The key point of that technique can be formulated in terms of best m-term approximation of trigonometric polynomials. Let, as above, RT(N) be the subspace of real trigonometric polynomials of order N. Using the Gaussian sums one can prove (constructively) the estimate
$$\sigma_m(t, RT)_\infty \le C N^{3/2} m^{-1} \|t\|_1, \qquad t \in RT(N). \qquad (6)$$
Denote as above
$$\Big\| a_0 + \sum_{k=1}^{N} (a_k \cos k2\pi x + b_k \sin k2\pi x) \Big\|_A := |a_0| + \sum_{k=1}^{N} (|a_k| + |b_k|).$$
We note that by the simple inequality
$$\|t\|_A \le 2(2N+1)\|t\|_1, \qquad t \in RT(N),$$
the estimate (6) follows from the estimate
$$\sigma_m(t, RT)_\infty \le C\big(N^{1/2}/m\big)\|t\|_A, \qquad t \in RT(N). \qquad (7)$$
Thus (7) is stronger than (6). The following estimate was proved in [1]:
$$\sigma_m(t, RT)_\infty \le C m^{-1/2}\big(\ln(1 + N/m)\big)^{1/2}\|t\|_A, \qquad t \in RT(N). \qquad (8)$$

In a way (8) is much stronger than (7) and (6). The proof of (8) from [1] is not
constructive. The estimate (8) has been proved in [1] with the help of a nonconstruc-
tive theorem of Gluskin [4]. In [11] we gave a constructive proof of (8). The key
ingredient of that proof is the WCGA. In the paper [2] we already pointed out that
the WCGA provides a constructive proof of the estimate
$$\sigma_m(f, RT)_p \le C(p)\, m^{-1/2} \|f\|_A, \qquad p \in [2,\infty). \qquad (9)$$

The known proofs (before [2]) of (9) were nonconstructive (see discussion in [2],
Sect. 5). Thus, the WCGA provides a way of building a good m-term approximant.
However, the step (2) of the WCGA makes it difficult to control the coefficients of
the approximant—they are obtained through the Chebyshev projection of f onto
Φm . This motivates us to consider the IA(ε) which gives explicit coefficients of the
approximant. We note that the IA(ε) is close to the Weak Relaxed Greedy Algo-
rithm (WRGA) (see [12], Chap. 6). Contrary to the IA(ε), where we build the
mth approximant G m as a convex combination of the previous approximant G m−1
and the newly chosen dictionary element ϕm with a priori fixed coefficients: G m =
(1 − 1/m)G m−1 + ϕm /m, in the WRGA we build G m = (1 − λm )G m−1 + λm ϕm
with λm ∈ [0, 1] chosen from an optimization problem, which depends on f and m.

For a more detailed comparison of the IA($\varepsilon$) and the WRGA in applications to numerical integration see [12], pp. 402–403.
Second, we proceed to a discussion and proof of Theorem 2. In order to be able
to run the IA(ε) for all iterations we need existence of an element ϕmi,ε ∈ D at the
step (1) of the algorithm for all m. It is clear that the following condition guarantees
such existence.
Condition B. We say that for a given dictionary D an element f satisfies Condition B if for all $F \in X^*$ we have
$$F(f) \le \sup_{g\in D} F(g).$$

It is well known (see, for instance, [12], p. 343) that any $f \in A_1(D)$ satisfies Condition B. For completeness we give this simple argument here. Take any $f \in A_1(D)$. Then for any $\varepsilon > 0$ there exist $g_1^\varepsilon, \dots, g_N^\varepsilon \in D$ and numbers $a_1^\varepsilon, \dots, a_N^\varepsilon$ such that $a_i^\varepsilon > 0$, $a_1^\varepsilon + \dots + a_N^\varepsilon = 1$ and
$$\Big\| f - \sum_{i=1}^{N} a_i^\varepsilon g_i^\varepsilon \Big\| \le \varepsilon.$$
Thus
$$F(f) \le \|F\|\varepsilon + F\Big(\sum_{i=1}^{N} a_i^\varepsilon g_i^\varepsilon\Big) \le \varepsilon\|F\| + \sup_{g\in D} F(g),$$
which proves Condition B.


We note that Condition B is equivalent to the property f ∈ A1 (D). Indeed, as
we showed above, the property f ∈ A1 (D) implies Condition B. Let us show that
Condition B implies that f ∈ A1 (D). Assuming the contrary f ∈ / A1 (D) by the
separation theorem for convex bodies we find F ∈ X ∗ such that

F( f ) > sup F(φ) ≥ sup F(g)


φ∈A1 (D) g∈D

which contradicts Condition B.


We formulate results on the IA(ε) in terms of Condition B because in the appli-
cation from Sect. 3 it is easy to check Condition B.

Theorem 5 Let X be a uniformly smooth Banach space with modulus of smoothness $\rho(u) \le \gamma u^q$, $1 < q \le 2$. Define
$$\varepsilon_n := \beta\,\gamma^{1/q}\, n^{-1/p}, \qquad p = \frac{q}{q-1}, \quad n = 1, 2, \dots.$$
Then, for every f satisfying Condition B we have
$$\|f_m^{i,\varepsilon}\| \le C(\beta)\,\gamma^{1/q}\, m^{-1/p}, \qquad m = 1, 2, \dots.$$

In the case f ∈ A1 (D) this theorem is proved in [11] (see also [12], Chap. 6). As
we mentioned above Condition B is equivalent to f ∈ A1 (D).
We now give some applications of Theorem 5 in the construction of special poly-
nomials. We begin with a general result.

Theorem 6 Let X be a uniformly smooth Banach space with modulus of smoothness $\rho(u) \le \gamma u^q$, $1 < q \le 2$. For any n elements $\varphi_1, \varphi_2, \dots, \varphi_n$, $\|\varphi_j\| \le 1$, $j = 1, \dots, n$, there exists a subset $\Lambda \subset [1, n]$ of cardinality $|\Lambda| \le m < n$ and natural numbers $a_j$, $j \in \Lambda$, such that
$$\Big\| \frac{1}{n}\sum_{j=1}^{n} \varphi_j - \sum_{j\in\Lambda} \frac{a_j}{m}\varphi_j \Big\|_X \le C\gamma^{1/q} m^{1/q-1}, \qquad \sum_{j\in\Lambda} a_j = m.$$
Proof For a given set $\varphi_1, \varphi_2, \dots, \varphi_n$ consider a new Banach space $X_n := \operatorname{span}(\varphi_1, \varphi_2, \dots, \varphi_n)$ with norm $\|\cdot\|_X$. In the space $X_n$ consider the dictionary $D_n := \{\varphi_j\}_{j=1}^n$. Then the space $X_n$ is a uniformly smooth Banach space with modulus of smoothness $\rho(u) \le \gamma u^q$, $1 < q \le 2$, and $f := \frac{1}{n}\sum_{j=1}^{n} \varphi_j \in A_1(D_n)$. Applying the IA($\varepsilon$) to f with respect to $D_n$ we obtain by Theorem 5 after m iterations
$$\Big\| f - \sum_{k=1}^{m} \frac{1}{m}\varphi_{j_k} \Big\|_X \le C\gamma^{1/q} m^{1/q-1},$$
where $\varphi_{j_k}$ is obtained at the kth iteration of the IA($\varepsilon$). Clearly, $\sum_{k=1}^{m} \frac{1}{m}\varphi_{j_k}$ can be written in the form $\sum_{j\in\Lambda} \frac{a_j}{m}\varphi_j$ with $|\Lambda| \le m$. □

Corollary 1 Let $m \in \mathbb{N}$ and $n = 2m$. For any n trigonometric polynomials $\varphi_j \in RT(N)$, $\|\varphi_j\|_\infty \le 1$, $j = 1, \dots, n$, with $N \le n^b$, $b \in (0, \infty)$, there exist a set $\Lambda$ and natural numbers $a_j$, $j \in \Lambda$, such that $|\Lambda| \le m$, $\sum_{j\in\Lambda} a_j = m$ and
$$\Big\| \frac{1}{n}\sum_{j=1}^{n} \varphi_j - \sum_{j\in\Lambda} \frac{a_j}{m}\varphi_j \Big\|_\infty \le C(b)(\ln m)^{1/2} m^{-1/2}. \qquad (10)$$
Proof First, we apply Theorem 6 with $X = L_p$, $2 \le p < \infty$. Using (3) we get
$$\Big\| \frac{1}{n}\sum_{j=1}^{n} \varphi_j - \sum_{j\in\Lambda(p)} \frac{a_j(p)}{m}\varphi_j \Big\|_p \le C p^{1/2} m^{-1/2}, \qquad \sum_{j\in\Lambda(p)} a_j(p) = m, \qquad (11)$$
with $|\Lambda(p)| \le m$.

Second, by the Nikol'skii inequality (see [7], Chap. 1, §2): for a trigonometric polynomial t of order N one has
$$\|t\|_p \le C N^{1/q - 1/p}\|t\|_q, \qquad 1 \le q < p \le \infty,$$
we obtain from (11)
$$\Big\| \frac{1}{n}\sum_{j=1}^{n} \varphi_j - \sum_{j\in\Lambda(p)} \frac{a_j(p)}{m}\varphi_j \Big\|_\infty \le C N^{1/p} \Big\| \frac{1}{n}\sum_{j=1}^{n} \varphi_j - \sum_{j\in\Lambda(p)} \frac{a_j(p)}{m}\varphi_j \Big\|_p \le C p^{1/2} N^{1/p} m^{-1/2}.$$
Choosing $p \asymp \ln N \asymp \ln m$ we obtain (10). □

We note that Corollary 1 provides a construction of analogs of the Rudin-Shapiro polynomials (see, for instance, [12], p. 155) in a much more general situation than in the case of the Rudin-Shapiro polynomials, albeit with a slightly weaker bound, which contains an extra $(\ln m)^{1/2}$ factor.
Proof of Theorem 2. It is clear that it is sufficient to prove Theorem 2 for $t \in RT(N)$ with $\|t\|_A = 1$. Then $t \in A_1(RT(N), L_p)$ for all $p \in [2, \infty)$. Now, applying Theorem 6 and using its proof with $X = L_p$ and $\varphi_1, \varphi_2, \dots, \varphi_n$, $n = 2N + 1$, being the real trigonometric system $1, \cos 2\pi x, \sin 2\pi x, \dots, \cos N2\pi x, \sin N2\pi x$, we obtain that
$$\Big\| t - \sum_{j\in\Lambda} \frac{a_j}{m}\varphi_j \Big\|_p \le C\gamma^{1/2} m^{-1/2}, \qquad \sum_{j\in\Lambda} a_j = m, \qquad (12)$$
where $\sum_{j\in\Lambda} \frac{a_j}{m}\varphi_j$ is the $G_m^{i,\varepsilon}(t)$. By (3) we find $\gamma \le p/2$. Next, by the Nikol'skii inequality we get from (12)
$$\Big\| t - \sum_{j\in\Lambda} \frac{a_j}{m}\varphi_j \Big\|_\infty \le C N^{1/p} \Big\| t - \sum_{j\in\Lambda} \frac{a_j}{m}\varphi_j \Big\|_p \le C p^{1/2} N^{1/p} m^{-1/2}.$$
Choosing $p \asymp \ln N$ we obtain the bound desired in Theorem 2.


We point out that the above proof of Theorem 2 gives the following statement.

Theorem 7 Let 2 ≤ p < ∞. For any t ∈ RT (N ) the IA(ε, L p , RT N ) with an


appropriate schedule ε, applied to f := t/t A , provides after m iterations an m-
term trigonometric polynomial G m (t) := G i,ε
m ( f )t A with the following approxi-
mation property

t − G m (t) p ≤ Cm −1/2 p 1/2 t A , G m (t) A = t A ,

with an absolute constant C.


Incremental Greedy Algorithm and Its Applications in Numerical Integration 567

3 Numerical Integration and Discrepancy

For a cubature formula $\Lambda_m(\cdot, \xi)$ we have
$$\Lambda_m(W_p^K, \xi) = \sup_{\|\varphi\|_{L_p(\Omega_y)} \le 1} \Big| \int_{\Omega_y} \Big( J(y) - \sum_{\mu=1}^{m} \lambda_\mu K(\xi^\mu, y) \Big) \varphi(y)\,dy \Big| = \Big\| J(\cdot) - \sum_{\mu=1}^{m} \lambda_\mu K(\xi^\mu, \cdot) \Big\|_{L_{p'}(\Omega_y)}. \qquad (13)$$

Define the error of the optimal cubature formula with m knots for a class W:
$$\delta_m(W) := \inf_{\lambda_1,\dots,\lambda_m;\ \xi^1,\dots,\xi^m} \Lambda_m(W, \xi).$$
The above identity (13) obviously implies the following relation.
Proposition 1
$$\delta_m(W_p^K) = \inf_{\lambda_1,\dots,\lambda_m;\ \xi^1,\dots,\xi^m} \Big\| J(\cdot) - \sum_{\mu=1}^{m} \lambda_\mu K(\xi^\mu, \cdot) \Big\|_{L_{p'}(\Omega_y)}.$$

Thus, the problem of finding the optimal error of a cubature formula with m knots
for the class W pK is equivalent to the problem of best m-term approximation of a
special function J with respect to the dictionary D = {K (x, ·), x ∈ Ωx }.
Consider the problem of numerical integration of the functions $K(x, y)$, $y \in \Omega_y$, with respect to x, for $K \in \mathcal{K}_q$:
$$\int_{\Omega_x} K(x,y)\,dx - \sum_{\mu=1}^{m} \lambda_\mu K(\xi^\mu, y).$$

Definition 1 The (K, q)-discrepancy of a cubature formula $\Lambda_m$ with knots $\xi^1, \dots, \xi^m$ and weights $\lambda_1, \dots, \lambda_m$ is
$$D(\Lambda_m, K, q) := \Big\| \int_{\Omega_x} K(x,y)\,dx - \sum_{\mu=1}^{m} \lambda_\mu K(\xi^\mu, y) \Big\|_{L_q(\Omega_y)}.$$

The above definition of the (K, q)-discrepancy immediately implies the following relation.

Proposition 2
$$\inf_{\lambda_1,\dots,\lambda_m;\ \xi^1,\dots,\xi^m} D(\Lambda_m, K, q) = \inf_{\lambda_1,\dots,\lambda_m;\ \xi^1,\dots,\xi^m} \Big\| J(\cdot) - \sum_{\mu=1}^{m} \lambda_\mu K(\xi^\mu, \cdot) \Big\|_{L_q(\Omega_y)}.$$

Therefore, the problem of finding minimal (K , q)-discrepancy is equivalent to


the problem of best m-term approximation of a special function J with respect to
the dictionary D = {K (x, ·), x ∈ Ωx }.
The particular case K (x, y) = χ[0,y] (x) := dj=1 χ[0,y j ] (x j ), y j ∈ [0, 1), j = 1,
. . . , d, where χ[0,y] (x), y ∈ [0, 1) is a characteristic function of an interval [0, y),
leads to a classical concept of the L q -discrepancy.
Proof of Theorem 3. By (13)
$$\Lambda_m(W_p^K, \xi) = \Big\| J(\cdot) - \sum_{\mu=1}^{m} \lambda_\mu K(\xi^\mu, \cdot) \Big\|_{L_{p'}(\Omega_y)}.$$
We are going to apply Theorem 5 with $X = X(K, p') \subset L_{p'}(\Omega_y)$ and $f = J_K$. We need to check Condition B. Let F be a bounded linear functional on $L_{p'}$. Then by the Riesz representation theorem there exists $h \in L_p$ such that for any $\phi \in L_{p'}$
$$F(\phi) = \int_{\Omega_y} h(y)\phi(y)\,dy.$$
By the Hölder inequality, for any $x \in \Omega_x$ we have
$$\int_{\Omega_y} |h(y)K(x,y)|\,dy \le \|h\|_p.$$
Therefore, the functions $|h(y)K(x,y)|$ and $h(y)K(x,y)$ are integrable on $\Omega_x \times \Omega_y$ and by Fubini's theorem
$$F(J_K) = \int_{\Omega_y} h(y)\int_{\Omega_x} K(x,y)\,dx\,dy = \int_{\Omega_x}\Big(\int_{\Omega_y} h(y)K(x,y)\,dy\Big)dx = \int_{\Omega_x} F\big(K(x,\cdot)\big)\,dx \le \sup_{x\in\Omega_x} F\big(K(x,\cdot)\big),$$
which proves Condition B. Applying Theorem 5 and taking into account (3) we complete the proof.
Proposition 2 and the above proof imply the following theorem on (K, q)-discrepancy.

Theorem 8 Assume that $K \in \mathcal{K}_q$ satisfies the condition
$$\|K(x,\cdot)\|_{L_q(\Omega_y)} \le 1, \quad x \in \Omega_x, \qquad |\Omega_x| = 1,$$
and $J_K \in X(K, q)$. Then for any m there exists (provided by an appropriate Incremental Algorithm) a cubature formula $\Lambda_m(\cdot, \xi)$ with $\lambda_\mu = 1/m$, $\mu = 1, 2, \dots, m$, and
$$D(\Lambda_m, K, q) \le C q^{1/2} m^{-1/2}, \qquad 2 \le q < \infty.$$

We note that in the case $X = L_q([0,1]^d)$, $q \in [2, \infty)$, $D = \{K(x,\cdot),\ x \in [0,1]^d\}$, $f = J(y)$, the implementation of the IA($\varepsilon$) is a sequence of maximization steps, where we maximize functions of d variables. An important advantage of the $L_q$ spaces is the simple and explicit form of the norming functional $F_f$ of a function $f \in L_q([0,1]^d)$. The $F_f$ acts as (for real $L_q$ spaces)
$$F_f(g) = \|f\|_q^{1-q} \int_{[0,1]^d} |f|^{q-2} f\,g\,dy.$$
Thus the IA($\varepsilon$) should find at step m an approximate solution to the following optimization problem (over $x \in [0,1]^d$):
$$\int_{[0,1]^d} \big|f_{m-1}^{i,\varepsilon}(y)\big|^{q-2} f_{m-1}^{i,\varepsilon}(y)\, K(x,y)\,dy \to \max.$$
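To make the preceding maximization step concrete, here is a small sketch (ours, not from the paper) that runs the IA($\varepsilon$) for the dictionary $D = \{K(x,\cdot)\}$ with the assumed kernel $K(x,y) = \min(x,y)$ on $\Omega_x = \Omega_y = [0,1]$, $q = 2$, a grid discretization in y, and a crude grid search over the knot x; the selected points form an equal-weight cubature rule as in Theorem 8.

```python
# Sketch (not from the paper) of building an equal-weight cubature rule with IA(eps)
# in X = L_q([0,1]) with q = 2, for an assumed kernel K(x, y) = min(x, y); the
# maximization over the knot x at each step is a crude grid search.
import numpy as np

ygrid = np.linspace(0, 1, 2001)              # discretization of Omega_y = [0,1]
xgrid = np.linspace(0, 1, 501)               # candidate knots

def K(x, y):
    return np.minimum(x, y)

J = ygrid - ygrid**2 / 2                     # J(y) = int_0^1 min(x,y) dx, computed analytically

knots, G = [], np.zeros_like(ygrid)
for m in range(1, 65):
    r = J - G                                          # residual f_{m-1}; for q = 2 the peak
    scores = [np.mean(r * K(x, ygrid)) for x in xgrid] # functional is proportional to int r K(x,.) dy
    x_new = xgrid[int(np.argmax(scores))]              # approximate maximizer in step (1)
    knots.append(x_new)
    G = (1 - 1/m) * G + K(x_new, ygrid) / m            # IA update: weights are exactly 1/m
# resulting rule: Lambda_m(f) = (1/m) * sum of f over the selected knots
print(np.sqrt(np.mean((J - G)**2)))          # approximate (K, 2)-discrepancy of the rule
```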

Acknowledgments Research was supported by NSF grant DMS-1160841.

References

1. DeVore, R.A., Temlyakov, V.N.: Nonlinear approximation by trigonometric sums. J. Fourier


Anal. Appl. 2, 29–48 (1995)
2. Dilworth, S.J., Kutzarova, D., Temlyakov, V.N.: Convergence of some Greedy Algorithms in
Banach spaces. J. Fourier Anal. Appl. 8, 489–505 (2002)
3. Donahue, M., Gurvits, L., Darken, C., Sontag, E.: Rate of convex approximation in non-Hilbert
spaces. Constr. Approx. 13, 187–220 (1997)
4. Gluskin, E.D.: Extremal properties of orthogonal parallelpipeds and their application to the
geometry of Banach spaces. Math USSR Sbornik 64, 85–96 (1989)
5. Ismagilov, R.S.: Widths of sets in normed linear spaces and the approximation of functions by
trigonometric polynomials, Uspekhi Mat. Nauk, 29 (1974), 161–178; English transl. in Russian
Math. Surveys, 29 (1974)
6. Maiorov, V.E.: Trigonometric diameters of the Sobolev classes W pr in the space L q . Math.
Notes 40, 590–597 (1986)
7. Temlyakov, V.N.: Approximation of Periodic Functions, Nova Science Publishers, Inc., New
York (1993)
8. Temlyakov, V.N.: Weak greedy algorithms. Adv. Comput. Math. 12, 213–227 (2000)
9. Temlyakov, V.N.: Greedy algorithms in Banach spaces. Adv. Comput. Math. 14, 277–292
(2001)

10. Temlyakov, V.N.: Cubature formulas, discrepancy, and nonlinear approximation. J. Complex.
19, 352–391 (2003)
11. Temlyakov, V.N.: Greedy-type approximation in Banach spaces and applications. Constr.
Approx. 21, 257–292 (2005)
12. Temlyakov, V.N.: Greedy Approximation. Cambridge University Press, Cambridge (2011)
On “Upper Error Bounds for Quadrature
Formulas on Function Classes”
by K.K. Frolov

Mario Ullrich

Abstract This is a tutorial paper that gives the complete proof of a result of Frolov
(Dokl Akad Nauk SSSR 231:818–821, 1976, [4]) that shows the optimal order
of convergence for numerical integration of functions with bounded mixed deriv-
atives. The presentation follows Temlyakov (J Complex 19:352–391, 2003, [13]),
see also Temlyakov (Approximation of periodic functions, 1993, [12]).

Keywords Frolov cubature · Numerical Integration · Sobolev space · Tutorial

1 Introduction

We study cubature formulas for the approximation of the d-dimensional integral
$$I(f) = \int_{[0,1]^d} f(x)\,dx$$
for functions f with bounded mixed derivatives. For this, let $D^\alpha f$, $\alpha \in \mathbb{N}_0^d$, be the usual (weak) partial derivative of a function f and define the norm
$$\|f\|_{s,\mathrm{mix}}^2 := \sum_{\alpha \in \mathbb{N}_0^d:\ \|\alpha\|_\infty \le s} \|D^\alpha f\|_{L_2}^2, \qquad (1)$$
where $s \in \mathbb{N}$. In the following we will study the class (or in fact the unit ball)
$$H_d^{s,\mathrm{mix}} := \big\{ f \in C^{sd}([0,1]^d) : \|f\|_{s,\mathrm{mix}} \le 1 \big\}, \qquad (2)$$
i.e. the closure in $C([0,1]^d)$ (with respect to $\|\cdot\|_{s,\mathrm{mix}}$) of the set of sd-times continuously differentiable functions f with $\|f\|_{s,\mathrm{mix}} \le 1$. Note that these well-studied classes of functions often appear with different notations, like $MW_2^s$, $S_2^s W$ or $S_2^s H$.

M. Ullrich (B)
Johannes Kepler Universität, 4040 Linz, Austria
e-mail: mario.ullrich@jku.at


Additionally, we will study the class
$$\mathring{H}_d^{s,\mathrm{mix}} := \big\{ f \in H_d^{s,\mathrm{mix}} : \operatorname{supp}(f) \subset (0,1)^d \big\}. \qquad (3)$$
The algorithms under consideration are of the form
$$Q_n(f) = \sum_{j=1}^{n} a_j f(x^j) \qquad (4)$$
for a given set of nodes $\{x^j\}_{j=1}^n$, $x^j = (x_1^j, \dots, x_d^j) \in [0,1]^d$, and weights $(a_j)_{j=1}^n$, $a_j \in \mathbb{R}$, i.e. the algorithm $Q_n$ uses at most n function evaluations of the input function. The worst case error of $Q_n$ in the function class H is defined as
$$e(Q_n, H) = \sup_{f \in H} |I(f) - Q_n(f)|.$$

We will prove the following theorem, which is Theorem 2 of [4].

Theorem 1 Let $s, d \in \mathbb{N}$. Then there exists a sequence of algorithms $(Q_n)_{n\in\mathbb{N}}$ such that
$$e\big(Q_n, \mathring{H}_d^{s,\mathrm{mix}}\big) \le C_{s,d}\, n^{-s} (\log n)^{\frac{d-1}{2}},$$
where $C_{s,d}$ may depend on s and d.

Using standard techniques, see e.g. [11, Sect. 2.12] or [13, Theorem 1.1], one can
deduce (constructively) from the algorithm that is used to prove Theorem 1 a cubature
rule for the non-periodic classes Hds,mix that has the same order of convergence. More
precisely, one uses a properly chosen mapping, say M, which maps Hds,mix to H̊ds,mix
and preserves the integral. Then, the cubature rule applied to M f gives the optimal
order as long as M has bounded norm. Such mappings (in a more general setting)
will be analyzed in [8].
This results in the following corollary.

Corollary 1 Let $s, d \in \mathbb{N}$. Then there exists a sequence of algorithms $(Q_n)_{n\in\mathbb{N}}$ such that
$$e\big(Q_n, H_d^{s,\mathrm{mix}}\big) \le \widetilde{C}_{s,d}\, n^{-s} (\log n)^{\frac{d-1}{2}},$$
where $\widetilde{C}_{s,d}$ may depend on s and d.

The proof of Theorem 1, and hence also of Corollary 1, is constructive, i.e. we


will show how to construct the nodes and weights of the used algorithms.

Remark 1 The upper bounds of Theorem 1 and Corollary 1 that will be proven in the
next section for a specific algorithm, see (10), are best possible in the sense of the
order of convergence. That is, there are matching lower bounds that hold for arbitrary
cubature rules that use only function values, see e.g. [13, Theorem 3.2].
Remark 2 There is a natural generalization of the spaces $\mathring{H}_d^{s,\mathrm{mix}}$, say $\mathring{H}_{d,p}^{s,\mathrm{mix}}$, where
the L 2 -norm in (1) is replaced by an L p -norm, 1 < p < ∞. The same lower bounds
as mentioned in Remark 1 are valid also in this case, see [13, Theorem 3.2]. Obviously,
the upper bounds from Theorem 1 hold for these spaces if p ≥ 2, since the spaces get
smaller for larger p. For 1 < p < 2 it was proven by Skriganov [10, Theorem 2.1]
that the same algorithm satisfies the optimal order. We refer to [13] and references
therein for more details on this and the more delicate case p = 1.

Remark 3 Besides the cubature rule of Frolov that is analyzed in this paper, there
are several other constructions. Two prominent examples are the Smolyak algorithm
and (higher order) digital nets, see [9, Chap. 15] and [1], respectively. However, it is
proven that the Smolyak algorithm cannot achieve the optimal order of convergence
for the function classes under consideration, see [2, Theorem 5.2], and that the upper
bounds on the error for digital nets are (at the moment) restricted to small smoothness,
see e.g. [6]. In this sense Frolov’s cubature is universal, i.e. the same cubature rule
gives the optimal order of convergence for every choice of the parameters s and d.
This is also true in the more general setting of Besov and Triebel-Lizorkin spaces,
see [14].

2 Proof of Theorem 1

2.1 The Algorithm

We start with the construction of the nodes of our cubature rule. See Sloan and Joe [11]
for a more comprehensive introduction to this topic. In the setting of Theorem 1 the
set $X \subset [0,1)^d$ of nodes will be a subset of a lattice $\mathbb{X} \subset \mathbb{R}^d$, i.e. $x, y \in \mathbb{X}$ implies $x \pm y \in \mathbb{X}$. In fact, we take all points inside the unit cube.
The lattice $\mathbb{X}$ will be "d-dimensional", i.e. there exists a non-singular matrix $T \in \mathbb{R}^{d\times d}$ such that
$$\mathbb{X} := T(\mathbb{Z}^d) = \{ Tx : x \in \mathbb{Z}^d \}. \qquad (5)$$
The matrix T is called the generator of the lattice $\mathbb{X}$. Obviously, every multiple of $\mathbb{X}$, i.e. $c\mathbb{X}$ for some $c \in \mathbb{R}$, is again a lattice, and note that while $\mathbb{X}$ is a lattice, it is not necessarily an integration lattice, i.e. in general we do not have $\mathbb{X} \supset \mathbb{Z}^d$.
In the following we will fix a generator T and consider all points inside the cube $[0,1)^d$ of the shrunken lattice $a^{-1}T(\mathbb{Z}^d)$, $a > 1$, as nodes for our cubature rule for functions from $\mathring{H}_d^{s,\mathrm{mix}}$. That is, we will use the set of points
$$X_a^d := \big(a^{-1}\mathbb{X}\big) \cap [0,1)^d, \qquad a > 1, \qquad (6)$$
where $\mathbb{X}$ is given by (5).

For the construction of the nodes it remains to present a specific generator matrix T that is suitable for our purposes. For this, define the polynomials
$$P_d(t) := \prod_{j=1}^{d} (t - 2j + 1) - 1, \qquad t \in \mathbb{R}. \qquad (7)$$
Obviously, the polynomial $P_d$ has only integer coefficients, and it is easy to check that it is irreducible¹ (over $\mathbb{Q}$) and has d different real roots. Let $\xi_1, \dots, \xi_d \in \mathbb{R}$ be the roots of $P_d$. Using these roots we define the $d\times d$-matrix B by
$$B = \big(B_{i,j}\big)_{i,j=1}^{d} := \big(\xi_i^{\,j-1}\big)_{i,j=1}^{d}. \qquad (8)$$
This matrix is a Vandermonde matrix and hence invertible, and we define the generator matrix of our lattice by
$$T = (B^\top)^{-1}, \qquad (9)$$
where $B^\top$ is the transpose of B. It is well known that $\mathbb{X}^* := B(\mathbb{Z}^d)$ is the dual lattice associated with $\mathbb{X} = T(\mathbb{Z}^d)$, i.e. $y \in \mathbb{X}^*$ if and only if $\langle x, y\rangle \in \mathbb{Z}$ for all $x \in \mathbb{X}$.
We define the cubature rule for functions f from $\mathring{H}_d^{s,\mathrm{mix}}$ by
$$\mathring{Q}_a(f) := a^{-d}\det(T) \sum_{x \in X_a^d} f(x), \qquad a > 1. \qquad (10)$$

In the next subsection we will prove that Q̊ a has the optimal order of convergence
for H̊ds,mix .
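The construction above is explicit enough to be coded directly. The sketch below (our illustration, not part of the paper) builds the node set $X_a^d$ from (5)-(9) and evaluates $\mathring{Q}_a$ from (10) for small d; the lattice points are found by a brute-force search over a bounding box, which is adequate only for small a and d.

```python
# Minimal sketch (ours, for illustration) of the Frolov node set X_a^d and the rule
# Q_a from (10); the enumeration of lattice points is brute force and not efficient.
import itertools
import numpy as np

def frolov_points(d, a):
    poly = np.poly(np.arange(1, 2 * d, 2))     # coefficients of prod_j (t - (2j-1))
    poly[-1] -= 1.0                            # subtract 1 to obtain P_d from (7)
    xi = np.roots(poly).real                   # the d distinct real roots
    B = np.vander(xi, d, increasing=True)      # B[i, j] = xi_i^j, the Vandermonde matrix (8)
    T = np.linalg.inv(B.T)                     # generator T = (B^T)^{-1}, see (9)
    # x = a^{-1} T m lies in [0,1)^d  iff  m = a B^T y with y in [0,1)^d, so a
    # bounding box for m follows from the column sums of the positive/negative parts of a*B
    lo = np.floor(a * np.minimum(B, 0).sum(axis=0)).astype(int)
    hi = np.ceil(a * np.maximum(B, 0).sum(axis=0)).astype(int)
    pts = []
    for m in itertools.product(*[range(l, h + 1) for l, h in zip(lo, hi)]):
        x = T @ np.array(m) / a
        if np.all((0 <= x) & (x < 1)):
            pts.append(x)
    return np.array(pts), abs(np.linalg.det(T))

a = 8.0
pts, detT = frolov_points(d=2, a=a)
f = lambda x: np.prod(x * (1 - x), axis=-1)    # a smooth function vanishing on the boundary
Q = a ** (-2) * detT * f(pts).sum()            # Q_a(f) with the equal weights a^{-d} det(T)
print(len(pts), Q, (1 / 6) ** 2)               # exact integral of prod x_l(1-x_l) is (1/6)^d
```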
Note that $\mathring{Q}_a(f)$ uses $|X_a^d|$ function values of f and that the weights of this algorithm are equal, but do not (in general) sum up to one, i.e. $\mathring{Q}_a$ is not a quasi-Monte Carlo method. While the number $|X_a^d|$ of points can be estimated in terms of the determinant of the corresponding generator matrix, it is in general not equal to it. In fact, if $a^{-1}\mathbb{X}$ were an integration lattice, then it is well known that $|X_a^d| = a^d \det(T^{-1})$, see e.g. [11]. For the general lattices that we consider, we know, however, that these numbers are of the same order, see Skriganov [10, Theorem 1.1].²
Lemma 1 Let $\mathbb{X} = T(\mathbb{Z}^d) \subset \mathbb{R}^d$ be a lattice with generator T of the form (9), and let $X_a^d$ be given by (6). Then there exists a constant $C_T$ that is independent of a such that
$$\big|\, |X_a^d| - a^d \det(T^{-1})\,\big| \le C_T \ln^{d-1}\!\big(1 + a^d\big)$$

¹ A polynomial P is called irreducible over $\mathbb{Q}$ if $P = GH$ for two polynomials G, H with rational coefficients implies that one of them has degree zero. This implies that all roots of P must be irrational. In fact, every polynomial of the form $\prod_{j=1}^{d}(x - b_j) - 1$ with different $b_j \in \mathbb{Z}$ is irreducible, but does not necessarily have d different real roots.
² Skriganov proved this result for admissible lattices. The required property will be proven in Lemma 3, see also [10, Lemma 3.1(2)].


for all $a > 1$. In particular, we have
$$\lim_{a\to\infty} \frac{|X_a^d|}{a^d \det(T^{-1})} = 1.$$

Remark 4 It is still not clear if the corresponding QMC algorithm, i.e. the cubature
rule (10) with a −d det(T ) replaced by |X ad |−1 , has the same order of convergence. If
true, this would imply the optimal order of the L p -discrepancy, p < ∞, of a (deter-
ministic) modification of the set X ad , see [5, 10]. We leave this as an open problem. In
fact, Skriganov [10, Corollary 2.1] proved that for every a > 0 there exists a vector
z a ∈ Rd such that the translated set X ad − z a satisfies the above conditions.
In the remaining subsection we prove the crucial property of these nodes. For
this we need the following corollary of the Fundamental Theorem of Symmetric
Polynomials, see, [3, Theorem 6.4.2].

Lemma 2 Let $P(x) = \prod_{j=1}^{d}(x - \xi_j)$ and $G(x_1, \dots, x_d)$ be polynomials with integer coefficients. Additionally, assume that $G(x_1, \dots, x_d)$ is symmetric in $x_1, \dots, x_d$, i.e. invariant under permutations of $x_1, \dots, x_d$. Then $G(\xi_1, \dots, \xi_d) \in \mathbb{Z}$.
We obtain that the elements of the dual lattice $B(\mathbb{Z}^d)$ satisfy the following.
Lemma 3 Let $0 \ne z = (z_1, \dots, z_d) \in B(\mathbb{Z}^d)$ with B from (8). Then $\prod_{i=1}^{d} z_i \in \mathbb{Z}\setminus\{0\}$.
Proof Fix $m = (m_1, \dots, m_d) \in \mathbb{Z}^d$ such that $Bm = z$. Hence,
$$z_i = \sum_{j=1}^{d} m_j \xi_i^{\,j-1}$$
depends only on $\xi_i$. This implies that $\prod_{i=1}^{d} z_i$ is a symmetric polynomial in $\xi_1, \dots, \xi_d$ with integer coefficients. By Lemma 2, we have $\prod_{i=1}^{d} z_i \in \mathbb{Z}$.
It remains to prove $z_i \ne 0$ for $i = 1, \dots, d$. Define the polynomial $R_1(x) := \sum_{j=1}^{d} m_j x^{j-1}$ and assume that $z_\ell = R_1(\xi_\ell) = 0$ for some $\ell = 1, \dots, d$. Then there exist unique polynomials G and $R_2$ with rational coefficients such that
$$P_d(x) = G(x)R_1(x) + R_2(x),$$
where $\operatorname{degree}(R_2) < \operatorname{degree}(R_1)$. By assumption, $R_2(\xi_\ell) = 0$. If $R_2 \equiv 0$ this is a contradiction to the irreducibility of $P_d$. If not, divide $P_d$ by $R_2$ (instead of $R_1$). Iterating this procedure, we will eventually find a polynomial $R^*$ with $\operatorname{degree}(R^*) > 0$ (since it has a root) and rational coefficients that divides $P_d$: a contradiction to the irreducibility. This completes the proof of the lemma. □
We finish the subsection with a result on the maximal number of nodes in the dual
lattice that lie in an axis-parallel box of fixed volume.

Corollary 2 Let B be the matrix from (8) and $a > 0$. Then, for each axis-parallel box $\Omega \subset \mathbb{R}^d$ we have
$$\#\big( aB(\mathbb{Z}^d) \cap \Omega \big) \le a^{-d}\operatorname{vol}_d(\Omega) + 1.$$
Proof Assume first that $\operatorname{vol}_d(\Omega) < a^d$. If $\Omega$ contains 2 different points $z, z' \in aB(\mathbb{Z}^d)$, then, using that this implies $\tilde z = z - z' \in aB(\mathbb{Z}^d)$, we obtain
$$\operatorname{vol}_d(\Omega) \ge \prod_{i=1}^{d} |z_i - z_i'| = \prod_{i=1}^{d} |\tilde z_i| \ge a^d$$
from Lemma 3: a contradiction. For $\operatorname{vol}_d(\Omega) \ge a^d$ we divide $\Omega$ along one coordinate into $\lfloor a^{-d}\operatorname{vol}_d(\Omega)\rfloor + 1$ equal pieces, i.e. pieces with volume less than $a^d$, and use the same argument as above. □

Remark 5 Although we focus in the following on the construction of nodes that is based on the polynomial $P_d$ from (7), the same construction works with any irreducible polynomial of degree d with d different real roots and leading coefficient 1, cf. [12, Section 4.4]. For example, if the dimension is a power of 2, i.e. $d = 2^k$ for some $k \in \mathbb{N}$, we can be even more specific. In this case we can choose the polynomial
$$P_d^*(x) = 2\cos\big(d\cdot\arccos(x/2)\big),$$
cf. the Chebyshev polynomials. The roots of this polynomial are given by
$$\xi_i = 2\cos\Big(\frac{\pi(2i-1)}{2d}\Big), \qquad i = 1, \dots, d.$$
Hence, the construction of the lattice $\mathbb{X}$ that is based on this polynomial is completely explicit. For a suitable polynomial when $2d + 1$ is prime, see [7]. We did not try to find a completely explicit construction in the intermediate cases.
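A one-line numerical check (ours, not from the paper) that the closed-form roots above indeed satisfy $P_d^*(\xi_i) = 0$:

```python
# Check (our illustration) of the explicit Chebyshev-type roots from Remark 5 for d = 4.
import numpy as np
d = 4
xi = 2 * np.cos(np.pi * (2 * np.arange(1, d + 1) - 1) / (2 * d))
print(np.allclose(2 * np.cos(d * np.arccos(xi / 2)), 0.0))   # True: all are roots of P_d^*
```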

2.2 The Error Bound

In this subsection we prove that the algorithm $\mathring{Q}_a$ from (10) has the optimal order of convergence for functions from $\mathring{H}_d^{s,\mathrm{mix}}$, i.e. that
$$e\big(\mathring{Q}_a, \mathring{H}_d^{s,\mathrm{mix}}\big) \le C_{s,d}\, n^{-s} (\log n)^{\frac{d-1}{2}},$$
where $n = n(a, T) := |X_a^d|$ is the number of nodes used by $\mathring{Q}_a$ and $C_{s,d}$ is independent of n.

For this we need the following two lemmas. Recall that the Fourier transform of an integrable function $f \in L_1(\mathbb{R}^d)$ is given by
$$\hat f(y) := \int_{\mathbb{R}^d} f(x)\, e^{-2\pi i \langle y, x\rangle}\,dx, \qquad y \in \mathbb{R}^d,$$
with $\langle y, x\rangle := \sum_{j=1}^{d} y_j x_j$. Furthermore, let
$$\nu_s(y) = \prod_{j=1}^{d}\Big(\sum_{\ell=0}^{s} |2\pi y_j|^{2\ell}\Big) = \sum_{\alpha\in\mathbb{N}_0^d:\ \|\alpha\|_\infty \le s}\ \prod_{j=1}^{d} |2\pi y_j|^{2\alpha_j}, \qquad y \in \mathbb{R}^d. \qquad (11)$$
Clearly,
$$\nu_s(y)|\hat f(y)|^2 = \sum_{\alpha\in\mathbb{N}_0^d:\ \|\alpha\|_\infty\le s} \Big| \prod_{j=1}^{d}(-2\pi i y_j)^{\alpha_j} \int_{\mathbb{R}^d} f(x)\, e^{-2\pi i \langle y, x\rangle}\,dx \Big|^2 = \sum_{\alpha\in\mathbb{N}_0^d:\ \|\alpha\|_\infty\le s} \big|\widehat{D^\alpha f}(y)\big|^2$$
for all $f \in \mathring{H}_d^{s,\mathrm{mix}}$.


Throughout the rest of the paper we study only functions from H̊ds,mix . Since their
supports are contained strictly inside the unit cube, we can identify each function
$f \in \mathring{H}_d^{s,\mathrm{mix}}$ with its continuation to the whole space by zero, i.e. we set $f(x) = 0$ for $x \notin [0,1]^d$.
We begin with the following result on the sum of values of the Fourier transform.
−1

Lemma 4 Let B ∈ Rd×d be an invertible


 matrix, T = (B
 ) and define the number

M B := # m ∈ Z : B ([0, 1] ) ∩ m + (0, 1) = ∅ . Then, for each f ∈ H̊ds,mix ,


d d d

s ∈ N, we have  
 MB
νs (z)| fˆ(z)|2 ≤  f 2s,mix .
det(B)
d z∈B(Z )

Proof Let $\Gamma_s := \{\alpha \in \mathbb{N}_0^d : \|\alpha\|_\infty \le s\}$ and define the function
$$g(x) := \sum_{m\in\mathbb{Z}^d} f\big(T(m + x)\big), \qquad x \in [0,1]^d.$$
Clearly, at most $M_B$ of the summands are not zero and g is 1-periodic. Hence, we obtain by Parseval's identity and Jensen's inequality that
$$\sum_{z\in B(\mathbb{Z}^d)} \nu_s(z)|\hat f(z)|^2 = \sum_{\alpha\in\Gamma_s}\sum_{z\in B(\mathbb{Z}^d)} \big|\widehat{D^\alpha f}(z)\big|^2 = \sum_{\alpha\in\Gamma_s}\sum_{y\in\mathbb{Z}^d} \Big| \int_{\mathbb{R}^d} D^\alpha f(x)\, e^{-2\pi i\langle By, x\rangle}\,dx \Big|^2$$
$$= \det(T)^2 \sum_{\alpha\in\Gamma_s}\sum_{y\in\mathbb{Z}^d} \Big| \int_{\mathbb{R}^d} D^\alpha f(Tx)\, e^{-2\pi i\langle y, x\rangle}\,dx \Big|^2 = \det(T)^2 \sum_{\alpha\in\Gamma_s}\sum_{y\in\mathbb{Z}^d} \Big| \int_{[0,1]^d} \sum_{m\in\mathbb{Z}^d} D^\alpha f\big(T(m+x)\big)\, e^{-2\pi i\langle y, x\rangle}\,dx \Big|^2$$
$$= \det(T)^2 \sum_{\alpha\in\Gamma_s}\sum_{y\in\mathbb{Z}^d} \Big| \int_{[0,1]^d} D^\alpha g(x)\, e^{-2\pi i\langle y, x\rangle}\,dx \Big|^2 = \det(T)^2 \sum_{\alpha\in\Gamma_s} \int_{[0,1]^d} \big| D^\alpha g(x) \big|^2\,dx$$
$$= \det(T)^2 M_B^2 \sum_{\alpha\in\Gamma_s} \int_{[0,1]^d} \Big| \frac{1}{M_B}\sum_{m\in\mathbb{Z}^d} D^\alpha f\big(T(m+x)\big) \Big|^2\,dx \le \det(T)^2 M_B^2 \sum_{\alpha\in\Gamma_s} \int_{[0,1]^d} \frac{1}{M_B}\sum_{m\in\mathbb{Z}^d} \big| D^\alpha f\big(T(m+x)\big) \big|^2\,dx$$
$$= \det(T)^2 M_B \sum_{\alpha\in\Gamma_s} \int_{\mathbb{R}^d} \big| D^\alpha f(Tx) \big|^2\,dx = \det(T)\, M_B\, \|f\|_{s,\mathrm{mix}}^2$$
as claimed. □

Additionally, we need the following version of the Poisson summation formula for lattices.
Lemma 5 Let $\mathbb{X} = T(\mathbb{Z}^d) \subset \mathbb{R}^d$ be a full-dimensional lattice and $\mathbb{X}^* \subset \mathbb{R}^d$ be the associated dual lattice. Additionally, let $f \in \mathring{H}_d^{s,\mathrm{mix}}$, $s \in \mathbb{N}$. Then,
$$\det(T) \sum_{x \in \mathbb{X}\cap[0,1)^d} f(x) = \sum_{y \in \mathbb{X}^*} \hat f(y).$$
In particular, the right-hand side is convergent.
In particular, the right-hand-side is convergent.

Proof Let $g(x) = f(Tx)$, $x \in \mathbb{R}^d$. Then, by the definition of the lattice, we have
$$\sum_{x\in\mathbb{X}\cap[0,1)^d} f(x) = \sum_{x\in\mathbb{X}} f(x) = \sum_{x\in\mathbb{Z}^d} f(Tx) = \sum_{x\in\mathbb{Z}^d} g(x).$$
Additionally, note that $B = (T^\top)^{-1}$ is the generator of $\mathbb{X}^*$ and hence
$$\sum_{y\in\mathbb{X}^*} \hat f(y) = \sum_{y\in\mathbb{Z}^d} \hat f(By) = \sum_{y\in\mathbb{Z}^d} \int_{\mathbb{R}^d} f(x)\,e^{-2\pi i\langle By, x\rangle}\,dx = \sum_{y\in\mathbb{Z}^d} \int_{\mathbb{R}^d} f(x)\,e^{-2\pi i\langle y, B^\top x\rangle}\,dx$$
$$= \det(T)\sum_{y\in\mathbb{Z}^d} \int_{\mathbb{R}^d} f(Tz)\,e^{-2\pi i\langle y, z\rangle}\,dz = \det(T)\sum_{y\in\mathbb{Z}^d} \int_{\mathbb{R}^d} g(z)\,e^{-2\pi i\langle y, z\rangle}\,dz = \det(T)\sum_{y\in\mathbb{Z}^d} \hat g(y),$$
where we performed the substitution $x = Tz$. (Here, we need that the lattice is full-dimensional.) In particular, the series on the left hand side converges if and only if the right hand side does. For the proof of this convergence note that $f \in \mathring{H}_d^{s,\mathrm{mix}}$, $s \ge 1$, implies $\|g\|_{1,\mathrm{mix}} \le \|g\|_{s,\mathrm{mix}} < \infty$. We obtain by Lemma 4 that
$$\sum_{y\in\mathbb{Z}^d} \nu_1(y)|\hat g(y)|^2 \le M_B \|g\|_{1,\mathrm{mix}}^2 < \infty$$
with $M_B$ from Lemma 4, since $\operatorname{supp}(g) \subset T^{-1}([0,1]^d) = B^\top([0,1]^d)$. Hence,
$$\sum_{y\ne 0} |\hat g(y)| \le \Big(\sum_{y\ne 0} |\nu_1(y)|^{-1}\Big)^{1/2}\Big(\sum_{y\ne 0} \nu_1(y)|\hat g(y)|^2\Big)^{1/2} < \infty,$$
which proves the convergence. We finish the proof of Lemma 5 by
$$\sum_{y\in\mathbb{Z}^d} \hat g(y) = \sum_{y\in\mathbb{Z}^d} \int_{\mathbb{R}^d} g(z)\,e^{-2\pi i\langle y, z\rangle}\,dz = \sum_{y\in\mathbb{Z}^d} \int_{[0,1]^d} \sum_{m\in\mathbb{Z}^d} g(m+z)\,e^{-2\pi i\langle y, z\rangle}\,dz = \sum_{m\in\mathbb{Z}^d} g(m).$$
The last equality is simply the evaluation of the Fourier series of the function $\sum_{m\in\mathbb{Z}^d} g(m+x)$, $x \in [0,1]^d$, at the point $x = 0$. It follows from the absolute convergence of the left hand side that this Fourier series is pointwise convergent. □

By Lemma 5 we can write the algorithm $\mathring{Q}_a$, $a > 1$, as
$$\mathring{Q}_a(f) = a^{-d}\det(T)\sum_{x\in X_a^d} f(x) = \sum_{z\in aB(\mathbb{Z}^d)} \hat f(z), \qquad f \in \mathring{H}_d^{s,\mathrm{mix}},$$
where $aB$ (see (8)) is the generator of the dual lattice of $a^{-1}T(\mathbb{Z}^d)$ (see (9)) and $X_a^d = (a^{-1}\mathbb{X}) \cap [0,1)^d$. Since $I(f) = \hat f(0)$ we obtain
$$|I(f) - \mathring{Q}_a(f)| = \Big| \sum_{z\in aB(\mathbb{Z}^d)\setminus 0} \hat f(z) \Big| \le \sum_{z\in aB(\mathbb{Z}^d)\setminus 0} |\nu_s(z)|^{-1/2}\,\nu_s(z)^{1/2}\,|\hat f(z)| \le \Big( \sum_{z\in aB(\mathbb{Z}^d)\setminus 0} |\nu_s(z)|^{-1} \Big)^{1/2} \Big( \sum_{z\in aB(\mathbb{Z}^d)\setminus 0} \nu_s(z)\,|\hat f(z)|^2 \Big)^{1/2},$$
with $\nu_s$ from (11). We bound both sums separately. First, note that Lemma 4 implies that
$$\sum_{z\in aB(\mathbb{Z}^d)\setminus 0} \nu_s(z)\,|\hat f(z)|^2 \le C(a, B)\,\|f\|_{s,\mathrm{mix}}^2$$
with $C(a, B) := \det(aB)^{-1} M_{aB}$. Using that $B^\top([0,1]^d)$ is Jordan measurable, we obtain $\lim_{a\to\infty} C(a, B) = 1$ and, hence, for $a > 1$ large enough,
$$\sum_{z\in aB(\mathbb{Z}^d)\setminus 0} \nu_s(z)\,|\hat f(z)|^2 \le 2\|f\|_{s,\mathrm{mix}}^2. \qquad (12)$$

This follows from the fact that $M_{aB}$ is the number of unit cubes that are necessary to cover the set $aB^\top([0,1]^d)$, and $\det(aB)$ is its volume.
Now we treat the first sum. Define, for $m = (m_1, \dots, m_d) \in \mathbb{N}_0^d$, the sets
$$\rho(m) := \{ x \in \mathbb{R}^d : \lfloor 2^{m_j-1}\rfloor \le |x_j| < 2^{m_j} \ \text{ for } j = 1, \dots, d \}$$
and note that $\prod_{j=1}^{d} |x_j| < 2^{\|m\|_1}$ for all $x \in \rho(m)$. Recall from Lemma 3 that $\prod_{j=1}^{d} z_j \in \mathbb{Z}\setminus 0$ for all $z \in B(\mathbb{Z}^d)\setminus 0$ and, consequently, $\prod_{j=1}^{d} |z_j| \ge a^d$ for $z \in aB(\mathbb{Z}^d)\setminus 0$. This shows that $|(aB(\mathbb{Z}^d)\setminus 0) \cap \rho(m)| = 0$ for all $m \in \mathbb{N}_0^d$ with $\|m\|_1 < \lceil d\log_2(a)\rceil =: r$. Hence, with $|\bar z| := \prod_{j=1}^{d} \max\{1, 2\pi|z_j|\}$, we obtain
$$\sum_{z\in aB(\mathbb{Z}^d)\setminus 0} |\nu_s(z)|^{-1} \le \sum_{z\in aB(\mathbb{Z}^d)\setminus 0} |\bar z|^{-2s} = \sum_{\ell=r}^{\infty}\ \sum_{m:\,\|m\|_1=\ell}\ \sum_{z\in(aB(\mathbb{Z}^d)\setminus 0)\cap\rho(m)} |\bar z|^{-2s}.$$
Note that for $z \in \rho(m)$ we have $|\bar z| \ge \prod_{j=1}^{d}\max\{1, 2\pi\lfloor 2^{m_j-1}\rfloor\} \ge 2^{\|m\|_1}$. Since $\rho(m)$ is a union of $2^d$ axis-parallel boxes each with volume less than $2^{\|m\|_1}$, Corollary 2 implies that $\big|(aB(\mathbb{Z}^d)\setminus 0)\cap\rho(m)\big| \le 2^d\big(a^{-d}2^{\|m\|_1} + 1\big) \le 2^{d+2}\, 2^{\|m\|_1 - r}$ for m with $\|m\|_1 \ge r$. Additionally, note that $\#\{m \in \mathbb{N}_0^d : \|m\|_1 = \ell\} = \binom{d+\ell-1}{\ell} < (\ell+1)^{d-1}$. We obtain
We obtain
$$\sum_{z\in aB(\mathbb{Z}^d)\setminus 0} |\nu_s(z)|^{-1} \le \sum_{\ell=r}^{\infty}\sum_{m:\,\|m\|_1=\ell} \big|(aB(\mathbb{Z}^d)\setminus 0)\cap\rho(m)\big|\, 2^{-2s\|m\|_1} \le \sum_{\ell=r}^{\infty}\sum_{m:\,\|m\|_1=\ell} 2^{d+2}\, 2^{\|m\|_1-r}\, 2^{-2s\|m\|_1}$$
$$\le 2^{d+2}\sum_{\ell=r}^{\infty} (\ell+1)^{d-1}\, 2^{\ell-r}\, 2^{-2s\ell} = 2^{d+2}\sum_{t=0}^{\infty} (t+r+1)^{d-1}\, 2^{t}\, 2^{-2s(t+r)}$$
$$< 2^{d+2}\, 2^{-2sr}\, \big(\log_2 a^d\big)^{d-1} \sum_{t=0}^{\infty} \Big(1 + \frac{t+2}{d\log_2(a)}\Big)^{d-1} 2^{(1-2s)t} \le 2^{d+2}\, a^{-2sd}\, \big(\log_2 a^d\big)^{d-1} \sum_{t=0}^{\infty} e^{(t+2)/\log_2(a)}\, 2^{(1-2s)t},$$
where we have used that $d\log_2(a) \le r < d\log_2(a) + 1$. Clearly, the last series converges iff $a > e^{1/(2s-1)}$ and, in particular, it is bounded by $2^3$ for $a \ge e^2$ and all $s \in \mathbb{N}$.
So, all together,
$$e\big(\mathring{Q}_a, \mathring{H}_d^{s,\mathrm{mix}}\big) \le 2^{d/2+3}\, a^{-sd}\, \big(\log_2 a^d\big)^{\frac{d-1}{2}} \qquad (13)$$
for $a > 1$ large enough. From Lemma 1 we know that the number of nodes used by $\mathring{Q}_a$ is proportional to $a^d$. This proves Theorem 1.

Remark 6 It is interesting to note that the proof of Theorem 1 is to a large extent independent of the domain of integration. For an arbitrary Jordan measurable set $\Omega \subset \mathbb{R}^d$ we can consider the algorithm $\mathring{Q}_a$ from (10) with the set of nodes $X_a^d$ replaced by $X_a^d(\Omega) = \big(a^{-1}T(\mathbb{Z}^d)\big) \cap \Omega$. The only difference in the estimates would be that $C(a, B)$, cf. (12), converges to $\operatorname{vol}_d(\Omega)$ instead of 1.

References

1. Dick, J., Pillichshammer, F.: Digital Nets and Sequences: Discrepancy theory and quasi-Monte
Carlo integration. Cambridge University Press, Cambridge (2010)
2. Dung, D., Ullrich, T.: Lower bounds for the integration error for multivariate functions
with mixed smoothness and optimal Fibonacci cubature for functions on the square, Math.
Nachrichten (2015) (to appear)
3. Fine, B., Rosenberger, G.: The fundamental theorem of algebra. Springer-Verlag, New York,
Undergraduate Texts in Mathematics (1997)
4. Frolov, K.K.: Upper error bounds for quadrature formulas on function classes. Dokl. Akad.
Nauk SSSR 231, 818–821 (1976)
5. Frolov, K.K.: Upper bound of the discrepancy in metric L p , 2 ≤ p < ∞. Dokl. Akad. Nauk
SSSR 252, 805–807 (1980)

6. Hinrichs, A., Markhasin, L., Oettershagen, J., Ullrich, T.: Optimal quasi-Monte Carlo rules
on higher order digital nets for the numerical integration of multivariate periodic functions.
e-prints (2015)
7. Lee, C.-L., Wong, K.B.: On Chebyshev’s polynomials and certain combinatorial identities.
Bull. Malays. Math. Sci. Soc. 2(34), 279–286 (2011)
8. Nguyen, V.K., Ullrich, M. Ullrich, T.: Change of variable in spaces of mixed smoothness and
numerical integration of multivariate functions on the unit cube (2015) (preprint)
9. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems, Volume II: Standard Information for Functionals. EMS Tracts in Mathematics, Vol. 12, Eur. Math. Soc. Publ. House,
Zürich (2010)
10. Skriganov, M.M.: Constructions of uniform distributions in terms of geometry of numbers.
Algebra i Analiz 6, 200–230 (1994)
11. Sloan, I.H., Joe, S.: Lattice Methods for Multiple Integration. Oxford Science Publications,
New York (1994)
12. Temlyakov, V.N.: Approximation of Periodic Functions. Computational Mathematics and
Analysis Series. Nova Science Publishers Inc, NY (1993)
13. Temlyakov, V.N.: Cubature formulas, discrepancy, and nonlinear approximation. J. Complex.
19, 352–391 (2003)
14. Ullrich, M., Ullrich, T.: The role of Frolov’s cubature formula for functions with bounded
mixed derivative, SIAM J. Numer. Anal. 54(2), 969–993 (2016)
Tractability of Function Approximation
with Product Kernels

Xuan Zhou and Fred J. Hickernell

Abstract This article studies the problem of approximating functions belonging to a Hilbert space $H_d$ with a reproducing kernel of the form
$$\widetilde K_d(x, t) := \prod_{\ell=1}^{d} \big( 1 - \alpha_\ell^2 + \alpha_\ell^2 K_{\gamma_\ell}(x_\ell, t_\ell) \big) \qquad \text{for all } x, t \in \mathbb{R}^d.$$
The $\alpha_\ell \in [0,1]$ are scale parameters, and the $\gamma_\ell > 0$ are sometimes called shape parameters. The reproducing kernel $K_{\gamma_\ell}$ corresponds to some Hilbert space of functions defined on $\mathbb{R}$. The kernel $\widetilde K_d$ generalizes the anisotropic Gaussian reproducing kernel, whose tractability properties have been established in the literature. We present sufficient conditions on $\{\alpha_\ell\gamma_\ell\}_{\ell=1}^{\infty}$ under which function approximation problems on $H_d$ are polynomially tractable. The exponent of strong polynomial tractability arises from bounds on the eigenvalues of positive definite linear operators.

Keywords Function approximation · Tractability · Product kernels

1 Introduction

This article addresses the problem of function approximation. In a typical application


we are given data of the form yi = f (x i ) or yi = L i ( f ) for i = 1, . . . , n. That is,
a function f is sampled at the locations {x 1 , . . . , x n }, usually referred to as the data

X. Zhou (B) · F.J. Hickernell


Department of Applied Mathematics, Illinois Institute of Technology,
Room E1-232, 10 W. 32nd Street, Chicago, IL 60616, USA
e-mail: xzhou23@hawk.iit.edu
F.J. Hickernell
e-mail: hickernell@iit.edu


sites or the design, or more generally we know the values of n linear functionals
L 1 , . . . , L n applied to f . Here we assume that the domain of f is a subset of Rd . The
goal is to construct An f , a good approximation to f that is inexpensive to evaluate.
Algorithms for function approximation based on symmetric positive definite ker-
nels have arisen in both the numerical computation literature [3, 5, 13, 18], and the
statistical learning literature [1, 4, 7, 12, 14–17]. These algorithms go by a variety
of names, including radial basis function methods [3], scattered data approximation
[18], meshfree methods [5], (smoothing) splines [17], kriging [15], Gaussian process
models [12] and support vector machines [16].
Many kernels commonly used in practice are associated with a sequence of shape parameters $\gamma = \{\gamma_\ell\}_{\ell=1}^{\infty}$, which allows more flexibility in the function approximation problem. Examples of such kernels include the Matérn, the multiquadrics, the inverse multiquadrics, and the extensively studied Gaussian kernel (also known as the squared exponential kernel). The anisotropic stationary Gaussian kernel is given by
$$\widetilde K_d(x, t) := e^{-\gamma_1^2(x_1-t_1)^2 - \cdots - \gamma_d^2(x_d-t_d)^2} = \prod_{\ell=1}^{d} e^{-\gamma_\ell^2(x_\ell - t_\ell)^2} \qquad \text{for all } x, t \in \mathbb{R}^d, \qquad (1)$$
where $\gamma_\ell$ is a positive shape parameter corresponding to the variable $x_\ell$. Choosing a small $\gamma_\ell$ has a beneficial effect on the rate of decay of the eigenvalues of the Gaussian kernel. The optimal choice of $\gamma_\ell$ is application dependent and much work has been spent on the quest for the optimal shape parameter. Note that taking $\gamma_\ell = \gamma$ for all $\ell$ recovers the isotropic Gaussian kernel.
For the Gaussian kernel (1), convergence rates with polynomial tractability results are established in [6]. These rates are summarized in Table 1. Note that the error of an algorithm $A_n$ in this context is the worst case error based on the following $\mathcal{L}_2$ criterion:
$$e^{\mathrm{wor}}(A_n) := \sup_{\|f\|_{H_d}\le 1} \|f - A_n f\|_{\mathcal{L}_2}, \qquad \|f\|_{\mathcal{L}_2} := \Big( \int_{\mathbb{R}^d} f(t)^2 \rho_d(t)\,dt \Big)^{1/2}, \qquad (2)$$
where $\rho_d$ is a probability density function with independent marginals, namely $\rho_d(x) = \rho_1(x_1)\cdots\rho_1(x_d)$. For real q, the notation $\lesssim n^q$ (with $n \to \infty$ implied) means that for all $\delta > 0$ the quantity is bounded above by $C_\delta n^{q+\delta}$ for all $n > 0$, where $C_\delta$ is some positive constant that is independent of the sample size, n, and the dimension, d, but may depend on $\delta$. The notation $\gtrsim n^q$ is defined analogously, and means that the quantity is bounded from below by $C_\delta n^{q-\delta}$ for all $\delta > 0$. The notation $\asymp n^q$ means that the quantity is both $\lesssim n^q$ and $\gtrsim n^q$. The term $r(\gamma)$ appearing in Table 1 denotes the rate of convergence to zero of the shape parameter sequence $\gamma$ and is defined in (3) below.

Table 1 Error decay rates for the Gaussian kernel as a function of sample size n
  Data available      | Absolute error criterion                                      | Normalized error criterion
  Linear functionals  | $\asymp n^{-\max(r(\gamma),\,1/2)}$                           | $\asymp n^{-r(\gamma)}$, if $r(\gamma) > 0$
  Function values     | $\lesssim n^{-\max(r(\gamma)/[1+1/(2r(\gamma))],\,1/4)}$      | $\lesssim n^{-r(\gamma)/[1+1/(2r(\gamma))]}$, if $r(\gamma) > 1/2$

$$r(\gamma) := \sup\Big\{ \beta > 0 \;\Big|\; \sum_{\ell=1}^{\infty} \gamma_\ell^{1/\beta} < \infty \Big\}. \qquad (3)$$

The kernel studied in this article has the more general product form given below:
$$\widetilde K_d(x,t) = \widetilde K_{d,\alpha,\gamma}(x, t) := \prod_{\ell=1}^{d} \widetilde K_{\alpha_\ell,\gamma_\ell}(x_\ell, t_\ell) \qquad \text{for all } x, t \in \mathbb{R}^d, \qquad (4)$$
where $0 \le \alpha_\ell \le 1$, $\gamma_\ell > 0$ and
$$\widetilde K_{\alpha,\gamma}(x, t) := 1 - \alpha^2 + \alpha^2 K_\gamma(x, t), \qquad x, t \in \mathbb{R}. \qquad (5)$$

We assume that we know the eigenpair expansion of the kernel $K_\gamma$ for univariate functions in terms of its shape parameter $\gamma$. Many kernels in the numerical integration and approximation literature take the form of (4), where $\alpha_\ell$ governs the vertical scale of the kernel across the $\ell$th dimension. In particular, taking $\alpha_\ell = 1$ for all $\ell$ and $K_\gamma(x,t) = \exp(-\gamma^2(x-t)^2)$ recovers the anisotropic Gaussian kernel (1).
The goal of this paper is to extend the results in Table 1 to the kernel in (4). In essence we are able to replace $r(\gamma)$ by $\tilde r(\alpha, \gamma)$, defined as
$$\tilde r(\alpha, \gamma) := \sup\Big\{ \beta > 0 \;\Big|\; \sum_{\ell=1}^{\infty} (\alpha_\ell\gamma_\ell)^{1/\beta} < \infty \Big\} = r\big(\{\alpha_\ell\gamma_\ell\}_{\ell\in\mathbb{N}}\big), \qquad (6)$$
with the convention that the supremum of the empty set is taken to be zero.
The known eigenpair expansion of $K_\gamma$ does not give us explicit formulae for the eigenvalues and eigenfunctions of the kernel $\widetilde K_{\alpha,\gamma}$. However, since $\widetilde K_{\alpha,\gamma}$ is a convex combination of the constant kernel and a kernel with a known eigenpair expansion, we can derive upper and lower bounds on the eigenvalues of $\widetilde K_{\alpha,\gamma}$ by approximating the corresponding linear operators by finite rank operators and applying some inequalities for eigenvalues of matrices. These bounds then imply bounds on the eigenvalues of $\widetilde K_d$, which is of tensor product form. Bounds on the eigenvalues of $\widetilde K_d$ lead to tractability results for function approximation on $H_d$.

2 Function Approximation

2.1 Reproducing Kernel Hilbert Spaces

Let $H_d = H(\widetilde K_d)$ denote a reproducing kernel Hilbert space of real functions defined on $\mathbb{R}^d$. The goal is to approximate any function in $H_d$ given a finite number of data. The reproducing kernel $\widetilde K_d : \mathbb{R}^d\times\mathbb{R}^d \to \mathbb{R}$ is symmetric and positive definite. It takes the form (4), where $K_\gamma$ satisfies the unit trace condition:
$$\int_{\mathbb{R}} K_\gamma(t, t)\,\rho_1(t)\,dt = 1 \qquad \text{for all } \gamma > 0. \qquad (7)$$
This condition implies that $H_d$ is continuously embedded in the space $\mathcal{L}_2 = \mathcal{L}_2(\mathbb{R}^d, \rho_d)$ of square Lebesgue integrable functions, where the $\mathcal{L}_2$ norm was defined in (2). Continuous embedding means that $\|I_d f\|_{\mathcal{L}_2} = \|f\|_{\mathcal{L}_2} \le \|I_d\|\,\|f\|_{H_d}$ for all $f \in H_d$.
Functions in $H_d$ are approximated by linear algorithms of the form
$$(A_n f)(x) := \sum_{j=1}^{n} L_j(f)\,a_j(x) \qquad \text{for all } f \in H_d,\ x \in \mathbb{R}^d,$$
for some continuous linear functionals $L_j \in H_d^*$, and functions $a_j \in \mathcal{L}_2$. Note that for known functions $a_j$, the cost of computing $A_n f$ is $O(n)$, if we do not consider the cost of generating the data samples $L_j(f)$. The linear functionals, $L_j$, used by an algorithm $A_n$ may either come from the class of arbitrary bounded linear functionals, $\Lambda^{\mathrm{all}} = H_d^*$, or from the class of function evaluations, $\Lambda^{\mathrm{std}}$. The nth minimal worst case error over all possible algorithms is defined as
$$e^{\mathrm{wor}\text{-}\vartheta}(n, H_d) := \inf_{A_n \text{ with } L_j\in\Lambda^\vartheta} e^{\mathrm{wor}}(A_n), \qquad \vartheta \in \{\mathrm{std}, \mathrm{all}\}.$$

2.2 Tractability

While typical numerical analysis focuses on the rate of convergence, it does not take
into consideration the effects of d. The study of tractability arises in information-
based complexity and it considers how the error depends on the dimension, d, as
well as the number of data, n.
In particular, we would like to know how $e^{\mathrm{wor}\text{-}\vartheta}(n, H_d)$ depends not only on n but also on d. Because of the focus on d-dependence, the absolute and normalized error criteria mentioned in Table 1 may lead to different answers. For a given positive $\varepsilon \in (0,1)$ we want to find an algorithm $A_n$ with the smallest n for which the error does not exceed $\varepsilon$ for the absolute error criterion, and does not exceed $\varepsilon\, e^{\mathrm{wor}\text{-}\vartheta}(0, H_d) = \varepsilon\,\|I_d\|$ for the normalized error criterion. That is,
$$n^{\mathrm{wor}\text{-}\psi\text{-}\vartheta}(\varepsilon, H_d) = \min\Big\{ n \;\Big|\; e^{\mathrm{wor}\text{-}\vartheta}(n, H_d) \le \begin{cases} \varepsilon, & \psi = \mathrm{abs}, \\ \varepsilon\,\|I_d\|, & \psi = \mathrm{norm}. \end{cases} \Big\}$$

Let I = {Id }d∈N denote the sequence of function approximation problems. We


say that I is polynomially tractable if and only if there exist numbers C, p and q
such that

n wor−ψ−ϑ (ε, Hd ) ≤ C d q ε− p for all d ∈ N and ε ∈ (0, 1). (8)

If q = 0 above then we say that I is strongly polynomially tractable and the infimum
of p satisfying the bound above is the exponent of strong polynomial tractability.
The essence of polynomial tractability is to guarantee that a polynomial number
of linear functionals is enough to solve the function approximation problem up to an
error at most ε. Obviously, polynomial tractability depends on which class, Λall or
Λstd , is considered and whether the absolute or normalized error is used.
The property of strong polynomial tractability is especially challenging since then
the number of linear functionals needed for an ε-approximation is independent of d.
Nevertheless, we provide here positive results on strong polynomial tractability.

3 Eigenvalues for the General Kernel

d as
Let us define the linear operator corresponding to any kernel K

Wf = d (·, t)ρd (t) dt for all f ∈ Hd .
f (t) K
Rd

It is known that W is self-adjoint and positive definite if K d is a positive definite


kernel. Moreover (7) implies that W is compact. Let us define the eigenpairs of W
by (λd, j , ηd, j ), where the eigenvalues are ordered, λd,1 ≥ λd,2 ≥ · · · , and

W ηd, j = λd, j ηd, j with ηd, j , ηd,i Hd = δi, j for all i, j ∈ N.

Note also that for any f ∈ Hd we have

f, ηd, j L2 = λd, j f, ηd, j Hd .

Taking f = ηd,i we see that {ηd, j } is a set of orthogonal functions in L2 . Letting

−1/2
ϕd, j = λd, j ηd, j for all j ∈ N,
588 X. Zhou and F.J. Hickernell

we obtain an orthonormal sequence {ϕd, j } in L2 . Since {ηd, j } is a complete


orthonormal basis of Hd , we have



d (x, t) =
K ηd, j (x) ηd, j (t) = λd, j ϕd, j (x) ϕd, j (t) for all x, t ∈ Rd .
j=1 j=1

To standardize the notation, we shall always write the eigenvalues of the linear
operator corresponding to the kernel K d,α,γ in (4) in a weakly decreasing order
νd,α,γ ,1 ≥ νd,α,γ ,2 ≥ · · · . We drop the dependency on the dimension d to denote the
eigenvalues of the linear operator corresponding to the one-dimensional kernel K α,γ
in (5) by ν̃α,γ ,1 ≥ ν̃α,γ ,2 ≥ · · · . Similarly the eigenvalues of the linear operator corre-
sponding to the one-dimensional kernel K γ (x, t) are denoted by λ̃γ ,1 ≥ λ̃γ ,2 ≥ · · · .
A useful relation between the sum of the τ th power of the multivariate eigenvalues
νd,α,γ , j and the sums of the τ th powers of the univariate eigenvalues ν̃α,γ , j is given
by [6, Lemma 3.1]:
⎛ ⎞


d ∞

τ
νd,α,γ ,j = ⎝ ν̃ατ ,γ , j ⎠ ∀τ > 0.
 
j=1 =1 j=1

We are interested in the high dimensional case where d is large, and we want
to establish convergence and tractability results when α and/or γ tend to zero
as  → ∞. According to [10], strong polynomial tractability holds if the sum of
some powers of eigenvalues are bounded. The following lemma provides us with
some useful inequalities on eigenvalues of the linear operators corresponding to
reproducing kernels.
Lemma 1 Let H (K A ), H (K B ), H (K C ) ⊂ L2 (R, ρ1 ) be Hilbert spaces with
symmetric positive definite reproducing kernels K A , K B and K C such that

K κ (t, t)ρ1 (t) dt < ∞, κ ∈ {A, B, C}, (9)
R

and K C = a K A + bK B , a, b ≥ 0. Define the linear operators W A , W B , and WC by



Wκ f = f (t)K κ (·, t)ρ1 (t) dt, for all f ∈ H (K κ ), κ ∈ {A, B, C}.
R

Let the eigenvalues of the operators be sorted in a weakly decreasing order, i.e.
λκ,1 ≥ λκ,2 ≥ · · · . Then these eigenvalues satisfy

λC,i+ j+1 ≤ aλ A,i+1 + bλ B, j+1 , i, j = 1, 2, . . . (10)

λC,i ≥ max(aλ A,i , bλ B,i ), i = 1, 2, . . . (11)


Tractability of Function Approximation with Product Kernels 589

Proof Let {u j } j∈N be any orthonormal basis in L2 (R, ρ1 ). We assign the orthogonal
projections Pn given by


n
Pn x = x, u j u j , x ∈ L2 (R, ρ1 ).
j=1

Since W A is compact due to (9), it can be shown that (I − Pn )W A  → 0 as n → ∞,


where the operator norm

(I − Pn )W A  := sup (I − Pn )W A xL2 (R,ρ1 ) .


x≤1

Furthermore [11, Lemma 11.1 (O S2 )] states that for every pair T1 , T2 : X → Y of


compact operators we have |s j (T1 )−s j (T2 )| ≤ T1 −T2 , j ∈ N, where the singular
values s j (Tk ), k = 1, 2 are the square rootsof the eigenvalues λ j (Tk∗ Tk ) arranged in
a weakly decreasing order, thus s j (Tk ) = λ j (Tk∗ Tk ). Now we can bound

|s j (W A ) − s j (Pn W A Pn )| ≤ |s j (W A ) − s j (Pn W A )| + |s j (Pn W A ) − s j (Pn W A Pn )|


≤ W A − Pn W A  + Pn W A − Pn W A Pn 
≤ (I − Pn )W A  + W A (I − Pn ) → 0

as n → ∞. Thus the eigenvalues λ Pn W A Pn , j → λW A , j for all j as n → ∞. Similarly


this applies to the operators W B and WC . Note that we have

Pn WC Pn = a Pn W A Pn + b Pn W B Pn

and these finite rank operators correspond to self-adjoint matrices. These matrices
are symmetric and positive definite because the kernels are symmetric and positive
definite. The inequalities (10) are found by Weyl (see [8]) and (11) are a direct result
of [2, Fact 8.19.4]. Since (10) and (11) hold for the eigenvalues of symmetric positive
definite matrices, they also hold for the operators corresponding to symmetric and
positive definite kernels. 

We are now ready to present the main results of this article in the following two
sections.

4 Tractability for the Absolute Error Criterion

We now consider the function approximation problem for Hilbert spaces Hd =


H (K d ) with a general kernel using the absolute error criterion. From the discussion
of eigenvalues in the previous section and from (7) it follows that
590 X. Zhou and F.J. Hickernell



λ̃γ , j = K γ (t, t)ρ1 (t) dt = 1, ∀γ > 0. (12)
j=1 R

We want to verify whether polynomial tractability holds, namely whether (8) holds.

4.1 Arbitrary Linear Functionals

Recall that the rate of decay of scale and shape parameters r̃ (α, γ ) is defined in (6).
We first analyze the class Λall and polynomial tractability.

Theorem 1 Consider the function approximation problem I = {Id }d∈N for Hilbert
spaces for the class Λall and the absolute error criterion with the kernels (4) satisfying
(12). Let r̃ (α, γ ) be given by (6). If r̃ (α, γ ) = 0 or there exist constants C1 , C2 , C3 >
0, which are independent of γ but may depend on r̃ (α, γ ) and sup{γ | ∈ N}, such
that

K γ (x, t)ρ1 (x)ρ1 (t) dx dt ≥ 1 − C1 γ 2 , (13)
R2

  2r̃ (α,γ
1
λ̃γ , j
)

C2 ≤ ≤ C3 (14)
j=2
γ2

hold for all 0 < γ < sup{γ | ∈ N}, then it follows that
• I is strongly polynomially tractable with exponent
 
1
p all = min 2, .
r̃ (α, γ )

• For all d ∈ N we have

ewor-all (n, Hd )  n −1/ p = n − max(r̃ (α,γ ),1/2) n → ∞,


all

n wor-abs-all (ε, Hd )  ε− p ε → 0,
all

where  n q with n → ∞ was defined in Sect. 1, and  εq with ε → 0 is analogous


to  (1/ε)−q with 1/ε → ∞.
• For the isotropic kernel with α = α and γ = γ for all , the exponent of
strong tractability is 2. Furthermore strong polynomial tractability is equivalent
to polynomial tractability.

Proof From [10, Theorem 5.1] it follows that I is strongly polynomially tractable
if and only if there exist two positive numbers c1 and τ such that
Tractability of Function Approximation with Product Kernels 591
⎛ ⎞1/τ


c2 := sup ⎝ τ
νd,α,γ ,j
⎠ < ∞, (15)
d∈N j=c1 

Furthermore, the exponent p all of strong polynomial tractability is the infimum of 2τ


for which this condition holds. Obviously (15) holds for c1 = 1 and τ = 1 because
⎛ ⎞


d ∞
d 
 
νd,α,γ , j = ⎝ ν̃α ,γ , j ⎠ = [1 − α2 + α2 K γ (t, t)]ρ1 (t) dt
j=1 =1 j=1 =1 R


d
 
= 1 − α2 + α2 = 1.
=1

This shows that p all ≤ 2.


The case r̃ (α, γ ) = 0 is trivial. Take now r̃ (α, γ ) > 0. Consider first the case
d = 1 for simplicity. Then the kernel K d,α,γ in (4) becomes K α,γ . We will show
α,γ satisfy
that for τ = 1/(2r̃ (α, γ )), the eigenvalues of K



τ
ν̃α,γ , j ≤ 1 + C U (αγ ) ,

(16)
j=1

where the constant CU does not depend on α or γ . Since all the eigenvalues of K γ
are non-negative, we clearly have for the first eigenvalue of K γ ,

ν̃α,γ ,1 ≤ 1. (17)

α,γ
On the other hand, (13) gives the lower bound of the first eigenvalue of K

   
ν̃α,γ ,1 ≥ α,γ (x, t)ρ1 (x)ρ1 (t) dtdx =
K 1 − α 2 + α 2 K γ (x, t) ρ1 (x)ρ1 (t) dtdx
R2  R2
2
=1−α +α 2 K γ (x, t)ρ1 (x)ρ1 (t) dtdx ≥ 1 − C1 (αγ )2 . (18)
R2

It follows from (12) that


ν̃α,γ ,2 ≤ C1 (αγ )2 . (19)

For j ≥ 3, the upper bound of ν̃α,γ , j is given by (10) with i = 1:

ν̃α,γ , j ≤ α 2 λ̃γ , j−1 , (20)


592 X. Zhou and F.J. Hickernell

which in turn yields





τ
ν̃α,γ ,j ≤ α

λ̃τγ , j−1 ≤ C3 (αγ )2τ (21)
j=3 j=3

by (14). Combining (17), (19) and (21) gives (16), where the constant CU = C1τ +C3 .
The lower bound we want to establish is that for τ < 1/(2r̃ (α, γ )),

 1/[2(1−τ )]
τ C2
ν̃α,γ ,j ≥ 1 + CL (αγ ) 2τ
if αγ < , (22)
j=1
2C1

where CL := C2 /2. It follows from (18) that


τ
ν̃α,γ ,1 ≥ ν̃α,γ ,1 ≥ 1 − C 1 (αγ ) .
2
(23)

In addition we apply the eigenvalue inequality (10) to obtain

ν̃α,γ , j ≥ α 2 λ̃γ , j , j = 2, 3, . . .

which in turn gives





τ
ν̃α,γ ,j ≥ α

λ̃τγ , j ≥ C2 (αγ )2τ , (24)
j=2 j=2

where the last inequality follows from (14). Inequalities (23) and (24) together give


τ
ν̃α,γ , j ≥ 1 − C 1 (αγ ) + C 2 (αγ )
2 2τ
≥ 1 + (C2 /2)(αγ )2τ
j=1

under the condition in (22) on small enough αγ . Thus we obtain (22).


For the multivariate case, the sum of the τ th power of the eigenvalues is bounded
from above for any τ > 1/(2r̃ (α, γ )) because
⎛ ⎞

 ∞


d
 
τ
νd, j = ⎝ ν̃ατ ,γ , j ⎠ ≤ 1 + CU (α γ )2τ
 
j=1 =1 j=1 =1
∞   ∞

 
= exp ln 1 + CU (α γ ) 2τ
≤ exp CU (α γ ) 2τ
< ∞. (25)
=1 =1

This shows that p all ≤ 1/r̃ (α, γ ).


Tractability of Function Approximation with Product Kernels 593

We now consider the lower bound in the multivariate case and define the set A by

  

C2 1/[2(1−τ )]

A = 
α γ < .

2C1

Then
⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞


 ∞  ∞  ∞
sup ⎝ τ
νd,α,γ ,j
⎠= ⎝ ν̃ατ  ,γ , j ⎠ = ⎝ ν̃ατ  ,γ , j ⎠ ⎝ ν̃ατ  ,γ , j ⎠ .
d∈N j=1 =1 j=1 ∈A j=1 ∈N\A j=1

We want to show that this supremum is infinite for τ < 1/(2r̃ (α, γ )). We do this by
proving that the first product on the right is infinite. Indeed for τ < 1/(2r̃ (α, γ )),
⎛ ⎞
 ∞ 

⎝ ν̃ατ  ,γ , j ⎠ ≥ 1 + CL (α γ )2τ ≥ 1 + CL (α γ )2τ = ∞.
∈A j=1 ∈A ∈A

Therefore, p all ≥ 1/r̃ (α, γ ), which establishes the formula for p all . The estimates on
ewor-all (n, Hd ) and n wor-abs-all (ε, Hd ) follow from the definition of strong tractability.
Finally, the exponent of strong tractability is 2 for the isotropic kernel because
r̃ (α, γ ) = 0 in this case. To prove that strong polynomial tractability is equivalent
to polynomial tractability, it is enough to show that polynomial tractability implies
strong polynomial tractability. From [10, Theorem 5.1] we know that polynomial
tractability holds if and only if there exist numbers c1 > 0, q1 ≥ 0, q2 ≥ 0 and τ > 0
such that ⎧ ⎛ ⎞1/τ ⎫
⎨ ∞ ⎬
c2 := sup d −q2 ⎝ λτd, j ⎠ < ∞.
d∈N ⎩ q1
j=C1 d 

If so, then
n wor-abs-all (ε, Hd ) ≤ (c1 + c2τ ) d max(q1 ,q2 τ ) ε−2τ

for all ε ∈ (0, 1) and d ∈ N. Note that for all d we have


⎛ ⎞d

d −q2 τ ⎝ τ
ν̃α,γ ,j
⎠ − d −q2 τ (c1  − 1)ν̃α,γ
τd τ
,1 ≤ c2 < ∞.
j=1

This implies that τ ≥ 1. On the other hand, for τ = 1 we can take q1 = q2 = 0 and
arbitrarily small C1 , and obtain strong tractability. This completes the proof. 

Theorem 1 states that the exponent of strong polynomial tractability is at most


2, while for all shape parameters for which r̃ (α, γ ) > 1/2 the exponent is smaller
than 2. Again, although the rate of convergence of ewor-all (n, Hd ) is always excellent,
594 X. Zhou and F.J. Hickernell

the dependence on d is eliminated only at the expense of the exponent which must
be roughly 1/ p all . Of course, if we take an exponentially decaying sequence of
the products of scale parameters and shape parameters, say, α γ = q  for some
q ∈ (0, 1), then r̃ (α, γ ) = ∞ and p all = 0. In this case, we have an excellent rate
of convergence without any dependence on d.

4.2 Only Function Values

The tractability results for the class Λstd are stated in the following theorem.
Theorem 2 Consider the function approximation problem I = {Id }d∈N for Hilbert
spaces for the class Λstd and the absolute error criterion with the kernels (4) satisfying
(12). Let r̃ (α, γ ) be given by (6). If r̃ (α, γ ) = 0 or there exist constants C1 , C2 , C3 >
0, which are independent of γ but may depend on r̃ (α, γ ) and sup{γ | ∈ N}, such
that (13) and (14) are satisfied for all 0 < γ < sup{γ | ∈ N}, then
• I is strongly polynomially tractable with exponent of strong polynomial tractabil-
ity at most 4. For all d ∈ N and ε ∈ (0, 1) we have
√  
2 1 1/2
e (n, Hd ) ≤ 1/4 1 + √
wor-std
,
n 2 n
! √ "
(1 + 1 + ε2 )2
n wor−abs−std
(ε, Hd ) ≤ .
ε4

• For the isotropic kernel with α = α and γ = γ for all , the exponent of
strong tractability is at least 2 and strong polynomial tractability is equivalent to
polynomial tractability.
Furthermore if r̃ (α, γ ) > 1/2, then
• I is strongly polynomially tractable with exponent of strong polynomial tractabil-
ity at most
1 1 1
p std = + 2 = p all + ( p all )2 < 4.
r̃ (α, γ ) 2r̃ (α, γ ) 2

• For all d ∈ N we have

ewor-std (n, Hd )  n −1/ p = n −r̃ (α,γ )/[1+1/(2r̃ (α,γ ))] n → ∞,


std

n wor-abs-std (ε, Hd )  ε− p
std
ε → 0.

Proof The same proofs as for [6, Theorems 5.3 and 5.4] can be used. We only need
to show that the assumption of [9, Theorem 5], which is used in [6, Theorem 5.4], is
satisfied. It is enough to show that there exists p > 1 and B > 0 such that for any
n ∈ N,
Tractability of Function Approximation with Product Kernels 595

B
νd,α,γ ,n ≤ . (26)
np

Take τ = 1/(2r̃ (α, γ )). Since the eigenvalues λ̃γ ,n are ordered, we have for n ≥ 2,


1 τ 1 τ
n
C3 γ2τ
λ̃τγ ,n ≤ λ̃γ , j ≤ λ̃γ , j ≤ ,
n − 1 j=2 n − 1 j=2 n−1

where the last inequality follows from (14). Raising to the power 1/τ gives
 1/τ
C3
λ̃γ ,n ≤ γ2 .
n−1

Furthermore (20) implies that for n ≥ 3,


 1/τ  1/τ  
C3 1/τ n 1
ν̃α ,γ ,n ≤ α2 λ̃γ ,n−1 ≤ α2 γ2 = α2 γ2 C3
n−2 n−2 n 1/τ
α2 γ2 (3C3 )1/τ
≤ .
n 1/τ

Since ν̃α ,γ ,n ≤ 1 for all n ∈ N, we have that for all 1 ≤  ≤ d and n ≥ 3,

C4
νd,α,γ ,n ≤ ν̃α ,γ ,n ≤ ,
np

where C4 = α2 γ2 (3C3 )1/τ and p = 1/τ > 1. For n = 1 and n = 2, we can
always find C5 large enough such that νd,α,γ ,n ≤ C5 /n p . Therefore (26) holds for
B = max{C4 , C5 }. 

Note that (26) can be easily satisfied for many kernels used in practice. This
theorem implies that for large r̃ (α, γ ), the exponents of strong polynomial tractability
are nearly the same for both classes Λall and Λstd . For an exponentially decaying
sequence of shape parameters, say, α γ = q  for some q ∈ (0, 1), we have p all =
p std = 0, and the rates of convergence are excellent and independent of d.

5 Tractability for the Normalized Error Criterion

We now consider the function approximation problem for Hilbert spaces Hd ( K d )


with a general kernel for the normalized error criterion. That is, we want to find the
smallest n for which

ewor−ϑ (n, Hd ) ≤ ε Id , ϑ ∈ {std, all}.


596 X. Zhou and F.J. Hickernell


Note that Id  = νd,α,γ ,1 ≤ 1 and it can be exponentially small in d. Therefore
the normalized error criterion may be much harder than the absolute error criterion.
It follows from [6, Theorem 6.1] that for the normalized error criterion, lack of
polynomial tractability holds for the isotropic kernel for the class Λall and hence for
the class Λstd .

5.1 Arbitrary Linear Functionals

We do not know if polynomial tractability holds for kernels with 0 ≤ r̃ (α, γ ) < 1/2.
For r̃ (α, γ ) ≥ 1/2, we have the following theorem.

Theorem 3 Consider the function approximation problem I = {Id }d∈N for Hilbert
spaces for the class Λstd and the normalized error criterion with the kernels (4)
satisfying (12). Let r̃ (α, γ ) be given by (6) and r̃ (α, γ ) ≥ 1/2. If there exist con-
stants C1 , C2 , C3 > 0, which are independent of γ but may depend on r̃ (α, γ ) and
sup{γ | ∈ N}, such that (13) and (14) are satisfied for all 0 < γ < sup{γ | ∈ N},
then
• I is strongly polynomially tractable with exponent of strong polynomial tractabil-
ity
1
p all = .
r̃ (α, γ )

• For all d ∈ N we have

ewor-all (n, Hd )  Id n −1/ p = n −r̃ (α,γ ) n → ∞,


all

n wor-abs-all (ε, Hd )  ε− p
all
ε → 0.

Proof From [10, Theorem 5.2] we know that strong polynomial tractability holds if
and only if there exits a positive number τ such that
⎧ ⎫
∞ 
 ⎨ ∞

νd,α,γ , j τ 1 τ
c2 := sup = sup τ νd,α,γ ,j < ∞.
d νd,α,γ ,1 d ⎩ νd,α,γ ,1 ⎭
j=1 j=1

If so, then n wor-nor-all (ε, Hd ) ≤ c2 ε−2τ for all ε ∈ (0, 1) and d ∈ N, and the exponent
of strong polynomial tractability # is the infimum of 2τ for which c2 < ∞.
For all d ∈ N, we have ∞ τ
j=1 d,α,γ , j < ∞ for τ = 1/(2r̃ (α, γ )) from (25).
ν
τ
It remains to note that supd {1/νd,α,γ ,1 } < ∞ if and only if supd {1/νd,α,γ ,1 } < ∞.
Furthermore note that (18) implies that


1 1
sup ≤ .
d νd,α,γ ,1 =1
1 − C1 (α γ )2
Tractability of Function Approximation with Product Kernels 597

#
Clearly, r̃ (α, γ ) ≥ 1/2 implies that ∞ =1 (α γ ) < ∞, which yields c2 < ∞.
2

This also proves that p all


≤ 1/r̃ (α, γ ). The estimates on ewor-all (n, Hd ) and
n wor-nor-all
(ε, Hd ) follow from the definition of strong tractability. 

5.2 Only Function Values

We now turn to the class Λstd . We do not know if polynomial tractability holds for
the class Λstd for 0 ≤ r̃ (α, γ ) ≤ 1/2. For r̃ (α, γ ) > 1/2, we have the following
theorem.
Theorem 4 Consider the function approximation problem I = {Id }d∈N for Hilbert
spaces with the kernel (4) for the class Λstd and the normalized error criterion. Let
r̃ (α, γ ) be given by (6) and r̃ (α, γ ) > 1/2. If there exist constants C1 , C2 , C3 > 0,
which are independent of γ but may depend on r̃ (α, γ ) and sup{γ | ∈ N}, such that
(13) and (14) are satisfied for all 0 < γ < sup{γ | ∈ N}, then
• I is strongly polynomially tractable with exponent of strong polynomial tractabil-
ity at most
1 1 1
p std = + = p all + ( p all )2 < 4.
r̃ (α, γ ) 2r̃ 2 (α, γ ) 2

For all d ∈ N we have

ewor-std (n, Hd )  n −1/ p n → ∞,


std

(ε, Hd )  ε− p ε → 0.
std
wor-nor-std
n

Proof The initial error is


 
d
1 d
Id  ≥ (1 − C1 (α γ )2 )1/2 = exp O(1) − (α γ )2 .
=1
2 =1

r̃ (α, γ ) > 1/2 implies that Id  is uniformly bounded from below by a positive
number. This shows that there is no difference between the absolute and normalized
error criteria. This means that we can apply Theorem 2 for the class Λstd with ε
replaced by εId  = (ε). This completes the proof. 

Acknowledgments We are grateful for many fruitful discussions with Peter Mathé and several
other colleagues. This work was partially supported by US National Science Foundation grants
DMS-1115392 and DMS-1357690.
598 X. Zhou and F.J. Hickernell

References

1. Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Sta-
tistics. Kluwer Academic, Boston (2004)
2. Bernstein, D.S.: Matrix Mathematics. Princeton University, New Jersey (2008)
3. Buhmann, M.D.: Radial Basis Functions. Cambridge Monographs on Applied and Computa-
tional Mathematics. Cambridge University Press, Cambridge (2003)
4. Cucker, F., Zhou, D.X.: Learning Theory: An Approximation Theory Viewpoint. Cambridge
Monographs on Applied and Computational Mathematics. Cambridge University Press, Cam-
bridge (2007)
5. Fasshauer, G.E.: Meshfree Approximation Methods with Matlab, Interdisciplinary Mathemat-
ical Sciences, vol. 6. World Scientific Publishing Co., Singapore (2007)
6. Fasshauer, G.E., Hickernell, F.J., Woźniakowski, H.: On dimension-independent rates of con-
vergence for function approximation with Gaussian kernels. SIAM J. Numer. Anal. 50, 247–271
(2012). doi:10.1137/10080138X
7. Hastie, T., Tibshirani, R., Friedman, J.: Elements of Statistical Learning: Data Mining, Infer-
ence, and Prediction. Springer Series in Statistics, 2nd edn. Springer Science+Business Media
Inc, New York (2009)
8. Knutson, A., Tao, T.: Honeycombs and sums of Hermitian matrices. Not. AMS 48–2, 175–186
(2001)
9. Kuo, F.Y., Wasilkowski, G.W., Woźniakowski, H.: On the power of standard information for
multivariate approximation in the worst case setting. J. Approx. Theory 158, 97–125 (2009)
10. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems Volume I: Linear Infor-
mation. EMS Tracts in Mathematics, vol. 6. European Mathematical Society, Zürich (2008)
11. Pietsch, A.: Operator Ideals. North-Holland Publishing Co., Amsterdam (1980)
12. Rasmussen, C.E., Williams, C.: Gaussian Processes for Machine Learning. MIT Press, Cam-
bridge (2006). http://www.gaussianprocess.org/gpml/
13. Schaback, R., Wendland, H.: Kernel techniques: from machine learning to meshless methods.
Acta Numer. 15, 543–639 (2006)
14. Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization,
Optimization, and Beyond. MIT Press, Cambridge (2002)
15. Stein, M.L.: Interpolation of Spatial Data: Some theory for Kriging. Springer, New York (1999)
16. Steinwart, I., Christmann, A.: Support Vector Machines. Springer Science+Business Media,
Inc., New York (2008)
17. Wahba, G.: Spline Models for Observational Data, CBMS-NSF Regional Conference Series
in Applied Mathematics, vol. 59. SIAM, Philadelphia (1990)
18. Wendland, H.: Scattered Data Approximation. Cambridge Monographs on Applied and Com-
putational Mathematics, vol. 17. Cambridge University Press, Cambridge (2005)
Discrepancy Estimates
For Acceptance-Rejection Samplers
Using Stratified Inputs

Houying Zhu and Josef Dick

Abstract In this paper we propose an acceptance-rejection sampler using stratified


inputs as driver sequence. We estimate the discrepancy of the N -point set in
(s − 1)-dimensions generated by this algorithm. First we show an upper bound
on the star-discrepancy of order N −1/2−1/(2s) . Further we prove an upper bound on
q
the qth moment of the L q -discrepancy (E[N q L q,N ])1/q for 2 ≤ q ≤ ∞, which is
(1−1/s)(1−1/q)
of order N . The proposed approach is numerically tested and compared
with the standard acceptance-rejection algorithm using pseudo-random inputs. We
also present an improved convergence rate for a deterministic acceptance-rejection
algorithm using (t, m, s)−nets as driver sequence.

Keywords Monte Carlo method · Acceptance-rejection sampler · Discrepancy


theory

1 Introduction

The acceptance-rejection algorithm is one of the widely used techniques for sampling
from a distribution when direct simulation is not possible or is expensive. The idea
of this method is to determine a good choice of proposal density (also known as
hat function), and then sample from the proposal density with low cost. For a given
target density ψ : D → R+ and a well-chosen proposal density H : D → R+ , one
assumes that there exists a constant L < ∞ such that ψ(x) < L H (x) for all x in the
domain D. Let u have uniform distribution in the unit interval, i.e. u ∼ U ([0, 1]).
Then the plain acceptance-rejection algorithm works in the following way. One first
draws X ∼ H and u ∼ U ([0, 1]), then accepts X as a sample of ψ if u ≤ Lψ(X) H (X)
,

H. Zhu (B) · J. Dick


School of Mathematics and Statistics, The University of New South Wales,
Sydney NSW 2052, Australia
e-mail: houying.zhu@unsw.edu.au
J. Dick
e-mail: josef.dick@unsw.edu.au
© Springer International Publishing Switzerland 2016 599
R. Cools and D. Nuyens (eds.), Monte Carlo and Quasi-Monte Carlo Methods,
Springer Proceedings in Mathematics & Statistics 163,
DOI 10.1007/978-3-319-33507-0_33
600 H. Zhu and J. Dick

otherwise reject this sample and repeat the sampling step. Note that by applying
this algorithm, one needs to know the value of L. However, in many situations, this
constant is known for the given function or can be estimated.
Devroye [6] gave a construction method of a proposal density for log-concave
densities and Hörmann [17] proposed a rejection procedure, called transformed den-
sity rejection, to construct a proposal density. Detailed summaries of this technique
and some extensions can be found in the monographs [3, 18]. For many target den-
sities finding a good proposal density is difficult. To improve efficiency one can also
determine a better choice of driver sequence having the designated proposal density,
which yields a deterministic type of acceptance-rejection method.
The deterministic acceptance-rejection algorithm has been discussed by
Moskowitz and Caflisch [22], Wang [31, 32] and Nguyen and Ökten [23], where
empirical evidence and a consistency result were given. Two measurements included
therein are the empirical root mean square error (RMSE) and the empirical standard
deviation. However, the discrepancy of samples has not been directly investigated.
Motivated by those papers, in [33] we investigated the discrepancy properties of
points produced by a totally deterministic acceptance-rejection method. We proved
that the discrepancy of samples generated by the acceptance-rejection sampler using
(t, m, s)−nets as driver sequences is bounded from above by N −1/s , where the target
density function is defined in (s − 1)-dimension and N is the number of samples
generated by the deterministic acceptance-rejection sampler. A lower bound shows
that for any given driver sequence, there always exists a target density such that the
star-discrepancy is bounded below by cs N −2/(s+1) , where cs is a constant depending
only on s.
Without going into details, in the following we briefly review known results in
the more general area of deterministic Markov chain quasi-Monte Carlo.

1.1 Literature Review of Markov Chain Quasi-Monte Carlo


Method

Markov chain Monte Carlo (MCMC) sampling is a classical method widely used in
simulation. Using a deterministic set as driver sequence in the MCMC procedure,
known as Markov chain quasi-Monte Carlo (MCQMC) algorithm, shows potential
to improve the convergence rate. Tribble and Owen [30] proved a consistency result
for MCMC estimation for finite state spaces. A construction of weakly completely
uniformly distributed (WCUD) sequences is also proposed. As a sequel to the work
of Tribble, Chen [4] and Chen et al. [5] demonstrated that MCQMC algorithms
using a completely uniformly distributed (CUD) sequence as driver sequence give
a consistent result under certain assumptions on the update function and Markov
chain. Further, Chen [4] also showed that MCQMC can achieve a convergence rate
of O(N −1+δ ) for any δ > 0 under certain stronger assumptions, but he only showed
Discrepancy Estimates For Acceptance-Rejection Samplers Using … 601

the existence of a driver sequence. More information on (W)CUD sequences can be


found in [4, 5, 30].
In a different direction, L’Ecuyer et al. [20] proposed a randomized quasi-Monte
Carlo method, namely the so-called array-RQMC method, which simulates multiple
Markov chains in parallel, then applies a suitable permutation to provide a more
accurate approximation of the target distribution. It gives an unbiased estimator to
the mean and variance and also achieves good empirical performance. Gerber and
Chopin in [12] adapted low discrepancy point sets instead of random numbers in
sequential Monte Carlo (SMC). They proposed a new algorithm, named sequential
quasi-Monte Carlo (SQMC), through the use of a Hilbert space-filling curve. They
proved consistency and stochastic bounds based on randomized QMC point sets for
this algorithm. More literature review about applying QMC to MCMC problems can
be found in [5, Sect. 1] and the references therein.
In [10], jointly done with Rudolf, we prove upper bounds on the discrepancy for
uniformly ergodic Markov chains driven by a deterministic sequence rather than inde-
pendent random variables. We show that there exists a deterministic driver sequence
such that the discrepancy of the Markov chain from the target distribution with respect
to certain test sets converges with almost the usual Monte Carlo rate of N −1/2 . In the
sequential work [9] done by Dick and Rudolf, they consider upper bounds on the dis-
crepancy under the assumption that the Markov chain is variance bounding and the
driver sequence is deterministic. In particular, they proved a better existence result,
showing a discrepancy bound having a rate of convergence of almost N −1 under
a stronger assumption on the update function, the so called anywhere-to-anywhere
condition. Roughly, variance bounding is a weaker property than geometric ergod-
icity for reversible chains. It was introduced by Roberts and Rosenthal in [28], who
also proved relations among variance bounding, central limit theorems and Peskun
ordering, which indicated that variance bounding is a reasonable and convenient
property to study MCMC algorithms.

1.2 Our Contribution

In this work we first present an acceptance-rejection algorithm using stratified inputs


as driver sequence. Stratified sampling is one of the variance reduction methods used
in Monte Carlo sampling. More precisely, grid-based stratified sampling improves
the RMSE to N −1/2−1/s for Monte Carlo, see for instance [26, Chap. 10]. In this
paper, we are interested in the discrepancy properties of points produced by the
acceptance-rejection method with stratified inputs as driver sequence. We obtain a
convergence rate of the star-discrepancy of order N −1/2−1/(2s) . Also an estimation of
the L q -discrepancy is considered for this setting.
One would expect that the convergence rate which can be achieved using deter-
ministic sampling methods also depends on properties of the target density function.
One such property is the number of elementary intervals (for a precise definition see
Definition 3 below) of a certain size needed to cover the graph of the density. We
602 H. Zhu and J. Dick

show that if the graph can be covered by a small number of elementary intervals,
then an improved rate of convergence can be achieved using (t, m, s)-nets as driver
sequence. In general, this strategy does not work with stratified sampling, unless one
knows the elementary intervals explicitly.
The paper is organized as follows. In Sect. 2 we provide the needed notation and
background. Section 3 introduces the proposed acceptance-rejection sampler using
stratified inputs, followed by the theoretical results including an upper bound on the
star-discrepancy and the L q -discrepancy. Numerical tests are presented in Sect. 3.3
together with a discussion of the results in comparison with the theoretical bounds of
Theorems 1 and 2. For comparison purpose only we do the numerical tests also with
pseudo-random inputs. Section 4 illustrates an improved rate of convergence when
using (t, m, s)-nets as driver sequences. The paper ends with concluding remarks.

2 Preliminaries

We are interested in the discrepancy properties of samples generated by the


acceptance-rejection sampler. We consider the L q -discrepancy and the star-
discrepancy.
Definition 1 (L q -discrepancy) Let 1 ≤ q ≤ ∞ be a real number. For a point set
PN = {x 0 , . . . , x N −1 } in [0, 1)s , the L q -discrepancy is defined by

 N −1
1  q 1/q
L q,N (PN ) =  1[0,t) (x n ) − λ([0, t)) dt ,
[0,1]s N n=0


1, if x n ∈ [0, t), 
where 1[0,t) (x n ) = , [0, t) = sj=1 [0, t j ) and λ is the Lebesgue
0, otherwise.
measure, with the obvious modification for q = ∞. The L ∞,N -discrepancy is called
the star-discrepancy which is also denoted by D ∗N (PN ).
Later we will consider the discrepancy of samples associated with a density func-
tion.
The acceptance-rejection algorithm accepts all points below the graph of the
density function. In order to prove bounds on the discrepancy, we assume that the
set below the graph of the density function admits a so-called Minkowski content.
Definition 2 (Minkowski content) For a set A ⊆ Rs , let ∂ A denote the boundary of
A and let
λ((∂ A)ε )
M (∂ A) = lim ,
ε→0 2ε

where (∂ A)ε = {x ∈ Rs |x − y ≤ ε for y ∈ ∂ A} and  ·  denotes the Euclidean


norm. If M (∂ A) (abbreviated as M A ) exists and is finite, then ∂ A is said to admit
an (s − 1)−dimensional Minkowski content.
Discrepancy Estimates For Acceptance-Rejection Samplers Using … 603

For simplicity, we consider the Minkowski content associated with the boundary
of a given set, however one could define it in more general sense. Ambrosio et al.
[1] present a detailed discussion of general Minkowski content.

3 Acceptance-Rejection Sampler Using Stratified Inputs

We now present the acceptance-rejection algorithm using stratified inputs.


Algorithm 1 Let the target density ψ : [0, 1]s−1 → R+ , where s ≥ 2, be given.
Assume that we know a constant L < ∞ such that ψ(z) ≤ L for all z ∈ [0, 1]s−1 .
Let A = {z ∈ [0, 1]s : ψ(z 1 , . . . , z s−1 ) ≥ Lz s }.
(i) Let M ∈ N and let {Q 0 , . . . , Q M−1 } be a disjoint covering of [0, 1)s with Q i of
 cj c +1
the form sj=1 M 1/s , Mj 1/s with 0 ≤ c j ≤
M 1/s −1. Then λ(Q i ) = 1/M for
all 0 ≤ i ≤ M −1. Generate a point set PM = {x 0 , . . . , x M−1 } such that x i ∈ Q i
is uniformly distributed in the sub-cube Q i for each i = 0, 1, . . . , M − 1.
(ii) Use the acceptance-rejection method for the points in PM with respect to the
density ψ, i.e. we accept the point x n if x n ∈ A, otherwise reject. Let PN(s) =
A ∩ PM = {z 0 , . . . , z N −1 } be the sample set we accept.
(iii) Project the points we accepted PN(s) onto the first (s − 1) coordinates. Let
Y N(s−1) = { y0 , . . . , y N −1 } be the projections of the points PN(s) .
(iv) Return the point set Y N(s−1) .
Note that M 1/s is not necessarily an integer in Algorithm 1 and hence the sets
Q i do not necessarily partition the unit cube [0, 1)s . The restriction that M 1/s is an
integer forces one to choose M = K s for some K ∈ N, which grows fast for large
s. However, this restriction is not necessary and hence we do not assume here that
M 1/s is an integer.

3.1 Existence Result of Samples with Small Star Discrepancy

We present some results that we use to prove an upper bound for the star-discrepancy
with respect to points generated by the acceptance-rejection sampler using stratified
inputs. For any 0 < δ ≤ 1, a set Γ of anchored boxes [0, x) ⊆ [0, 1)s is called a δ-
cover of the set of anchored boxes [0, t) ⊆ [0, 1)s if for every point t ∈ [0, 1]s , there
exist [0, x), [0, y) ∈ Γ such that [0, x) ⊆ [0, t) ⊆ [0, y) and λ([0, y) \ [0, x)) ≤ δ.
The following result on the size of the δ-cover is obtained from [13, Theorem 1.15].

Lemma 1 For any s ∈ N and δ > 0 there exists a δ-cover of the set of anchored
boxes [0, t) ⊆ [0, 1)s which has cardinality at most (2e)s (δ −1 + 1)s .
604 H. Zhu and J. Dick

By a simple generalization, the following result holds for our setting.


Lemma 2 Let ψ : [0, 1]s−1 → R+ , where s ≥ 2, be a function. Assume that
there exists a constant L < ∞ such that ψ(z) ≤ L for all z ∈ [0, 1]s−1 . Let
A = {z ∈ [0, 1]s : ψ(z 1 , . . . , z s−1 ) ≥ Lz s } and Jt∗ = ([0, t) × [0, 1]) ∩ A, for
t ∈ [0, 1]s−1 . Let (A, B(A), λ) be a probability space where B(A) is the Borel
σ -algebra of A. Define the set A ⊂ B(A) of test sets by

A = {Jt∗ : t ∈ [0, 1]s−1 }.

Then for any δ > 0 there exists a δ-cover Γδ of A with

|Γδ | ≤ (2e)s−1 (δ −1 + 1)s−1 .

Lemma 3 Let the unnormalized density function ψ : [0, 1]s−1 → R+ , with s ≥ 2,


be given. Assume that there exists a constant L < ∞ such that ψ(z) ≤ L for all
z ∈ [0, 1]s−1 .
• Let M ∈ N and let the subsets
 Q 0 , . . . , Q M−1 be a disjoint covering of [0, 1) of
s
s cj c j +1
the form i=1 M 1/s , M 1/s where 0 ≤ c j ≤
M 1/s − 1. Each set Q i satisfies
λ(Q i ) = 1/M.
• Let
A = {z ∈ [0, 1]s : ψ(z 1 , . . . , z s−1 ) ≥ Lz s }.

Assume that ∂ A admits an


(s − 1)−dimensional Minkowski content M A .
• Let Jt∗ = ([0, t) × [0, 1]) A, where t = (t1 , . . . , ts−1 ) ∈ [0, 1]s−1 .
Then there exists an M0 ∈ N such that ∂ Jt∗ intersects at most with 3s 1/2 M A M 1−1/s
subcubes Q i , for all M ≥ M0 .
This result can be obtained utilizing a similar proof as in [14, Theorem 4.3]. For
the sake of completeness, we give the proof here.
Proof Since ∂ A admits an (s − 1)−dimensional Minkowski content, it follows that

λ((∂ A)ε )
M A = lim < ∞.
ε→0 2ε

Thus by the definition of the limit, for any fixed ϑ > 2, there exists ε0 > 0 such that
λ((∂ A)ε ) ≤ ϑεM A whenever 0 < ε ≤ ε0 .
s c j c j +1 
Based on the form of the subcube given by i=1 ,
M 1/s M 1/s
, the largest diag-
√ −1/s √ √
onal length is s M . We can assume that M > ( s/ε0 ) , then s M −1/s =:
s

ε < ε0 and i∈J Q i ⊆ (∂ A)ε , where J is the index set for the sets Q i which satisfy
Q i ∩ ∂ A = ∅. Therefore

λ((∂ A)ε ) ϑεM A √


|J | ≤ ≤ −1
= sϑM A M 1−1/s .
λ(Q i ) M
Discrepancy Estimates For Acceptance-Rejection Samplers Using … 605

Without loss of generality, we can set ϑ = 3. Note that the number of boxes Q i
which intersect ∂ Jt∗ is bounded by the number of boxes Q i which intersect ∂ A,
which completes the proof. 

Remark 1 Ambrosio et al. [1] found that for a closed set A ⊂ Rs , if A has a Lipschitz
boundary, then ∂ A admits an (s − 1)-dimensional Minkowski content. In particular,
a convex set A ⊂ [0, 1]s has an (s − 1)-dimensional Minkowski content. Note that
the surface area of a convex set in [0, 1]s is bounded by the surface area of the unit
cube [0, 1]s , which is 2s and it was also shown by Niederreiter and Wills [25] that 2s
is best possible. It follows that the Minkowski content M A ≤ 2s when A is a convex
set in [0, 1]s .

Lemma 4 Suppose that all the assumptions of Lemma 3 are satisfied. Let N be the
number of points accepted by Algorithm 1. Then we have

M(λ(A) − 3s 1/2 M A M −1/s ) ≤ N ≤ M(λ(A) + 3s 1/2 M A M −1/s ).

Proof The number of points we accept in Algorithm 1 is a random number since the
driver sequence given by stratified inputs is random. Let E(N ) be the expectation
of N . The number of Q i which have non-empty intersection with ∂ A is bounded by
l = 3s 1/2 M A M 1−1/s from Lemma 3. Thus

E[N ] − l ≤ N ≤ E[N ] + l. (1)

Further we have

M−1
λ(Q i ∩ A)
E[N ] = = Mλ(A). (2)
i=0
λ(Q i )

Combining (1) and (2) and substituting l = 3s 1/2 M A M 1−1/s , one obtains the desired
result. 

Before we start to prove the upper bound on the star-discrepancy, our method
requires the well-known Bernstein–Chernoff inequality.
Lemma 5 [2, Lemma 2] Let η0 , . . . , ηl−1 be independent random variables with
E(ηi ) = 0 and |ηi | ≤ 1 for all 0 ≤ i ≤ l − 1. Denote by σi2 the variance of ηi , i.e.

σi2 = E(ηi2 ). Set β = ( l−1
i=0 σi )
2 1/2
. Then for any γ > 0 we have

 
l−1
   2e−γ /4 , if γ ≥ β 2 ,
P  
ηi ≥ γ ≤
2e−γ /4β , if γ ≤ β 2 .
2 2

i=0

The star-discrepancy of samples Y N(s−1) obtained by Algorithm 1 with respect to


ψ is given as follows,
606 H. Zhu and J. Dick

1 N −1  
 1 
D ∗N ,ψ (Y N(s−1) ) = sup  1[0,t) ( yn ) − ψ(z)d z ,
t∈[0,1]s−1 N n=0
C [0,t)


where C = [0,1]s−1 ψ(z) dz and s ≥ 2.

Theorem 1 Let an unnormalized density function ψ : [0, 1]s−1 → R+ , with s ≥ 2,


be given. Assume that there exists a constant L < ∞ such that ψ(z) ≤ L for all
z ∈ [0, 1]s−1 . Let C = [0,1]s−1 ψ(z) dz > 0 and let the graph under ψ be defined as

A = {z ∈ [0, 1]s : ψ(z 1 , . . . , z s−1 ) ≥ Lz s }.

Assume that ∂ A admits an (s − 1)−dimensional Minkowski content M A . Then for


all large enough N , with positive probability, Algorithm 1 yields a point set Y N(s−1) ⊆
[0, 1]s−1 such that
3 √ √
s4 6M A log N 2λ(A)
D ∗N ,ψ (Y N(s−1) ) ≤ + , (3)
2s − 2 2 − 2s 2 + 2s
1 1 1 1 1 1
2 (λ(A)) N N

where λ(A) = C/L.


Proof Let Jt∗ = ([0, t) × [0, 1]) A, where t = (t1 , . . . , ts−1 ). Using the notation
from Algorithm 1, let yn be the first s−1 coordinates of z n ∈ A, for n = 0, . . . , N −1.
We have

M−1 N −1

1 Jt∗ (x n ) = 1[0,t) ( yn ).
n=0 n=0

Therefore
1 N −1    1 M−1 
 1    ∗ 1 
 1[0,t) ( yn ) − ψ(z)d z  =  1 Jt (x n ) − λ(Jt∗ ). (4)
N n=0 C [0,t) N n=0 λ(A)

It is noted that

 M−1   M−1   N 


  ∗ N    ∗   
 1 Jt (x n ) − λ(Jt∗ ) ≤  1 Jt (x n ) − Mλ(Jt∗ ) + λ(Jt∗ ) M − 
λ(A) λ(A)
n=0 n=0
 M−1   
  ∗   
≤ 1 Jt (x n ) − Mλ(Jt∗ ) + Mλ(A) − N 
n=0
 M−1   
  ∗ 
M−1
  
≤ 1 Jt (x n ) − Mλ(Jt∗ ) + Mλ(A) − 1 A (x n )
n=0 n=0
 M−1 
  ∗ 
≤ 2 sup  1 Jt (x n ) − Mλ(Jt∗ ). (5)
t∈[0,1]s n=0
Discrepancy Estimates For Acceptance-Rejection Samplers Using … 607

Let us associate with each Q i , random points x i ∈ Q i with probability distribution

λ(V )
P(x i ∈ V ) = = Mλ(V ),
λ(Q i )

for all measurable sets V ⊆ Q i .


It follows from Lemma 3 that ∂ Jt∗ intersects at most l := 3s 1/2 M A M 1−1/s sets
Q i . Therefore, Jt∗ is representable as the disjoint union of sets Q i entirely contained
in Jt∗ and the union of at most l sets Q i for which Q i ∩ Jt∗ = ∅ and Q i ∩ ([0, 1]s \
Jt∗ ) = ∅, i.e.  
Jt∗ = Q i ∪ (Q i ∩ Jt∗ ),
i∈I i∈J

where the index-set J has cardinality at most


3s 1/2 M A M 1−1/s . Since for every Q i ,
λ(Q i ) = 1/M and x i ∈ Q i for i = 0, 1, . . . , M − 1, the discrepancy
of i∈I Q i is
zero. Therefore, it remains to investigate the discrepancy of i∈J (Q i ∩ Jt∗ ).
Since λ(A) = C/L and N ≥ M(C/L − 3s 1/2 M A M −1/s ) by Lemma 4, we have
M ≤ 2L N /C for all M > (6Ls 1/2 M A /C)s . Consequently,
1
l = 3s 1/2 M A M 1−1/s ≤ 3s 1/2 (2L)1−1/s C 1/s−1 M A N 1− s = Ω N 1−1/s ,

where Ω = 3s 1/2 (2L)1−1/s C 1/s−1 M A .


Let us define the random variable χi for 0 ≤ i ≤ l − 1 as follows

1, if z i ∈ Q i ∩ Jt∗ ,
χi =
/ Q i ∩ Jt∗ .
0, if z i ∈

By definition,

 M−1   
 ∗ 
l−1 l−1
  
 1 Jt (x n ) − Mλ(Jt∗ ) =  χi − M λ(Q i ∩ Jt∗ ). (6)
n=0 i=0 i=0

Because of P(χi = 1) = λ(Q i ∩ Jt∗ )/λ(Q i ) = Mλ(Q i ∩ Jt∗ ), we have

Eχi = Mλ(Q i ∩ Jt∗ ), (7)

where E(·) denotes the expected value. By (6) and (7),

 M−1   
 ∗
l−1
  
Δ N (Jt∗ ; z 1 , . . . , z N ) =  1 Jt (x n ) − Mλ(Jt∗ ) =  (χi − Eχi ). (8)
n=0 i=0

χi for 0 ≤ i ≤ l − 1 are independent of each other,


Since the random variables
in order to estimate the sum l−1
i=0 (χi − Eχi ) we are able to apply the classical
608 H. Zhu and J. Dick

Bernstein–Chernoff inequality of large deviation type. Let σi2 = E(χi − Eχi )2 and

set β = ( li=1 σi2 )1/2 . Let
γ = θl 1/2 (log N )1/2 ,

where θ is a constant depending only on the dimension s which will be fixed later.
Without loss of generality, assume that N ≥ 3.
1
Case 1: If γ ≤ β 2 , since β 2 ≤ l ≤ Ω N 1− s , by Lemma 5 we obtain
 
P Δ N (Jt∗ ; z 1 , . . . , z N ) ≥ θl 1/2 (log N )1/2
  l
 
=P  (χi − Eχi ) ≥ γ ≤ 2e−γ /(4β ) ≤ 2N −θ /4 .
2 2 2
(9)
i=1

Though the class of axis-parallel boxes is uncountable, it suffices to consider a small


subclass. Based on the argument in Lemma 2, there is an 1/M-cover of cardinality
(2e)s−1 (M + 1)s−1 ≤ (2e)s−1 (2L N /C + 1)s−1 for M > M0 such that there exist
R1 , R2 ∈ Γ1/M having the properties R1 ⊆ Jt∗ ⊆ R2 and λ(R2 \ R1 ) ≤ 1/M. From
this it follows that

Δ N (Jt∗ ; z 1 , . . . , z N ) ≤ max Δ(Ri ; z 1 , . . . , z N ) + 1,


i=1,2

see, for instance, [11, Lemma 3.1] and [16, Section 2.1]. This means that we can
restrict ourselves to the elements of Γ1/M .
In view of (9)

  θ2 θ2  2L N s−1
P Δ(Ri ; z 1 , . . . , z N ) ≥ γ ≤ |Γ1/M |2N − 4 ≤ 2N − 4 (2e)s−1 +1 < 1,
C

for θ = 2 2s and N ≥ 8e C
+ 2.
Case 2: On the other hand, if γ ≥ β 2 , then by Lemma 5 we obtain
 
P Δ(Jt∗ ; z 1 , . . . , z N ) ≥ θl 1/2 (log N )1/2
 
l
  θl 1/2 (log N )1/2
=P  (χi − Eχi ) ≥ γ ≤ 2e− 4 . (10)
i=1


Similarly, using the 1/M-cover above, for θ = 2 2s and sufficiently large N we
have
  θl 1/2 (log N )1/2
P Δ(Ri ; z 1 , . . . , z N ) ≥ γ ≤ |Γ1/M |2e− 4

θl 1/2 (log N )1/2  2L N s−1


≤ 2e− 4 (2e)s−1 +1 < 1,
C
Discrepancy Estimates For Acceptance-Rejection Samplers Using … 609

where the last equation is satisfied for all large enough N .


By (4) and (5), we obtain that, with positive probability, Algorithm 1 yields a point
set Y N(s−1) such that

D ∗N ,ψ (Y N(s−1) ) ≤ 2sΩ 1/2 N − 2 − 2s (log N )1/2 + 1/M.
1 1

As above, by Lemma 4 we have 1/M ≤ 2C/(L N ) for sufficiently large N . Thus


the proof of Theorem 1 is complete. 

3.2 Upper Bound on the L q -Discrepancy

In this section we prove an upper bound on the expected value of the L q -discrepancy
 1/q
for 2 ≤ q ≤ ∞. We establish an upper bound for E[N q L q,N (Y N(s−1) )]
q
which is
given by
 1/q    −1
 N N

q 1/q
(s−1)  ψ(z) dz) dt
q
E N q L q,N (Y N ) = E 1[0,t) ( yn )− ,
[0,1)s−1 n=0 C [0,t)

where Y N(s−1) is the sample set associated with the density function ψ.

Theorem 2 Let the unnormalized density function ψ : [0, 1]s−1 → R+ satisfy all
the assumptions stated in Theorem 1. Let Y N(s−1) be the samples generated by the
acceptance-rejection sampler using stratified inputs in Algorithm 1. Then we have
for 2 ≤ q ≤ ∞,

 1/q 2(1−1/s)(1−1/q) (3s 1/2 M A )1−1/q (1−1/s)(1−1/q)


E[N q L q,N (Y N(s−1) )]
q
≤ √ N , (11)
4 2C(λ(A))(1−1/s)(1−1/q)

where M A is the (s − 1)−dimensional Minkowski content and the expectation is


taken with respect to the stratified inputs.

Proof Let Jt∗ = ([0, t) × [0, 1]) A, where t = (t1 , . . . , ts−1 ) ∈ [0, 1]s−1 . Let

ξi (t) = 1 Q i ∩Jt∗ (x i ) − λ(Q i ∩ Jt∗ )/λ(Q i ),

where Q i for 0 ≤ i ≤ M − 1 is a disjoint covering of [0, 1)s with λ(Q i ) = 1/M.


Then E(ξi (t)) = 0 since we have E[1 Q i ∩Jt∗ (x i )] = Mλ(Q i ∩ Jt∗ ). Hence for any
t ∈ [0, 1]s−1 ,

E[ξi2 (t)] = E[(1 Q i ∩Jt∗ (x i ) − Mλ(Q i ∩ Jt∗ ))2 ]


= E[1 Q i ∩Jt∗ (x i )] − 2Mλ(Q i ∩ Jt∗ )E[1 Q i ∩Jt∗ (x i )] + M 2 λ2 (Q i ∩ Jt∗ )
= Mλ(Q i ∩ Jt∗ )(1 − Mλ(Q i ∩ Jt∗ )) ≤ 1/4.
610 H. Zhu and J. Dick

If Q i ⊆ Jt∗ or if Q i ∩ Jt∗ = ∅, we have ξi (t) = 0. We order the sets Q i such that


Q 0 , Q 1 , . . . , Q i0 satisfy Q i ∩ Jt∗ = ∅ and Q i  Jt∗ (i.e. Q i intersects the boundary
of Jt∗ ) and the remaining sets Q i either satisfy Q i ∩ Jt∗ = ∅ or Q i ⊆ Jt∗ . If ∂ A
admits an (s − 1)−dimensional Minkowski content, it follows from Lemma 3 that,


M−1 
l−1
ξi2 (t) = ξi2 (t) ≤ l/4 for all t ∈ [0, 1]s−1 .
i=0 i=0

Again, E[N ] = Mλ(A) from Eq. (2). Now for q = 2,


 1/2
E N 2 L 22,N (Y N(s−1) )
  N −1
N

2 1/2
= E  1[0,t) ( yn ) − ψ(z) dz) dt
[0,1)s−1 n=0 C [0,t)
  
 M−1 N λ(Jt∗ ) 2 1/2
= E  1 Jt∗ (x n ) − dt
[0,1)s−1 n=0 λ(A)
   M−1
   E(N )λ(Jt∗ ) N λ(Jt∗ ) 2 1/2
≤ E  1 Jt∗ (x n ) − Mλ(Jt∗ ) +  − dt
[0,1)s−1 n=0
λ(A) λ(A)
√   
 M−1 2  λ(Jt∗ ) 2 1/2
≤ 2 E  1 Jt∗ (x n ) − Mλ(Jt∗ ) +  (E(N ) − N ) dt ,
[0,1)s−1 n=0 λ(A)

where we use (a + b)2 ≤ 2(a 2 + b2 ).


Then we have

(s−1)
1/2 √   
 M−1 2 1  2 1/2
E N 2 L 22,N (Y N ) ≤ 2 E  ξi (t) dt + E(N ) − N 
[0,1]s−1 (λ(A)) 2
i=0

√   
M−1
 L 2  2 1/2
M−1
= 2 E ξi2 (t) dt + 2 ξi (1)
[0,1]s−1 C
i=0 i=0

√  
l−1
L2 
l−1 1/2
= 2 E[ξi2 (t)]dt + ξi2 (1)
[0,1]s−1 i=0 C2
i=0
√ 1 L 2 l 1/2 (L 2 + C 2 )1/2 1/2
≤ 2 + 2 = √ l .
4 C 4 2C

Since |ξi (t)| ≤ 1, for q = ∞, we have

(s−1)

 M−1  l−1 
sup |N D ∗N (Y N )| = sup sup  ξi (t) = sup sup  ξi (t)
PM ⊂[0,1]s PM ⊂[0,1]s t∈[0,1]s−1 i=0 PM ⊂[0,1]s t∈[0,1]s−1 i=0


l−1  
≤ sup sup ξi (t) ≤ l/4.
PM ∈[0,1]s t∈[0,1]s−1 i=0
Discrepancy Estimates For Acceptance-Rejection Samplers Using … 611

Therefore, for 2 ≤ q ≤ ∞,

 1/q (L 2 + C 2 )1/2 1−1/q


E[N q L q,N (Y N(s−1) )]
q
≤ √ l ,
4 2C

which is a consequence of the log-convexity of L p -norms, i.e.  f  pθ ≤  f 1−θ


p0
 f θp1 , where 1/ pθ = (1 − θ )/ p0 + θ/ p1 . In our case, p0 = 2 and p1 = ∞.
Additionally, following from Lemma 4, we have M ≤ 2L N /C whenever
M > (6Ls 1/2 M A /C)s . Hence we obtain the desired result by substituting l = 3s 1/2
M A M 1−1/s and replacing M in terms of N . 

Remark 2 It would also be interesting to find out whether (11) still holds for 1 <
q < 2. See Heinrich [15] for a possible proof technique.
We leave it as an open problem.

3.3 Numerical Tests and Discussion of Results

We consider the discrepancy of samples generated by Algorithm 1 with respect to


the given density ψ defined by
1 −x1
ψ(x1 , x2 , x3 , x4 ) = (e + e−x2 + e−x3 + e−x4 ), (x1 , x2 , x3 , x4 ) ∈ [0, 1]4 .
4
To compute the star-discrepancy, we utilize the same technique as in [33], a so-called
δ-cover, to estimate the supremum in the definition of the star-discrepancy. We also
calculate the L q -discrepancy of samples for this example. The L q -discrepancy with
respect to a density function is denoted by,

 1 
N −1  q 1/q
 1 
L q (Y N(s−1) , ψ) =  1[0,t) ( yn ) − ψ(z)d z  dt , (12)
[0,1]s−1 N n=0 C [0,t)

where C = [0,1]s−1 ψ(z) dz and t = (t1 , . . . , ts−1 ). One can write down a precise
formula for the squared L 2 -discrepancy for the given ψ in this example, which is

L 2 (Y N(s−1) , ψ)2 = Δ2ψ,t dt
[0,1]s−1

1  71 7 
N −1 s−1
1   16
= 2 (1 − max{ym, j , yn, j }) + − +
N m,n=0 j=1 4C 2 54e2 27e 108
N −1 4 4
1  −1 −yi, j k=1 (1 − yi,k )
2
− (1 + e − yi, j − e ) ,
16N C i=0 j=1 1 − yi,2 j

where C = 1 − 1/e.
612 H. Zhu and J. Dick

Theorem 1 shows that Algorithm 1 can yield a point set satisfying the discrepancy
bound (3). To test this result numerically and to compare it to the acceptance-rejection
algorithm using random inputs, we performed the following numerical test. We gen-
erated 100 independent stratified inputs and 100 independent pseudo-random inputs
for the acceptance-rejection algorithm. From the samples sets obtained from the
acceptance-rejection algorithm we chose those samples which yielded the fastest
rate of convergence for stratified inputs and also for pseudo-random inputs.
Theorem 1 suggests a convergence rate of order N −1/2−1/(2s) = N −0.6 for stratified
inputs. The numerical results in this test shows an empirical convergence of N −0.62 ,
see Fig. 1. In comparison, the same test carried out with the stratified inputs replaced
by pseudo-random inputs shows a convergence rate of order N −0.55 . As expected,
stratified inputs outperform random inputs.
We also performed numerical experiments to test Theorem 2. For q = ∞, the left
side in (11) is the infinite moment, i.e. the essential supremum, of the random variable
N L q,N (Y Ns−1 ). Theorem 2 suggests a convergence rate of order N −1/s = N −0.2 . To
compare this result with the numerical performance in our example, we used again
100 independent runs, but now chose the one with the worst convergence rate for each
case. With stratified inputs, we get a convergence rate of order N −0.55 in this case (see
Fig. 1), which may suggest that Theorem 2 is too pessimistic. Note that Theorem 2
only requires very weak smoothness assumptions on the target density, whereas the
density in our example is very smooth. This may also explain the difference between
the theoretical and numerical results.
We also test Theorem 2 for the case q = 2. In this case, the left side of (11) is
an L 2 average of N L 2,N (Y Ns−1 ). Theorem 2 with q = 2 suggests a convergence rate
of L 2,N (Y Ns−1 ) of order N −1/2−1/(2s) = N −0.6 . The numerical experiment in Fig. 2

0
10
Random-worst
0.74 N -0.45
Random-best
1.99 N -0.55
Stratified-worst
0.98 N -0.55
Stratified-best
2.03 N -0.62

-1
10
Discrepancy

-2
10

-3
10

0 1 2 3 4 5 6
10 10 10 10 10 10 10
Number of points

Fig. 1 Convergence order of the star-discrepancy


Discrepancy Estimates For Acceptance-Rejection Samplers Using … 613

L2-Stratified
0.26 N -0.59
L2-Random
0.24 N -0.50

Discrepancy

10-2

10-3

10 1 10 2 10 3 10 4 10 5
Number of points

Fig. 2 Convergence order of the L 2 -discrepancy

yields a convergence rate of order N −0.59 , roughly in agreement with Theorem 2 for
q = 2. For random inputs we get a convergence rate of order N −0.50 , as one would
expect.

4 Improved Rate of Convergence for a Deterministic


Acceptance-Rejection Sampler

In this section we prove a convergence rate of order N −α for 1/s ≤ α < 1, where
α depends on the target density ψ. See Corollary 1 below for details. For this result
we use (t, m, s)-nets (see Definition 5 below) as inputs instead of stratified samples.
The value of α here depends on how well the graph of ψ can be covered by cer-
tain rectangles (see Eq. (13)). In practice this covering rate of order N −α is hard to
determine precisely, where α can range anywhere in [1/s, 1), and where α arbitrarily
close to 1 can be achieved if ψ is constant. We also provide a simple example in
dimension s = 2 for which α can take on the values α = 1 − −1 for  ∈ N,  ≥ 2.
See Example 1 for details.
We first establish some notation and useful definitions and then obtain theoretical
results. First we introduce the definition of (t, m, s)-nets in base b (see [8]) which
we use as the driver sequence. The following fundamental definitions of elementary
interval and fair sets are used to define a (t, m, s)-net in base b.
Definition 3 (b-adic elementary interval) Let b ≥ 2 be an integer. An s-dimensional
b-adic elementary interval is an interval of the form
614 H. Zhu and J. Dick

s 
 
ai ai + 1
d
, d
i=1
b i bi

with integers 0 ≤ ai < bdi and di ≥ 0 for all 1 ≤ i ≤ s. If d1 , . . . , ds are such that
d1 + · · · + ds = k, then we say that the elementary interval is of order k.

Definition 4 (fair sets) For a given set PN = {x 0 , x 1 , . . . , x N −1 } consisting of N


points in [0, 1)s , we say for a subset J of [0, 1)s to be fair with respect to PN , if

N −1
1 
1 J (x n ) = λ(J ),
N n=0

where 1 J (x n ) is the indicator function of the set J .

Definition 5 ((t, m, s)-nets in base b) For a given dimension s ≥ 1, an integer base


b ≥ 2, a positive integer m and an integer t with 0 ≤ t ≤ m, a point set Q m,s of bm
points in [0, 1)s is called a (t, m, s)-nets in base b if the point set Q m,s is fair with
respect to all b-adic s-dimensional elementary intervals of order at most m − t.

We present the acceptance-rejection algorithm using (t, m, s)-nets as driver


sequence.
Algorithm 2 Let the target density ψ : [0, 1]s−1 → R+ , where s ≥ 2, be given.
Assume that we know a constant L < ∞ such that ψ(x) ≤ L for all x ∈ [0, 1]s−1 .
Let A = {z ∈ [0, 1]s : ψ(z 1 , . . . , z s−1 ) ≥ L xs }. Suppose we aim to obtain approxi-
mately N samples from ψ.
 
(i) Let M = bm ≥ N /( [0,1]s−1 ψ(x)/Ld x) , where m ∈ N is the smallest integer
satisfying this inequality. Generate a (t, m, s)-net Q m,s = {x 0 , x 1 , . . . , x bm −1 }
in base b.
(ii) Use the acceptance-rejection method for the points Q m,s with respect to the
density ψ, i.e. we accept the point x n if x n ∈ A, otherwise reject. Let PN(s) =
A ∩ Q m,s = {z 0 , . . . , z N −1 } be the sample set we accept.
(iii) Project the points PN(s) onto the first (s − 1) coordinates. Let Y N(s−1) =
{ y0 , . . . , y N −1 } ⊆ [0, 1]s−1 be the projections of the points PN(s) .
(iv) Return the point set Y N(s−1) .
In the following we show that an improvement of the discrepancy bound for the
deterministic acceptance-rejection sampler is possible. Let an unnormalized density
function ψ : [0, 1]s−1 → R+ , with s ≥ 2, be given. Let again

A = {z = (z 1 , . . . , z s ) ∈ [0, 1]s : ψ(z 1 , . . . , z s−1 ) ≥ Lz s }


Discrepancy Estimates For Acceptance-Rejection Samplers Using … 615

and Jt∗ = ([0, t) × [0, 1]) A. Let ∂ Jt∗ denote the boundary of Jt∗ and ∂[0, 1]s
denotes the boundary of [0, 1]s . For k ∈ N we define the covering number


v
Γk (ψ) = sup min{v :∃U1 , . . . , Uv ∈ Ek : (∂ Jt∗ \ ∂[0, 1]s ) ⊆ Ui ,
t∈[0,1]s i=1
Ui ∩ Ui  = ∅ for 1 ≤ i < i  ≤ v}, (13)

where Ek is the family of elementary intervals of order k.


Lemma 6 Let ψ : [0, 1]s−1 → [0, 1] be an unnormalized target density and let the
covering number Γm−t (ψ) be given by (13). Then the discrepancy of the point set
Y N(s−1) = { y0 , y1 , . . . , y N −1 } ⊆ [0, 1]s−1 generated by Algorithm 2 using a (t, m, s)-
net in base b, for large enough N , satisfies

D ∗N ,ψ (Y N(s−1) ) ≤ 4C −1 bt Γm−t (ψ)N −1 ,



where C = [0,1]s−1 ψ(z)d z.
Proof Let t ∈ [0, 1]s be given. Let v = Γm−t (ψ) and U1 , . . . , Uv be elementary
intervals of order m − t such that U1 ∪ U2 ∪ · · · ∪ Uv ⊇ (∂ Jt∗ \ ∂[0, 1]s ) and
Ui ∩ Ui  = ∅ for 1 ≤ i < i  ≤ v. Let V1 , . . . , Vz
∈ Em−t with ∗
v Vi ⊆ Jt , V∗ i ∩ Vi = ∅

 z
for all 1 ≤ i < i ≤ z and Vi ∩ Ui = ∅ such that i=1 Vi ∪ i=1 Ui ⊇ Jt . We define


z 
v
W = Vi ∪ Ui
i=1 i=1

and

z
Wo = Vi .
i=1

Then W and W o are fair with respect to the (t, m, s)-net, W o ⊆ Jt∗ ⊆ W and


v 
v
λ(W \ Jt∗ ), λ(Jt∗ \ W o ) ≤ λ(W \ W o ) = λ(Ui ) = b−m+t = b−m+t Γm−t (ψ).
i=1 i=1

The proof of the result now follows by the same arguments as the proofs of [33,
Lemma 1 & Theorem 1]. 
From Lemma 3 we have that if ∂ A admits an (s − 1)−dimensional Minkowski
content, then
Γk (ψ) ≤ cs b(1−1/s)k .

This yields a convergence rate of order N −1/s in Lemma 6. Another known example
is the following. Assume that ψ is constant. Since the graph of ψ can be covered by
616 H. Zhu and J. Dick

just one elementary interval of order m − t, this is the simplest possible case. The results from [24, Sect. 3] (see also [8, pp. 184–190] for an exposition in dimensions s = 1, 2, 3) imply that Γ_k(ψ) ≤ C_s k^{s−1} for some constant C_s which depends only on s. This yields the convergence rate of order (log N)^{s−1} N^{−1} in Lemma 6. Thus, in general, there are constants c_{s,ψ} and C_{s,ψ} depending only on s and ψ such that

c_{s,ψ} k^{s−1} ≤ Γ_k(ψ) ≤ C_{s,ψ} b^{(1−1/s)k},     (14)

whenever the set ∂A admits an (s − 1)-dimensional Minkowski content. This yields a convergence rate in Lemma 6 of order N^{−α} with 1/s ≤ α < 1, where the precise value of α depends on ψ. We obtain the following corollary.

Corollary 1 Let ψ : [0, 1]^{s−1} → [0, 1] be an unnormalized target density and let Γ_k(ψ) be given by (13). Assume that there is a constant Θ > 0 such that

Γ_k(ψ) ≤ Θ b^{(1−α)k} k^β for all k ∈ N,

for some 1/s ≤ α ≤ 1 and β ≥ 0. Then there is a constant Δ_{s,t,ψ} > 0 which depends only on s, t and ψ, such that the discrepancy of the point set Y_N^{(s−1)} = {y_0, y_1, . . . , y_{N−1}} ⊆ [0, 1]^{s−1} generated by Algorithm 2 using a (t, m, s)-net in base b, for large enough N, satisfies

D*_{N,ψ}(Y_N^{(s−1)}) ≤ Δ_{s,t,ψ} N^{−α} (log N)^β.

Example 1 To illustrate the bound in Corollary 1, we consider now an example for which we can obtain an explicit bound on Γ_k(ψ) of order b^{k(1−α)} for 1/2 ≤ α < 1. For simplicity let s = 2 and α = 1 − ℓ^{−1} for some ℓ ∈ N with ℓ ≥ 2. We define now a function ψ_ℓ : [0, 1) → [0, 1) in the following way: let x ∈ [0, 1) have b-adic expansion

x = ξ_1/b + ξ_2/b^2 + ξ_3/b^3 + · · ·,

where ξ_i ∈ {0, 1, . . . , b − 1} and assume that infinitely many of the ξ_i are different from b − 1. Then set

ψ_ℓ(x) = ξ_1/b^{ℓ−1} + ξ_2/b^{2(ℓ−1)} + ξ_3/b^{3(ℓ−1)} + · · ·.
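Since ψ_ℓ only rearranges the b-adic digits of x, it is straightforward to evaluate numerically. The following helper is a sketch of the map as reconstructed above; the truncation to finitely many digits is an approximation introduced here, not made in the text.

```python
def psi_ell(x, ell, b=2, digits=30):
    """Evaluate psi_ell(x) from the first `digits` b-adic digits of x."""
    value, frac = 0.0, x
    for i in range(1, digits + 1):
        frac *= b
        xi = int(frac)                    # i-th b-adic digit xi_i of x
        frac -= xi
        value += xi / b ** (i * (ell - 1))
    return value

# For ell = 2 the map reduces to the identity; larger ell spreads the digits out.
print(psi_ell(0.7, ell=2), psi_ell(0.7, ell=3))
```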

Let t ∈ [0, 1). In the following we define elementary intervals of order k ∈ N which cover ∂J_t^* \ ∂[0, 1]^2. Assume first that k is a multiple of ℓ and let g = k/ℓ. Then we define the following elementary intervals of order k = gℓ:

[a_1/b + · · · + a_{g−1}/b^{g−1} + a_g/b^g, a_1/b + · · · + a_{g−1}/b^{g−1} + (a_g + 1)/b^g) ×
[a_1/b^{ℓ−1} + · · · + a_{g−1}/b^{(g−1)(ℓ−1)} + a_g/b^{g(ℓ−1)}, a_1/b^{ℓ−1} + · · · + a_{g−1}/b^{(g−1)(ℓ−1)} + (a_g + 1)/b^{g(ℓ−1)}),     (15)
where a_1, . . . , a_g ∈ {0, 1, . . . , b − 1} run through all possible choices such that

a_1/b + · · · + a_{g−1}/b^{g−1} + (a_g + 1)/b^g ≤ t.

The number of these choices for a_1, . . . , a_g is bounded by b^g. Let

t = t_1/b + · · · + t_g/b^g + t_{g+1}/b^{g+1} + · · ·.

For integers 1 ≤ u ≤ g(ℓ − 1) and 0 ≤ c_u < t_{g+u}, we define the intervals

[t_1/b + · · · + t_{g+u−1}/b^{g+u−1} + c_u/b^{g+u}, t_1/b + · · · + t_{g+u−1}/b^{g+u−1} + (c_u + 1)/b^{g+u}) ×
[d_1/b + · · · + d_{g(ℓ−1)−u}/b^{g(ℓ−1)−u}, d_1/b + · · · + d_{g(ℓ−1)−u}/b^{g(ℓ−1)−u} + 1/b^{g(ℓ−1)−u}),     (16)

where d_i = 0 if ℓ ∤ i, d_i = t_{i/ℓ} if ℓ | i, and we set d_1/b + · · · + d_{g(ℓ−1)−u}/b^{g(ℓ−1)−u} = 0 if u = g(ℓ − 1).

Further we define the interval

[t_1/b + · · · + t_g/b^g, t_1/b + · · · + t_g/b^g + 1/b^g) × [0, 1).     (17)

The intervals defined in (15)–(17) cover ∂J_t^* \ ∂[0, 1]^2. Thus we have

Γ_{gℓ}(ψ_ℓ) ≤ b^g + bg(ℓ − 1) + 1 ≤ ℓ b^g.

For arbitrary k ∈ N we can use elementary intervals of order k which cover the same area as the intervals (15)–(17). Thus we have at most b^{ℓ−1} times as many intervals and we therefore obtain

Γ_k(ψ_ℓ) ≤ ℓ b^{k/ℓ+ℓ−1}.

Thus we obtain

sup_{t ∈ [0,1]} | (1/N) Σ_{n=0}^{N−1} 1_{[0,t)}(y_n) − (1/C) ∫_0^t ψ_ℓ(z) dz | ≤ Δ_{s,t,ψ_ℓ} N^{−(1−1/ℓ)}.
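For s = 2 the left-hand side above is a weighted Kolmogorov–Smirnov-type quantity and can be evaluated exactly at the sorted sample points, so the decay can be checked empirically. The sketch below does this for the toy density ψ(x) = (3/4)(1 + x²) with L = 3/2 and C = 1, standing in for ψ_ℓ, and again assumes scipy's Sobol' points as the driver net; all of these concrete choices are illustrative assumptions.

```python
import numpy as np
from scipy.stats import qmc

def weighted_star_discrepancy(y, F):
    """sup_t | #{y_n < t}/N - F(t) | for an increasing, continuous F with F(1) = 1."""
    y = np.sort(np.asarray(y))
    n = len(y)
    Fy = F(y)
    grid = np.arange(n) / n
    return max(np.max(Fy - grid), np.max(grid + 1.0 / n - Fy))

psi = lambda x: 0.75 * (1.0 + x ** 2)
F = lambda t: 0.75 * (t + t ** 3 / 3.0)        # (1/C) * int_0^t psi(z) dz with C = 1

for m in (8, 10, 12, 14):
    net = qmc.Sobol(d=2, scramble=False).random_base2(m)
    y = net[psi(net[:, 0]) >= 1.5 * net[:, 1], 0]   # accepted and projected points
    print(len(y), weighted_star_discrepancy(y, F))
```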

Remark 3 In order to obtain similar results as in this section for stratified inputs rather than (t, m, s)-nets, one would have to use the elementary intervals U_1, . . . , U_v of order k which yield a covering of ∂J_t^* \ ∂[0, 1]^s for all t ∈ [0, 1]^{s−1}. From this covering one would then have to construct a covering of ∂A \ ∂[0, 1]^s and use this covering to obtain stratified inputs. Since such a covering is not easily available in general, we did not pursue this approach further.

5 Concluding Remarks

In this paper, we study an acceptance-rejection sampling method using stratified inputs. We examine the star-discrepancy and the L_q-discrepancy and show that the star-discrepancy is bounded by a quantity of order N^{−1/2−1/(2s)}, which is slightly better than the rate of plain Monte Carlo. A bound on the L_q-discrepancy is given through an estimation of (E[N^q L_{q,N}^q])^{1/q}. It is established that (E[N^q L_{q,N}^q])^{1/q} achieves an order of convergence of N^{(1−1/s)(1−1/q)} for 2 ≤ q ≤ ∞. Unfortunately, our arguments do not yield an improvement for the case 1 < q < 2. From our numerical experiments we can see that using stratified inputs in the acceptance-rejection sampler outperforms the original algorithm. The numerical results are roughly in agreement with the upper bounds in Theorems 1 and 2.
We also find that the upper bound for the star-discrepancy using a deterministic driver sequence can be improved to N^{−α} for 1/s ≤ α < 1 under some assumptions. An example illustrates these theoretical results.

Acknowledgments The work was supported by Australian Research Council Discovery Project DP150101770. We thank Daniel Rudolf and the anonymous referee for many very helpful comments.

References

1. Ambrosio, L., Colesanti, A., Villa, E.: Outer Minkowski content for some classes of closed
sets. Math. Ann. 342, 727–748 (2008)
2. Beck, J.: Some upper bounds in the theory of irregularities of distribution. Acta Arith. 43,
115–130 (1984)
3. Botts, C., Hörmann, W., Leydold, J.: Transformed density rejection with inflection points. Stat.
Comput. 23, 251–260 (2013)
4. Chen, S.: Consistency and convergence rate of Markov chain quasi Monte Carlo with examples.
Ph.D. thesis, Stanford University (2011)
5. Chen, S., Dick, J., Owen, A.B.: Consistency of Markov chain quasi-Monte Carlo on continuous
state spaces. The Ann. Stat. 39, 673–701 (2011)
6. Devroye, L.: A simple algorithm for generating random variates with a log-concave density. Computing 33, 247–257 (1984)
7. Devroye, L.: Nonuniform Random Variate Generation. Springer, New York (1986)
8. Dick, J., Pillichshammer, F.: Digital Nets and Sequences: Discrepancy Theory and Quasi-Monte
Carlo Integration. Cambridge University Press, Cambridge (2010)
9. Dick, J., Rudolf, D.: Discrepancy estimates for variance bounding Markov chain quasi-Monte
Carlo. Electron. J. Prob. 19, 1–24 (2014)
10. Dick, J., Rudolf, D., Zhu, H.: Discrepancy bounds for uniformly ergodic Markov chain quasi-Monte Carlo. http://arxiv.org/abs/1303.2423 [stat.CO], submitted (2013)
11. Doerr, B., Gnewuch, M., Srivastav, A.: Bounds and constructions for the star-discrepancy via
δ-covers. J. Complex. 21, 691–709 (2005)
12. Gerber, M., Chopin, N.: Sequential quasi-Monte Carlo. J. R. Stat. Soc. B 77, 1–44 (2015)
13. Gnewuch, M.: Bracketing numbers for axis-parallel boxes and applications to geometric discrepancy. J. Complex. 24, 154–172 (2008)
14. He, Z., Owen, A.B.: Extensible grids: uniform sampling on a space-filling curve. J. R. Stat.
Soc. B 1–15 (2016)
15. Heinrich, S.: The multilevel method of dependent tests. In: Balakrishnan, N., Melas, V.B.,
Ermakov, S.M., (eds.), Advances in Stochastic Simulation Methods, pp. 47–62. Birkhäuser
(2000)
16. Heinrich, S., Novak, E., Wasilkowski, G.W., Woźniakowski, H.: The inverse of the star-
discrepancy depends linearly on the dimension. Acta Arith. 96, 279–302 (2001)
17. Hörmann, W.: A reject technique for sampling from T-concave distributions. ACM Trans. Math.
Softw. 21, 182–193 (1995)
18. Hörmann, W., Leydold, J., Derflinger, G.: Automatic Nonuniform Random Variate Generation.
Springer, Berlin (2004)
19. Kuipers, L., Niederreiter, H.: Uniform Distribution of Sequences. Wiley, New York (1974)
20. L’Ecuyer, P., Lécot, C., Tuffin, B.: A randomized quasi-Monte Carlo simulation method for
Markov chains. Oper. Res. 56, 958–975 (2008)
21. Morokoff, W.J., Caflisch, R.E.: Quasi-Monte Carlo integration. J. Comput. Phys. 122, 218–230
(1995)
22. Moskowitz, B., Caflisch, R.E.: Smoothness and dimension reduction in quasi-Monte Carlo
methods. Math. Comput. Model. 23, 37–54 (1996)
23. Nguyen, N., Ökten, G.: The acceptance-rejection method for low discrepancy sequences (2014)
24. Niederreiter, H.: Point sets and sequences with small discrepancy. Monatshefte für Mathematik
104, 273–337 (1987)
25. Niederreiter, H., Wills, J.M.: Diskrepanz und Distanz von Maßen bezüglich konvexer und
Jordanscher Mengen (German). Mathematische Zeitschrift 144, 125–134 (1975)
26. Owen, A.B.: Monte Carlo Theory, Methods and Examples. http://www-stat.stanford.edu/
~owen/mc/. Last accessed Apr 2016
27. Robert, C., Casella, G.: Monte Carlo Statistical Methods, 2nd edn. Springer, New York (2004)
28. Roberts, G.O., Rosenthal, J.S.: Variance bounding Markov chains. Ann. Appl. Prob. 18, 1201–
1214 (2008)
29. Tribble, S.D.: Markov chain Monte Carlo algorithms using completely uniformly distributed
driving sequences. Ph.D. thesis, Stanford University (2007)
30. Tribble, S.D., Owen, A.B.: Constructions of weakly CUD sequences for MCMC. Electron. J.
Stat. 2, 634–660 (2008)
31. Wang, X.: Quasi-Monte Carlo integration of characteristic functions and the rejection sampling
method. Comput. Phys. Commun. 123, 16–26 (1999)
32. Wang, X.: Improving the rejection sampling method in quasi-Monte Carlo methods. J. Comput.
Appl. Math. 114, 231–246 (2000)
33. Zhu, H., Dick, J.: Discrepancy bounds for deterministic acceptance-rejection samplers. Electron. J. Stat. 8, 678–707 (2014)
Index

B
Barth, Andrea, 209
Bay, Xavier, 521
Belomestny, Denis, 229
Binder, Nikolaus, 423
Bréhier, Charles-Edouard, 245

C
Carbone, Ingrid, 261
Chen, Nan, 229
Chopin, Nicolas, 531

D
Dahm, Ken, 423
Dereich, Steffen, 3
Dick, Josef, 599
Durrande, Nicolas, 315

G
Gantner, Robert N., 271
Genz, Alan, 289
Gerber, Mathieu, 531
Giles, Michael B., 303
Ginsbourger, David, 315
Goda, Takashi, 331
Göncü, Ahmet, 351
Goudenège, Ludovic, 245

H
He, Zhijian, 531
Hickernell, Fred J., 367, 407, 583
Hinrichs, Aicke, 385
Hoel, Håkon, 29
Hofer, Roswitha, 87
Hussaini, M. Yousuff, 351
Häppölä, Juho, 29

J
Jakob, Wenzel, 107
Jiménez Rugama, Lluís Antoni, 367, 407

K
Keller, Alexander, 423
Kritzer, Peter, 437
Kucherenko, Sergei, 455
Kunsch, Robert J., 471

L
Lang, Annika, 489
Lenôtre, Lionel, 507
Lenz, Nicolas, 315
Lester, Christopher, 303
Li, Sangmeng, 3
Liu, Yaning, 351

M
Maatouk, Hassan, 521
Matsumoto, Makoto, 143

N
Niederreiter, Harald, 87, 531
Novak, Erich, 161

O
Oettershagen, Jens, 385
Ohori, Ryuichi, 143, 331
Ökten, Giray, 351

P
Pillichshammer, Friedrich, 437

R
Robert, Christian P., 185
Roustant, Olivier, 315

S
Schretter, Colas, 531
Schuhmacher, Dominic, 315
Schwab, Christoph, 209, 271
Siedlecki, Paweł, 545
Song, Shugfang, 455
Šukys, Jonas, 209
Suzuki, Kosuke, 331

T
Temlyakov, Vladimir, 557
Tempone, Raúl, 29
Trinh, Giang, 289
Tudela, Loïc, 245

U
Ullrich, Mario, 571

W
Wang, Yiwei, 229
Whittle, James, 303

Y
Yoshiki, Takehito, 331

Z
Zhou, Xuan, 583
Zhu, Houying, 599
